LuxTTS: A new TTS model for voice cloning

A new text-to-speech (TTS) model, called LuxTTS, has recently been released. This diffusion-based model stands out for its compact size (120 million parameters) and high-quality voice cloning.

Key Features

LuxTTS offers the following features:

  • High-quality voice cloning: output quality described as comparable to that of much larger models.
  • Efficiency: requires less than 1 GB of VRAM.
  • Speed: faster than real-time even on CPUs.
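
To put the efficiency and speed claims in concrete terms, here is a minimal Python sketch of the arithmetic behind them. It does not use LuxTTS itself. The numeric precisions (32-bit and 16-bit weights) and the timing figures are assumptions chosen for illustration, not values from the release. The helper names are hypothetical.

```python
# Back-of-the-envelope arithmetic only; no LuxTTS code is called here.

def param_memory_mib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to store the weights alone, in MiB."""
    return num_params * bytes_per_param / (1024 ** 2)


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Ratio of compute time to audio length; below 1.0 means faster than real-time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


NUM_PARAMS = 120_000_000  # parameter count stated in the article

# Assumed precisions: the article does not say how the weights are stored.
fp32_mib = param_memory_mib(NUM_PARAMS, 4)  # 4 bytes per 32-bit float
fp16_mib = param_memory_mib(NUM_PARAMS, 2)  # 2 bytes per 16-bit float

print(f"weights at 32-bit: {fp32_mib:.0f} MiB")
print(f"weights at 16-bit: {fp16_mib:.0f} MiB")

# Hypothetical timing, not a measurement: 2 s of compute for 5 s of audio.
rtf = real_time_factor(synthesis_seconds=2.0, audio_seconds=5.0)
print(f"real-time factor: {rtf:.2f}")
```

Under these assumptions the weights alone fit well within 1 GB at either precision, though activations and framework overhead add to actual VRAM use. A real-time factor below 1.0 is what "faster than real-time" means: the audio is generated in less time than it takes to play back.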

The project promises further improvements in speed and vocoder quality. The source code and examples are available on GitHub, and the pre-trained model is hosted on Hugging Face.

TTS models continue to evolve, offering increasingly realistic and accessible speech synthesis. Voice cloning, in particular, opens new frontiers in accessibility, content creation, and human-machine interaction.