Mistral AI has announced Voxtral TTS, a 3-billion-parameter text-to-speech (TTS) model released with open weights. According to Mistral, Voxtral TTS outperforms ElevenLabs Flash v2.5 in human preference tests.

Technical characteristics

Voxtral TTS is designed for efficiency: according to the announcement, it needs approximately 3 GB of RAM, which may make it suitable for resource-constrained hardware. The stated time-to-first-audio is 90 milliseconds, and the model supports nine languages.
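
To put the 3 GB figure in context, the following is a minimal back-of-the-envelope sketch in Python of how much memory the weights of a 3-billion-parameter model would occupy at common numeric precisions. The announcement does not say which precision the 3 GB figure refers to, so the precision table and the helper name `weight_memory_gb` are illustrative assumptions, not details from Mistral. The estimate covers weights only and ignores activations, caches, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# 3 billion parameters, as stated in the announcement.
PARAMS = 3e9

# Bytes per parameter for common numeric formats.
# These are general facts about the formats, not Voxtral-specific details.
PRECISIONS = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}

# Weights-only estimate for each format.
estimates = {name: weight_memory_gb(PARAMS, size) for name, size in PRECISIONS.items()}

if __name__ == "__main__":
    for name, gb in estimates.items():
        print(f"{name:>10}: {gb:.1f} GB")
```

Under these assumptions, a footprint of about 3 GB works out to roughly one byte per parameter. That is simple arithmetic rather than a confirmed detail of how the model is distributed.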

Relevance

An openly released TTS model that is claimed to outperform a proprietary service gives developers and companies a new option for efficient, customizable speech synthesis. Those evaluating on-premise deployments face trade-offs, and AI-RADAR offers analytical frameworks at /llm-onpremise for assessing them.