Mistral AI has announced Voxtral TTS, a 3-billion-parameter text-to-speech (TTS) model released with open weights. According to Mistral, Voxtral TTS outperforms ElevenLabs Flash v2.5 in human preference tests.
Technical characteristics
Voxtral TTS is designed for efficiency: its memory footprint is approximately 3 GB of RAM, which could make it suitable for hardware with limited resources. The model has a reported time-to-first-audio of 90 milliseconds and supports nine languages.
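To put the memory figure in context, the following is a rough back-of-the-envelope sketch of how much memory the weights of a 3-billion-parameter model would occupy at common numeric precisions. The source does not state which precision Voxtral TTS ships in, so the precision table and the helper name `weight_memory_gb` are illustrative assumptions. The estimate covers weights only and ignores activations, caches, and runtime overhead.

```python
# Weights-only memory estimate for a 3-billion-parameter model.
# The precisions below are assumptions for illustration; the article
# does not say which one Voxtral TTS uses.

def weight_memory_gb(num_params: int, bytes_per_param: float) -> float:
    """Return the weight storage size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

PARAMS = 3_000_000_000  # parameter count stated in the article

# Bytes needed to store one parameter at each (assumed) precision.
PRECISIONS = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}

estimates = {name: weight_memory_gb(PARAMS, b) for name, b in PRECISIONS.items()}

for name, gb in estimates.items():
    print(f"{name}: {gb:.1f} GB")
```

Under these assumptions, a footprint of about 3 GB corresponds to roughly one byte per parameter, but this is an inference from arithmetic, not a detail confirmed by the announcement.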
Relevance
A TTS model with open weights and claimed performance exceeding that of proprietary solutions is an interesting option for developers and companies looking for efficient, customizable speech synthesis. On-premise deployments involve trade-offs, and AI-RADAR offers analytical frameworks on /llm-onpremise for evaluating them.