SoproTTS, a side project, has released version 1.5 of its text-to-speech (TTS) model. The 135M-parameter model was trained for approximately $100 on a single GPU.
Performance
SoproTTS v1.5 boasts the following features:
- 250 ms time-to-first-audio (TTFA) latency when streaming
- Real-time factor (RTF) of 0.05 on CPU, i.e. roughly 20× faster than real time
- Zero-shot voice cloning
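The RTF figure is a simple ratio: wall-clock synthesis time divided by the duration of the audio produced. Values below 1 mean faster than real time, and 1/RTF gives the speed-up. The minimal sketch below shows that arithmetic. The helper names and the 0.5 s / 10 s timing are hypothetical, chosen only to reproduce the reported RTF of 0.05, and are not measurements of SoproTTS.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock time spent synthesizing / duration of generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def speedup_over_real_time(rtf: float) -> float:
    """An RTF below 1 means faster than real time; 1 / RTF is the speed-up."""
    return 1.0 / rtf


# Hypothetical timing (not a SoproTTS benchmark):
# 0.5 s of CPU compute producing 10 s of speech.
rtf = real_time_factor(0.5, 10.0)
print(f"RTF = {rtf:.2f}, about {speedup_over_real_time(rtf):.0f}x real time")
```

Note that RTF measures throughput over a whole utterance, while TTFA measures how long a listener waits before the first audio chunk arrives, so the two figures describe different aspects of streaming performance.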
The model is not perfect, but it improves on previous versions: it is smaller, faster, and more stable. The training code is expected to be released at a later date.
For those evaluating on-premise deployments, there are trade-offs to consider. AI-RADAR offers analytical frameworks on /llm-onpremise to evaluate these aspects.