KaniTTS2 is an open-source text-to-speech (TTS) model designed for real-time conversational applications. The 400-million-parameter model offers voice cloning and supports several languages, including English and Spanish, with more planned.
Technical Specifications
- Parameters: 400 million (BF16)
- Sample rate: 22 kHz
- Voice cloning: Yes
- VRAM requirement: 3 GB
- Training time: 6 hours on 8x H100 GPUs
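As a rough sanity check on these figures, the arithmetic behind them can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: the helper names are hypothetical, and the assumption that the gap between the raw weights and the stated 3 GB goes to activations, the audio decoder, and runtime overhead is not something the source states.

```python
BYTES_PER_BF16 = 2  # bfloat16 stores each parameter in 16 bits


def weight_memory_gb(num_params: int, bytes_per_param: int = BYTES_PER_BF16) -> float:
    """Memory taken by the raw weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def gpu_hours(wall_clock_hours: float, num_gpus: int) -> float:
    """Total compute budget: wall-clock time multiplied by GPU count."""
    return wall_clock_hours * num_gpus


# Figures taken from the specification list above.
params = 400_000_000
stated_vram_gb = 3.0

weights_gb = weight_memory_gb(params)
# Assumption: whatever is left covers activations, caches, and runtime overhead.
headroom_gb = stated_vram_gb - weights_gb
training_gpu_hours = gpu_hours(6, 8)

print(f"Raw BF16 weights: ~{weights_gb:.1f} GB")
print(f"Headroom within the stated 3 GB: ~{headroom_gb:.1f} GB")
print(f"Training budget: {training_gpu_hours:.0f} H100 GPU-hours")
```

Under these assumptions the weights account for well under a third of the stated VRAM requirement, and the training run corresponds to 48 GPU-hours on H100 hardware.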
Notably, the complete pre-training code is also available, so users can train custom TTS models for specific languages, accents, or domains. The pre-trained model and code are available on Hugging Face and GitHub under the Apache 2.0 license.
Those evaluating on-premise deployments have trade-offs to consider. AI-RADAR offers analytical frameworks at /llm-onpremise for evaluating them.