KaniTTS2 is an open-source text-to-speech (TTS) model designed for real-time conversational applications. The 400-million-parameter model supports voice cloning and several languages, including English and Spanish, with more languages planned.

Technical Specifications

  • Parameters: 400 million (BF16)
  • Sample rate: 22 kHz
  • Voice cloning: Yes
  • VRAM requirement: 3 GB
  • Training time: 6 hours on 8× H100
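A quick back-of-the-envelope sketch makes these figures more concrete. It assumes that "22kHz" means 22,050 Hz and counts only the BF16 weights toward memory, since the source does not say how the rest of the 3 GB is used. The helper names are illustrative only and are not part of any KaniTTS2 API.

```python
# Rough sizing for KaniTTS2 using the numbers from the spec list.
# Assumptions (not stated in the source): "22kHz" is taken as 22,050 Hz,
# and the memory estimate covers the model weights only.

PARAMS = 400_000_000       # 400 million parameters
BYTES_PER_BF16 = 2         # bfloat16 stores each value in 16 bits
SAMPLE_RATE_HZ = 22_050    # assumed exact value behind "22kHz"
STATED_VRAM_GB = 3.0       # VRAM requirement from the spec list
NUM_GPUS = 8               # 8x H100
TRAINING_HOURS = 6         # wall-clock training time


def weight_memory_gb(params: int, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def samples_for(seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples in a clip of the given length."""
    return round(seconds * sample_rate_hz)


def gpu_hours(num_gpus: int, hours: float) -> float:
    """Total accelerator time consumed by a training run."""
    return num_gpus * hours


if __name__ == "__main__":
    weights = weight_memory_gb(PARAMS, BYTES_PER_BF16)
    print(f"BF16 weights: {weights:.1f} GB")
    # Whatever fills the rest of the stated budget is not broken down in the source.
    print(f"Rest of the stated VRAM budget: {STATED_VRAM_GB - weights:.1f} GB")
    print(f"Samples in a 5-second reply: {samples_for(5.0)}")
    print(f"Training compute: {gpu_hours(NUM_GPUS, TRAINING_HOURS):.0f} H100 GPU-hours")
```

At 2 bytes per parameter the weights alone take about 0.8 GB, leaving roughly 2.2 GB of the stated 3 GB unaccounted for by the weights. The training figure works out to 48 H100 GPU-hours.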

A particularly notable aspect is that the complete pre-training code is published, so users can train custom TTS models for specific languages, accents, or domains. The pre-trained model and the code are available on Hugging Face and GitHub under the Apache 2.0 license.

For those evaluating on-premises deployments, there are trade-offs to consider. AI-RADAR offers analytical frameworks at /llm-onpremise for evaluating them.