Qwen3-TTS is a significant step forward in local speech synthesis. The open-source model runs directly on the user's own hardware, offering a local alternative to hosted services such as ElevenLabs and OpenAI.
Key Features
- Speed: End-to-end latency of approximately 97 ms in streaming mode.
- Natural Voice Control: Accepts natural-language instructions to modulate the tone and emotion of the voice.
- Voice Cloning: Clones a voice from a reference clip as short as 3 seconds.
- OpenAI Compatibility: Works natively with the OpenAI Python client, requiring only a change to the base URL.
- Multilingual: Supports 10+ languages, including Italian, English, Japanese, and German.
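To illustrate the OpenAI compatibility point, here is a minimal sketch of pointing the OpenAI Python client at a local server. The server address, model id, and voice name are placeholders assumed for illustration, not values from the project, and whether the local server honors the `instructions` field is also an assumption.

```python
from typing import Optional

# Assumptions (not from the article): server address, model id, and voice name.
LOCAL_BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "qwen3-tts"
VOICE = "default"


def build_speech_request(text: str, instructions: Optional[str] = None) -> dict:
    """Build keyword arguments for an OpenAI-style speech request."""
    if not text.strip():
        raise ValueError("text must not be empty")
    request = {"model": MODEL_ID, "voice": VOICE, "input": text}
    if instructions:
        # Natural-language style hint; support on the local server is an assumption.
        request["instructions"] = instructions
    return request


def synthesize_to_file(text: str, out_path: str, instructions: Optional[str] = None) -> None:
    """Send the request to the local server and stream the audio to a file."""
    # Imported lazily so the request builder works without the openai package.
    from openai import OpenAI

    # The only change from a hosted setup is base_url. The client requires a
    # non-empty API key even when the server does not check it.
    client = OpenAI(base_url=LOCAL_BASE_URL, api_key="not-needed")
    with client.audio.speech.with_streaming_response.create(
        **build_speech_request(text, instructions)
    ) as response:
        response.stream_to_file(out_path)
```

With a compatible server running, a call such as `synthesize_to_file("Hello there", "hello.mp3", instructions="Speak calmly")` would write the audio to disk. Keeping the request builder separate from the network call makes the payload easy to inspect before anything is sent.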
Technical Details
Qwen3-TTS uses a new dual-track hybrid architecture and the Qwen3-TTS-Tokenizer-12Hz tokenizer for acoustic compression. Two model sizes are available: 0.6B (fast and light) and 1.7B (high fidelity). Both support FlashAttention 2 to reduce memory usage.
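As a rough guide to what the two sizes mean for local hardware, the sketch below estimates the memory taken by the weights alone. It assumes that "0.6B" and "1.7B" are parameter counts and that weights are stored at 16-bit precision (2 bytes each). It ignores activations, caches, and the tokenizer, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumed parameter counts, read off the "0.6B" and "1.7B" labels.
MODEL_SIZES = {"0.6B": 0.6e9, "1.7B": 1.7e9}

for name, params in MODEL_SIZES.items():
    print(f"{name}: ~{weight_memory_gb(params):.1f} GB of weights at 16-bit precision")
```

Under these assumptions the smaller model needs roughly 1.2 GB for weights and the larger one roughly 3.4 GB, which is why both are plausible on consumer GPUs.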
The low latency makes real-time voice conversation feel more natural and opens up new possibilities for integration into local LLM agents.