Qwen3-TTS Studio: Cloned Voice and Local Podcasting

A developer has created Qwen3-TTS Studio, a user interface for the Qwen3-TTS model focused on voice cloning and automated podcast generation. The application can clone a voice from an audio sample as short as 3 seconds.

Key Features:

  • Voice cloning with a 3-second audio sample.
  • Fine-grained control of synthesis parameters (temperature, top-k, top-p).
  • Automated podcast generation from a topic: AI writes the script, assigns voices, and synthesizes the audio.
  • Support for 10 languages (Korean, English, Chinese, Japanese, etc.).
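The automated podcast feature above is essentially a three-stage pipeline: write the script, assign voices, synthesize audio. A minimal sketch of that flow; the names here (`write_script`, `assign_voices`, `synthesize`, the `Line` dataclass) are illustrative assumptions, not the project's actual API, and the stages are stubbed rather than calling a real LLM or TTS model.

```python
from dataclasses import dataclass

@dataclass
class Line:
    speaker: str     # speaker label from the generated script
    text: str        # what the speaker says
    voice: str = ""  # voice assigned in stage 2

def write_script(topic: str) -> list[Line]:
    # Stage 1: an LLM would turn the topic into a multi-speaker script.
    # Stubbed here with a fixed two-host exchange.
    return [
        Line("Host A", f"Welcome! Today we discuss {topic}."),
        Line("Host B", "Thanks, glad to be here."),
    ]

def assign_voices(script: list[Line], voices: dict[str, str]) -> list[Line]:
    # Stage 2: map each speaker label to a cloned or preset voice.
    return [Line(l.speaker, l.text, voices[l.speaker]) for l in script]

def synthesize(script: list[Line]) -> list[str]:
    # Stage 3: the TTS model would render each line to audio; here we
    # return descriptive placeholders instead of waveforms.
    return [f"[{l.voice}] {l.text}" for l in script]

script = assign_voices(write_script("local TTS"),
                       {"Host A": "alice", "Host B": "bob"})
audio = synthesize(script)
```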

Currently, the system uses gpt5.2 for script generation, but the architecture is modular, so it can be swapped out for a local LLM such as Qwen or Llama.
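That kind of swap is typically done by hiding the script generator behind a small interface, so the hosted backend and a local model are interchangeable. A minimal sketch under that assumption; the `ScriptGenerator` protocol and class names are hypothetical, not taken from the project's code.

```python
from typing import Protocol

class ScriptGenerator(Protocol):
    """Anything that can turn a topic into a podcast script."""
    def generate(self, topic: str) -> str: ...

class HostedLLM:
    """Would call a hosted API backend; the network call is omitted here."""
    def generate(self, topic: str) -> str:
        raise NotImplementedError("hosted API call omitted in this sketch")

class LocalLLM:
    """Would run a local model such as Qwen or Llama; stubbed for illustration."""
    def generate(self, topic: str) -> str:
        return f"HOST A: Let's talk about {topic}.\nHOST B: Sure!"

def make_podcast_script(gen: ScriptGenerator, topic: str) -> str:
    # The pipeline depends only on the interface, so backends swap freely.
    return gen.generate(topic)

script = make_podcast_script(LocalLLM(), "edge inference")
```

Because `make_podcast_script` only sees the protocol, replacing the hosted backend with a local one is a one-line change at the call site.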

Voice synthesis runs entirely locally, leveraging MPS on macOS or CUDA on Linux, which eliminates external API calls and reduces costs.
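Local backends like this usually pick the accelerator with a simple fallback: CUDA if present, then MPS, then CPU. A sketch of that logic, written as a pure function so it runs without PyTorch installed; in a real setup the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred compute device name for local synthesis."""
    # Prefer CUDA (Linux/NVIDIA), then MPS (Apple Silicon), else CPU.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# On an Apple Silicon Mac without an NVIDIA GPU:
device = pick_device(cuda_available=False, mps_available=True)  # "mps"
```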

The source code is available on GitHub.

For those evaluating on-premise deployments, there are trade-offs to consider; AI-RADAR offers analytical frameworks at /llm-onpremise to help evaluate them.