Qwen3-TTS is a series of speech generation models developed by Qwen, with support for voice cloning, voice design, ultra-high-quality human-like speech generation, and natural-language voice control. It gives developers and users a broad set of speech generation features in a single model family.
The entire Qwen3-TTS multi-codebook model series is now open source and comes in two sizes: 1.7B and 0.6B. The 1.7B model delivers the highest quality and the strongest control capabilities, while the 0.6B model balances quality against efficiency. The models support 10 mainstream languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian), along with various dialects, to meet the needs of global applications. The models also show strong contextual understanding: they adapt tone, rhythm, and emotional expression to both explicit instructions and the semantics of the text, and they are significantly more robust to noisy input text.