Qwen3-ASR: Open-Source Speech Recognition
The Qwen3-ASR model family, developed by the Qwen team, provides automatic speech recognition (ASR) and language identification for 52 languages and dialects. The models come in two sizes (1.7B and 0.6B parameters), build on the Qwen3-Omni foundation model, and are trained on a large speech dataset.
Key Features
- All-in-one: Support for language identification and speech recognition in 30 languages and 22 Chinese dialects, as well as various English accents.
- Performance and Speed: The Qwen3-ASR-1.7B model offers high recognition quality even in complex acoustic environments. The 0.6B version prioritizes efficiency, reaching a reported throughput of about 2000 times real time at a concurrency of 128 (128 requests served in parallel).
- Forced Alignment: Qwen3-ForcedAligner-0.6B predicts timestamps for arbitrary text units in audio clips of up to 5 minutes, in 11 languages.
- Comprehensive Inference Toolkit: Alongside the model weights and architecture, the release includes an inference framework that supports vLLM-based batch inference, asynchronous serving, streaming, and timestamp prediction.
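The 5-minute limit on forced alignment means longer recordings have to be split before timestamps can be predicted. A minimal sketch of that planning step follows. It is not part of the Qwen3-ASR toolkit: the function name `plan_segments` and the fixed-boundary cutting strategy are assumptions made for illustration only.

```python
from typing import List, Tuple

# Limit quoted in the article for Qwen3-ForcedAligner-0.6B.
MAX_SEGMENT_SECONDS = 5 * 60


def plan_segments(
    total_seconds: float, max_seconds: float = MAX_SEGMENT_SECONDS
) -> List[Tuple[float, float]]:
    """Return (start, end) windows that cover the recording.

    Hypothetical helper: each window is at most max_seconds long and
    windows are contiguous, so offsets can be added back to the
    per-segment timestamps afterwards.
    """
    total_seconds = float(total_seconds)
    if total_seconds < 0 or max_seconds <= 0:
        raise ValueError("durations must be non-negative and the limit positive")

    segments: List[Tuple[float, float]] = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_seconds, total_seconds)
        segments.append((start, end))
        start = end
    return segments


if __name__ == "__main__":
    # A 12-minute recording needs three windows.
    for start, end in plan_segments(12 * 60):
        print(f"{start:7.1f}s -> {end:7.1f}s")
```

Cutting at fixed boundaries can split a word in two. In practice one would probably move each boundary to a nearby pause, which this sketch does not attempt.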
For those evaluating on-premise deployments, there are trade-offs to consider carefully. AI-RADAR offers analytical frameworks on /llm-onpremise to evaluate these aspects.
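One such trade-off is capacity. As a rough sketch, the throughput figure quoted for the 0.6B model can be turned into a back-of-the-envelope sizing estimate, assuming the 2000 figure is read as an aggregate real-time factor (seconds of audio transcribed per wall-clock second). The function names and the 50% utilization default are hypothetical, and the article does not state the hardware behind the figure, so real numbers may differ considerably.

```python
import math


def processing_seconds(audio_hours: float, throughput_x: float) -> float:
    """Wall-clock seconds needed to transcribe audio_hours of audio.

    throughput_x is the aggregate real-time factor: seconds of audio
    processed per wall-clock second across all concurrent requests.
    """
    if throughput_x <= 0:
        raise ValueError("throughput_x must be positive")
    return audio_hours * 3600.0 / throughput_x


def instances_needed(
    audio_hours_per_day: float, throughput_x: float, utilization: float = 0.5
) -> int:
    """Estimate how many serving instances keep up with a daily volume.

    utilization is an assumed fraction of the day each instance spends
    doing useful work (headroom for peaks, restarts, uneven load).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    budget = 24 * 3600 * utilization  # usable seconds per instance per day
    required = processing_seconds(audio_hours_per_day, throughput_x)
    return max(1, math.ceil(required / budget))


if __name__ == "__main__":
    # 1000 hours of audio at the reported 2000x figure.
    print(processing_seconds(1000, 2000), "seconds of wall-clock time")
    print(instances_needed(100_000, 2000), "instances for 100,000 hours/day")
```

The estimate covers transcription throughput only. Latency per request, memory footprint, and the accuracy gap between the 0.6B and 1.7B models would need to be measured separately.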