Microsoft Research has introduced Paza, an initiative aimed at promoting voice technology for low-resource languages. Paza includes PazaBench, a leaderboard for automatic speech recognition (ASR) focused on languages with data scarcity, and Paza ASR models, optimized for use in real-world contexts.
PazaBench: a new ASR leaderboard
PazaBench is the first ASR leaderboard dedicated to low-resource languages, with initial coverage of 39 African languages and 52 state-of-the-art ASR and language models. The platform aggregates public and community-sourced datasets, making it easier to evaluate model performance in different languages and regions.
PazaBench tracks three core metrics:
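- Character Error Rate (CER): the share of characters transcribed incorrectly. It is especially informative for languages with complex word forms, where a single wrong affix would otherwise count as a whole-word error.
- Word Error Rate (WER): the share of words transcribed incorrectly, measuring word-level accuracy.
- RTFx (Inverse Real-Time Factor): audio duration divided by the time taken to transcribe it. Higher values mean faster transcription.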
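To make the three metrics concrete, here is a minimal Python sketch using their standard textbook definitions. It is not PazaBench's evaluation code: the function names are illustrative, and the leaderboard's text normalization rules (casing, punctuation and so on) are not described in this article, so none are applied.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def rtfx(audio_seconds, processing_seconds):
    """Inverse real-time factor: seconds of audio transcribed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    reference = "ninapenda kusoma vitabu"
    hypothesis = "ninapenda kusema vitabu"  # one wrong character in the second word
    print(f"WER:  {wer(reference, hypothesis):.3f}")
    print(f"CER:  {cer(reference, hypothesis):.3f}")
    print(f"RTFx: {rtfx(60.0, 2.0):.1f}")
```

In this Swahili example a single wrong character makes one of three words incorrect, so WER is 1/3 while CER is only 1/23. That gap is why CER is tracked alongside WER for languages with long, complex word forms.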
Paza ASR Models: built with and for Kenyan languages
The Paza ASR family comprises three models fine-tuned from state-of-the-art architectures. Each model covers six languages: Swahili (a mid-resource language) and five low-resource Kenyan languages, namely Dholuo, Kalenjin, Kikuyu, Maasai, and Somali. The models were fine-tuned on a mix of public and proprietary datasets.
Paza models include:
- Paza-Phi-4-Multimodal-Instruct: fine-tuned from Microsoft's Phi-4-multimodal-instruct, a multimodal language model, for transcription in the six target languages.
- Paza-MMS-1B-All: fine-tuned from Meta's mms-1b-all model, improving transcription accuracy while maintaining cross-lingual generalization.
- Paza-Whisper-Large-v3-Turbo: fine-tuned from OpenAI's whisper-large-v3-turbo base model, which offers reliable ASR capabilities.
Microsoft intends to expand PazaBench beyond African languages and evaluate state-of-the-art ASR models in a larger number of low-resource languages globally. The company is also developing practical guides to help the ecosystem curate datasets, optimize models, and evaluate them in real-world conditions.