Multilingual Speech Recognition: A Family Affair
Large Language Model (LLM)-based Automatic Speech Recognition (ASR) systems have demonstrated strong performance with limited resources by linking a frozen speech encoder to a pretrained LLM via a lightweight connector. New research focuses on optimizing these connectors in multilingual contexts.
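The connector architecture described above can be sketched minimally. All names and dimensions here are illustrative assumptions (the paper's actual encoder, LLM, and connector design are not specified in this summary): a small trainable projection maps frozen speech-encoder features into the LLM's embedding space, and only this projection's parameters are trained.

```python
import numpy as np

rng = np.random.default_rng(0)

ENC_DIM = 1024   # assumed speech-encoder feature dimension
LLM_DIM = 4096   # assumed LLM embedding dimension

class LinearConnector:
    """A lightweight trainable projection: encoder features -> LLM embedding space.

    The speech encoder and the LLM stay frozen; only W and b would be updated
    during training. Real systems often use a small MLP or Q-Former here, but a
    single linear layer conveys the idea.
    """
    def __init__(self, enc_dim: int, llm_dim: int):
        self.W = rng.standard_normal((enc_dim, llm_dim)) * 0.02
        self.b = np.zeros(llm_dim)

    def __call__(self, feats: np.ndarray) -> np.ndarray:
        # feats: (num_frames, enc_dim) -> (num_frames, llm_dim)
        return feats @ self.W + self.b

connector = LinearConnector(ENC_DIM, LLM_DIM)
speech_feats = rng.standard_normal((50, ENC_DIM))  # e.g. 50 encoder output frames
llm_inputs = connector(speech_feats)               # ready to prepend to the LLM's input
print(llm_inputs.shape)
```

The projected frames are then consumed by the LLM like ordinary token embeddings, which is what lets a pretrained text model transcribe speech without modifying its weights.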
Connector Sharing Based on Language Families
The research proposes a connector-sharing strategy based on linguistic family membership. Instead of training a separate connector for each language, a single connector is used per language family. This approach reduces the number of parameters required and improves generalization across different domains. The results show that this strategy is effective on two multilingual LLMs and two real-world corpora, including both curated and crowd-sourced speech.
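The sharing scheme amounts to routing each language through its family's connector instead of a dedicated one. The mapping below is a hypothetical illustration (the paper's actual language and family inventory is not given in this summary); it shows how the parameter count scales with the number of families rather than the number of languages.

```python
# Hypothetical language-to-family mapping for illustration.
LANG_TO_FAMILY = {
    "it": "romance", "es": "romance", "fr": "romance",
    "de": "germanic", "nl": "germanic", "sv": "germanic",
    "pl": "slavic", "cs": "slavic",
}

def connector_for(lang: str, connectors: dict):
    """Route a language to its family's shared connector."""
    return connectors[LANG_TO_FAMILY[lang]]

# One connector per family instead of one per language.
connectors = {fam: object() for fam in set(LANG_TO_FAMILY.values())}

# Italian and Spanish share the same (Romance) connector.
assert connector_for("it", connectors) is connector_for("es", connectors)
print(len(connectors), "connectors for", len(LANG_TO_FAMILY), "languages")
```

With 8 languages in 3 families, trained connector parameters shrink to 3/8 of the per-language baseline, and each connector sees training data from every language in its family, which is one plausible source of the improved cross-domain generalization the summary reports.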
Implications for Deployment
Reducing parameter count while improving generalization makes this strategy particularly attractive for deploying multilingual ASR systems in resource-constrained environments. Teams evaluating on-premise deployments still face trade-offs worth weighing carefully; AI-RADAR offers analytical frameworks at /llm-onpremise for assessing these aspects.