Multilingual Speech Recognition: A Family Affair

Large Language Model (LLM)-based Automatic Speech Recognition (ASR) systems have demonstrated strong performance with limited training resources by linking a frozen speech encoder to a pretrained LLM via a lightweight, trainable connector. Recent research examines how best to allocate these connectors across languages in multilingual settings.
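
To make the architecture concrete, here is a minimal PyTorch sketch of this setup. All class and parameter names are illustrative rather than taken from the paper, the connector is a simple two-layer MLP, and the `inputs_embeds` call assumes a Hugging Face-style LLM interface.

```python
import torch
import torch.nn as nn

class ConnectorASR(nn.Module):
    """Sketch of LLM-based ASR: frozen encoder -> trainable connector -> frozen LLM."""

    def __init__(self, speech_encoder: nn.Module, llm: nn.Module,
                 enc_dim: int, llm_dim: int):
        super().__init__()
        self.speech_encoder = speech_encoder
        self.llm = llm
        # Freeze both pretrained components; only the connector is trained.
        for module in (self.speech_encoder, self.llm):
            for p in module.parameters():
                p.requires_grad = False
        # Lightweight connector: projects encoder features into the LLM's
        # embedding space (a two-layer MLP here; real systems may also
        # downsample the frame rate).
        self.connector = nn.Sequential(
            nn.Linear(enc_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, speech: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            feats = self.speech_encoder(speech)  # (batch, frames, enc_dim)
        prefix = self.connector(feats)           # (batch, frames, llm_dim)
        # The projected frames feed the LLM as a soft prefix; the keyword
        # argument assumes a Hugging Face-style `inputs_embeds` interface.
        return self.llm(inputs_embeds=prefix)
```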

Connector Sharing Based on Language Families

The research proposes a connector-sharing strategy based on language family membership: instead of training a separate connector for each language, a single connector is shared by all languages in a family. This reduces the total number of trained parameters and improves generalization across domains. Experiments show the strategy is effective with two multilingual LLMs and on two real-world corpora covering both curated and crowd-sourced speech.
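
A sketch of the sharing mechanism, assuming the connector architecture from the previous snippet: utterances are routed to their family's connector via a language-to-family map. The grouping below is a hypothetical ISO 639-1 mapping for illustration; the paper's actual language inventory and family assignments may differ.

```python
import torch
import torch.nn as nn

# Hypothetical ISO 639-1 -> family grouping (illustrative, not from the paper).
LANGUAGE_FAMILIES = {
    "es": "romance", "fr": "romance", "it": "romance", "pt": "romance",
    "de": "germanic", "nl": "germanic", "sv": "germanic",
    "ru": "slavic", "pl": "slavic", "cs": "slavic",
}

class FamilySharedConnectors(nn.Module):
    """One trainable connector per language family rather than per language."""

    def __init__(self, enc_dim: int, llm_dim: int):
        super().__init__()
        families = sorted(set(LANGUAGE_FAMILIES.values()))
        self.connectors = nn.ModuleDict({
            family: nn.Sequential(
                nn.Linear(enc_dim, llm_dim),
                nn.GELU(),
                nn.Linear(llm_dim, llm_dim),
            )
            for family in families
        })

    def forward(self, feats: torch.Tensor, lang: str) -> torch.Tensor:
        # Route encoder features through the connector of the utterance's family.
        return self.connectors[LANGUAGE_FAMILIES[lang]](feats)
```

With the ten languages above, only three connectors are trained instead of ten, which is where the parameter savings come from.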

Implications for Deployment

Reducing the parameter count while improving generalization makes this strategy particularly attractive for deploying multilingual ASR systems in resource-constrained environments. Teams evaluating on-premise deployments should weigh these trade-offs carefully; AI-RADAR offers analytical frameworks for doing so at /llm-onpremise.
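
To make the parameter argument concrete, a back-of-envelope comparison; all sizes and the 12-language/3-family split are illustrative assumptions, not figures from the paper.

```python
# Connector parameter budget: two-layer MLP as sketched above, biases ignored.
enc_dim, llm_dim = 1024, 4096
params_per_connector = enc_dim * llm_dim + llm_dim * llm_dim  # ~21M

n_languages, n_families = 12, 3
print(f"per-language: {n_languages * params_per_connector / 1e6:.0f}M params")
print(f"per-family:   {n_families * params_per_connector / 1e6:.0f}M params")
# 12 languages in 3 families -> a 4x reduction in trained connector parameters.
```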