On-Device Medical ASR: A New Perspective for Digital Healthcare

Artificial intelligence is moving quickly, and the need to balance innovation with regulatory compliance, particularly on data privacy, is increasingly pressing. This is especially true in healthcare, where patient information is among the most sensitive data an organization handles. In this context, Omi Health has announced Omi Med STT v1, a 0.6B-parameter Automatic Speech Recognition (ASR) model fine-tuned from NVIDIA's Parakeet TDT 0.6B v2. The goal is a compact, high-performing ASR model that runs locally on the device, so patient audio never has to be sent to an external cloud system for transcription. This on-device design directly addresses the data sovereignty and privacy requirements that healthcare organizations must meet.

The Omi Med STT v1 weights are released under a CC-BY-4.0 license and are freely accessible. The model runs on Mac, Windows, and Linux and adapts to the available hardware: it uses MLX on Apple Silicon, NeMo on CUDA-enabled systems, and GGUF/parakeet.cpp on CPUs. This deployment flexibility matters for IT teams that want full control over their AI workloads without depending on a specific hardware configuration or cloud vendor. The default is 8-bit quantization (q8), a compromise between model size and accuracy. A 4-bit version (q4) was also explored but discarded because drug name accuracy regressed too much, a critical problem in the medical field.
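
The hardware mapping described above can be sketched as a small dispatch function. This is only an illustration: the function names `pick_backend` and `detect_backend` are hypothetical, the returned strings are plain labels rather than calls into MLX, NeMo, or parakeet.cpp, and treating an `nvidia-smi` binary on the PATH as evidence of a CUDA GPU is a heuristic assumption, not something the release documents.

```python
import platform
import shutil


def pick_backend(system: str, machine: str, has_cuda: bool) -> str:
    """Map a hardware description to the runtime the article names for it."""
    if system == "Darwin" and machine == "arm64":
        return "mlx"           # Apple Silicon
    if has_cuda:
        return "nemo"          # CUDA-enabled system
    return "parakeet.cpp"      # CPU fallback with GGUF weights


def detect_backend() -> str:
    """Guess the backend for the current machine (heuristic, see assumptions)."""
    # Assumption: nvidia-smi on PATH implies a usable CUDA GPU.
    has_cuda = shutil.which("nvidia-smi") is not None
    return pick_backend(platform.system(), platform.machine(), has_cuda)


if __name__ == "__main__":
    print(detect_backend())
```

Keeping the decision in a pure function (`pick_backend`) separate from the detection makes the mapping easy to test on any machine.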

Performance and Trade-offs in a Clinical Context

Omi Med STT v1 was evaluated on a benchmark of 1,513 medical audio clips totaling 7.18 hours, with the same dataset and scoring applied to every model compared. The focus was Medical-WER (M-WER), which measures errors on clinical terms and is considered the most relevant indicator for medical transcription. Omi Med STT v1 achieved an M-WER of 2.37%, which is competitive with other open-source and local models. It outperformed Qwen3 ASR (0.6B and 1.7B), Whisper Large v3 Turbo, and Parakeet TDT 0.6B v3. It also improved significantly on its base model, Parakeet TDT 0.6B v2, cutting M-WER by a factor of approximately 3.5 and halving overall WER. Only VibeVoice-ASR 9B showed a slightly lower M-WER (1.78%), but that model is approximately 15 times larger and processes audio more slowly.
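
The article does not spell out how M-WER is scored, so the following is a minimal sketch under stated assumptions rather than the benchmark's actual method. It computes standard WER from a word-level Levenshtein alignment, and approximates a medical WER as the share of reference words from a caller-supplied term list that were substituted or deleted. The function names, the term list, and the choice to ignore insertions in the medical score are all assumptions made for illustration.

```python
def align(ref, hyp):
    """Word-level Levenshtein alignment: list of (op, ref_word, hyp_word)."""
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match / substitution
                             dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1)         # insertion
    ops, i, j = [], n, m
    while i > 0 or j > 0:  # backtrace from the bottom-right corner
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            op = "ok" if ref[i - 1] == hyp[j - 1] else "sub"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def wer(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    errors = sum(1 for op, _, _ in align(ref, hyp) if op != "ok")
    return errors / len(ref)


def medical_wer(reference: str, hypothesis: str, terms: set) -> float:
    """Assumed definition: share of reference medical terms substituted or deleted."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    ops = align(ref, hyp)
    total = sum(1 for word in ref if word in terms)
    wrong = sum(1 for op, r, _ in ops if r in terms and op in ("sub", "del"))
    return wrong / total if total else 0.0


if __name__ == "__main__":
    ref = "patient takes metformin and lisinopril daily"
    hyp = "patient takes met forming and lisinopril daily"
    print(wer(ref, hyp), medical_wer(ref, hyp, {"metformin", "lisinopril"}))
```

In the usage example, a single mangled drug name costs two of six words overall but one of the two medical terms, which shows why a term-focused score can diverge sharply from overall WER.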

The comparison with cloud APIs, both general-purpose and medical, gives a more mixed picture. Some cloud services, such as ElevenLabs Scribe v2 and AssemblyAI Universal-3 Pro Medical, may offer marginally lower M-WER and higher drug name accuracy. Omi Med STT v1 stands out instead for its local throughput, reported as RTFx (the inverse real-time factor: seconds of audio transcribed per second of processing). With an RTFx of 145x on an A10 GPU and approximately 68x on an Apple Silicon Mac, the model has a structural latency advantage: processing happens on the device, which removes the network and queue delays typical of cloud solutions. This matters for applications that need immediate responses, such as real-time transcription during medical consultations. The benchmark also surfaced a critical flaw in some cloud models, such as Gemini 3.1 Pro and 3.5 Flash: a tendency to fabricate non-existent clinical details on benign audio, a type of hallucination that is an unacceptable risk in the medical field. Omi Med STT v1 and the other dedicated ASR models did not exhibit this behavior.
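
To make the throughput figures concrete, the sketch below converts an RTFx value into wall-clock processing time. It assumes RTFx is audio duration divided by processing time and that the quoted throughput holds across the whole workload; the helper name `processing_seconds` and the 15-minute consultation are illustrative choices, not figures from the benchmark.

```python
def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Time needed to transcribe `audio_seconds` of audio at a given RTFx."""
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


BENCHMARK_AUDIO_S = 7.18 * 3600   # benchmark size quoted in the article
CONSULTATION_S = 15 * 60          # hypothetical 15-minute consultation

if __name__ == "__main__":
    for label, speed in (("A10 GPU", 145.0), ("Apple Silicon Mac", 68.0)):
        bench = processing_seconds(BENCHMARK_AUDIO_S, speed)
        visit = processing_seconds(CONSULTATION_S, speed)
        print(f"{label}: benchmark {bench:.0f} s, consultation {visit:.1f} s")
```

Under these assumptions the full 7.18-hour benchmark would take roughly three minutes on the A10 and a little over six minutes on the Mac, before any model loading or I/O overhead.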

Implications for On-Premise Deployment and Data Sovereignty

A model like Omi Med STT v1 has direct implications for organizations evaluating on-premise or hybrid deployment strategies for AI workloads. Running inference locally keeps sensitive information inside the organization's security perimeter, which supports data sovereignty, and it can also significantly change the Total Cost of Ownership (TCO). The initial hardware investment (for example, A10 GPUs or Apple Silicon systems) may be higher than paying for consumption-based cloud services, but eliminating recurring API and data transfer costs, combined with full control over the infrastructure, can lead to substantial long-term savings. For those evaluating on-premise deployments, AI-RADAR offers analytical frameworks on /llm-onpremise to assess the trade-offs between CapEx and OpEx, as well as performance and security implications.
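
The CapEx versus OpEx trade-off can be framed as a simple break-even calculation. The sketch below is a deliberately simplified model: every number in the usage example is a hypothetical placeholder, not a quoted price, and the function name `break_even_months` is an assumption. It ignores depreciation, staffing, and data transfer fees, all of which a real TCO analysis would include.

```python
from typing import Optional


def break_even_months(capex: float,
                      onprem_monthly: float,
                      cloud_price_per_audio_hour: float,
                      audio_hours_per_month: float) -> Optional[float]:
    """Months until up-front hardware cost is repaid by avoided cloud fees.

    Returns None when on-premise running costs meet or exceed the cloud bill,
    in which case the hardware never pays for itself under this model.
    """
    cloud_monthly = cloud_price_per_audio_hour * audio_hours_per_month
    monthly_saving = cloud_monthly - onprem_monthly
    if monthly_saving <= 0:
        return None
    return capex / monthly_saving


if __name__ == "__main__":
    # All inputs below are hypothetical placeholders for illustration only.
    months = break_even_months(capex=6000.0,
                               onprem_monthly=100.0,
                               cloud_price_per_audio_hour=0.5,
                               audio_hours_per_month=1000.0)
    print(months)
```

The result is most sensitive to monthly audio volume: at low volumes the saving term shrinks toward zero and the break-even point moves out indefinitely.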

The main weakness of Omi Med STT v1, as the benchmarks show, is drug name accuracy, its weakest area at 4.75% drug M-WER. This matters for patient safety and for the precision of clinical documentation, and the Omi Health team has stated that it will be the primary focus of v2. The model was trained on approximately 127 hours of audio, a mix of real (71%) and synthetic (29%) data drawn from a variety of medical settings and contexts. Hybrid training of this kind is a common way to address the scarcity of annotated medical data and to improve model robustness. Validation was performed on a locked, unseen test split with no overlap with the training data.
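
As a quick check on the training figures, the sketch below converts the quoted percentages into hours and shows one simple way a train/test overlap check could look. The clip identifiers and the helper names `split_hours` and `leaked_clips` are hypothetical; the article does not describe how Omi Health enforced the split.

```python
TOTAL_HOURS = 127.0      # approximate training audio, from the article
REAL_SHARE = 0.71
SYNTHETIC_SHARE = 0.29


def split_hours(total: float, real_share: float, synthetic_share: float):
    """Return (real_hours, synthetic_hours) for the given shares."""
    return total * real_share, total * synthetic_share


def leaked_clips(train_ids, test_ids) -> set:
    """Clip IDs present in both splits; an empty set means no overlap."""
    return set(train_ids) & set(test_ids)


if __name__ == "__main__":
    real_h, synth_h = split_hours(TOTAL_HOURS, REAL_SHARE, SYNTHETIC_SHARE)
    print(f"real: {real_h:.1f} h, synthetic: {synth_h:.1f} h")
    # Hypothetical clip identifiers, for illustration only.
    print(leaked_clips(["clip-001", "clip-002"], ["clip-900"]))
```

An ID-level check like this only catches exact duplicates; overlap by speaker or by source recording would need additional metadata.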

Future Prospects and Community Contribution

The release of Omi Med STT v1 is an important step towards more accessible and secure ASR for the healthcare sector. Planned developments include a streaming version and a multilingual one, which would further expand the model's reach and utility. The team has also asked for feedback from the community and from real-world users, a sign of a collaborative approach to ongoing development. The model is a concrete example of how AI innovation can be directed towards solutions that prioritize privacy and control while still offering competitive performance. For CTOs, DevOps leads, and infrastructure architects, Omi Med STT v1 is a compelling use case for exploring on-premise AI workloads, especially in highly regulated sectors like healthcare.