Mistral AI has released Voxtral Mini 4B Realtime 2602, a multilingual, real-time speech transcription model.

Key Features

  • Real-time transcription: Voxtral Mini transcribes speech with a delay of under 500 ms while reaching accuracy comparable to offline systems.
  • Multilingual support: The model supports 13 languages, which widens the range of settings where it can be used.
  • Streaming architecture: A natively streaming architecture with a custom causal audio encoder lets the transcription delay be configured from 240 ms to 2.4 s, trading latency against accuracy.
  • On-device optimization: At 4B parameters, Voxtral Mini is designed for deployment on devices with limited hardware resources, with throughput above 12.5 tokens per second.
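To make the latency side of the configurable-delay tradeoff concrete, here is a minimal Python sketch of a fixed-delay streaming emitter. It is a toy model, not Voxtral's implementation or API: the class `DelayedEmitter`, its `push` method, and the 80 ms chunk size are hypothetical choices made for illustration. Only the 240 ms to 2.4 s delay range comes from the announcement. Each audio chunk is finalized only after `delay_ms` of later audio has arrived, so a larger delay gives a decoder more right-hand context (presumably where the accuracy gain comes from) at the cost of waiting longer.

```python
from collections import deque
from dataclasses import dataclass, field

# Delay bounds quoted in the announcement (milliseconds).
MIN_DELAY_MS = 240
MAX_DELAY_MS = 2400
# Hypothetical audio chunk size, chosen only for this illustration.
CHUNK_MS = 80


@dataclass
class DelayedEmitter:
    """Hypothetical toy model of a streaming transcriber with a fixed delay.

    A chunk is finalized only once `delay_ms` of later audio has arrived.
    Latency is modeled; transcription accuracy is not.
    """
    delay_ms: int
    pending: deque = field(default_factory=deque)  # (chunk_id, end_time_ms)
    clock_ms: int = 0  # total audio received so far

    def __post_init__(self) -> None:
        # Reject delays outside the range stated in the announcement.
        if not MIN_DELAY_MS <= self.delay_ms <= MAX_DELAY_MS:
            raise ValueError(
                f"delay_ms must be in [{MIN_DELAY_MS}, {MAX_DELAY_MS}]"
            )

    def push(self, chunk_id: int) -> list[tuple[int, int]]:
        """Add one audio chunk and return (chunk_id, latency_ms) for every
        chunk that now has enough lookahead to be finalized."""
        self.clock_ms += CHUNK_MS
        # Remember when this chunk's audio ends.
        self.pending.append((chunk_id, self.clock_ms))
        ready = []
        # Finalize the oldest chunks whose lookahead has reached delay_ms.
        while self.pending and self.clock_ms - self.pending[0][1] >= self.delay_ms:
            cid, end_ms = self.pending.popleft()
            ready.append((cid, self.clock_ms - end_ms))
        return ready
```

For example, with `delay_ms=240` the first chunk is finalized on the fourth push, 240 ms after its audio ends; with `delay_ms=2400` it takes 31 pushes. The sketch says nothing about how accuracy actually varies with the delay.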

Applications

Voxtral Mini is well suited to applications such as voice assistants and live subtitling. Because it runs in real time with modest hardware requirements, it fits scenarios where low latency is critical.

Considerations

Because the transcription delay is configurable, developers can pick the balance between latency and accuracy that suits each deployment. The model's optimization for on-device execution also opens the way for new applications in edge computing.