Raon-Speech and Raon-SpeechChat: Open-Source LLMs for Speech Understanding and Generation
The landscape of Large Language Models (LLMs) continues to evolve, with a growing focus on integrating speech capabilities. In this context, Raon-Speech and Raon-SpeechChat have been introduced as two models that aim to bridge text and voice, covering speech understanding, answering, and generation. Both 9-billion-parameter models are released as open source: the model checkpoints, the training and inference pipelines, and an interactive demo are all publicly available.
The open-source approach is particularly relevant for CTOs, DevOps leads, and infrastructure architects seeking flexible and controllable solutions. Access to the source code and complete pipelines enables self-hosted deployment, which is fundamental for organizations prioritizing data sovereignty, regulatory compliance, and autonomous management of operational costs.
Raon-Speech Architecture and Capabilities
Raon-Speech is the foundation of this initiative: a speech language model (SpeechLM) for English and Korean that handles speech understanding, answering, and generation. Its distinguishing feature is that it converts a pre-trained text LLM into a SpeechLM while preserving the original model's strong text capabilities, so the same model remains proficient with written as well as spoken input.
Raon-Speech was trained on 1.38 million hours of highly curated English and Korean speech and text data in three stages: speech-module alignment, end-to-end SpeechLM pre-training with knowledge distillation, and post-training based on multi-task preference optimization. Across 42 English and Korean speech and text benchmarks, it showed the strongest overall profile on speech-centric tasks when compared with eight similarly sized recent audio foundation models, including Qwen2.5-Omni and Fun-Audio-Chat, while retaining strong performance on text question answering.
Raon-SpeechChat for Full-Duplex Conversation
Raon-SpeechChat extends Raon-Speech to natural, real-time full-duplex conversation, in which the model can listen and speak at the same time instead of waiting for strictly alternating turns. This capability is crucial for applications that need fluid, dynamic voice interaction, such as advanced virtual assistants or conversational user interfaces.
Raon-SpeechChat was continually trained on 119,000 hours of time-aligned real and synthetic dialogue data, in three further stages: causal encoder adaptation, full-duplex pre-training, and full-duplex fine-tuning for voice and role control. On full-duplex benchmarks, its clearest strengths were in turn-taking and interruption-sensitive behaviors, such as those measured by FDB v1.0, and it remained competitive across the broader full-duplex evaluation suite.
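To make "turn-taking and interruption-sensitive behaviors" concrete, the toy loop below shows the core property of a full-duplex system: it keeps reading the user's channel on every frame, even while it is speaking, so it can yield when the user barges in. This is a hand-written, rule-based illustration that assumes two hypothetical per-frame flags (a voice-activity flag for the user and a "wants to speak" flag for the agent). Raon-SpeechChat learns this behavior end to end from dialogue data; the sketch is not its implementation.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    user_speaking: bool         # hypothetical voice-activity flag for the user's channel
    agent_wants_to_speak: bool  # hypothetical flag: the agent has output ready


def run_full_duplex(frames: list[Frame]) -> list[str]:
    """Return one action per frame: 'speak', 'listen', or 'yield' (stop on barge-in)."""
    actions = []
    agent_talking = False
    for f in frames:
        # The user channel is checked on every frame, including while the agent talks.
        if f.user_speaking:
            # Barge-in: if the agent was mid-utterance, it stops; otherwise it just listens.
            actions.append("yield" if agent_talking else "listen")
            agent_talking = False
        elif f.agent_wants_to_speak:
            actions.append("speak")
            agent_talking = True
        else:
            actions.append("listen")
            agent_talking = False
    return actions


frames = [
    Frame(False, True),   # agent starts talking
    Frame(False, True),   # agent keeps talking
    Frame(True, True),    # user interrupts
    Frame(True, False),   # user still talking
    Frame(False, False),  # silence
]
print(run_full_duplex(frames))
```

In a half-duplex design the user channel would be ignored while the agent talks, so the third frame could not trigger a yield; that difference is what interruption-oriented benchmarks probe.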
Implications for Deployment and Data Sovereignty
Releasing Raon-Speech and Raon-SpeechChat as open source gives companies a concrete option for integrating advanced speech processing into their own infrastructure. Because the training and inference pipelines are available, organizations can customize the models and deploy them on-premise or in hybrid environments. This is particularly advantageous in sectors with stringent compliance and data sovereignty requirements, where sensitive data must stay within the organization's own boundaries.
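For sizing on-premise hardware, a useful first-order figure is the memory needed just to hold the weights of a 9-billion-parameter model. The sketch below is a back-of-the-envelope calculation based on the parameter count stated above and common numeric precisions. It ignores activations, the KV cache, and any audio encoder or decoder buffers, and it makes no assumption about the precision in which the Raon-Speech checkpoints are actually distributed.

```python
PARAMS = 9e9  # parameter count stated in the article

# Bytes per parameter for common precisions (int4 packs two weights per byte).
BYTES_PER_PARAM = {
    "fp32": 4,
    "bf16": 2,
    "int8": 1,
    "int4": 0.5,
}


def weight_memory_gib(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GiB (2**30 bytes)."""
    return params * bytes_per_param / 2**30


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>5}: {weight_memory_gib(PARAMS, nbytes):5.1f} GiB")
```

At bf16 the weights alone come to roughly 16.8 GiB, which is why int8 or int4 quantization is often considered for smaller GPUs; whether the models' quality holds up under quantization is not addressed in the article.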
A self-hosted deployment built on open-source models such as Raon-Speech gives granular control over hardware, security, and latency, all critical aspects of AI/LLM workloads. Over the long run it can also help optimize Total Cost of Ownership (TCO) by reducing reliance on third-party cloud services and their variable operational costs. For those evaluating on-premise deployments, AI-RADAR offers analytical frameworks on /llm-onpremise for assessing the trade-offs between architectures and implementation strategies, with neutral guidance on concrete hardware specifications and infrastructure requirements.
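The long-run TCO argument reduces to a break-even comparison: an up-front hardware investment plus fixed monthly running costs versus a pay-per-use cloud bill. The sketch below uses a simple linear cost model (an assumption, not something the article specifies), and every number in the example is a hypothetical placeholder, not a measured cost for Raon-Speech or any cloud provider.

```python
import math


def break_even_months(capex: float, self_hosted_monthly: float, cloud_monthly: float) -> float:
    """Months after which cumulative self-hosted cost equals cumulative cloud cost.

    Linear model (assumption):
      self-hosted total after m months = capex + self_hosted_monthly * m
      cloud total after m months       = cloud_monthly * m
    Setting them equal gives m = capex / (cloud_monthly - self_hosted_monthly).
    """
    monthly_saving = cloud_monthly - self_hosted_monthly
    if monthly_saving <= 0:
        # Self-hosting never catches up if it costs at least as much per month.
        return math.inf
    return capex / monthly_saving


# Hypothetical placeholder inputs, for illustration only.
capex = 40_000.0               # GPU server purchase
self_hosted_monthly = 1_500.0  # power, hosting, maintenance share
cloud_monthly = 10_000 * 0.90  # 10,000 audio hours at $0.90 per hour

months = break_even_months(capex, self_hosted_monthly, cloud_monthly)
print(f"Break-even after about {months:.1f} months")
```

With these placeholders the break-even point is about 5.3 months. At low monthly volumes the saving shrinks and the break-even point moves further out, or never arrives, which is why the TCO benefit depends on sustained utilization.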