Voice AI Systems: New Vulnerabilities to Hidden Audio Attacks

Voice AI Systems Under Attack: The Threat of "AudioHijack"

The integration of AI-powered voice and audio tools into daily life is now pervasive, from digital assistants to smart speakers and customer service bots. Advances in Large Audio-Language Models (LALMs), capable of both analyzing and generating audio, have opened new frontiers, allowing device control via voice commands, automatic meeting transcription, or song identification. These models are increasingly equipped with the ability to interact with external services and operate other applications and tools.

However, new research reveals a critical vulnerability: these tools can be "hijacked" through imperceptible sounds embedded in audio, forcing them to execute unauthorized commands without the user's knowledge. The study, to be presented at the upcoming IEEE Symposium on Security and Privacy in San Francisco, demonstrates how a modified audio clip, undetectable by human ears, can manipulate a model's behavior with an average success rate ranging from 79 to 96 percent. These clips are designed to work regardless of the instructions provided by the user, making them reusable for multiple attacks on the same model.

How Adversarial Audio Attacks Work

The research builds on years of work into "adversarial audio examples," meaning audio manipulated to deceive machine learning models. Previous work focused primarily on inducing incorrect predictions in models performing one-way tasks (like speech recognition or audio classification). This new study stands out for its focus on generative models, which can produce responses and take actions. The technique, dubbed AudioHijack, exploits a critical security flaw in LALM design: because these models can receive instructions in audio format, malicious instructions can be hidden in manipulated clips to elicit a wide range of undesirable behaviors.

Many previous attacks on generative models required the attacker to have complete control over both the final audio input and the original instructions given to the model (essentially acting as the user). AudioHijack, by contrast, manipulates only the audio data being processed by the model. This makes it possible to attack a model while it's being used by someone else. Possible real-world scenarios include hiding malicious instructions in online videos, music clips, or voice notes that users query an AI about, or broadcasting malicious audio on a Zoom call that is then uploaded to AI transcription services. The research team has also demonstrated the ability to inject malicious audio into a live voice chat with an AI in real time.
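To make the idea of a context-agnostic perturbation more concrete, here is a toy sketch in Python. It is not the AudioHijack method and does not touch a real LALM: the "model" is a small random network, and every name and constant (W_audio, W_ctx, EPS, train_universal_delta) is a hypothetical stand-in. It only illustrates the general pattern of optimizing one small, bounded change to the audio so that it raises a target score averaged over many different user contexts.

```python
import numpy as np

rng = np.random.default_rng(0)
AUDIO_DIM, CTX_DIM, HIDDEN = 64, 8, 16  # toy sizes, chosen arbitrarily
EPS = 0.05                               # assumed cap on the change to each audio sample

# Hypothetical stand-in "model": a tiny random network, not a real LALM.
W_audio = rng.normal(size=(HIDDEN, AUDIO_DIM)) / np.sqrt(AUDIO_DIM)
W_ctx = rng.normal(size=(HIDDEN, CTX_DIM)) / np.sqrt(CTX_DIM)
v = rng.normal(size=HIDDEN)

def scores(audio, contexts, delta):
    """Return the attacker's target score per context, plus hidden activations."""
    hidden = np.tanh(W_audio @ (audio + delta) + contexts @ W_ctx.T)
    return hidden @ v, hidden

def mean_score_grad(hidden):
    """Gradient of the mean target score with respect to the perturbation."""
    return W_audio.T @ (v * (1.0 - hidden**2)).mean(axis=0)

def train_universal_delta(audio, contexts, steps=200, lr=0.005):
    """Projected sign-gradient ascent; keeps the best perturbation seen so far."""
    delta = np.zeros_like(audio)
    best_delta = delta.copy()
    best = scores(audio, contexts, delta)[0].mean()
    for _ in range(steps):
        _, hidden = scores(audio, contexts, delta)
        step = lr * np.sign(mean_score_grad(hidden))
        delta = np.clip(delta + step, -EPS, EPS)  # stay within the budget
        current = scores(audio, contexts, delta)[0].mean()
        if current > best:
            best_delta, best = delta.copy(), current
    return best_delta

audio = rng.normal(size=AUDIO_DIM)
train_ctx = rng.normal(size=(256, CTX_DIM))    # stand-ins for varied user instructions
heldout_ctx = rng.normal(size=(256, CTX_DIM))  # contexts never seen during training

delta = train_universal_delta(audio, train_ctx)
zero = np.zeros_like(audio)
before = scores(audio, train_ctx, zero)[0].mean()
after = scores(audio, train_ctx, delta)[0].mean()
print("mean target score on training contexts:", before, "->", after)
print("mean target score on held-out contexts:",
      scores(audio, heldout_ctx, zero)[0].mean(), "->",
      scores(audio, heldout_ctx, delta)[0].mean())
```

The perturbation is trained once against a batch of sampled contexts and then reused unchanged, which mirrors the reusability described above. How well it carries over to held-out contexts in this toy setup says nothing about real models.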

Implications for Security and On-Premise Deployments

The authors tested the approach against 13 leading open models, including commercial AI voice services from Microsoft and Mistral. The results showed the ability to coax models into conducting sensitive web searches, downloading files from attacker-controlled sources, and sending emails containing user data. Meng Chen, lead author and Ph.D. student at Zhejiang University in China, emphasizes that "it takes just half an hour to train this signal and then, because this signal is context-agnostic, you can use it to attack the target model whenever you want, no matter what the user says."

For organizations evaluating on-premise or hybrid deployments of LLMs and LALMs, this vulnerability raises serious concerns regarding data sovereignty and compliance. Common defenses did little to stop the attack: providing models with examples of malicious instructions reduced attack success by only 7%, and asking the model to reflect on whether its response matched the user's instructions caught only 28% of attacks. This indicates a fundamental gap. The only effective tactic the researchers identified is monitoring the models' internal attention mechanisms, although attackers can weaken this defense by dialing back attention manipulation. These results highlight the need for robust security strategies at both the infrastructure and model levels, especially in air-gapped environments or those with stringent privacy requirements.
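As a rough illustration of what monitoring attention could look like, the Python sketch below measures how much of a model's attention lands on audio tokens versus the rest of the input. It is an assumption-laden sketch, not the detector evaluated in the study: the function names, the layout of the attention matrix, and the 0.9 threshold are all hypothetical, and a real deployment would need access to the model's internal attention weights.

```python
import numpy as np

def audio_attention_share(attn, is_audio):
    """Fraction of total attention mass that lands on audio input tokens.

    attn:     array of shape (num_output_tokens, num_input_tokens)
    is_audio: boolean mask over input tokens, True where the token is audio
    """
    attn = np.asarray(attn, dtype=float)
    is_audio = np.asarray(is_audio, dtype=bool)
    return float(attn[:, is_audio].sum() / attn.sum())

def looks_hijacked(attn, is_audio, threshold=0.9):
    """Flag a response whose attention is dominated by the audio tokens.

    The threshold is an arbitrary placeholder, not a value from the study.
    """
    return audio_attention_share(attn, is_audio) > threshold

# Two input tokens from the user's text instruction, then two audio tokens.
is_audio = [False, False, True, True]

# Made-up attention rows: the response mostly follows the user's instruction.
benign = [[0.50, 0.30, 0.10, 0.10],
          [0.40, 0.40, 0.10, 0.10]]

# Made-up attention rows: the response is driven almost entirely by the audio.
suspicious = [[0.02, 0.03, 0.50, 0.45],
              [0.01, 0.04, 0.60, 0.35]]

print(audio_attention_share(benign, is_audio), looks_hijacked(benign, is_audio))
print(audio_attention_share(suspicious, is_audio), looks_hijacked(suspicious, is_audio))
```

A fixed threshold like this also shows why the defense can be weakened: an attacker who dials back how strongly the audio pulls the model's attention can keep the share under whatever cutoff the monitor uses.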

Future Prospects and Unsolved Challenges

Attacking proprietary closed models, such as those from OpenAI and Anthropic, is much harder due to limited public information about their architectures. However, these models often use open-source components, such as pre-trained audio encoders, that could be targeted similarly, an area the team is currently investigating. To make the manipulations harder for a human listener to detect, the researchers used a previously developed technique that makes changes to the audio sound like natural reverberation, which is harder for listeners to notice than added noise.
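The difference between a reverberation-style change and added noise can be sketched in a few lines of Python. This is a generic signal-processing illustration, not the researchers' technique: the impulse response here is synthetic (exponentially decaying noise), and the function names and parameters are hypothetical. Additive noise changes a signal as x + n, whereas reverberation filters the signal through an impulse response h, written x * h (convolution), so the change is shaped by the audio itself.

```python
import numpy as np

def synthetic_impulse_response(sample_rate, duration_s=0.3, decay_s=0.05, seed=0):
    """Build a made-up room impulse response: a direct path plus a decaying tail."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(sample_rate * duration_s)) / sample_rate
    ir = rng.normal(size=t.size) * np.exp(-t / decay_s)
    ir[0] = 1.0                          # direct sound arrives first
    return ir / np.sqrt(np.sum(ir**2))   # scale the response to unit energy

def apply_reverb(signal, ir):
    """Convolve the signal with the impulse response, trimmed to the input length."""
    return np.convolve(signal, ir)[: signal.size]

sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)     # one second of a 440 Hz tone

ir = synthetic_impulse_response(sample_rate)
wet = apply_reverb(tone, ir)

# For comparison, an additive change that ignores the content of the signal.
noisy = tone + 0.01 * np.random.default_rng(1).normal(size=tone.size)

print(wet.shape, noisy.shape)
```

In a reverberation-style attack, the quantity being adjusted would presumably be the filter rather than a raw noise waveform, though the exact parameterization used in the study is not described here.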

Eugene Bagdasarian, an assistant professor of computer science at the University of Massachusetts Amherst, notes that, in the real world, this kind of audio attack will face additional challenges such as compression and various post-processing mechanisms that could degrade the signals. However, he stresses that multi-modal attacks on AI models remain an essentially unsolved problem. "With text data we can understand that something is wrong (special characters, suspicious sentences, etc.), audio modality is really challenging to comprehend because of how limited our hearing is," he wrote in an email. This underscores the urgency for CTOs and infrastructure architects to consider these new attack vectors when designing their AI deployments, balancing performance, total cost of ownership (TCO), and security.