## WearVox: Evaluating the Voice Assistants of the Future

Wearable devices such as AI glasses are transforming voice assistants into always-available, hands-free collaborators. Integrating these assistants into daily life, however, poses new challenges: egocentric audio degraded by motion and noise, rapid micro-interactions, and the need to distinguish the wearer's commands from nearby speech and background noise. To bridge this gap, WearVox is introduced as the first benchmark designed to rigorously evaluate voice assistants in realistic wearable scenarios. WearVox comprises 3,842 multi-channel audio recordings collected via AI glasses across five diverse tasks: Search-Grounded QA, Closed-Book QA, Side-Talk Rejection, Tool Calling, and Speech Translation. The recordings span a wide range of indoor and outdoor environments and diverse acoustic conditions.

## Results and Implications

Initial tests on speech large language models (SLLMs), both proprietary and open-source, showed that most real-time models achieve accuracies of only 29% to 59% on WearVox. Performance degrades significantly in noisy outdoor environments, underscoring the benchmark's difficulty. A further study demonstrated that multi-channel audio input significantly improves a model's robustness to environmental noise and its ability to distinguish direct commands from background conversations. These results highlight the critical importance of spatial audio cues for context-aware voice assistants and establish WearVox as a comprehensive testbed for advancing voice AI research on wearable devices.
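To make the role of spatial audio cues concrete, the sketch below shows a classic delay-and-sum beamformer, one standard way multi-channel input can separate a frontal talker from off-axis speech. This is purely illustrative: the microphone layout, steering direction, and all function names are assumptions for the example, not details of WearVox or of the evaluated models.

```python
# Minimal delay-and-sum beamformer sketch (illustrative, not the WearVox pipeline).
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(signal: np.ndarray, mic_positions: np.ndarray,
                  direction: np.ndarray, sample_rate: int) -> np.ndarray:
    """Steer a multi-channel recording toward `direction`.

    signal        : (n_mics, n_samples) time-domain audio
    mic_positions : (n_mics, 3) microphone coordinates in meters
    direction     : (3,) unit vector from the array toward the desired talker
    """
    n_mics, n_samples = signal.shape
    # For a plane wave arriving from `direction`, projecting each mic
    # position onto that direction gives its arrival-time offset: a
    # positive value means the channel hears the wavefront early.
    delays = mic_positions @ direction / SPEED_OF_SOUND  # seconds
    spectrum = np.fft.rfft(signal, axis=1)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)
    # Time-align each channel by compensating its relative delay via a
    # phase shift (circular shift at the edges; acceptable for a sketch),
    # then average. On-axis sound adds coherently; off-axis sound does not.
    aligned = spectrum * np.exp(-2j * np.pi * freqs * delays[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=n_samples)

# Example: a hypothetical two-mic glasses array (one mic per temple),
# steered down toward the wearer's mouth.
if __name__ == "__main__":
    rate = 16_000
    mics = np.array([[-0.07, 0.0, 0.0], [0.07, 0.0, 0.0]])  # meters
    look = np.array([0.0, -0.6, -0.8])
    look /= np.linalg.norm(look)
    audio = np.random.randn(2, rate)  # stand-in for a real recording
    enhanced = delay_and_sum(audio, mics, look, rate)
    print(enhanced.shape)  # (16000,)
```

Steering the array toward the wearer's mouth attenuates off-axis talkers, which is one plausible mechanism behind the reported multi-channel gains on tasks like Side-Talk Rejection.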