IdiomX: A New Multilingual Benchmark for Idiom Understanding in LLMs

The Challenge of Idiomatic Expressions for Large Language Models

Idiomatic expressions remain one of the most persistent challenges in Natural Language Processing (NLP) and, by extension, for modern Large Language Models (LLMs). Idioms are non-compositional: the meaning of the whole is not the sum of the meanings of its words, so a model cannot derive the sense of "kick the bucket" (to die) from "kick" and "bucket". Their interpretation also depends heavily on context, and their meanings are hard to align across languages, which is a crucial requirement for multilingual systems.

Existing idiom resources are often limited in scale, contextual diversity, or multilingual coverage, which reduces their usefulness for training and evaluating advanced LLMs. This gap makes it harder for models to develop a nuanced understanding of figurative language, which is essential for applications that require real linguistic competence.

IdiomX: A Robust Framework for Multilingual Analysis

To address these issues, IdiomX is introduced as a large-scale multilingual benchmark designed for the understanding, retrieval, and interpretation of idiomatic expressions. It was built with a reproducible multi-stage pipeline that combines lexical resource extraction, large-scale normalization, controlled LLM enrichment, and structured validation. This approach is intended to ensure the quality and consistency of the resulting dataset.
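The four pipeline stages named above can be pictured as a chain of small functions. The following is only a minimal sketch under stated assumptions: the tab-separated input format, the record fields, the function names, and the validation rule are all hypothetical, and the article does not describe how IdiomX implements any stage.

```python
import unicodedata
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]  # hypothetical record shape, not the IdiomX schema


def extract(raw_entries: Iterable[str]) -> List[Record]:
    """Stage 1 (lexical resource extraction): parse 'idiom<TAB>definition' lines."""
    records = []
    for line in raw_entries:
        if "\t" not in line:
            continue  # skip lines that do not match the assumed format
        idiom, definition = line.split("\t", 1)
        records.append({"idiom": idiom, "definition": definition})
    return records


def normalize(records: List[Record]) -> List[Record]:
    """Stage 2 (normalization): NFKC, lowercase, collapse whitespace, drop duplicates."""
    seen, out = set(), []
    for rec in records:
        idiom = " ".join(unicodedata.normalize("NFKC", rec["idiom"]).lower().split())
        definition = " ".join(rec["definition"].split())
        if idiom and idiom not in seen:
            seen.add(idiom)
            out.append({"idiom": idiom, "definition": definition})
    return out


def enrich(records: List[Record], generate_context: Callable[[str], str]) -> List[Record]:
    """Stage 3 (controlled enrichment): attach a context from a pluggable generator.

    In practice the generator would wrap an LLM call; here it is any callable.
    """
    return [{**rec, "context": generate_context(rec["idiom"])} for rec in records]


def validate(records: List[Record]) -> List[Record]:
    """Stage 4 (structured validation): keep records with a definition and
    a context that actually contains the idiom (an assumed, simplistic rule)."""
    return [
        rec for rec in records
        if rec["definition"] and rec["idiom"] in rec["context"].lower()
    ]


def build_dataset(raw_entries: Iterable[str],
                  generate_context: Callable[[str], str]) -> List[Record]:
    """Run the four stages in order."""
    return validate(enrich(normalize(extract(raw_entries)), generate_context))
```

Keeping the enrichment step behind a plain callable is one way to make such a pipeline reproducible: the same extraction, normalization, and validation code can be rerun with a fixed or cached generator.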

The IdiomX dataset contains over 190,000 contextualized examples covering more than 12,000 idioms. Its distinctive features are aligned semantic representations in English, Arabic, and French, labels that distinguish idiomatic from literal usage, and detailed linguistic metadata. Building on this resource, IdiomX defines a unified benchmark organized into four tasks: idiom detection, context-to-idiom retrieval, Arabic-to-English idiom retrieval, and idiom interpretation. This extends evaluation from simple figurative recognition to a deeper analysis that includes semantic grounding and explainable meaning retrieval.
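To make the dataset description concrete, the sketch below shows one way a single example could be represented and turned into task inputs. The class name, field names, task identifiers, and sample sentences are illustrative assumptions, not the published IdiomX schema or data, and pairing an Arabic meaning with an English idiom is only one guess at how the cross-lingual task might be framed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class IdiomExample:
    """Hypothetical record layout; field names are assumptions."""
    idiom: str
    context: str
    is_idiomatic: bool                      # idiomatic vs. literal usage label
    meanings: Dict[str, str]                # language code -> aligned meaning
    metadata: Dict[str, str] = field(default_factory=dict)


# The four tasks named in the article (identifiers are made up here).
TASKS = (
    "idiom_detection",
    "context_to_idiom_retrieval",
    "arabic_to_english_idiom_retrieval",
    "idiom_interpretation",
)

# Invented sample records for illustration only.
EXAMPLES = [
    IdiomExample(
        idiom="break the ice",
        context="He told a joke to break the ice at the meeting.",
        is_idiomatic=True,
        meanings={
            "en": "to ease tension between people who have just met",
            "fr": "détendre l'atmosphère",
            "ar": "كسر الجمود",
        },
        metadata={"register": "informal"},
    ),
    IdiomExample(
        idiom="break the ice",
        context="The ship had to break the ice to reach the harbor.",
        is_idiomatic=False,
        meanings={},
    ),
]


def detection_pairs(examples: List[IdiomExample]) -> List[Tuple[str, int]]:
    """Idiom detection: (context, label) pairs, 1 = idiomatic, 0 = literal."""
    return [(ex.context, int(ex.is_idiomatic)) for ex in examples]


def arabic_to_english_pairs(examples: List[IdiomExample]) -> List[Tuple[str, str]]:
    """Cross-lingual retrieval: (Arabic meaning, English idiom) pairs (assumed framing)."""
    return [
        (ex.meanings["ar"], ex.idiom)
        for ex in examples
        if ex.is_idiomatic and "ar" in ex.meanings
    ]
```

The same idiom appearing once with an idiomatic label and once with a literal label is what makes context, rather than the surface string, the deciding signal for detection.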

Implications and Results for Advanced Language Models

Experiments on IdiomX yield several findings. Contextual transformer models substantially improve idiom detection, which underlines the importance of context for this capability. Hybrid retrieval and reranking architectures significantly strengthen idiom retrieval in both monolingual and cross-lingual settings. This suggests that sophisticated retrieval strategies are fundamental to handling the complexity of idiomatic expressions.
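The article does not say how the hybrid retrieval and reranking architectures are built. As a rough illustration of the general pattern, the sketch below fuses a lexical ranking with a second ranking using reciprocal rank fusion, then reranks the top candidates. Every component is a stand-in chosen for this example: token overlap replaces a real lexical retriever, character-trigram cosine replaces embedding similarity, and a simple averaged score replaces a learned reranker.

```python
import math
from collections import Counter
from typing import Callable, List

Scorer = Callable[[str, str], float]


def lexical_score(query: str, doc: str) -> float:
    """Jaccard overlap of lowercase tokens (stand-in for a lexical retriever)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


def trigram_vector(text: str) -> Counter:
    """Bag of character trigrams for a padded, lowercased string."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def embedding_like_score(query: str, doc: str) -> float:
    """Cosine over character trigrams (stand-in for dense embedding similarity)."""
    a, b = trigram_vector(query), trigram_vector(doc)
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(query: str, docs: List[str], scorer: Scorer) -> List[int]:
    """Document indices sorted by descending score."""
    return sorted(range(len(docs)), key=lambda i: scorer(query, docs[i]), reverse=True)


def reciprocal_rank_fusion(rankings: List[List[int]], k: int = 60) -> List[int]:
    """Fuse rankings: each document earns 1 / (k + rank) per list, rank starting at 1."""
    fused: Counter = Counter()
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return [doc_id for doc_id, _ in sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))]


def rerank_score(query: str, doc: str) -> float:
    """Stand-in reranker: average of the two first-stage scores."""
    return 0.5 * (lexical_score(query, doc) + embedding_like_score(query, doc))


def hybrid_retrieve(query: str, docs: List[str], top_k: int = 3,
                    reranker: Scorer = rerank_score) -> List[int]:
    """First stage: fuse two rankings. Second stage: rerank the top_k candidates."""
    fused = reciprocal_rank_fusion([
        rank(query, docs, lexical_score),
        rank(query, docs, embedding_like_score),
    ])
    return sorted(fused[:top_k], key=lambda i: reranker(query, docs[i]), reverse=True)
```

For context-to-idiom retrieval, the documents would be idiom entries (for instance an idiom plus its gloss) and the query a sentence describing the situation; in a realistic system the two stand-in scorers would be replaced by a proper lexical index and a multilingual encoder.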

A further finding is that idiom interpretation can be modeled effectively as a semantic retrieval task, that is, as retrieving the meaning that best matches an idiom in its context. This adds interpretability as a complementary benchmark dimension, which matters to organizations that need LLMs to be transparent and understandable as well as performant. For companies evaluating on-premise LLM deployment, a model's ability to handle linguistic nuances such as idioms is a critical factor. A benchmark like IdiomX helps validate the robustness of models intended for environments where data sovereignty and infrastructure control are paramount, and helps verify that self-hosted models can also deliver high-level linguistic performance.
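If interpretation is framed as retrieval, each test item reduces to a ranked list of candidate meanings plus the one correct meaning, and quality can be scored by where the correct meaning lands. The sketch below computes two common ranking metrics, mean reciprocal rank and recall at k. The choice of metrics and the function names are assumptions made for illustration; the article does not state which measures IdiomX reports.

```python
from typing import List, Sequence, Tuple

# One run = (candidate meanings in ranked order, the correct meaning).
Run = Tuple[Sequence[str], str]


def reciprocal_rank(ranked: Sequence[str], gold: str) -> float:
    """1 / position of the correct meaning, or 0.0 if it was not retrieved."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == gold:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: List[Run]) -> float:
    """Average reciprocal rank over all test items."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / len(runs)


def recall_at_k(runs: List[Run], k: int) -> float:
    """Fraction of items whose correct meaning appears in the top k candidates."""
    if not runs:
        return 0.0
    hits = sum(1 for ranked, gold in runs if gold in ranked[:k])
    return hits / len(runs)
```

Because the output of such a system is an explicit, human-readable meaning rather than a bare label, a reviewer can inspect the top-ranked candidates directly, which is the sense in which the retrieval framing supports interpretability.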

Future Prospects and Framework Scalability

Overall, IdiomX is a scalable benchmark for the study of idiomatic language that charts a path from simple detection to semantic interpretation. Its modular framework is designed to be extensible, so further languages and figurative reasoning tasks can be added. This flexibility makes it an adaptable tool for future LLM research and development.

Adopting benchmarks this detailed is important for organizations investing in advanced AI solutions. Critical enterprise applications depend on Large Language Models, especially those intended for on-premise or air-gapped deployment, being able to understand and generate natural language in all its complexity, idioms included. IdiomX provides a solid foundation for evaluating and improving these capabilities, supporting the creation of more sophisticated and reliable LLMs.