M3Kang: A new benchmark for multilingual mathematical reasoning
M3Kang, a new dataset for evaluating the multimodal mathematical reasoning of vision-language models (VLMs) in a multilingual context, has been released. It is intended to show how far VLMs remain from human performance in mathematical reasoning, particularly across different languages and modalities.
Dataset details
M3Kang is derived from the Kangaroo Math Competition, an international mathematics competition that annually involves over six million students in more than 90 countries. The dataset includes 1,747 multiple-choice problems, organized by grade level and translated into 108 languages. Some problems include diagrams essential for their solution.
Benchmarks and results
The dataset was used to benchmark both open-source and proprietary VLMs. The results indicate that models still struggle with basic math and diagram-based reasoning. Performance improves with language presence and model size, but not necessarily with grade level. The analysis also draws on performance data from over 68,000 students, allowing a direct comparison with human results. M3Kang is released as open source, together with its English-only subset, M2Kang, and the framework and code used to construct it.
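Because the problems are multiple choice, a breakdown of results by language or grade level comes down to grouped accuracy. The following is a minimal Python sketch of that computation, not the released evaluation code: the record fields (`language`, `grade`, `answer`, `predicted`) and the sample rows are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Return {group: fraction of correct answers}, grouping records by `key`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[key]
        total[group] += 1
        # A prediction counts as correct only if it matches the answer letter.
        if record["predicted"] == record["answer"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Made-up rows for illustration; these are NOT real M3Kang data or results.
records = [
    {"language": "en", "grade": 3, "answer": "B", "predicted": "B"},
    {"language": "en", "grade": 7, "answer": "D", "predicted": "A"},
    {"language": "it", "grade": 3, "answer": "C", "predicted": "C"},
    {"language": "it", "grade": 7, "answer": "E", "predicted": "E"},
]

print(accuracy_by(records, "language"))  # {'en': 0.5, 'it': 1.0}
print(accuracy_by(records, "grade"))     # {3: 1.0, 7: 0.5}
```

The same function could be applied to student answer records, so that model and human accuracy are compared group by group.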