Bias in Benchmarks for Autonomous Driving
Multiple Choice Question Answering (MCQA) benchmarks are widely used to evaluate the performance of Vision Language Models (VLMs) in autonomous driving scenarios. However, a recent study highlights how these benchmarks are susceptible to hidden textual biases, which allow models to exploit linguistic patterns rather than understanding the visual context.
Bias Reduction with a New Method
The research proposes a method that substantially reduces this problem. The results show that a VLM can reach accuracy comparable to that reported on human-validated benchmarks even without any visual input, confirming that textual shortcuts drive much of the measured performance. By fine-tuning on synthetic data, the proposed method cuts the above-chance accuracy attributable to these shortcuts from +66.9% to +2.9%, eliminating most linguistic exploits.
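The blind-evaluation probe implied above can be sketched as follows: query a model on the MCQA benchmark with the image withheld and compare its accuracy to the random-chance baseline. All names here are hypothetical illustrations, not the paper's actual code.

```python
def blind_accuracy_gap(predictions, answers, n_choices=4):
    """Accuracy of text-only predictions minus the chance baseline.

    A large positive gap suggests the benchmark leaks answers through
    textual patterns alone, with no visual grounding required.
    """
    correct = sum(p == a for p, a in zip(predictions, answers))
    accuracy = correct / len(answers)
    chance = 1.0 / n_choices
    return accuracy - chance

# Hypothetical example: a model that always picks option "A" on a
# benchmark with uniformly distributed answers shows no gap.
answers = ["A", "B", "C", "D"]
always_a = ["A", "A", "A", "A"]
print(f"{blind_accuracy_gap(always_a, answers):+.1%}")  # +0.0%
```

A debiasing method like the one proposed should drive this gap toward zero (the paper reports a drop from +66.9% to +2.9%).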
Curriculum Learning and Visual Grounding
By decoupling the correct answer from linguistic artifacts and employing a curriculum learning strategy, the model is forced to rely on visual grounding. This ensures that performance accurately reflects perceptual understanding, improving the reliability of VLMs in autonomous driving applications.
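A curriculum schedule of this kind can be sketched as a simple ordering of training samples by a difficulty score. The scoring function below is an assumption for illustration: samples whose answers are highly predictable from text alone are treated as "easy" and presented first, pushing the model toward visual grounding on the later, harder samples.

```python
def curriculum_order(samples, difficulty):
    """Sort training samples from easy to hard, the standard
    curriculum learning schedule."""
    return sorted(samples, key=difficulty)

# Hypothetical samples: (question_id, text_only_predictability) pairs,
# where high predictability means the answer leaks through the text.
samples = [("q1", 0.9), ("q2", 0.3), ("q3", 0.6)]

# Present textually predictable samples first; samples that require
# visual grounding (low predictability) come last.
ordered = curriculum_order(samples, difficulty=lambda s: -s[1])
print([q for q, _ in ordered])  # ['q1', 'q3', 'q2']
```

The exact difficulty measure and ordering direction used in the paper are not specified here; this only illustrates the general scheduling mechanism.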