Bias in Benchmarks for Autonomous Driving
Multiple Choice Question Answering (MCQA) benchmarks are widely used to evaluate the performance of Vision Language Models (VLMs) in autonomous driving scenarios. However, a recent study highlights how these benchmarks are susceptible to hidden textual biases, which allow models to exploit linguistic patterns rather than understanding the visual context.
Bias Reduction with a New Method
The research proposes a method that substantially reduces this problem. The results show that a VLM can reach accuracy comparable to that reported on human-validated benchmarks even without any visual input, confirming that textual shortcuts drive much of the measured performance. By fine-tuning on synthetic data, the proposed method cuts the above-chance accuracy attributable to these shortcuts from +66.9% to +2.9%, eliminating most linguistic exploits.
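The blind-evaluation probe implied above can be sketched as follows: query a model on the MCQA benchmark with the image withheld and compare its accuracy to the random-chance baseline. All names here are hypothetical illustrations, not the paper's actual code.

```python
def blind_accuracy_gap(predictions, answers, n_choices=4):
    """Accuracy of text-only predictions minus the chance baseline.

    A large positive gap suggests the benchmark leaks answers through
    textual patterns alone, with no visual grounding required.
    """
    correct = sum(p == a for p, a in zip(predictions, answers))
    accuracy = correct / len(answers)
    chance = 1.0 / n_choices
    return accuracy - chance

# Hypothetical example: a model that always picks option "A" on a
# benchmark with uniformly distributed answers shows no gap.
answers = ["A", "B", "C", "D"]
always_a = ["A", "A", "A", "A"]
print(f"{blind_accuracy_gap(always_a, answers):+.1%}")  # +0.0%
```

A debiasing method like the one proposed should drive this gap toward zero (the paper reports a drop from +66.9% to +2.9%).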
Curriculum Learning and Visual Grounding
By decoupling the correct answer from linguistic artifacts and employing a curriculum learning strategy, the model is forced to rely on visual grounding. This ensures that performance accurately reflects perceptual understanding, improving the reliability of VLMs in autonomous driving applications.
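A curriculum schedule of this kind can be sketched as a simple ordering of training samples by a difficulty score. The scoring function below is an assumption for illustration: samples whose answers are highly predictable from text alone are treated as "easy" and presented first, pushing the model toward visual grounding on the later, harder samples.

```python
def curriculum_order(samples, difficulty):
    """Sort training samples from easy to hard, the standard
    curriculum learning schedule."""
    return sorted(samples, key=difficulty)

# Hypothetical samples: (question_id, text_only_predictability) pairs,
# where high predictability means the answer leaks through the text.
samples = [("q1", 0.9), ("q2", 0.3), ("q3", 0.6)]

# Present textually predictable samples first; samples that require
# visual grounding (low predictability) come last.
ordered = curriculum_order(samples, difficulty=lambda s: -s[1])
print([q for q, _ in ordered])  # ['q1', 'q3', 'q2']
```

The exact difficulty measure and ordering direction used in the paper are not specified here; this only illustrates the general scheduling mechanism.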