Bias in Benchmarks for Autonomous Driving
Multiple Choice Question Answering (MCQA) benchmarks are widely used to evaluate the performance of Vision Language Models (VLMs) in autonomous driving scenarios. However, a recent study highlights how these benchmarks are susceptible to hidden textual biases, which allow models to exploit linguistic patterns rather than understanding the visual context.
Bias Reduction with a New Method
The study first shows how severe the problem is: a VLM fine-tuned on synthetic data can reach accuracy comparable to that obtained on human-validated benchmarks even without visual input, which means the questions can be answered from text alone. It then proposes a method to significantly reduce this effect, cutting the accuracy attributable to textual shortcuts from +66.9% to +2.9% and eliminating most linguistic exploits.
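To make the idea of accuracy from textual shortcuts concrete, here is a minimal sketch of how such a margin could be measured. The article does not state how the +66.9% and +2.9% figures are computed; this sketch assumes they are margins over random chance for a predictor that never sees the image. The names `longest_option` and `shortcut_margin`, the question format, and the "pick the longest option" heuristic are all hypothetical illustrations, not the study's method.

```python
from typing import Callable, Sequence

# Hypothetical question format: {"options": list[str], "answer": int}
Question = dict


def longest_option(options: Sequence[str]) -> int:
    """Text-only heuristic (illustrative): pick the longest option, ignoring any image."""
    return max(range(len(options)), key=lambda i: len(options[i]))


def shortcut_margin(
    questions: Sequence[Question],
    predict: Callable[[Sequence[str]], int],
) -> float:
    """Accuracy of a blind predictor minus random-chance accuracy, in percentage points."""
    correct = sum(predict(q["options"]) == q["answer"] for q in questions)
    accuracy = correct / len(questions)
    # Chance accuracy is averaged per question so that mixed option counts are handled.
    chance = sum(1 / len(q["options"]) for q in questions) / len(questions)
    return 100 * (accuracy - chance)


# Toy benchmark (made up) in which the correct answer is always the wordiest option.
biased = [
    {"options": ["stop", "go", "slow down and yield to the pedestrian", "turn"], "answer": 2},
    {"options": ["keep the current lane and speed", "brake", "turn", "honk"], "answer": 0},
]
margin = shortcut_margin(biased, longest_option)
```

A margin near zero would suggest that the text alone carries little information about the correct answer; a large positive margin would suggest the benchmark can be gamed without looking at the scene.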
Curriculum Learning and Visual Grounding
By decoupling the correct answer from linguistic artifacts and employing a curriculum learning strategy, the method forces the model to rely on visual grounding. Benchmark performance then reflects perceptual understanding more faithfully, which improves the reliability of VLM evaluation in autonomous driving applications.
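The article gives no implementation details for either step, so the following is only a rough sketch under stated assumptions. It treats answer position as one simple example of a linguistic artifact and removes it by shuffling the options, and it treats curriculum learning as ordering training questions from easy to hard by a caller-supplied score. The functions `shuffle_options` and `curriculum`, the question format, and the `difficulty` field are hypothetical.

```python
import random
from typing import Callable, Sequence

# Hypothetical question format: {"options": list[str], "answer": int, ...}
Question = dict


def shuffle_options(question: Question, rng: random.Random) -> Question:
    """Return a copy with shuffled options and the answer index updated to match."""
    order = list(range(len(question["options"])))
    rng.shuffle(order)
    return {
        **question,
        "options": [question["options"][i] for i in order],
        # New position of the originally correct option.
        "answer": order.index(question["answer"]),
    }


def curriculum(
    questions: Sequence[Question],
    difficulty: Callable[[Question], float],
) -> list:
    """Order training questions from easiest to hardest (assumed difficulty score)."""
    return sorted(questions, key=difficulty)


rng = random.Random(0)  # fixed seed so the shuffle is reproducible
raw = [
    {"options": ["brake", "accelerate", "turn left"], "answer": 0, "difficulty": 0.8},
    {"options": ["red", "green"], "answer": 1, "difficulty": 0.2},
]
prepared = curriculum([shuffle_options(q, rng) for q in raw], lambda q: q["difficulty"])
```

Shuffling addresses only positional cues; artifacts in the wording of the options themselves would need a different treatment, which the article does not describe.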