Prompting Strategies for LLMs and Chart Analysis

The performance of large language models (LLMs) depends strongly on the prompting strategy used. A recent study analyzed different prompting techniques for question answering (QA) over charts, an area where a model's reasoning ability is crucial.

Evaluation Methodology

The research evaluated four widely used prompting paradigms: Zero-Shot, Few-Shot, Zero-Shot Chain-of-Thought, and Few-Shot Chain-of-Thought. The models examined were GPT-3.5, GPT-4, and GPT-4o, tested on the ChartQA dataset. The analysis used only structured chart data, isolating prompt structure as the sole experimental variable. The evaluation metrics were Accuracy and Exact Match.
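To make the four paradigms concrete, the sketch below builds a prompt for each one. The exact wording used in the study is not reported here; these templates, the function names, and the demonstration format are hypothetical illustrations of how each paradigm is typically constructed.

```python
# Illustrative prompt builders for the four paradigms. The template wording
# is an assumption, not the study's actual prompts.

def zero_shot(table: str, question: str) -> str:
    # Question only: no demonstrations, no reasoning instruction.
    return f"Chart data:\n{table}\n\nQuestion: {question}\nAnswer:"

def zero_shot_cot(table: str, question: str) -> str:
    # Adds the standard "think step by step" trigger before answering.
    return (f"Chart data:\n{table}\n\nQuestion: {question}\n"
            "Let's think step by step, then give the final answer.")

def few_shot(table: str, question: str,
             examples: list[tuple[str, str, str]]) -> str:
    # Prepends solved (table, question, answer) demonstrations.
    demos = "\n\n".join(
        f"Chart data:\n{t}\nQuestion: {q}\nAnswer: {a}"
        for t, q, a in examples
    )
    return f"{demos}\n\nChart data:\n{table}\nQuestion: {question}\nAnswer:"

def few_shot_cot(table: str, question: str,
                 examples: list[tuple[str, str, str, str]]) -> str:
    # Demonstrations include a worked reasoning chain before each answer,
    # and the final prompt ends at "Reasoning:" so the model continues it.
    demos = "\n\n".join(
        f"Chart data:\n{t}\nQuestion: {q}\nReasoning: {r}\nAnswer: {a}"
        for t, q, r, a in examples
    )
    return f"{demos}\n\nChart data:\n{table}\nQuestion: {question}\nReasoning:"
```

In this framing, the only change between conditions is the string handed to the model, which mirrors the study's design of isolating prompt structure as the experimental variable.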

Key Findings

The results, obtained from 1,200 diverse ChartQA samples, indicate that Few-Shot Chain-of-Thought prompting consistently yields the highest accuracy (up to 78.2%), particularly on questions requiring multi-step reasoning. Few-Shot prompting improves adherence to the required answer format, while Zero-Shot performs well only with high-capacity models on simpler tasks. These findings offer practical guidance for choosing a prompting strategy for reasoning tasks over structured data, with implications for both efficiency and accuracy in real-world applications.
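The two metrics reported above can be sketched as follows. Exact Match compares normalized answer strings; for Accuracy, ChartQA-style evaluations commonly accept numeric answers within a small relative tolerance (5% here, an assumption — the study does not state its threshold), so this is a plausible scoring scheme rather than the study's actual harness.

```python
# Minimal sketch of Exact Match and a relaxed Accuracy metric.
# The 5% numeric tolerance is an assumed convention, not taken from the study.

def exact_match(pred: str, gold: str) -> bool:
    # Case- and whitespace-insensitive string comparison.
    return pred.strip().lower() == gold.strip().lower()

def relaxed_accuracy(pred: str, gold: str, tol: float = 0.05) -> bool:
    # Numeric answers: correct if within a relative tolerance of the gold value.
    try:
        p, g = float(pred), float(gold)
    except ValueError:
        return exact_match(pred, gold)  # non-numeric: fall back to exact match
    if g == 0:
        return p == 0
    return abs(p - g) / abs(g) <= tol

def score(preds: list[str], golds: list[str]) -> dict[str, float]:
    # Aggregate both metrics over a prediction set.
    n = len(golds)
    return {
        "exact_match": sum(exact_match(p, g) for p, g in zip(preds, golds)) / n,
        "accuracy": sum(relaxed_accuracy(p, g) for p, g in zip(preds, golds)) / n,
    }
```

The gap between the two metrics on the same predictions shows why reporting both is useful: a model can produce numerically correct answers that miss the exact required format.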