Compression and Reasoning in Language Models

A recent study explored how prompt compression affects the performance of large language models (LLMs), focusing on two task families, code generation and reasoning, and reporting surprising results.

The Perplexity Paradox

The researchers identified a phenomenon they call the "perplexity paradox." In code generation, models tolerate aggressive prompt compression (up to 60%) with little loss of quality; in reasoning tasks, such as solving mathematical problems, performance instead degrades steadily as compression increases. Per-token analysis explains why: perplexity-based compressors keep surprising (high-perplexity) tokens and discard predictable (low-perplexity) ones. Code syntax tokens tend to score high and survive, while the numerical values in math problems score low and are discarded, even though they are essential to the answer.
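The filtering mechanism behind the paradox can be sketched in a few lines. This is a toy illustration, not the study's code: a hand-made frequency table stands in for the small language model that a real compressor (e.g. an LLMLingua-style method) would use to score tokens.

```python
import math

# Toy frequency model standing in for a scoring LM (values are assumptions).
TOKEN_FREQ = {
    "def": 0.002, "return": 0.003, "(": 0.05, ")": 0.05,
    "the": 0.06, "has": 0.02, "apples": 0.0005,
    "3": 0.04, "5": 0.04, "Alice": 0.001,
}

def surprisal(token: str) -> float:
    """Negative log-probability under the toy model; rare tokens score high."""
    p = TOKEN_FREQ.get(token, 1e-4)  # unseen tokens are treated as rare
    return -math.log2(p)

def compress(tokens: list[str], keep_ratio: float) -> list[str]:
    """Keep only the highest-surprisal tokens, preserving original order."""
    k = max(1, round(len(tokens) * keep_ratio))
    keep = set(
        sorted(range(len(tokens)),
               key=lambda i: surprisal(tokens[i]),
               reverse=True)[:k]
    )
    return [t for i, t in enumerate(tokens) if i in keep]

math_prompt = ["Alice", "has", "3", "apples", "the", "5", "apples"]
compressed = compress(math_prompt, keep_ratio=0.5)
# The digits "3" and "5" are frequent, hence low-surprisal, and are dropped
# even though the math problem cannot be solved without them.
```

Rare content words survive the filter while the numerals vanish, which is exactly the failure mode the per-token analysis describes.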

Signature Injection and TAAC

To mitigate this issue, the authors introduced a technique called "signature injection," which raised the pass rate on mathematical tasks from 5.3% to 39.3%. They also proposed an adaptive compression algorithm, TAAC (Task-Aware Adaptive Compression), which achieves a 22% cost reduction while maintaining 96% quality, outperforming fixed-ratio compression by 7%.
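The two mitigations can be sketched as follows. The function names, the regex-based extraction, and the ratio thresholds are illustrative assumptions, not the paper's implementation: signature injection is modeled as re-appending numerals that compression dropped, and the TAAC idea as choosing a compression ratio based on a crude code-vs-reasoning check.

```python
import re

def choose_ratio(prompt: str) -> float:
    """TAAC-style idea: compress code-like prompts aggressively and
    reasoning-like prompts conservatively (thresholds are assumptions)."""
    looks_like_code = bool(re.search(r"\bdef |\breturn |[{};]", prompt))
    return 0.4 if looks_like_code else 0.8  # fraction of tokens to keep

def inject_signature(original: str, compressed: str) -> str:
    """Signature-injection sketch: restore task-critical values (here,
    numerals) that were lost during compression."""
    numbers = re.findall(r"\d+(?:\.\d+)?", original)
    missing = [n for n in numbers if n not in compressed]
    if not missing:
        return compressed
    return compressed + " [values: " + ", ".join(missing) + "]"

original = "Alice has 3 apples and buys 5 more. How many apples in total?"
lossy = "Alice apples buys more. How many apples total?"
restored = inject_signature(original, lossy)
# choose_ratio(original) returns the conservative ratio for this reasoning
# prompt, and `restored` carries the dropped numerals 3 and 5 back in.
```

The point of the sketch is the division of labor: the ratio chooser prevents over-compression on reasoning prompts in the first place, while the signature pass acts as a safety net for critical tokens that are dropped anyway.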

Validation on Various Benchmarks

The study validated these results on several code benchmarks (HumanEval, MBPP, HumanEval+, MultiPL-E) and reasoning benchmarks (GSM8K, MATH, ARC-Challenge, MMLU-STEM), confirming that the compression threshold generalizes across programming languages and difficulty levels.