Autonomous context compression

LangChain has added an autonomous context compression feature to the Deep Agents SDK (Python) and CLI. It allows models to compress their own context windows at opportune times.

Context compression reduces the information in an agent's working memory. Older messages are replaced by a summary or condensed representation of an agent's progress, preserving what's relevant to a task. This action is often necessary to accommodate finite context windows and reduce context rot.

Most agent harnesses handle this by compacting at a fixed token threshold. This design is suboptimal because there are good times and bad times to compact: it is not ideal to compact in the middle of a complex refactor; it is better to compact when starting a new task, or when you believe prior context is about to lose relevance.
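To make the threshold approach concrete, here is a toy sketch (not the Deep Agents implementation; the function names and word-count tokenizer are illustrative stand-ins). Note that the trigger fires purely on size, with no regard for whether this is a good moment to summarize:

```python
# Toy illustration of fixed-threshold compaction: once the running token
# count crosses a limit, older messages are replaced with a summary,
# regardless of whether it's an opportune moment.

def count_tokens(message: str) -> int:
    # Crude stand-in for a real tokenizer: one token per word.
    return len(message.split())

def maybe_compact(history: list[str], limit: int) -> list[str]:
    """Replace all but the last message with a summary once over the limit."""
    if sum(count_tokens(m) for m in history) <= limit:
        return history
    summary = f"[summary of {len(history) - 1} earlier messages]"
    return [summary, history[-1]]

history = ["a " * 50, "b " * 50, "keep this recent message"]
compacted = maybe_compact(history, limit=60)
# The two older messages collapse into a single summary placeholder.
```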

Many interactive coding tools feature a /compact command or similar, which allows users to manually trigger a context compression step at opportune times. This new feature exposes a tool to the agent that lets it trigger context compression itself. This enables more opportunistic compaction without requiring your application's users to be aware of finite context windows or issue specific commands.
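Concretely, exposing compaction to the agent means registering a function the model can call alongside its other tools. A minimal, hypothetical schema in the common function-calling style (the actual tool name, description, and parameters in Deep Agents may differ):

```python
# Hypothetical tool definition; the real Deep Agents tool is not
# guaranteed to use this name or description.
compact_context_tool = {
    "name": "compact_context",
    "description": (
        "Summarize older conversation history and drop it from the context "
        "window. Call this at clean task boundaries, after extracting a "
        "result from a large amount of context, or before starting a long, "
        "complex task."
    ),
    # No parameters: the agent decides *when* to compact, not *how*.
    "parameters": {"type": "object", "properties": {}},
}
```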

This tool is currently enabled in the Deep Agents CLI and opt-in in the Deep Agents SDK.

When should we compact?

There are several situations that could warrant a context compression action:

  • At clean task boundaries:
    • A user signals that they are moving on to a new task for which earlier context is likely irrelevant
    • The agent has finished a deliverable and the user acknowledges task completion
  • After extracting a result from a large amount of context:
    • The agent has obtained a fact, conclusion, summary, or other result by consuming a significant amount of context, as in a research task
  • Before consuming a large amount of new context:
    • The agent is about to generate a long draft
    • The agent is about to read a large amount of new context
  • Before entering a complex multi-step process:
    • The agent is about to start a lengthy refactor, migration, multi-file edit, or incident response
    • The agent has produced a plan and is about to begin executing the steps
  • A decision has been made that supersedes prior context:
    • New requirements have come to light that invalidate previous context
    • There are many tangents or dead-ends that can be reduced to a summary

What happens when the tool is called?

The tool uses the same parameters as the existing Deep Agents summarization middleware: we retain recent messages (the most recent 10% of available context) and summarize everything that comes before. The call to the compaction tool and its response fall within the retained recent messages.
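The retention rule can be sketched as follows. This is a simplified, self-contained illustration: the real middleware operates on LangChain message objects, uses a real tokenizer, and generates the summary with a model rather than a placeholder string.

```python
def compact(messages: list[str], context_budget: int,
            keep_fraction: float = 0.10) -> list[str]:
    """Keep the most recent messages that fit within `keep_fraction` of the
    context budget; replace everything older with a summary placeholder."""
    keep_budget = int(context_budget * keep_fraction)
    kept: list[str] = []
    used = 0
    # Walk backwards so the most recent messages are retained first.
    for msg in reversed(messages):
        tokens = len(msg.split())  # crude tokenizer stand-in
        if used + tokens > keep_budget and kept:
            break
        kept.append(msg)
        used += tokens
    older = len(messages) - len(kept)
    summary = f"[summary of {older} earlier messages]"
    return ([summary] if older else []) + list(reversed(kept))
```

With a 1,000-token budget, only the last ~100 tokens of messages survive verbatim; everything earlier is folded into the summary slot.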

How to use

The tool is implemented as a separate middleware, so you can enable it by adding it to the middleware list in create_deep_agent.
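In the SDK, the wiring might look like the sketch below. The middleware class name (`CompactionMiddleware`) and its import path are placeholders for illustration, not the confirmed public API; check the Deep Agents documentation for the actual names.

```python
# Illustrative sketch only: the middleware class name and import path
# are assumptions, not the confirmed API.
from deepagents import create_deep_agent
from deepagents.middleware import CompactionMiddleware  # hypothetical path

agent = create_deep_agent(
    tools=[...],                            # your existing tools
    middleware=[CompactionMiddleware()],    # opt in to autonomous compaction
)
```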

In the CLI, simply call /compact when you're ready to trim context or move on to a new task.

Our experience with this feature

We tested:

  • A custom evaluation suite, in which we used LangSmith traces to inject follow-up prompts into threads that do and do not warrant compaction;
  • Terminal-bench-2, in which we did not observe any instances of autonomous compaction;
  • Our own coding tasks in Deep Agents CLI.

In practice, agents are conservative about triggering compaction, but when they do, they tend to choose moments where it clearly improves the workflow.

Autonomous context compression is a small feature, but it points at a broader direction for agent design: giving models more control over their own working memory, with fewer rigid, hand-tuned rules in the agent harness.