As AI agents take on longer-running tasks, effective context management has become essential for preventing context rot and working within the memory limits of LLMs.
LangChain's Deep Agents SDK provides an open source framework for developing agents capable of planning, spawning sub-agents, and interacting with a filesystem to execute complex, long-running tasks. Because these tasks can exceed the model's context window, the SDK implements features for context compression.
Context Compression Techniques
Context compression refers to techniques that reduce the volume of information in an agent's working memory while preserving the details relevant to completing the task. This may include summarizing previous interactions, filtering out stale information, or strategically deciding what to retain and what to discard.
Deep Agents implements a filesystem abstraction that allows agents to perform operations such as listing, reading, and writing files, as well as searching, pattern matching, and file execution. Agents use the filesystem to search and retrieve offloaded content as needed.
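As an illustration, the kind of filesystem store an agent relies on can be sketched with a minimal in-memory class. The names here (`VirtualFS`, `ls`, `grep`) are invented for this sketch and are not the SDK's actual API:

```python
import fnmatch
import re


class VirtualFS:
    """Minimal in-memory filesystem an agent could use as offload storage."""

    def __init__(self):
        self.files: dict[str, str] = {}

    def write(self, path: str, content: str) -> None:
        self.files[path] = content

    def read(self, path: str) -> str:
        return self.files[path]

    def ls(self, pattern: str = "*") -> list[str]:
        # Shell-style pattern matching over stored paths.
        return [p for p in sorted(self.files) if fnmatch.fnmatch(p, pattern)]

    def grep(self, regex: str) -> dict[str, list[str]]:
        """Return matching lines per file, like a simple `grep`."""
        rx = re.compile(regex)
        return {
            path: [line for line in text.splitlines() if rx.search(line)]
            for path, text in self.files.items()
            if rx.search(text)
        }


fs = VirtualFS()
fs.write("notes/plan.md", "1. fetch data\n2. summarize results")
```

With this store, `fs.grep("summarize")` lets the agent retrieve a detail that was offloaded out of its working context.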
Deep Agents implements three main compression techniques, triggered at different frequencies:
- Offloading large tool results: Tool responses that exceed a size threshold are offloaded to the filesystem.
- Offloading large tool inputs: When the context size crosses a threshold, old write/edit arguments from tool calls are offloaded to the filesystem.
- Summarization: When the context size crosses the threshold and there is no more context available for offloading, a summarization step is performed to compress the message history.
To manage context limits, the Deep Agents SDK triggers these compression steps at threshold fractions of the model's context window size.
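The overall dispatch can be sketched as follows: apply the cheap step first, and fall back to summarization only if the context is still over budget. The 85% fraction and the ordering come from the article; the function names, message shape, and the 4-characters-per-token heuristic are assumptions for illustration:

```python
CONTEXT_FRACTION = 0.85  # fraction of the window that triggers cleanup


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return len(text) // 4


def manage_context(messages, context_window, offload_inputs, summarize):
    """Apply compression steps in order of increasing cost."""
    budget = int(context_window * CONTEXT_FRACTION)
    if sum(approx_tokens(m["content"]) for m in messages) <= budget:
        return messages                      # under budget: nothing to do
    messages = offload_inputs(messages)      # cheap: pointers replace old inputs
    if sum(approx_tokens(m["content"]) for m in messages) <= budget:
        return messages
    return summarize(messages)               # last resort: compress the history


# Toy stand-ins for the two steps:
def offload_inputs(msgs):
    return [{**m, "content": "<see /draft.md>"} if m["role"] == "tool" else m
            for m in msgs]


def summarize(msgs):
    return [{"role": "system", "content": "Summary: drafted report; next: review."}]


history = [{"role": "tool", "content": "x" * 4_000},
           {"role": "user", "content": "continue"}]
compressed = manage_context(history, 500, offload_inputs, summarize)
```

Here the large tool message is replaced by a pointer, which already brings the history under budget, so summarization is never reached.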
Offloading Large Tool Results
Responses from tool invocations (e.g., the result of reading a large file or an API call) can exceed a model's context window. When Deep Agents detects a tool response exceeding 20,000 tokens, it offloads the response to the filesystem and substitutes it with a file path reference and a preview of the first 10 lines. Agents can then re-read or search the content as needed.
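The result-offloading step can be sketched as below, using the 20,000-token trigger and 10-line preview described above. The path scheme, the dict-as-filesystem, and the token heuristic are assumptions for this sketch:

```python
import itertools

OFFLOAD_TOKEN_LIMIT = 20_000   # per-result trigger described above
PREVIEW_LINES = 10

_ids = itertools.count()


def approx_tokens(text: str) -> int:
    return len(text) // 4      # rough 4-chars-per-token heuristic


def maybe_offload(result: str, fs: dict) -> str:
    """Store an oversized tool result on the filesystem; keep only a
    path reference and a short preview in the active context."""
    if approx_tokens(result) <= OFFLOAD_TOKEN_LIMIT:
        return result          # small enough to stay inline
    path = f"/tool_results/result_{next(_ids)}.txt"
    fs[path] = result          # full content remains retrievable
    preview = "\n".join(result.splitlines()[:PREVIEW_LINES])
    return f"[offloaded to {path}; first {PREVIEW_LINES} lines]\n{preview}"


fs = {}
big = "\n".join(f"row {i}" for i in range(100_000))
stub = maybe_offload(big, fs)
```

The agent's history now carries only the pointer-plus-preview stub, while the full result stays readable and searchable on disk.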
Offloading Large Tool Inputs
File write and edit operations leave behind tool calls containing the complete file content in the agent's conversation history. Since this content is already persisted to the filesystem, it's often redundant. When the session context crosses 85% of the model's available window, Deep Agents truncates older tool calls, replacing them with a pointer to the file on disk and reducing the size of the active context.
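A minimal sketch of this truncation step follows. The message shape, tool names, and the `keep_recent` window are assumptions for illustration, not the SDK's actual internals:

```python
def offload_old_tool_inputs(messages: list, keep_recent: int = 3) -> list:
    """Replace file content inside older write/edit tool calls with a pointer
    to the file on disk; the content itself is already persisted there."""
    cutoff = len(messages) - keep_recent
    out = []
    for i, msg in enumerate(messages):
        if i < cutoff and msg.get("tool") in {"write_file", "edit_file"}:
            args = {**msg["args"],
                    "content": f"<offloaded: see {msg['args']['path']}>"}
            msg = {**msg, "args": args}     # shallow copy; original left intact
        out.append(msg)
    return out


history = [
    {"tool": "write_file", "args": {"path": "/report.md", "content": "A" * 5_000}},
    {"tool": "read_file", "args": {"path": "/report.md"}},
    {"role": "user", "content": "looks good"},
]
trimmed = offload_old_tool_inputs(history, keep_recent=2)
```

Only the oldest write call is truncated; recent messages are kept verbatim so the agent retains its immediate working state.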
Summarization
When offloading no longer yields sufficient space, Deep Agents falls back to summarization. This process has two components:
- In-context summary: an LLM generates a structured summary of the conversation (including session intent, artifacts created, and next steps), which replaces the full conversation history in the agent's working memory.
- Filesystem preservation: The complete, original conversation messages are written to the filesystem as a canonical record.
This dual approach ensures the agent maintains awareness of its goals and progress (via the summary) while preserving the ability to recover specific details when needed (via filesystem search).
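The two components above can be sketched together. The file path, message shape, and `stub_summary` stand-in (used here in place of the LLM call) are assumptions for illustration:

```python
import json


def summarize_history(messages: list, fs: dict, make_summary) -> list:
    """Compress the history in two coupled steps: preserve the full record
    on the filesystem, then swap the in-context messages for a summary."""
    # Filesystem preservation: complete original messages as canonical record.
    fs["/history/conversation_full.json"] = json.dumps(messages)
    # In-context summary: a structured digest replaces the conversation.
    summary = make_summary(messages)  # in Deep Agents, an LLM generates this
    return [{
        "role": "system",
        "content": ("Conversation summary (full log: "
                    "/history/conversation_full.json)\n" + summary),
    }]


# Stub producing the structured fields described above (intent, artifacts, next steps):
def stub_summary(messages):
    return ("Intent: draft quarterly report\n"
            "Artifacts: /report.md\n"
            "Next steps: review figures")


fs = {}
new_history = summarize_history(
    [{"role": "user", "content": "draft the quarterly report"}], fs, stub_summary)
```

The summary keeps goals and progress in working memory, while the JSON record on disk lets the agent search out any detail the summary dropped.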
Evaluating Compression Strategies
When evaluating your own context compression strategies, it's important to:
- Start with real-world benchmarks, then stress-test individual features.
- Test recoverability.
- Monitor for goal drift.
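A recoverability check can be sketched as: pick a fact that compression removed from the active context and verify it can still be found on the filesystem. The example fact and paths below are invented:

```python
def is_recoverable(fact: str, active_context: str, fs: dict) -> bool:
    """True if a fact survives either in the working context or on disk."""
    return fact in active_context or any(fact in body for body in fs.values())


fs = {"/history/full.json": "... API key rotation is scheduled for Friday ..."}
summary = "Summary: planned maintenance tasks; details offloaded to /history/."

# The fact is gone from the summary but still present in the preserved record.
recoverable = is_recoverable("API key rotation is scheduled for Friday", summary, fs)
```

Running checks like this across many offloaded facts gives a simple recoverability score for a compression strategy.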
All of these Deep Agents features are open source.