Traversal-as-Policy: A New Approach for LLM Agents

Keeping autonomous LLM-based agents both safe and efficient is difficult. A new study introduces "Traversal-as-Policy," a method that uses Gated Behavior Trees (GBT) to control the behavior of these agents.

How it works

The approach takes execution logs from sandbox environments (OpenHands) and distills them into a single executable GBT. Each node in the tree represents a state-conditioned action macro derived from successful trajectories. Unsafe trajectories are blocked by pre-execution "gates," which are updated from experience so that a context already judged dangerous is not admitted again.
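The idea of traversing a gated tree can be illustrated with a minimal sketch. It is not the study's implementation: the `Node` class, the `traverse` function, the action names, and the use of a single `context` string as the thing a gate checks are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

State = Dict[str, str]


@dataclass
class Node:
    """One tree node: a state condition, an action macro, and a gate (all hypothetical)."""
    name: str
    applies: Callable[[State], bool]   # state condition for choosing this node
    macro: List[str]                   # action sequence taken from successful runs
    blocked: Set[str] = field(default_factory=set)   # contexts the gate refuses
    children: List["Node"] = field(default_factory=list)

    def gate_allows(self, state: State) -> bool:
        # Pre-execution check: refuse any context previously marked unsafe.
        return state.get("context", "") not in self.blocked

    def block(self, context: str) -> None:
        # Gate update from experience: this context is not admitted again.
        self.blocked.add(context)


def traverse(root: Node, state: State) -> List[str]:
    """Walk the tree and return the actions admitted for this state."""
    plan: List[str] = []
    node = root
    while node is not None:
        if not node.gate_allows(state):
            break  # the gate stops the macro before it runs
        plan.extend(node.macro)
        # Descend to the first child whose condition holds and whose gate is open.
        node = next(
            (c for c in node.children if c.applies(state) and c.gate_allows(state)),
            None,
        )
    return plan


# Hypothetical two-node tree for a bug-fixing task.
root = Node("locate", lambda s: True, ["search_repo", "open_file"])
patch = Node("patch", lambda s: s.get("task") == "bugfix", ["edit_file", "run_tests"])
root.children.append(patch)

safe_plan = traverse(root, {"task": "bugfix", "context": "workspace"})

# After an unsafe run, the gate on "patch" is updated to refuse that context.
patch.block("system_dir")
gated_plan = traverse(root, {"task": "bugfix", "context": "system_dir"})

print(safe_plan)
print(gated_plan)
```

In this sketch the policy is the tree itself, so the agent's allowed behavior can be read off the nodes and gates. In the second call the `patch` macro is never executed because its gate rejects the `system_dir` context before any action runs.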

Results

Tests on benchmarks covering software, web, reasoning, and security tasks show that GBT improves the success rate, reduces violations, and lowers costs. For example, on SWE-bench Verified (Protocol A, 500 issues), GBT-SE increases success from 34.6% to 73.6%, reduces violations from 2.8% to 0.2%, and cuts token usage from 208k to 126k and character usage from 820k to 490k. With the same distilled tree, 8B executors more than double success on SWE-bench Verified (from 14.0% to 58.8%) and WebArena (from 9.1% to 37.3%).
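To put the reported figures on a common scale, the short sketch below converts them into relative changes. The numbers are copied from the paragraph above; the helper names `relative_reduction` and `fold_change` are introduced here for illustration only.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity dropped, e.g. 0.25 means 25% lower."""
    return (before - after) / before


def fold_change(before: float, after: float) -> float:
    """How many times larger the new value is than the old one."""
    return after / before


# SWE-bench Verified (Protocol A), figures as reported above.
success_gain_points = 73.6 - 34.6
violation_drop = relative_reduction(2.8, 0.2)
token_drop = relative_reduction(208, 126)   # thousands of tokens
char_drop = relative_reduction(820, 490)    # thousands of characters

# 8B executors using the same distilled tree.
swe_fold = fold_change(14.0, 58.8)
webarena_fold = fold_change(9.1, 37.3)

print(f"success: +{success_gain_points:.1f} points")
print(f"violations: {violation_drop:.1%} lower")
print(f"tokens: {token_drop:.1%} lower, characters: {char_drop:.1%} lower")
print(f"8B success: {swe_fold:.1f}x (SWE-bench Verified), {webarena_fold:.1f}x (WebArena)")
```

Read this way, token and character usage each fall by roughly 40%, and the 8B executors reach about four times their baseline success rate on both benchmarks.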

Implications

By externalizing an LLM agent's policy into an executable tree, the approach makes that policy something that can be inspected and verified, which serves both safety and efficiency. The combination of lower computational cost and higher success rates could make autonomous agents more practical in complex environments.