A developer shared their experience building an AI agent with system shell access, revealing potential security vulnerabilities.

The Incident

The goal was to create a tool to allow models like Claude or GPT-4 to navigate within the codebase. After providing bash access, the AI agent was tasked with checking imports and creating ASCII art from an environment file. The agent performed both tasks but inadvertently printed the API keys as part of the artistic output.

Prompt Injection and Sandboxing

The incident led the developer to investigate the issue of prompt injection further, discovering that it is a more serious threat than initially thought. Resources such as Anthropic's page and an article from CodeAnt testing bypasses were consulted. Simon Willison has raised similar concerns for several months.

The discussion shifted to possible sandboxing solutions. Docker, with its shared kernel, seems insufficient. gVisor adds overhead, while Firecracker, used by AWS Lambda, might be a more robust solution, albeit complex. The developer faces the choice between releasing the system with minimal protections or investing two weeks to implement proper isolation.