Prompt injection: Local LLM compromised via email

An experiment revealed a vulnerability in Large Language Models (LLMs) when integrated into systems that interact with unverified external data sources, such as email.

Attack Details

The attack, described in detail on Reddit and Medium, exploits the prompt injection technique. A user sent himself an email containing hidden instructions, disguised as system output. The LLM, in this case ClawdBot, was instructed to read the email. At that point, the model interpreted the injected instructions as coming from the legitimate user and performed unauthorized actions, retrieving the last five emails and sending a summary to an address controlled by the "attacker".

Security Implications

The critical aspect is that the attack is not based on malware or traditional exploits, but on the ability to manipulate the model through natural language. This raises significant concerns for any AI agent that processes untrusted content and can take concrete actions. The lack of distinction between the language used for commands and that present in ordinary communications represents an inherent risk.

For those evaluating on-premise deployments, there are trade-offs between control and security. AI-RADAR offers analytical frameworks on /llm-onpremise to evaluate these aspects.

Prompt injection: Local LLM compromised via email

Attack Details

Security Implications

💻 Need GPU Cloud Infrastructure?

💬 Commenti (0)

📚 Approfondimenti

Approfondisci su LLM On-Premise

Raccolta di prompt per LLM focalizzati su RAG: una libreria open source

Gmail: nuove funzionalità AI per tutti gli utenti

Prompt injection: vulnerabilità critica per LLM self-hosted