ProvenanceGuard: Using Provenance to Align LLM Agents
A new study proposes a provenance-based framework for detecting misalignment in LLM agents that sharply reduces both false negatives and unnecessary interventions. In tests on Agent-SafetyBench and WorkBench, error rates drop from 42.9% to 1.8% and int...