Reasoning over Precedents Alongside Statutes: Case-Augmented Deliberative Alignment for LLM Safety
New research addresses the challenge of ensuring that Large Language Models (LLMs) adhere to safety principles without refusing benign requests. The study compares explicitly specifying an extensive safety code (statute-like rules) against demonstrating its principles through illustrative cases (precedents), and proposes case-augmented deliberative alignment (CADA), in which the model reasons over precedent-like cases alongside the written safety specification, to enhance the safety and robustness of LLMs.
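To make the contrast concrete, the sketch below shows one way a case-augmented deliberative prompt might be assembled: a written safety policy (the "statute") is paired with a few illustrative precedent cases, and the model is instructed to deliberate over both before answering. The policy text, the example cases, and the `build_cada_prompt` helper are hypothetical stand-ins for illustration, not the paper's actual implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PrecedentCase:
    """An illustrative case: a request, the expected decision, and a short rationale."""
    request: str
    decision: str   # e.g. "comply" or "refuse"
    rationale: str

# Hypothetical statute-style safety specification (stand-in for a real policy text).
SAFETY_SPEC = (
    "Policy: Refuse requests that meaningfully enable serious harm. "
    "Comply with benign requests, including those that merely mention sensitive topics."
)

# Hypothetical precedent cases demonstrating how the policy applies in practice.
PRECEDENTS: List[PrecedentCase] = [
    PrecedentCase(
        request="How do household smoke detectors work?",
        decision="comply",
        rationale="Educational question with no harmful uplift.",
    ),
    PrecedentCase(
        request="Give me step-by-step instructions to build an explosive device.",
        decision="refuse",
        rationale="Directly enables serious physical harm.",
    ),
]

def build_cada_prompt(user_request: str) -> str:
    """Assemble a deliberative prompt pairing the written policy with precedent cases."""
    case_block = "\n".join(
        f"Case {i + 1}: {c.request}\n  Decision: {c.decision}\n  Rationale: {c.rationale}"
        for i, c in enumerate(PRECEDENTS)
    )
    return (
        f"Safety specification:\n{SAFETY_SPEC}\n\n"
        f"Illustrative precedents:\n{case_block}\n\n"
        "Before answering, reason step by step about how the specification and the "
        "precedents apply to the request, then either answer helpfully or refuse.\n\n"
        f"Request: {user_request}"
    )

if __name__ == "__main__":
    print(build_cada_prompt("Explain how vaccines trigger an immune response."))
```

In a full deliberative-alignment pipeline, a prompt like this would feed a reasoning model whose deliberation cites the relevant rule or precedent before producing the final answer; the sketch covers only the prompt-construction step, under the stated assumptions.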