RedacBench: A New Benchmark for Sensitive Information Redaction
The ability of modern language models to extract sensitive information from unstructured text makes the selective removal of such information, or redaction, a crucial aspect of data security. Existing benchmarks often focus on predefined categories such as personally identifiable information (PII); RedacBench was introduced to address this limitation.
RedacBench is a comprehensive benchmark for evaluating policy-based information removal across domains and redaction strategies. It comprises 514 texts authored by individuals, companies, and government entities, paired with 187 security policies, and measures a model's ability to selectively remove policy-violating information while preserving the original semantics.
Performance is quantified using 8,053 annotated propositions that capture all inferable information in each text. This allows for the evaluation of both security (the removal of sensitive propositions) and utility (the preservation of non-sensitive propositions). Experiments conducted on various removal strategies and state-of-the-art language models show that, while more advanced models can improve security, preserving utility remains a challenge. RedacBench is publicly available to foster future research.
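The two metrics described above can be sketched in code. This is a minimal, hypothetical illustration, not the benchmark's actual implementation: the `Proposition` class, the substring-based `entailed` check, and the `score` function are all assumptions made for clarity (the real benchmark presumably uses a model-based entailment check over its 8,053 annotated propositions).

```python
from dataclasses import dataclass

@dataclass
class Proposition:
    text: str
    sensitive: bool  # True if this proposition violates the policy

def entailed(prop: Proposition, redacted_text: str) -> bool:
    # Placeholder check for whether the proposition is still inferable
    # from the redacted text; simplified here to substring matching.
    return prop.text.lower() in redacted_text.lower()

def score(propositions: list[Proposition], redacted_text: str) -> tuple[float, float]:
    sensitive = [p for p in propositions if p.sensitive]
    benign = [p for p in propositions if not p.sensitive]
    # Security: fraction of sensitive propositions no longer inferable.
    security = sum(not entailed(p, redacted_text) for p in sensitive) / max(len(sensitive), 1)
    # Utility: fraction of non-sensitive propositions still inferable.
    utility = sum(entailed(p, redacted_text) for p in benign) / max(len(benign), 1)
    return security, utility

# Toy example with invented propositions:
props = [
    Proposition("the patient is named Alice", sensitive=True),
    Proposition("the visit occurred in March", sensitive=False),
]
redacted = "The visit occurred in March; the patient's name was removed."
print(score(props, redacted))  # -> (1.0, 1.0): fully secure, fully useful
```

A redaction that deletes too much would lower utility while keeping security at 1.0, which is the trade-off the benchmark is designed to expose.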