Detecting Pretraining Data in LLMs: A New Frontier

The increasing opacity of the pretraining corpora used in Large Language Models (LLMs) raises significant privacy and copyright concerns. Consequently, detecting whether a given text was included in a model's pretraining data has become a crucial challenge.

Gap-K%: An Innovative Approach

A recent study published on arXiv introduces Gap-K%, a new method for detecting pretraining data that analyzes the optimization dynamics of LLM pretraining. The approach focuses on discrepancies between the model's top-1 prediction and the target token: such discrepancies generate strong gradient signals that are penalized during training, so they tend to be smaller on text the model has already seen.
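The gradient intuition follows from the cross-entropy loss: its gradient with respect to the logits is softmax(logits) minus the one-hot target, so a token where the top-1 prediction disagrees with the target (and the target probability is therefore low) contributes a gradient of magnitude close to one. The sketch below is a small numerical illustration of that point; the logit values and the function name are illustrative, not taken from the paper.

```python
# Illustration only: the gradient of cross-entropy w.r.t. the logits is
# softmax(logits) - one_hot(target), so a top-1 / target mismatch yields
# a much larger gradient than a correct top-1 prediction.
import numpy as np

def ce_grad_wrt_logits(logits: np.ndarray, target: int) -> np.ndarray:
    """Gradient of the cross-entropy loss with respect to the logits."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    grad = probs.copy()
    grad[target] -= 1.0  # softmax(logits) - one_hot(target)
    return grad

# Target token is index 0 in both cases (toy 3-token vocabulary).
agree = np.array([4.0, 1.0, 0.5])     # top-1 matches the target
disagree = np.array([0.5, 4.0, 1.0])  # top-1 differs from the target

print(np.linalg.norm(ce_grad_wrt_logits(agree, target=0)))     # ~0.09
print(np.linalg.norm(ce_grad_wrt_logits(disagree, target=0)))  # ~1.34
```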

Gap-K% measures the log-probability gap between the top-1 predicted token and the target token, and applies a sliding-window strategy to capture local correlations and damp token-level fluctuations. Experimental results on the WikiMIA and MIMIR benchmarks show that Gap-K% consistently outperforms prior methods across various model sizes and input lengths.
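As a concrete sketch, the scoring could look like the following, assuming: (1) the per-token gap is log p(top-1 token) minus log p(target token), (2) gaps are smoothed with a sliding-window mean, and (3) the final score averages the K% largest windowed gaps, with lower scores suggesting membership. The function name, window size, and aggregation details here are assumptions for illustration and may differ from the paper's exact formulation.

```python
# A minimal Gap-K%-style score (sketch under the assumptions stated above,
# not the paper's reference implementation).
import numpy as np

def gap_k_score(log_probs: np.ndarray, targets: np.ndarray,
                k: float = 0.2, window: int = 5) -> float:
    """log_probs: (seq_len, vocab) next-token log-probabilities from a causal LM.
    targets:   (seq_len,) ids of the actual next tokens."""
    # Per-token gap between the model's preferred token and the ground truth;
    # it is >= 0 and equals 0 whenever the top-1 prediction is correct.
    top1 = log_probs.max(axis=-1)
    target = log_probs[np.arange(len(targets)), targets]
    gaps = top1 - target

    # Sliding-window mean to capture local correlations and damp
    # token-level fluctuations.
    kernel = np.ones(window) / window
    smoothed = np.convolve(gaps, kernel, mode="valid")

    # Average the K% largest windowed gaps (assumed aggregation);
    # lower scores suggest the text was more likely seen in pretraining.
    n = max(1, int(k * len(smoothed)))
    return float(np.sort(smoothed)[-n:].mean())

# Usage: log_probs would come from a causal LM, e.g. the log-softmax of its
# logits over the input, aligned so that row t predicts token t+1.
# Random data is used here only to show the call signature.
rng = np.random.default_rng(0)
fake_log_probs = np.log(rng.dirichlet(np.ones(100), size=64))
fake_targets = rng.integers(0, 100, size=64)
print(gap_k_score(fake_log_probs, fake_targets))
```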