DID: A Novel Approach to Diffusion Language Models

Masked Diffusion Language Models (MDLMs) have shown promise, but the masking paradigm limits their computational efficiency and generation flexibility. A new study introduces Deletion-Insertion Diffusion (DID) models, which reformulate token deletion and insertion as discrete diffusion processes, replacing the masking and unmasking processes used in MDLMs.

Advantages of DID Models

DID models improve training and inference efficiency by eliminating two major sources of computational overhead in MDLMs: computation spent on non-informative masked tokens and computation spent on the extra tokens introduced to handle variable-length settings. DIDs also offer greater flexibility: they natively support variable-length sequences without fixed-length padding, and they integrate an intrinsic self-correction mechanism during generation, since insertions dynamically adjust token positions.
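To make the contrast with masking concrete, here is a toy sketch of a deletion-style forward process. This is a hypothetical illustration, not the paper's actual formulation: the function name, the per-token keep probability, and the Bernoulli deletion rule are all assumptions. The key point it shows is that the noised sequence shrinks rather than filling up with placeholder `[MASK]` tokens, so no compute is spent on non-informative positions.

```python
import random

def forward_delete(tokens, keep_prob):
    """Toy forward 'noising' step: independently delete each token.

    Hypothetical sketch of a deletion process replacing masking:
    surviving tokens stay in order, and the sequence simply gets
    shorter instead of accumulating [MASK] placeholders.
    """
    return [t for t in tokens if random.random() < keep_prob]

seq = ["the", "cat", "sat", "on", "the", "mat"]
noised = forward_delete(seq, keep_prob=0.5)
# The reverse (generative) process would insert tokens back between
# the survivors, so sequence length adjusts dynamically.
```

In this picture, the reverse process is insertion-based generation: each step proposes tokens to insert between existing ones, which is what allows lengths to vary and positions to shift during sampling.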

Implementation and Results

To train DID models, the authors design a score-based approach that assigns scores to token insertion operations and derives the corresponding training objectives. These objectives involve subsequence counting problems, which are solved with a parallelized dynamic programming algorithm. Experiments in both fixed- and variable-length settings demonstrate the advantage of DID models over MDLM baselines and existing insertion-based language models in modeling performance, sampling quality, and training/inference speed, without any hyperparameter tuning.
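The subsequence counting subproblem mentioned above is a classic dynamic program. The sketch below is the standard sequential recurrence for counting how many ways one sequence occurs as a subsequence of another; it is offered only to illustrate the kind of quantity involved, and does not reproduce the paper's parallelized algorithm or its exact objective.

```python
def count_subsequences(s, t):
    """Count the number of ways t occurs as a subsequence of s.

    Standard O(len(s) * len(t)) dynamic program. dp[j] holds the
    number of ways t[:j] appears as a subsequence of the prefix of s
    processed so far.
    """
    n = len(t)
    dp = [0] * (n + 1)
    dp[0] = 1  # the empty subsequence occurs exactly once
    for ch in s:
        # iterate j backwards so each character of s matches at most
        # one position of t per step
        for j in range(n, 0, -1):
            if ch == t[j - 1]:
                dp[j] += dp[j - 1]
    return dp[n]

# count_subsequences("rabbbit", "rabbit") -> 3
```

The recurrence is inherently sequential in this form; the paper's contribution, per the summary, is a parallelized variant that makes such counts cheap enough to evaluate inside a training objective.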