Sweep AI has announced Sweep, an open-source model with 1.5 billion parameters designed for intelligent code autocompletion.

Key Features

Sweep predicts the next code change from a developer's recent edits, which gives it broader context than traditional autocompletion systems. The model is small enough to run locally and reportedly outperforms models four times its size in both speed and accuracy.
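
To put "small enough to run locally" in perspective, here is a rough back-of-the-envelope estimate of the memory needed just to hold 1.5 billion weights at a few common numeric precisions. The precisions listed are illustrative assumptions (the announcement does not say which format the weights ship in), and the estimate ignores activations, caches, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 1.5e9  # parameter count stated in the announcement

# Bytes per parameter for each precision (assumed, for illustration only).
PRECISIONS = [("float32", 4), ("float16", 2), ("int8", 1), ("4-bit", 0.5)]

for name, bytes_per_param in PRECISIONS:
    print(f"{name}: {weight_memory_gb(PARAMS, bytes_per_param):.2f} GB")
```

Even at 16-bit precision the weights come to about 3 GB, which fits in the memory of many recent laptops.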

Technical Details

Training had two phases: supervised fine-tuning (SFT) on approximately 100,000 examples from permissively licensed repositories, followed by reinforcement learning (RL) to refine the outputs and correct errors. Sweep AI also found that prompt format significantly affects the model's performance, with simple formats outperforming more complex ones.
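
As an illustration of what a "simple" prompt built from recent edits could look like, the sketch below renders each edit as a unified diff and follows it with the current file contents. This is not Sweep's actual prompt format, which the announcement does not specify. The section labels and the `edit_to_diff` and `build_prompt` helpers are hypothetical names chosen for this example.

```python
import difflib


def edit_to_diff(path: str, before: str, after: str) -> str:
    """Render one edit as a unified diff (an assumed representation)."""
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=path,
        tofile=path,
    )
    return "".join(diff)


def build_prompt(recent_edits, path: str, current_code: str) -> str:
    """Assemble a plain-text prompt: recent edits first, then the current file.

    recent_edits is a list of (path, before, after) tuples, oldest first.
    The layout is hypothetical and only meant to show a flat, simple format.
    """
    sections = ["Recent edits:"]
    for edit_path, before, after in recent_edits:
        sections.append(edit_to_diff(edit_path, before, after))
    sections.append(f"Current file ({path}):")
    sections.append(current_code)
    sections.append("Next edit:")
    return "\n".join(sections)


# Toy example: the developer just added a third parameter to add().
edits = [(
    "utils.py",
    "def add(a, b):\n    return a + b\n",
    "def add(a, b, c):\n    return a + b + c\n",
)]
prompt = build_prompt(edits, "main.py", "total = add(1, 2)\n")
print(prompt)
```

Given a prompt like this, a next-edit model would be expected to propose updating the call in `main.py` to match the new signature.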

Benchmarks

Sweep has been tested against other code autocompletion tools and reportedly shows high accuracy, which translates into better real-world usability. The model weights are publicly available, so anyone can build fast, privacy-preserving autocompletion for code editors such as VS Code and Neovim.
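
The announcement does not say how accuracy was measured. One common way to score autocompletion is exact match, where a prediction counts only if it is identical to the expected completion. The sketch below shows that metric on made-up data. The `exact_match_accuracy` helper and the sample strings are assumptions for illustration, not Sweep AI's benchmark.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions identical to their reference.

    Surrounding whitespace is ignored; everything else must match exactly.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(references)


# Made-up completions, purely to show how the metric behaves.
predicted = ["return a + b + c", "total = add(1, 2, 3) ", "print(total)"]
expected = ["return a + b + c", "total = add(1, 2, 3)", "print(result)"]
score = exact_match_accuracy(predicted, expected)
print(f"exact match: {score:.2f}")
```

Exact match is strict by design: a suggestion that is nearly right still forces the developer to stop and fix it, so it counts as a miss.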

General Context

Code autocompletion is an essential feature of modern development environments, helping developers write code more quickly and efficiently. Machine learning models such as Sweep are pushing the field forward with more accurate, context-aware predictions.