Unsloth has introduced support for finetuning embedding models. According to the announcement, finetuning is 1.8 to 3.3 times faster and uses 20% less VRAM compared to Flash Attention 2 (FA2) setups.

## Technical Details

Unsloth's new implementation handles larger contexts without sacrificing accuracy, and the speedup applies to full finetuning, 16-bit LoRA, and 4-bit QLoRA alike. 4-bit QLoRA runs in as little as 3 GB of VRAM, while 16-bit LoRA needs about 6 GB.

Finetuning an embedding model can improve retrieval and RAG by aligning vectors with your domain-specific notion of similarity, which in turn improves search, clustering, and recommendations on your own data.

## Supported Models

Unsloth natively supports several model families, including ModernBERT, Qwen Embedding, EmbeddingGemma, MiniLM-L6-v2, mpnet, and BGE; other models are supported automatically. After finetuning, you can deploy your finetuned model anywhere: transformers, LangChain, Ollama, vLLM, or llama.cpp.

## Getting Started

To get started, try the EmbeddingGemma notebook on a free Colab T4 instance. To pick up the latest updates, upgrade Unsloth via `pip install --upgrade unsloth unsloth_zoo`.
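
The notebook walks through the full workflow. As a rough illustration of what a domain-specific embedding finetune looks like, here is a minimal sketch using the standard sentence-transformers trainer rather than Unsloth's own API; the base model, output paths, and the tiny in-memory dataset are placeholders, not values from the announcement.

```python
# Sketch: contrastive finetuning of an embedding model with sentence-transformers.
# The dataset here is a made-up example of (anchor, positive) pairs from your domain.
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

# Base embedding model; any of the supported families could be used instead.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Each anchor should end up close to its positive in embedding space.
train_dataset = Dataset.from_dict({
    "anchor": [
        "How do I reset my API key?",
        "What is the rate limit for the search endpoint?",
    ],
    "positive": [
        "API keys can be regenerated from the account settings page.",
        "The search endpoint allows 100 requests per minute per key.",
    ],
})

# Multiple-negatives ranking loss treats other in-batch positives as negatives.
loss = losses.MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="finetuned-embeddings",
    num_train_epochs=1,
    per_device_train_batch_size=2,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
model.save_pretrained("finetuned-embeddings/final")
```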
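
After training, a quick sanity check is to embed a query and a few documents with the finetuned model and compare cosine similarities. The corpus and query below are again invented, and the load path assumes the sketch above.

```python
# Sketch: retrieval sanity check with the finetuned model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("finetuned-embeddings/final")

corpus = [
    "API keys can be regenerated from the account settings page.",
    "The search endpoint allows 100 requests per minute per key.",
]
query = "How can I get a new API key?"

# Normalized embeddings make the dot product equal to cosine similarity.
query_emb = model.encode([query], normalize_embeddings=True)
corpus_emb = model.encode(corpus, normalize_embeddings=True)
scores = (query_emb @ corpus_emb.T)[0]

best = scores.argmax()
print(f"Best match ({scores[best]:.3f}): {corpus[best]}")
```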