Undergraduate student launches Dhi-5B, an LLM trained from scratch on a budget

Dhi-5B: A 5 Billion Parameter LLM Developed with Limited Resources

An undergraduate student has announced the release of Dhi-5B, a 5-billion-parameter multimodal large language model (LLM). What sets the project apart is its very low training budget: approximately $1,200.

The model was developed using a custom codebase and state-of-the-art training methodologies. The training process was divided into five main stages:

Pre-Training: The most computationally intensive phase, dedicated to building the core of the model.
Context-Length-Extension: The model learns to handle 16,000-token contexts, up from the 4,096-token context used during pre-training.
Mid-Training: Annealing on very high quality datasets.
Supervised-Fine-Tuning: The model is fine-tuned to handle conversations.
Vision-Extension: The model acquires the ability to process visual information.

The model will be released in three phases: Dhi-5B-Base (already available), Dhi-5B-Instruct (coming soon), and the full Dhi-5B version (coming soon).

The base version of the model has 4 billion parameters and was trained on 40 billion natural language tokens, mainly in English, from the FineWeb-Edu dataset. The Muon optimizer was used for the matrix layers, while the remaining parameters were optimized with AdamW. The architecture has 32 layers, a width of 3072, SwiGLU MLPs, full multi-head attention (MHA) with FlashAttention-3, a context length of 4096, and a vocabulary of 64,000 tokens. A batch size of 2 million was used during training.
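As a rough sanity check on these figures, the sketch below estimates the parameter count from the stated architecture and the number of optimizer steps implied by the token budget. Several values are assumptions that do not appear in the announcement: the MLP hidden size (`d_ff = 8192`), untied input and output embeddings, and the reading of the 2 million batch size as 2,000,000 tokens. Normalization weights and biases are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    n_layers: int = 32             # from the article
    d_model: int = 3072            # "width" in the article
    vocab_size: int = 64_000       # from the article
    context_len: int = 4096        # from the article
    d_ff: int = 8192               # ASSUMPTION: 8/3 * d_model, a common SwiGLU choice
    tied_embeddings: bool = False  # ASSUMPTION: separate input and output embeddings


def count_params(cfg: Config) -> dict:
    """Approximate weight count, ignoring norms and biases."""
    attn = 4 * cfg.d_model * cfg.d_model  # Q, K, V and output projections (full MHA)
    mlp = 3 * cfg.d_model * cfg.d_ff      # SwiGLU: gate, up and down projections
    blocks = cfg.n_layers * (attn + mlp)
    embed = cfg.vocab_size * cfg.d_model
    embeddings = embed if cfg.tied_embeddings else 2 * embed
    return {"blocks": blocks, "embeddings": embeddings, "total": blocks + embeddings}


def training_steps(total_tokens: int, batch_tokens: int) -> int:
    """Number of optimizer steps needed to consume the token budget."""
    return total_tokens // batch_tokens


cfg = Config()
params = count_params(cfg)

TOTAL_TOKENS = 40_000_000_000  # from the article
BATCH_TOKENS = 2_000_000       # ASSUMPTION: the batch size is measured in tokens

steps = training_steps(TOTAL_TOKENS, BATCH_TOKENS)
seqs_per_batch = BATCH_TOKENS // cfg.context_len  # full-length sequences per batch

print(f"transformer blocks: {params['blocks'] / 1e9:.2f}B parameters")
print(f"embeddings:         {params['embeddings'] / 1e9:.2f}B parameters")
print(f"total:              {params['total'] / 1e9:.2f}B parameters")
print(f"optimizer steps:    {steps}")
print(f"sequences per step: {seqs_per_batch}")
```

Under these assumptions the estimate comes to about 4.02 billion parameters, in line with the stated 4 billion, and to 20,000 optimizer steps of roughly 488 full-length sequences each. A different `d_ff` or tied embeddings would shift the total accordingly.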

For those evaluating on-premise deployments, there are trade-offs to consider. AI-RADAR offers analytical frameworks on /llm-onpremise to evaluate these options.
