An engineer has created Mini-LLM, a complete transformer language model implemented from scratch.
Key Features
Mini-LLM implements the same components as Llama 3:
- RoPE (Rotary Position Embeddings) to encode relative positions and extrapolate to longer sequences.
- RMSNorm, which is faster and more stable than LayerNorm.
- SwiGLU, the gated feed-forward activation used in modern LLMs such as Llama.
- Grouped Query Attention for efficient inference.
- SentencePiece BPE for tokenization with a 32K vocabulary.
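Of these components, RoPE is the least intuitive. A minimal NumPy sketch of the rotation it applies, using the half-split channel layout common in Llama-style implementations (the `rope` function and its signature are illustrative, not Mini-LLM's actual API):

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply Rotary Position Embeddings to x of shape (seq_len, head_dim).

    Channel pairs (i, i + head_dim/2) are rotated by an angle that grows
    with position, so attention scores depend on relative offsets.
    """
    seq_len, head_dim = x.shape
    half = head_dim // 2
    # One frequency per channel pair, geometrically spaced as in RoPE.
    freqs = base ** (-np.arange(half) / half)            # (half,)
    angles = np.arange(seq_len)[:, None] * freqs[None]   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2-D rotation of each channel pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=-1)
```

Because each pair is only rotated, vector norms are preserved, and position 0 is left unchanged.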
Complete Pipeline
The project includes a complete pipeline:
- Custom tokenization, data processing, training, and inference.
- Memory-mapped data loading (TB-scale ready).
- Mixed precision training with gradient accumulation.
- KV caching for fast generation.
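The memory-mapped loading step can be sketched with NumPy's `np.memmap`, which reads pages from disk on demand so the token file can be far larger than RAM. The `uint16` binary file of token ids and the `get_batch` helper below are assumptions for illustration, not Mini-LLM's actual data format:

```python
import numpy as np

def get_batch(path, batch_size, seq_len, rng):
    """Sample a batch of (input, target) token windows from a binary file.

    The file is assumed to hold a flat stream of uint16 token ids;
    np.memmap maps it lazily instead of loading it into memory.
    """
    data = np.memmap(path, dtype=np.uint16, mode="r")
    # Random window starts; leave room for the shifted target sequence.
    starts = rng.integers(0, len(data) - seq_len - 1, size=batch_size)
    x = np.stack([data[s : s + seq_len] for s in starts])
    y = np.stack([data[s + 1 : s + 1 + seq_len] for s in starts])  # next-token targets
    return x.astype(np.int64), y.astype(np.int64)
```

Each call touches only the sampled windows, which is what makes the approach "TB-scale ready".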
Results
- 80 million parameters trained on 361 million tokens.
- 5 hours on a single A100, final loss of approximately 3.25.
- Generates coherent text with correct grammar.
- Inference speed between 200 and 500 tokens per second.
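For context, a cross-entropy loss of about 3.25 nats per token corresponds to a perplexity of exp(3.25), i.e. the model is roughly as uncertain as a uniform choice among ~26 tokens at each step:

```python
import math

# Perplexity is the exponential of the per-token cross-entropy loss (in nats).
perplexity = math.exp(3.25)
print(round(perplexity, 1))  # ≈ 25.8
```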
The code is clean, well-documented, and designed for learning. Each component has detailed explanations of the "why" and not just the "how".