FlashLM v4: An Efficient Ternary Language Model
FlashLM v4 is a step forward in the development of small language models. With only 4.3 million parameters and ternary weights (-1, 0, +1), it was trained on a 2-thread CPU in just two hours, with no GPU involved.
The model generates coherent children's stories, complete with dialogue and narrative structure. This result comes from an optimized architecture and a targeted training dataset, TinyStories.
Technical Details
- Parameters: 4.3 million (ternary)
- Hardware: 2-thread CPU
- Training time: 2 hours
- Dataset: TinyStories
- Architecture: Gated conv + GLU (no attention)
- Vocabulary: 10K
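The gated-conv + GLU design listed above can be illustrated with a minimal sketch. This is not the FlashLM v4 implementation, only an assumed toy version of the idea: a causal 1D convolution produces a "value" path and a "gate" path, and the GLU multiplies the values by a sigmoid of the gates, so no attention is needed. The function names and kernel shapes here are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def causal_conv1d(seq, kernel):
    # Left-pad with zeros so each output position depends only on the
    # current and past inputs (causal convolution over one channel).
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(seq)
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(seq))]

def gated_conv_glu(seq, value_kernel, gate_kernel):
    # GLU: elementwise product of a linear "value" path and a
    # sigmoid "gate" path, both produced by causal convolutions.
    values = causal_conv1d(seq, value_kernel)
    gates = causal_conv1d(seq, gate_kernel)
    return [v * sigmoid(g) for v, g in zip(values, gates)]
```

With a kernel of all zeros on the gate path, every gate sits at sigmoid(0) = 0.5, so the output is simply half the value path.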
FlashLM v4 uses ternary quantization with a straight-through estimator: the forward pass uses weights snapped to {-1, 0, +1}, while gradients flow to the underlying full-precision weights as if the quantization step were the identity. During inference, the core operations reduce to additions, subtractions, and skipped zeros, with no multiplications.
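To make this concrete, here is a minimal stdlib-only sketch of ternary quantization and the resulting multiplication-free dot product. The threshold value and function names are assumptions for illustration, not FlashLM v4's actual code:

```python
def ternarize(w, threshold=0.05):
    """Snap a latent full-precision weight to -1, 0, or +1.

    During training, a straight-through estimator would pass the
    gradient through this step as if it were the identity function,
    so updates still reach the latent weight. (Threshold is an
    assumed hyperparameter for this sketch.)
    """
    if w > threshold:
        return 1
    if w < -threshold:
        return -1
    return 0

def ternary_matvec(weights, x):
    # With ternary weights, each dot product reduces to additions
    # (weight +1), subtractions (weight -1), and skips (weight 0);
    # no multiplications are needed at inference time.
    out = []
    for row in weights:
        acc = 0.0
        for w, v in zip(row, x):
            if w == 1:
                acc += v
            elif w == -1:
                acc -= v
        out.append(acc)
    return out
```

For example, a row of ternary weights [1, -1, 0] applied to the input [2.0, 3.0, 5.0] is computed as 2.0 - 3.0, skipping the zeroed weight entirely.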
Comparison with TinyStories-1M
FlashLM v4 was compared to TinyStories-1M, a similarly sized model trained on a GPU. Although FlashLM v4 still trails TinyStories-1M in BPC (bits-per-character), it has seen only a small fraction of that model's training data, which suggests it has room for improvement with more extensive training.
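As a reminder of what the BPC metric measures, it is the model's average cross-entropy over the text, normalized per character and expressed in bits. Assuming the loss is accumulated in nats (natural log), the conversion is a sketch like this:

```python
import math

def bits_per_character(total_nll_nats, num_characters):
    # Convert a summed negative log-likelihood in nats into bits
    # (divide by ln 2), then normalize by the character count.
    return total_nll_nats / math.log(2) / num_characters
```

A model that assigns each character probability 1/2 would score exactly 1.0 BPC; lower is better.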
Next Steps
The development team plans to train a larger version of FlashLM v4 on more powerful hardware, aiming to close the performance gap with TinyStories-1M. The training code will also be released so that anyone can reproduce the results on their own hardware.