FlashLM v5 "Thunderbolt": CPU Training Beats GPU
FlashLM v5 "Thunderbolt" is a significant step forward in the FlashLM series, demonstrating that language model training can reach competitive results even on a CPU.
Results
The model achieved a final perplexity of 1.36 and a BPC (bits per character) of 0.44. Training was performed on an AMD Ryzen 7950X3D CPU in approximately 40 hours. The model has 29.7 million parameters, of which 26.5 million are ternary.
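The two headline metrics agree with each other under the usual conversion BPC = log2(perplexity). The sketch below checks that arithmetic and the share of ternary parameters. It assumes the reported perplexity is measured per character, which the article does not state explicitly; the function name is illustrative.

```python
import math

def bits_per_character(perplexity: float) -> float:
    """Convert a per-character perplexity into bits per character.

    Assumption: the perplexity is computed over the same unit
    (characters) that the BPC figure refers to.
    """
    return math.log2(perplexity)

# Figures reported in the article.
ppl = 1.36
bpc = bits_per_character(ppl)

# Share of parameters that are ternary: 26.5M out of 29.7M.
ternary_share = 26.5 / 29.7

print(f"BPC = {bpc:.2f}")
print(f"ternary share = {ternary_share:.1%}")
```

Under that assumption the conversion reproduces the reported 0.44 BPC, and roughly 89% of the parameters are ternary.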
Architecture
FlashLM v5 uses the ParallelGatedRecurrence architecture, characterized by:
- BitLinear with ternary weights {-1, 0, +1}
- Parallel gated recurrence with learned decay gates
- No matrix multiplications in the forward pass
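The three properties above can be illustrated with a minimal sketch. This is not FlashLM's code: the absmean quantization rule, the sigmoid decay gate, and all function names are assumptions chosen for illustration, since the article does not give the actual formulas. The recurrence is written as a sequential loop for clarity; because it is linear in the hidden state, the same values could in principle be computed with a parallel scan.

```python
import math

def ternarize(weights):
    """Quantize a weight matrix to {-1, 0, +1} plus one scale factor.

    Assumption: absmean scaling (divide by the mean absolute value,
    round, then clip to [-1, 1]).
    """
    flat = [abs(w) for row in weights for w in row]
    scale = sum(flat) / len(flat) or 1.0  # fall back to 1.0 for all-zero weights
    tern = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return tern, scale

def ternary_matvec(tern, scale, x):
    """Matrix-vector product using only additions and subtractions.

    +1 adds the input, -1 subtracts it, 0 skips it. The only
    multiplication is the single rescale per output element.
    """
    out = []
    for row in tern:
        acc = 0.0
        for t, xj in zip(row, x):
            if t == 1:
                acc += xj
            elif t == -1:
                acc -= xj
        out.append(acc * scale)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_recurrence(inputs, gate_logits, h0=0.0):
    """Scalar gated recurrence: h_t = a_t * h_{t-1} + (1 - a_t) * x_t.

    a_t = sigmoid(gate_logit_t) is the decay gate; in a trained model
    the logits would come from learned parameters (hypothetical here).
    """
    h = h0
    states = []
    for x, g in zip(inputs, gate_logits):
        a = sigmoid(g)
        h = a * h + (1.0 - a) * x
        states.append(h)
    return states

# Usage example with made-up numbers.
tern, scale = ternarize([[0.9, -0.05, -1.2], [0.1, 0.8, -0.7]])
y = ternary_matvec(tern, scale, [1.0, 2.0, 3.0])
states = gated_recurrence([1.0, 1.0], [0.0, 0.0])
```

A gate logit of 0 gives a decay of 0.5, so each step averages the previous state with the new input; larger logits make the state decay more slowly.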
Comparison with previous versions
v5 "Thunderbolt" shows a marked improvement over earlier versions (v4 "Bolt" and v5.2 "Nova-Ignition") in perplexity, BPC, and the quality of the generated output. In particular, v5 produces text with better narrative coherence, greater vocabulary diversity, and more accurate grammar.
Future directions
The FlashLM project will continue with the v6 series, focusing on validating the ParallelGatedRecurrence architecture. In addition, a new project (Nano-Coder) will be launched to apply FlashLM techniques to code generation.