FlashLM v5: Language Model Trained on CPU Beats GPU Baseline
FlashLM v5, a 29.7-million-parameter language model, was trained on an AMD Ryzen 7950X3D CPU in roughly 40 hours. It reached a perplexity of 1.36, beating the TinyStories-1M baseline (PPL 1.59; lower is better). The ParallelGatedRecurrence architecture…
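For context on the headline metric: perplexity is simply the exponential of the mean per-token cross-entropy (negative log-likelihood in nats). A minimal sketch, where the loss value is hypothetical and chosen only to illustrate how a PPL of 1.36 arises:

```python
import math

def perplexity(mean_nll: float) -> float:
    # Perplexity = exp(mean per-token negative log-likelihood, in nats).
    return math.exp(mean_nll)

# Hypothetical mean NLL for illustration (not FlashLM's actual training log):
# exp(0.307) ≈ 1.36, matching the reported perplexity.
print(round(perplexity(0.307), 2))  # 1.36
```

A PPL of 1.36 means the model is, on average, about as uncertain as a uniform choice among 1.36 tokens per step.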