A developer recently shared results from FlashLM, an experiment with tiny language models designed to be trained and run entirely on CPU.

Model Details

The FlashLM v3-13m model has the following characteristics:

  • 13.6M parameters, with a hidden dimension (d_model) of 256.
  • Ternary weights ({-1, 0, +1}), so inference requires only additions and subtractions, with no multiplications.
  • Trained on a 2-thread CPU, with no GPU, in 1.2 hours.
  • Trained on 32M tokens from FineWeb-Edu.
  • Validation loss: 6.80.
  • Uses frozen GPT-2 embeddings (projected down via SVD), so no training time is spent learning an embedding table.
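The ternary-weight claim can be illustrated with a minimal sketch: when every weight is -1, 0, or +1, a matrix-vector product reduces to summing and subtracting selected activations. The function and shapes below are illustrative assumptions, not FlashLM's actual code.

```python
import numpy as np

def ternary_matvec(W, x):
    """Multiply-free matvec for W with entries in {-1, 0, +1}.

    W: (out_dim, in_dim) ternary matrix; x: (in_dim,) activations.
    Each output is a sum of positively-selected inputs minus a sum
    of negatively-selected inputs -- additions and subtractions only.
    """
    out = np.zeros(W.shape[0], dtype=x.dtype)
    for i in range(W.shape[0]):
        row = W[i]
        out[i] = x[row == 1].sum() - x[row == -1].sum()  # no multiplies
    return out

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))   # ternary weights
x = rng.standard_normal(8)

# Matches an ordinary dense matmul on the same ternary matrix.
assert np.allclose(ternary_matvec(W, x), W @ x)
```

In a real implementation the rows would be stored as bitmasks or sparse index lists rather than dense int arrays, but the arithmetic identity is the same.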
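The frozen-embedding trick can also be sketched. GPT-2's embedding table is 768-dimensional; a truncated SVD gives a 256-dimensional table that preserves most of its structure. The dimensions mirror the article, but the code below is a hypothetical illustration (with a small random stand-in for the real table), not the developer's pipeline.

```python
import numpy as np

vocab, d_src, d_model = 1_000, 768, 256  # tiny vocab here, just for speed
rng = np.random.default_rng(1)
E = rng.standard_normal((vocab, d_src))  # stand-in for GPT-2's embedding table

# Truncated SVD: keep the top d_model singular directions.
U, S, Vt = np.linalg.svd(E, full_matrices=False)
E_proj = U[:, :d_model] * S[:d_model]    # (vocab, 256) frozen embedding table

assert E_proj.shape == (vocab, d_model)
```

The projected table is then frozen, so gradient updates (and optimizer state) are needed only for the transformer core.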

Performance and Bottlenecks

The model produces grammatical English but lacks semantic coherence. The biggest surprise was that 86% of training time was spent on the output layer, which projects the 256-dimensional hidden state onto the 50,257-token vocabulary. This bottleneck left comparatively little compute for training the model core.

The developer is working on a next version (v4) that replaces the flat softmax with a hierarchical tree structure over the vocabulary to remove this bottleneck. If successful, this could allow 5-10x more effective training in the same wall-clock time.
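The cost argument behind a hierarchical (tree-structured) softmax can be sketched as follows: instead of scoring all V vocabulary entries per token, the model makes about log2(V) binary decisions along a root-to-leaf path. The specific tree FlashLM v4 will use is not described, so the numbers below only illustrate the general technique.

```python
import math

vocab, d_model = 50_257, 256

flat_cost = vocab * d_model                         # score every token
depth = math.ceil(math.log2(vocab))                 # ~16 binary decisions
tree_cost = depth * d_model                         # one d_model-dot per node

print(depth)                    # 16
print(flat_cost // tree_cost)   # ~3141x fewer weight ops per token
```

The theoretical speedup of the output layer is large; the overall training speedup is smaller because the rest of the forward/backward pass is unchanged, which is consistent with the 5-10x estimate.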

For teams evaluating on-premise deployments, AI-RADAR analyzes the trade-offs of optimizing models for CPUs versus GPUs in detail in its /llm-onpremise section.