FlashLM v4: An Efficient Ternary Language Model
FlashLM v4 is a step forward in the development of small language models. With only 4.3 million parameters and ternary weights (-1, 0, +1), it was trained on a 2-thread CPU in just two hours, with no GPU involved.
The model generates coherent children's stories, complete with dialogue and narrative structure. This result comes from an optimized architecture and a targeted training dataset, TinyStories.
Technical Details
- Parameters: 4.3 million (ternary)
- Hardware: 2-thread CPU
- Training time: 2 hours
- Dataset: TinyStories
- Architecture: Gated conv + GLU (no attention)
- Vocabulary: 10K
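The gated-conv + GLU design listed above can be illustrated with a minimal sketch. This is not the FlashLM v4 implementation, only an assumed toy version of the idea: a causal 1D convolution produces a "value" path and a "gate" path, and the GLU multiplies the values by a sigmoid of the gates, so no attention is needed. The function names and kernel shapes here are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def causal_conv1d(seq, kernel):
    # Left-pad with zeros so each output position depends only on the
    # current and past inputs (causal convolution over one channel).
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(seq)
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(seq))]

def gated_conv_glu(seq, value_kernel, gate_kernel):
    # GLU: elementwise product of a linear "value" path and a
    # sigmoid "gate" path, both produced by causal convolutions.
    values = causal_conv1d(seq, value_kernel)
    gates = causal_conv1d(seq, gate_kernel)
    return [v * sigmoid(g) for v, g in zip(values, gates)]
```

With a kernel of all zeros on the gate path, every gate sits at sigmoid(0) = 0.5, so the output is simply half the value path.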
FlashLM v4 uses ternary quantization with a straight-through estimator: the forward pass uses weights snapped to {-1, 0, +1}, while gradients flow to the underlying full-precision weights as if the quantization step were the identity. During inference, the core operations reduce to additions, subtractions, and skipped zeros, with no multiplications.
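To make this concrete, here is a minimal stdlib-only sketch of ternary quantization and the resulting multiplication-free dot product. The threshold value and function names are assumptions for illustration, not FlashLM v4's actual code:

```python
def ternarize(w, threshold=0.05):
    """Snap a latent full-precision weight to -1, 0, or +1.

    During training, a straight-through estimator would pass the
    gradient through this step as if it were the identity function,
    so updates still reach the latent weight. (Threshold is an
    assumed hyperparameter for this sketch.)
    """
    if w > threshold:
        return 1
    if w < -threshold:
        return -1
    return 0

def ternary_matvec(weights, x):
    # With ternary weights, each dot product reduces to additions
    # (weight +1), subtractions (weight -1), and skips (weight 0);
    # no multiplications are needed at inference time.
    out = []
    for row in weights:
        acc = 0.0
        for w, v in zip(row, x):
            if w == 1:
                acc += v
            elif w == -1:
                acc -= v
        out.append(acc)
    return out
```

For example, a row of ternary weights [1, -1, 0] applied to the input [2.0, 3.0, 5.0] is computed as 2.0 - 3.0, skipping the zeroed weight entirely.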
Comparison with TinyStories-1M
FlashLM v4 was compared to TinyStories-1M, a similarly sized model trained on a GPU. Although FlashLM v4 still trails TinyStories-1M in BPC (bits-per-character), it has seen only a small fraction of that model's training data, which suggests it has room for improvement with more extensive training.
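As a reminder of what the BPC metric measures, it is the model's average cross-entropy over the text, normalized per character and expressed in bits. Assuming the loss is accumulated in nats (natural log), the conversion is a sketch like this:

```python
import math

def bits_per_character(total_nll_nats, num_characters):
    # Convert a summed negative log-likelihood in nats into bits
    # (divide by ln 2), then normalize by the character count.
    return total_nll_nats / math.log(2) / num_characters
```

A model that assigns each character probability 1/2 would score exactly 1.0 BPC; lower is better.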
Next Steps
The development team plans to train a larger version of FlashLM v4 on more powerful hardware, aiming to close the performance gap with TinyStories-1M. The training code will also be released so that anyone can reproduce the results on their own hardware.