NanoLLama is a framework for training models based on the Llama 3 architecture from scratch. Unlike fine-tuning or LoRA-based approaches, NanoLLama performs complete pre-training and produces a GGUF file compatible with llama.cpp.

Key Features

  • Simplified Training: The entire training process, from data download to GGUF export, is executed with a single command.
  • Llama 3 Architecture: Supports the full Llama 3 architecture, with configurations ranging from 46 million to 7 billion parameters.
  • Multi-corpus Training: Uses a multi-corpus training approach, based on the SmolLM2 recipe, including FineWeb-Edu, DCLM, code, and mathematics.
  • Native GGUF Export: Exports directly to GGUF v3 format, without the need for conversions via HuggingFace or safetensors.
  • Personality Injection: Allows training a base model and a model with personality, then subtracting the weights to obtain a portable personality vector.
  • Go Inference Engine: Includes a standalone inference engine written in Go (a binary of roughly 9 MB) that reads GGUF files directly, useful when the full llama.cpp stack is not needed.

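The personality-injection feature rests on simple element-wise weight arithmetic: subtracting a base model's weights from a personality-trained model's weights yields a delta that can be re-applied to a compatible base. NanoLLama's actual implementation is not shown in this document; the function names below are illustrative, and real tensors would be iterated per-layer rather than as one flat slice:

```go
package main

import "fmt"

// subtract returns delta[i] = tuned[i] - base[i]; the resulting
// "personality vector" is portable across compatible checkpoints.
func subtract(tuned, base []float32) []float32 {
	delta := make([]float32, len(base))
	for i := range base {
		delta[i] = tuned[i] - base[i]
	}
	return delta
}

// apply adds a personality vector back onto a base model's weights.
func apply(base, delta []float32) []float32 {
	out := make([]float32, len(base))
	for i := range base {
		out[i] = base[i] + delta[i]
	}
	return out
}

func main() {
	base := []float32{1.0, -0.5, 0.25}  // one base-model tensor (toy values)
	tuned := []float32{1.5, -0.25, 0.5} // same tensor after personality training

	delta := subtract(tuned, base) // portable personality vector
	fmt.Println(apply(base, delta)) // prints [1.5 -0.25 0.5]
}
```

Applying the delta to the original base recovers the personality-trained weights exactly; applying it to a different compatible base transfers the personality.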
Pre-trained Models

Several models have already been trained and verified, including nano (46M), micro (87M), mini (175M), and small (338M). Training is underway for goldie (1.1B), a multilingual model.