NanoLLama is a framework for training models based on the Llama 3 architecture from scratch. Unlike fine-tuning or LoRA-based approaches, NanoLLama performs complete pre-training and produces a GGUF file compatible with llama.cpp.

Key Features

  • Simplified Training: The entire training process, from data download to GGUF export, is executed with a single command.
  • Llama 3 Architecture: Supports the full Llama 3 architecture, with configurations ranging from 46 million to 7 billion parameters.
  • Multi-corpus Training: Uses a multi-corpus training approach, based on the SmolLM2 recipe, including FineWeb-Edu, DCLM, code, and mathematics.
  • Native GGUF Export: Exports directly to GGUF v3 format, without the need for conversions via HuggingFace or safetensors.
  • Personality Injection: Allows training a base model and a model with personality, then subtracting the weights to obtain a portable personality vector.
  • Go Inference Engine: Includes a standalone inference engine written in Go (a binary of roughly 9 MB) that reads GGUF files directly, useful when the full llama.cpp stack is not needed.

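The personality-injection feature rests on simple element-wise weight arithmetic: subtracting a base model's weights from a personality-trained model's weights yields a delta that can be re-applied to a compatible base. NanoLLama's actual implementation is not shown in this document; the function names below are illustrative, and real tensors would be iterated per-layer rather than as one flat slice:

```go
package main

import "fmt"

// subtract returns delta[i] = tuned[i] - base[i]; the resulting
// "personality vector" is portable across compatible checkpoints.
func subtract(tuned, base []float32) []float32 {
	delta := make([]float32, len(base))
	for i := range base {
		delta[i] = tuned[i] - base[i]
	}
	return delta
}

// apply adds a personality vector back onto a base model's weights.
func apply(base, delta []float32) []float32 {
	out := make([]float32, len(base))
	for i := range base {
		out[i] = base[i] + delta[i]
	}
	return out
}

func main() {
	base := []float32{1.0, -0.5, 0.25}  // one base-model tensor (toy values)
	tuned := []float32{1.5, -0.25, 0.5} // same tensor after personality training

	delta := subtract(tuned, base) // portable personality vector
	fmt.Println(apply(base, delta)) // prints [1.5 -0.25 0.5]
}
```

Applying the delta to the original base recovers the personality-trained weights exactly; applying it to a different compatible base transfers the personality.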
Pre-trained Models

Several models have already been trained and verified, including nano (46M), micro (87M), mini (175M), and small (338M). Training is underway for goldie (1.1B), a multilingual model.