NanoChat: An Economical LLM

Andrej Karpathy has presented NanoChat, a language model that reportedly outperforms GPT-2 and costs under $100 to train. Training ran on 8 H100 GPUs and took just three hours.
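The headline figures imply a simple budget check. Eight GPUs running for three hours is 24 GPU-hours, so staying under $100 means paying no more than about $4.17 per GPU-hour. The Python sketch below only does that arithmetic. The helper names are illustrative, and the sketch assumes the cost is pure GPU rental time, ignoring storage, networking, and any setup outside the three-hour run.

```python
# Back-of-the-envelope cost check using the figures reported in the article.
# The result is an implied ceiling on the hourly price, not a quoted rate.

NUM_GPUS = 8            # H100 GPUs used for training
WALL_CLOCK_HOURS = 3.0  # reported training time
BUDGET_USD = 100.0      # reported upper bound on cost


def gpu_hours(num_gpus: int, hours: float) -> float:
    """Total accelerator time consumed by the run."""
    return num_gpus * hours


def max_hourly_rate(budget: float, total_gpu_hours: float) -> float:
    """Highest per-GPU-hour price that keeps the run within budget.

    Assumption: the whole budget goes to GPU time.
    """
    return budget / total_gpu_hours


if __name__ == "__main__":
    total = gpu_hours(NUM_GPUS, WALL_CLOCK_HOURS)
    ceiling = max_hourly_rate(BUDGET_USD, total)
    print(f"GPU-hours: {total:.0f}")                     # → GPU-hours: 24
    print(f"Implied max rate: ${ceiling:.2f}/GPU-hour")  # → Implied max rate: $4.17/GPU-hour
```

Any real cloud quote above roughly $4.17 per H100-hour would push the same run past the $100 mark.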

Technical Details

Karpathy shared details of the model architecture, the optimizers, and the data setup, along with a script that reproduces the reported results. This lets other practitioners replicate the experiment and potentially build on the model.

Teams evaluating on-premise deployments face trade-offs that deserve careful consideration. AI-RADAR offers analytical frameworks at /llm-onpremise for evaluating them.