
Kimi-Linear-48B-A3B & Step3.5-Flash are ready - llama.cpp

Published on 2026-02-07 08:11. Source: LocalLLaMA

Kimi-Linear-48B-A3B and Step3.5-Flash available for llama.cpp

llama.cpp releases that support Kimi-Linear-48B-A3B and Step3.5-Flash are now available.

Details

Step3.5-Flash: supported as of llama.cpp release b7964.
Kimi-Linear-48B-A3B: supported as of llama.cpp release b7957.
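
To check whether a local llama.cpp build is recent enough for either model, the two release tags above can be compared numerically. The following is a minimal sketch, not part of llama.cpp: the helper names `build_number` and `supports` are hypothetical, and it assumes that release tags have the form `b` followed by an increasing build number and that support stays in place in later releases.

```python
import re

# Minimum llama.cpp release tags quoted in the article.
MIN_RELEASE = {
    "Step3.5-Flash": "b7964",
    "Kimi-Linear-48B-A3B": "b7957",
}


def build_number(tag: str) -> int:
    """Parse a release tag such as 'b7964' into the integer 7964.

    Assumption: tags are 'b' followed by digits only.
    """
    match = re.fullmatch(r"b(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized release tag: {tag!r}")
    return int(match.group(1))


def supports(model: str, local_tag: str) -> bool:
    """Return True if local_tag is at or after the release listed for model.

    Assumption: support added in a release is kept in all later releases.
    """
    return build_number(local_tag) >= build_number(MIN_RELEASE[model])


if __name__ == "__main__":
    # Hypothetical local build tag, chosen to fall between the two releases.
    local = "b7960"
    for model in MIN_RELEASE:
        print(f"{model}: supported by {local}? {supports(model, local)}")
```

With the hypothetical tag `b7960`, the check reports Kimi-Linear-48B-A3B as supported and Step3.5-Flash as not yet supported, since b7960 falls between the two releases.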

Official GGUF files for these models are not yet available on Hugging Face, but the community is working on them.

Ubergarm has already released a GGUF version of Step-3.5-Flash on Hugging Face.

With llama.cpp support in place, both models can be run for inference on local hardware, which widens the options for anyone who wants to run large language models (LLMs) on-premise.

AI-Radar Takeaway

llama.cpp now supports Kimi-Linear-48B-A3B (release b7957) and Step3.5-Flash (release b7964). Official GGUF files are not yet available, but community conversions are already appearing. Both models add to the options for local inference.


