Unsloth has announced a new update for its Qwen3.5 models in GGUF (GPT-Generated Unified Format), aimed at further improving the ratio between model size and Kullback-Leibler divergence (KLD), a measure of how far the quantized model's probability distribution deviates from that of the original model.

Key Updates

  • Calibration Dataset: All GGUFs now use a new imatrix calibration dataset, which should lead to small improvements in performance in chat, coding, long context handling, and tool-calling scenarios.
  • KLD Reduction: The quantization method for Qwen3.5 Mixture of Experts (MoE) models has been further refined to directly reduce the maximum KLD. In particular, the UD-Q4_K_XL variant is 8% larger but reduces the maximum KLD by 51% compared to the version before March 5th.
  • Model Updates: The Qwen3.5-35B-A3B, 27B, and 122B-A10B models have been updated and made available for re-download. The 397B-A17B model will be updated shortly.
  • Inference: BF16 (BFloat16) layers have been replaced with F16 (Float16) to speed up inference on devices that lack native BF16 support.
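To make the KLD metric concrete: it compares, token by token, the next-token probability distribution of the quantized model against that of the original model. The following minimal sketch (not Unsloth's actual measurement pipeline; the distributions are illustrative) shows how the divergence for a single prediction would be computed:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary.

    p: next-token probabilities from the original (full-precision) model.
    q: next-token probabilities from the quantized model.
    eps guards against log(0) when q assigns zero probability.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions over a tiny 4-token vocabulary.
p_original = [0.70, 0.20, 0.08, 0.02]
q_quantized = [0.65, 0.24, 0.09, 0.02]

print(kl_divergence(p_original, q_quantized))  # small positive value
print(kl_divergence(p_original, p_original))   # 0.0 (identical distributions)
```

In practice this is averaged (or, as in the figures below, maximized) over many tokens of a calibration corpus; a lower value means the quantized model's behavior stays closer to the original.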

Quantization and Performance

The following table summarizes the changes in size and maximum KLD for the different quantization configurations:

Quant        Old size (GB)   New size (GB)   Old max KLD   New max KLD
UD-Q2_K_XL   12.0            11.3 (-6%)      8.237         8.155 (-1%)
UD-Q3_K_XL   16.1            15.5 (-4%)      5.505         5.146 (-6.5%)
UD-Q4_K_XL   19.2            20.7 (+8%)      5.894         2.877 (-51%)
UD-Q5_K_XL   23.2            24.6 (+6%)      5.536         3.210 (-42%)
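The percentage changes in the table follow directly from the raw values. A short sketch that recomputes them (the data is taken from the table; the code itself is illustrative):

```python
# Recompute the percentage deltas reported in the table above.
# Sizes are in GB; max-KLD values are dimensionless.
rows = {
    "UD-Q2_K_XL": {"gb": (12.0, 11.3), "kld": (8.237, 8.155)},
    "UD-Q3_K_XL": {"gb": (16.1, 15.5), "kld": (5.505, 5.146)},
    "UD-Q4_K_XL": {"gb": (19.2, 20.7), "kld": (5.894, 2.877)},
    "UD-Q5_K_XL": {"gb": (23.2, 24.6), "kld": (5.536, 3.210)},
}

def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100

for quant, vals in rows.items():
    d_gb = pct_change(*vals["gb"])
    d_kld = pct_change(*vals["kld"])
    print(f"{quant}: size {d_gb:+.1f}%, max KLD {d_kld:+.1f}%")
```

This makes the UD-Q4_K_XL trade-off explicit: roughly 8% more disk space in exchange for a 51% drop in worst-case divergence from the original model.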

These updates aim to make the Qwen3.5 models more efficient and performant, especially for local use. For teams evaluating on-premise deployments, the size-versus-accuracy trade-offs above are worth weighing carefully; AI-RADAR offers analytical frameworks on /llm-onpremise to support these evaluations.