Topic / Trend Rising

Qwen LLM Developments

The Qwen series of large language models is gaining traction, with new models and optimizations being released at a steady pace. The models are being applied to coding, translation, and reasoning tasks, and are drawing growing attention from the open-source community.

Detected: 2026-03-03 · Updated: 2026-03-03

Related Coverage

2026-03-02 • LocalLLaMA

PSA: Qwen 3.5 Requires BF16 KV Cache, NOT F16

A warning for those running Qwen 3.5 locally with llama.cpp: the KV cache needs to be manually set to BF16 (bfloat16) instead of the default FP16 (float16). Perplexity tests on wikitext-2-raw confirm that official Qwen-team implementations, like vLLM...

#LLM On-Premise #Fine-Tuning #DevOps
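The practical difference between the two cache types comes down to exponent range: FP16 has a 5-bit exponent and overflows above roughly 65504, while BF16 keeps FP32's 8-bit exponent and so preserves range at the cost of mantissa precision. Recent llama.cpp builds expose `--cache-type-k` / `--cache-type-v` flags for this (check your build's `--help` for whether `bf16` is an accepted value). A minimal sketch of the numeric difference, emulating BF16 by truncating the low bits of an FP32 value:

```python
import numpy as np

# FP16 has a 5-bit exponent: values above ~65504 overflow to infinity.
big_activation = np.float32(1.0e5)
as_fp16 = np.float16(big_activation)
print(as_fp16)  # inf: the value is unrepresentable in FP16

# BF16 keeps FP32's 8-bit exponent, trading mantissa bits for range.
# Emulate BF16 by zeroing the low 16 bits of the FP32 bit pattern.
def to_bf16(x: np.float32) -> np.float32:
    bits = np.float32(x).view(np.uint32)
    return np.uint32(bits & 0xFFFF0000).view(np.float32)

as_bf16 = to_bf16(big_activation)
print(as_bf16)  # ~1.0e5: range preserved, precision reduced
```

If the KV cache holds activations outside FP16's range, those entries silently become inf/NaN, which is consistent with a perplexity regression of the kind the post describes.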
2026-03-02 • LocalLLaMA

Qwen 3.5: new small version available

A new version of the Qwen 3.5 language model has been released. The 'small' version could enable more efficient deployments on hardware with limited resources, opening up new possibilities for on-premise and edge applications.

#LLM On-Premise #DevOps
2026-03-01 • LocalLLaMA

Qwen3.5 Small Dense model release seems imminent?

Rumors on Reddit point to an imminent release of Qwen3.5 Small Dense. The open-source community is eager to evaluate the model's performance and potential applications.

#Hardware #LLM On-Premise #DevOps
2026-03-01 • LocalLLaMA

Qwen 3.5 27B: Best Chinese Translation Model Under 70B

A LocalLLaMA user reports that Qwen 3.5 27B produces Chinese translations comparable to GPT-3.5 and Gemini, outperforming other models up to 70B. The model was tested on a local setup with 24GB of VRAM; the user highlights excellent tone and consistency.

#LLM On-Premise #DevOps
2026-02-28 • LocalLLaMA

Google: Longer Reasoning Chains Don't Imply Higher Accuracy in LLMs

New research from Google challenges the assumption that longer reasoning chains lead to better results in language models. The study introduces the concept of Deep Thinking Ratio (DTR) to measure reasoning quality, demonstrating that accurate token s...

#LLM On-Premise #DevOps
2026-02-28 • LocalLLaMA

DeepSeek V4: Image and Video Generation Capabilities Coming Next Week

According to the Financial Times, DeepSeek is preparing to release version 4 of its artificial intelligence model. The new version will include advanced image and video generation capabilities, positioning itself as a direct competitor to models deve...

#LLM On-Premise #DevOps
2026-02-28 • LocalLLaMA

Qwen 3.5-35B-A3B: a surprising model for development tasks

A Reddit user reports exceptional results with Qwen 3.5-35B-A3B, a model that has replaced GPT-OSS-120B in their daily workflow. The user employs it for development tasks, process automation, and code analysis, highlighting its ability to compensate ...

#Hardware #LLM On-Premise #DevOps
2026-02-27 • LocalLLaMA

Qwen3.5: promising performance for real-world workloads

A user tested Qwen3.5-35B-A3B-UD-Q6_K_XL on real-world projects, finding positive results. Token generation speed is high, especially on a single GPU. The experience suggests a potential shift to a hybrid model, with API models for spec generation an...

#Hardware #LLM On-Premise #DevOps
2026-02-27 • LocalLLaMA

PewDiePie fine-tuned Qwen2.5-Coder-32B to beat ChatGPT 4o on coding benchmarks

A user fine-tuned the Qwen2.5-Coder-32B model, achieving performance superior to ChatGPT 4o in coding benchmarks. The news, shared on Reddit, highlights the potential of open-source models when optimized for specific tasks. This demonstrates how acce...

#LLM On-Premise #Fine-Tuning #DevOps
2026-02-27 • LocalLLaMA

Qwen3.5 27B vs Devstral Small 2: Benchmarks on Next.js and Solidity

A user compared the performance of Qwen3.5 27B and Devstral Small 2 in real-world development scenarios, focusing on Next.js and Solidity. The tests, performed on dedicated hardware, evaluated correctness, compatibility, and code discipline, highligh...

#Hardware #LLM On-Premise #DevOps
2026-02-26 • LocalLLaMA

Qwen3.5-27B-heretic: GGUF model available on Hugging Face

A version of the Qwen3.5-27B language model, named "heretic", has been made available in GGUF format on Hugging Face. The GGUF format is designed for efficient CPU inference, making it suitable for running models locally or on hardware with limited r...

#Hardware #LLM On-Premise #DevOps
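The GGUF container mentioned above is a simple binary format: per the GGUF specification, a file begins with the 4-byte magic `GGUF`, a little-endian uint32 format version, then uint64 tensor and metadata-key counts. A small sketch (independent of the "heretic" model itself) that validates this fixed header, demonstrated on synthetic bytes so no model file is needed:

```python
import io
import struct

def read_gguf_header(stream):
    """Read the fixed-size GGUF header: magic, version, tensor and KV counts.

    Per the GGUF spec, the file starts with the 4-byte magic b"GGUF",
    a uint32 format version, then two uint64 counts, all little-endian.
    """
    magic = stream.read(4)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    version, tensor_count, kv_count = struct.unpack("<IQQ", stream.read(20))
    return {"version": version, "tensors": tensor_count, "metadata_kv": kv_count}

# Demo on a synthetic header (the counts here are made up for illustration).
fake = b"GGUF" + struct.pack("<IQQ", 3, 291, 24)
print(read_gguf_header(io.BytesIO(fake)))
# {'version': 3, 'tensors': 291, 'metadata_kv': 24}
```

This kind of quick check is handy for verifying that a Hugging Face download is an intact GGUF file before handing it to an inference runtime.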
2026-02-26 • LocalLLaMA

Qwen3.5-35B-A3B: promising developments for language models

The open-source community reports significant progress with the Qwen3.5-35B-A3B language model. In particular, there is discussion of a framework for semantic testing of SQL queries. Expectations remain high for a smaller version, Qwen3.5-4B.

#LLM On-Premise #DevOps
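The post gives no details about the SQL semantic-testing framework it mentions, but the general idea can be sketched: instead of comparing query text, run a candidate query and a reference query against a small fixture database and compare result sets. The schema, fixture rows, and queries below are illustrative assumptions, not taken from the discussion:

```python
import sqlite3

def semantically_equivalent(candidate_sql, reference_sql, fixture_rows):
    """Return True if two queries yield the same multiset of rows on a
    fixture database (order-insensitive comparison).

    Generic sketch of 'semantic testing' for SQL; schema and data are
    illustrative, not from the post.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", fixture_rows)
    got = sorted(conn.execute(candidate_sql).fetchall())
    want = sorted(conn.execute(reference_sql).fetchall())
    conn.close()
    return got == want

rows = [(1, "ada", 10.0), (2, "ada", 5.0), (3, "bob", 7.5)]
# Two syntactically different queries with the same meaning:
print(semantically_equivalent(
    "SELECT customer, SUM(total) FROM orders GROUP BY customer",
    "SELECT customer, SUM(total) FROM orders GROUP BY customer HAVING 1=1",
    rows,
))  # True
```

Comparing results on fixtures is how one would grade LLM-generated SQL without requiring an exact textual match against a reference answer.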
2026-02-26 • LocalLLaMA

Qwen 3.5 35B MoE: 40+ tokens/s on RTX 5060 Ti with 100k context

Performance tests of the Qwen 3.5 35B MoE language model on an RTX 5060 Ti 16GB show generation speeds exceeding 40 tokens per second with a 100,000-token context, opening possibilities for LLM inference on consumer hardware. Tests were perf...

#Hardware #LLM On-Premise #DevOps
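A 100k-token context on a 16GB card is mostly a KV cache budgeting question. The standard estimate is 2 (K and V) × layers × KV heads × head dim × context length × bytes per element. The source does not state the model's actual configuration, so the numbers below are an assumed illustrative config, not the real Qwen 3.5 35B parameters:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Estimate KV cache size: one K and one V vector per layer, per position.

    kv = 2 * layers * kv_heads * head_dim * context * element_size
    """
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Assumed config for illustration (NOT the published Qwen 3.5 35B numbers):
# 48 layers, 4 grouped-query KV heads of dim 128, 100k context, 16-bit cache.
gib = kv_cache_bytes(48, 4, 128, 100_000) / 2**30
print(f"{gib:.1f} GiB")  # about 9.2 GiB for this assumed config
```

Grouped-query attention (few KV heads) and quantized cache types (e.g. an 8-bit cache halves `bytes_per_elem`) are what make contexts this long plausible next to the model weights on a 16GB GPU.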
2026-02-24 • LocalLLaMA

Qwen/Qwen3.5-122B-A10B: Open Source Language Model on Hugging Face

The Qwen3.5-122B-A10B language model is now available on Hugging Face. This open-source release offers new opportunities for research and development of artificial intelligence applications, enabling greater control and customization compared to prop...

#Hardware #LLM On-Premise #DevOps
2026-02-24 • LocalLLaMA

New Qwen3.5 models spotted on Qwen Chat

New Qwen3.5 models have been spotted on the Qwen Chat platform. The discovery was reported on Reddit, sparking discussions within the LocalLLaMA community regarding the implications and potential applications of these updated models.

โ† Back to All Topics