Topic / Trend Rising

Qwen LLM by Alibaba

Multiple articles highlight the advancements and applications of Alibaba's Qwen large language model, including its open-source nature, performance benchmarks, and team restructuring.

Detected: 2026-03-05 · Updated: 2026-03-05

Related Coverage

2026-03-05 LocalLLaMA

Alibaba: Qwen model to remain open-source

Alibaba's CEO has confirmed that the Qwen large language model (LLM) will continue to be developed and distributed under an open-source license. This strategic decision could foster the model's adoption in on-premise scenarios, offering greater flexibility...

#LLM On-Premise #DevOps
2026-03-04 LocalLLaMA

Update on the Qwen shakeup

An update on the internal reorganization of the team behind Qwen, the large language model developed by Alibaba. The news was shared via a post on X (formerly Twitter) and discussed on Reddit.

#LLM On-Premise #DevOps
2026-03-03 TechCrunch AI

Alibaba’s Qwen tech lead steps down after major AI push

Junyang Lin, tech lead of Alibaba's Qwen team, has stepped down following the launch of a major artificial intelligence model. The news has generated reactions within the team, raising questions about the future strategies of the Chinese giant in the...

#LLM On-Premise #DevOps
2026-03-02 LocalLLaMA

PSA: Qwen 3.5 Requires BF16 KV Cache, NOT F16

A warning for those running Qwen 3.5 locally with llama.cpp: the KV cache needs to be manually set to BF16 (bfloat16) instead of the default FP16 (float16). Perplexity tests on wikitext-2-raw confirm that official Qwen-team implementations, like vLLM...

#LLM On-Premise #Fine-Tuning #DevOps
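The fix described in the PSA can be applied on the llama.cpp command line via its KV-cache type flags; the GGUF filename and context size below are illustrative placeholders, not taken from the post:

```shell
# Run llama.cpp with the KV cache forced to BF16 instead of the default F16.
# Model filename and context size are placeholders, not from the post.
llama-server \
  -m ./Qwen3.5-Q4_K_M.gguf \
  -c 8192 \
  --cache-type-k bf16 \
  --cache-type-v bf16
```

The same `--cache-type-k` / `--cache-type-v` flags are accepted by `llama-cli`.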
2026-03-02 LocalLLaMA

Alibaba Releases CoPaw for Multi-Channel AI Workflows

Alibaba has released CoPaw, a high-performance personal workstation intended to help developers scale multi-channel artificial intelligence workflows. CoPaw is designed to improve memory management and make development processes more efficient.

#LLM On-Premise #DevOps
2026-03-02 LocalLLaMA

Qwen 3.5: new small version available

A new version of the Qwen 3.5 language model has been released. The 'small' version could enable more efficient deployments on hardware with limited resources, opening up new possibilities for on-premise and edge applications.

#LLM On-Premise #DevOps
2026-03-01 LocalLLaMA

Qwen3.5 Small Dense model release seems imminent?

Rumors on Reddit suggest the imminent release of Qwen3.5 Small Dense. The open-source community is eager to evaluate the performance and potential applications of this model.

#Hardware #LLM On-Premise #DevOps
2026-03-01 LocalLLaMA

Qwen 3.5 27B: Best Chinese Translation Model Under 70B

A LocalLLaMA user reports that Qwen 3.5 27B offers Chinese translations comparable to GPT-3.5 and Gemini, outperforming other models up to 70B. The model was tested on a local setup with 24GB of VRAM, highlighting excellent tone and consistency.

#LLM On-Premise #DevOps
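As a sanity check on the 24GB-of-VRAM setup reported above, here is a back-of-the-envelope sizing sketch; the formula and the flat 1.2x overhead factor for KV cache and activations are our assumptions, not figures from the post:

```python
# Rough VRAM estimate for running a dense LLM locally.
# Assumption (not from the article): weights dominate memory use, and
# KV cache plus activation overhead is approximated by a flat 1.2x factor.

def estimate_vram_gb(params_b: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Return an approximate VRAM requirement in GiB."""
    weight_bytes = params_b * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30

# A 27B model at 4-bit quantization:
print(round(estimate_vram_gb(27, 4), 1))  # → 15.1
```

Under these assumptions a 4-bit 27B model fits comfortably in 24 GB, consistent with the setup the user describes.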
2026-02-28 LocalLLaMA

Qwen 3.5-35B-A3B: a surprising model for development tasks

A Reddit user reports exceptional results with Qwen 3.5-35B-A3B, a model that has replaced GPT-OSS-120B in their daily workflow. The user employs it for development tasks, process automation, and code analysis, highlighting its ability to compensate ...

#Hardware #LLM On-Premise #DevOps
2026-02-27 LocalLLaMA

Qwen3.5: promising performance for real-world workloads

A user tested Qwen3.5-35B-A3B-UD-Q6_K_XL on real-world projects, finding positive results. Token generation speed is high, especially on a single GPU. The experience suggests a potential shift to a hybrid model, with API models for spec generation an...

#Hardware #LLM On-Premise #DevOps
2026-02-27 LocalLLaMA

Qwen3.5 27B vs Devstral Small 2: Benchmarks on Next.js and Solidity

A user compared the performance of Qwen3.5 27B and Devstral Small 2 in real-world development scenarios, focusing on Next.js and Solidity. The tests, performed on dedicated hardware, evaluated correctness, compatibility, and code discipline, highlighting...

#Hardware #LLM On-Premise #DevOps
2026-02-26 LocalLLaMA

Qwen3.5-27B-heretic: GGUF model available on Hugging Face

A version of the Qwen3.5-27B language model, named "heretic", has been made available in GGUF format on Hugging Face. The GGUF format enables efficient local inference with runtimes such as llama.cpp, making it suitable for running models on CPUs or on hardware with limited resources...

#Hardware #LLM On-Premise #DevOps
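A typical way to try a GGUF release like this one is to download it and run it with llama.cpp; the repository owner and exact filename below are placeholders, since the post does not specify them:

```shell
# Download a GGUF file from Hugging Face and run it locally with llama.cpp.
# <user> and the quantization filename are placeholders, not confirmed paths.
huggingface-cli download <user>/Qwen3.5-27B-heretic-GGUF \
  Qwen3.5-27B-heretic-Q4_K_M.gguf --local-dir ./models
llama-cli -m ./models/Qwen3.5-27B-heretic-Q4_K_M.gguf \
  -p "Hello" -n 64
```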