Qwen3.5-27B: An Enhanced Local Model
A new version of the Qwen3.5-27B model has been released, the result of optimization work that includes the removal of censorship and improved context management. This version builds on Jackrong's fine-tune of the model on a specific dataset, further modified to reduce the Kullback-Leibler divergence, a measure of how much one probability distribution differs from another.
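To make the metric concrete, here is a minimal pure-Python sketch of the Kullback-Leibler divergence mentioned above; the distributions are illustrative placeholders, not outputs of the actual models:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i).

    Zero when the distributions are identical; grows as Q
    drifts away from P. Terms with p_i == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]          # reference distribution (illustrative)
q = [0.4, 0.4, 0.2]          # perturbed distribution (illustrative)

print(kl_divergence(p, p))   # identical distributions -> 0.0
print(kl_divergence(p, q))   # positive: q differs from p
```

In quantization work, a divergence like this is typically computed between the token probabilities of a reference model and those of the modified one; a lower value means the modified model's behavior stays closer to the original.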
Optimizations and Performance
The implemented changes aim to restore the functionality of the attn_v and ffn_gate_exps layers, which are crucial for managing context during conversations. The resulting model, quantized in Q4_K_M format, supports a 262K-token context window. On older hardware such as an RTX 3060 12 GB, however, throughput may be limited (roughly 4 tok/sec): the model is dense, with no Mixture-of-Experts (MoE) architecture, so every parameter must be read for each generated token.
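A back-of-envelope calculation shows why a dense 27B model struggles on a 12 GB card. All constants below are rough assumptions (effective Q4_K_M bitrate, RTX 3060 bandwidth, system-RAM bandwidth), and the result is an optimistic upper bound that ignores KV-cache traffic and scheduling overhead, so real-world figures like the ~4 tok/sec cited above land below it:

```python
# Back-of-envelope decode-speed estimate for a dense quantized model.
# Every constant here is an assumption, not a measured value.
PARAMS = 27e9            # parameter count of a 27B model
BITS_PER_WEIGHT = 4.8    # rough effective bitrate of Q4_K_M (assumption)
VRAM_GB = 12             # RTX 3060 memory
VRAM_BW_GBS = 360        # approx. RTX 3060 memory bandwidth, GB/s
RAM_BW_GBS = 50          # assumed dual-channel system-RAM bandwidth, GB/s

model_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9    # ~16.2 GB: exceeds VRAM
spill_gb = max(0.0, model_gb - VRAM_GB)          # weights offloaded to RAM

# Dense decoding is memory-bound: all weights are read once per token,
# so the slow system-RAM link dominates once weights spill out of VRAM.
seconds_per_token = (model_gb - spill_gb) / VRAM_BW_GBS + spill_gb / RAM_BW_GBS
print(f"model size ~ {model_gb:.1f} GB, spilled to RAM ~ {spill_gb:.1f} GB")
print(f"optimistic upper bound ~ {1 / seconds_per_token:.1f} tok/sec")
```

An MoE model of the same total size would read only the active experts per token, which is why the article singles out the dense architecture as the bottleneck.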
For those evaluating on-premise deployments, there are trade-offs between model size, accuracy, and hardware requirements. AI-RADAR offers analytical frameworks at /llm-onpremise for evaluating these aspects.