Qwen3.5-27B: An Enhanced Local Model
A new version of the Qwen3.5-27B model has been released, the result of optimization work that includes the removal of censorship and improved context management. This version builds on Jackrong's fine-tune of the model on a specific dataset, further modified to reduce the Kullback-Leibler divergence, a measure of how much one probability distribution differs from another.
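To make the metric concrete, here is a minimal pure-Python sketch of the Kullback-Leibler divergence mentioned above; the distributions are illustrative placeholders, not outputs of the actual models:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i).

    Zero when the distributions are identical; grows as Q
    drifts away from P. Terms with p_i == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]          # reference distribution (illustrative)
q = [0.4, 0.4, 0.2]          # perturbed distribution (illustrative)

print(kl_divergence(p, p))   # identical distributions -> 0.0
print(kl_divergence(p, q))   # positive: q differs from p
```

In quantization work, a divergence like this is typically computed between the token probabilities of a reference model and those of the modified one; a lower value means the modified model's behavior stays closer to the original.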
Optimizations and Performance
The implemented changes aim to restore the functionality of the attn_v and ffn_gate_exps layers, which are crucial for managing context during conversations. The resulting model, quantized in Q4_K_M format, supports a 262K-token context window. On older hardware such as an RTX 3060 12 GB, however, throughput may be limited (roughly 4 tok/sec): the model is dense, with no Mixture-of-Experts (MoE) architecture, so every parameter must be read for each generated token.
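A back-of-envelope calculation shows why a dense 27B model struggles on a 12 GB card. All constants below are rough assumptions (effective Q4_K_M bitrate, RTX 3060 bandwidth, system-RAM bandwidth), and the result is an optimistic upper bound that ignores KV-cache traffic and scheduling overhead, so real-world figures like the ~4 tok/sec cited above land below it:

```python
# Back-of-envelope decode-speed estimate for a dense quantized model.
# Every constant here is an assumption, not a measured value.
PARAMS = 27e9            # parameter count of a 27B model
BITS_PER_WEIGHT = 4.8    # rough effective bitrate of Q4_K_M (assumption)
VRAM_GB = 12             # RTX 3060 memory
VRAM_BW_GBS = 360        # approx. RTX 3060 memory bandwidth, GB/s
RAM_BW_GBS = 50          # assumed dual-channel system-RAM bandwidth, GB/s

model_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9    # ~16.2 GB: exceeds VRAM
spill_gb = max(0.0, model_gb - VRAM_GB)          # weights offloaded to RAM

# Dense decoding is memory-bound: all weights are read once per token,
# so the slow system-RAM link dominates once weights spill out of VRAM.
seconds_per_token = (model_gb - spill_gb) / VRAM_BW_GBS + spill_gb / RAM_BW_GBS
print(f"model size ~ {model_gb:.1f} GB, spilled to RAM ~ {spill_gb:.1f} GB")
print(f"optimistic upper bound ~ {1 / seconds_per_token:.1f} tok/sec")
```

An MoE model of the same total size would read only the active experts per token, which is why the article singles out the dense architecture as the bottleneck.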
For those evaluating on-premise deployments, there are trade-offs between model size, accuracy, and hardware requirements. AI-RADAR offers analytical frameworks at /llm-onpremise for evaluating these aspects.