A Reddit user shared their experience with the Qwen3.5-35B-A3B-UD-Q6_K_XL model, expressing enthusiasm for its performance in real-world usage scenarios.

Performance and speed

In the benchmarks reported, the model reached about 1504 tokens per second of prompt processing on a 2048-token prompt (pp2048) and 47.71 tokens per second of generation over 256 tokens (tg256). Generation speed was higher still when the model ran entirely on a single GPU, reaching 80 tokens per second.
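As a rough way to reproduce such numbers, here is a minimal timing sketch using the llama-cpp-python bindings; the GGUF filename and prompt are hypothetical, and the post does not say which tool produced the figures above:

```python
# A minimal timing sketch, assuming the llama-cpp-python bindings are
# installed; the GGUF filename and prompt below are hypothetical.
import time

from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3.5-35B-A3B-UD-Q6_K_XL.gguf",  # hypothetical local path
    n_gpu_layers=-1,  # offload every layer to the GPU if VRAM allows
    n_ctx=2048,
    verbose=False,
)

start = time.perf_counter()
out = llm("Summarize what git worktrees are for.", max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tok/s")
```

For figures directly comparable to pp2048/tg256 results, llama.cpp's llama-bench utility is the more standard tool.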

Testing on real projects

The user tested the model on several real projects, using Git worktrees to try out specific changes and features in isolated checkouts. The results were positive, with most issues solvable with minimal tweaks or one or two additional prompts.
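The post does not include the exact commands, but a typical worktree setup for this kind of experiment looks like the following sketch; the repository path, branch, and directory names are all hypothetical:

```python
# A rough sketch of an isolated worktree per experiment, assuming a repo
# already cloned locally; all paths and names here are hypothetical.
import subprocess

repo = "/home/user/projects/myapp"  # hypothetical repo location
worktree = "/home/user/projects/myapp-feature-x"
branch = "feature-x"

def git(*args: str) -> None:
    subprocess.run(["git", *args], cwd=repo, check=True)

# Check out a throwaway branch in its own directory so model-generated
# edits never touch the main working tree.
git("worktree", "add", "-b", branch, worktree)

# ... let the model edit files under `worktree` and run the tests there ...

# Tear everything down once the result has been evaluated.
git("worktree", "remove", "--force", worktree)
git("branch", "-D", branch)
```

Because each worktree is a full checkout sharing one object store, several such experiments can run side by side without repeated clones.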

Hybrid model and hardware considerations

The experience led the user to consider a hybrid workflow: calling the APIs of state-of-the-art models for spec generation, while running local models to execute the resulting work. Weighing the subscription costs of cloud services against the steady improvement of local models, the user is also considering purchasing an RTX 6000 Pro.
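A minimal sketch of that split follows, assuming both endpoints expose an OpenAI-compatible chat API (llama.cpp's built-in server does, on port 8080 by default); the model names and prompts are illustrative, not taken from the post:

```python
# A minimal sketch of the hybrid split, assuming both endpoints speak the
# OpenAI-compatible chat API; model names and prompts are illustrative.
from openai import OpenAI

cloud = OpenAI()  # reads OPENAI_API_KEY from the environment
local = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

# Step 1: a frontier cloud model writes the implementation spec.
spec = cloud.chat.completions.create(
    model="gpt-4o",  # any state-of-the-art model; the name is an example
    messages=[{"role": "user", "content": "Write an implementation spec for feature X."}],
).choices[0].message.content

# Step 2: the local model does the cheaper execution work.
patch = local.chat.completions.create(
    model="qwen3.5-35b-a3b",  # whatever name the local server registers
    messages=[{"role": "user", "content": f"Implement this spec:\n{spec}"}],
).choices[0].message.content

print(patch)
```

The appeal of this design is that the expensive cloud call happens once per task, while the many iterative execution calls stay local and effectively free.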

Conclusions

The user expressed great satisfaction with Qwen3.5's performance, emphasizing its potential for production use and for reducing reliance on paid cloud services.