A Reddit user shared their experience with the Qwen3.5-35B-A3B-UD-Q6_K_XL model, expressing enthusiasm for its performance in real-world usage scenarios.
Performance and speed
In the benchmarks run, the model reached 1504 tokens per second on prompt processing with a 2048-token prompt (pp2048) and 47.71 tokens per second when generating 256 tokens (tg256). Generation speed was highest when the model ran entirely on a single GPU, reaching 80 tokens per second.
Testing on real projects
The user tested the model on several projects, using Git worktrees to try out specific changes and features in isolated checkouts. The results were positive: most issues were solvable with minimal tweaks or follow-up prompts.
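The worktree-based workflow above can be sketched as follows. The repository and branch names here are illustrative, not from the original post; the point is that each experiment gets its own checkout while sharing one object store.

```shell
# Create a scratch repository to demonstrate (names are hypothetical)
git init demo
git -C demo config user.email "dev@example.com"
git -C demo config user.name "Dev"
git -C demo commit --allow-empty -m "init"

# Add an isolated worktree on a new branch for one experiment;
# the path is resolved relative to the repo, so demo-feature
# lands next to demo/
git -C demo worktree add ../demo-feature -b feature-x

# Each worktree is a full working copy; list them all
git -C demo worktree list
```

Changes made by the model in `demo-feature` can then be reviewed and merged or discarded without disturbing the main checkout.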
Hybrid model and hardware considerations
This experience led the user to consider a hybrid setup: APIs for state-of-the-art models to generate specs, and local models to execute the work. Weighing the subscription costs of cloud services against the steady improvement of local models, the user is also considering purchasing an RTX 6000 Pro.
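The hybrid split described above can be sketched as a trivial router. The backend names and the task taxonomy are illustrative assumptions, not details from the original post:

```python
# Minimal sketch of the hybrid idea: planning-heavy tasks go to a
# frontier API model, routine execution goes to a local model.
# Backend labels and task categories are hypothetical.
def pick_backend(task: str) -> str:
    """Return which model backend should handle a given task type."""
    spec_tasks = {"spec", "architecture", "review"}  # assumed taxonomy
    return "cloud-api" if task in spec_tasks else "local-qwen"
```

For example, `pick_backend("spec")` would route to the paid API, while `pick_backend("implement")` stays on the local model, keeping the bulk of token volume off metered services.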
Conclusions
The user expressed great satisfaction with the performance of Qwen3.5, emphasizing its potential for use in production environments and the possibility of reducing reliance on paid cloud services.