64 GB VRAM and Coding LLMs: An On-Premise Experiment with Qwen 3.5 122b
A Reddit user with 64 GB VRAM shares their local inference setup: an Unsloth version of Qwen 3.5 122b-a10b (UD-IQ4_NL quantization), 100k token context, and around 30 tok/sec. The MoE architecture with 10B active parameters fits within the VRAM budge...
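A back-of-the-envelope sketch helps separate two different budgets in an MoE model. Every expert must be resident in memory, so the total parameter count drives the VRAM footprint. Only the active experts are read for each generated token, so the active parameter count drives per-token bandwidth and speed. The sketch assumes 122B total and 10B active parameters, taken from the model name "122b-a10b". It also assumes a nominal 4.5 bits per weight, which is the rate of llama.cpp's IQ4_NL block format. Unsloth's "UD" quants mix tensor types across layers, so the real GGUF file size can differ from this estimate. The KV cache for a 100k context is not included, because it depends on architecture details not given here.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed to hold `num_params` weights, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30

# Assumed figures: 122B total / 10B active parameters (read off the model name)
# and a nominal 4.5 bits per weight for IQ4_NL. Unsloth "UD" quants mix
# tensor types, so the actual file size may be smaller or larger.
TOTAL_PARAMS = 122e9
ACTIVE_PARAMS = 10e9
BPW = 4.5

# All experts must be loaded, so the total parameter count sets the footprint.
resident = weight_memory_gib(TOTAL_PARAMS, BPW)
# Only the routed experts (plus shared layers) are read per token.
per_token = weight_memory_gib(ACTIVE_PARAMS, BPW)

print(f"Resident weights: ~{resident:.1f} GiB")           # ~63.9 GiB
print(f"Weights read per generated token: ~{per_token:.1f} GiB")  # ~5.2 GiB
```

At a nominal 4.5 bits per weight, the full weight set alone would nearly fill 64 GiB. This is why the exact per-tensor mix in a dynamic quant, and the space left over for the KV cache, matter at this scale. The roughly 5 GiB read per token is what makes generation speeds in the tens of tokens per second plausible. A dense 122B model would need about 12 times as much memory traffic per token.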