GLM 5.2 local speeds: 7.8 tokens/sec with six RTX 3090s and 90K context
A Reddit user has shared early local inference numbers for GLM 5.2: running on six RTX 3090s with UD-IQ2_M quantization and a 90K context window, the model generates 7.8 tokens per second. The figures add to the debate over what it takes to run large LLMs ...
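To put the reported figures in perspective, here is a rough back-of-the-envelope sketch. It assumes the stock 24 GB of VRAM per RTX 3090 and uses a hypothetical 500-token reply as the example length; neither number comes from the Reddit post. It also ignores prompt processing time, so real-world waits would be longer.

```python
# Back-of-the-envelope arithmetic for the reported six-GPU setup.
# Only GPU_COUNT and TOKENS_PER_SECOND come from the post; the rest are assumptions.

GPU_COUNT = 6               # six RTX 3090s, as reported
VRAM_PER_GPU_GB = 24        # assumption: stock RTX 3090 memory size
TOKENS_PER_SECOND = 7.8     # generation speed, as reported
EXAMPLE_REPLY_TOKENS = 500  # hypothetical reply length, for illustration only


def total_vram_gb(gpu_count: int, vram_per_gpu_gb: int) -> int:
    """Aggregate VRAM across all cards, in GB."""
    return gpu_count * vram_per_gpu_gb


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a constant speed (prompt processing ignored)."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    vram = total_vram_gb(GPU_COUNT, VRAM_PER_GPU_GB)
    per_token_ms = 1000 / TOKENS_PER_SECOND
    reply_s = seconds_to_generate(EXAMPLE_REPLY_TOKENS, TOKENS_PER_SECOND)
    print(f"Aggregate VRAM: {vram} GB")
    print(f"Latency per token: {per_token_ms:.1f} ms")
    print(f"{EXAMPLE_REPLY_TOKENS}-token reply: {reply_s:.1f} s")
```

Under these assumptions the rig pools 144 GB of VRAM, and a 500-token answer takes a little over a minute to generate.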