GLM-4.7-Flash: impressive benchmarks on H200 and RTX 6000 Ada
GLM-4.7-Flash posts strong results in new benchmarks. On a single H200 GPU, it reaches a peak throughput of 4,398 tokens per second. On an RTX 6000 Ada, it generates 112 tokens per second using Unsloth dynamic quantization and llama.cpp. Together, the tests cover two usage scenarios: peak throughput on a data-center GPU and quantized inference on a workstation card.
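To put the reported figures in perspective, here is a minimal sketch of the tokens-per-second arithmetic. The helper names and the 1,000-token response length are illustrative assumptions, not part of the benchmarks, and the sketch does not assume the two figures were measured under the same conditions (the reports do not say whether the H200 peak is for a single stream or for many concurrent requests).

```python
# Reported figures from the benchmarks above (tokens per second).
H200_PEAK_TPS = 4398.0
RTX_6000_ADA_TPS = 112.0


def tokens_per_second(generated_tokens: int, elapsed_seconds: float) -> float:
    """Throughput is the number of generated tokens divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return generated_tokens / elapsed_seconds


def seconds_for(tokens: int, rate_tps: float) -> float:
    """Time needed to produce `tokens` tokens at a steady rate (hypothetical helper)."""
    if rate_tps <= 0:
        raise ValueError("rate_tps must be positive")
    return tokens / rate_tps


if __name__ == "__main__":
    response_tokens = 1000  # assumed response length, for illustration only
    for name, rate in [("H200 (peak)", H200_PEAK_TPS),
                       ("RTX 6000 Ada", RTX_6000_ADA_TPS)]:
        print(f"{name}: {seconds_for(response_tokens, rate):.2f} s "
              f"for {response_tokens} tokens")
```

At the reported 112 tokens per second, a 1,000-token response would take roughly nine seconds on the RTX 6000 Ada, assuming the rate holds steady for the whole generation.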