GLM 4.7 Flash Uncensored: Two Variants for Different Uses

Uncensored versions of Z.ai's GLM 4.7 Flash model have been released. It is a 30-billion-parameter Mixture of Experts (MoE) model with approximately 3 billion active parameters per token and a 200,000-token context window.

The two variants available are:

  • Balanced: Optimized for agentic coding tasks that require reliability, while remaining uncensored.
  • Aggressive: Suited to every other uncensored use case.

Several GGUF quantizations are available: FP16, Q8_0, Q6_K, and Q4_K_M.
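
If the weights are hosted on Hugging Face, a single quantization can be fetched with huggingface-cli. The repository and file names below are hypothetical placeholders, since the announcement does not give them:

    # Hypothetical repo/file names; replace with the actual release
    huggingface-cli download someuser/GLM-4.7-Flash-Uncensored-GGUF \
      GLM-4.7-Flash-Uncensored-Q4_K_M.gguf --local-dir .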

Compatibility and Sampling Settings

The model works with llama.cpp, LM Studio, Jan, and koboldcpp. Ollama is currently incompatible because of chat-template problems.

The sampling settings suggested by Z.ai are:

  • General: --temp 1.0 --top-p 0.95
  • Agentic/tool use: --temp 0.7 --top-p 1.0
  • Repeat penalty: keep at 1.0, which effectively disables it
  • llama.cpp users: add --min-p 0.01 and --jinja (see the sample command below)
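
As a rough sketch, the settings above translate into a llama.cpp server launch like the following. The GGUF filename and context size are placeholders, not the actual release names; substitute your own file and hardware limits:

    # General use (placeholder filename and context size)
    llama-server -m GLM-4.7-Flash-Uncensored-Q4_K_M.gguf \
      --temp 1.0 --top-p 0.95 --min-p 0.01 \
      --repeat-penalty 1.0 --jinja -c 32768

    # Agentic/tool use: same command, but with --temp 0.7 --top-p 1.0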

Smaller Models: GPT-OSS 20B

For those looking for smaller models, GPT-OSS 20B is also available in Balanced and Aggressive versions, in lossless MXFP4 quantization.