GLM 4.7 Flash Uncensored: Two Variants for Different Uses

Uncensored versions of Z.ai's GLM 4.7 Flash model have been released. It is a 30-billion-parameter Mixture of Experts (MoE) model with approximately 3 billion active parameters per token and a 200,000-token context window.

The two variants available are:

  • Balanced: Optimized for agentic coding tasks that require reliability, while remaining uncensored.
  • Aggressive: Suited to every other uncensored use case.

Several GGUF quantizations are available: FP16, Q8_0, Q6_K, and Q4_K_M.
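
If the weights are hosted on Hugging Face, a single quantization can be fetched with huggingface-cli. The repository and file names below are hypothetical placeholders, since the announcement does not give them:

    # Hypothetical repo/file names; replace with the actual release
    huggingface-cli download someuser/GLM-4.7-Flash-Uncensored-GGUF \
      GLM-4.7-Flash-Uncensored-Q4_K_M.gguf --local-dir .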

Compatibility and Sampling Settings

The model works with llama.cpp, LM Studio, Jan, and koboldcpp. Ollama is currently incompatible because of chat-template problems.

The sampling settings suggested by Z.ai are:

  • General: --temp 1.0 --top-p 0.95
  • Agentic/tool use: --temp 0.7 --top-p 1.0
  • Repeat penalty: keep at 1.0, which effectively disables it
  • llama.cpp users: add --min-p 0.01 and --jinja (see the sample command below)
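
As a rough sketch, the settings above translate into a llama.cpp server launch like the following. The GGUF filename and context size are placeholders, not the actual release names; substitute your own file and hardware limits:

    # General use (placeholder filename and context size)
    llama-server -m GLM-4.7-Flash-Uncensored-Q4_K_M.gguf \
      --temp 1.0 --top-p 0.95 --min-p 0.01 \
      --repeat-penalty 1.0 --jinja -c 32768

    # Agentic/tool use: same command, but with --temp 0.7 --top-p 1.0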

Smaller Models: GPT-OSS 20B

For those looking for smaller models, GPT-OSS 20B is also available in Balanced and Aggressive versions, in lossless MXFP4 quantization.