GLM 4.7 Flash Uncensored: Two Variants for Different Uses
Uncensored versions of Z.ai's GLM 4.7 Flash model have been released. This is a 30 billion parameter Mixture of Experts (MoE) model, with approximately 3 billion active parameters and a context window of 200,000 tokens.
The two variants available are:
- Balanced: Optimized for agentic coding tasks that require reliability, while remaining uncensored.
- Aggressive: Suited to all other uncensored use cases.
Several quantizations are available: FP16, Q8_0, Q6_K, Q4_K_M.
Compatibility and Sampling Settings
The model is compatible with llama.cpp, LM Studio, Jan, and koboldcpp. Currently, it has compatibility issues with Ollama due to chat template problems.
The sampling settings suggested by Z.ai are:
- General: --temp 1.0 --top-p 0.95
- Agentic/tool use: --temp 0.7 --top-p 1.0
- Repeat penalty: keep at 1.0 (i.e., effectively disabled)
- llama.cpp users: --min-p 0.01 and --jinja
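Putting the suggested settings together, a minimal sketch of a llama.cpp server launch with the general-purpose sampling values (the GGUF filename, context size, and quantization choice below are assumptions, not from the release; substitute the file you actually downloaded):

```shell
# Hypothetical GGUF filename; point --model at whichever quantization you use.
./llama-server \
  --model glm-4.7-flash-uncensored-Q4_K_M.gguf \
  --temp 1.0 \
  --top-p 0.95 \
  --min-p 0.01 \
  --repeat-penalty 1.0 \
  --jinja \
  --ctx-size 32768
```

For agentic/tool use, swap in --temp 0.7 --top-p 1.0 per the settings above.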
Smaller Models: GPT-OSS 20B
For those looking for smaller models, GPT-OSS 20B (MXFP4, lossless) is also available in Balanced and Aggressive versions.