## Fix for GLM 4.7 Flash

A patch addressing an issue with GLM 4.7 Flash has been merged into the llama.cpp project. The change is expected to improve the stability and reliability of models that rely on this implementation.

## Future Developments: CUDA FA Support

In parallel, work is underway on Flash Attention (FA) support on CUDA. The goal is to make fuller use of NVIDIA GPUs, further accelerating inference and reducing computation time. Progress can be tracked via the dedicated pull request on GitHub.
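To illustrate what this looks like from application code, here is a minimal sketch of requesting Flash Attention when creating a llama.cpp context. It is not taken from the pull request in question; the field and function names (`flash_attn`, `llama_new_context_with_model`, and friends) reflect recent llama.cpp builds and do change between versions, so treat the exact identifiers as assumptions and check your copy of `llama.h`.

```cpp
// Minimal sketch: requesting Flash Attention on a CUDA-enabled llama.cpp build.
// Field/function names (e.g. flash_attn) vary across llama.cpp versions.
#include "llama.h"
#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <model.gguf>\n", argv[0]);
        return 1;
    }

    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99; // offload all layers when built with CUDA

    llama_model * model = llama_load_model_from_file(argv[1], mparams);
    if (!model) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    llama_context_params cparams = llama_context_default_params();
    cparams.flash_attn = true; // assumed field name; request the FA kernels

    llama_context * ctx = llama_new_context_with_model(model, cparams);
    if (!ctx) {
        fprintf(stderr, "failed to create context (FA may be unsupported here)\n");
        llama_free_model(model);
        return 1;
    }

    // ... run inference as usual ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

On the bundled command-line tools, the equivalent toggle is typically the `-fa` / `--flash-attn` flag; whether a given model and backend actually uses the FA path depends on the build and on kernel support for that configuration.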