## Issues in the GLM-4.7-Flash implementation in llama.cpp

The current GLM-4.7-Flash implementation in llama.cpp has confirmed issues, which have been discussed publicly.

## Significant differences compared to vLLM

The implementation produces logprobs that differ noticeably from vLLM's. Several users report that these discrepancies may be the cause of issues such as infinite loops, overthinking, and a generally poor user experience. One way to check for such a discrepancy is to compare per-token logprobs from both backends on the same prompt, as in the sketch at the end of this note.

## Implications for users

These implementation issues can produce unexpected output and an unsatisfactory user experience. Until fixes land, it is advisable to watch for updates that resolve them. More generally, LLM implementations should be evaluated carefully to ensure optimal performance and reliable results.
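Below is a minimal sketch of such a comparison. It assumes both servers expose an OpenAI-compatible `/v1/completions` endpoint that honors the `logprobs` field (llama.cpp's `llama-server` and vLLM both offer OpenAI-compatible APIs, though exact support may vary by version). The URLs, ports, and the model name `glm-4.7-flash` are placeholders; this is not a definitive diagnostic, just one way to surface the kind of discrepancy described above.

```python
# Hedged sketch: compare per-token logprobs from two OpenAI-compatible
# servers (e.g. llama.cpp's llama-server and vLLM) on the same prompt.
# Endpoint URLs and the model name are placeholders; the response format
# assumed here is the classic OpenAI completions "logprobs" shape
# ("tokens" and "token_logprobs" lists), which may differ per backend.
import requests

PROMPT = "The capital of France is"


def get_token_logprobs(base_url: str, model: str) -> list[tuple[str, float]]:
    """Request a short greedy completion and return (token, logprob) pairs."""
    resp = requests.post(
        f"{base_url}/v1/completions",
        json={
            "model": model,
            "prompt": PROMPT,
            "max_tokens": 16,
            "temperature": 0.0,  # greedy decoding, so both backends should agree
            "logprobs": 1,       # request per-token logprobs
        },
        timeout=120,
    )
    resp.raise_for_status()
    lp = resp.json()["choices"][0]["logprobs"]
    return list(zip(lp["tokens"], lp["token_logprobs"]))


if __name__ == "__main__":
    # Placeholder endpoints; adjust to wherever each server is running.
    llama_cpp = get_token_logprobs("http://localhost:8080", "glm-4.7-flash")
    vllm = get_token_logprobs("http://localhost:8000", "glm-4.7-flash")

    # Print tokens side by side with the absolute logprob gap; large gaps
    # or diverging tokens point at an implementation discrepancy.
    for (tok_a, lp_a), (tok_b, lp_b) in zip(llama_cpp, vllm):
        flag = "  <-- diverges" if tok_a != tok_b else ""
        print(f"{tok_a!r:>15} {lp_a:>10.4f} | {tok_b!r:>15} {lp_b:>10.4f} "
              f"(delta={abs(lp_a - lp_b):.4f}){flag}")
```

With greedy decoding and identical prompts, small numeric differences are expected (quantization, kernel math), but consistently large gaps or early token divergence are the kind of symptom users attribute to the llama.cpp implementation issues discussed above.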