📁 LLM

The LLM archive monitors model releases, quantization updates, reasoning capabilities, and real-world deployment implications for local and hybrid AI. We focus on what materially changes selection and operations: context windows, latency, memory footprint, licensing, and evaluation evidence across open and commercial families. This section is designed for teams that need dependable model intelligence, not hype cycles. Pair these updates with the LLM pillar and references to hardware constraints and framework integration.

An artificial intelligence has formally verified the mathematical proofs of Fields Medal winner Maryna Viazovska, accelerating mathematical research. The AI validated the solution to the sphere packing problem in 8 and 24 dimensions, demonstrating the potential of AI to assist mathematicians and opening new frontiers in large-scale formalization.

2026-03-02 Source

Anthropic's AI chatbot Claude experienced widespread service disruptions on Monday morning, with thousands of users reporting problems accessing it. The incident raised questions about the stability of the cloud infrastructure supporting large language models.

2026-03-02 Source

The Jan team has released Jan-Code-4B, a small code-tuned model for coding tasks. Based on Jan-v3-4B-base-instruct, it aims to provide assistance in code development, generation, refactoring, and debugging, while maintaining a lightweight footprint for local execution. It can replace the Haiku model in Claude Code.

2026-03-02 Source

A warning for those running Qwen 3.5 locally with llama.cpp: the KV cache should be manually set to BF16 (bfloat16) instead of the default F16 (float16). Official Qwen-team implementations, such as vLLM, use BF16, whereas llama.cpp defaults to F16; the warning is backed by perplexity tests on wikitext-2-raw.
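As a minimal sketch of the override described above, the Python snippet below assembles a llama.cpp command line that sets both the key and value cache types to BF16 through the `--cache-type-k` and `--cache-type-v` options. The binary name `llama-server`, the helper `llama_cpp_args`, and the model filename are illustrative assumptions, not details from the report; run the binary with `--help` to confirm that your build accepts `bf16`.

```python
import shlex


def llama_cpp_args(model_path, cache_type="bf16", binary="llama-server"):
    """Return an argv list that sets the K and V cache types explicitly.

    `binary` and `model_path` are placeholders; adjust them to your install.
    """
    return [
        binary,
        "-m", model_path,
        "--cache-type-k", cache_type,  # key cache precision
        "--cache-type-v", cache_type,  # value cache precision
    ]


# Hypothetical GGUF filename, used only for illustration.
cmd = llama_cpp_args("qwen3.5-model.gguf")
print(shlex.join(cmd))
```

Building the argument list in one place makes it easy to rerun the same perplexity or serving command with `cache_type="f16"` and compare the two settings on your own hardware.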

2026-03-02 Source

A new "small" version of the Qwen 3.5 language model has been released. It could enable more efficient deployments on resource-constrained hardware, opening up new possibilities for on-premise and edge applications.

2026-03-02 Source

A new approach called REPO (Representation Erasure-based Preference Optimization) aims to reduce the generation of toxic outputs by large language models (LLMs). REPO intervenes at the level of the model's internal representations, forcing toxic representations to converge towards benign ones, and is reported to be more robust than traditional methods.

2026-03-02 Source

A new system based on LLMs and retrieval-augmented generation (RAG) automates adverse media screening, a critical component of anti-money laundering (AML) and know-your-customer (KYC) processes. The LLM agent runs searches, processes the resulting documents, and calculates a risk index, demonstrating the ability to distinguish between high-risk and low-risk individuals.

2026-03-02 Source

AI can speed up progress, but is reaching the destination without the journey worth it? Reflections on the importance of human experience in the age of automation.

2026-03-01 Source

Rumors on Reddit suggest the imminent release of Qwen3.5 Small Dense. The open-source community is eagerly waiting to evaluate the model's performance and potential applications.

2026-03-01 Source

A Reddit post sparks interest in the LocalLLaMA community, with speculation about the arrival of new features. The discussion highlights the growing interest in locally run LLM solutions.

2026-03-01 Source

A LocalLLaMA user reports that Qwen 3.5 27B produces Chinese translations comparable to those of GPT-3.5 and Gemini, outperforming other models of up to 70B parameters. The user tested the model on a local setup with 24 GB of VRAM and highlighted its excellent tone and consistency.
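A rough back-of-the-envelope check shows why running a 27B model on a 24 GB card implies quantized weights. The sketch below estimates weight memory only, assuming a uniform number of bits per parameter; the post does not state which quantization was used, and the estimate ignores the KV cache, activations, and per-block quantization overhead. The function name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(params_billions, bits_per_param):
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


VRAM_GB = 24  # from the reported setup

for bits in (16, 8, 4):
    gb = weight_memory_gb(27, bits)
    verdict = "fits" if gb < VRAM_GB else "does not fit"
    print(f"{bits}-bit: ~{gb:.1f} GB of weights, {verdict} in {VRAM_GB} GB")
```

Under these assumptions, 16-bit and 8-bit weights (about 54 GB and 27 GB) exceed the card's memory, while 4-bit weights (about 13.5 GB) leave headroom for the KV cache.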

2026-03-01 Source

New research from Google challenges the assumption that longer reasoning chains lead to better results in language models. The study introduces the concept of Deep Thinking Ratio (DTR) to measure reasoning quality, demonstrating that accurate token selection can reduce computational load while maintaining or improving accuracy.

2026-02-28 Source

According to the Financial Times, DeepSeek is preparing to release version 4 of its artificial intelligence model. The new version will reportedly include advanced image and video generation capabilities, positioning it as a direct competitor to models developed in the United States.

2026-02-28 Source

A Reddit user reports exceptional results with Qwen 3.5-35B-A3B, which has replaced GPT-OSS-120B in their daily workflow. The user employs it for development tasks, process automation, and code analysis, and highlights its ability to compensate for gaps in its knowledge through browser access.

2026-02-28 Source

A Reddit user praises the LocalLLaMA community for its DIY approach to artificial intelligence, contrasting it with the industry's trend towards proprietary solutions and vendor lock-in. The use of consumer GPUs like the RTX 3090 to develop models locally is seen as a viable alternative and an example of bottom-up innovation.

2026-02-28 Source

A monthly overview of top-performing open-weight models, evaluated based on community discussions and benchmarks. The initiative aims to provide an updated view of open-source alternatives to proprietary models, focusing on their capabilities and limitations.

2026-02-28 Source

A Reddit post reminisces about the early days of LocalLLaMA, when running language models locally was a pioneering challenge. The discussion highlights how the open-source community pushed the boundaries of on-premise inference, paving the way for today's solutions. For those evaluating on-premise deployments, there are trade-offs to consider carefully.

2026-02-28 Source