📁 LLM

The LLM archive monitors model releases, quantization updates, reasoning capabilities, and real-world deployment implications for local and hybrid AI. We focus on what materially changes selection and operations: context windows, latency, memory footprint, licensing, and evaluation evidence across open and commercial families. This section is designed for teams that need dependable model intelligence, not hype cycles. Pair these updates with the LLM pillar and references to hardware constraints and framework integration.

Google's AI Overviews are presenting entries from the SCP Foundation, a collaborative horror fan-fiction project, as documented reality. As reported by Futurism, the glitch highlights hallucination risks in public-facing LLMs. For organizations handling critical data, the incident underscores the importance of source control and on-premise deployment to constrain the information domain and maintain trust.

2026-06-19 Source

A New York Times investigation highlights the spread of 'humaniser' tools that rewrite AI-generated text to evade detection. The arms race is already lost: the real challenge is rethinking assessment, not catching the algorithm.

2026-06-19 Source

Ohio State University’s NLP team has released QUEST-35B, a fully open-source Deep Research agent, including code, weights, training recipe, and a synthetic dataset of 8,000 examples. Benchmarks show competitive performance against leading closed-source systems, strengthening the case for self-hosted, privacy-first AI research tools.

2026-06-19 Source

The 2-bit quantized GLM-5.2 shrinks from 1.51TB to 238GB while retaining ~82% accuracy. It can now run locally on a 256GB Mac or systems with enough RAM/VRAM via llama.cpp and Unsloth Studio, opening new possibilities for on-premise AI deployment.

2026-06-19 Source
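The size figures quoted for the 2-bit GLM-5.2 item above can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it assumes decimal units (1 TB = 1000 GB) and assumes the original checkpoint stores 16 bits per weight, neither of which the item states. The helper names are hypothetical.

```python
# Figures quoted in the item (decimal units assumed: 1 TB = 1000 GB).
ORIGINAL_GB = 1510.0   # 1.51 TB original checkpoint
QUANTIZED_GB = 238.0   # 2-bit quantized checkpoint
MAC_MEMORY_GB = 256.0  # memory of the Mac mentioned in the item

# Assumption (not stated in the item): original weights use 16 bits each.
ASSUMED_ORIGINAL_BITS = 16


def compression_ratio(original_gb: float, quantized_gb: float) -> float:
    """How many times smaller the quantized file is."""
    return original_gb / quantized_gb


def effective_bits_per_weight(
    original_gb: float, quantized_gb: float, original_bits: int
) -> float:
    """Average bits per weight implied by the two file sizes."""
    return original_bits * quantized_gb / original_gb


def headroom_gb(total_memory_gb: float, model_gb: float) -> float:
    """Memory left over for KV cache, activations, and the OS."""
    return total_memory_gb - model_gb


ratio = compression_ratio(ORIGINAL_GB, QUANTIZED_GB)
bits = effective_bits_per_weight(ORIGINAL_GB, QUANTIZED_GB, ASSUMED_ORIGINAL_BITS)
spare = headroom_gb(MAC_MEMORY_GB, QUANTIZED_GB)

print(f"compression: {ratio:.2f}x")
print(f"effective bits per weight: {bits:.2f}")
print(f"headroom on a 256 GB machine: {spare:.0f} GB")
```

Under these assumptions the average comes out slightly above 2 bits per weight, which would be consistent with a mixed-precision scheme that keeps some tensors at higher precision. The headroom figure also suggests that context length on a 256 GB machine would be tightly constrained.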

SupraLabs releases SupraVL-Nano-900k, a 900k-parameter vision-language model trained from scratch on Flickr8k. Not a production model, but a transparent blueprint for anyone wanting to understand how VLMs work: every component, from the CNN visual encoder to the GPT-2 style decoder, is written from scratch and documented in a Jupyter notebook. Licensed under Apache 2.0, it sheds light on model internals, which is valuable for those planning on-premise deployment.

2026-06-19 Source

A research team has developed an ensemble system of Large Language Models to automatically detect studies reporting EQ-5D data in PubMed abstracts. By combining Google’s Gemini and Gemma models with a weighted stacking strategy, the approach achieved an F1-score of 0.74, exceeding individual model performance. This offers a promising path for those managing systematic reviews in biomedicine, though deploying multiple models locally raises questions about resources and latency.

2026-06-19 Source
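To make the ensemble idea in the EQ-5D item above concrete, here is a minimal sketch of combining binary classifiers with fixed weights and scoring the result with F1. It is a simplified weighted vote, not the study's stacking method (true stacking would learn the combination from held-out predictions). The model names, weights, and labels are invented toy values, not data from the study.

```python
from typing import Dict, List


def weighted_vote(
    predictions: Dict[str, List[int]],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> List[int]:
    """Combine 0/1 predictions per item using a normalized weighted average."""
    total_weight = sum(weights[name] for name in predictions)
    n_items = len(next(iter(predictions.values())))
    combined = []
    for i in range(n_items):
        score = sum(weights[name] * preds[i] for name, preds in predictions.items())
        combined.append(1 if score / total_weight >= threshold else 0)
    return combined


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: 1 = abstract reports EQ-5D data, 0 = it does not (all invented).
y_true = [1, 0, 1, 1, 0, 0]
toy_predictions = {
    "model_a": [1, 0, 0, 1, 1, 0],  # hypothetical model
    "model_b": [1, 1, 1, 0, 0, 0],  # hypothetical model
    "model_c": [0, 0, 1, 1, 0, 0],  # hypothetical model
}
toy_weights = {"model_a": 2.0, "model_b": 2.0, "model_c": 3.0}  # hypothetical

ensemble = weighted_vote(toy_predictions, toy_weights)
ensemble_f1 = f1_score(y_true, ensemble)
model_a_f1 = f1_score(y_true, toy_predictions["model_a"])
print(ensemble, ensemble_f1, model_a_f1)
```

In this toy case the weighted vote corrects each model's individual mistakes, which is the mechanism by which an ensemble can exceed the performance of its members. The cost is that every model has to be run for every abstract, which is the resource and latency concern the item raises.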

A visual analytics tool aggregates hundreds of stochastic responses to uncover hidden LLM biases, beyond single-prompt audits. Tested on GPT-2 XL and aligned models, it reduces analysts' cognitive load and enables systematic checks for on-premise deployments where data sovereignty and audit trails matter.

2026-06-19 Source

A team proposes a semantic retrieval pipeline to measure a CS program's coverage of CS2013 and CS2023. Among seven retrievers, a rank fusion ensemble performed best; a reputed long-context model was beaten by a small sentence model. Relevant for those building on-premise retrieval systems: huge LLMs aren't always needed.

2026-06-19 Source
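The curriculum-coverage item above does not say which rank fusion method was used. As an illustration, the sketch below shows reciprocal rank fusion (RRF), one widely used way to merge ranked lists from several retrievers. The document IDs are invented, and `k=60` is a commonly used default rather than a value taken from the study.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Three hypothetical retrievers ranking hypothetical knowledge units.
retriever_rankings = [
    ["ku_a", "ku_b", "ku_c"],
    ["ku_b", "ku_a", "ku_d"],
    ["ku_b", "ku_c", "ku_a"],
]

fused = reciprocal_rank_fusion(retriever_rankings)
print(fused)
```

RRF needs only rank positions, not comparable scores, so it can combine small sentence-embedding models with lexical or long-context retrievers without any calibration step.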

The North Mini Code team has released a 4-bit quantized version of the model on Hugging Face, requiring around 20 GB of memory. The model now runs on local hardware via Ollama and llama.cpp-based runtimes, and is also available through the OpenRouter API, a move that boosts portability for on-premise inference and self-hosted development.

2026-06-18 Source

OpenAI introduces GPT-5.5 Instant, optimized for ChatGPT's health and wellness responses with stronger reasoning, better context, and physician-informed evaluations. For healthcare organizations considering on-prem deployment for data sovereignty, this progress raises questions about hardware requirements, quantization trade-offs, and regulatory compliance, all key factors in TCO and control assessments.

2026-06-18 Source

Poolside has released Laguna M.1, a Mixture-of-Experts LLM with 225 billion total parameters (23B activated per token), optimized for agentic coding and extended contexts (262,144 tokens). The model, under Apache 2.0 license, features a 70-layer architecture and 256 experts, offering native reasoning support. Its scale makes it particularly relevant for on-premise deployment evaluations, requiring specific hardware and careful TCO analysis.

2026-06-18 Source
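For the Laguna M.1 item above, a rough weight-memory estimate shows why hardware sizing matters. The sketch below uses only the parameter counts quoted in the item. The bit widths are illustrative assumptions, and the estimate covers weights only, with no KV cache, activations, or runtime overhead.

```python
TOTAL_PARAMS = 225e9   # total parameters quoted in the item
ACTIVE_PARAMS = 23e9   # parameters activated per token, quoted in the item


def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal GB: params * bits / 8 bits-per-byte / 1e9."""
    return params * bits_per_weight / 8 / 1e9


# Fraction of the model that does compute work on any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Illustrative precisions (assumptions, not published deployment figures).
estimates = {bits: weight_memory_gb(TOTAL_PARAMS, bits) for bits in (16, 8, 4)}

print(f"active fraction per token: {active_fraction:.1%}")
for bits, gigabytes in estimates.items():
    print(f"{bits}-bit weights: {gigabytes:.1f} GB")
```

The point to note for a Mixture-of-Experts model is that memory typically has to hold all experts, so capacity scales with the 225B total, while per-token compute scales with the 23B active parameters. Unless expert offloading is used, a deployment needs memory sized for the full model even though per-token compute resembles that of a much smaller one.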

GLM-5.2 ranks as the top open-weight Large Language Model (LLM) for creative writing, according to Sam Paech's benchmark on EQ Bench. The result highlights the potential of accessible models for on-premise deployment, offering enterprises greater control and flexibility than proprietary cloud-based solutions, with significant implications for data sovereignty and Total Cost of Ownership (TCO).

2026-06-18 Source

Research has uncovered a surprising narrative uniformity across popular Large Language Models. Characters like Elias Thorne, the lighthouse keeper, appear in over 88% of generated stories, regardless of the model. This phenomenon raises questions about the diversity of training datasets and the implications for original content generation.

2026-06-18 Source