Hardware for Local Intelligence
Benchmarks, GPU sizing guides, and workstation builds for sovereignty.
Linux MD RAID5 scalability improves by up to 17%: implications for on-prem storage
A fresh patch series for Linux MD RAID5 brings scalability improvements of 10–17% in certain configurations. The development is directly relevant...
Orthrus brings diffusion head to Qwen 3.5/3.6 and Gemma 4: open-source code dropping soon
Orthrus models with a diffusion head are about to land on Hugging Face, joined by full end-to-end training and evaluation code. A pairing that...
GNOME’s AI Assistant Now Generates Images: Newelle 1.4.5 Arrives
After three years of development, Newelle reaches version 1.4.5 with two major updates: AI image generation support and a redesigned chat...
The next AI won’t be powered by better models alone
Oxylabs CEO suggests the real leap lies beyond models — in data quality and freshness. For those running LLMs on-prem, data sovereignty and robust...
A 96GB VRAM RTX 5090 from Shenzhen's Huaqiangbei Market for $8,200
A hands-on report from Shenzhen's Huaqiangbei market confirms offers of modified GeForce RTX 5090 cards with 96GB of VRAM. With a base card cost...
US clears Anthropic to restore Mythos 5 to a small group of cyber defenders
The Commerce Department greenlights Anthropic to restore access to Mythos 5, its most powerful cybersecurity model, for a select group of trusted...
Llama.cpp cuts CUDA synchronizations, boosting on-premise inference performance
A recent llama.cpp commit reintroduces more aggressive asynchronous handling for CUDA backends, cutting synchronizations between tokens and...
AI chip demand squeezes global freight, putting on-premise plans at risk
Surging demand for AI accelerators is congesting air and sea freight, driving up shipping rates. For enterprises building on-premise LLM...
SYM's Profits Fall in 2025 Despite Record Market Share
The Taiwanese motorcycle manufacturer saw profits decline in 2025, even as it captured its highest-ever market share. A paradox that mirrors...
JCET's US$1.1bn expansion shows where China's AI chip crunch is moving
JCET’s $1.1 billion expansion in advanced packaging shows China's strategy to bypass semiconductor restrictions and secure AI accelerator supply....
Qwen Fine-tunes: Why Optimized Models Struggle to Impress
Despite the popularity of fine-tuning Qwen models, concrete evidence of versions truly outperforming the base is scarce. This raises questions...
DeepSeek V4 Flash and MiniMax M3 on llama.cpp: When will native support arrive?
The community is waiting for official integration of DeepSeek V4 Flash and MiniMax M3 models into llama.cpp. Forks provide partial solutions, but...
DeepSeek-V4-Pro-DSpark: A New Open-Source LLM Targeting Local Deployment
DeepSeek releases the V4-Pro-DSpark model on Hugging Face along with the DSpark technical paper. This release fuels the strategy of those betting...
Ornith-1.0-35B Q3_K_M: 17 GB VRAM, all benchmarks pass, extreme quantization holds up
Ornith-1.0-35B has been quantized to Q3_K_M, coming in at 16.8 GB on disk and ~17 GiB of VRAM once loaded. Validated with KL divergence probes and 14/14...
Distilling Your Own LLM for Theorem Proving: When On-Premise Beats the Cloud
A user with hardware funding but no cloud credits considers distilling an LLM for theorem proving in Rocq, a niche lacking tailored models. The...
Liquid Cooling Comes to 800V DC Busbars for AI Data Centers
At Wiwynn's booth, TE Connectivity's new 800V DC busbars with integrated liquid cooling were on display. It's a clear sign that power delivery for...
Anthropic’s Mythos 5 Authorized for Over 100 US Entities: A Turn for Sovereign AI?
The Trump administration has authorized over 100 companies and government agencies to use Anthropic's Mythos 5, including their non-American...
Trump Administration Allows Anthropic to Release Mythos to Select US Organizations
After weeks of negotiations, the White House authorized Anthropic to restore access to its most advanced model, Mythos, for a select group of US...
South Korea to train entire military as drone warriors: Edge AI puts inference on the front line
Seoul aims to make drones a universal combat tool for its half-million troops, inspired by lessons from Ukraine. The move shifts the weight of...
llama.cpp: Vulkan Tensor Parallelism Now Within Reach
Pull request #25051 by Piotr ‘pwilkin’ makes Vulkan tensor parallelism usable in llama.cpp, opening LLM inference to non-NVIDIA GPUs. A concrete...
Nemotron-3-Super Nails 504K-Token Needle Retrieval on 4× RTX 3090
NVIDIA's hybrid Mamba+MoE model, quantized to 71 GB, runs entirely on consumer GPUs and achieves perfect needle retrieval up to 504,482 tokens....
A software veteran builds a local LLM harness and asks the community: what do you need?
A developer with 45 years of enterprise tooling experience is about to release an open-source harness designed to simplify local LLM deployment....
Ford had to rehire 350 engineers after AI got vehicle quality wrong
The automaker admitted it overestimated AI capabilities in quality control and had to rehire 350 engineers. The story reignites the debate on...
Microsoft built supercomputer to help OpenAI infringe copyrights, NYT alleges in amended complaint
The New York Times amends its complaint, alleging Microsoft built a bespoke supercomputer to enable OpenAI’s copyright infringement. The filing...
Zettabyte calls for a quality standard in AI compute as demand surges
Zettabyte urges a new standard for AI compute quality amid a two-year surge in demand, as organizations struggle to compare on-prem and cloud...
DrayTek's revenue slide drags into 2026, company bets on Wi-Fi 7 and cybersecurity
The Taiwanese networking gear maker faces a revenue decline extending into 2026. It is betting on Wi-Fi 7 and cybersecurity to turn things around,...
Intel readies HDR support for DP MST configurations on Linux
The Intel Linux kernel graphics driver is set to fix a gap: the inability to use HDR over DisplayPort Multi-Stream Transport connections. This...
OpenAI limits GPT-5.6 rollout after government request, says restrictions shouldn't be the norm
OpenAI restricted the rollout of GPT-5.6 following a government request, sparking debate on digital sovereignty and LLM access. The move puts a...
OpenAI hires ex-Uber India chief to lead expansion in its largest non-U.S. market
The hire strengthens OpenAI’s push into India, a crucial market for scale and opportunity. Bringing in a veteran with deep local experience...
On-prem LLMs: the workflow you wish you had discovered sooner
A Reddit thread asks which local AI workflow made the biggest difference. The answers reveal that the real value lies not in models but in...