Local LLMs and On-Device AI

2026-01-24 • LocalLLaMA

LLM: Which local model on 24GB GPU in 2026?

A LocalLLaMA user is wondering about the evolution of large language models (LLMs) that can be run locally. Specifically, he asks if, nine months after the release of Gemma 3 27b, there are better alternatives available that can run on a single 3090t...

#Hardware

2026-01-24 • TechCrunch AI

Tech CEOs boast and bicker about AI at Davos

This week's World Economic Forum meeting saw tech leaders hotly debating artificial intelligence. The event transformed, at times, into a high-powered tech conference, with CEOs clashing over future visions and strategies.

2026-01-24 • LocalLLaMA

Local LLM Development: A Challenge for Hardware Coders?

A hardware coder has expressed frustration with the performance of large language models (LLMs) running locally on a 5090 GPU. Despite the powerful hardware, the models seem underutilized and unable to leverage external tools to improve context. The ...

#Hardware #LLM On-Premise

2026-01-24 • Phoronix

GNOME's AI Assistant Newelle Adds Llama.cpp Support, Command Execution Tool

Newelle, a virtual AI assistant for the GNOME desktop with API integration for Google Gemini, OpenAI, Groq, and also local LLMs, has a new release. Newelle has been steadily expanding its AI integration and capabilities, and with the new Newelle 1.2,...

#LLM On-Premise

2026-01-24 • LocalLLaMA

Running MoE Models on CPU/RAM: A Guide to Optimizing Bandwidth for GLM-4 and GPT-OSS

Running Mixture-of-Experts (MoE) models on CPU and RAM requires bandwidth optimization. The article analyzes GLM-4.7-Flash and GPT OSS 120B, providing hardware (Intel) and software advice, including compiling `llama.cpp` and assigning CPU cores to ma...

#Hardware #LLM On-Premise

2026-01-24 • LocalLLaMA

Context Engine: Self-Hosted Code Search for LLMs

A developer has created Context Engine, a self-hosted retrieval system for codebases, designed to work with various MCP clients. It uses a hybrid search that combines dense embeddings with lexical search and AST parsing. The goal is to avoid overload...

#LLM On-Premise #DevOps #RAG

2026-01-24 • LocalLLaMA

Strix Halo: MiniMax Q3 K_XL Runs Surprisingly Fast

A user tested Strix Halo (Bosgame M5 with 128GB) on Ubuntu 25.10, achieving remarkable results with the MiniMax Q3 K_XL model. Specifically, the speed of approximately 30 tokens per second in TG mode makes the model usable for brainstorming and discu...

2026-01-23 • Tom's Hardware

Alibaba plans T-Head chip-arm IPO to boost AI infrastructure

Alibaba is reportedly preparing an IPO for its chip manufacturing arm, T-Head. The primary goal is to raise significant capital to fund the development of AI accelerator solutions and support ambitious infrastructure projects. T-Head would compete wi...

2026-01-23 • DigiTimes

China pushes space-based AI from concept to deployment, industry group says

A Chinese industry group reports that the country is rapidly moving from conceptualization to deployment of space-based artificial intelligence systems. This strategic move could have significant implications for China's technological and military ca...

2026-01-22 • LocalLLaMA

Kimi-Linear-48B: GGUF Support and llama.cpp Integration

The implementation of Kimi-Linear-48B in llama.cpp is being discussed online, given its effectiveness in handling long contexts. The community is wondering about the timeline for the model's integration, which promises significant performance improve...

#Hardware #LLM On-Premise

2026-01-22 • DigiTimes

QCT eyes AI supply chain integration battle with full-spectrum server strategy

QCT aims to strengthen its position in the artificial intelligence supply chain. The company is reportedly developing a full-spectrum server strategy to compete in the market, vertically integrating its solutions. The goal is to offer a more comprehe...

#Hardware

2026-01-21 • Phoronix

AMD Sends Out Linux Patches For Next-Gen EPYC Features

AMD has sent out a set of 19 patches to the Linux kernel mailing list, preparing for new CPU features expected in the next-generation EPYC "Venice" processors. These patches suggest enhancements in memory management and security, indicating a focus o...

#Hardware

2026-01-21 • LocalLLaMA

Fine-tuned Qwen3-14B on DeepSeek Traces: +20% Security Boost

A researcher fine-tuned the Qwen3-14B language model using 10,000 DeepSeek traces, achieving a 20% performance increase on a custom security benchmark. This demonstrates how fine-tuning smaller models with specific datasets can be a viable and more c...

2026-01-21 • The Register AI

Trump promises nuclear datacenter permits in 3 weeks

Donald Trump promised to expedite permits for nuclear-powered data centers. Jensen Huang, CEO of Nvidia, presented his vision of AI at Davos.

#Hardware

2026-01-21 • Phoronix

NVIDIA GB10 CPU Performance Challenged AMD Ryzen AI Max+ in Linux Tests

The NVIDIA GB10 superchip, designed for AI, has been tested in traditional Linux scenarios to evaluate its CPU performance. Phoronix benchmarks compare the GB10 with the AMD Ryzen AI Max+ "Strix Halo" within the Framework Desktop, offering a glimpse ...

#Hardware

2026-01-21 • LocalLLaMA

File Brain: Open-Source Local Semantic Search for Your Documents

File Brain is an open-source search engine that indexes local files and allows searching using natural language. It supports multilingual semantic search, built-in OCR, and is available for Windows and Linux. The goal is to overcome the limitations o...

2026-01-21 • Tom's Hardware

OpenAI commits to AI data centers with no impact on energy bills

OpenAI is committed to ensuring that electricity prices do not increase in the communities where it builds its Stargate data centers. The company will fund grid upgrades and flexible load management systems to reduce stress on the energy supply. The ...

2026-01-20 • LocalLLaMA

LocalLLaMA: The unstoppable rise of local language models

A Reddit post highlights the surprising capabilities of language models running locally with LocalLLaMA. The discussion emphasizes how these models, while running on consumer hardware, demonstrate a context understanding and responsiveness that often...

#Hardware

2026-01-20 • DigiTimes

Foxconn lighthouse factory brings sustainable manufacturing and GenAI-driven decarbonization to Vietnam

Foxconn has inaugurated a state-of-the-art factory in Vietnam, a model of sustainable manufacturing. The plant integrates generative artificial intelligence (GenAI) solutions for the decarbonization of production processes. This initiative underscore...

2026-01-19 • LocalLLaMA

GLM-4.7 flash: how to run it with llama.cpp?

A user inquires about the possibility of running the new GLM 4.7 flash model with llama.cpp or similar tools. The question was posted on a forum dedicated to local language models (LocalLLaMA), awaiting responses from the community of developers and ...

#Hardware #LLM On-Premise

2026-01-19 • TechCrunch AI

US AI startups raise record funding in 2025

2024 was a pivotal year for the AI industry in the US and beyond. It remains to be seen whether 2025 will be equally positive. Analysis reveals that numerous AI startups have raised over $100 million in funding, marking an unprecedented wave of inves...

2026-01-19 • LocalLLaMA

Nvidia GB10 vs GH200: early performance benchmarks

Early benchmarks comparing the performance of Nvidia's GB10 GPU with the GH200 have surfaced online. The data, originating from a Reddit source, offers a preview of the potential of Nvidia's new architecture, although they should be taken with cautio...

#Hardware

2026-01-19 • LocalLLaMA

Z-AI (GLM): Devs Woke Up And Chose Violence

Z-AI (GLM) developers have reportedly adopted an 'aggressive' development strategy. A Reddit post highlights this choice, suggesting direct competition with other teams, particularly those at Qwen. The online discussion focuses on the implications of...

2026-01-19 • TechCrunch AI

Is the Metaverse Doomed? VR Overshadowed by Artificial Intelligence

The metaverse appears to be declining, with virtual reality giving way to artificial intelligence. Meta's ambitions in the VR sector are taking a hit. The future of the metaverse is uncertain, with new challenges and competitors on the horizon.

2026-01-19 • LocalLLaMA

GLM 4.7 Flash Released: Massive Benchmark Gains?

GLM 4.7 Flash has been released. The open-source community is questioning the potential performance gains compared to Qwen 30b, with a focus on benchmarks. Currently, there is no objective data to support this.

#Fine-Tuning

2026-01-19 • LocalLLaMA

GLM-4.7-Flash: New Open-Source Language Model on Hugging Face

The GLM-4.7-Flash language model is now available on Hugging Face. The news was shared on Reddit, sparking discussion within the LocalLLaMA community. The open-source model promises new opportunities for developing generative artificial intelligence ...

2026-01-19 • LocalLLaMA

On-device browser agent with Qwen: local demo on Chrome

A new demo showcases a local browser agent, powered by Web GPU Liquid LFM and Alibaba's Qwen models, running as a Chrome extension. The agent opens 'All in Podcast' on YouTube. The source code is available on GitHub for those interested in exploring ...

#Hardware

2026-01-19 • LocalLLaMA

Top-K: Optimized Algorithm Up to 20x Faster Than PyTorch

A developer has created an optimized Top-K implementation, crucial for sampling in large language models (LLM). The AVX2-optimized implementation outperforms PyTorch CPU performance by 4-20x, depending on vocabulary size. Integration into llama.cpp r...

#Hardware #LLM On-Premise

2026-01-19 • LocalLLaMA

Free GPU Credits to Test LLM Training Platform

A small team is offering free compute credits for its GPU platform, in exchange for usage feedback. Available GPUs include RTX 5090 and Pro 6000, suitable for LLM inference, fine-tuning, or other machine learning workloads.

#Hardware #Fine-Tuning

2026-01-19 • The Register AI

Open source's new mission: Rebuild a continent's tech stack

Europe, known for its tightly regulated tech sector, could find in open source a way to rebuild and strengthen its technological infrastructure. The adoption of open solutions could foster innovation and reduce dependence on external suppliers, promo...

2026-01-19 • DigiTimes

Tesla accelerates AI chip development even with safety and software challenges

Tesla is accelerating its efforts in AI chip development. This move comes at a crucial time as the company faces significant challenges related to the safety and software of its vehicles. The goal is to improve self-driving capabilities and other adv...

2026-01-19 • DigiTimes

TSMC eyes rapid 2nm growth in 2026

Taiwanese giant TSMC anticipates strong expansion of its 2nm production starting in 2026, backed by substantial investments and the expansion of its manufacturing capabilities in both Taiwan and the United States. This strategic move aims to solidify...

2026-01-19 • DigiTimes

Tariffs reshuffle global supply chains, but US manufacturing revival remains elusive

Tariffs are reshaping global supply chains, but the revival of the US manufacturing sector remains a difficult goal to achieve. Despite efforts and protectionist policies, American industry is struggling to regain ground and compete effectively in th...

2026-01-19 • LocalLLaMA

Local LLM Coding: Is it Still Worth it with a 16GB GPU?

A user with a 16GB Nvidia RTX 5070 Ti GPU questions the effectiveness of local large language model (LLM) development. Experience with Kilo code and Qwen 2.5 coder 7B via Ollama revealed issues with context management, which quickly runs out even wit...

#Hardware #LLM On-Premise

2026-01-19 • Wired AI

The Race to Build the DeepSeek of Europe Is On

As Europe’s longstanding alliance with the US falters, its push to become a self-sufficient AI superpower has become more urgent. The goal is to create a European alternative to advanced models like DeepSeek, reducing technological dependence on othe...

2026-01-19 • DigiTimes

Global power grids emerge as strategic choke points in AI and industrial competition race

Global power grids are emerging as crucial strategic points in the competition between artificial intelligence and industrial development. The increasing demand for energy to power data centers and digital infrastructure makes the stability and secur...

2026-01-19 • DigiTimes

US-Taiwan trade pact clears path for tech supply chain hubs in America

A new trade agreement between the United States and Taiwan could foster the creation of tech supply chain hubs in America. The initiative aims to strengthen supply chain resilience and reduce dependence on foreign suppliers, amid growing global compe...

2026-01-19 • The Register AI

Hiring Stalls at India’s Big Four Outsourcers Amid AI Impact

India’s big four outsourcers – HCL, Infosys, TCS and Wipro – have essentially stopped hiring, potentially due to increased AI adoption. Revenue growth is also sluggish. This slowdown reflects a significant shift in the IT services landscape.

2026-01-19 • ArXiv cs.CL

Conversational Agents: Does Conciseness Reduce Expertise?

A new study analyzes the unexpected side effects of using specific stylistic features in prompts for conversational agents based on large language models (LLMs). The research reveals how prompting for conciseness can compromise the perceived expertis...

#Fine-Tuning

2026-01-19 • ArXiv cs.LG

Multi-Source Transfer Learning: New Framework Optimizes Source Weights

A new study introduces UOWQ, a theoretical framework for multi-source transfer learning. UOWQ jointly optimizes source weights and transfer quantities, addressing the issue of negative transfer. The analysis demonstrates that using all available sour...

2026-01-19 • ArXiv cs.AI

LLMs: How Do They Assess Trustworthiness of Online Information?

Large language models (LLMs) are increasingly important in online search and recommendation systems. New research analyzes how these models encode perceived trustworthiness in web narratives, revealing that models internalize psychologically grounded...

#Fine-Tuning

2026-01-19 • DigiTimes

US and Taiwan finalize tariff deal, securing favorable terms for semiconductor exports

The United States and Taiwan have finalized a tariff agreement that will secure favorable terms for semiconductor exports. The deal aims to strengthen economic and technological cooperation between the two nations in the strategic semiconductor secto...

2026-01-19 • DigiTimes

TSMC drives tariff talks as Taiwan eyes 40% chip capacity shift to US

According to Digitimes, TSMC is influencing tariff discussions as Taiwan considers shifting up to 40% of its chip manufacturing capacity to the United States. This strategic move could have significant implications for the global semiconductor indust...

2026-01-19 • DigiTimes

OpenAI taps Cerebras for US$10 billion AI chip buildout

OpenAI has tapped Cerebras for a US$10 billion AI chip buildout. The collaboration aims to enhance the computing capabilities required for large language models (LLMs).

#Hardware

2026-01-18 • DigiTimes

Taiwan PCB industry gearing up for record investment driven by AI cloud computing

Taiwan's printed circuit board (PCB) industry is gearing up for record investments, driven by the increasing demand for cloud computing and artificial intelligence solutions. This influx of capital is expected to further strengthen Taiwan's position ...

2026-01-18 • DigiTimes

Advantest ATE lead times remain tight

Lead times for Advantest's automated test equipment (ATE) remain tight due to strong demand in the AI and memory markets. This situation reflects the growth of these sectors and the pressure on the semiconductor supply chain. Advantest's ability to m...

2026-01-18 • The Next Web

EU: Open digital ecosystems, more control over DMA and DSA, patent boom

The European Commission aims for open and interoperable digital ecosystems, marking a shift in tech regulation. In 2026, a stricter enforcement phase of the Digital Markets Act (DMA) and the Digital Services Act (DSA) will come into effect, with the ...

Local LLMs and On-Device AI

Related Coverage