Topic / Trend Rising

Local LLMs and On-Device AI

This trend focuses on the development and use of large language models (LLMs) that can run locally on consumer hardware, enabling privacy and reducing reliance on cloud services. It also covers the challenges and optimizations required to run these models efficiently on devices with limited resources.

Detected: 2026-01-25 · Updated: 2026-01-25

Related Coverage

2026-01-24 LocalLLaMA

LLM: Which local model on 24GB GPU in 2026?

A LocalLLaMA user is wondering about the evolution of large language models (LLMs) that can be run locally. Specifically, they ask whether, nine months after the release of Gemma 3 27b, there are better alternatives available that can run on a single 3090t...

#Hardware
2026-01-24 TechCrunch AI

Tech CEOs boast and bicker about AI at Davos

This week's World Economic Forum meeting saw tech leaders hotly debating artificial intelligence. The event transformed, at times, into a high-powered tech conference, with CEOs clashing over future visions and strategies.

2026-01-24 LocalLLaMA

Local LLM Development: A Challenge for Hardware Coders?

A hardware coder has expressed frustration with the performance of large language models (LLMs) running locally on a 5090 GPU. Despite the powerful hardware, the models seem underutilized and unable to leverage external tools to improve context. The ...

#Hardware #LLM On-Premise
2026-01-24 LocalLLaMA

Context Engine: Self-Hosted Code Search for LLMs

A developer has created Context Engine, a self-hosted retrieval system for codebases, designed to work with various MCP clients. It uses a hybrid search that combines dense embeddings with lexical search and AST parsing. The goal is to avoid overload...

#LLM On-Premise #DevOps #RAG
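The hybrid retrieval described above fuses dense-embedding results with lexical search. A common way to merge such result lists is Reciprocal Rank Fusion (RRF); the sketch below is illustrative only (the file names and ranked lists are invented, and this is not Context Engine's actual API):

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked result lists.

    Each ranking is a list of document ids, best first. A document's
    fused score is the sum of 1 / (k + rank) over the rankings that
    contain it; k=60 is the commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Toy example: one ranking from a dense (embedding) retriever,
# one from a lexical (keyword) retriever.
dense = ["parser.py", "ast_utils.py", "search.py"]
lexical = ["search.py", "parser.py", "index.py"]
print(rrf_fuse([dense, lexical]))
# → ['parser.py', 'search.py', 'ast_utils.py', 'index.py']
```

Documents ranked highly by both retrievers float to the top, which is why hybrid setups tolerate weaknesses of either retriever alone.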
2026-01-24 LocalLLaMA

Strix Halo: MiniMax Q3 K_XL Runs Surprisingly Fast

A user tested Strix Halo (a Bosgame M5 with 128GB) on Ubuntu 25.10, achieving remarkable results with the MiniMax Q3 K_XL model. In particular, a text-generation (TG) speed of approximately 30 tokens per second makes the model usable for brainstorming and discu...

2026-01-23 Tom's Hardware

Alibaba plans T-Head chip arm IPO to boost AI infrastructure

Alibaba is reportedly preparing an IPO for its chip manufacturing arm, T-Head. The primary goal is to raise significant capital to fund the development of AI accelerator solutions and support ambitious infrastructure projects. T-Head would compete wi...

2026-01-22 LocalLLaMA

Kimi-Linear-48B: GGUF Support and llama.cpp Integration

The implementation of Kimi-Linear-48B in llama.cpp is being discussed online, given its effectiveness in handling long contexts. The community is wondering about the timeline for the model's integration, which promises significant performance improve...

#Hardware #LLM On-Premise
2026-01-21 Phoronix

AMD Sends Out Linux Patches For Next-Gen EPYC Features

AMD has sent out a set of 19 patches to the Linux kernel mailing list, preparing for new CPU features expected in the next-generation EPYC "Venice" processors. These patches suggest enhancements in memory management and security, indicating a focus o...

#Hardware
2026-01-21 LocalLLaMA

Fine-tuned Qwen3-14B on DeepSeek Traces: +20% Security Boost

A researcher fine-tuned the Qwen3-14B language model using 10,000 DeepSeek traces, achieving a 20% performance increase on a custom security benchmark. This demonstrates how fine-tuning smaller models with specific datasets can be a viable and more c...

2026-01-21 LocalLLaMA

File Brain: Open-Source Local Semantic Search for Your Documents

File Brain is an open-source search engine that indexes local files and allows searching using natural language. It supports multilingual semantic search, built-in OCR, and is available for Windows and Linux. The goal is to overcome the limitations o...
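The embed-and-rank pattern behind tools like this can be sketched with a toy stand-in embedding. Everything below is illustrative: real semantic search (File Brain included) uses multilingual sentence-embedding models, not the bag-of-words vectors used here, and the file names are invented.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words term-frequency vector.
    A real tool would call a multilingual embedding model here."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, docs):
    """Rank file paths by similarity of their contents to the query."""
    q = embed(query)
    return sorted(docs, key=lambda path: cosine(q, embed(docs[path])), reverse=True)

docs = {
    "notes/tax_2025.txt": "income tax return deadline and deductions",
    "recipes/pasta.txt": "tomato basil pasta recipe",
}
print(search("when is my tax return due", docs))
# → ['notes/tax_2025.txt', 'recipes/pasta.txt']
```

Swapping the toy `embed` for a real model is the only structural change needed: the index-then-rank-by-cosine loop stays the same.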

2026-01-21 Tom's Hardware

OpenAI commits to AI data centers with no impact on energy bills

OpenAI is committed to ensuring that electricity prices do not increase in the communities where it builds its Stargate data centers. The company will fund grid upgrades and flexible load management systems to reduce stress on the energy supply. The ...

2026-01-20 LocalLLaMA

LocalLLaMA: The unstoppable rise of local language models

A Reddit post on LocalLLaMA highlights the surprising capabilities of language models running locally. The discussion emphasizes how these models, while running on consumer hardware, demonstrate a context understanding and responsiveness that often...

#Hardware
2026-01-19 LocalLLaMA

GLM-4.7 flash: how to run it with llama.cpp?

A user inquires about the possibility of running the new GLM 4.7 flash model with llama.cpp or similar tools. The question was posted on a forum dedicated to local language models (LocalLLaMA), awaiting responses from the community of developers and ...

#Hardware #LLM On-Premise
2026-01-19 TechCrunch AI

US AI startups raise record funding in 2025

2025 was a pivotal year for the AI industry in the US and beyond. It remains to be seen whether 2026 will be equally positive. Analysis reveals that numerous AI startups have raised over $100 million in funding, marking an unprecedented wave of inves...

2026-01-19 LocalLLaMA

Nvidia GB10 vs GH200: early performance benchmarks

Early benchmarks comparing the performance of Nvidia's GB10 GPU with the GH200 have surfaced online. The data, originating from a Reddit source, offer a preview of the potential of Nvidia's new architecture, although they should be taken with cautio...

#Hardware
2026-01-19 LocalLLaMA

Z-AI (GLM): Devs Woke Up And Chose Violence

Z-AI (GLM) developers have reportedly adopted an 'aggressive' development strategy. A Reddit post highlights this choice, suggesting direct competition with other teams, particularly those at Qwen. The online discussion focuses on the implications of...

2026-01-19 LocalLLaMA

GLM 4.7 Flash Released: Massive Benchmark Gains?

GLM 4.7 Flash has been released. The open-source community is questioning the potential performance gains compared to Qwen 30b, with a focus on benchmarks. For now, no objective benchmark data is available to support the claims.

#Fine-Tuning
2026-01-19 LocalLLaMA

GLM-4.7-Flash: New Open-Source Language Model on Hugging Face

The GLM-4.7-Flash language model is now available on Hugging Face. The news was shared on Reddit, sparking discussion within the LocalLLaMA community. The open-source model promises new opportunities for developing generative artificial intelligence ...

2026-01-19 LocalLLaMA

On-device browser agent with Qwen: local demo on Chrome

A new demo showcases a local browser agent, powered by Liquid's LFM and Alibaba's Qwen models running via WebGPU, packaged as a Chrome extension. The agent opens 'All in Podcast' on YouTube. The source code is available on GitHub for those interested in exploring ...

#Hardware
2026-01-19 LocalLLaMA

Top-K: Optimized Algorithm Up to 20x Faster Than PyTorch

A developer has created an optimized Top-K implementation, crucial for sampling in large language models (LLMs). The AVX2 implementation outperforms PyTorch's CPU performance by 4-20x, depending on vocabulary size. Integration into llama.cpp r...

#Hardware #LLM On-Premise
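For reference, the operation the AVX2 kernel accelerates — keeping the k highest logits and sampling from just those — can be sketched in pure Python. This is a readable reference version, not the optimized implementation from the post:

```python
import heapq
import math
import random

def top_k_sample(logits, k, rng=random):
    """Sample a token id with top-k filtering: keep the k largest
    logits, softmax over only those, and draw from that distribution.
    Optimized kernels (AVX2, GPU) speed up exactly this selection
    step, which is expensive over vocabularies of 100k+ entries."""
    # Select the k (index, logit) pairs with the highest logits.
    top = heapq.nlargest(k, enumerate(logits), key=lambda p: p[1])
    # Softmax over the surviving logits (subtract max for stability).
    m = max(logit for _, logit in top)
    weights = [math.exp(logit - m) for _, logit in top]
    ids = [i for i, _ in top]
    return rng.choices(ids, weights=weights, k=1)[0]

logits = [0.1, 2.5, -1.0, 3.2, 0.0]
print(top_k_sample(logits, k=2))  # one of the two highest-logit ids: 1 or 3
```

The selection (`nlargest`) dominates the cost for large vocabularies, which is why it is the part worth vectorizing.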
2026-01-19 LocalLLaMA

Free GPU Credits to Test LLM Training Platform

A small team is offering free compute credits for its GPU platform, in exchange for usage feedback. Available GPUs include RTX 5090 and Pro 6000, suitable for LLM inference, fine-tuning, or other machine learning workloads.

#Hardware #Fine-Tuning
2026-01-19 The Register AI

Open source's new mission: Rebuild a continent's tech stack

Europe, known for its tightly regulated tech sector, could find in open source a way to rebuild and strengthen its technological infrastructure. The adoption of open solutions could foster innovation and reduce dependence on external suppliers, promo...

2026-01-19 DigiTimes

TSMC eyes rapid 2nm growth in 2026

Taiwanese giant TSMC anticipates strong expansion of its 2nm production starting in 2026, backed by substantial investments and the expansion of its manufacturing capabilities in both Taiwan and the United States. This strategic move aims to solidify...

2026-01-19 LocalLLaMA

Local LLM Coding: Is it Still Worth it with a 16GB GPU?

A user with a 16GB Nvidia RTX 5070 Ti GPU questions the effectiveness of local large language model (LLM) development. Experience with Kilo Code and Qwen 2.5 Coder 7B via Ollama revealed context-management issues: the context window quickly runs out even wit...

#Hardware #LLM On-Premise
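Context "running out" on a 16GB card is largely a KV-cache budget problem, which a back-of-envelope formula makes concrete. The model dimensions below are illustrative for a 7B-class model with grouped-query attention, not the exact configuration of the model in the post:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV-cache size: keys plus values (factor 2), one
    entry per layer, KV head, head dimension, and context position.
    bytes_per_elem=2 assumes an fp16/bf16 cache; quantized caches
    are proportionally smaller."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Illustrative dims, roughly a 7B-class model with grouped-query attention:
gib = kv_cache_bytes(n_layers=28, n_kv_heads=4, head_dim=128, context_len=32768) / 2**30
print(f"{gib:.2f} GiB")  # → 1.75 GiB
```

That cache comes on top of the model weights themselves, so on a 16GB card the usable context shrinks quickly as quantization gets less aggressive.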
2026-01-19 Wired AI

The Race to Build the DeepSeek of Europe Is On

As Europe’s longstanding alliance with the US falters, its push to become a self-sufficient AI superpower has become more urgent. The goal is to create a European alternative to advanced models like DeepSeek, reducing technological dependence on othe...

2026-01-19 The Register AI

Hiring Stalls at India’s Big Four Outsourcers Amid AI Impact

India’s big four outsourcers – HCL, Infosys, TCS and Wipro – have essentially stopped hiring, potentially due to increased AI adoption. Revenue growth is also sluggish. This slowdown reflects a significant shift in the IT services landscape.

2026-01-19 ArXiv cs.CL

Conversational Agents: Does Conciseness Reduce Expertise?

A new study analyzes the unexpected side effects of using specific stylistic features in prompts for conversational agents based on large language models (LLMs). The research reveals how prompting for conciseness can compromise the perceived expertis...

#Fine-Tuning
2026-01-19 ArXiv cs.AI

LLMs: How Do They Assess Trustworthiness of Online Information?

Large language models (LLMs) are increasingly important in online search and recommendation systems. New research analyzes how these models encode perceived trustworthiness in web narratives, revealing that models internalize psychologically grounded...

#Fine-Tuning
2026-01-18 DigiTimes

Advantest ATE lead times remain tight

Lead times for Advantest's automated test equipment (ATE) remain tight due to strong demand in the AI and memory markets. This situation reflects the growth of these sectors and the pressure on the semiconductor supply chain. Advantest's ability to m...
