Topic / Trend Rising

Open Source AI Development

The open-source AI community is thriving, with new models, tools, and techniques being developed and shared. This includes efforts to improve LLM performance, extend context windows, and create specialized models for various applications.

Detected: 2026-02-07 · Updated: 2026-02-07

Related Coverage

2026-02-07 LocalLLaMA

Kimi-Linear-48B-A3B & Step3.5-Flash are ready - llama.cpp

Releases of Kimi-Linear-48B-A3B and Step3.5-Flash compatible with llama.cpp are now available. Official GGUF files are not yet available, but the community is already working on them. The availability of these models expands options for loc...

#Hardware #LLM On-Premise #DevOps
2026-02-07 LocalLLaMA

Open-sourced exact attention kernel: 1M tokens in 1GB VRAM

Geodesic Attention Engine (GAE) is an open-source kernel that promises to drastically reduce memory consumption for large language models. With GAE, it's possible to handle 1 million tokens with only 1GB of VRAM, achieving significant energy savings ...

#Hardware #LLM On-Premise #DevOps
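
To put the headline claim in perspective, here is the back-of-the-envelope KV-cache arithmetic for a conventional attention implementation. The model shape below (32 layers, 8 KV heads via GQA, head dim 128, fp16) is an assumed 7B-class configuration, not GAE's actual test setup:

```python
# Rough KV-cache footprint for a 1M-token context with a conventional
# attention cache; this is what the "1M tokens in 1GB VRAM" claim is
# being compared against.

def kv_cache_bytes(tokens, layers=32, kv_heads=8, head_dim=128, dtype_bytes=2):
    """Bytes needed for keys + values across all layers."""
    per_token = 2 * layers * kv_heads * head_dim * dtype_bytes  # K and V
    return tokens * per_token

total = kv_cache_bytes(1_000_000)
print(f"{total / 2**30:.0f} GiB")  # ~122 GiB for a plain fp16 cache
```

Under these assumptions a standard cache needs two orders of magnitude more VRAM than GAE reportedly uses.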
2026-02-07 Phoronix

Mesa 25.3.5: Vulkan Driver Fixes & Minor Changes

Mesa 25.3.5 is now available, including fixes for the Vulkan driver and other minor improvements. This release is the latest stable version before the upcoming Mesa 26.0.

#Hardware #LLM On-Premise #DevOps
2026-02-07 ArXiv cs.AI

Artificial Intelligence as 'Strange Intelligence': Against Linear Models

A new study challenges the linear model of AI progress, introducing the concepts of 'familiar intelligence' and 'strange intelligence'. AI systems may combine superhuman capabilities with surprising errors, defying expectations and making their evalu...

#LLM On-Premise #DevOps
2026-02-07 LocalLLaMA

Nemo 30B: LLM with 1M Token Context Window on a Single RTX 3090

A user tested the Nemo 30B language model, achieving a context window of over 1 million tokens on a single RTX 3090 GPU. The user reported a speed of 35 tokens per second, sufficient to summarize books or research papers in minutes. The model was com...

#Hardware #LLM On-Premise #DevOps
2026-02-06 LocalLLaMA

Experimental Model with Subquadratic Attention: Up to 10M Context Length

A 30B experimental model with subquadratic attention mechanism has been released, scaling at O(L^(3/2)). It enables handling contexts up to 10 million tokens on a single GPU, maintaining practical decoding speeds. Includes an OpenAI-compatible server...

#Hardware #LLM On-Premise #DevOps
2026-02-06 LocalLLaMA

Hugging Face: Community-Driven LLM Benchmark Repositories

Hugging Face introduces benchmark repositories for community-driven LLM evaluations. The initiative aims to address inconsistencies in benchmark results, allowing users to contribute evaluations and directly link models to leaderboards. Verified resu...

#LLM On-Premise #DevOps
2026-02-06 Phoronix

Pushing The Intel Panther Lake CPU Performance Further On Linux

New Linux benchmarks examine the performance of Intel's Panther Lake Core Ultra X7 358H CPU with a higher power budget. The tests reveal significant generational improvements, particularly in energy efficiency, and confirm the excellent performance o...

#Hardware #LLM On-Premise #DevOps
2026-02-06 Phoronix

AMD Prepares the Ground for RDNA 4 GPUs with GFX1170 Target

AMD continues the development of its LLVM compiler stack for future GPUs. A new target, GFX1170, also identified as RDNA 4m, has been introduced. This update adds to the ongoing work on GFX1250 and GFX13 targets, expanding support for AMD's upcoming ...

#Hardware
2026-02-06 LocalLLaMA

llama.cpp integrates Kimi-Linear support: improved performance

The llama.cpp library has integrated support for Kimi-Linear, a technique that promises to improve the performance of language models. The integration was made possible by a pull request on GitHub, opening new possibilities for efficient inference.

#Hardware #LLM On-Premise #DevOps
2026-02-06 Phoronix

Linux: Dynamic CPU Management for Cloud and High-Frequency Trading

A new patch series for Dynamic Housekeeping and Enhanced Isolation (DHEI) has been proposed for Linux. The goal is to enable dynamic re-partitioning of CPU resources without downtime, benefiting cloud-native orchestrators and high-frequency trading p...

#LLM On-Premise #DevOps
2026-02-06 LocalLLaMA

LLM at 10 tokens/s on an 8th Gen i3: It Can Be Done!

A user demonstrates how to run a 16 billion parameter LLM on a 2018 HP ProBook laptop with an 8th generation Intel i3 processor and 16GB of RAM. By optimizing the use of the iGPU and leveraging MoE models, surprising inference speeds are achieved, op...

#Hardware #LLM On-Premise #DevOps
2026-02-06 LocalLLaMA

LLM Inference: DeepSpeed Optimization and Performance

A user shares an image related to optimizing the inference of large language models (LLM) using DeepSpeed. The image suggests an analysis of performance and configurations to improve the speed and efficiency in running these models.

#Hardware
2026-02-06 ArXiv cs.LG

Denoising Diffusion Networks for Normative Modeling in Neuroimaging

A new study explores the use of denoising diffusion models to estimate reference distributions in neuroimaging, enabling the derivation of clinically interpretable deviation scores. The models, based on different architectures, were evaluated on synt...

2026-02-06 LocalLLaMA

Qwen3-235B: User Praises Local Performance

A user shared their positive experience with the Qwen3-235B language model, running it on a desktop system. The user highlighted the model's accuracy and utility, to the point of preferring it over a commercial ChatGPT subscription.

#LLM On-Premise #DevOps
2026-02-06 LocalLLaMA

Qwen3-Coder: improved performance on RTX 5090 with llama.cpp

A user reported a significant throughput increase, up to 26 tokens/second, using the Qwen3-Coder-Next-Q4_K_S model with llama.cpp on an RTX 5090. The optimization was achieved by offloading MoE expert tensors to the CPU and quantizing the KV cache.

#Hardware #LLM On-Premise
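
A minimal sketch of the reported recipe, assuming a recent llama.cpp build. The model filename, context size, and `--n-cpu-moe` layer count are illustrative, and flag names (notably `-fa` and `--n-cpu-moe`) vary between llama.cpp versions, so check `llama-server --help` for your build:

```shell
# Keep attention on the GPU, push MoE expert tensors to the CPU, and
# quantize the KV cache to 8-bit so a longer context fits in VRAM.
llama-server -m Qwen3-Coder-Next-Q4_K_S.gguf \
  -ngl 99 \
  --n-cpu-moe 24 \
  -fa \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  -c 65536
```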
2026-02-06 LocalLLaMA

Tensor Parallelism in Llama.cpp: A Promising Update

A pull request introduces tensor parallelism in Llama.cpp, paving the way for faster and more efficient inference on large language models. The community welcomes this development, which could significantly improve performance on distributed hardware...

#Hardware #LLM On-Premise #DevOps
2026-02-05 LocalLLaMA

Gemma 4: Is Google still developing the language model?

The LocalLLaMA community is questioning the future of Gemma 4, wondering if Google is still investing in the development of the language model. Despite progress in the sector, the fate of Gemma 4 remains uncertain.

#LLM On-Premise #DevOps
2026-02-05 LocalLLaMA

SoproTTS v1.5: Zero-Shot Voice Cloning TTS for ~$100

SoproTTS v1.5 is a 135M parameter TTS (text-to-speech) model offering zero-shot voice cloning. Trained for approximately $100 on a single GPU, the model achieves around 20x real-time speed on a base MacBook M3 CPU. The new v1.5 version offers reduced...

#Hardware #LLM On-Premise #DevOps
2026-02-05 Ars Technica AI

OpenAI: GPT-5.3-Codex Extends Capabilities Beyond Just Writing Code

OpenAI has announced GPT-5.3-Codex, a new version of its advanced coding model, accessible via command line, IDE extension, web interface, and a new macOS desktop app. This model outperforms previous versions in benchmarks like SWE-Bench Pro and Term...

#LLM On-Premise #DevOps
2026-02-05 OpenAI Blog

GPT-5 lowers the cost of cell-free protein synthesis

An autonomous lab combining OpenAI’s GPT-5 with Ginkgo Bioworks’ cloud automation cut cell-free protein synthesis costs by 40% through closed-loop experimentation. This automated approach promises to accelerate biological research and reduce developm...

#LLM On-Premise #DevOps
2026-02-05 LocalLLaMA

New OCR Models: LightOnOCR-2 and GLM-OCR Improve Accuracy

LightOnOCR-2 and GLM-OCR, two new models for optical character recognition (OCR), have been released. A user reported superior performance compared to solutions available in late 2025, with GLM-OCR offering speed and reliable structured output.

2026-02-05 Phoronix

Intel Battlemage GPUs: D3cold Support Re-enabled with Linux 7.0 (Partially)

Intel's Xe graphics driver for Linux, starting with kernel 7.0, will re-enable D3cold support for Battlemage GPUs. This feature was disabled due to instability issues in power state transitions. The change will not apply to all systems, excluding spe...

#Hardware #LLM On-Premise #DevOps
2026-02-05 OpenAI Blog

GPT-5.3-Codex: New Model for Code Generation

GPT-5.3-Codex has been unveiled, an advanced model for code generation that combines the performance of GPT-5.2-Codex with superior reasoning and professional knowledge capabilities. The model positions itself as one of the most advanced of its kind.

#LLM On-Premise #DevOps
2026-02-05 LocalLLaMA

DeepBrainz-R1: Small Models for Agentic Workflows Released

DeepBrainz has released DeepBrainz-R1, a family of small language models (4B, 2B, 0.6B) focused on reasoning for agentic workflows. Optimized for multi-step reasoning and stability in tool-calling, these Apache 2.0 models aim to provide predictable b...

#LLM On-Premise #DevOps
2026-02-05 Phoronix

Debian Restricts CI Data Access Due to LLM Scrapers / Bot Traffic

Debian's continuous integration (CI) infrastructure has restricted public access to its data due to excessive scraping by bots used to train large language models (LLMs). The load generated by these scrapers has impacted web server resources.

#LLM On-Premise #DevOps
2026-02-05 LocalLLaMA

gWorld: 8B model beats 402B Llama 4 by generating web code

Trillion Labs and KAIST AI introduced gWorld, an open-weight visual world model for mobile GUIs. gWorld, available in 8B and 32B versions, generates executable web code instead of pixels, surpassing larger models like Llama 4 in accuracy. This approa...

#LLM On-Premise #Fine-Tuning #DevOps
2026-02-05 LocalLLaMA

Strix Halo benchmarks: 13 LLM models, 15 llama.cpp builds

A Reddit user benchmarked the Strix Halo's iGPU, testing various software configurations with 13 LLM models and 15 different llama.cpp builds. The aim was to evaluate the impact of ROCm, Vulkan, and various compilation options on inference performanc...

#Hardware #LLM On-Premise #DevOps
2026-02-05 Tom's Hardware

Nvidia DLSS 4.5: Ray Reconstruction without Denoisers?

Nvidia is reportedly developing DLSS 4.5, an advanced version of its upscaling technology that could eliminate the need for denoisers in ray tracing. This is thanks to a Transformer model that reconstructs ray-traced reflections more accurately.

#Hardware
2026-02-05 Phoronix

Intel Arc B390 Graphics Performance On Linux With Panther Lake

The first Linux benchmarks of the Intel Arc B390 GPU, integrated into high-end Panther Lake models, are in. The Xe3 GPU, equipped with 12 Xe cores, promises interesting performance in desktop and mobile environments for graphics and compute workloads.

#Hardware #LLM On-Premise #DevOps
2026-02-05 LocalLLaMA

vLLM-Omni: any-to-any multimodal inference with improved efficiency

The vLLM team introduced vLLM-Omni, a system designed for any-to-any multimodal models handling text, images, video, and audio. The architecture includes stage-based graph decomposition, per-stage batching, and flexible GPU allocation, achieving up t...

#Hardware #LLM On-Premise
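
The stage-based decomposition described can be sketched as a toy pipeline in which each stage batches whatever requests have reached it, independently of the other stages. The stage functions and names here are invented for illustration, not vLLM-Omni's API:

```python
# Each stage processes its queue as one batch, then forwards results to
# the next stage's queue. Per-stage batching lets a slow stage (say, a
# vision encoder) batch independently of a fast one (the LLM decoder).
from collections import deque

def run_pipeline(requests, stages):
    queues = [deque(requests)] + [deque() for _ in stages]
    for i, stage in enumerate(stages):
        batch = list(queues[i])          # the per-stage batch
        queues[i].clear()
        queues[i + 1].extend(stage(batch))
    return list(queues[-1])

encode = lambda batch: [f"enc({x})" for x in batch]   # stand-in encoder
decode = lambda batch: [f"dec({x})" for x in batch]   # stand-in decoder
print(run_pipeline(["a", "b"], [encode, decode]))
# ['dec(enc(a))', 'dec(enc(b))']
```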
2026-02-05 Phoronix

Krita 6.0 Beta Released: Qt6 & Wayland Color Management

The first beta of Krita 6.0, the featureful digital painting program, is now available, re-based on the Qt6 toolkit. A Krita 5.3 Beta is also being released for those sticking with Qt5. The update introduces improvements in color management and...

#LLM On-Premise #DevOps
2026-02-05 LocalLLaMA

AnyTTS: Universal Text-to-Speech for AI Chat Systems

A developer created AnyTTS, a system that allows using any text-to-speech (TTS) engine with various AI chat interfaces, including ChatGPT and local LLM models. The integration happens via the clipboard, simplifying TTS usage across platforms. Current...

#LLM On-Premise #DevOps
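
AnyTTS's internals are not published in the summary; this is a toy sketch of the clipboard-bridge idea, with `read_clipboard` and `speak` as injectable stand-ins so the loop is independent of any particular clipboard library (e.g. pyperclip) or TTS engine:

```python
# Poll a clipboard source and hand any *new* text to a TTS callback.
import time

def watch_clipboard(read_clipboard, speak, polls=None, interval=0.2):
    """Call speak(text) whenever the clipboard content changes."""
    last = None
    while polls is None or polls > 0:
        text = read_clipboard()
        if text and text != last:
            speak(text)
            last = text
        if polls is not None:
            polls -= 1
        time.sleep(interval)

# Usage with fake hooks (a real run would pass pyperclip.paste and a TTS call):
clips = iter(["hello", "hello", "world"])
spoken = []
watch_clipboard(lambda: next(clips), spoken.append, polls=3, interval=0)
print(spoken)  # ['hello', 'world'] -- the duplicate read is skipped
```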
2026-02-05 Tech.eu

Qontext Closes $2.7M Pre-Seed Round to Develop Context Layer for AI

Berlin-based Qontext, developing an independent context layer for AI, has secured $2.7 million in pre-seed funding. The company plans to expand its platform and team to develop reusable context infrastructure, enabling AI processes to operate on reli...

2026-02-05 Microsoft Research

Microsoft Paza: ASR benchmarks and models for low-resource languages

Microsoft introduces Paza, a project to improve automatic speech recognition (ASR) in low-resource languages. It includes PazaBench, an ASR leaderboard for 39 African languages, and Paza ASR models, optimized for six Kenyan languages. The initiative,...

#Fine-Tuning
2026-02-05 Phoronix

Linux 7.0: Improved Nouveau Support for Better NVK Performance

The Linux 6.19 merge window introduced support for larger pages and compression with the Nouveau kernel driver, aiming to improve the performance of open-source NVIDIA drivers. Initial issues disabled this functionality, but version 7.0 should resolv...

#Hardware #LLM On-Premise #DevOps
2026-02-05 ArXiv cs.CL

NLP for Automated Classification of CS Curriculum Materials

A new study explores the use of Natural Language Processing (NLP), including Large Language Models (LLM), to automatically classify pedagogical materials against computer science curriculum guidelines. The goal is to accelerate and simplify the proce...

#RAG
2026-02-05 ArXiv cs.LG

Reversible Deep Learning for 13C NMR in Chemoinformatics

A novel reversible deep learning model employs a conditional invertible neural network to link molecular structures and 13C NMR spectra. The network, built upon i-RevNet bijective blocks, enables spectrum prediction from structure and, conversely, th...
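
The bijective blocks such a network is built from can be illustrated with additive coupling, a standard invertible construction (a simplified stand-in for i-RevNet's actual blocks, not the paper's architecture):

```python
# Additive coupling: split the input in two and perturb one half with an
# arbitrary function of the other. Inversion is exact whatever f is.
import numpy as np

def f(x):            # any function works; it never needs to be inverted
    return np.tanh(3.0 * x) + x**2

def forward(x1, x2):
    return x2, x1 + f(x2)

def inverse(y1, y2):
    return y2 - f(y1), y1

x1, x2 = np.array([0.5, -1.2]), np.array([2.0, 0.3])
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
print(np.allclose(r1, x1) and np.allclose(r2, x2))  # True: exact reconstruction
```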

2026-02-05 ArXiv cs.AI

LLMs: Enhanced Reasoning for Mathematical Problem Solving

A new method, Iteratively Improved Program Construction (IIPC), enhances the mathematical reasoning capabilities of large language models (LLMs). IIPC iteratively refines programmatic reasoning chains, combining execution feedback with the Chain-of-t...

2026-02-05 ArXiv cs.AI

Knowledge Model Prompting Increases LLM Performance on Planning Tasks

A new study explores the effectiveness of the Task-Method-Knowledge (TMK) framework to enhance reasoning and planning capabilities of Large Language Models (LLMs). Results show that TMK-structured prompting can significantly increase accuracy on comp...

#LLM On-Premise #DevOps
2026-02-05 LocalLLaMA

Incomplete SOTA Models: The Disappointment of Tencent's Youtu-VL-4B

A user expressed frustration with Tencent's Youtu-VL-4B model, advertised as a state-of-the-art (SOTA) solution for various computer vision tasks. Despite the promises, the released code was found to be incomplete, with key features missing and hidde...

#DevOps
2026-02-05 LocalLLaMA

Codag: Visualize LLM Workflows in VSCode

A developer has created Codag, an open-source VSCode extension that visualizes LLM workflows directly within the development environment. It supports several frameworks such as OpenAI, Anthropic, Gemini, LangChain, LangGraph, and CrewAI, along with v...

2026-02-04 LocalLLaMA

Claude-Code: backend replaced with NVIDIA NIM for LLM inference

A user replaced Claude-Code's backend with NVIDIA NIM models, leveraging a free API for LLM inference. The modification includes using Telegram as an interface and preserves reasoning tokens between tool calls, enhancing performance with models like ...

#Hardware #LLM On-Premise #DevOps
2026-02-04 LocalLLaMA

Kimi K2.5: New Open-Weight Model Record on ECI

Kimi K2.5 sets a new record among open-weight models on the Epoch Capabilities Index (ECI), which combines multiple benchmarks onto a single scale. Its score of 147 is on par with models like o3, Grok 4, and Sonnet 4.5, while still lagging behind the...

#LLM On-Premise #DevOps
2026-02-04 Phoronix

Microsoft Develops LiteBox: A Rust-Based Sandboxing Library OS

Microsoft has announced LiteBox, a sandboxing operating system developed in Rust. Designed for security, LiteBox leverages Linux Virtualization Based Security (LVBS) to isolate the guest kernel through hardware virtualization, offering a protected en...

#Hardware #LLM On-Premise #DevOps
2026-02-04 LocalLLaMA

Qwen3-Coder-Next-FP8: A New King for Code Generation?

A Reddit user reported excellent performance of the Qwen3-Coder-Next-FP8 model. The discussion focuses on its code generation capabilities, suggesting a potential improvement over existing alternatives. The original article includes a link to an imag...

#Fine-Tuning
2026-02-04 Wired AI

Mistral AI's Ultra-Fast Translation Challenges Big AI Labs

French startup Mistral AI is taking a different approach from the large US labs, concentrating on the efficiency and translation speed of its models and on hardware resource optimization.

#Hardware #LLM On-Premise #DevOps
2026-02-04 LocalLLaMA

Vectorized fix for Qwen3Next in llama.cpp

A pull request on llama.cpp introduces a fix for the `key_gdiff` vectorized calculation in the Qwen3Next model. The change, initially reported on Reddit, aims to improve the model's accuracy and efficiency within the llama.cpp project.

#LLM On-Premise #DevOps
2026-02-04 Tom's Hardware

Bill Gates and software 'piracy': a 50-year-old open letter

In 1976, Bill Gates expressed concern about the unauthorized copying of Altair BASIC software by hobbyists. An open letter reveals the early challenges related to protecting intellectual property in the software world.

2026-02-04 Phoronix

Intel Driver Disables Vulkan Video Encode On Newer Hardware

Intel's ANV open-source Vulkan driver has temporarily disabled Vulkan Video encode support on newer graphics hardware. The decision was made due to insufficient testing, despite Vulkan Video's growing traction as a cross-vendor, cross-platform API fo...

#Hardware #LLM On-Premise #DevOps
2026-02-04 LocalLLaMA

Ollama under fire: a heated debate in the LocalLLaMA community

A recent thread on Reddit, within the LocalLLaMA community, has sparked a heated debate about the criticisms of Ollama, a framework for local execution of large language models (LLMs). The discussion focuses on alleged shortcomings and areas for impr...

#LLM On-Premise #DevOps
2026-02-04 LocalLLaMA

Intern-S1-Pro: A New Large Language Model

Intern-S1-Pro, a large language model (LLM) with approximately 1 trillion parameters, has been released. It appears to be a scaled version of the Qwen3-235B model, with an architecture based on 512 experts.

#Hardware #LLM On-Premise #DevOps
2026-02-04 LocalLLaMA

Qwen3-Coder-Next REAP: New 48B GGUF Model Released

A new 48 billion parameter Qwen3-Coder-Next REAP model has been released in GGUF format. This format facilitates the use of the model on various hardware platforms, making it accessible to a wide range of developers and researchers interested in expe...

#Hardware #LLM On-Premise #DevOps
2026-02-04 LocalLLaMA

GPT-4o and context: the challenge of long conversations

A user on r/LocalLLaMA reports "context rot" issues with GPT-4o in long conversations (over 15 turns) in a support agent. Sliding window and summarization strategies do not solve the problem. Context management remains an open challenge in the develo...

#LLM On-Premise #DevOps
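
The sliding-window-plus-summarization strategy the post says falls short can be sketched as follows; the summarizer is an injected stand-in (in practice it would be another LLM call), and the turn budget is arbitrary:

```python
# Keep the last `window` turns verbatim; everything older is collapsed
# into one running summary message. This bounds prompt size, but lossy
# summaries are exactly where the reported "context rot" creeps in.
def build_prompt(turns, summarize, window=6):
    if len(turns) <= window:
        return list(turns)
    old, recent = turns[:-window], turns[-window:]
    return [{"role": "system", "content": "Summary: " + summarize(old)}] + recent

turns = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
prompt = build_prompt(turns, lambda old: f"{len(old)} earlier turns", window=6)
print(len(prompt))           # 7: one summary message + six recent turns
print(prompt[0]["content"])  # Summary: 4 earlier turns
```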
2026-02-04 LocalLLaMA

Qwen3-Coder-Next: NVFP4 Quantization Released (45GB)

A quantized version of Qwen3-Coder-Next in NVFP4 format is now available, weighing 45GB. The model was calibrated using the ultrachat_200k dataset, with a 1.63% accuracy loss in the MMLU Pro+ benchmark.

#Hardware #LLM On-Premise #Fine-Tuning
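
NVFP4's block-scaled format is NVIDIA-specific, but the trade any 4-bit format makes can be illustrated with naive per-tensor 4-bit quantization; the random weights and the scheme below are toy stand-ins, not the released checkpoint's method:

```python
# Map float weights onto 16 evenly spaced levels and bound the error.
# Real FP4 schemes add per-block scales to keep this error much smaller.
import numpy as np

def quantize_4bit(w):
    lo, hi = w.min(), w.max()
    step = (hi - lo) / 15                # 2^4 - 1 intervals
    q = np.round((w - lo) / step)        # integer codes 0..15
    return q * step + lo                 # dequantized values

rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)
wq = quantize_4bit(w)
max_err = np.abs(w - wq).max()
step = (w.max() - w.min()) / 15
print(max_err <= step / 2 + 1e-6)  # True: round-to-nearest stays within half a step
```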
2026-02-04 ArXiv cs.CL

STEMVerse: A Framework for Evaluating STEM Reasoning in LLMs

A new study introduces STEMVerse, a diagnostic framework to analyze the science, technology, engineering, and mathematics (STEM) reasoning capabilities of large language models (LLMs). STEMVerse aims to overcome the limitations of current benchmarks,...

#LLM On-Premise #DevOps
2026-02-04 ArXiv cs.LG

LLMs to Augment Parameter-Efficient Fine-tuned Cybersecurity Models

A new study explores the use of large language models (LLMs) to enhance cybersecurity models. Strategies include using LLMs for data labeling and as fallback mechanisms for low-confidence predictions, combining parameter-efficient fine-tuning and pre...

#LLM On-Premise #Fine-Tuning #DevOps
2026-02-04 ArXiv cs.LG

UNSO: Unified Newton-Schulz Orthogonalization for Stable Performance

A novel approach, called UNSO (Unified Newton-Schulz Orthogonalization), aims to address efficiency and stability issues in the Newton-Schulz iteration, used in optimizers like Muon and on the Stiefel manifold. The method consolidates the iterative s...

2026-02-04 LocalLLaMA

Qwen-Coder-Next running on ROCm on Strix Halo: local testing

A user reported successfully running the Qwen-Coder-Next model on a Strix Halo platform using ROCm. The test was performed with llamacpp-rocm and a context size of 16k, opening new possibilities for running large language models locally.

#Hardware #LLM On-Premise #DevOps
2026-02-03 LocalLLaMA

ACE-Step-1.5: Open-Source Audio Generative Model Released

ACE-Step-1.5, an MIT-licensed open-source audio generative model, has been released. Its performance is close to commercial platforms like Suno. The model supports LoRAs and offers cover and repainting features. Hugging Face demos and ComfyUI integra...

#LLM On-Premise #Fine-Tuning #DevOps
2026-02-03 Ars Technica AI

Xcode 26.3 adds support for Claude, Codex via Model Context Protocol

Apple has announced Xcode 26.3, a new version of its IDE that supports agentic coding tools like Codex and Claude Agent. The integration is enabled via Model Context Protocol (MCP), allowing AI agents to interact with external tools and structured re...

#LLM On-Premise #DevOps
2026-02-03 LocalLLaMA

ACE-Step 1.5: The Open-Source Model Challenging Suno in Music Generation

ACE-Step 1.5, an open-source model for music generation, is now available. It promises to outperform Suno in quality, generating full songs in about 2 seconds on an A100 GPU and running locally on PCs with 4GB of VRAM. The code, weights, and training...

#Hardware #LLM On-Premise #Fine-Tuning
2026-02-03 LocalLLaMA

Qwen3-Coder-Next: New language model for programming

Qwen3-Coder-Next is available, a new language model developed for programming applications. The model is accessible via Hugging Face and related discussion is active on Reddit. This release represents a significant update in the field of language mod...

2026-02-03 LocalLLaMA

Qwen3-Coder-Next: new language model for programming

Qwen3-Coder-Next, a language model developed for programming applications, has been released on Hugging Face. Its availability on the platform facilitates access and integration by developers. The model promises to improve efficiency in software deve...

#LLM On-Premise #DevOps
2026-02-03 Phoronix

OpenIndiana Is Porting Solaris' IPS Package Management To Rust

OpenIndiana, the open-source project built atop Illumos that is continuing to maintain and advance the former OpenSolaris code, is working on modernizing the Image Packaging System (IPS) package management solution. As part of that, they are working ...

#LLM On-Premise #DevOps
2026-02-03 Phoronix

Reworked NTFS Linux Driver Posted With More Improvements & Fixes

A new version of the NTFS driver for Linux is available, based on the original code and aimed at delivering superior performance and new features. The goal is to provide a more efficient alternative for those who rely on this Microsoft file system.

#LLM On-Premise #DevOps
2026-02-03 LocalLLaMA

GLM releases open-source OCR model

GLM has released an open-source Optical Character Recognition (OCR) model. The model, named GLM-OCR, is available on Hugging Face. It appears to be composed of a 0.9 billion parameter vision model and a 0.5 billion parameter language model, suggestin...

#LLM On-Premise #DevOps
2026-02-03 LocalLLaMA

Qwen3-TTS Studio: Voice Cloning and Local Podcast Generation

A developer has built Qwen3-TTS Studio, an interface for voice cloning and automated podcast generation. The system supports 10 languages, runs voice synthesis locally, and can be integrated with local LLMs for script generation.

#LLM On-Premise #DevOps
2026-02-03 ArXiv cs.CL

MediGRAF: Hybrid Clinical AI for Safe Health Data Analysis

A new hybrid system, MediGRAF, combines knowledge graphs and LLMs to query patient health data. The system integrates structured and unstructured data, achieving 100% accuracy in factual answers and a high level of quality in complex inferences, with...

#Fine-Tuning #RAG
2026-02-03 ArXiv cs.AI

FastAPI and Triton Inference Server Benchmarking on Kubernetes

A new study compares FastAPI and NVIDIA Triton Inference Server for deploying machine learning models in healthcare, evaluating latency and throughput on Kubernetes. The analysis highlights the benefits of a hybrid approach to balance performance and...

#Hardware #LLM On-Premise #DevOps
2026-02-02 Phoronix

Firefox 148 Ready With New Settings For AI Controls

The upcoming Firefox 148 release will include a new AI controls area within the browser's settings. This follows concerns raised over comments by Mozilla's new CEO about evolving Firefox into a "modern AI browser".

#LLM On-Premise #DevOps
2026-02-02 TechCrunch AI

Firefox: Granular Control Over Generative AI Coming Soon

Firefox will introduce new settings in version 148 to control the generative AI features integrated into the browser. Users will be able to completely block these features, offering greater control over their browsing experience.

#LLM On-Premise #DevOps
2026-02-02 Ars Technica AI

OpenAI launches Codex desktop app for macOS, challenging Claude Code

OpenAI has released a macOS desktop app for Codex, its large language model (LLM)-based coding tool. This move aims to compete with Anthropic's Claude Code, offering an alternative to command-line interfaces (CLI) and IDE extensions.

#LLM On-Premise #DevOps
2026-02-02 OpenAI Blog

Codex: Centralized AI Development Environment for macOS

Codex is a new macOS application that acts as a command center for AI and software development. It allows managing multiple agents, parallel workflows, and long-running tasks, all within a single interface.

2026-02-02 Tom's Hardware

Ryzen 7 9850X3D: Factory Overclock of the 9800X3D?

Binning data from 13 Ryzen 7 9850X3D samples suggests the CPU is essentially a 9800X3D with higher voltages to achieve higher clock speeds. The 9850X3D's single-core performance advantage appears to stem primarily from this factory overclock.

#LLM On-Premise #DevOps
2026-02-02 DigiTimes

SMIC reportedly sets up advanced packaging research institute in Shanghai

Chinese semiconductor manufacturer SMIC has reportedly established a research institute in Shanghai focusing on the development of advanced packaging technologies. This strategic move aims to enhance production capabilities and innovation in the semi...

#Hardware #LLM On-Premise #DevOps
2026-02-02 Tech.eu

Incard closes £10M Series A to expand its financial platform

Incard, a financial platform for high-growth digital companies, has raised £10 million in Series A funding. The company plans to expand into new markets, enhance its product offering, and invest in automation, AI-driven financial workflows, and team ...

#LLM On-Premise #DevOps
2026-02-02 ArXiv cs.CL

MrRoPE: A Unified Approach to Extend LLM Context Window

A new study introduces MrRoPE, a generalized formulation for extending the context window of large language models (LLMs) based on a radix system conversion perspective. This approach unifies various existing strategies and introduces two training-fr...

#LLM On-Premise #Fine-Tuning #DevOps
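
MrRoPE's exact formulation is in the paper, but one of the training-free strategies this line of work builds on, linear position interpolation for RoPE, is easy to sketch. The head dimension, base, and lengths below are illustrative:

```python
# RoPE rotates channel pairs by theta_i * position. Linear position
# interpolation rescales positions by (train_len / new_len) so a longer
# sequence reuses the angle range the model was trained on.
import numpy as np

def rope_angles(positions, dim=8, base=10000.0, scale=1.0):
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)  # theta_i
    return np.outer(positions * scale, inv_freq)      # (seq_len, dim/2)

train_len, new_len = 2048, 8192
plain = rope_angles(np.arange(new_len))
interp = rope_angles(np.arange(new_len), scale=train_len / new_len)
# With scale 1/4, position 8188 lands exactly on trained position 2047:
print(np.allclose(interp[8188], plain[2047]))  # True
```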
2026-02-02 ArXiv cs.LG

Emotion Recognition: Domain Knowledge Outperforms Transformers

A study on the EAV dataset reveals that, for multimodal emotion recognition on small datasets, complex attention mechanisms (Transformers) underperform compared to modifications based on domain knowledge. Adding delta MFCCs to the audio CNN improves ...

#LLM On-Premise #Fine-Tuning #DevOps
2026-02-02 LocalLLaMA

Step-3.5-Flash: outperforms with fewer parameters

The Step-3.5-Flash model, with a reduced active parameter architecture (11B out of 196B total), demonstrates superior performance compared to DeepSeek v3.2 in coding and agent benchmarks. DeepSeek v3.2 uses an architecture with many more active param...

#Hardware #LLM On-Premise #DevOps
2026-02-01 Phoronix

Linux 6.19: Stable Release Delayed Due to Holidays

The stable release of the Linux 6.19 kernel has been delayed by a week due to the year-end holiday period. Version 6.19-rc8 is now available, with the stable version expected next week. This delay is not due to critical bugs, but rather the need to m...

2026-02-01 LocalLLaMA

AIDA: Pentesting platform with AI control and 400+ tools

A developer has created AIDA, an open-source pentesting platform that allows an AI agent to control over 400 security tools. The AI can execute tools, chain attacks, and document findings, all through a Docker container and a web dashboard.

#LLM On-Premise #DevOps
2026-02-01 Phoronix

GNOME Resources 1.10 Adds Monitoring Support For AMD Ryzen AI NPUs

GNOME Resources 1.10, the newest version of this system monitoring app, introduces monitoring support for AMD Ryzen AI NPUs. This application is now used by default on distributions like the upcoming Ubuntu 26.04 LTS. The update also includes other u...

#Hardware
2026-02-01 LocalLLaMA

OLMO 3.5: Hybrid Model for Efficient LLM Inference Coming Soon

AI2's OLMO 3.5 model combines standard transformer attention with linear attention using Gated Deltanet. This hybrid approach aims to improve efficiency and reduce memory usage while maintaining model quality. The OLMO series is fully open source, fr...

#Fine-Tuning
2026-02-01 LocalLLaMA

Falcon-H1-Tiny: Specialized Micro-Models at 90M Parameters

TII releases Falcon-H1-Tiny, a series of sub-100M parameter models challenging the scaling dogma. These specialized models exhibit a lower tendency to hallucinate compared to larger, general-purpose models. Specialized variants offer competitive perf...

#Hardware #LLM On-Premise #Fine-Tuning
2026-02-01 LocalLLaMA

Uncensored LLM Models Available on Hugging Face

An overview of uncensored large language models (LLM) available on the Hugging Face platform. The list includes variants of GLM, GPT OSS, Gemma, and Qwen, with different methods of removing restrictions. The article provides direct links to the model...

#LLM On-Premise #DevOps
2026-02-01 Phoronix

Phoronix: Linux Kernel, ReactOS Developments & AMD Ryzen 7 Topped January

A recap of the most popular news and reviews published on Phoronix in January. The focus is on Linux kernel developments, progress in the ReactOS operating system, and analysis of the AMD Ryzen 7 9850X3D CPU. Phoronix published nearly 300 articles du...

#Hardware #LLM On-Premise #DevOps
2026-02-01 LocalLLaMA

vLLM-MLX on Apple Silicon: Up to 87% Higher Throughput

Recent research compares the performance of vLLM-MLX on Apple Silicon with llama.cpp, highlighting significantly higher throughput. The results suggest potential advantages in using Apple hardware for local inference of large language models (LLMs).

#LLM On-Premise #DevOps
2026-02-01 LocalLLaMA

Kanade Tokenizer: real-time voice cloning on CPU

A developer has presented Kanade Tokenizer, a voice cloning tool optimized for speed, with a real-time factor better than RVC's. It also runs on CPU. A fork with a GUI based on Gradio and Tkinter is available.

#LLM On-Premise #DevOps
2026-02-01 LocalLLaMA

Can 4chan data REALLY improve a model? Turns out it can!

An experiment showed how training a language model on a dataset derived from 4chan led to unexpected results. The model, Assistant_Pepe_8B, outperformed NVIDIA's Nemotron base model, despite being trained on data considered to be of lower quality. Th...

#Hardware #LLM On-Premise #Fine-Tuning
2026-02-01 DigiTimes

LMOC ramps silicon photonics output with new MOCVD expansion plan

LandMark Optoelectronics is expanding its silicon photonics component production by adding new MOCVD reactors. The expansion aims to meet the growing demand for high-speed interconnects in data centers and artificial intelligence applications.

#LLM On-Premise #DevOps
2026-02-01 LocalLLaMA

NanoChat: Beating GPT-2 for Under $100

Andrej Karpathy demonstrated how to surpass GPT-2's performance with a model called NanoChat, trained in just three hours on 8 H100 GPUs. The project includes details on the architecture, optimizers used, data setup, and a script for reproducing the ...

#Hardware #LLM On-Premise #DevOps
2026-02-01 Phoronix

Linux 7.0: Per-CPU Caching with Sheaves for Improved Performance

Linux 7.0 is preparing to introduce significant improvements in per-CPU cache management, thanks to the integration of 'sheaves'. This technology, already present in optional form since version 6.18, aims to progressively replace traditional CPU slab...

2026-01-31 LocalLLaMA

g-HOOT: A New Research Paper in the World of AI

A new research paper on arXiv, called "g-HOOT in the Machine", has caught the attention of the LocalLLaMA community. It promises to explore new frontiers in the field of artificial intelligen...
