Topic / Trend Rising

Open Source AI Development

The open-source AI community is thriving, with new models, tools, and techniques being developed and shared. This includes efforts to improve LLM performance, extend context windows, and create specialized models for various applications.

Detected: 2026-02-07 · Updated: 2026-02-07

Related Coverage

2026-02-07 LocalLLaMA

Kimi-Linear-48B-A3B & Step3.5-Flash are ready - llama.cpp

Releases of Kimi-Linear-48B-A3B and Step3.5-Flash compatible with llama.cpp are now available. Official GGUF files are not yet available, but the community is already working on them. The availability of these models expands options for loc...

#Hardware #LLM On-Premise #DevOps
2026-02-07 LocalLLaMA

Open-sourced exact attention kernel: 1M tokens in 1GB VRAM

Geodesic Attention Engine (GAE) is an open-source kernel that promises to drastically reduce memory consumption for large language models. With GAE, it's possible to handle 1 million tokens with only 1GB of VRAM, achieving significant energy savings ...

#Hardware #LLM On-Premise #DevOps
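
To put the headline claim in perspective, here is the back-of-the-envelope KV-cache arithmetic for a conventional attention implementation. The model shape below (32 layers, 8 KV heads via GQA, head dim 128, fp16) is an assumed 7B-class configuration, not GAE's actual test setup:

```python
# Rough KV-cache footprint for a 1M-token context with a conventional
# attention cache; this is what the "1M tokens in 1GB VRAM" claim is
# being compared against.

def kv_cache_bytes(tokens, layers=32, kv_heads=8, head_dim=128, dtype_bytes=2):
    """Bytes needed for keys + values across all layers."""
    per_token = 2 * layers * kv_heads * head_dim * dtype_bytes  # K and V
    return tokens * per_token

total = kv_cache_bytes(1_000_000)
print(f"{total / 2**30:.0f} GiB")  # ~122 GiB for a plain fp16 cache
```

Under these assumptions a standard cache needs two orders of magnitude more VRAM than GAE reportedly uses.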
2026-02-07 Phoronix

Mesa 25.3.5: Vulkan Driver Fixes & Minor Changes

Mesa 25.3.5 is now available, including fixes for the Vulkan driver and other minor improvements. This release is the latest stable version before the upcoming Mesa 26.0.

#Hardware #LLM On-Premise #DevOps
2026-02-07 ArXiv cs.AI

Artificial Intelligence as 'Strange Intelligence': Against Linear Models

A new study challenges the linear model of AI progress, introducing the concepts of 'familiar intelligence' and 'strange intelligence'. AI systems may combine superhuman capabilities with surprising errors, defying expectations and making their evalu...

#LLM On-Premise #DevOps
2026-02-07 LocalLLaMA

Nemo 30B: LLM with 1M Token Context Window on a Single RTX 3090

A user tested the Nemo 30B language model, achieving a context window of over 1 million tokens on a single RTX 3090 GPU. The user reported a speed of 35 tokens per second, sufficient to summarize books or research papers in minutes. The model was com...

#Hardware #LLM On-Premise #DevOps
2026-02-06 LocalLLaMA

Experimental Model with Subquadratic Attention: Up to 10M Context Length

A 30B experimental model with subquadratic attention mechanism has been released, scaling at O(L^(3/2)). It enables handling contexts up to 10 million tokens on a single GPU, maintaining practical decoding speeds. Includes an OpenAI-compatible server...

#Hardware #LLM On-Premise #DevOps
2026-02-06 LocalLLaMA

Hugging Face: Community-Driven LLM Benchmark Repositories

Hugging Face introduces benchmark repositories for community-driven LLM evaluations. The initiative aims to address inconsistencies in benchmark results, allowing users to contribute evaluations and directly link models to leaderboards. Verified resu...

#LLM On-Premise #DevOps
2026-02-06 Phoronix

Pushing The Intel Panther Lake CPU Performance Further On Linux

New Linux benchmarks examine the performance of Intel's Panther Lake Core Ultra X7 358H CPU with a higher power budget. The tests reveal significant generational improvements, particularly in energy efficiency, and confirm the excellent performance o...

#Hardware #LLM On-Premise #DevOps
2026-02-06 Phoronix

AMD Prepares the Ground for RDNA 4 GPUs with GFX1170 Target

AMD continues the development of its LLVM compiler stack for future GPUs. A new target, GFX1170, also identified as RDNA 4m, has been introduced. This update adds to the ongoing work on GFX1250 and GFX13 targets, expanding support for AMD's upcoming ...

#Hardware
2026-02-06 LocalLLaMA

llama.cpp integrates Kimi-Linear support: improved performance

The llama.cpp library has integrated support for Kimi-Linear, a technique that promises to improve the performance of language models. The integration was made possible by a pull request on GitHub, opening new possibilities for efficient inference.

#Hardware #LLM On-Premise #DevOps
2026-02-06 Phoronix

Linux: Dynamic CPU Management for Cloud and High-Frequency Trading

A new patch series for Dynamic Housekeeping and Enhanced Isolation (DHEI) has been proposed for Linux. The goal is to enable dynamic re-partitioning of CPU resources without downtime, benefiting cloud-native orchestrators and high-frequency trading p...

#LLM On-Premise #DevOps
2026-02-06 LocalLLaMA

LLM at 10 tokens/s on an 8th Gen i3: It Can Be Done!

A user demonstrates how to run a 16 billion parameter LLM on a 2018 HP ProBook laptop with an 8th generation Intel i3 processor and 16GB of RAM. By optimizing the use of the iGPU and leveraging MoE models, surprising inference speeds are achieved, op...

#Hardware #LLM On-Premise #DevOps
2026-02-06 LocalLLaMA

LLM Inference: DeepSpeed Optimization and Performance

A user shares an image related to optimizing the inference of large language models (LLM) using DeepSpeed. The image suggests an analysis of performance and configurations to improve the speed and efficiency in running these models.

#Hardware
2026-02-06 ArXiv cs.LG

Denoising Diffusion Networks for Normative Modeling in Neuroimaging

A new study explores the use of denoising diffusion models to estimate reference distributions in neuroimaging, enabling the derivation of clinically interpretable deviation scores. The models, based on different architectures, were evaluated on synt...

2026-02-06 LocalLLaMA

Qwen3-235B: User Praises Local Performance

A user shared their positive experience with the Qwen3-235B language model, running it on a desktop system. The user highlighted the model's accuracy and utility, to the point of preferring it over a commercial ChatGPT subscription.

#LLM On-Premise #DevOps
2026-02-06 LocalLLaMA

Qwen3-Coder: improved performance on RTX 5090 with llama.cpp

A user reported a significant throughput increase, up to 26 tokens/second, using the Qwen3-Coder-Next-Q4_K_S model with llama.cpp on an RTX 5090. The optimization was achieved by offloading MoE expert tensors to the CPU and quantizing the KV cache.

#Hardware #LLM On-Premise
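
A minimal sketch of the reported recipe, assuming a recent llama.cpp build. The model filename, context size, and `--n-cpu-moe` layer count are illustrative, and flag names (notably `-fa` and `--n-cpu-moe`) vary between llama.cpp versions, so check `llama-server --help` for your build:

```shell
# Keep attention on the GPU, push MoE expert tensors to the CPU, and
# quantize the KV cache to 8-bit so a longer context fits in VRAM.
llama-server -m Qwen3-Coder-Next-Q4_K_S.gguf \
  -ngl 99 \
  --n-cpu-moe 24 \
  -fa \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  -c 65536
```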
2026-02-06 LocalLLaMA

Tensor Parallelism in Llama.cpp: A Promising Update

A pull request introduces tensor parallelism in Llama.cpp, paving the way for faster and more efficient inference on large language models. The community welcomes this development, which could significantly improve performance on distributed hardware...

#Hardware #LLM On-Premise #DevOps
2026-02-05 LocalLLaMA

Gemma 4: Is Google still developing the language model?

The LocalLLaMA community is questioning the future of Gemma 4, wondering if Google is still investing in the development of the language model. Despite progress in the sector, the fate of Gemma 4 remains uncertain.

#LLM On-Premise #DevOps
2026-02-05 LocalLLaMA

SoproTTS v1.5: Zero-Shot Voice Cloning TTS for ~$100

SoproTTS v1.5 is a 135M parameter TTS (text-to-speech) model offering zero-shot voice cloning. Trained for approximately $100 on a single GPU, the model achieves around 20x real-time speed on a base MacBook M3 CPU. The new v1.5 version offers reduced...

#Hardware #LLM On-Premise #DevOps
2026-02-05 Ars Technica AI

OpenAI: GPT-5.3-Codex Extends Capabilities Beyond Just Writing Code

OpenAI has announced GPT-5.3-Codex, a new version of its advanced coding model, accessible via command line, IDE extension, web interface, and a new macOS desktop app. This model outperforms previous versions in benchmarks like SWE-Bench Pro and Term...

#LLM On-Premise #DevOps
2026-02-05 OpenAI Blog

GPT-5 lowers the cost of cell-free protein synthesis

An autonomous lab combining OpenAI’s GPT-5 with Ginkgo Bioworks’ cloud automation cut cell-free protein synthesis costs by 40% through closed-loop experimentation. This automated approach promises to accelerate biological research and reduce developm...

#LLM On-Premise #DevOps
2026-02-05 LocalLLaMA

New OCR Models: LightOnOCR-2 and GLM-OCR Improve Accuracy

LightOnOCR-2 and GLM-OCR, two new models for optical character recognition (OCR), have been released. A user reported superior performance compared to solutions available in late 2025, with GLM-OCR offering speed and reliable structured output.

2026-02-05 Phoronix

Intel Battlemage GPUs: D3cold Support Re-enabled with Linux 7.0 (Partially)

Intel's Xe graphics driver for Linux, starting with kernel 7.0, will re-enable D3cold support for Battlemage GPUs. This feature was disabled due to instability issues in power state transitions. The change will not apply to all systems, excluding spe...

#Hardware #LLM On-Premise #DevOps
2026-02-05 OpenAI Blog

GPT-5.3-Codex: New Model for Code Generation

GPT-5.3-Codex has been unveiled, an advanced model for code generation that combines the performance of GPT-5.2-Codex with superior reasoning and professional knowledge capabilities. The model positions itself as one of the most advanced of its kind.

#LLM On-Premise #DevOps
2026-02-05 LocalLLaMA

DeepBrainz-R1: Small Models for Agentic Workflows Released

DeepBrainz has released DeepBrainz-R1, a family of small language models (4B, 2B, 0.6B) focused on reasoning for agentic workflows. Optimized for multi-step reasoning and stability in tool-calling, these Apache 2.0 models aim to provide predictable b...

#LLM On-Premise #DevOps
2026-02-05 Phoronix

Debian Restricts CI Data Access Due to LLM Scrapers / Bot Traffic

Debian's continuous integration (CI) infrastructure has restricted public access to its data due to excessive scraping by bots used to train large language models (LLMs). The load generated by these scrapers has impacted web server resources.

#LLM On-Premise #DevOps
2026-02-05 LocalLLaMA

gWorld: 8B model beats 402B Llama 4 by generating web code

Trillion Labs and KAIST AI introduced gWorld, an open-weight visual world model for mobile GUIs. gWorld, available in 8B and 32B versions, generates executable web code instead of pixels, surpassing larger models like Llama 4 in accuracy. This approa...

#LLM On-Premise #Fine-Tuning #DevOps
2026-02-05 LocalLLaMA

Strix Halo benchmarks: 13 LLM models, 15 llama.cpp builds

A Reddit user benchmarked the Strix Halo's iGPU, testing various software configurations with 13 LLM models and 15 different llama.cpp builds. The aim was to evaluate the impact of ROCm, Vulkan, and various compilation options on inference performanc...

#Hardware #LLM On-Premise #DevOps
2026-02-05 Tom's Hardware

Nvidia DLSS 4.5: Ray Reconstruction without Denoisers?

Nvidia is reportedly developing DLSS 4.5, an advanced version of its upscaling technology that could eliminate the need for denoisers in ray tracing. This is thanks to a Transformer model that reconstructs ray-traced reflections more accurately.

#Hardware
2026-02-05 Phoronix

Intel Arc B390 Graphics Performance On Linux With Panther Lake

The first Linux benchmarks of the Intel Arc B390 GPU, integrated into high-end Panther Lake models, are in. The Xe3 GPU, equipped with 12 Xe cores, promises interesting performance in desktop and mobile environments for graphics and compute workloads.

#Hardware #LLM On-Premise #DevOps
2026-02-05 LocalLLaMA

vLLM-Omni: any-to-any multimodal inference with improved efficiency

The vLLM team introduced vLLM-Omni, a system designed for any-to-any multimodal models handling text, images, video, and audio. The architecture includes stage-based graph decomposition, per-stage batching, and flexible GPU allocation, achieving up t...

#Hardware #LLM On-Premise
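
The stage-based decomposition described can be sketched as a toy pipeline in which each stage batches whatever requests have reached it, independently of the other stages. The stage functions and names here are invented for illustration, not vLLM-Omni's API:

```python
# Each stage processes its queue as one batch, then forwards results to
# the next stage's queue. Per-stage batching lets a slow stage (say, a
# vision encoder) batch independently of a fast one (the LLM decoder).
from collections import deque

def run_pipeline(requests, stages):
    queues = [deque(requests)] + [deque() for _ in stages]
    for i, stage in enumerate(stages):
        batch = list(queues[i])          # the per-stage batch
        queues[i].clear()
        queues[i + 1].extend(stage(batch))
    return list(queues[-1])

encode = lambda batch: [f"enc({x})" for x in batch]   # stand-in encoder
decode = lambda batch: [f"dec({x})" for x in batch]   # stand-in decoder
print(run_pipeline(["a", "b"], [encode, decode]))
# ['dec(enc(a))', 'dec(enc(b))']
```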
2026-02-05 Phoronix

Krita 6.0 Beta Released: Qt6 & Wayland Color Management

The first beta of Krita 6.0, the featureful digital painting program, is now available, re-based on the Qt6 toolkit. A Krita 5.3 Beta is also being released for those sticking with Qt5. The update introduces improvements in color management and...

#LLM On-Premise #DevOps
2026-02-05 LocalLLaMA

AnyTTS: Universal Text-to-Speech for AI Chat Systems

A developer created AnyTTS, a system that allows using any text-to-speech (TTS) engine with various AI chat interfaces, including ChatGPT and local LLM models. The integration happens via the clipboard, simplifying TTS usage across platforms. Current...

#LLM On-Premise #DevOps
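
AnyTTS's internals are not published in the summary; this is a toy sketch of the clipboard-bridge idea, with `read_clipboard` and `speak` as injectable stand-ins so the loop is independent of any particular clipboard library (e.g. pyperclip) or TTS engine:

```python
# Poll a clipboard source and hand any *new* text to a TTS callback.
import time

def watch_clipboard(read_clipboard, speak, polls=None, interval=0.2):
    """Call speak(text) whenever the clipboard content changes."""
    last = None
    while polls is None or polls > 0:
        text = read_clipboard()
        if text and text != last:
            speak(text)
            last = text
        if polls is not None:
            polls -= 1
        time.sleep(interval)

# Usage with fake hooks (a real run would pass pyperclip.paste and a TTS call):
clips = iter(["hello", "hello", "world"])
spoken = []
watch_clipboard(lambda: next(clips), spoken.append, polls=3, interval=0)
print(spoken)  # ['hello', 'world'] -- the duplicate read is skipped
```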
2026-02-05 Tech.eu

Qontext Closes $2.7M Pre-Seed Round to Develop Context Layer for AI

Berlin-based Qontext, developing an independent context layer for AI, has secured $2.7 million in pre-seed funding. The company plans to expand its platform and team to develop reusable context infrastructure, enabling AI processes to operate on reli...

2026-02-05 Microsoft Research

Microsoft Paza: ASR benchmarks and models for low-resource languages

Microsoft introduces Paza, a project to improve automatic speech recognition (ASR) in low-resource languages. It includes PazaBench, an ASR leaderboard for 39 African languages, and Paza ASR models, optimized for six Kenyan languages. The initiative,...

#Fine-Tuning
2026-02-05 Phoronix

Linux 7.0: Improved Nouveau Support for Better NVK Performance

The Linux 6.19 merge window introduced support for larger pages and compression with the Nouveau kernel driver, aiming to improve the performance of open-source NVIDIA drivers. Initial issues disabled this functionality, but version 7.0 should resolv...

#Hardware #LLM On-Premise #DevOps
2026-02-05 ArXiv cs.CL

NLP for Automated Classification of CS Curriculum Materials

A new study explores the use of Natural Language Processing (NLP), including Large Language Models (LLM), to automatically classify pedagogical materials against computer science curriculum guidelines. The goal is to accelerate and simplify the proce...

#RAG
2026-02-05 ArXiv cs.LG

Reversible Deep Learning for 13C NMR in Chemoinformatics

A novel reversible deep learning model employs a conditional invertible neural network to link molecular structures and 13C NMR spectra. The network, built upon i-RevNet bijective blocks, enables spectrum prediction from structure and, conversely, th...
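
The bijective blocks such a network is built from can be illustrated with additive coupling, a standard invertible construction (a simplified stand-in for i-RevNet's actual blocks, not the paper's architecture):

```python
# Additive coupling: split the input in two and perturb one half with an
# arbitrary function of the other. Inversion is exact whatever f is.
import numpy as np

def f(x):            # any function works; it never needs to be inverted
    return np.tanh(3.0 * x) + x**2

def forward(x1, x2):
    return x2, x1 + f(x2)

def inverse(y1, y2):
    return y2 - f(y1), y1

x1, x2 = np.array([0.5, -1.2]), np.array([2.0, 0.3])
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
print(np.allclose(r1, x1) and np.allclose(r2, x2))  # True: exact reconstruction
```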

2026-02-05 ArXiv cs.AI

LLMs: Enhanced Reasoning for Mathematical Problem Solving

A new method, Iteratively Improved Program Construction (IIPC), enhances the mathematical reasoning capabilities of large language models (LLMs). IIPC iteratively refines programmatic reasoning chains, combining execution feedback with the Chain-of-t...

2026-02-05 ArXiv cs.AI

Knowledge Model Prompting Increases LLM Performance on Planning Tasks

A new study explores the effectiveness of the Task-Method-Knowledge (TMK) framework to enhance reasoning and planning capabilities of Large Language Models (LLMs). Results show that TMK-structured prompting can significantly increase accuracy on comp...

#LLM On-Premise #DevOps
2026-02-05 LocalLLaMA

Incomplete SOTA Models: The Disappointment of Tencent's Youtu-VL-4B

A user expressed frustration with Tencent's Youtu-VL-4B model, advertised as a state-of-the-art (SOTA) solution for various computer vision tasks. Despite the promises, the released code was found to be incomplete, with key features missing and hidde...

#DevOps
2026-02-05 LocalLLaMA

Codag: Visualize LLM Workflows in VSCode

A developer has created Codag, an open-source VSCode extension that visualizes LLM workflows directly within the development environment. It supports several frameworks such as OpenAI, Anthropic, Gemini, LangChain, LangGraph, and CrewAI, along with v...

2026-02-04 LocalLLaMA

Claude-Code: backend replaced with NVIDIA NIM for LLM inference

A user replaced Claude-Code's backend with NVIDIA NIM models, leveraging a free API for LLM inference. The modification includes using Telegram as an interface and preserves reasoning tokens between tool calls, enhancing performance with models like ...

#Hardware #LLM On-Premise #DevOps
2026-02-04 LocalLLaMA

Kimi K2.5: New Open-Weight Model Record on ECI

Kimi K2.5 sets a new record among open-weight models on the Epoch Capabilities Index (ECI), which combines multiple benchmarks onto a single scale. Its score of 147 is on par with models like o3, Grok 4, and Sonnet 4.5, while still lagging behind the...

#LLM On-Premise #DevOps
2026-02-04 Phoronix

Microsoft Develops LiteBox: A Rust-Based Sandboxing Library OS

Microsoft has announced LiteBox, a sandboxing operating system developed in Rust. Designed for security, LiteBox leverages Linux Virtualization Based Security (LVBS) to isolate the guest kernel through hardware virtualization, offering a protected en...

#Hardware #LLM On-Premise #DevOps
2026-02-04 LocalLLaMA

Qwen3-Coder-Next-FP8: A New King for Code Generation?

A Reddit user reported excellent performance of the Qwen3-Coder-Next-FP8 model. The discussion focuses on its code generation capabilities, suggesting a potential improvement over existing alternatives. The original article includes a link to an imag...

#Fine-Tuning
2026-02-04 Wired AI

Mistral AI's Ultra-Fast Translation Challenges Big AI Labs

French startup Mistral AI is taking a different approach from the large US labs, concentrating on the efficiency and translation speed of its models and on hardware resource optimization.

#Hardware #LLM On-Premise #DevOps
2026-02-04 LocalLLaMA

Vectorized fix for Qwen3Next in llama.cpp

A pull request on llama.cpp introduces a fix for the `key_gdiff` vectorized calculation in the Qwen3Next model. The change, initially reported on Reddit, aims to improve the model's accuracy and efficiency within the llama.cpp project.

#LLM On-Premise #DevOps
2026-02-04 Tom's Hardware

Bill Gates and software 'piracy': a 50-year-old open letter

In 1976, Bill Gates expressed concern about the unauthorized copying of Altair BASIC software by hobbyists. An open letter reveals the early challenges related to protecting intellectual property in the software world.

2026-02-04 Phoronix

Intel Driver Disables Vulkan Video Encode On Newer Hardware

Intel's ANV open-source Vulkan driver has temporarily disabled Vulkan Video encode support on newer graphics hardware. The decision was made due to insufficient testing, despite Vulkan Video's growing traction as a cross-vendor, cross-platform API fo...

#Hardware #LLM On-Premise #DevOps
2026-02-04 LocalLLaMA

Ollama under fire: a heated debate in the LocalLLaMA community

A recent thread on Reddit, within the LocalLLaMA community, has sparked a heated debate about the criticisms of Ollama, a framework for local execution of large language models (LLMs). The discussion focuses on alleged shortcomings and areas for impr...

#LLM On-Premise #DevOps
2026-02-04 LocalLLaMA

Intern-S1-Pro: A New Large Language Model

Intern-S1-Pro, a large language model (LLM) with approximately 1 trillion parameters, has been released. It appears to be a scaled version of the Qwen3-235B model, with an architecture based on 512 experts.

#Hardware #LLM On-Premise #DevOps
2026-02-04 LocalLLaMA

Qwen3-Coder-Next REAP: New 48B GGUF Model Released

A new 48 billion parameter Qwen3-Coder-Next REAP model has been released in GGUF format. This format facilitates the use of the model on various hardware platforms, making it accessible to a wide range of developers and researchers interested in expe...

#Hardware #LLM On-Premise #DevOps
2026-02-04 LocalLLaMA

GPT-4o and context: the challenge of long conversations

A user on r/LocalLLaMA reports "context rot" issues with GPT-4o in long conversations (over 15 turns) in a support agent. Sliding window and summarization strategies do not solve the problem. Context management remains an open challenge in the develo...

#LLM On-Premise #DevOps
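
The sliding-window-plus-summarization strategy the post says falls short can be sketched as follows; the summarizer is an injected stand-in (in practice it would be another LLM call), and the turn budget is arbitrary:

```python
# Keep the last `window` turns verbatim; everything older is collapsed
# into one running summary message. This bounds prompt size, but lossy
# summaries are exactly where the reported "context rot" creeps in.
def build_prompt(turns, summarize, window=6):
    if len(turns) <= window:
        return list(turns)
    old, recent = turns[:-window], turns[-window:]
    return [{"role": "system", "content": "Summary: " + summarize(old)}] + recent

turns = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
prompt = build_prompt(turns, lambda old: f"{len(old)} earlier turns", window=6)
print(len(prompt))           # 7: one summary message + six recent turns
print(prompt[0]["content"])  # Summary: 4 earlier turns
```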
2026-02-04 LocalLLaMA

Qwen3-Coder-Next: NVFP4 Quantization Released (45GB)

A quantized version of Qwen3-Coder-Next in NVFP4 format is now available, weighing 45GB. The model was calibrated using the ultrachat_200k dataset, with a 1.63% accuracy loss in the MMLU Pro+ benchmark.

#Hardware #LLM On-Premise #Fine-Tuning
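
NVFP4's block-scaled format is NVIDIA-specific, but the trade any 4-bit format makes can be illustrated with naive per-tensor 4-bit quantization; the random weights and the scheme below are toy stand-ins, not the released checkpoint's method:

```python
# Map float weights onto 16 evenly spaced levels and bound the error.
# Real FP4 schemes add per-block scales to keep this error much smaller.
import numpy as np

def quantize_4bit(w):
    lo, hi = w.min(), w.max()
    step = (hi - lo) / 15                # 2^4 - 1 intervals
    q = np.round((w - lo) / step)        # integer codes 0..15
    return q * step + lo                 # dequantized values

rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)
wq = quantize_4bit(w)
max_err = np.abs(w - wq).max()
step = (w.max() - w.min()) / 15
print(max_err <= step / 2 + 1e-6)  # True: round-to-nearest stays within half a step
```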
2026-02-04 ArXiv cs.CL

STEMVerse: A Framework for Evaluating STEM Reasoning in LLMs

A new study introduces STEMVerse, a diagnostic framework to analyze the science, technology, engineering, and mathematics (STEM) reasoning capabilities of large language models (LLMs). STEMVerse aims to overcome the limitations of current benchmarks,...

#LLM On-Premise #DevOps
2026-02-04 ArXiv cs.LG

LLMs to Augment Parameter-Efficient Fine-tuned Cybersecurity Models

A new study explores the use of large language models (LLMs) to enhance cybersecurity models. Strategies include using LLMs for data labeling and as fallback mechanisms for low-confidence predictions, combining parameter-efficient fine-tuning and pre...

#LLM On-Premise #Fine-Tuning #DevOps
2026-02-04 ArXiv cs.LG

UNSO: Unified Newton-Schulz Orthogonalization for Stable Performance

A novel approach, called UNSO (Unified Newton-Schulz Orthogonalization), aims to address efficiency and stability issues in the Newton-Schulz iteration, used in optimizers like Muon and on the Stiefel manifold. The method consolidates the iterative s...

2026-02-04 LocalLLaMA

Qwen-Coder-Next running on ROCm on Strix Halo: local testing

A user reported successfully running the Qwen-Coder-Next model on a Strix Halo platform using ROCm. The test was performed with llamacpp-rocm and a context size of 16k, opening new possibilities for running large language models locally.

#Hardware #LLM On-Premise #DevOps
2026-02-03 LocalLLaMA

ACE-Step-1.5: Open-Source Audio Generative Model Released

ACE-Step-1.5, an MIT-licensed open-source audio generative model, has been released. Its performance is close to commercial platforms like Suno. The model supports LoRAs and offers cover and repainting features. Hugging Face demos and ComfyUI integra...

#LLM On-Premise #Fine-Tuning #DevOps
2026-02-03 Ars Technica AI

Xcode 26.3 adds support for Claude, Codex via Model Context Protocol

Apple has announced Xcode 26.3, a new version of its IDE that supports agentic coding tools like Codex and Claude Agent. The integration is enabled via Model Context Protocol (MCP), allowing AI agents to interact with external tools and structured re...

#LLM On-Premise #DevOps
2026-02-03 LocalLLaMA

ACE-Step 1.5: The Open-Source Model Challenging Suno in Music Generation

ACE-Step 1.5, an open-source model for music generation, is now available. It promises to outperform Suno in quality, generating full songs in about 2 seconds on an A100 GPU and running locally on PCs with 4GB of VRAM. The code, weights, and training...

#Hardware #LLM On-Premise #Fine-Tuning
2026-02-03 LocalLLaMA

Qwen3-Coder-Next: New language model for programming

Qwen3-Coder-Next is available, a new language model developed for programming applications. The model is accessible via Hugging Face and related discussion is active on Reddit. This release represents a significant update in the field of language mod...

2026-02-03 LocalLLaMA

Qwen3-Coder-Next: new language model for programming

Qwen3-Coder-Next, a language model developed for programming applications, has been released on Hugging Face. Its availability on the platform facilitates access and integration by developers. The model promises to improve efficiency in software deve...

#LLM On-Premise #DevOps
2026-02-03 Phoronix

OpenIndiana Is Porting Solaris' IPS Package Management To Rust

OpenIndiana, the open-source project built atop Illumos that is continuing to maintain and advance the former OpenSolaris code, is working on modernizing the Image Packaging System (IPS) package management solution. As part of that, they are working ...

#LLM On-Premise #DevOps
2026-02-03 Phoronix

Reworked NTFS Linux Driver Posted With More Improvements & Fixes

A new version of the NTFS driver for Linux is available, based on the original code and aimed at delivering superior performance and new features. The goal is to provide a more efficient alternative for those who rely on this Microsoft file system.

#LLM On-Premise #DevOps
2026-02-03 LocalLLaMA

GLM releases open-source OCR model

GLM has released an open-source Optical Character Recognition (OCR) model. The model, named GLM-OCR, is available on Hugging Face. It appears to be composed of a 0.9 billion parameter vision model and a 0.5 billion parameter language model, suggestin...

#LLM On-Premise #DevOps
2026-02-03 LocalLLaMA

Qwen3-TTS Studio: Voice Cloning and Local Podcast Generation

A developer has built Qwen3-TTS Studio, an interface for voice cloning and automated podcast generation. The system supports 10 languages, runs voice synthesis locally, and can be integrated with local LLMs for script generation.

#LLM On-Premise #DevOps
2026-02-03 ArXiv cs.CL

MediGRAF: Hybrid Clinical AI for Safe Health Data Analysis

A new hybrid system, MediGRAF, combines knowledge graphs and LLMs to query patient health data. The system integrates structured and unstructured data, achieving 100% accuracy in factual answers and a high level of quality in complex inferences, with...

#Fine-Tuning #RAG
2026-02-03 ArXiv cs.AI

FastAPI and Triton Inference Server Benchmarking on Kubernetes

A new study compares FastAPI and NVIDIA Triton Inference Server for deploying machine learning models in healthcare, evaluating latency and throughput on Kubernetes. The analysis highlights the benefits of a hybrid approach to balance performance and...

#Hardware #LLM On-Premise #DevOps
2026-02-02 Phoronix

Firefox 148 Ready With New Settings For AI Controls

The upcoming Firefox 148 release will include a new AI controls area within the browser's settings. This follows concerns raised over comments by Mozilla's new CEO about evolving Firefox into a "modern AI browser".

#LLM On-Premise #DevOps
2026-02-02 TechCrunch AI

Firefox: Granular Control Over Generative AI Coming Soon

Firefox will introduce new settings in version 148 to control the generative AI features integrated into the browser. Users will be able to completely block these features, offering greater control over their browsing experience.

#LLM On-Premise #DevOps
2026-02-02 Ars Technica AI

OpenAI launches Codex desktop app for macOS, challenging Claude Code

OpenAI has released a macOS desktop app for Codex, its large language model (LLM)-based coding tool. This move aims to compete with Anthropic's Claude Code, offering an alternative to command-line interfaces (CLI) and IDE extensions.

#LLM On-Premise #DevOps
2026-02-02 OpenAI Blog

Codex: Centralized AI Development Environment for macOS

Codex is a new macOS application that acts as a command center for AI and software development. It allows managing multiple agents, parallel workflows, and long-running tasks, all within a single interface.

2026-02-02 Tom's Hardware

Ryzen 7 9850X3D: Factory Overclock of the 9800X3D?

Binning data from 13 Ryzen 7 9850X3D samples suggests the CPU is essentially a 9800X3D with higher voltages to achieve higher clock speeds. The 9850X3D's single-core performance advantage appears to stem primarily from this factory overclock.

#LLM On-Premise #DevOps
2026-02-02 DigiTimes

SMIC reportedly sets up advanced packaging research institute in Shanghai

Chinese semiconductor manufacturer SMIC has reportedly established a research institute in Shanghai focusing on the development of advanced packaging technologies. This strategic move aims to enhance production capabilities and innovation in the semi...

#Hardware #LLM On-Premise #DevOps
2026-02-02 Tech.eu

Incard closes £10M Series A to expand its financial platform

Incard, a financial platform for high-growth digital companies, has raised £10 million in Series A funding. The company plans to expand into new markets, enhance its product offering, and invest in automation, AI-driven financial workflows, and team ...

#LLM On-Premise #DevOps
2026-02-02 ArXiv cs.CL

MrRoPE: A Unified Approach to Extend LLM Context Window

A new study introduces MrRoPE, a generalized formulation for extending the context window of large language models (LLMs) based on a radix system conversion perspective. This approach unifies various existing strategies and introduces two training-fr...

#LLM On-Premise #Fine-Tuning #DevOps
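
MrRoPE's exact formulation is in the paper, but one of the training-free strategies this line of work builds on, linear position interpolation for RoPE, is easy to sketch. The head dimension, base, and lengths below are illustrative:

```python
# RoPE rotates channel pairs by theta_i * position. Linear position
# interpolation rescales positions by (train_len / new_len) so a longer
# sequence reuses the angle range the model was trained on.
import numpy as np

def rope_angles(positions, dim=8, base=10000.0, scale=1.0):
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)  # theta_i
    return np.outer(positions * scale, inv_freq)      # (seq_len, dim/2)

train_len, new_len = 2048, 8192
plain = rope_angles(np.arange(new_len))
interp = rope_angles(np.arange(new_len), scale=train_len / new_len)
# With scale 1/4, position 8188 lands exactly on trained position 2047:
print(np.allclose(interp[8188], plain[2047]))  # True
```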
2026-02-02 ArXiv cs.LG

Emotion Recognition: Domain Knowledge Outperforms Transformers

A study on the EAV dataset reveals that, for multimodal emotion recognition on small datasets, complex attention mechanisms (Transformers) underperform compared to modifications based on domain knowledge. Adding delta MFCCs to the audio CNN improves ...

#LLM On-Premise #Fine-Tuning #DevOps
2026-02-02 LocalLLaMA

Step-3.5-Flash: outperforms with fewer parameters

The Step-3.5-Flash model, with a reduced active parameter architecture (11B out of 196B total), demonstrates superior performance compared to DeepSeek v3.2 in coding and agent benchmarks. DeepSeek v3.2 uses an architecture with many more active param...

#Hardware #LLM On-Premise #DevOps
2026-02-01 Phoronix

Linux 6.19: Stable Release Delayed Due to Holidays

The stable release of the Linux 6.19 kernel has been delayed by a week due to the year-end holiday period. Version 6.19-rc8 is now available, with the stable version expected next week. This delay is not due to critical bugs, but rather the need to m...

2026-02-01 LocalLLaMA

AIDA: Pentesting platform with AI control and 400+ tools

A developer has created AIDA, an open-source pentesting platform that allows an AI agent to control over 400 security tools. The AI can execute tools, chain attacks, and document findings, all through a Docker container and a web dashboard.

#LLM On-Premise #DevOps
2026-02-01 Phoronix

GNOME Resources 1.10 Adds Monitoring Support For AMD Ryzen AI NPUs

GNOME Resources 1.10, the newest version of this system monitoring app, introduces monitoring support for AMD Ryzen AI NPUs. This application is now used by default on distributions like the upcoming Ubuntu 26.04 LTS. The update also includes other u...

#Hardware
2026-02-01 LocalLLaMA

OLMO 3.5: Hybrid Model for Efficient LLM Inference Coming Soon

AI2's OLMO 3.5 model combines standard transformer attention with linear attention using Gated Deltanet. This hybrid approach aims to improve efficiency and reduce memory usage while maintaining model quality. The OLMO series is fully open source, fr...

#Fine-Tuning
2026-02-01 LocalLLaMA

Falcon-H1-Tiny: Specialized Micro-Models at 90M Parameters

TII releases Falcon-H1-Tiny, a series of sub-100M parameter models challenging the scaling dogma. These specialized models exhibit a lower tendency to hallucinate compared to larger, general-purpose models. Specialized variants offer competitive perf...

#Hardware #LLM On-Premise #Fine-Tuning
2026-02-01 LocalLLaMA

Uncensored LLM Models Available on Hugging Face

An overview of uncensored large language models (LLM) available on the Hugging Face platform. The list includes variants of GLM, GPT OSS, Gemma, and Qwen, with different methods of removing restrictions. The article provides direct links to the model...

#LLM On-Premise #DevOps
2026-02-01 Phoronix

Phoronix: Linux Kernel, ReactOS Developments & AMD Ryzen 7 Topped January

A recap of the most popular news and reviews published on Phoronix in January. The focus is on Linux kernel developments, progress in the ReactOS operating system, and analysis of the AMD Ryzen 7 9850X3D CPU. Phoronix published nearly 300 articles du...

#Hardware #LLM On-Premise #DevOps
2026-02-01 LocalLLaMA

vLLM-MLX on Apple Silicon: Up to 87% Higher Throughput

Recent research compares the performance of vLLM-MLX on Apple Silicon with llama.cpp, highlighting significantly higher throughput. The results suggest potential advantages in using Apple hardware for local inference of large language models (LLMs).

#LLM On-Premise #DevOps
2026-02-01 LocalLLaMA

Kanade Tokenizer: real-time voice cloning on CPU

A developer has presented Kanade Tokenizer, a voice cloning tool optimized for speed, with a real-time factor better than RVC's. It also runs on CPU. A fork with a GUI based on Gradio and Tkinter is available.

#LLM On-Premise #DevOps
2026-02-01 LocalLLaMA

Can 4chan data REALLY improve a model? Turns out it can!

An experiment showed how training a language model on a dataset derived from 4chan led to unexpected results. The model, Assistant_Pepe_8B, outperformed NVIDIA's Nemotron base model, despite being trained on data considered to be of lower quality. Th...

#Hardware #LLM On-Premise #Fine-Tuning
2026-02-01 DigiTimes

LMOC ramps silicon photonics output with new MOCVD expansion plan

LandMark Optoelectronics is expanding its silicon photonics component production by adding new MOCVD reactors. The expansion aims to meet the growing demand for high-speed interconnects in data centers and artificial intelligence applications.

#LLM On-Premise #DevOps
2026-02-01 LocalLLaMA

NanoChat: Beating GPT-2 for Under $100

Andrej Karpathy demonstrated how to surpass GPT-2's performance with a model called NanoChat, trained in just three hours on 8 H100 GPUs. The project includes details on the architecture, optimizers used, data setup, and a script for reproducing the ...

#Hardware #LLM On-Premise #DevOps
2026-02-01 Phoronix

Linux 7.0: Per-CPU Caching with Sheaves for Improved Performance

Linux 7.0 is preparing to introduce significant improvements in per-CPU cache management, thanks to the integration of 'sheaves'. This technology, already present in optional form since version 6.18, aims to progressively replace traditional CPU slab...

2026-01-31 LocalLLaMA

g-HOOT: A New Research Paper in the World of AI

A new research paper on arXiv, called "g-HOOT in the Machine", has caught the attention of the LocalLLaMA community. It promises to explore new frontiers in the field of artificial intelligen...
