Topic / Trend Rising

Open Source AI Models and Tools

The open-source AI ecosystem is thriving, with new models, tools, and frameworks being released regularly. This trend is democratizing access to AI technology and fostering collaboration among developers and researchers.

Detected: 2026-02-20 · Updated: 2026-03-22

Related Coverage

2026-03-22 LocalLLaMA

Qwen3.5-122B-A10B: Uncensored Release and K_P Quantization

An uncensored version of Qwen3.5-122B-A10B is now available, tuned to avoid refusals in its outputs. It introduces new K_P quantizations, which offer improved quality for a small increase in file size. Several quantizations and vision support are in...

#LLM On-Premise #DevOps
2026-03-20 LocalLLaMA

Nvidia Nemotron Cascade 2 30B: Promising Open-Source Language Model

Nvidia has released Nemotron Cascade 2 30B A3B, a language model based on Nemotron 3 Nano Base. Preliminary results indicate competitive performance with 120B models in math and code tasks. The model is available on Hugging Face and documented in a r...

#Hardware #Fine-Tuning
2026-03-19 LocalLLaMA

Devstral Small 2: 24B LLM Severely Underrated for Code Assistance

A user with a 16GB GeForce RTX 4060 Ti GPU tested several large language models (LLMs) for code assistance, focusing on understanding and extending existing reinforcement learning code. Devstral Small 2 (24B) proved to be the most effective in interp...

#Hardware #LLM On-Premise #DevOps
2026-03-19 Phoronix

AMD Preps More GFX12.1 Enablement For Linux 7.1

AMD has sent out a new batch of AMDGPU kernel graphics driver and AMDKFD kernel compute driver changes to DRM-Next ahead of next month's Linux 7.1 merge window. The updates include enablement for GFX12.1 as well as initial VCN 5.0.2 & JPEG 5.0.2 IP.

#Hardware #LLM On-Premise #DevOps
2026-03-19 Ars Technica AI

OpenAI Acquires Astral, Open Source Python Tool Maker

OpenAI announced the acquisition of Astral, known for open source Python development tools like uv and Ruff. The integration into the Codex team aims to enhance AI capabilities across the software development lifecycle, enabling AI agents to interact...

#LLM On-Premise #DevOps
2026-03-19 Phoronix

GNUnet 0.27 Released: Decentralized P2P Networking Framework

Version 0.27 of GNUnet is now available. This free software framework is designed for constructing decentralized, peer-to-peer networks. The new release includes several updates, but developers caution that its use may require some tolerance for diff...

#LLM On-Premise #DevOps
2026-03-19 LocalLLaMA

ACE-Step 1.5: Music Generation with C++17 and GGML

A C++17 implementation of ACE-Step 1.5 for music generation, based on GGML, has been released. The code is designed to run on various platforms, including CPU, CUDA, ROCm, Metal, and Vulkan, offering deployment flexibility across different environments.

#Hardware #LLM On-Premise #DevOps
2026-03-19 AI News

NVIDIA: Open-source toolkit for safer enterprise AI agents

NVIDIA has introduced an open-source toolkit to simplify the development and deployment of autonomous AI agents in the enterprise. The goal is to provide companies with the tools to control data and liability when using these agents, with a focus on ...

#Hardware #LLM On-Premise #DevOps
2026-03-19 LocalLLaMA

KoboldCpp: voice cloning and native music generation

KoboldCpp celebrates its third anniversary with the release of version 1.110, introducing new features including voice cloning via Qwen3 TTS and native Ace Step 1.5 support for music generation. The update is available on GitHub.

#LLM On-Premise #DevOps
2026-03-18 LangChain Blog

Polly by LangSmith: The AI Assistant for Model Debugging

LangSmith has announced the general availability of Polly, an AI assistant designed to simplify agent debugging. Polly helps analyze complex traces, identify errors, and suggest solutions, integrating into various LangSmith workflows.

#Fine-Tuning
2026-03-18 The Register AI

Systemd 260 kills SysV, tells AI not to misbehave

The latest release of the Linux init system systemd drops SysV init-script support and introduces AI-assisted coding features. The release is likely to stir up further reactions in the Linux world.

#LLM On-Premise #DevOps
2026-03-18 LocalLLaMA

Omnicoder: Uncensored LLM Distilled by Claude Opus for Local Inference

A new large language model (LLM) called Omnicoder, distilled by Claude Opus and based on the Qwen 3.5 9B architecture, is now available. This model, created through a merge process, stands out for its lack of censorship and its suitability for local ...

#LLM On-Premise #Fine-Tuning #DevOps
2026-03-17 LocalLLaMA

New open-source LLM releases: Skyfall, Valkyrie, and Anubis

Four new open-source language models developed by TheLocalDrummer have been quietly released: Skyfall 31B v4.1, Valkyrie 49B v2.1, Anubis 70B v1.2, and Anubis Mini 8B v1 (based on Llama 3.3). These models represent significant upgrades over previous ...

#LLM On-Premise #DevOps
2026-03-17 LocalLLaMA

Unsloth Studio: New open-source web UI to train and run LLMs

Unsloth Studio is a new open-source web UI that allows training and running large language models (LLMs) locally. It supports various operating systems, model formats, and offers tools for model optimization and comparison.

#LLM On-Premise #Fine-Tuning #DevOps
2026-03-17 LocalLLaMA

Unsloth Studio: A competitor to LM Studio for local LLMs?

Unsloth announced Unsloth Studio, an Apache-licensed runner compatible with llama.cpp. This could be a game changer for users running LLMs locally, offering an alternative to LM Studio in the GGUF ecosystem.

#LLM On-Premise #DevOps
2026-03-17 Phoronix

Intel Compute Runtime: OpenCL and Level Zero Optimizations

Intel Compute Runtime 26.09.37435.1, the company's open-source stack for OpenCL and Level Zero, is now available. This release introduces performance improvements and new features for Intel graphics hardware on Windows and Linux systems.

#Hardware #LLM On-Premise #DevOps
2026-03-16 The Register AI

Nvidia presents NemoClaw based on OpenClaw for security

Nvidia has announced NemoClaw, a system based on OpenClaw, described by the CEO as the operating system for personal AI. The announcement underscores the growing importance of security and control in AI, pushing towards solutions that offer greater p...

#Hardware #LLM On-Premise #DevOps
2026-02-20 LocalLLaMA

PaddleOCR-VL now in llama.cpp

The open-source multilingual model PaddleOCR-VL has been integrated into llama.cpp. This integration allows running model inference directly on local hardware, opening new possibilities for OCR applications with privacy and data sovereignty requireme...

#LLM On-Premise #DevOps
2026-02-20 Phoronix

Vulkan 1.4.344 Released With New Extension From Valve

Vulkan 1.4.344 is out today as the latest routine spec update for this high performance graphics and compute API. Besides a handful of fixes and clarifications, Vulkan 1.4.344 brings a new extension courtesy of Valve engineers.

#Hardware #LLM On-Premise #DevOps
2026-02-19 LocalLLaMA

Taalas Demonstrates Llama 3.1 8B Inference at 16,000 tok/s on ASIC

Startup Taalas has released a free chatbot demo and API endpoint powered by a proprietary ASIC chip. The goal is to demonstrate high-speed inference of LLM models, achieving 16,000 tokens per second with Llama 3.1 8B. The company is now moving on to ...

#Hardware #LLM On-Premise #DevOps
2026-02-19 LocalLLaMA

Llama.cpp: IQ*_K and IQ*_KS quantization support

A pull request to llama.cpp introduces support for IQ*_K and IQ*_KS quantization schemes, derived from the ik_llama.cpp project. This implementation could lead to more compact and efficient models, particularly relevant for inference on resource-cons...

#LLM On-Premise #DevOps
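The IQ*_K and IQ*_KS formats themselves are specified in the pull request and the ik_llama.cpp project; as a rough intuition for what any llama.cpp-style block quantization does, here is a toy sketch (not the actual IQ*_K scheme): split weights into fixed-size blocks, store one float scale per block, and keep each weight as a small signed integer.

```python
# Toy illustration of block-wise quantization with a per-block scale.
# This is NOT the IQ*_K / IQ*_KS scheme from ik_llama.cpp; it only shows
# the general idea behind llama.cpp-style quants: one scale per block,
# each weight reduced to a few bits.

def quantize_block(weights, bits=4):
    """Quantize a block of floats to signed integers plus one scale."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for 4-bit signed
    amax = max(abs(w) for w in weights) or 1.0
    scale = amax / qmax                     # one float stored per block
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return scale, q

def dequantize_block(scale, q):
    """Recover approximate float weights from the integers and the scale."""
    return [scale * v for v in q]

if __name__ == "__main__":
    block = [0.12, -0.5, 0.33, 0.9, -0.75, 0.05, 0.0, -0.2]
    scale, q = quantize_block(block)
    restored = dequantize_block(scale, q)
    print("max abs error:", max(abs(a - b) for a, b in zip(block, restored)))
```

Real schemes add refinements on top of this (super-blocks, non-linear grids, per-block minima), which is where the quality-versus-size trade-offs mentioned above come from.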
2026-02-19 Phoronix

AMD Announces hipThreads For Easier Porting Of C++ Code To GPUs

AMD announced hipThreads, a C++-style concurrency library for AMD GPUs designed to simplify the porting of C++ code. This new addition to the ROCm/HIP ecosystem aims to ease the development of high-performance applications on AMD GPUs.

#Hardware #LLM On-Premise #DevOps
2026-02-19 LocalLLaMA

TextWeb: Render web pages as text grids for AI agents

TextWeb is an open-source project that transforms web pages into small text grids (2-5 KB), ideal for processing by AI agents. Instead of 1 MB screenshots, agents get a compact textual representation of the page; TextWeb integrates with MCP, LangChain, and CrewAI.

#LLM On-Premise #DevOps
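To make the size argument concrete, here is a stdlib-only toy (not the TextWeb implementation) that collapses an HTML page to bare text. A real text-grid renderer like TextWeb also preserves layout as a 2-D grid; this sketch only shows the first step, stripping markup so an agent reads kilobytes instead of a megabyte screenshot.

```python
# Minimal sketch of the idea behind TextWeb: reduce an HTML page to a
# compact plain-text form that is cheap for an LLM agent to consume.
# NOT the actual TextWeb code; stdlib-only, for illustration.

from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    SKIP = {"script", "style"}               # markup with no visible text

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def page_to_text(html: str) -> str:
    """Strip tags, scripts, and styles; return one text chunk per line."""
    p = TextExtractor()
    p.feed(html)
    return "\n".join(p.parts)

if __name__ == "__main__":
    html = ("<html><head><style>p{color:red}</style></head>"
            "<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>")
    print(page_to_text(html))
```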
2026-02-18 LocalLLaMA

Lemonade: Ollama API benefits without Ollama?

Lemonade Server exposes Ollama-compatible API functionality without requiring Ollama itself. The integration simplifies model management and interaction with Open WebUI, offering an alternative for those seeking flexibility in using GGUF and NPU model...

#Hardware #LLM On-Premise #DevOps
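A client that already speaks the Ollama REST API should work unchanged against such a server. The sketch below follows the public Ollama `/api/generate` request shape; the host, port, and model name are assumptions for illustration, not Lemonade specifics.

```python
# Sketch of calling an Ollama-compatible endpoint, such as the one
# Lemonade Server advertises. The /api/generate route and JSON fields
# follow the public Ollama REST API; host/port and model name below are
# assumptions for illustration.

import json
import urllib.request

def build_generate_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Payload for POST /api/generate on an Ollama-compatible server."""
    return {"model": model, "prompt": prompt, "stream": stream}

def generate(base_url: str, model: str, prompt: str) -> str:
    """Send a non-streaming generate request and return the response text."""
    payload = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Requires a compatible server running locally with the model available:
    # print(generate("http://localhost:11434", "llama3.2", "Say hi"))
    print(build_generate_request("llama3.2", "Say hi"))
```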
2026-02-18 TechCrunch AI

Sarvam AI bets on open-source with new language models

Indian AI lab Sarvam AI has unveiled a new lineup of models, including language models with 30 and 105 billion parameters, a text-to-speech model, a speech-to-text model, and a vision model for document parsing. A major bet on open-source AI.

#LLM On-Premise #DevOps
2026-02-17 LocalLLaMA

Cohere Releases Tiny Aya: A 3.35B Parameter Multilingual Model

Cohere Labs has released Tiny Aya, an open-weight, pre-trained small language model (3.35 billion parameters) optimized for efficient multilingual representation across 70+ languages, including lower-resource ones. The model is designed to support ad...

#Fine-Tuning #DevOps
2026-02-15 LocalLLaMA

Open-weight models dominate OpenRouter leaderboard

For the first time, the top four models on the OpenRouter leaderboard are all open-weight. This marks a potential turning point for the adoption and trust in open-source language models, offering viable alternatives to proprietary models.

#LLM On-Premise #DevOps
2026-02-15 LocalLLaMA

JoyAI-LLM-Flash: new open source LLM model on Hugging Face

The open-source large language model JoyAI-LLM-Flash is available on Hugging Face. The LocalLLaMA community on Reddit has shared links and images of the model, opening discussion of potential local uses. The model is develo...

#LLM On-Premise #DevOps
2026-02-14 LocalLLaMA

Qwen3-TTS.cpp: Optimized GGML Inference for Local Voice Cloning

A lightweight GGML implementation of Qwen3-TTS 0.6B focuses on fast inference and efficient memory usage. Optimization with a Metal backend and a CoreML code predictor promises a speedup of up to 4x compared to the PyTorch pipeline, with a memory footprin...

#Hardware #LLM On-Premise #DevOps
2026-02-14 LocalLLaMA

Small LLM Evaluation: The Importance of Parsing in Local Agents

A benchmark of 21 small language models (LLMs) reveals that the ability to call tools locally depends as much on the model as on the accuracy of the parser used. The results highlight how models with less than 4 billion parameters can compete with la...

#Hardware #LLM On-Premise #DevOps
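The benchmark's point is that local tool calling fails as often in the parser as in the model. The toy parser below (not the one used in the benchmark) shows a common tolerance trick: rather than requiring the model to emit pure JSON, scan its output for the first balanced `{...}` object and try to decode that.

```python
# Toy tolerant tool-call parser: small local models often wrap a JSON
# tool call in chatter, so a strict json.loads on the full output fails.
# Scanning for the first balanced {...} candidate recovers many of those
# calls. Illustrative only; not the benchmark's parser.

import json

def extract_tool_call(text: str):
    """Return the first parseable JSON object found in model output, or None."""
    start = text.find("{")
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start : i + 1])
                    except json.JSONDecodeError:
                        break  # malformed candidate; try the next "{"
        start = text.find("{", start + 1)
    return None

if __name__ == "__main__":
    raw = 'Sure! Calling it now: {"name": "search", "args": {"q": "llama"}} Done.'
    print(extract_tool_call(raw))
```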
2026-02-14 LocalLLaMA

Local Development with LLM Models: Tools and Experiences

An overview of tools for developing applications with large language models (LLMs) running locally, rather than in the cloud. Several frameworks and IDEs are presented that facilitate the integration of LLMs into development projects, with a focus on...

#LLM On-Premise #DevOps
2026-02-13 LocalLLaMA

GPT-OSS 120B: Uncensored Open-Source Model for Local Inference

An uncensored version of GPT-OSS 120B is available, an open-source language model with 117 billion total parameters and a context window of 128K. The model is in MXFP4 format and can be run on consumer or server hardware equipped with high-capacity G...

#Hardware #LLM On-Premise #DevOps
2026-02-13 LocalLLaMA

ByteDance Releases Protenix-v1 for Biomolecular Structure Prediction

ByteDance has released Protenix-v1, a new open-source model for biomolecular structure prediction. The model achieves AlphaFold3-level performance. The source code is available on GitHub, opening new possibilities for research and development in the ...

#LLM On-Premise