Topic / Trend Rising

AI Model Development and Open Source

The development of new AI models, from general-purpose LLMs to specialized systems, is accelerating, with a growing emphasis on open-source initiatives. Recent efforts target the efficiency, reasoning capabilities, and accessibility of AI models.

Detected: 2026-02-04 · Updated: 2026-02-04

Related Coverage

2026-02-04 LocalLLaMA

Qwen3-Coder-Next: NVFP4 Quantization Released (45GB)

A quantized version of Qwen3-Coder-Next in NVFP4 format is now available, weighing in at 45 GB. The model was calibrated on the ultrachat_200k dataset and shows a 1.63% accuracy drop on the MMLU Pro+ benchmark.

#Hardware #LLM On-Premise #Fine-Tuning
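For context on what NVFP4 does to the weights, here is a minimal PyTorch sketch of block-scaled FP4 fake-quantization. It is an illustration only: real NVFP4 stores FP8 (E4M3) block scales plus a tensor-level scale, while this toy version keeps float scales and shows just the rounding to the E2M1 grid.

```python
import torch

# Representable magnitudes of the FP4 E2M1 format underlying NVFP4.
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quantize_fp4(w: torch.Tensor, block: int = 16) -> torch.Tensor:
    """Fake-quantize a 1-D tensor to FP4 with one scale per 16-value block."""
    pad = (-w.numel()) % block
    x = torch.nn.functional.pad(w, (0, pad)).view(-1, block)
    # Map each block's max magnitude onto FP4's max representable value (6.0).
    scale = x.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / 6.0
    y = x / scale
    # Round every scaled value to the nearest representable FP4 magnitude.
    idx = (y.abs().unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    q = FP4_GRID[idx] * y.sign()
    return (q * scale).view(-1)[: w.numel()]

w = torch.randn(4096)
print(f"mean abs error: {(w - fake_quantize_fp4(w)).abs().mean():.4f}")
```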
2026-02-03 LocalLLaMA

ACE-Step-1.5: Open-Source Audio Generative Model Released

ACE-Step-1.5, an MIT-licensed open-source audio generative model, has been released. Its performance approaches that of commercial platforms such as Suno. The model supports LoRAs and offers cover and repainting features. Hugging Face demos and ComfyUI integration are available.

#LLM On-Premise #Fine-Tuning #DevOps
2026-02-03 LocalLLaMA

ACE-Step 1.5: The Open-Source Model Challenging Suno in Music Generation

ACE-Step 1.5, an open-source model for music generation, is now available. It promises to outperform Suno on quality, generating full songs in about 2 seconds on an A100 GPU and running locally on PCs with 4GB of VRAM. The code, weights, and training pipeline have been released.

#Hardware #LLM On-Premise #Fine-Tuning
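ACE-Step's inference API is not shown in the coverage above, so the sketch below only fetches the weights and sanity-checks the 4 GB VRAM claim; the repo id is an assumption, not a confirmed name.

```python
from huggingface_hub import snapshot_download

# Hypothetical repo id -- check the actual organization/name on Hugging Face.
local_dir = snapshot_download("ACE-Step/ACE-Step-1.5")
print("weights downloaded to:", local_dir)

# Rough check on the 4 GB VRAM claim: fp16 weights cost 2 bytes per parameter,
# so about 2B parameters fit before counting activations. The claim is
# plausible only if the model is small or parts of the pipeline are offloaded.
print(f"~{4e9 / 2 / 1e9:.1f}B fp16 parameters fit in 4 GB")
```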
2026-02-03 LocalLLaMA

Qwen3-Coder-Next: New Language Model for Programming

Qwen3-Coder-Next, a language model built for programming tasks, has been released on Hugging Face. Availability on the platform makes the model easy for developers to access and integrate, and it promises to improve efficiency in software development.

#LLM On-Premise #DevOps
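A hedged sketch of trying the model locally with Hugging Face transformers; the repo id "Qwen/Qwen3-Coder-Next" is inferred from the title, not confirmed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Coder-Next"  # assumed name; verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated continuation.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```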
2026-02-03 LocalLLaMA

GLM-5: New Language Model Coming in February

GLM-5, a new language model, has been announced; confirmation came via a post on X (formerly Twitter) by Jietang. Further details on the model's capabilities and specifications are expected at the official release.

#Hardware
2026-02-02 ArXiv cs.CL

MrRoPE: A Unified Approach to Extend LLM Context Window

A new study introduces MrRoPE, a generalized formulation for extending the context window of large language models (LLMs) from a radix-conversion perspective. The approach unifies several existing strategies and introduces two training-free extension methods.

#LLM On-Premise #Fine-Tuning #DevOps
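MrRoPE's exact formulation is not reproduced in the summary, so as a reference point, here is the linear position-interpolation baseline that such methods generalize: rescale the RoPE angles so positions beyond the trained window map back into the trained range.

```python
import torch

def rope_angles(positions, dim=64, base=10000.0, scale=1.0):
    """RoPE rotation angles; scale < 1 is linear position interpolation,
    the simplest training-free context-extension trick."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions.float() * scale, inv_freq)

# Positions up to 8192 squeezed back into a 4096-token training window.
pos = torch.arange(8192)
theta = rope_angles(pos, scale=4096 / 8192)
cos, sin = theta.cos(), theta.sin()
print(cos.shape)  # torch.Size([8192, 32]): one angle per head-dim pair
```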
2026-02-02 LocalLLaMA

Step-3.5-Flash: Outperforms with Fewer Parameters

Step-3.5-Flash, which activates only 11B of its 196B total parameters, outperforms DeepSeek v3.2 on coding and agent benchmarks, even though DeepSeek v3.2 uses an architecture with many more active parameters.

#Hardware #LLM On-Premise #DevOps
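Back-of-envelope arithmetic on why sparse activation matters: per-token compute tracks the active parameter count, while weight memory still tracks the total, since every expert must stay resident. The bf16 format and the 2-FLOPs-per-parameter rule of thumb are assumptions.

```python
total_params = 196e9   # all experts
active_params = 11e9   # parameters touched per token
bytes_per_param = 2    # bf16

# Per-token compute scales with ACTIVE parameters.
print(f"~{2 * active_params / 1e9:.0f} GFLOPs per generated token")

# Memory scales with TOTAL parameters: all experts must be loadable.
print(f"~{total_params * bytes_per_param / 1e9:.0f} GB of weights in bf16")
```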
2026-02-01 LocalLLaMA

OLMO 3.5: Hybrid Model for Efficient LLM Inference Coming Soon

AI2's OLMO 3.5 combines standard transformer attention with linear attention via Gated DeltaNet. The hybrid approach aims to improve efficiency and reduce memory usage while preserving model quality. The OLMO series is fully open source, from training data and code to weights.

#Fine-Tuning
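A simplified, single-head sketch of the gated delta rule behind Gated DeltaNet-style linear attention: a fixed-size state is decayed and error-corrected once per token, so readout cost does not grow with context length. The real layer uses per-channel gates and a chunked parallel form; treat this as an illustration of the recurrence only.

```python
import torch

def gated_delta_step(S, k, v, alpha, beta):
    """One step of a gated delta rule (single head).
    S: (d_k, d_v) state, k: (d_k,), v: (d_v,), alpha/beta: scalars in (0, 1)."""
    pred = S.T @ k                                      # state's prediction for key k
    return alpha * S + beta * torch.outer(k, v - pred)  # decay, then error-correct

d_k, d_v, T = 16, 16, 128
S = torch.zeros(d_k, d_v)
for _ in range(T):
    k = torch.randn(d_k) / d_k ** 0.5
    v = torch.randn(d_v)
    S = gated_delta_step(S, k, v, alpha=0.99, beta=0.5)

# Readout for a query costs O(d_k * d_v), independent of sequence length T.
q = torch.randn(d_k)
print((S.T @ q).shape)  # torch.Size([16])
```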
2026-02-01 LocalLLaMA

Falcon-H1-Tiny: Specialized Micro-Models at 90M Parameters

TII has released Falcon-H1-Tiny, a series of sub-100M-parameter models that challenges the scaling dogma. These specialized models hallucinate less than larger, general-purpose models, and the specialized variants offer competitive performance for their size.

#Hardware #LLM On-Premise #Fine-Tuning
2026-02-01 LocalLLaMA

Uncensored LLM Models Available on Hugging Face

An overview of uncensored large language models (LLMs) available on Hugging Face. The list includes variants of GLM, GPT OSS, Gemma, and Qwen with different restriction-removal methods, and the article provides direct links to the models.

#LLM On-Premise #DevOps
2026-02-01 LocalLLaMA

Can 4chan data REALLY improve a model? Turns out it can!

An experiment in which a language model was trained on a 4chan-derived dataset produced unexpected results: the resulting model, Assistant_Pepe_8B, outperformed NVIDIA's Nemotron base model despite being trained on data widely considered low quality.

#Hardware #LLM On-Premise #Fine-Tuning
2026-02-01 LocalLLaMA

NanoChat: Beating GPT-2 for Under $100

Andrej Karpathy demonstrated how to surpass GPT-2's performance with a model called NanoChat, trained in just three hours on 8 H100 GPUs. The project documents the architecture, the optimizers used, the data setup, and a script for reproducing the run.

#Hardware #LLM On-Premise #DevOps
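A quick sanity check on the headline price, with an assumed cloud rate per GPU-hour:

```python
# Assumed rental rate; actual H100 prices vary by provider.
gpus, hours, usd_per_gpu_hour = 8, 3.0, 3.0
print(f"estimated training cost: ${gpus * hours * usd_per_gpu_hour:.0f}")  # ~$72
```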
2026-01-31 LocalLLaMA

g-HOOT: A New Research Paper in the World of AI

A new research paper on arXiv, "g-HOOT in the Machine", has caught the attention of the LocalLLaMA community. The paper promises to explore new frontiers in artificial intelligence.

2026-01-30 LocalLLaMA

GPT-OSS: Why is this open-source model still so good?

A local-LLM user asks why GPT-OSS 120B, an older but still competitive open-source model, performs so well. Despite newer architectures and models, GPT-OSS still excels at speed, effectiveness, and tool calling. The post explores the reasons behind its staying power.

#LLM On-Premise #Fine-Tuning #DevOps
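On the tool-calling point: recent transformers releases can render a tool schema into a model's chat template directly from an annotated Python function. Whether this works for a given checkpoint depends on its template, so treat the sketch as an assumption-laden illustration.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city.
    """
    ...

messages = [{"role": "user", "content": "What's the weather in Rome?"}]
# Callables with type hints and Google-style docstrings are converted
# to JSON schemas and injected into the chat template.
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, tokenize=False
)
print(prompt[:400])
```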
2026-01-30 LocalLLaMA

Design Arena is now dominated by an open model

A Reddit post from the LocalLLaMA community discusses open-source models coming to dominate the design field in 2026, focusing on the impact of this trend and its implications for the industry.

#LLM On-Premise #DevOps
2026-01-29 LocalLLaMA

LingBot-World: Open Source Dynamic Simulation Outperforms Genie 3

The LingBot-World framework offers a high-capability world model that is fully open source, in contrast to proprietary systems like Genie 3. It surpasses Genie 3 in handling complex physics and scene transitions while maintaining 16 frames per second.

2026-01-29 LocalLLaMA

Distilled models: why aren't there more?

The emergence of "distilled" models like Qwen 8B DeepSeek R1 has demonstrated reasoning capability out of proportion to model size. The post asks why there aren't more models of this kind, able to run on hardware with limited resources.

#Hardware #LLM On-Premise #DevOps
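For reference, the classic logit-distillation recipe behind the term; note the R1-derived distills were reportedly produced by supervised fine-tuning on teacher-generated traces, so this KL-on-logits loss is one common variant rather than their exact method.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    """KL divergence between temperature-softened teacher and student."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * T * T

student = torch.randn(4, 32000)  # student logits over a 32k vocab
teacher = torch.randn(4, 32000)  # logits from a frozen, larger teacher
print(distillation_loss(student, teacher))
```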
2026-01-29 LocalLLaMA

Mistral CEO Arthur Mensch: AI access like electricity

Mistral CEO Arthur Mensch compares access to AI with access to electricity, stressing the importance of uninterrupted, unthrottled access to this crucial resource. The statement reflects Mistral's vision of AI as fundamental infrastructure.

#LLM On-Premise #DevOps
2026-01-29 LocalLLaMA

Qwen3-ASR: Open-Source Models for Multilingual Speech Recognition

The Qwen3-ASR family includes 1.7B and 0.6B parameter models that can identify the language and transcribe audio across 52 languages and dialects. The larger model performs comparably to proprietary commercial APIs, offering a viable open-source alternative.

#LLM On-Premise #Fine-Tuning #DevOps
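If the checkpoints ship with transformers pipeline support, local transcription could be as short as the sketch below. The repo id is an assumption, and pipeline compatibility is not confirmed by the coverage.

```python
from transformers import pipeline

# Assumed repo id -- check the Qwen organization on Hugging Face for exact names.
asr = pipeline("automatic-speech-recognition", model="Qwen/Qwen3-ASR-1.7B")

# Language identification is handled by the model per the release notes.
result = asr("meeting.wav")
print(result["text"])
```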
2026-01-29 LocalLLaMA

Mini-LLM: An 80M-Parameter LLM Based on the Llama 3 Architecture

An engineer has built Mini-LLM, an 80-million-parameter transformer language model written from scratch on the Llama 3 architecture. The project covers tokenization, memory-mapped data loading, mixed-precision training, and inference with KV caching.

#LLM On-Premise #Fine-Tuning #DevOps
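Mini-LLM's own loader isn't shown, but memory-mapped data loading in projects like this usually follows the nanoGPT pattern: keep the pre-tokenized stream on disk and slice random windows per batch, so only the touched pages are ever read.

```python
import numpy as np
import torch

def get_batch(path: str, batch_size: int = 8, block_size: int = 512):
    """Sample (input, target) windows from a pre-tokenized uint16 stream on disk."""
    data = np.memmap(path, dtype=np.uint16, mode="r")
    ix = np.random.randint(0, len(data) - block_size - 1, size=batch_size)
    x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix])
    # Targets are the same windows shifted one token to the right.
    y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
    return x, y
```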
2026-01-29 LocalLLaMA

OpenMOSS unveils MOVA: Open-Source model for video and audio

OpenMOSS has released MOVA (MOSS-Video-and-Audio), a fully open-source model with 18 billion active parameters (MoE architecture, 32 billion total). MOVA ships with day-0 SGLang-Diffusion support and targets scalable, synchronized video and audio generation.

2026-01-28 LocalLLaMA

LongCat-Flash-Lite: LLM optimized for fast inference

Meituan-Longcat has released LongCat-Flash-Lite, a large language model (LLM) focused on efficient inference. The model is available on Hugging Face and discussed on Reddit, suggesting interest in local inference deployments.

#Hardware #LLM On-Premise #Fine-Tuning
2026-01-28 LocalLLaMA

AMA With Kimi: The Open-source Lab Behind K2.5 Model

The Kimi team, the open-source research lab behind the K2.5 model, participated in an AMA (Ask Me Anything) session on Reddit to answer questions from the LocalLLaMA community. The session focused on various aspects of the model and its architecture.

2026-01-28 LocalLLaMA

Kimi K2.5: a promising open-source model for coding

According to a Reddit post, Kimi K2.5 stands out as a particularly effective open-source model for programming tasks, with the discussion pointing to remarkable results in this specific area.

#LLM On-Premise #DevOps