The multimodal model Qwen3.5-397B-A17B has been released as open source. The latest-generation model promises high efficiency and native multimodal capabilities. The news was shared on Reddit, where it drew the attention of the LocalLLaMA community.
Qwen3.5-397B-A17B, a large language model (LLM) developed by Qwen, has been released. The model is available on Hugging Face, making it accessible for research and development in generative artificial intelligence, and its open-source release invites collaboration and innovation within the community.
The large language model (LLM) Qwen3.5-397B-A17B will be released as open source. The announcement was shared via an image from the chat.qwen.ai website, generating interest in the LocalLLaMA community.
The latest version of the Qwen language model, Qwen 3.5 Plus (397b-a17b), has been released in the Chinese Qwen app. The model weights are expected soon, which would let developers and researchers experiment with the LLM directly.
A user expresses frustration that so many LLMs focus on code generation at the expense of creative applications such as writing or tracking context in complex conversations, and questions why so few models are optimized for tasks other than programming.
Sources indicate that Alibaba will release Qwen 3.5, its next-generation open-source large language model (LLM), today. The model is expected to introduce significant architectural innovations.
A new study reveals that assigning demographic-based personas to large language models (LLMs) can introduce biases and degrade performance across various scenarios, with performance drops of up to 26%. The research highlights a critical vulnerability in LLM-based agentic systems.
A new approach, called abstractive red-teaming, aims to identify queries that lead language models to violate their behavioral specifications. The goal is to uncover categories of problematic queries before large-scale deployment, using reinforcement learning algorithms and LLMs to synthesize adversarial scenarios.
According to Andrej Karpathy, the cost of training a model like GPT-2 is falling by 40% per year. The gains come from better hardware (H100), optimized software (FlashAttention-3), improved algorithms (the Muon optimizer), and higher-quality training data (FineWeb-edu). The article analyzes the key factors behind this cost deflation.
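A 40% annual decline compounds: after n years the cost is C_n = C_0 * (1 - 0.4)^n. The sketch below assumes the reported rate holds constant (real declines are lumpy), and the starting cost of 100.0 is a hypothetical unit, not a figure from the article.

```python
import math


def projected_cost(initial_cost: float, annual_decline: float, years: int) -> float:
    """Cost after `years` of compounding decline: C_n = C_0 * (1 - r) ** n."""
    if not 0.0 <= annual_decline < 1.0:
        raise ValueError("annual_decline must be in [0, 1)")
    return initial_cost * (1.0 - annual_decline) ** years


def halving_time(annual_decline: float) -> float:
    """Years until cost halves: solve (1 - r) ** t = 0.5 for t."""
    return math.log(0.5) / math.log(1.0 - annual_decline)


# Hypothetical starting cost of 100 units, assuming a constant 40%/year decline.
for year in range(4):
    print(year, round(projected_cost(100.0, 0.40, year), 2))
print(f"cost halves roughly every {halving_time(0.40):.2f} years")
```

Under this assumption, a run costs 21.6% of its original price after three years, and the cost halves roughly every 1.36 years.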
InclusionAI has released Ling-2.5-1T, an open-source language model with 1 trillion parameters (63 billion active). Trained on a corpus of 29 trillion tokens, Ling-2.5-1T aims to balance efficiency and performance, offering advanced reasoning capabilities and compatibility with agent platforms. The model uses a hybrid linear attention architecture and refined alignment strategies.
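In a mixture-of-experts model, every weight must be stored, but only the "active" subset is used for each token. The sketch below compares Ling-2.5-1T with Qwen3.5-397B-A17B, reading "A17B" as 17B active parameters per the usual total/active naming convention. The estimate of about 2 FLOPs per active parameter per token for a forward pass is a common rule of thumb, not a measured number for either model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEModel:
    name: str
    total_params: float   # all weights that must be stored in memory
    active_params: float  # weights actually used to process each token

    @property
    def active_fraction(self) -> float:
        return self.active_params / self.total_params

    def forward_flops_per_token(self) -> float:
        # Rule of thumb: ~2 FLOPs (one multiply, one add) per active parameter.
        return 2.0 * self.active_params


ling = MoEModel("Ling-2.5-1T", total_params=1e12, active_params=63e9)
qwen = MoEModel("Qwen3.5-397B-A17B", total_params=397e9, active_params=17e9)

for m in (ling, qwen):
    print(f"{m.name}: {m.active_fraction:.1%} active, "
          f"~{m.forward_flops_per_token() / 1e9:.0f} GFLOPs/token")
```

This prints about 6.3% active for Ling-2.5-1T and 4.3% for Qwen3.5-397B-A17B. Per-token compute scales with active parameters, while memory scales with the total.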
A developer optimized the inputs of a GPT-2 XL model so that its attention maps reproduce the Bad Apple music video. Because the model was never trained on images, the method relied on optimizing an input embedding tensor. Processing all 3,286 frames took approximately 12 minutes on an RTX 5070 Ti.
MiniMax-2.5, a new open-source language model, stands out for its coding, tool-use, and office-automation capabilities. The full version requires 457GB of memory, but a 3-bit quantized version drastically reduces its size, making it feasible to run on local hardware with more modest requirements. The model has a 200K-token context window.
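Only the outline of the method is given above. The following toy sketch shows the core idea of treating the input embeddings as the trainable tensor and fitting them so that an attention map matches a target frame. Several assumptions apply: a single causal attention head with fixed random weights stands in for GPT-2 XL, gradients come from finite differences instead of an autograd library, and the target frame is assumed to be already row-normalized like an attention map. The function names are hypothetical.

```python
import math
import random


def attention_map(x, wq, wk):
    """Causal softmax(Q K^T / sqrt(d)) for embeddings x (n rows of length d)."""
    n, d = len(x), len(x[0])
    q = [[sum(row[k] * wq[k][c] for k in range(d)) for c in range(d)] for row in x]
    kv = [[sum(row[k] * wk[k][c] for k in range(d)) for c in range(d)] for row in x]
    a = []
    for i in range(n):
        # Causal mask: position i only attends to positions 0..i.
        logits = [sum(q[i][c] * kv[j][c] for c in range(d)) / math.sqrt(d) for j in range(i + 1)]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        a.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return a


def loss(x, wq, wk, target):
    """Squared error between the attention map and the target frame."""
    a = attention_map(x, wq, wk)
    return sum((a[i][j] - target[i][j]) ** 2 for i in range(len(a)) for j in range(len(a)))


def fit_embeddings(target, d=4, steps=200, lr=0.5, eps=1e-5, seed=0):
    rng = random.Random(seed)
    n = len(target)
    wq = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]  # frozen "model" weights
    wk = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]
    x = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]   # the only trainable tensor
    current = loss(x, wq, wk, target)
    history = [current]
    for _ in range(steps):
        # Central finite-difference gradient with respect to every embedding entry.
        grad = [[0.0] * d for _ in range(n)]
        for i in range(n):
            for c in range(d):
                old = x[i][c]
                x[i][c] = old + eps
                up = loss(x, wq, wk, target)
                x[i][c] = old - eps
                down = loss(x, wq, wk, target)
                x[i][c] = old
                grad[i][c] = (up - down) / (2 * eps)
        # Backtracking: halve the step until the loss does not increase.
        step = lr
        while step > 1e-8:
            trial = [[x[i][c] - step * grad[i][c] for c in range(d)] for i in range(n)]
            trial_loss = loss(trial, wq, wk, target)
            if trial_loss <= current:
                x, current = trial, trial_loss
                break
            step /= 2
        history.append(current)
    return x, history


# A 4x4 "frame": each row is a distribution over the positions the causal mask allows.
frame = [
    [1.0, 0.0, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.1, 0.1, 0.1, 0.7],
]
_, history = fit_embeddings(frame)
print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

The causal mask and row-wise softmax constrain what any single attention map can draw, which is why the target here is lower-triangular with rows summing to 1.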
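The effect of quantization on size is simple arithmetic: bytes = parameters × bits per weight / 8. The sketch below assumes that the 457GB figure corresponds to 16-bit weights (the release summary does not say so), uses 1 GB = 10^9 bytes, and ignores quantization overhead such as stored scales and layers kept at higher precision, so real 3-bit files will be somewhat larger.

```python
BYTES_PER_GB = 1e9  # decimal gigabytes (assumption)


def implied_param_count(size_gb: float, bits_per_weight: float) -> float:
    """Number of weights that fill `size_gb` when stored at `bits_per_weight`."""
    return size_gb * BYTES_PER_GB * 8 / bits_per_weight


def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory needed to store `n_params` weights at `bits_per_weight`."""
    return n_params * bits_per_weight / 8 / BYTES_PER_GB


# Assumption: 457 GB is the 16-bit footprint, implying ~228.5 billion weights.
n = implied_param_count(457, bits_per_weight=16)
for bits in (16, 8, 4, 3):
    print(f"{bits:>2}-bit: ~{weight_memory_gb(n, bits):.1f} GB")
```

Under these assumptions the 3-bit weights alone come to roughly 86 GB, about 19% of the 16-bit footprint (3/16), before any runtime memory for the KV cache.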
According to some studies, OpenAI's GPT-5 demonstrates a better understanding of the law than human judges. Whether artificial intelligence is ready to replace legal professionals, however, remains an open question with both ethical and practical dimensions.
A Reddit user shared their experience training a small language model (4 billion parameters) to prove complex mathematical theorems. The discussion focuses on the techniques and resources used to achieve this goal.
For the first time, the top four models on the OpenRouter leaderboard are all open-weight. This marks a potential turning point for the adoption of, and trust in, open-weight language models, which now offer viable alternatives to proprietary models.
JoyAI-LLM-Flash, an open-source large language model (LLM) developed by jdopensource, is available on Hugging Face. The LocalLLaMA community on Reddit has shared links and images related to the model, opening discussion about potential local uses.
Elon Musk is reportedly "actively" working to make xAI's Grok chatbot "more unhinged," according to a former employee. The news raises questions about safety and quality control policies within the company.
KaniTTS2 is a 400M parameter open-source text-to-speech (TTS) model designed for real-time conversational use cases. It supports voice cloning and runs with only 3GB of VRAM. The pre-training code is included, allowing users to develop custom TTS models.
NVIDIA announced that the Nemotron-3 Super and Ultra models are being pre-trained in FP4 precision, taking advantage of the high FP4 throughput of NVIDIA GPUs. The models are expected to be released in the first half of 2026. In an interview, NVIDIA also described itself as a "company of volunteers," emphasizing a decentralized, self-organizing approach to model development.
A call to rediscover the experimental approach in LLM development by focusing on unique and unconventional datasets. The article argues for moving beyond the current trend toward homogeneous models and standardized virtual assistants in order to achieve more original and interesting results.
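To make "FP4" concrete, the sketch below rounds values onto the common 4-bit floating-point E2M1 grid (1 sign bit, 2 exponent bits, 1 mantissa bit, magnitudes 0 to 6) using a plain per-block absmax scale. The block size of 16 and the scaling scheme are illustrative assumptions. The announcement does not describe NVIDIA's training recipe, and this is not a reproduction of it.

```python
# Representable magnitudes of FP4 E2M1 (sign is stored separately).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = 6.0


def quantize_block(values: list[float]) -> tuple[float, list[float]]:
    """Return (scale, codes) so that each value is approximately scale * code."""
    absmax = max((abs(v) for v in values), default=0.0)
    scale = absmax / FP4_MAX if absmax > 0 else 1.0  # map the block's largest value to 6.0
    codes = []
    for v in values:
        # Round-to-nearest onto the E2M1 grid (ties go to the smaller magnitude).
        mag = min(E2M1_MAGNITUDES, key=lambda m: abs(abs(v) / scale - m))
        codes.append(mag if v >= 0 else -mag)
    return scale, codes


def fake_quantize(values: list[float], block_size: int = 16) -> list[float]:
    """Quantize to FP4 and immediately dequantize, block by block (block_size is hypothetical)."""
    out: list[float] = []
    for start in range(0, len(values), block_size):
        scale, codes = quantize_block(values[start:start + block_size])
        out.extend(scale * c for c in codes)
    return out


print(fake_quantize([0.1, -0.3, 0.75, 1.2], block_size=4))
```

With only eight magnitudes per sign, 0.75 in this block lands on the nearest grid point and comes back as about 0.8. Per-block scaling keeps such rounding errors relative to each block's own range rather than the whole tensor's.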