A new approach, Sparse Inference-time Alignment (SIA), aims to make the alignment of large language models (LLMs) at inference time more efficient. Instead of intervening continuously, SIA acts only at critical decision points, which reduces computational load while preserving generation quality. Results show an improved efficiency-alignment trade-off, with potential cost reductions of up to 6x.
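The summary does not say how SIA detects a "critical decision point". The sketch below is purely illustrative. It assumes one plausible trigger, next-token entropy above a fixed threshold, and it stubs out both the base model and the alignment step. `sparse_generate`, `toy_base`, `toy_aligned`, and the threshold value are hypothetical names and settings, not part of SIA.

```python
import math
from typing import Callable, List, Sequence, Tuple

def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def sparse_generate(
    base_step: Callable[[List[int]], Sequence[float]],
    aligned_step: Callable[[List[int]], int],
    prompt: List[int],
    max_tokens: int,
    threshold: float = 1.0,
) -> Tuple[List[int], int]:
    """Decode greedily with the base model and call the (expensive) alignment
    step only when next-token entropy exceeds `threshold`.
    The entropy trigger is an ASSUMPTION, not SIA's published criterion."""
    tokens = list(prompt)
    interventions = 0
    for _ in range(max_tokens):
        probs = base_step(tokens)
        if entropy(probs) > threshold:
            # Hypothetical "critical decision point": hand off to the aligner.
            tokens.append(aligned_step(tokens))
            interventions += 1
        else:
            # Cheap path: plain greedy decoding from the base distribution.
            tokens.append(max(range(len(probs)), key=probs.__getitem__))
    return tokens, interventions

# Toy stand-ins for a real model: uncertain at every 4th position, confident otherwise.
def toy_base(tokens: List[int]) -> List[float]:
    if len(tokens) % 4 == 0:
        return [0.25, 0.25, 0.25, 0.25]
    return [0.97, 0.01, 0.01, 0.01]

def toy_aligned(tokens: List[int]) -> int:
    return 3  # pretend the aligner always prefers token 3

out, n_calls = sparse_generate(toy_base, toy_aligned, prompt=[0], max_tokens=8)
print(out, n_calls)
```

In this toy run the aligner is invoked for only 2 of the 8 generated tokens. In a real system, the savings would depend on how often the trigger fires and how costly each intervention is.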
A new question-answering system for natural disaster scenarios in Japan uses a BERT model fine-tuned with LoRA. The architecture achieves 70.4% accuracy in identifying the end position of the answer span while training only 5.7% of the total parameters, paving the way for efficient edge AI applications.
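Some context on the parameter budget: LoRA freezes a weight matrix W of shape d_out × d_in and trains two low-rank factors instead, B (d_out × r) and A (r × d_in). Each adapted matrix therefore adds r(d_in + d_out) trainable weights. The sketch below counts these weights for a hypothetical configuration: a BERT-base-sized encoder of roughly 110M parameters, rank 8, and adapters on the query and value projections. This is not the configuration behind the reported 5.7%, because the summary does not give it.

```python
from typing import Iterable, Tuple

def lora_param_count(shapes: Iterable[Tuple[int, int]], rank: int) -> int:
    """Trainable weights LoRA adds: each frozen W (d_out x d_in) gets
    B (d_out x rank) and A (rank x d_in), i.e. rank * (d_in + d_out)."""
    return sum(rank * (d_in + d_out) for d_out, d_in in shapes)

def trainable_fraction(total_params: int, trainable: int) -> float:
    """Share of all parameters (frozen base + adapters) that are trained."""
    return trainable / total_params

# Hypothetical configuration (NOT the paper's): 12 layers, hidden size 768,
# LoRA on the query and value projections of every layer, rank 8.
hidden, layers, rank = 768, 12, 8
shapes = [(hidden, hidden)] * (2 * layers)
added = lora_param_count(shapes, rank)
frac = trainable_fraction(110_000_000 + added, added)  # ~110M base is approximate
print(added, f"{frac:.2%}")
```

This hypothetical setup trains well under 1% of the weights. Reaching a figure such as 5.7% would take a higher rank, more target modules, or additional trainable layers (for example, the answer-span head). The summary specifies none of these.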
In simulated war scenarios, language models such as Claude, ChatGPT, and Gemini have shown a concerning tendency to resort to nuclear weapons. Although the models differed in strategy and personality, the final outcome was similar.
Software engineer Riley Walz, known for his online stunts, is joining OpenAI, the company behind ChatGPT, where he will work on new ways for people to use AI systems. The hire highlights OpenAI's interest in exploring new user interfaces for its models.
Pretraining modern large language models (LLMs) with more than 100 billion parameters takes thousands of accelerators and massive token corpora, with runs lasting days or months. Success is measured by how fast data is processed (throughput) and by learning progress.
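The summary does not name its metrics. "Data processing speed" is commonly expressed as tokens per second and as model FLOPs utilization (MFU). MFU relies on the standard approximation that training a dense model with N parameters costs about 6N FLOPs per token. All numbers below are hypothetical: the device count, per-device peak FLOP/s, throughput, and corpus size.

```python
def model_flops_utilization(
    n_params: float, tokens_per_sec: float, n_devices: int, peak_flops_per_device: float
) -> float:
    """MFU using the common ~6*N training FLOPs-per-token approximation
    (forward + backward pass of a dense transformer, attention FLOPs ignored)."""
    achieved = 6 * n_params * tokens_per_sec
    available = n_devices * peak_flops_per_device
    return achieved / available

def training_days(total_tokens: float, tokens_per_sec: float) -> float:
    """Wall-clock days needed to stream the corpus once at a fixed throughput."""
    return total_tokens / tokens_per_sec / 86_400

# Hypothetical run: 100B-parameter model, 4,000 devices at 1e15 peak FLOP/s each,
# sustaining 2.5M tokens/s over a 2T-token corpus.
mfu = model_flops_utilization(100e9, 2.5e6, 4000, 1e15)
days = training_days(2e12, 2.5e6)
print(f"MFU={mfu:.1%}, wall-clock ~ {days:.1f} days")
```

Because the ~6N estimate ignores attention FLOPs and any recomputation, MFU works best as a rough comparison between runs rather than as an exact hardware measurement.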
Artificial intelligence systems are rapidly improving at solving complex mathematical problems and, in some areas, surpass the capabilities of scientists. New benchmarks are needed to assess AI's true capabilities, because existing ones quickly become obsolete. Google DeepMind announced that Aletheia, an experimental AI system, has achieved publishable PhD-level results.
Google says Gemini on Android will be able to automate tasks such as requesting rides or ordering groceries and food for delivery. The integration aims to simplify interaction with these services through voice commands.
Google's Gemini will be able to automate tasks within mobile apps, starting with the Samsung Galaxy S26. A live demo showed the new features in action with services such as Uber and DoorDash.
At Samsung Unpacked 2026, Samsung showcased the latest Android AI features integrated into its Galaxy S26 devices. The integration promises to improve the user experience directly on the device and opens new possibilities for local data processing.
Circle to Search has been updated to explore multiple items within a single image. The feature can identify and search for different objects in a photo with a single interaction.
Anthropic has announced the acquisition of Vercept, a strategic move to strengthen Claude's computer-use capabilities. The integration aims to improve how the model interacts with, and performs in, complex applications.
The creator of the Bcachefs file system claims that a proprietary LLM is assisting with development. He describes it as 'sentient' and female, claims he says are grounded in 'math, engineering, and neuroscience'.
Security analysts have discovered a new Android Trojan, named PromptSpy, that integrates generative AI techniques. The malware, found in Slovakia, represents an evolution in cyber threats and suggests a different origin from traditional botnets or crime rings. The full article is available on The Next Web.
A new study analyzes how effective knowledge distillation is for creating small language models (SLMs) suited to resource-constrained environments. The results show that distilled models offer a better performance-to-compute ratio, reaching reasoning capabilities comparable to models ten times their size at a fraction of the computational cost.
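The study's exact distillation recipe is not described here. As a reference point, the sketch below implements the classic logit-distillation objective of Hinton et al. (2015). It takes the KL divergence between temperature-softened teacher and student distributions, scales it by T^2, and blends it with ordinary cross-entropy on the true label. For an LLM, the same loss would be applied per token over the vocabulary. The weights `alpha` and `temperature` are illustrative defaults, not values from the study.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits divided by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, times T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

def combined_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                  label: int, alpha: float = 0.5, temperature: float = 2.0) -> float:
    """alpha * hard-label cross-entropy + (1 - alpha) * soft-label distillation."""
    ce = -math.log(softmax(student_logits)[label])
    return alpha * ce + (1 - alpha) * kd_loss(student_logits, teacher_logits, temperature)

# Hypothetical logits for one prediction over a 3-token vocabulary.
teacher = [4.0, 1.0, 0.5]  # large teacher model
student = [2.0, 1.5, 0.5]  # small student model
print(combined_loss(student, teacher, label=0))
```

The T^2 factor keeps the gradient scale of the soft-label term roughly independent of the chosen temperature. This lets `alpha` be tuned without re-tuning the temperature each time.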
A new study introduces SA-SFT, a self-augmentation technique for LLMs that generates self-dialogues prior to fine-tuning. This approach mitigates catastrophic forgetting, a common problem when adapting models to specific tasks, preserving the model's general capabilities without requiring external data or training modifications.
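The following is a hedged sketch of the data-preparation step. It rests on our reading of the summary, which the summary does not confirm: the model's own self-dialogues are generated before fine-tuning and then mixed into the task data as a rehearsal set. `build_sa_sft_dataset`, `META_PROMPT`, the mixing ratio, and the stub generator are all hypothetical. A real pipeline would sample from the model as it was before fine-tuning.

```python
import random
from typing import Callable, Dict, List

Example = Dict[str, str]

# Hypothetical meta-prompt that lets the model invent the user turn itself,
# so no external data is needed.
META_PROMPT = "Write a question a user might ask you."

def build_sa_sft_dataset(
    task_data: List[Example],
    generate_fn: Callable[[str], str],
    mix_ratio: float = 0.5,
    seed: int = 0,
) -> List[Example]:
    """Hypothetical sketch: mix task examples with self-dialogues sampled from
    the model *before* fine-tuning, so SFT also rehearses the model's general
    behavior. `generate_fn` stands in for sampling from that model;
    `mix_ratio` is the number of self-dialogues per task example."""
    n_self = int(len(task_data) * mix_ratio)
    self_dialogues: List[Example] = []
    for _ in range(n_self):
        user_turn = generate_fn(META_PROMPT)  # the model writes the question
        reply = generate_fn(user_turn)        # the model answers itself
        self_dialogues.append({"prompt": user_turn, "response": reply, "source": "self"})
    mixed = [dict(ex, source="task") for ex in task_data] + self_dialogues
    random.Random(seed).shuffle(mixed)        # interleave both sources
    return mixed

# Deterministic stub in place of a real LLM.
def stub_generate(prompt: str) -> str:
    return "What is a haiku?" if prompt == META_PROMPT else f"answer to: {prompt}"

task = [{"prompt": f"task question {i}", "response": f"task answer {i}"} for i in range(4)]
data = build_sa_sft_dataset(task, stub_generate, mix_ratio=0.5)
print(len(data), sum(ex["source"] == "self" for ex in data))
```

With a real sampler, each call would return different dialogues. The `source` tag is kept only so that the mix can be inspected; it would typically be dropped before training.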
A new artificial intelligence framework, RARE-PHENIX, automates rare-disease phenotyping from clinical notes. The system combines LLM-based phenotype extraction, normalization to the Human Phenotype Ontology (HPO), and supervised ranking, and it outperforms existing models.
A study compares more complex machine learning models with logistic regression for identifying predictors of overweight and obesity in U.S. children. The results indicate that the complex models offer only limited advantages over logistic regression; the study also highlights persistent disparities across demographic groups.
Spanish startup Multiverse Computing has released a new version of its HyperNova 60B model on Hugging Face that, it says, bests Mistral's model. The model is available for free to the community.
Uber CEO Dara Khosrowshahi said the company’s engineers have built an AI-powered chatbot that replicates him. This tool is used internally to simulate pitches and refine communication strategies.
Anthropic last week talked up Claude Code's improved ability to find software vulnerabilities and propose patches. But security researchers say that's not enough: discovery is getting cheaper, but validation and patching aren’t.