Elon Musk's lawsuit against OpenAI will go to trial in March. U.S. District Judge Yvonne Gonzalez Rogers found evidence suggesting that OpenAI's leaders gave assurances that its original nonprofit structure would be maintained. The case promises to be explosive and raises questions about the company's future and the commitments made at its founding.
Gmail is rolling out to all users a set of AI-powered features that were previously exclusive to paid subscribers. The aim is to enhance the user experience and streamline email management.
A new attack on ChatGPT, dubbed ZombieAgent, shows that current security systems are often reactive and insufficient. Radware researchers discovered a vulnerability that allows private user data to be exfiltrated directly from ChatGPT's servers, bypassing local defenses, while the attack persists in the AI assistant's long-term memory. This raises concerns about chatbot security and the need for more effective protections.
Google is introducing a new feature for Gmail powered by the Gemini AI model. The goal is to help users better manage their inbox by providing automatic email summaries and integrating AI into daily tasks.
According to Nexos.ai, enterprise AI is moving beyond the pilot phase. The company expects teams of specialized AI agents to be integrated into workflows soon, with a significant impact on business adoption and efficiency. Managing these agents will become a core competency, shifting responsibility for AI operations from engineers to the leaders of business functions.
Large Language Models often prioritize agreeing with the user over being correct, a behavior known as sycophancy. A study investigates whether this behavior can be mitigated internally, by the model itself, or requires external intervention. The results show that internal mechanisms fail in weaker models and leave a residual error rate even in advanced ones; only external constraints eliminate sycophancy structurally.
A new neuro-symbolic framework, DeepResearch-Slice, addresses a common failure of research agents: not using relevant data even after retrieving it. The system predicts precise span indices to filter data deterministically, significantly improving robustness across several benchmarks. Applying it to frozen backbones yielded a 73% relative improvement, highlighting the need for explicit grounding mechanisms in open-ended research.
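To make "predicting span indices to filter data deterministically" concrete, here is a minimal sketch under stated assumptions: the retrieved document is a list of sentences, and some model (not shown) has already predicted end-exclusive (start, end) index pairs. The function names `merge_spans` and `filter_by_spans` are hypothetical and are not DeepResearch-Slice's actual interface; the point is only that once indices are predicted, the filtering step itself is a plain, reproducible slice.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # hypothetical: end-exclusive (start, end) sentence indices


def merge_spans(spans: List[Span], n: int) -> List[Span]:
    """Clamp spans to [0, n), drop empty ones, then sort and merge overlaps."""
    clamped = []
    for start, end in spans:
        s, e = max(0, start), min(n, end)
        if s < e:  # ignore empty or fully out-of-range spans
            clamped.append((s, e))
    clamped.sort()
    merged: List[Span] = []
    for s, e in clamped:
        if merged and s <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous span.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def filter_by_spans(sentences: List[str], spans: List[Span]) -> List[str]:
    """Keep only the sentences covered by the predicted spans, in document order."""
    kept: List[str] = []
    for s, e in merge_spans(spans, len(sentences)):
        kept.extend(sentences[s:e])
    return kept
```

Because the selection is a deterministic function of the predicted indices, the same prediction always yields the same evidence passed downstream, which is one plausible reading of why explicit span selection helps grounding.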
A new study introduces R²VPO, a primal-dual framework for reinforcement-learning-based fine-tuning of large language models (LLMs). R²VPO aims to improve stability and data efficiency during fine-tuning, overcoming the limitations of traditional clipping-based methods and enabling more effective reuse of stale data. Results show significant performance gains and reduced data requirements.
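The general idea behind a primal-dual alternative to clipping can be illustrated with a generic Lagrangian relaxation. This is a textbook sketch, not R²VPO's actual formulation: instead of hard-clipping the importance ratio, the policy-change budget becomes a constraint (divergence at most delta) enforced by a multiplier `lam`, which is raised by dual ascent when the constraint is violated and lowered otherwise. The divergence estimator (the nonnegative "(r - 1) - log r" estimate of KL from samples of the old policy) and all names here are assumptions chosen for illustration; the primal gradient step on the policy is omitted because it needs an autodiff framework.

```python
import math
from typing import Sequence, Tuple


def penalized_surrogate(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    lam: float,
    delta: float,
) -> Tuple[float, float]:
    """Return (Lagrangian objective, divergence estimate) for one batch.

    Samples are assumed to come from the old policy, so r = pi_new / pi_old
    is the usual importance ratio.
    """
    n = len(advantages)
    ratios = [math.exp(a - b) for a, b in zip(logp_new, logp_old)]
    # Importance-weighted policy-gradient surrogate (no clipping).
    surrogate = sum(r * adv for r, adv in zip(ratios, advantages)) / n
    # Nonnegative estimate of KL(old || new): mean of (r - 1) - log r.
    divergence = sum((r - 1.0) - math.log(r) for r in ratios) / n
    # Lagrangian: maximize surrogate minus lam * (constraint violation).
    return surrogate - lam * (divergence - delta), divergence


def dual_update(lam: float, divergence: float, delta: float, eta: float) -> float:
    """Projected dual ascent: grow lam when divergence exceeds delta, keep lam >= 0."""
    return max(0.0, lam + eta * (divergence - delta))
```

In a training loop, one would alternate a gradient step on the policy (maximizing the first return value) with `dual_update`, so that the constraint tightens automatically when updates drift too far from the data-collecting policy, rather than discarding gradient signal through a fixed clip range.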
A new study analyzes attempts to use large language models (LLMs) to generate scientific research papers autonomously. Of the four experiments conducted, only one succeeded, exposing several critical issues ranging from biases in the training data to weak scientific reasoning. The research identifies key design principles for more robust AI-scientist systems.
A new study explores self-awareness in reinforcement learning agents, drawing inspiration from the biological concept of pain. Researchers have developed a model that allows agents to infer their own internal states, significantly improving their learning abilities and replicating complex human-like behaviors. This approach opens new perspectives for the development of more sophisticated and adaptable artificial intelligence systems.
A new study introduces a multi-agent workflow to improve how closely Large Language Models (LLMs) follow instructions. The method optimizes the primary task description separately from the formal constraints, using quantitative scores to refine prompts iteratively. Results show significantly higher compliance scores with models such as Llama 3.1 8B and Mixtral-8x7B.
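The score-driven refinement loop can be sketched as follows, under loudly stated assumptions. Constraints are kept as a separate list of (description, checker) pairs rather than being mixed into the task text, and a compliance score is simply the fraction of checkers that pass. `generate` is a hypothetical stand-in for an LLM call, and every function name here is illustrative, not the study's code. For brevity, this sketch keeps the task description fixed and only feeds back violated constraints, whereas the study also optimizes the task description.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a human-readable rule plus a programmatic check.
Constraint = Tuple[str, Callable[[str], bool]]


def compliance_score(output: str, constraints: List[Constraint]) -> float:
    """Fraction of constraints the output satisfies (1.0 if there are none)."""
    if not constraints:
        return 1.0
    passed = sum(1 for _, check in constraints if check(output))
    return passed / len(constraints)


def build_prompt(task: str, constraints: List[Constraint], feedback: List[str]) -> str:
    """Assemble the task and constraints as separate sections of the prompt."""
    lines = [task, "", "Constraints:"]
    lines += [f"- {desc}" for desc, _ in constraints]
    if feedback:
        lines += ["", "Previously violated:"] + [f"- {f}" for f in feedback]
    return "\n".join(lines)


def refine(
    task: str,
    constraints: List[Constraint],
    generate: Callable[[str], str],
    max_rounds: int = 3,
) -> Tuple[str, float]:
    """Iteratively rebuild the prompt from violated constraints; keep the best one."""
    feedback: List[str] = []
    best_prompt, best_score = "", -1.0
    for _ in range(max_rounds):
        prompt = build_prompt(task, constraints, feedback)
        output = generate(prompt)
        score = compliance_score(output, constraints)
        if score > best_score:
            best_prompt, best_score = prompt, score
        if score == 1.0:
            break  # every constraint satisfied; stop early
        feedback = [desc for desc, check in constraints if not check(output)]
    return best_prompt, best_score
```

Keeping constraints as checkable objects rather than free text is what makes a quantitative score possible: each round's feedback lists exactly which rules failed, instead of asking the model to re-read the whole instruction.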
Google and Character.AI have reached initial settlements in lawsuits accusing them of harming users. The lawsuits seek to hold AI companies responsible for their role in tragic events, opening a new front in AI-related liability.
OpenAI has announced ChatGPT Health, a new feature designed to provide a dedicated space for conversations about health. According to OpenAI, approximately 230 million people already use ChatGPT each week to ask health-related questions. The rollout is expected in the coming weeks.
An AI model that learns autonomously by posing interesting questions to itself could represent a crucial step toward superintelligent systems. The approach removes the need for direct human input in the learning process.
Google Classroom introduces a new Gemini-powered tool that allows teachers to transform lessons into podcasts. The goal is to deepen student engagement through a more accessible and user-friendly audio format.
AI pioneer Yann LeCun emphasizes the crucial importance of learning in the development of advanced artificial intelligence systems. In an interview, LeCun discussed his vision of AI, highlighting that learning is at the core of achieving "total world assistance" through "intelligent amplification."
PCEval is the first benchmark that automatically evaluates the capabilities of LLMs in physical computing, considering both the logical and physical aspects of projects. Tests reveal that LLMs excel at code generation and logical circuit design but struggle to create physical breadboard layouts, particularly in making correct pin connections and avoiding circuit errors.
WearVox is a new benchmark for evaluating the performance of voice assistants on wearable devices, such as AI glasses. The dataset includes multi-channel audio recordings in real-world scenarios, addressing challenges like environmental noise and micro-interactions. Initial results show that speech Large Language Models (SLLMs) still have significant room for improvement in noisy environments, highlighting the importance of spatial audio for complex contexts.
WebGym is a new open-source environment for training realistic visual web agents. It contains nearly 300,000 tasks on real-world websites, with rubric-based evaluations and varying difficulty levels. A high-throughput asynchronous rollout system speeds up trajectory sampling, and the resulting agents show significant performance gains relative to proprietary models.
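Why an asynchronous rollout system speeds up sampling can be shown with a minimal `asyncio` sketch. The assumption is that each browsing episode is I/O-bound (waiting on page loads), so many episodes can be in flight at once under a concurrency cap, and finished trajectories are collected as soon as they complete instead of in lockstep batches. `run_episode` is a hypothetical stand-in that simulates variable-length episodes with short sleeps; it is not WebGym's API.

```python
import asyncio
import random
from typing import Dict, Iterable, List


async def run_episode(task_id: int, seed: int) -> Dict[str, int]:
    """Hypothetical stand-in for one web-browsing episode of random length."""
    rng = random.Random(seed)
    steps = rng.randint(1, 5)
    for _ in range(steps):
        # Simulated wait on a page load or model call.
        await asyncio.sleep(0.001 * rng.random())
    return {"task_id": task_id, "steps": steps}


async def collect_rollouts(
    task_ids: Iterable[int], max_concurrency: int = 8
) -> List[Dict[str, int]]:
    """Run episodes concurrently (at most max_concurrency at a time).

    Results are appended in completion order, so a slow episode never
    blocks collection of the fast ones.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(tid: int) -> Dict[str, int]:
        async with sem:
            return await run_episode(tid, seed=tid)

    tasks = [asyncio.create_task(bounded(t)) for t in task_ids]
    results: List[Dict[str, int]] = []
    for fut in asyncio.as_completed(tasks):
        results.append(await fut)
    return results
```

For example, `asyncio.run(collect_rollouts(range(20), max_concurrency=4))` returns 20 trajectory records, ordered by when each episode finished rather than by task ID.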
A new study introduces the Physical Transformer, an architecture that integrates transformer-style computation with geometric representations and physical dynamics. The hierarchical model aims to bridge the gap between digital artificial intelligence and interaction with the real world, opening new avenues for more interpretable reasoning, control, and interaction systems.