Google has released Gemini 3.1 Flash TTS, a new AI-powered speech synthesis model, now available across its products. The model aims to generate more natural and expressive speech, which matters for enterprise applications that need realistic voice interactions. Its release also raises questions about the infrastructure required for on-premise deployments as opposed to cloud solutions.
Gizmo, a London-based AI-powered learning platform, has secured $22 million in Series A funding. The capital, led by Shine Capital, will support international expansion and technological development. Founded by Cambridge graduates, Gizmo aims to revolutionize education by transforming content into personalized and gamified study materials, leveraging engagement techniques common in consumer technology. The platform serves over 13 million learners across more than 120 countries.
Adobe has announced a new artificial intelligence-powered assistant, named Firefly. This tool is designed to operate across various Creative Cloud applications, including Photoshop, Premiere, Lightroom, Express, and Illustrator, with the aim of automating and simplifying task execution for users. The initiative seeks to deeply integrate AI capabilities into professional creative workflows.
Self-Distillation Zero (SD-Zero) introduces an innovative method for post-training LLMs, overcoming the limitations of sparse binary rewards and the reliance on external teachers or high-quality data. SD-Zero enables a single model to generate and revise its own responses, transforming binary rewards into dense token-level supervision. This approach improves performance by at least 10% on math and code reasoning benchmarks, offering greater efficiency in training sample usage and reducing operational costs for self-hosted deployments.
A new study introduces the Filtered Reasoning Score (FRS), an innovative metric designed to evaluate the reasoning quality of Large Language Models (LLMs) beyond mere accuracy. FRS analyzes a model's most confident reasoning traces, revealing significant differences even among models with similar performance. This approach promises to identify transferable reasoning capabilities, crucial for robust and reliable deployments.
New research introduces Schema-Adaptive Tabular Representation Learning (SATRL), a method leveraging Large Language Models (LLMs) to overcome schema generalization limitations in tabular data, especially in clinical settings. By transforming structured variables into natural language statements, SATRL creates transferable embeddings enabling zero-shot alignment with unseen schemas, eliminating the need for manual feature engineering or retraining. The approach demonstrated state-of-the-art performance in dementia diagnosis, outperforming neurologists in retrospective diagnostic tasks.
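The idea of turning structured variables into natural language statements can be illustrated with a minimal sketch. The template ("<label> is <value>."), the column names, and the `descriptions` mapping below are hypothetical; the summary does not specify SATRL's actual templates or the embedding model applied afterwards.

```python
def row_to_statements(row, descriptions=None):
    """Serialize one tabular record into short natural language statements.

    `row` maps column names to values. `descriptions` optionally maps a
    column name to a human-readable label (hypothetical helper, not part
    of the published method). Missing values (None) are skipped.
    """
    descriptions = descriptions or {}
    statements = []
    for column, value in row.items():
        if value is None:
            continue  # leave out variables absent from this schema or record
        label = descriptions.get(column, column.replace("_", " "))
        statements.append(f"{label} is {value}.")
    return statements


# Illustrative record with made-up column names.
example = row_to_statements(
    {"age_years": 72, "memory_test_score": 24, "smoker": None},
    descriptions={"age_years": "Age in years"},
)
```

Because each statement carries its own label, records from a schema with different or additional columns can be serialized the same way and passed to a text encoder without retraining a column-specific model.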
The GoodPoint project introduces a novel approach to generating constructive feedback for scientific papers using Large Language Models. Through a curated dataset and an innovative training recipe, GoodPoint significantly improves feedback quality, outperforming models of similar size and even Gemini-3-flash in precision. The goal is to augment researchers, not replace them, by providing tools to enhance research and its presentation.
A new study offers a novel perspective on the trajectory of scientific discovery, analyzing it as an optimization problem. The paper argues that the current body of scientific knowledge represents a "local optimum" rather than a global one, influenced by historical contingencies, cognitive path dependence, and institutional lock-in. Drawing an analogy to gradient descent in machine learning, the authors explore how science might miss superior descriptions of nature, identifying lock-in mechanisms and proposing strategies to overcome them.
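The gradient descent analogy can be made concrete with a toy example. The function below is an arbitrary one-dimensional curve chosen for illustration, not something taken from the paper: it has a shallow minimum near x = 1.13 and a deeper one near x = -1.30, and which one gradient descent reaches depends only on where it starts.

```python
def f(x):
    """Toy objective with two minima (illustrative choice)."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def descend(x, learning_rate=0.01, steps=1000):
    """Plain gradient descent from starting point x."""
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x


from_right = descend(2.0)   # settles in the shallower, local minimum
from_left = descend(-2.0)   # settles in the deeper minimum
```

Both runs stop where the gradient vanishes, yet `f(from_right)` is higher than `f(from_left)`: the starting point, not the quality of the endpoint, decides the outcome, which is the sense of "path dependence" in the analogy.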
Anthropic has introduced new code routines for its Claude LLM, enabling users to automate specific tasks without relying on autonomous agent software. This update is accompanied by a redesign of the Claude application, aimed at improving user experience and facilitating the integration of automated processes. The goal is to offer greater control and flexibility in orchestrating LLM-based workflows.
Anthropic has pre-released its Mythos model to selected partners, highlighting its cybersecurity capabilities. The UK government's AI Security Institute (AISI) conducted an independent evaluation, which found that Mythos excels at orchestrating complex multi-stage attacks but does not significantly stand out on individual tasks. This ability to chain actions represents a notable evolution for Large Language Models in the cybersecurity landscape.
Research suggests that while Anthropic's Claude Mythos may excel in cybersecurity, less expensive models can offer similar performance. The analysis also raises questions about the uptime and reliability of frontier models, highlighting critical trade-offs for enterprises evaluating on-premise AI solutions.
Google has announced the introduction of "Skills" in the Chrome browser, a feature designed to simplify interaction with Gemini. Skills allow users to save and reuse chatbot prompts with a single click, eliminating the need to manually re-enter instructions for recurring tasks. The feature aims to further integrate Google's AI tools into the browsing experience, with quick access and synchronization across devices.
Google has announced the introduction of "Skills" functionality in the Chrome browser, allowing users to save and reuse personalized AI prompts across various web platforms. This new feature builds on Gemini's browser integration and aims to simplify the management of AI-driven workflows, offering a more efficient approach to interacting with Large Language Models.
Meta is reportedly developing an AI clone of Mark Zuckerberg, a 3D photorealistic avatar capable of interacting with employees. The news, reported by internal sources, highlights the growing interest in personalized artificial intelligence applications. This type of project raises questions about the necessary infrastructure, from computing power for real-time rendering to sensitive data management, crucial topics for companies evaluating on-premise deployment of advanced AI solutions.
Meta is creating a photorealistic AI version of Mark Zuckerberg, trained on his mannerisms, tone, and strategic thinking. This digital character, which Zuckerberg himself is testing, is intended for employees and is distinct from another AI agent handling direct tasks for the CEO. The project aims to enhance internal communication, raising questions about deployment and data sovereignty.
Recent research explores the effectiveness of large-scale unlabelled web data and synthetic annotations generated by LLMs for multilingual hate speech detection. The study demonstrates that continued pre-training of BERT models and the use of open-source LLM ensembles, such as Llama3.2-1B and Qwen2.5-14B, can significantly improve performance, especially for smaller models and low-resource languages, offering insights for efficient on-premise deployments.
New research introduces SECL, a test-time training pipeline that addresses LLM overconfidence. By leveraging an internal calibration signal, SECL reduces Expected Calibration Error (ECE) by 56-78% without labeled data or human supervision, adapting to distribution shifts with reduced inference costs. The result is a step forward for model reliability in self-hosted environments.
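For readers unfamiliar with the metric, Expected Calibration Error is commonly computed by grouping predictions into confidence bins and averaging the gap between accuracy and mean confidence in each bin, weighted by bin size. The sketch below shows that standard equal-width-bin definition only; it is not SECL's pipeline, and the function name and the choice of 10 bins are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE.

    confidences: predicted probabilities in [0, 1] for the chosen answers.
    correct: booleans, True where the chosen answer was right.
    Returns the bin-size-weighted mean of |accuracy - mean confidence|.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        raise ValueError("at least one prediction is required")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Four answers given with 90% confidence, three of them right:
# accuracy is 0.75, so the calibration gap is about 0.15.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, True, True, False]
)
```

An overconfident model shows mean confidence above accuracy in its high-confidence bins, so a lower ECE means stated confidence tracks actual correctness more closely.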
Despite the widespread adoption of AI devices in medicine, formal equity assessments of models remain rare. A new study analyzed 18 open-source brain tumor segmentation models and found that patient-related factors influence performance more than model architecture does. No model offers formal fairness guarantees. The study also introduces Fairboard, an open-source dashboard for monitoring the equity of medical imaging models.
New research reveals a unifying mathematical link between Transformer attention mechanisms, diffusion maps, and magnetic Laplacians. These approaches, usually treated as distinct, are presented as different manifestations of a single Markov geometry, offering an integrated perspective on the dynamics of artificial intelligence models.
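One ingredient of this connection is easy to check directly: a softmax attention matrix has non-negative entries and rows that sum to 1, so it is a row-stochastic (Markov transition) matrix, just like the row-normalized kernel used in diffusion maps. The sketch below demonstrates only that well-known property with made-up query and key vectors; it does not reproduce the paper's construction, and magnetic Laplacians are not covered.

```python
import math


def softmax_rows(scores):
    """Apply a numerically stable softmax to each row of a score matrix."""
    result = []
    for row in scores:
        peak = max(row)
        exps = [math.exp(score - peak) for score in row]
        total = sum(exps)
        result.append([value / total for value in exps])
    return result


def attention_matrix(queries, keys):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d))."""
    dim = len(keys[0])
    scores = [
        [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
        for query in queries
    ]
    return softmax_rows(scores)


# Made-up 2-dimensional queries and keys for three tokens.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
P = attention_matrix(Q, K)
row_sums = [sum(row) for row in P]
```

Each row of `P` can be read as the transition probabilities of a random walk that steps from one token to the others, which is the Markov viewpoint the summary refers to.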
LABBench2 emerges as an evolution of the LAB-Bench benchmark, designed to measure the real-world capabilities of artificial intelligence systems in scientific research, particularly in biology. With nearly 1,900 tasks, it offers more realistic contexts and significantly greater difficulty than its predecessor, highlighting ample room for improvement for current models. This tool aims to guide the development of more effective AI solutions for fundamental research tasks, providing a public dataset and evaluation harness.