A new study introduces MedES, a dynamic benchmark for aligning large language models (LLMs) with Chinese medical ethics. The system uses an automated evaluator to provide structured ethical feedback, improving model performance in complex clinical scenarios. Results show significant improvements over baseline models, paving the way for similar deployments in other legal and cultural contexts.
A new study introduces State-Centric Retrieval, a unified paradigm for Retrieval-Augmented Generation (RAG) that uses "states" to connect embedding models and rerankers. The approach, built on a fine-tuned RWKV model, reduces computational redundancy between the two stages and accelerates inference. Experimental results show near-complete performance retention at lower resource usage.
A new study highlights the challenges of regularization-based continual learning in EEG-based emotion classification. Existing methods show limited performance due to inter- and intra-subject variability, and tend to prioritize mitigating catastrophic forgetting over adapting to new subjects. This limits robust generalization to unseen subjects.
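Regularization-based continual learning typically adds a penalty that discourages important parameters from drifting once a task (or subject) has been learned. The study does not name its methods here, so as one common example, a minimal sketch of an EWC-style quadratic penalty, which illustrates the trade-off the study points to: protecting old knowledge can limit adaptation to new subjects.

```python
# Minimal sketch of an EWC-style regularization penalty for continual
# learning (one common regularization-based method; the specific
# methods evaluated in the study are not named in this summary).
# After training on task A, we store the learned parameters theta_A
# and a per-parameter importance estimate F (diagonal of the Fisher
# information). While training on task B, the penalty pulls important
# parameters back toward their task-A values -- mitigating
# catastrophic forgetting at the cost of slower adaptation.

def ewc_penalty(theta, theta_a, fisher, lam=1.0):
    """Quadratic penalty: (lam/2) * sum_i F_i * (theta_i - theta_A_i)**2."""
    return 0.5 * lam * sum(
        f * (t - ta) ** 2 for f, t, ta in zip(fisher, theta, theta_a)
    )

# Toy example: two parameters, the first one important for task A.
theta_a = [1.0, -2.0]        # parameters after task A
fisher  = [10.0, 0.1]        # importance estimates (high = protect)
theta   = [1.5, 0.0]         # current parameters while training task B

penalty = ewc_penalty(theta, theta_a, fisher, lam=1.0)
print(penalty)  # 1.45 -- moving the protected parameter dominates the cost
```

The penalty is added to the new task's loss; a larger `lam` preserves more of the old task but, as the study observes for unseen EEG subjects, also restricts how far the model can move to fit new data.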
A novel approach to compressing large language models (LLMs) promises to significantly reduce memory requirements and computational resources. The technique, called Hierarchical Sparse Plus Low-Rank (HSS) compression, combines sparsity with low-rank factorization to compress models while maintaining competitive performance. Results show significant memory savings with minimal accuracy loss.
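The paper's exact hierarchical algorithm is not described in this summary, but the general sparse-plus-low-rank idea is to approximate a weight matrix W as S + UVᵀ, where S keeps only the largest-magnitude entries and UVᵀ is a low-rank factorization of the remainder. A minimal numpy sketch of one such decomposition (the sparsity level and rank here are arbitrary illustration values):

```python
import numpy as np

# Generic sketch of sparse-plus-low-rank compression: W ~= S + L,
# where S keeps only the top entries by magnitude (sparse) and
# L = U @ V.T is a rank-r factorization of the residual. This
# illustrates the idea, not the paper's specific hierarchical scheme.

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))

# 1) Sparse part: keep the top 10% of entries by magnitude.
k = int(0.10 * W.size)
thresh = np.sort(np.abs(W).ravel())[-k]
S = np.where(np.abs(W) >= thresh, W, 0.0)

# 2) Low-rank part: best rank-r approximation of the residual via SVD.
r = 8
U, s, Vt = np.linalg.svd(W - S, full_matrices=False)
L = (U[:, :r] * s[:r]) @ Vt[:r]

# Storage drops from 64*64 values to ~10% nonzeros plus 2*64*r factors,
# at the cost of a controlled approximation error.
err = np.linalg.norm(W - (S + L)) / np.linalg.norm(W)
print(f"relative Frobenius error: {err:.3f}")
```

In practice such decompositions are applied per weight matrix, with sparsity and rank tuned so that the accuracy loss stays small, which matches the trade-off the results report.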
New research addresses the challenge of ensuring that Large Language Models (LLMs) adhere to safety principles without refusing benign requests. The study evaluates the impact of explicitly specifying extensive safety codes versus demonstrating them through illustrative cases, proposing a case-augmented deliberative alignment method (CADA) to enhance the safety and robustness of LLMs.
A new study introduces a hybrid explainable AI (XAI) framework for assessing maternal health risks in resource-constrained settings. The model, validated by clinicians in Bangladesh, combines ante-hoc fuzzy logic with post-hoc SHAP explanations, enhancing trust and clinical adoption. Healthcare access was identified as the primary predictor.
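The ante-hoc part of such a framework is interpretable by construction: raw measurements are mapped to overlapping linguistic categories via membership functions, so each rule that fires is directly readable by a clinician. A small illustrative sketch of that idea; the feature name and thresholds below are hypothetical, not taken from the study.

```python
# Illustrative sketch of ante-hoc fuzzy logic: map a raw measurement
# to overlapping linguistic categories ("normal", "high") with
# triangular membership functions. The feature and cut-off values
# here are hypothetical, chosen only to show the mechanism.

def triangular(x, a, b, c):
    """Membership rising from a to a peak at b, falling to c (0 outside)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def systolic_bp_memberships(bp):
    return {
        "normal": triangular(bp, 80, 105, 130),
        "high":   triangular(bp, 120, 150, 200),
    }

# A reading of 125 mmHg is partly "normal" and partly "high"; the
# membership degrees feed human-readable IF-THEN risk rules, which
# post-hoc SHAP values can then complement with feature attributions.
print(systolic_bp_memberships(125))
```

The appeal in a clinical setting is that the rule base can be inspected and validated by doctors before deployment, which is what the study's clinician validation in Bangladesh addresses.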
By rolling out ChatGPT Enterprise company-wide, Zenken has boosted sales performance, cut preparation time, and increased proposal success rates. AI-supported workflows are helping a lean team deliver more personalized, effective customer engagement.
US Defense Secretary Pete Hegseth said he plans to integrate Elon Musk's AI tool, Grok, into Pentagon networks later this month. The announcement comes weeks after Grok drew international backlash for generating sexualized images of women and children. Hegseth also rolled out an "AI acceleration strategy" for the Department of Defense.
Anthropic has announced the launch of Anthropic Labs, a new division focused on cutting-edge research and development projects in the field of artificial intelligence. The initiative aims to accelerate innovation and explore new frontiers in the sector.
A consumer watchdog has raised concerns about Google's new Universal Commerce Protocol, arguing it could lead to higher prices for consumers. Google strongly denies these claims, defending the integrity of its system.
OpenAI and Anthropic have recently launched healthcare-focused products. Doctors are interested in adopting AI but remain wary of using chatbots for patient care. Integrating AI into medicine opens new possibilities, but it requires a careful weighing of risks and benefits.
LangSmith Agent Builder is now generally available, letting users create no-code AI agents that automate routine tasks such as research, follow-ups, and status updates. Agents learn from feedback, integrate with existing tools, and can be shared, customized with specific models, and extended to fit team needs. It is well suited to daily briefings, market research, and project tracking.
Salesforce has announced a new AI-powered Slackbot, an agent that lets users complete complex tasks across various enterprise applications directly from Slack, with the goal of simplifying workflows and improving productivity by centralizing task execution in a single interface.
Moxie Marlinspike, the pseudonymous engineer who set a new standard for private messaging by creating Signal, is now aiming to transform AI chatbots in a similar way. His latest project is Confer, an open-source AI assistant that provides strong assurances that user data is unreadable to the platform operator, hackers, law enforcement, or any party other than the account holder. The service runs entirely on open-source software that users can cryptographically verify.
A comprehensive study analyzes the lexical diversity and structural complexity of literary and newspaper texts in Bangla. The research, based on the Vacaspati and IndicCorp corpora, examines key linguistic properties and assesses the impact of integrating literary data on natural language processing (NLP) models. The findings highlight greater lexical richness in literary texts and their closer adherence to Zipf's law.
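Adherence to Zipf's law is typically checked by ranking words by frequency and fitting the slope of log(frequency) against log(rank); Zipf's law predicts frequency ∝ rank⁻ˢ with s close to 1, i.e. a slope near -1. A toy sketch of that measurement, with synthetic tokens standing in for the Vacaspati and IndicCorp corpora used in the study:

```python
import math
from collections import Counter

# Sketch of a standard Zipf's-law check: rank word types by frequency
# and fit a least-squares line to log(freq) vs log(rank). A corpus
# that adheres closely to Zipf's law yields a slope near -1.

def zipf_slope(tokens):
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var  # least-squares slope in log-log space

# Synthetic tokens whose frequencies follow roughly 60 / rank:
tokens = [f"w{r}" for r in range(1, 11) for _ in range(60 // r)]
print(f"fitted slope: {zipf_slope(tokens):.2f}")  # close to -1
```

On real corpora the same fit is run over tens of thousands of word types; how close the slope sits to -1, and how linear the log-log curve is, quantifies the "closer adherence" the study reports for literary texts.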
A new study identifies the limitations of current roleplaying models, which struggle to reproduce believable characters. It proposes VEJA (Values, Experiences, Judgments, Abilities), a training framework based on manually curated data that outperforms systems trained on synthetic data. The goal is to create agents capable of simulating complex, realistic human interactions.
A new framework, CrossTrafficLLM, leverages GenAI to predict traffic conditions and generate natural language descriptions. The goal is to provide more effective and understandable decision support for Intelligent Transportation Systems (ITS). The system aligns quantitative traffic data with qualitative descriptions, improving both the accuracy of predictions and the quality of generated reports.
Google has disabled some AI-generated health summaries after an investigation revealed inaccurate and potentially dangerous information. The AI provided inaccurate data on blood test results and misleading recommendations for cancer patients, leading to incorrect conclusions about their health status. The company removed responses to specific queries, but other potentially harmful answers remain accessible.
Anthropic unveiled Claude for Healthcare, about a week after OpenAI announced its ChatGPT Health product. Both companies are moving to bring generative artificial intelligence to the healthcare sector, with the goal of improving the efficiency and accuracy of medical services. This move underscores the growing importance of large language models (LLMs) in clinical and diagnostic settings.