# The Forgotten Shield: Safety Grafting in Parameter-Space for Medical MLLMs
## Medical MLLMs: The Safety Challenge
Medical Multimodal Large Language Models (MLLMs) have achieved remarkable progress, but research into their safety has lagged, posing potential risks for real-world deployment.
A new study systematically benchmarked the safety of current state-of-the-art Medical MLLMs, revealing pervasive vulnerabilities across both general and medical-specific safety dimensions. The models proved especially fragile against cross-modality jailbreak attacks.
## Parameter-Space Intervention
The research found that the medical fine-tuning process frequently induces catastrophic forgetting of the model's original safety alignment. To address this challenge, a novel "Parameter-Space Intervention" approach was proposed for efficient safety re-alignment.
This method extracts the intrinsic safety knowledge encoded in the original base model's parameters and injects it into the target model while its medical capabilities are being built. A fine-grained parameter search algorithm then locates an optimal trade-off between safety and medical performance (one plausible formulation is sketched below).
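The study summarized here does not spell out its exact formulation, but the description maps naturally onto task-vector-style weight arithmetic. The following is a minimal sketch under that assumption: the safety representation is approximated by the per-parameter delta between the safety-aligned base model and the medically fine-tuned target, and the paper's fine-grained search is simplified to a coarse grid over a single global injection strength. All names here (`extract_safety_vector`, `graft`, `search_alpha`, and the two evaluation callbacks) are hypothetical, introduced only for illustration.

```python
import torch

def extract_safety_vector(base_state, medical_state):
    """Approximate the safety knowledge lost during medical fine-tuning as the
    per-parameter difference between the aligned base and the fine-tuned model.
    (Assumption: both models share an architecture and parameter names.)"""
    return {
        name: base_state[name] - medical_state[name]
        for name in medical_state
        if name in base_state and base_state[name].shape == medical_state[name].shape
    }

def graft(medical_state, safety_vector, alpha):
    """Inject a scaled portion of the safety delta back into the target weights."""
    return {
        name: (param + alpha * safety_vector[name]) if name in safety_vector else param.clone()
        for name, param in medical_state.items()
    }

def search_alpha(medical_state, safety_vector, eval_safety, eval_medical,
                 candidates=(0.0, 0.25, 0.5, 0.75, 1.0), tradeoff=0.5):
    """Coarse stand-in for the paper's fine-grained search: score each candidate
    injection strength by a weighted sum of safety and medical benchmark scores
    (eval_safety / eval_medical are user-supplied evaluation callbacks)."""
    best_alpha, best_score = None, float("-inf")
    for alpha in candidates:
        grafted = graft(medical_state, safety_vector, alpha)
        score = tradeoff * eval_safety(grafted) + (1 - tradeoff) * eval_medical(grafted)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha
```

A single global alpha is the simplest choice; the "fine-grained" search described in the paper presumably operates at a finer granularity (per layer or per parameter group), which the same loop structure could accommodate by searching a vector of strengths instead of one scalar.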
Experimental results demonstrate that this approach significantly strengthens the safety guardrails of Medical MLLMs without requiring additional domain-specific safety data, while minimizing degradation of core medical performance.
## General Context
The safety of LLMs is an increasingly central theme, especially in sensitive sectors such as healthcare. Attacks, particularly jailbreak attacks, aim to circumvent the protection mechanisms built into the models, inducing them to generate inappropriate or harmful responses. Developing techniques to mitigate these risks is essential for the responsible and reliable deployment of these technologies.