Response-Based Knowledge Distillation: Multilingual LLM Safety Compromised?
A new study explores response-based knowledge distillation as a way to improve the safety of large language models (LLMs) in multilingual settings. Its results show that fine-tuning on "safe" data can paradoxically increase a model's vulnerability to jailbreak attacks, highlighting...
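For readers unfamiliar with the term, here is a minimal sketch of what response-based (sequence-level) knowledge distillation looks like in practice: the student is fine-tuned with ordinary cross-entropy on responses generated by the teacher, rather than on the teacher's logits. This is not the study's code; it assumes a PyTorch / Hugging Face-style causal LM interface, and the model, tokenizer, and prompt set are hypothetical placeholders.

```python
# Hedged sketch of response-based knowledge distillation.
# `teacher`, `student`, `tokenizer`, `prompts`, and `optimizer` are
# hypothetical placeholders assuming a Hugging Face-style causal LM API.
import torch
import torch.nn.functional as F

def distill_on_responses(teacher, student, tokenizer, prompts, optimizer):
    """Fine-tune `student` on text responses generated by `teacher`.

    The student never sees the teacher's logits, only its outputs, so this
    reduces to supervised fine-tuning where the teacher's (presumed safe)
    responses serve as the training data.
    """
    student.train()
    for prompt in prompts:
        # 1. Teacher generates a response to the prompt (no gradients needed).
        with torch.no_grad():
            prompt_ids = tokenizer.encode(prompt, return_tensors="pt")
            response_ids = teacher.generate(prompt_ids, max_new_tokens=128)

        # 2. Student is trained with next-token cross-entropy on that response.
        inputs = response_ids[:, :-1]
        targets = response_ids[:, 1:]
        logits = student(inputs).logits
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
        )

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```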