# The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition
LLM development increasingly relies on model composition techniques that remix capabilities from diverse sources, for example by transplanting vocabulary tokens or merging weights from donor models. A newly discovered attack on this practice may compromise model security.
Researchers have crafted a "breaker token" that is harmless in the model it ships with, but sabotages a base model's generation once transplanted into it, all while leaving the model's measured utility intact. The attack introduces a supply-chain vulnerability and calls the security of composed LLMs into question.
## Attacks and Vulnerabilities
The attack exploits what the researchers call an asymmetric realizability gap: the breaker token is functionally inert in the donor model, yet reliably reconstructs into a high-salience malicious feature once transplanted into a base model. The result is sabotage of the base model's generation that does not register as a loss of utility.
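The gap can be illustrated with a toy geometric sketch. This is not the paper's implementation; the feature directions, dimension, and the `salience` measure below are all illustrative assumptions. The idea is that a crafted embedding can align strongly with a feature direction that exists only in the base model while staying orthogonal to everything the donor model reads.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # toy embedding dimension (illustrative, not from the paper)

# Hypothetical feature directions: a malicious feature present in the
# base model's representation space, and a feature read by the donor.
base_feature = rng.normal(size=d)
base_feature /= np.linalg.norm(base_feature)
donor_feature = rng.normal(size=d)
donor_feature /= np.linalg.norm(donor_feature)

def salience(token_emb, feature):
    """Toy salience: normalized alignment of an embedding with a feature."""
    return abs(token_emb @ feature) / np.linalg.norm(token_emb)

# Craft a breaker embedding: project the base feature away from the
# donor feature, so the donor model sees (near) nothing.
breaker = base_feature - (base_feature @ donor_feature) * donor_feature

gap = salience(breaker, base_feature) - salience(breaker, donor_feature)
print(f"donor salience:    {salience(breaker, donor_feature):.3f}")
print(f"base salience:     {salience(breaker, base_feature):.3f}")
print(f"realizability gap: {gap:.3f}")
```

In high dimensions random directions are nearly orthogonal, so the projection costs almost nothing: the breaker keeps nearly full salience in the base model while being invisible along the donor's feature.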
## Formalizing the Attack
The researchers formalize the attack as a dual-objective optimization problem and instantiate it with a sparse solver. The attack is training-free; it achieves spectral mimicry to evade outlier detection, and it demonstrates structural persistence under fine-tuning and weight merging.
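A minimal sketch of what a dual-objective sparse search could look like, under stated assumptions: the two feature directions, the sparsity budget `k`, the hard-thresholding projection, and the norm-matching step are all hypothetical stand-ins, not the paper's solver. The two objectives are high alignment with a base-model feature and low alignment with a donor-model feature; hard thresholding keeps the embedding sparse, and renormalizing to a typical embedding norm is a crude proxy for evading outlier detection.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, steps, lr = 64, 8, 500, 0.1  # toy sizes; k = sparsity budget

# Hypothetical stand-ins for base-side and donor-side feature directions.
f_base = rng.normal(size=d)
f_base /= np.linalg.norm(f_base)
f_donor = rng.normal(size=d)
f_donor /= np.linalg.norm(f_donor)
target_norm = 1.0  # keep the embedding's norm unremarkable

def hard_threshold(v, k):
    """Sparsity projection: keep only the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

e = hard_threshold(rng.normal(size=d), k)
for _ in range(steps):
    # Dual objective: ascend |e @ f_base|, descend (e @ f_donor)^2.
    grad = f_base * np.sign(e @ f_base) - 2.0 * (e @ f_donor) * f_donor
    e = hard_threshold(e + lr * grad, k)
    e *= target_norm / np.linalg.norm(e)  # norm mimicry after each step

print(f"base alignment:  {abs(e @ f_base):.3f}")
print(f"donor alignment: {abs(e @ f_donor):.3f}")
```

Note the search is training-free in the same spirit as the paper's attack: no gradients flow through any model, only through a fixed pair of feature directions.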
## Risk in the Supply Chain
By compromising models through an apparently innocuous shared component, the attack exposes a supply-chain vulnerability in model composition. Researchers and developers will need to monitor and harden this pipeline as composition becomes more common.
## Code Available
The attack code is available on GitHub: https://github.com/xz-liu/tokenforge