Large-scale multimodal contrastive learning has recently achieved impressive success in learning rich and transferable representations. However, a new study highlights how this approach tends to treat feature dimensions uniformly, neglecting the intrinsic spectral structure of the learned features.
Spectral Disentanglement and Enhancement (SDE)
The paper introduces Spectral Disentanglement and Enhancement (SDE), a framework that aims to bridge the gap between the geometry of the embedding space and its spectral properties. SDE leverages singular value decomposition to adaptively partition feature dimensions into three categories (a rough sketch of this step follows the list below):
- Strong signals: capture task-critical semantics.
- Weak signals: reflect ancillary correlations.
- Noise: represents irrelevant perturbations.
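As a rough illustration of the partitioning step, the sketch below uses PyTorch to compute an SVD of a batch of embeddings and split the singular directions into strong, weak, and noise groups. The energy-based thresholds (`strong_ratio`, `noise_ratio`) are assumptions made here for illustration; they are not the paper's adaptive criterion.

```python
import torch

def partition_spectrum(features: torch.Tensor,
                       strong_ratio: float = 0.9,
                       noise_ratio: float = 0.01):
    """Partition singular directions of a batch of embeddings by spectral energy.

    features: (batch, dim) embedding matrix from one modality.
    strong_ratio / noise_ratio are illustrative thresholds, not the
    adaptive criterion described in the paper.
    """
    # Center the batch so the SVD reflects the covariance structure.
    centered = features - features.mean(dim=0, keepdim=True)
    # Singular values are returned in descending order.
    U, S, Vh = torch.linalg.svd(centered, full_matrices=False)
    energy = S.pow(2)
    cum = torch.cumsum(energy, dim=0) / energy.sum()
    strong = cum <= strong_ratio                    # task-critical directions
    noise = (energy / energy.sum()) < noise_ratio   # near-zero, irrelevant directions
    weak = ~strong & ~noise                         # ancillary correlations in between
    return strong, weak, noise, (U, S, Vh)
```

In practice the masks would be recomputed per batch (or smoothed over training) and the thresholds made adaptive; this version only shows the shape of the computation.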
Subsequently, a curriculum-based spectral enhancement strategy is applied, selectively amplifying informative components. Finally, a dual-domain contrastive loss is introduced, jointly optimizing alignment in both the feature and spectral spaces.
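The sketch below shows one plausible way such a curriculum enhancement and dual-domain objective could be instantiated, again in PyTorch. The gain schedule, the spectral MSE alignment term, and the `spectral_weight` hyperparameter are assumptions for illustration; the paper's exact formulations are not reproduced here.

```python
import torch
import torch.nn.functional as F

def enhance_spectrum(features, strong, weak, epoch, total_epochs, max_gain=2.0):
    """Curriculum-style enhancement: gradually amplify strong (and later weak)
    singular directions as training progresses. The gain schedule is illustrative."""
    mean = features.mean(dim=0, keepdim=True)
    U, S, Vh = torch.linalg.svd(features - mean, full_matrices=False)
    progress = epoch / max(total_epochs, 1)
    gain = torch.ones_like(S)
    gain[strong] = 1.0 + (max_gain - 1.0) * progress          # strong signals amplified first
    gain[weak] = 1.0 + 0.5 * (max_gain - 1.0) * progress ** 2  # weak signals ramp up later
    return (U * (S * gain)) @ Vh + mean

def dual_domain_loss(img, txt, temperature=0.07, spectral_weight=0.1):
    """Feature-space InfoNCE plus a spectral term aligning the singular-value
    spectra of the two modalities (one plausible instantiation, not the paper's)."""
    img_n, txt_n = F.normalize(img, dim=-1), F.normalize(txt, dim=-1)
    logits = img_n @ txt_n.t() / temperature
    labels = torch.arange(img.size(0), device=img.device)
    nce = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
    s_img = torch.linalg.svdvals(img - img.mean(0, keepdim=True))
    s_txt = torch.linalg.svdvals(txt - txt.mean(0, keepdim=True))
    spectral = F.mse_loss(s_img / s_img.sum(), s_txt / s_txt.sum())
    return nce + spectral_weight * spectral
```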
Results
Extensive experiments on large-scale multimodal benchmarks demonstrate that SDE consistently improves representation robustness and generalization, outperforming state-of-the-art methods. SDE integrates seamlessly with existing contrastive pipelines, offering an effective solution for multimodal representation learning.