LogicLens: Visual-Logical Co-Reasoning for Text-Centric Forgery Analysis
## Introduction
Text-centric forgery poses a significant threat to information security and authenticity. Current methods for analyzing such forgeries are often limited to coarse-grained visual analysis and lack the capacity for sophisticated reasoning.
## The LogicLens Framework
To address these challenges, Meta has introduced LogicLens, a unified framework for Visual-Textual Co-reasoning that reformulates forgery detection, grounding, and explanation into a single task. The framework is powered by a novel Cross-Cues-aware Chain of Thought (CCT) mechanism, which iteratively validates visual cues against textual logic.
## The PR$^2$ Pipeline
Complementing the framework, the authors designed the PR$^2$ (Perceiver, Reasoner, Reviewer) pipeline, a hierarchical and iterative multi-agent system that generates high-quality, cognitively aligned annotations. To ensure robust alignment across all tasks, they further propose a weighted multi-task reward function for GRPO-based (Group Relative Policy Optimization) training.
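The idea of a weighted multi-task reward feeding GRPO can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the three task names (detection, grounding, explanation), the `TaskRewards` fields, the weight values, and the per-task scoring rules are hypothetical and are not taken from the LogicLens paper. Only the general shape (a weighted sum of per-task rewards, followed by GRPO's standardization of rewards within a group of sampled responses) is what the sketch aims to show.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class TaskRewards:
    """Per-task rewards in [0, 1] for one sampled response (hypothetical fields)."""
    detection: float    # e.g. 1.0 if the real/forged verdict is correct
    grounding: float    # e.g. overlap between predicted and reference masks
    explanation: float  # e.g. similarity between generated and reference text


# Illustrative weights only; the source does not state the actual values.
WEIGHTS = {"detection": 0.4, "grounding": 0.3, "explanation": 0.3}


def weighted_reward(r: TaskRewards, weights: dict = WEIGHTS) -> float:
    """Combine per-task rewards into one scalar via a weighted sum."""
    return (
        weights["detection"] * r.detection
        + weights["grounding"] * r.grounding
        + weights["explanation"] * r.explanation
    )


def group_relative_advantages(rewards: list, eps: float = 1e-8) -> list:
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three hypothetical responses sampled for the same input image.
    group = [
        TaskRewards(detection=1.0, grounding=0.8, explanation=0.6),
        TaskRewards(detection=1.0, grounding=0.2, explanation=0.4),
        TaskRewards(detection=0.0, grounding=0.0, explanation=0.1),
    ]
    scalars = [weighted_reward(r) for r in group]
    print(group_relative_advantages(scalars))
```

Responses that score above the group mean receive positive advantages and are reinforced; those below it are discouraged. Changing the weights shifts how strongly each task steers the policy.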
## The RealText Dataset
To train LogicLens, the authors constructed the RealText dataset, comprising 5,397 images with fine-grained annotations: textual explanations, pixel-level segmentation, and authenticity labels. Extensive experiments demonstrate the superiority of LogicLens across multiple benchmarks.
## Experimental Results
In zero-shot evaluation on T-IC13, LogicLens surpasses the specialized framework by 41.4% and GPT-4o by 23.4% in macro-average F1 score. On the challenging dense-text T-SROIE dataset, LogicLens establishes a significant lead over other MLLM-based methods in mF1, CSS, and macro-average F1.
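For readers unfamiliar with the headline metric, here is a minimal sketch of macro-average F1 under its usual definition: the unweighted mean of per-class F1 scores. The class names `"real"` and `"forged"` and the toy labels are assumptions for illustration. The paper's exact evaluation protocol may differ.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(labels, preds, classes=("real", "forged")) -> float:
    """Unweighted mean of per-class F1, so each class counts equally."""
    scores = []
    for c in classes:
        tp = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        fp = sum(1 for y, p in zip(labels, preds) if y != c and p == c)
        fn = sum(1 for y, p in zip(labels, preds) if y == c and p != c)
        scores.append(f1_score(tp, fp, fn))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy example: one real image is misclassified as forged.
    labels = ["real", "real", "forged", "forged"]
    preds = ["real", "forged", "forged", "forged"]
    print(round(macro_f1(labels, preds), 4))
```

Because every class contributes equally, macro-averaging penalizes a model that performs well on the majority class but poorly on the minority class.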
## Conclusion
LogicLens represents a significant step forward in the fight against text-centric forgery and offers new opportunities for information security and authenticity.