## GUI-Eyes: Active Perception for GUI Automation

A new study introduces GUI-Eyes, a reinforcement learning framework designed to improve the automation of graphical user interfaces (GUIs). Its defining feature is active visual perception: the agent autonomously decides how and when to use visual tools to analyze the interface.

## Functionality and Innovations

GUI-Eyes uses a two-stage reasoning process: a broad exploration phase followed by a more detailed analysis phase. The agent learns to make strategic decisions about tools such as zoom and crop, optimizing what it observes at each step. A continuous spatial reward system provides fine-grained feedback, addressing the reward sparsity typical of GUI environments. Hedged sketches of both the tool loop and the reward appear at the end of this note.

## Performance and Results

On the ScreenSpot-Pro benchmark, GUI-Eyes-3B achieved 44.8% accuracy at identifying interface elements while using only 3,000 labeled examples. This significantly outperforms both supervised and reinforcement learning baselines, demonstrating the effectiveness of active perception and the strategic use of visual tools.

## Implications

GUI-Eyes represents a step forward in building AI agents that interact with GUIs robustly and efficiently. Its ability to learn from a limited amount of data makes the approach particularly attractive for applications where labeled data is scarce.
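## Illustrative Sketches

The summary does not describe the paper's actual interfaces, so the following is a minimal sketch of the two-stage tool-use loop under stated assumptions: the `View` class, the `zoom` and `crop` helpers, and the `policy` callable (standing in for the learned model) are all hypothetical names invented for illustration, not GUI-Eyes' real API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class View:
    # Region of the full screenshot currently visible to the agent,
    # as (left, top, right, bottom) pixel coordinates.
    box: Tuple[int, int, int, int]

def zoom(view: View, factor: float) -> View:
    """Shrink the visible region around its center, simulating a zoom-in."""
    l, t, r, b = view.box
    cx, cy = (l + r) / 2, (t + b) / 2
    w, h = (r - l) / factor, (b - t) / factor
    return View((int(cx - w / 2), int(cy - h / 2),
                 int(cx + w / 2), int(cy + h / 2)))

def crop(view: View, box: Tuple[int, int, int, int]) -> View:
    """Restrict the view to a sub-region chosen by the agent."""
    return View(box)

def run_episode(policy, screen_box: Tuple[int, int, int, int],
                max_steps: int = 4) -> Optional[Tuple[int, int]]:
    """Explore with visual tools, then commit to a click.

    `policy` maps the current view to ("zoom", factor), ("crop", box),
    or ("click", (x, y)).
    """
    view = View(screen_box)
    for _ in range(max_steps):
        action, arg = policy(view)
        if action == "zoom":
            view = zoom(view, arg)
        elif action == "crop":
            view = crop(view, arg)
        else:  # "click": the agent commits to a final target location
            return arg
    return None  # ran out of steps without committing

# Toy policy: keep zooming until the view is fine-grained enough, then
# click its center (illustration only, not a learned model).
def toy_policy(view: View):
    l, t, r, b = view.box
    if (r - l) > 800:
        return "zoom", 2.0
    return "click", ((l + r) // 2, (t + b) // 2)

print(run_episode(toy_policy, (0, 0, 1920, 1080)))  # -> (960, 540)
```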
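The continuous spatial reward can be pictured as partial credit that decays smoothly with distance from the target element, so near-misses still carry learning signal where a binary hit/miss reward would not. The Gaussian falloff, the `sigma` value, and the normalized coordinates below are assumptions made for this sketch; the paper's actual reward shape is not given in this summary.

```python
import math
from typing import Tuple

def spatial_reward(click: Tuple[float, float],
                   target_box: Tuple[float, float, float, float],
                   sigma: float = 0.1) -> float:
    """Reward of 1.0 inside the target element, decaying smoothly outside.

    Coordinates are assumed normalized to [0, 1] by screen width/height;
    the Gaussian form and sigma=0.1 are illustrative choices.
    """
    x, y = click
    l, t, r, b = target_box
    # Distance from the click to the nearest point of the box
    # (zero when the click lands inside the element).
    dx = max(l - x, 0.0, x - r)
    dy = max(t - y, 0.0, y - b)
    dist = math.hypot(dx, dy)
    return math.exp(-(dist ** 2) / (2 * sigma ** 2))

# A near-miss just outside the element still earns partial credit ...
print(round(spatial_reward((0.52, 0.50), (0.40, 0.45, 0.50, 0.55)), 3))  # 0.98
# ... while a far miss earns essentially nothing.
print(round(spatial_reward((0.90, 0.90), (0.40, 0.45, 0.50, 0.55)), 3))  # 0.0
```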