Autonomous Discovery of Geoscientific Data with PANGAEA-GPT

The rapid accumulation of Earth science data poses significant challenges of scale and effective use. Many datasets remain underutilized, so their potential for reuse goes unrealized.

PANGAEA-GPT is a hierarchical multi-agent framework designed to address this issue. Unlike standard Large Language Model (LLM) wrappers, PANGAEA-GPT implements a centralized Supervisor-Worker topology. This architecture allows for precise data routing, deterministic code execution in sandboxed environments, and a self-correction mechanism in which agents use execution feedback to diagnose and resolve runtime errors autonomously.
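
To make the Supervisor-Worker pattern and the execution-feedback loop concrete, the following is a minimal sketch and not the PANGAEA-GPT implementation. All names (`Supervisor`, `Worker`, `toy_repair`, the `"oceanography"` task type) are hypothetical. The `repair` callable stands in for an LLM call, and the bare `exec` call is only a placeholder for a real sandbox.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: receives the failing code and the
# error text, and returns a revised version of the code.
Repair = Callable[[str, str], str]


@dataclass
class Worker:
    """Runs a code string and retries with repaired code when it fails."""
    name: str
    repair: Repair
    max_attempts: int = 3
    log: List[str] = field(default_factory=list)

    def run(self, code: str) -> dict:
        for attempt in range(1, self.max_attempts + 1):
            namespace: dict = {}
            try:
                # ASSUMPTION: plain exec() is NOT a sandbox. A real system
                # would run this in an isolated process or container.
                exec(code, namespace)
                return {"ok": True,
                        "result": namespace.get("result"),
                        "attempts": attempt}
            except Exception as exc:
                # Execution feedback: the error text drives the next revision.
                error = f"{type(exc).__name__}: {exc}"
                self.log.append(error)
                code = self.repair(code, error)
        return {"ok": False, "result": None, "attempts": self.max_attempts}


@dataclass
class Supervisor:
    """Central router: maps a task type to the worker responsible for it."""
    workers: Dict[str, Worker]

    def route(self, task_type: str, code: str) -> dict:
        if task_type not in self.workers:
            raise KeyError(f"no worker registered for {task_type!r}")
        worker = self.workers[task_type]
        outcome = worker.run(code)
        outcome["worker"] = worker.name
        return outcome


def toy_repair(code: str, error: str) -> str:
    """Toy rule replacing the LLM: add a missing 'import math' on NameError."""
    if error.startswith("NameError") and "'math'" in error:
        return "import math\n" + code
    return code


supervisor = Supervisor(
    workers={"oceanography": Worker("oceanography", toy_repair)}
)
# The first attempt fails because 'math' is not imported; the second succeeds.
outcome = supervisor.route("oceanography", "result = math.sqrt(16)")
print(outcome)
```

In this sketch the supervisor only dispatches by task type, while each worker owns its retry loop. Keeping the error log on the worker means the sequence of failures can be inspected after the run.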

In use-case scenarios from physical oceanography and ecology, the system executes complex, multi-step workflows with minimal human intervention. The framework provides a methodology for querying and analyzing heterogeneous data through coordinated agent workflows.