CORPGEN: AI agents for real-world multitasking

Microsoft has introduced CORPGEN, a framework designed to equip AI agents with the memory, planning, and learning capabilities needed to operate in complex work environments where multitasking is the norm.

Multi-Horizon Task Environments (MHTEs)

Multi-Horizon Task Environments (MHTEs) were developed to evaluate agent performance in realistic scenarios. In an MHTE, an agent must manage multiple complex tasks concurrently within a single five-hour session, and each task requires between 10 and 30 dependent steps. Tests on existing AI agents revealed weak memory management, interference between tasks, and difficulty tracking dependencies.
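To make the idea of concurrent tasks with dependent steps concrete, here is a minimal Python sketch. It is an illustration only, not the MHTE benchmark: the task names, step names, and the `interleave` helper are hypothetical, and the round-robin policy is an assumption chosen for simplicity.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks: each maps a step to the set of steps it depends on.
tasks = {
    "report": {"draft": set(), "review": {"draft"}, "send": {"review"}},
    "budget": {"collect": set(), "sum": {"collect"}},
}

def interleave(tasks):
    """Round-robin across tasks, running only steps whose dependencies are done."""
    sorters = {}
    for name, graph in tasks.items():
        sorter = TopologicalSorter(graph)
        sorter.prepare()  # also raises CycleError if the steps form a cycle
        sorters[name] = sorter

    order = []
    while any(s.is_active() for s in sorters.values()):
        for name, sorter in sorters.items():
            # get_ready() returns the steps that are currently unblocked.
            for step in sorter.get_ready():
                order.append((name, step))
                sorter.done(step)  # marking a step done unblocks its successors
    return order

schedule = interleave(tasks)
```

Each pass gives every task one chance to advance, so the agent switches between tasks while never running a step before its prerequisites.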

CORPGEN's Architecture

CORPGEN introduces "digital employees," LLM-powered AI agents with persistent identities, role-specific expertise, and realistic work schedules. These agents operate Microsoft Office applications through GUI automation and maintain high performance within MHTEs for hours of continuous activity. CORPGEN's architecture includes hierarchical planning, isolated sub-agents, a tiered memory system, and adaptive summarization.
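One way to picture a tiered memory with summarization is the Python sketch below. It is a simplified illustration under stated assumptions, not CORPGEN's actual design: the `TieredMemory` class, its two tiers, the size limits, and the `summarize` placeholder are all hypothetical, and a real system would presumably use an LLM to write the summaries.

```python
from collections import deque
from dataclasses import dataclass, field

def summarize(entries):
    # Placeholder summarizer (assumption): truncates and joins the entries.
    return f"{len(entries)} earlier steps: " + "; ".join(e[:20] for e in entries)

@dataclass
class TieredMemory:
    working_limit: int = 3   # max verbatim entries before compression
    keep_recent: int = 1     # entries left verbatim after compression
    working: deque = field(default_factory=deque)   # recent, verbatim tier
    long_term: list = field(default_factory=list)   # compressed tier

    def add(self, entry: str) -> None:
        self.working.append(entry)
        if len(self.working) > self.working_limit:
            # Move the oldest entries out of working memory as one summary.
            n_evict = len(self.working) - self.keep_recent
            evicted = [self.working.popleft() for _ in range(n_evict)]
            self.long_term.append(summarize(evicted))

    def context(self) -> list:
        """Summaries first, then recent verbatim entries."""
        return self.long_term + list(self.working)

memory = TieredMemory()
for note in ["a", "b", "c", "d"]:
    memory.add(note)
```

The point of the sketch is the trade-off: recent steps stay available in full, while older ones are kept only in compressed form so the context does not grow without bound.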

Collaboration Between Agents

In an environment with multiple digital employees, collaboration occurs through standard communication channels such as email and Microsoft Teams, without predefined coordination rules. This approach simulates real-world workplace interactions.

Evaluating CORPGEN

Evaluations showed that CORPGEN maintains or improves task completion rates as workload increases, significantly outperforming baseline systems. Experiential learning contributed most to the gains, because it lets agents reuse patterns from previously successful tasks. Teams evaluating on-premise deployments should weigh the trade-offs involved, as outlined in AI-RADAR's analytical frameworks at /llm-onpremise.
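The reuse of earlier success patterns can be sketched in Python as follows. This is an illustrative assumption about how such a store might work, not CORPGEN's implementation: the `ExperienceStore` class, the word-overlap (Jaccard) similarity, and the `min_score` threshold are all hypothetical choices.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two task descriptions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

class ExperienceStore:
    """Hypothetical store of step sequences that completed a task successfully."""

    def __init__(self):
        self._records = []  # list of (task description, steps)

    def record_success(self, task, steps):
        self._records.append((task, list(steps)))

    def recall(self, task, min_score=0.3):
        """Return the steps of the most similar past task, or None if none is close."""
        best = max(self._records, key=lambda r: jaccard(task, r[0]), default=None)
        if best is None or jaccard(task, best[0]) < min_score:
            return None
        return best[1]

store = ExperienceStore()
store.record_success("send weekly sales report by email",
                     ["open excel", "export pdf", "attach to email"])
store.record_success("schedule team meeting", ["open calendar", "pick slot"])

hint = store.recall("send monthly sales report by email")
miss = store.recall("book flight")
```

Here `hint` holds the steps from the earlier report task, which an agent could use as a starting plan, while `miss` is `None` because no stored task is similar enough.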

Future Perspectives

The results suggest that memory management and information retrieval are crucial for the effectiveness of AI agents in the real world. Next steps include testing the ability of agents to maintain memory over multiple workdays and to coordinate in teams.