OpenAI: 98% of employees now use Codex agents, but all numbers are internal

When OpenAI tells its own story, the narrative is always polished. This time, it’s about a mass migration: 98% of employees now use Codex, its coding agent, up from roughly 40% in August 2025. The figure comes from a company paper titled “The Shift to Agentic AI: Evidence from Codex,” published on Wednesday. The document outlines a fundamental change in how OpenAI’s workforce interacts with artificial intelligence.

The news matters because it marks the shift from conversational assistants to agentic systems that write, verify, and integrate code autonomously. Yet every percentage comes exclusively from OpenAI. There is no external verification and no detail on how the company measured adoption or distinguished superficial use from deep integration. In an industry where vendors release numbers as easily as they release models, context is everything.
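To see why the definition of "use" matters, consider a minimal sketch of how an adoption rate could be computed from a usage log. Everything here is hypothetical: the log format, the employee IDs, the headcount, and the session thresholds are invented for illustration and say nothing about how OpenAI actually measured its figure.

```python
from collections import Counter

def adoption_rate(events, headcount, min_sessions=1):
    """Share of employees with at least `min_sessions` logged agent sessions."""
    sessions = Counter(events)  # employee ID -> number of sessions
    active = sum(1 for n in sessions.values() if n >= min_sessions)
    return active / headcount

# Made-up log: one employee ID per agent session.
events = ["a", "b", "b", "c", "c", "c", "d"] + ["e"] * 20
headcount = 5

# Same log, two definitions of "uses the agent".
tried_once = adoption_rate(events, headcount, min_sessions=1)
repeat_users = adoption_rate(events, headcount, min_sessions=3)
print(f"at least 1 session:  {tried_once:.0%}")
print(f"at least 3 sessions: {repeat_users:.0%}")
```

With the same log, the headline figure changes substantially depending on the threshold, which is why a percentage without its definition is hard to interpret.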

What a coding agent like Codex does

Codex is not a standard chatbot. It is an agent that operates within the development environment: it takes on a task, consults repositories, writes code, runs tests, and proposes changes with a degree of autonomy that a chat-only assistant does not have. Agents of this kind typically rely on LLMs specialized in reasoning and code generation, often paired with retrieval-augmented pipelines and sandboxed execution tools.
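The loop described above (propose a change, run the tests, feed failures back, stop when the tests pass) can be sketched in a few lines. This is a generic illustration, not Codex's implementation: `agent_loop`, `fake_model`, and `fake_tests` are hypothetical names, the model is replaced by a stub, and the bare `exec` call stands in for a real sandbox, which it is not.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Workspace = Dict[str, str]  # file path -> file contents

@dataclass
class CheckResult:
    passed: bool
    log: str

def agent_loop(
    task: str,
    workspace: Workspace,
    propose_patch: Callable[[str, Workspace, str], Workspace],
    run_tests: Callable[[Workspace], CheckResult],
    max_steps: int = 3,
) -> Optional[Workspace]:
    """Propose a patch, test it, and retry with the failure log as feedback."""
    feedback = ""
    for _ in range(max_steps):
        patch = propose_patch(task, workspace, feedback)
        candidate = {**workspace, **patch}  # apply the patch to a copy
        result = run_tests(candidate)
        if result.passed:
            return candidate  # a change ready for human review
        feedback = result.log
    return None  # give up after max_steps failed attempts

def fake_model(task: str, workspace: Workspace, feedback: str) -> Workspace:
    # Stub for an LLM call: wrong on the first try, fixed once it sees feedback.
    body = "return a + b" if feedback else "return a - b"
    return {"calc.py": f"def add(a, b):\n    {body}\n"}

def fake_tests(workspace: Workspace) -> CheckResult:
    # Stand-in for a sandboxed test run; plain exec is NOT a real sandbox.
    namespace: dict = {}
    exec(workspace["calc.py"], namespace)
    ok = namespace["add"](2, 3) == 5
    return CheckResult(ok, "" if ok else "add(2, 3) != 5")

result = agent_loop("implement add", {}, fake_model, fake_tests)
print("accepted" if result is not None else "no passing patch")
```

The point of the sketch is the control flow: the test run, not the model, decides whether a change is proposed, and the step limit bounds how long the agent works unattended.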

For companies developing software, the move to agentic tools is not just about productivity. It touches on governance of automatically generated code, security of CI/CD pipelines, and, for those in regulated or air-gapped environments, the feasibility of running these agents on-premise. OpenAI’s paper provides no details on infrastructure (presumably cloud-based), latency, energy consumption, or cost models. These gaps matter when trying to replicate the experience in-house.

The disappearance of technical transparency

OpenAI has turned an internal adoption metric into a case study. The paper—accessible only through the company’s announcement—lacks independent benchmarks, does not quantify time saved per developer, and does not explain how the agent handles complex code contexts or large repositories. Moreover, there is no mention of quantization constraints, context window size, or VRAM requirements for potential local deployment.
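For readers sizing a local deployment of an open model, a rough back-of-the-envelope estimate shows why quantization and context window size drive VRAM requirements. The sketch below uses the common approximation of weight memory plus key-value cache for a decoder-only transformer. All model dimensions are hypothetical and do not describe Codex or any specific model, and the estimate ignores activations, quantization metadata, and serving-framework overhead.

```python
GIB = 2**30

def weight_bytes(n_params: float, bits_per_weight: int) -> float:
    """Memory for model weights at a given quantization width."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    bytes_per_elem: int = 2,  # assumes a 16-bit KV cache
    batch_size: int = 1,
) -> int:
    """Memory for keys and values (factor 2) across all layers and tokens."""
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * bytes_per_elem * batch_size)

# Hypothetical 32B-parameter model, 4-bit weights, 32k-token context.
weights = weight_bytes(32e9, bits_per_weight=4)
cache = kv_cache_bytes(n_layers=64, n_kv_heads=8, head_dim=128, context_len=32768)

print(f"weights:  {weights / GIB:.1f} GiB")
print(f"KV cache: {cache / GIB:.1f} GiB per concurrent sequence")
```

Because the cache grows linearly with context length and with the number of concurrent sequences, agentic workloads that keep large repositories in context can need far more memory than the weights alone suggest.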

This opacity is a warning sign for those evaluating coding agents in proprietary infrastructure. Many organizations cannot or will not trust their code to external cloud services. Self-hosted alternatives based on open models exist, but adoption requires careful analysis of TCO, in-house skills, and real-world performance on specific workloads. The paper’s implicit message—“we use it, it works”—holds little value without context.

What the shift to agents signals

Beyond the single data point, the jump from 40% to 98% in a few months suggests that OpenAI redesigned workflows around Codex, not the other way around. That detail matters: agent adoption is not plug-and-play; it demands organizational and process change. For those managing on-premise environments, this means that technological deployment is only part of the journey. Teams must be prepared, code review responsibilities redefined, and trust built in systems that operate semi-autonomously.

In short, the paper is more an internal manifesto than a reliable source for architectural decisions. The direction is clear: agentic tools are becoming the new standard in software development. But for IT leaders who must justify investments in hardware, security, and training, measurable data is needed, not percentages detached from any real metric.

Beyond the hype, the numbers that count

The episode shows how thin the line is between research and marketing when a company controls both the product and the narrative. Teams running on-premise deployments know that the critical factors (inference latency, memory footprint, compatibility with security policies) do not appear in corporate papers. And that is where serious assessments begin: which open models can approach Codex capabilities? Which serving frameworks handle agentic workloads locally? What trade-offs between automation and control are acceptable? These questions remain unanswered, waiting for someone outside the press releases to put the numbers on the table.