Codex-maxxing: preserving context in long-running work

Context as a scarce resource

Large language models are powerful but constrained by a limited working memory: the context window. In a simple interaction, a prompt yields a response. When work demands hundreds of exchanges, though, the risk of losing the thread grows with every turn. Developer Jason Liu, known for his pragmatic approach to LLMs, coined the term "Codex-maxxing" to describe the intensive use of Codex for managing extended projects. The idea is straightforward: preserve as much context as possible so that work continues seamlessly beyond a single prompt.

Codex and the continuity challenge

OpenAI Codex, an evolution of GPT models, offers a context window that can hold tens of thousands of tokens. This makes it possible to feed the model entire conversations, code, and documentation, creating an artificial "working memory." Liu uses techniques like prompt chaining and selective compression to keep the relevant parts of a project alive while discarding noise. The effect is reminiscent of fine-tuning on the fly, but it is applied client-side, through the prompt alone, without altering model weights.
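
As a rough illustration of selective compression, the Python sketch below folds older messages into a single summary once a conversation exceeds a token budget, while keeping pinned and recent messages verbatim. It is not Liu's actual method: the names Message, estimate_tokens, summarize and compress_history are hypothetical, the 4-characters-per-token estimate is a crude assumption, and the summarizer is a placeholder for what would likely be another model call.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str
    pinned: bool = False  # pinned messages are never folded into a summary


def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token. A real tokenizer will differ.
    return max(1, len(text) // 4)


def summarize(messages: list[Message]) -> Message:
    # Placeholder summarizer: keeps the first 80 characters of each message.
    # In practice this step would probably be delegated to a model call.
    digest = " | ".join(m.text[:80] for m in messages)
    return Message("system", f"Summary of earlier work: {digest}", pinned=True)


def compress_history(
    history: list[Message], budget: int, keep_recent: int = 4
) -> list[Message]:
    """Fold older, unpinned messages into one summary when over budget.

    This is a single pass: the result is shorter, but it is not guaranteed
    to fit the budget. A caller could re-check and compress again.
    """
    total = sum(estimate_tokens(m.text) for m in history)
    if total <= budget:
        return list(history)

    cut = max(0, len(history) - keep_recent)
    older, recent = history[:cut], history[cut:]
    pinned = [m for m in older if m.pinned]
    droppable = [m for m in older if not m.pinned]
    if not droppable:
        return list(history)  # nothing that can safely be compressed

    return pinned + [summarize(droppable)] + recent
```

Prompt chaining would then pass the compressed history, rather than the full transcript, into the next request.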

When the cloud becomes a bottleneck

Intensive use of a cloud platform like Codex, however, raises economic and technical questions. Every API call has a cost, and because the accumulated context is typically resent with each request, the total cost of ownership (TCO) of sessions lasting hours or days can become unpredictable. Add to this rate limits, network latency, and total dependence on an external service. For sensitive projects, sending data outside one's control perimeter is never a neutral choice, especially in regulated industries.
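
To see why long sessions are hard to budget, the Python sketch below estimates the cost of a session in which every call resends the whole accumulated context. The function name session_cost, its parameters and the prices are illustrative assumptions, not real Codex pricing, and the model ignores caching or compression that a provider or client might apply.

```python
def session_cost(
    turns: int,
    input_tokens_per_turn: int,
    output_tokens_per_turn: int,
    price_in_per_1k: float,
    price_out_per_1k: float,
) -> float:
    """Estimate the cost of a session that resends its full history each turn.

    All prices are hypothetical placeholders expressed per 1,000 tokens.
    """
    total = 0.0
    context = 0  # tokens accumulated so far in the conversation
    for _ in range(turns):
        context += input_tokens_per_turn            # new user input joins the context
        total += context / 1000 * price_in_per_1k   # the whole context is billed as input
        total += output_tokens_per_turn / 1000 * price_out_per_1k
        context += output_tokens_per_turn           # the reply is carried into later turns
    return total


# Doubling the number of turns roughly quadruples the bill, because the
# input side grows with the square of the session length.
short = session_cost(100, 1000, 500, 1.0, 2.0)
long = session_cost(200, 1000, 500, 1.0, 2.0)
print(f"100 turns: {short:.0f}, 200 turns: {long:.0f}")
```

Under these assumptions, trimming or compressing context is as much a cost control measure as a quality measure.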

The on-premise perspective from AI-RADAR

Those working with LLMs for long cycles are increasingly eyeing self-hosted alternatives. A model running on local hardware, even mid-to-high tier, offers cost predictability, no per-token fees, and full data control. True, the context window of open-source models may be smaller than Codex's, but quantization techniques, which free up memory for longer contexts, and architectures like linear attention transformers are closing the gap. AI-RADAR dedicates significant analysis to these trade-offs: on the /llm-onpremise portal you'll find frameworks to assess whether on-premise deployment is sustainable based on workloads, budget, and sovereignty requirements.
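
One simple way to frame the budget side of that assessment is a break-even calculation, sketched below in Python. This is not AI-RADAR's framework: the function break_even_months and all figures are hypothetical, and the model deliberately ignores factors such as hardware depreciation, staffing and differences in model quality.

```python
import math
from typing import Optional


def break_even_months(
    hardware_cost: float,
    monthly_onprem_cost: float,
    monthly_api_cost: float,
) -> Optional[int]:
    """Months until an up-front hardware purchase pays for itself.

    Returns None when running on-premise costs at least as much per month
    as the API, in which case the purchase never breaks even.
    """
    monthly_saving = monthly_api_cost - monthly_onprem_cost
    if monthly_saving <= 0:
        return None
    return math.ceil(hardware_cost / monthly_saving)


# Hypothetical figures: 12,000 of hardware, 300/month for power and upkeep,
# compared with 1,300/month in API fees.
print(break_even_months(12_000, 300, 1_300))
```

With steady, heavy workloads the saving per month is large and the break-even point arrives sooner; with sporadic use it may never arrive.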

Beyond the single prompt

The lesson of "Codex-maxxing" extends beyond the specific tool: for long-running work, the ability to maintain context is a differentiating factor in choosing an LLM. Companies exploring these technologies today must ask not just which model performs best on synthetic benchmarks, but which infrastructure lets them manage continuous workflows without surprises. On-premise deployment, with its constraints and opportunities, thus returns to the center of the discussion.