A deep dive into how OpenAI’s Codex team builds its coding agent

By YoussoufDelve | Siriandelmec | 20 Mar 2026

More than a million developers use OpenAI’s command-line coding interface every week. Named Codex, usage has increased 5x since the start of January 2026. In the first week of February, OpenAI launched the Codex desktop app, a macOS application that CEO Sam Altman calls “the most loved internal product we’ve ever had”. A few days later, OpenAI shipped GPT-5.3-Codex, which they describe as the first model that helped create itself. This deep dive was realized by the Pragmatic Engineer newsletter.

The deep dive covers :

-) How it started. From an internal experiment in late 2024, to a product used by more than a million devs.

-) Technology and architecture choices. Why Rust and open source ? In-depth on how the agent loop works.

-) How Codex builds itself. Codex itself writes more than 90% of the app’s code, the team estimates.

-) Research. Training the next Codex model with the current one.

1. How it started

In 2024, OpenAI was experimenting with various approaches for building a software agent. That fall, the company declared that building an aSWE (Autonomous Software Engineer) was to be a top-line goal for 2025. This vision came from the top : Greg Brockman and Sam Altman believed they should have an autonomous software engineer working alongside teams. Tibo describes the thinking :

“Greg and Sam had the strong conviction : ‘eventually, we should have an autonomous software engineer working alongside us and with the capabilities seen from o1-preview, the time is now to have a group absolutely dedicated to making this a reality’”.

A number of folks who’d worked on earlier prototypes were pulled into the effort, which featured :

Michael Bolin : tech lead for the Codex open source repository.

Gabriel Peal : who subsequently built the VS Code extension, mostly solo, and built the foundations of the Codex desktop app.

Fouad Matin : who led the initial release of the Codex CLI, and is responsible for Codex’s safety and security approach.

OpenAI had two teams tackle different segments of the problem space : Codex Web would focus on an async, cloud-based solution, while Codex CLI targeted iterative, local development. Both products would launch in the spring, with Codex CLI being announced in April 2025, and Codex in ChatGPT introduced in May.

2. Technology and architecture choices

An obvious difference between Codex and Claude Code is the programming language. Claude Code is written in TypeScript, “on distribution”, which plays to the underlying model’s strengths. Meanwhile, the Codex CLI is written in Rust. Tibo explains why :

“We debated TypeScript, Go, and Rust. All three seemed like solid contenders for different time horizons. In the end, our reasoning came down to a few layers :

Performance : We want to eventually run this agent at a massive scale where every millisecond matters. Performance is also important when running locally in a sandboxed environment.

Correctness : We wanted to choose a language that helps eliminate a class of errors with things like strong typing and memory management.

Engineering culture and engineering quality : There’s this interesting thing that language choice does : it gets you to think about the engineering bar you set. We decided to pick Rust because it’s extremely important for our core agent implementation to be extremely high quality”.

There was also a practical concern about dependencies. Choosing TypeScript means using the npm package manager. Using npm often means building on top of packages that may not be fully understood – which could clearly be problematic. By going with Rust, the team has very few dependencies and can thoroughly look through the few dependencies there are.

They also want to eventually run the Codex agent in all sorts of environments – not just laptops and data centers – and even places like embedded systems. Rust makes this more achievable from a performance perspective than TypeScript or Go.

Tibo tells the Pragmatic Engineer newsletter that while Codex’s early performance was less standout with Rust than with TypeScript, they expected the model to catch up. Plus, choosing Rust gave them one more engineering challenge to work with. The Codex team also hired the maintainer of Ratatui – the Rust library for building terminal user interfaces (TUIs). He’s now full-time on the Codex team, doing open source work.

The core agent and CLI are fully open source on GitHub.

How Codex works

The core loop is a state machine, and the agent loop is the core logic in the Codex CLI. This loop orchestrates the interaction between the user, the model, and the tools the model uses. This “agent loop” is something every AI agent uses, not just Codex, and below is how Codex implements it, at a high level :

A) Prompt assembly : the agent takes user input and prepares the prompt to pass to the model. On top of user input, the prompt includes system instructions (coding standards, rules), a list of available tools (including MCP servers), and the actual input : text, images, files, AGENTS.md contents, and local environment info.

B) the prompt is converted to tokens and fed to the model, which streams back output events : reasoning steps, tool calls, or a response.

C) Response :

- Stream the response to the user by showing it on the terminal.

- If the model decides to use a tool, make this tool call : e.g. read a file, run a bash command, write code. If a command fails, the error message goes back to the model, the model attempts to diagnose the issue, and may decide to retry.

D) Tool response (optional) : if a tool was invoked, return the response to the model. Repeat steps 3 and 4 for as long as more tool calls are needed.

E) the “final message” intended for the user which closes one step in the loop. The loop then starts again with a new user message.

Compaction is an important technique for efficiently running agents. As conversations grow lengthy, the context window fills up. Codex uses a compaction strategy : once the conversation exceeds a certain token count, it calls a special Responses API endpoint, which generates a smaller representation of the conversation history. This smaller version replaces the old input and avoids quadratic inference costs. We covered how self-attention scales quadratically in our 2024 ChatGPT deepdive.

Safety is an important consideration because LLMs are nondeterministic. Codex runs in a sandbox environment that restricts network access and filesystem access by default. Tibo reflects on this choice :

“We take a stance with the sandboxing that hurts us in terms of general adoption. However, we do not want to promote something that could be unsafe by default. As a dev, you can always go into your configuration and disable these settings if you want.

We made this default setting because many of our users are not that technical. We don’t want to give them something that could have unintended consequences”.

There are several releases per week. Internally, the team ships a new version of Codex up to three or four times a day. Externally, new releases are cut every few days and are distributed via package managers, Homebrew, and npm.

Michael Bolin’s recent blog post, “Unrolling the Codex Agent Loop,” lays out the internals of how the agent loop works.

3. How Codex builds itself

More than ninety percent of the Codex app’s code was generated by Codex itself, the team estimates, which happens to be roughly in line with what Anthropic has reported for Claude Code, according to what its creator Boris Cherny told me. Both AI labs share the meta-circularity of using the coding tools to write their own code.

Tibo tells me that a typical engineer on the Codex team runs between four and eight parallel agents, which do any one of a number of tasks :

. implementation

.Code review

. Security review

. Codebase understanding

. Going through plans and summarizing

. Going through what team members have done and summarizing changes

. Bugfixes

… and more.

Codex engineers are now “agent managers” and no longer just write code. Tibo says it’s common for an engineer to walk into the office with several tabs open on their laptop: a code review running in one, a feature being implemented in another, a security audit in a third, and a codebase summary being generated in a different tab. He says :

“Codex is really built for multitasking. There’s this understanding that most tasks will just get done to completion.

People on our team have figured out what Codex is and isn’t capable of. There is a tricky thing in all of this, though : we have to relearn these capabilities with every model”.

4) Frequently-used “skills” ( Research)

“Agent Skills” are ways to extend Codex with task-specific capabilities, which is pretty much the same concept as Claude Code’s skills. Internally, the Codex team built 100+ Skills to share and choose from. Three interesting examples :

Security best-practices skill : a comprehensive write-up of all security practices adopted by the team. When invoked, Codex goes through each practice, checks the code, and generates patches for anything missing.

“Yeet” skill : takes any code change, writes up the PR title and description based on the original plan, and creates a draft PR in one step.

Datadog integration skill : Codex connects to Datadog, reviews alerts and issues, finds problems, and tries to generate a fix for them.

Cryptocurrency Bitcoin (BTC) Blockchain

How do you rate this article?

YoussoufDelve

I am a young boy passionate by the World of cryptocurrencies.

Siriandelmec

I am a crypto Lover who believe that Cryptocurrency is the best innovation of this century and maybe for all the Times. Thank you very much to Satoshi Nakamoto.