A reading · from inside the lab

How OpenAI uses Codex

Less a manifesto than a field report: where a coding agent actually earns its keep across a large engineering org, told through the use cases teams reach for and the habits they’ve built around it.


Original report

OpenAI

“How OpenAI uses Codex”
Research preview · 2026

Reformatted for this library

A Reading · No. 06

Completes the Codex trilogy — see Nos. 04 & 05

Introduction

Codex is used daily across numerous technical teams at OpenAI — accelerating a range of engineering tasks, from understanding complex systems and refactoring large codebases to shipping new features and resolving incidents under tight deadlines.

Drawing on interviews with engineers and internal usage data, this report compiles the use cases and best practices that show how Codex helps teams move faster, improve quality, and manage complexity at scale.

Used daily across

Security · Product Engineering · Frontend · API · Infrastructure · Performance Engineering

Seven use cases

01 · Code understanding · Get up to speed
02 · Refactoring & migrations · Span many files
03 · Performance optimization · Find the hot path
04 · Improving test coverage · Cover the edges
05 · Increasing velocity · Start & finish
06 · Staying in flow · Pause & resume
07 · Exploration & ideation · Open-ended work

The use cases · what teams reach for

Onboard · debug · investigate

Code understanding

Codex helps teams get up to speed quickly in unfamiliar parts of the codebase — locating the core logic of a feature, mapping relationships between services or modules, and tracing data flow through a system.

It surfaces architecture patterns and missing documentation that would otherwise take significant manual effort to reconstruct. During incident response, it lets engineers ramp into new areas fast by surfacing how components interact and how failure states propagate.

From our teams

When I fix a bug, I use Ask mode to see where else in the codebase the same issue might appear.

Performance Engineer · Retrieval Systems

When I’m on-call, I paste the stack trace and ask Codex where the auth flow lives. It jumps straight to the right files so I can triage fast.

Site Reliability Engineer · API Platform

Codex answers my ‘Where would I do this?’ repo questions across Terraform and Python way faster than grep.

DevOps Engineer · Infrastructure Services

Try these prompts

prompt sheet · code understanding

›Where is the authentication logic implemented in this repo?

›Summarize how requests flow through this service from entrypoint to response.

›Which modules interact with [module name] and how are failures handled?

Across files · packages · deps

Refactoring & migrations

Codex is commonly used for changes that span multiple files or packages, such as updating an API, changing how a pattern is implemented, or migrating to a new dependency. It applies those changes consistently across dozens of files.

It’s especially useful when the update requires awareness of structure and dependencies that a regex or find-and-replace would miss, and for cleanup: breaking up oversized modules, replacing old patterns with modern ones, preparing code for testability.
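
To illustrate why structural awareness matters, the sketch below contrasts a plain regex search with a syntax-aware search for calls to a legacy function. It is a minimal Python example under stated assumptions: the snippet in `SOURCE` and the function name `get_user_by_id` are hypothetical, and nothing here describes how Codex works internally.

```python
import ast
import re

# Hypothetical source file: one real call, plus mentions in a comment and a string.
SOURCE = '''
# get_user_by_id(1) is deprecated, see the migration notes
def load(uid):
    label = "get_user_by_id(uid)"
    return get_user_by_id(uid), label
'''

def regex_hits(source: str, name: str) -> int:
    """Count textual matches of `name(`, including comments and strings."""
    return len(re.findall(rf"\b{re.escape(name)}\(", source))

def real_call_lines(source: str, name: str) -> list[int]:
    """Return line numbers of actual call expressions to `name`."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    ]
```

On this sample the regex reports three matches, while the syntax tree finds the single real call on line 5. A blind find-and-replace would also have rewritten the comment and the string literal.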

From our teams

Codex swapped every legacy getUserById() for our new service pattern and opened the PR. It did in minutes what would’ve taken hours.

Backend Engineer · ChatGPT Web

To clear launch blockers, I have Codex scan for every instance of the old pattern, summarize the impact in Markdown, then open PRs with the fixes.

Product Engineer · ChatGPT Enterprise

Try these prompts

prompt sheet · refactoring

›Split this file into separate modules by concern and generate tests for each one.

›Convert all callback-based database access to async/await.
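
As a rough picture of what the second prompt asks for, here is a small Python sketch of a callback-style call wrapped as a coroutine. The legacy function `fetch_user_cb` and its payload are hypothetical stand-ins, not code from the report, and the adapter assumes the callback fires on the event loop's own thread.

```python
import asyncio
from typing import Callable

# Hypothetical legacy API: delivers its result through a callback.
def fetch_user_cb(user_id: int, on_done: Callable[[dict], None]) -> None:
    on_done({"id": user_id, "name": f"user-{user_id}"})

# Adapter: expose the callback API as an awaitable coroutine.
# Assumes the callback runs on the event loop thread; a callback fired from
# another thread would need loop.call_soon_threadsafe instead.
async def fetch_user(user_id: int) -> dict:
    loop = asyncio.get_running_loop()
    future: asyncio.Future = loop.create_future()
    fetch_user_cb(user_id, future.set_result)
    return await future

# Call sites become linear instead of nested callbacks.
async def load_pair(a: int, b: int) -> list[dict]:
    return [await fetch_user(a), await fetch_user(b)]

def run_demo() -> list[dict]:
    return asyncio.run(load_pair(1, 2))
```

The adapter keeps the legacy function untouched, so call sites can migrate file by file, which is the kind of consistent, repetitive change this section describes.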

Bottlenecks · tech debt

Performance optimization

During tuning and reliability work, engineers prompt Codex to analyze slow or memory-intensive paths — inefficient loops, redundant operations, costly queries — and suggest optimized alternatives, often with meaningful efficiency and reliability gains.

It also supports code health by flagging risky or deprecated patterns still in active use, helping reduce long-term tech debt and head off regressions.

From our teams

I use Codex to scan for repeated expensive DB calls. It’s great at flagging hot paths and drafting batched queries I can later tune.

Infrastructure Engineer · API Reliability

Codex is great for spotting performance issues quickly — I save 30 minutes of work by spending 5 minutes on a prompt.

Platform Engineer · Model Serving

Try these prompts

prompt sheet · performance

›Optimize this loop for memory efficiency and explain why your version is faster.

›Find repeated expensive operations in this request handler and suggest caching opportunities.

›Suggest a faster way to batch DB queries in this function.
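
The repeated-query pattern mentioned above can be sketched as follows. This is an illustrative Python example using the standard `sqlite3` module; the `users` table and the helper names are assumptions made for the demo, not anything from OpenAI's codebase.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Build a small in-memory table (hypothetical schema) for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(i, f"user-{i}") for i in range(1, 6)],
    )
    return conn

def names_one_by_one(conn, ids):
    """Slow pattern: one query per id."""
    out = {}
    for user_id in ids:
        row = conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row:
            out[user_id] = row[0]
    return out

def names_batched(conn, ids):
    """Batched pattern: a single query with one placeholder per id."""
    if not ids:
        return {}
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", list(ids)
    ).fetchall()
    return dict(rows)
```

Both functions return the same mapping; the batched version makes one round trip instead of one per id. Very long id lists may need to be split into chunks, since databases cap the number of bound parameters per statement.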

Edge cases · boundaries

Improving test coverage

Codex helps engineers write tests faster, especially where coverage is thin or missing. On a fix or refactor, it suggests tests for edge cases and likely failure paths; for new code, it generates unit or integration tests from the signature and surrounding logic.

It’s particularly good at boundary conditions — empty inputs, max length, unusual-but-valid states — the cases most often missed in a first pass.
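
A minimal sketch of what such boundary tests can look like, written in Python with the standard `unittest` module. The function `normalize_username` and the `MAX_LEN` limit are hypothetical, chosen only to give the tests something to exercise.

```python
import unittest

MAX_LEN = 20  # hypothetical limit, chosen for illustration

def normalize_username(raw: str) -> str:
    """Hypothetical function under test: trim, lowercase, enforce length."""
    name = raw.strip().lower()
    if not name:
        raise ValueError("username is empty")
    if len(name) > MAX_LEN:
        raise ValueError("username is too long")
    return name

class NormalizeUsernameBoundaries(unittest.TestCase):
    def test_empty_and_whitespace_only(self):
        for raw in ("", "   "):
            with self.assertRaises(ValueError):
                normalize_username(raw)

    def test_exactly_max_length_is_accepted(self):
        self.assertEqual(normalize_username("a" * MAX_LEN), "a" * MAX_LEN)

    def test_one_over_max_length_is_rejected(self):
        with self.assertRaises(ValueError):
            normalize_username("a" * (MAX_LEN + 1))

    def test_unusual_but_valid_input(self):
        self.assertEqual(normalize_username("  Ada_99 "), "ada_99")
```

Each test pins one boundary named in the paragraph above: empty input, the exact maximum length, one past the maximum, and an odd but valid value.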

From our teams

I point Codex at low-coverage modules overnight and wake up to runnable unit-test PRs.

Frontend Engineer · ChatGPT Desktop

When switching mono-repo branches is painful, I have Codex write the tests and kick off CI while I keep working on my branch.

Backend Engineer · Payments & Billing

Try these prompts

prompt sheet · test coverage

›Write unit tests for this function, including edge cases and failure paths.

›Generate a property-based test for this sorting utility.

›Extend this test file to cover missing scenarios around null inputs and invalid states.
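
For the property-based prompt, the idea can be sketched without any third-party library. The Python example below is a hand-rolled property check built on the standard `random` module rather than a dedicated tool such as Hypothesis, and `sort_numbers` is a hypothetical utility standing in for the code under test.

```python
import random
from collections import Counter

def sort_numbers(values: list[int]) -> list[int]:
    """Hypothetical sorting utility under test (insertion sort)."""
    result: list[int] = []
    for value in values:
        index = len(result)
        while index > 0 and result[index - 1] > value:
            index -= 1
        result.insert(index, value)
    return result

def check_sort_properties(trials: int = 200, seed: int = 0) -> int:
    """Check ordering and permutation properties on random inputs."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        size = rng.randint(0, 30)
        data = [rng.randint(-50, 50) for _ in range(size)]
        output = sort_numbers(data)
        # Property 1: output is in non-decreasing order.
        assert all(a <= b for a, b in zip(output, output[1:]))
        # Property 2: output is a permutation of the input.
        assert Counter(output) == Counter(data)
        # Property 3: sorting already-sorted output changes nothing.
        assert sort_numbers(output) == output
    return trials
```

Instead of listing expected outputs case by case, the check states what must hold for any input, which tends to reach edge cases such as empty lists and duplicates without naming them.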

Scaffold · last mile

Increasing velocity

Codex accelerates both ends of the development cycle. At kickoff, engineers use it to scaffold boilerplate — folders, modules, API stubs — to get runnable code up without hand-wiring every piece.

Near release, it handles the small-but-essential work: triaging bugs, filling last-mile gaps, and generating rollout scripts, telemetry hooks, and config files. It also turns product feedback into starter code: paste a request or spec and get a rough draft to refine.

From our teams

I was in meetings all day and still merged 4 PRs because Codex was working in the background.

Product Engineer · ChatGPT Enterprise

Codex helped ship 3–4 low-priority fixes perfectly that would’ve languished in the backlog, which was super empowering.

Full-Stack Engineer · Internal Tools

Try these prompts

prompt sheet · velocity

›Scaffold a new API route for POST /events with basic validation and logging.

›Generate a telemetry hook for the new onboarding flow, using this template [insert template].

›Create a stub implementation based on this spec: [insert spec or feedback].
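
To make the first prompt concrete, here is one possible shape for the scaffold it asks for, as a framework-free Python sketch. The required fields, status codes, and function name are assumptions for illustration; a real route would plug into whatever web framework the service already uses.

```python
import json
import logging

logger = logging.getLogger("events")

REQUIRED_FIELDS = ("name", "timestamp")  # hypothetical event schema

def handle_post_events(body: bytes) -> tuple[int, dict]:
    """Validate a POST /events payload and return (status_code, response)."""
    try:
        payload = json.loads(body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        logger.warning("rejected event: body is not valid JSON")
        return 400, {"error": "invalid JSON"}

    if not isinstance(payload, dict):
        return 400, {"error": "expected a JSON object"}

    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        logger.warning("rejected event: missing %s", missing)
        return 400, {"error": f"missing fields: {', '.join(missing)}"}

    # TODO: persist the event; storage is out of scope for this stub.
    logger.info("accepted event %s", payload["name"])
    return 201, {"status": "accepted"}
```

Keeping validation in a plain function makes the stub easy to test before it is wired to a real route, which suits the "rough draft to refine" workflow described above.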

Fragmented schedules

Staying in flow

Codex keeps engineers productive when schedules are fragmented and full of interruptions. It captures unfinished work, turns notes into working prototypes, and spins off exploratory tasks to revisit later.

That makes it easier to pause and resume without losing context — especially while on call or stacked with meetings.

From our teams

If I spot a drive-by fix, I fire a Codex task instead of swapping branches and review its PR when I’m free.

Backend Engineer · ChatGPT API

I routinely forward Slack threads, Datadog traces, issues and more to Codex so I can stay focused on high-priority work.

API Engineer · Infrastructure Observability

Try these prompts

prompt sheet · staying in flow

›Generate a plan to refactor this service and split it into smaller modules.

›Stub out the retry logic and add a TODO — I’ll fill in the backoff logic later.

›Summarize this file so I can pick up where I left off tomorrow.
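
The retry prompt might produce something like the following Python sketch: a working retry loop with the backoff policy deliberately left as a TODO. The names `with_retry` and `backoff_delay` are hypothetical, and retrying on every exception is a simplification a real service would likely narrow.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def backoff_delay(attempt: int) -> float:
    """Seconds to wait before the next attempt."""
    # TODO: replace with real backoff (for example exponential with jitter).
    return 0.0

def with_retry(operation: Callable[[], T], max_attempts: int = 3) -> T:
    """Call `operation`, retrying on any exception up to `max_attempts` times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            time.sleep(backoff_delay(attempt))
    raise ValueError("max_attempts must be at least 1")
```

The stub runs as written, so the surrounding code can be merged and exercised now, while the TODO marks the one decision left for later.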

Tradeoffs · related bugs

Exploration & ideation

Codex is useful for open-ended work: finding alternative solutions, validating design decisions, exploring unfamiliar patterns, and pressure-testing assumptions. Working this way surfaces tradeoffs, expands design options, and sharpens implementation choices.

It’s also used to find related bugs — given a known issue or deprecated method, Codex identifies similar patterns elsewhere, making regressions and cleanup easier to catch.

From our teams

Codex helps me solve the cold-start problem — I paste a spec and docs and it scaffolds code or shows me what I forgot.

Product Engineer · ChatGPT Desktop

After I fix a bug I ask Codex where similar bugs might lurk, then spin follow-up tasks.

Performance Engineer · Retrieval Systems

Try these prompts

prompt sheet · exploration

›How would this work if the system were event-driven instead of request/response?

›Find all modules that manually build SQL strings instead of using our query builder.

›Rewrite this in a more functional style — avoid mutation and side effects.
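
As an example of the tradeoff the last prompt can surface, the Python sketch below shows the same aggregation written with and without mutation. The `ORDERS` data and both function names are hypothetical.

```python
from functools import reduce

ORDERS = [  # hypothetical sample data
    {"customer": "ada", "total": 30},
    {"customer": "bob", "total": 5},
    {"customer": "ada", "total": 12},
]

def totals_imperative(orders):
    """Original style: builds the result by mutating a dict in a loop."""
    totals = {}
    for order in orders:
        name = order["customer"]
        totals[name] = totals.get(name, 0) + order["total"]
    return totals

def totals_functional(orders):
    """Functional style: each step returns a new dict; nothing is mutated."""
    def add(acc, order):
        name = order["customer"]
        return {**acc, name: acc.get(name, 0) + order["total"]}
    return reduce(add, orders, {})
```

Both versions return the same totals. The functional one avoids shared mutable state but copies the accumulator on every step, so it does more work on large inputs; weighing that cost against clarity is the kind of design question this use case is about.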

I was in meetings all day and still merged 4 PRs because Codex was working in the background. — Product Engineer · ChatGPT Enterprise · the velocity, in one line

Best practices

Start with Ask Mode

For large changes, prompt for an implementation plan in Ask mode first, then feed that plan into Code mode. This two-step flow keeps Codex grounded and helps avoid errors in its output. Codex works best on well-scoped tasks: roughly an hour of work, or a few hundred lines of code. Expect that ceiling to rise as models improve.

Iterate on the environment

Configuring a startup script, environment variables, and internet access significantly reduces Codex’s error rate. As you run tasks, watch for recurring build errors and fix them in the environment configuration. It takes a few iterations, but the long-run efficiency gains are large.

Prompt like a GitHub Issue

Codex responds better when prompts mirror how you’d describe a change in a PR or issue: include file paths, component names, diffs, and doc snippets when relevant. Patterns like “implement this the same way it’s done in [module X]” improve results.

Queue as a backlog

Fire off tasks to capture tangential ideas, partial work, or incidental fixes. There’s no pressure to produce a full PR in one go — the task queue works as a staging area you return to when you’re back in focus.

Use AGENTS.md for context

Maintain an AGENTS.md file so Codex operates effectively across prompts. The file carries naming conventions, business logic, known quirks, and dependencies that Codex can’t infer from the code alone.

Leverage Best-of-N

Best-of-N generates multiple responses for a single task at once, so you can explore several solutions and pick the strongest. For harder tasks, review the iterations and combine parts of different responses into one better result.
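
The general best-of-N pattern can be sketched in a few lines of Python. In Codex the feature is part of the product and a person reviews the responses; the automated scoring below, along with the `CHECKS` list and the three candidate functions, is purely an assumption for illustration.

```python
from typing import Callable, Sequence

Candidate = Callable[[str], str]

# Hypothetical checks for a slugify-style helper: (input, expected output).
CHECKS = [("Hello World", "hello-world"), ("  Trim me ", "trim-me"), ("", "")]

def passed(candidate: Candidate) -> int:
    """Score a candidate by the number of checks it passes."""
    score = 0
    for text, expected in CHECKS:
        try:
            score += candidate(text) == expected
        except Exception:
            pass  # a crash counts as a failed check
    return score

def best_of_n(candidates: Sequence[Candidate]) -> Candidate:
    """Return the highest-scoring candidate; ties go to the earliest one."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=passed)

# Three hypothetical responses to the same task.
def attempt_a(text: str) -> str:
    return text.lower().replace(" ", "-")

def attempt_b(text: str) -> str:
    return "-".join(text.lower().split())

def attempt_c(text: str) -> str:
    return text.split()[0].lower()  # crashes on empty input
```

Here `best_of_n([attempt_a, attempt_b, attempt_c])` selects `attempt_b`, the only candidate that passes all three checks. In practice the "score" is often a reviewer's judgment, and the strongest result may combine parts of several responses.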

Looking ahead

Still a research preview, already real impact.

Codex is helping teams move faster, write better code, and take on work that would otherwise never have been prioritized. As the models improve and Codex integrates more deeply into everyday workflows, the expectation is that it will open up more powerful ways to build software, with more lessons shared along the way.

— OpenAI

Research preview · 2026

Editor’s note

The third panel of the Codex triptych.

Read together, the three Codex readings in this library answer different questions. Liu’s Getting the Most Out of Codex (codex-getting-the-most.html, No. 05) is the capability map — what the system can reach when it leaves the repo. Hayduk’s goal-mode piece (codex-goals.html, No. 04) is the deep dive into one capability. This report is the field evidence: which of those capabilities a large org actually reaches for, in the words of the engineers using them.

The seams are worth noticing. The task queue here is Liu’s queuing seen from the floor — “fire off a task, review the PR when I’m free.” And AGENTS.md, listed as a best practice, is the same artifact Liu anchors his shared-memory vault on: durable, written-down context the agent reads across sessions. Map, deep dive, field report — one idea at three altitudes.

Cross-reference · No. 05 — Liu, codex-getting-the-most.html · No. 04 — Hayduk, codex-goals.html. See § Best practices (04 & 05) for the seams.
