Less a manifesto than a field report: where a coding agent actually earns its keep across a large engineering org, told through the use cases teams reach for and the habits they’ve built around it.
Codex is used daily across numerous technical teams at OpenAI — accelerating a range of engineering tasks, from understanding complex systems and refactoring large codebases to shipping new features and resolving incidents under tight deadlines.
Drawing on interviews with engineers and internal usage data, this report compiles the use cases and best practices that show how Codex helps teams move faster, improve quality, and manage complexity at scale.
Codex helps teams get up to speed quickly in unfamiliar parts of the codebase — locating the core logic of a feature, mapping relationships between services or modules, and tracing data flow through a system.
It surfaces architecture patterns and missing documentation that would otherwise take significant manual effort to reconstruct. During incident response, it lets engineers ramp into new areas fast by surfacing how components interact and how failure states propagate.
When I fix a bug, I use Ask mode to see where else in the codebase the same issue might appear.
When I’m on-call, I paste the stack trace and ask Codex where the auth flow lives. It jumps straight to the right files so I can triage fast.
Codex answers my ‘Where would I do this?’ repo questions across Terraform and Python way faster than grep.
Codex is commonly used for changes that span multiple files or packages — updating an API, changing how a pattern is implemented, migrating to a new dependency — applying changes consistently across dozens of files.
It’s especially useful when the update requires awareness of structure and dependencies that a regex or find-and-replace would miss, and for cleanup: breaking up oversized modules, replacing old patterns with modern ones, preparing code for testability.
Codex swapped every legacy getUserById() for our new service pattern and opened the PR. It did in minutes what would’ve taken hours.
To clear launch blockers, I have Codex scan for every instance of the old pattern, summarize the impact in Markdown, then open PRs with the fixes.
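To make the "regex would miss" point concrete, here is a minimal sketch of structure-aware search. It is not how Codex works internally. It only shows why parsing finds a different set of sites than pattern matching. The function name `get_user_by_id`, the helper `find_calls`, and the sample source are all hypothetical, and the example uses Python's standard `ast` module.

```python
import ast
import re

# Hypothetical source: the legacy helper shows up in a real call, in a comment,
# and inside a string literal. Only the real call needs migrating.
SOURCE = '''
def load_profile(user_id):
    # get_user_by_id(user_id) used to be called twice here
    user = get_user_by_id(user_id)
    log("get_user_by_id(%s) ok" % user_id)
    return user
'''


def find_calls(source: str, func_name: str) -> list[int]:
    """Return line numbers of real calls to func_name (bare or as an attribute)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            target = node.func
            # ast.Name carries .id, ast.Attribute carries .attr; anything else is skipped.
            name = getattr(target, "id", None) or getattr(target, "attr", None)
            if name == func_name:
                hits.append(node.lineno)
    return sorted(hits)


# Text matching flags every line that mentions the name followed by "(".
regex_hits = [
    lineno
    for lineno, line in enumerate(SOURCE.splitlines(), start=1)
    if re.search(r"get_user_by_id\(", line)
]

# Parsing flags only the line where the function is actually called.
ast_hits = find_calls(SOURCE, "get_user_by_id")

print("regex:", regex_hits)
print("ast:  ", ast_hits)
```

The regex reports the comment and the log string alongside the real call, while the parsed view reports only the call. A real migration would also need to handle imports, aliases, and call signatures, which this sketch leaves out.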
During tuning and reliability work, engineers prompt Codex to analyze slow or memory-intensive paths — inefficient loops, redundant operations, costly queries — and suggest optimized alternatives, often with meaningful efficiency and reliability gains.
It also supports code health by flagging risky or deprecated patterns still in active use, helping reduce long-term tech debt and head off regressions.
I use Codex to scan for repeated expensive DB calls. It’s great at flagging hot paths and drafting batched queries I can later tune.
Codex is great for spotting performance issues quickly — I save 30 minutes of work by spending 5 minutes on a prompt.
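As an illustration of the "repeated expensive DB calls" case, the sketch below contrasts a per-item query loop with a batched draft. The `users` table, its rows, and both function names are assumptions made up for the example, and it runs against an in-memory SQLite database through Python's standard `sqlite3` module.

```python
import sqlite3

# Hypothetical schema and data, held in memory so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "ada"), (2, "grace"), (3, "linus")],
)


def names_one_by_one(conn, user_ids):
    """Hot path: issues one query per id (the N+1 pattern)."""
    result = {}
    for uid in user_ids:
        row = conn.execute(
            "SELECT name FROM users WHERE id = ?", (uid,)
        ).fetchone()
        if row is not None:
            result[uid] = row[0]
    return result


def names_batched(conn, user_ids):
    """Batched draft: a single IN (...) query covering every id."""
    ids = list(user_ids)
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", ids
    ).fetchall()
    return dict(rows)  # rows are (id, name) pairs


wanted = [1, 3, 42]  # 42 does not exist and is dropped by both versions
slow = names_one_by_one(conn, wanted)
fast = names_batched(conn, wanted)
print(slow == fast, fast)
```

Both versions return the same mapping, but the batched one makes a single round trip. The "later tune" step from the quote would cover things this draft ignores, such as chunking very long id lists to stay under the database's bound-parameter limit.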
Codex helps engineers write tests faster, especially where coverage is thin or missing. On a fix or refactor, it suggests tests for edge cases and likely failure paths; for new code, it generates unit or integration tests from the signature and surrounding logic.
It’s particularly good at boundary conditions — empty inputs, max length, unusual-but-valid states — the cases most often missed in a first pass.
I point Codex at low-coverage modules overnight and wake up to runnable unit-test PRs.
When switching mono-repo branches is painful, I have Codex write the tests and kick off CI while I keep working on my branch.
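The boundary conditions named above (empty inputs, max length, unusual-but-valid states) translate into tests roughly like the following. This is a sketch under stated assumptions: `truncate_label`, `MAX_LEN`, and the test class are hypothetical stand-ins, not code from the report, and the suite uses Python's standard `unittest` module.

```python
import unittest

MAX_LEN = 10  # hypothetical limit; the function below assumes max_len >= 3


def truncate_label(text: str, max_len: int = MAX_LEN) -> str:
    """Trim text to max_len characters, marking a cut with a trailing '...'."""
    if len(text) <= max_len:
        return text
    return text[: max_len - 3] + "..."


class TruncateLabelBoundaryTests(unittest.TestCase):
    def test_empty_input(self):
        self.assertEqual(truncate_label(""), "")

    def test_exactly_max_length_is_unchanged(self):
        self.assertEqual(truncate_label("a" * MAX_LEN), "a" * MAX_LEN)

    def test_one_over_max_length_is_cut(self):
        out = truncate_label("a" * (MAX_LEN + 1))
        self.assertEqual(len(out), MAX_LEN)
        self.assertTrue(out.endswith("..."))

    def test_whitespace_only_is_valid(self):
        # Unusual but valid: the function does not strip or reject whitespace.
        self.assertEqual(truncate_label("   "), "   ")


def run_suite() -> unittest.TestResult:
    """Load and run the boundary tests, returning the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        TruncateLabelBoundaryTests
    )
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

The off-by-one pair (exactly at the limit, one past it) is the kind of case a first pass tends to skip, which is why it gets two separate tests here.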
Codex accelerates both ends of the development cycle. At kickoff, engineers use it to scaffold boilerplate — folders, modules, API stubs — to get runnable code up without hand-wiring every piece.
Near release, it handles the small-but-essential work: triaging bugs, filling last-mile gaps, generating rollout scripts, telemetry hooks, and config files. It also turns product feedback into starter code — paste a request or spec, get a rough draft to refine.
I was in meetings all day and still merged 4 PRs because Codex was working in the background.
Codex helped ship 3–4 low-priority fixes perfectly that would’ve languished in the backlog, which was super empowering.
Codex keeps engineers productive when schedules are fragmented and full of interruptions. It captures unfinished work, turns notes into working prototypes, and spins off exploratory tasks to revisit later.
That makes it easier to pause and resume without losing context — especially while on call or stacked with meetings.
If I spot a drive-by fix, I fire a Codex task instead of swapping branches and review its PR when I’m free.
I routinely forward Slack threads, Datadog traces, issues and more to Codex so I can stay focused on high-priority work.
Codex is useful for open-ended work: finding alternative solutions, validating design decisions, exploring unfamiliar patterns, and pressure-testing assumptions. This surfaces tradeoffs, expands design options, and sharpens implementation choices.
It’s also used to find related bugs — given a known issue or deprecated method, Codex identifies similar patterns elsewhere, making regressions and cleanup easier to catch.
Codex helps me solve the cold-start problem — I paste a spec and docs and it scaffolds code or shows me what I forgot.
After I fix a bug I ask Codex where similar bugs might lurk, then spin follow-up tasks.
I was in meetings all day and still merged 4 PRs because Codex was working in the background. (Product Engineer, ChatGPT Enterprise; the velocity, in one line.)
Codex works best when it’s given structure, context, and room to iterate. These are the habits OpenAI teams cultivate to get consistent value out of it day to day.
For large changes, prompt for an implementation plan in Ask mode first, then feed that plan into Code mode. The two-step flow keeps Codex grounded and helps avoid errors in its output. It works best on well-scoped tasks: roughly an hour of work, or a few hundred lines. Expect that ceiling to rise as models improve.
Setting a startup script, environment variables, and internet access significantly reduces Codex’s error rate. As you run tasks, fold recurring build errors back into the environment config. It takes a few iterations, but the long-run efficiency gains are large.
Codex responds better when prompts mirror how you’d describe a change in a PR or issue: include file paths, component names, diffs, and doc snippets when relevant. Patterns like “implement this the same way it’s done in [module X]” improve results.
Fire off tasks to capture tangential ideas, partial work, or incidental fixes. There’s no pressure to produce a full PR in one go — the task queue works as a staging area you return to when you’re back in focus.
Maintain an AGENTS.md file so Codex operates effectively across prompts. It carries the naming conventions, business logic, known quirks, and dependencies that Codex can’t infer from the code alone.
Best-of-N generates multiple responses for a single task at once — explore several solutions and pick the strongest. For harder tasks, review iterations and combine parts of different responses into one better result.
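The selection idea behind Best-of-N can be sketched in a few lines. This is an illustration only, not a description of how Codex implements the feature: the three candidate functions, the `CHECKS` list, and the `score` and `best_of` helpers are all hypothetical, and in practice the "scoring" is usually an engineer reviewing the responses.

```python
from typing import Callable

Candidate = Callable[[str], list[int]]


# Three hypothetical attempts at "parse a comma-separated list of ints".
def candidate_a(s: str) -> list[int]:
    return [int(p) for p in s.split(",")]  # raises on empty pieces


def candidate_b(s: str) -> list[int]:
    return [int(p) for p in s.split(",") if p.strip()]  # skips blank pieces


def candidate_c(s: str) -> list[int]:
    return [int(p) for p in s.split(";") if p]  # wrong separator


# Hypothetical acceptance checks: (input, expected output).
CHECKS = [
    ("1,2,3", [1, 2, 3]),
    ("", []),
    (" 4 , 5 ", [4, 5]),
    ("7,", [7]),
]


def score(fn: Candidate) -> int:
    """Count how many checks a candidate passes; an exception counts as a failure."""
    passed = 0
    for arg, expected in CHECKS:
        try:
            if fn(arg) == expected:
                passed += 1
        except Exception:
            pass
    return passed


def best_of(candidates: list[Candidate]) -> Candidate:
    """Pick the candidate that passes the most checks."""
    return max(candidates, key=score)


candidates = [candidate_a, candidate_b, candidate_c]
winner = best_of(candidates)
print(winner.__name__, [score(c) for c in candidates])
```

Running several attempts against the same checks makes the strongest one easy to spot. The per-candidate scores also show which partial answers might be worth combining.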
Codex is helping teams move faster, write better code, and take on work that would otherwise never have been prioritized. As the models improve and Codex integrates more deeply into everyday workflows, the expectation is more powerful ways to build software — and more learnings shared along the way.
Read together, the three Codex readings in this library answer different questions.
Liu’s Getting the Most Out of Codex (codex-getting-the-most.html, No. 05)
is the capability map — what the system can reach when it leaves the repo.
Hayduk’s goal-mode piece (codex-goals.html, No. 04) is the deep dive
into one capability. This report is the field evidence: which of those capabilities
a large org actually reaches for, in the words of the engineers using them.
The seams are worth noticing. The task queue here is Liu’s queuing seen
from the floor — “fire off a task, review the PR when I’m free.” And AGENTS.md, listed
as a best practice, is the same artifact Liu anchors his shared-memory vault on: durable,
written-down context the agent reads across sessions. Map, deep dive, field report — one idea at three altitudes.
No. 05: Liu, codex-getting-the-most.html · No. 04: Hayduk, codex-goals.html. See § Best practices (04 & 05) for the seams.