OpenAI Codex Review

by Darko Medin

Dec 15, 2025

What you should know immediately: OpenAI Codex gives you the most resources of all coding agents, with decent thinking abilities, accuracy and automation. Period. Now to the details…

GENERAL INTRO

Codex is OpenAI’s agentic coding product: a software-engineering agent that can work across your repo, edit files, run commands and tests, and even propose pull requests, with tasks executed in isolated environments (cloud sandboxes) or locally via the CLI/IDE tools. You can select a folder as your working directory and the agent will focus on that directory for most of its working time. I think that’s very important for the security of the environment the Codex agent is operating in. Integration with GitHub is also very good. You can ship apps directly to GitHub and also let your Codex agent modify your repos and write documentation.

How to use it best? Very simple: use it as an extension in Microsoft Visual Studio Code.

https://code.visualstudio.com/

On the left side you have the Extension bar, and once you click on it you can easily chat with your agent. (Note: the right side is Copilot in VS Code.) Basically, the agent will understand in natural language what you want to build in the folder you selected, but you need to be very specific, because the agent will not assume.

It is usually optimized for building frontend and backend, but it needs additional input about your environment: whether you will be using Docker, Kafka or specific databases, and what you will use for the frontend, such as React, JavaScript, CSS, etc. It’s very important to give this context.
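One way to make sure this context is never forgotten is to keep it in a small structured brief and paste the rendered text at the top of a prompt. The sketch below is only an illustration: `ProjectBrief` and `build_brief` are hypothetical names, not a Codex format, since the agent simply reads natural-language text.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectBrief:
    """Hypothetical container for the context an agent needs up front."""
    goal: str
    frontend: list[str] = field(default_factory=list)
    backend: list[str] = field(default_factory=list)
    infrastructure: list[str] = field(default_factory=list)


def build_brief(brief: ProjectBrief) -> str:
    """Render the brief as plain text to paste at the top of a prompt."""
    def line(label: str, items: list[str]) -> str:
        # Make gaps explicit so the agent does not have to guess.
        return f"{label}: {', '.join(items) if items else 'not decided'}"

    return "\n".join([
        f"Goal: {brief.goal}",
        line("Frontend", brief.frontend),
        line("Backend", brief.backend),
        line("Infrastructure", brief.infrastructure),
    ])
```

Calling `build_brief(ProjectBrief(goal="Todo app", frontend=["React", "CSS"], infrastructure=["Docker", "Kafka"]))` produces four labelled lines, with the backend line reading "not decided", which is a prompt to fill it in before handing the task to the agent.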

Frontend dev is quite optimized in Codex, and if you work in this field you may find it good, but only if you provide it with detailed information and schematics. LLM-based agents still struggle to understand what we really want, and Codex is no different. If you’re using Codex via the CLI or an agent in a dev container, it can usually run Playwright/Selenium and programmatically capture screenshots (e.g., page.screenshot(...)) as part of a task. This is very important for frontend developers, so that you don’t have to take screenshots manually and upload them to Codex all the time.
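For reference, a screenshot step of this kind could look roughly like the sketch below, using Playwright's Python sync API. It is an assumption about what such a script might contain, not what Codex actually generates; the `screenshot_path` and `capture` helpers and the output folder are made up for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse


def screenshot_path(url: str, out_dir: str = "screenshots") -> Path:
    """Derive a file name from a URL, replacing '/' and ':' with '_'."""
    parsed = urlparse(url)
    slug = (parsed.netloc + parsed.path).strip("/").replace("/", "_").replace(":", "_")
    return Path(out_dir) / f"{slug or 'index'}.png"


def capture(urls: list[str], out_dir: str = "screenshots") -> list[Path]:
    """Open each URL in headless Chromium and save a full-page screenshot."""
    # Imported here so screenshot_path works without Playwright installed.
    from playwright.sync_api import sync_playwright

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    saved = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        for url in urls:
            page.goto(url)
            target = screenshot_path(url, out_dir)
            page.screenshot(path=str(target), full_page=True)
            saved.append(target)
        browser.close()
    return saved
```

With a dev server running (the address is an assumption), `capture(["http://localhost:3000/login"])` would write `screenshots/localhost_3000_login.png`, which the agent can then inspect instead of waiting for a manual upload.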

Automation, automation, that’s where the coding agents compete… This is a good capability of Codex, and so far I’ve seen it work as well only in Antigravity by Google. It’s good at Python, so if you build anything backend, make sure it’s focused on Python, combined with other languages where Python is the top-level orchestrator. You may try Rust or C++, but best combined with Python. As for backend development, you have to give it very clear instructions and sometimes repeat them if you want it to work well. The main problem is usually that if you have 10+ backend files it may miss some portions of them, so it’s best to give it clear instructions to check all and read all. The agent usually needs some help from us when it comes to backend, however its a

PROS :

One of the biggest pros is that you get higher accuracy, fewer errors and better automated testing abilities with Codex. The first thing you will notice is that it is now more automated than most other agentic coders, which means the agent won’t ask you every now and then to ‘Accept’ running things. This is especially good if you opt for ‘Full Agent’. My strong recommendation for this.

If you use the model options they give you in the paid version, which are ChatGPT 5.1 (I assume 5.2 will follow soon) and Codex-MAX, you will usually see better performance, because, yes, OpenAI, one of the largest companies in the world, is providing top-tier models for Codex, even though, as with most coding agents, you might need 3-5 iterations of human intervention to fix errors.

What do I like extremely? It’s very good at running Docker on your PC.

You basically containerize whatever you build, and so far it usually works within a few prompts, 2-3. Not perfect but good enough! So it can actually automate whatever you would be doing with Docker, and the same applies to other CLI stuff. But be careful, you have to monitor and check all the time…
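To give an idea of what gets automated here, the sketch below wraps the two standard Docker CLI steps, `docker build` and `docker run`, in Python. The function names, the image tag and the default port 8000 are assumptions for illustration; the commands the agent actually runs will depend on your project.

```python
import subprocess


def build_cmd(tag: str, context: str = ".") -> list[str]:
    """Command that builds an image from the Dockerfile in `context`."""
    return ["docker", "build", "-t", tag, context]


def run_cmd(tag: str, host_port: int, container_port: int) -> list[str]:
    """Command that starts the image detached and maps one port."""
    return ["docker", "run", "--rm", "-d", "-p", f"{host_port}:{container_port}", tag]


def run_checked(cmd: list[str]) -> str:
    """Run a command, raise on a non-zero exit code, and return its stdout."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def build_and_start(tag: str, host_port: int = 8000, container_port: int = 8000) -> str:
    """Build the image, start a container, and return the container ID."""
    run_checked(build_cmd(tag))
    return run_checked(run_cmd(tag, host_port, container_port))
```

Because `check=True` raises as soon as a step fails, a broken build stops the run instead of silently starting a stale image, which fits the advice to monitor every step.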

Codex is also easily integrable with GitHub, so you can push, open pull requests, create repos, download (clone) and install. To my knowledge, the only two platforms with such strong capabilities are Codex and Emergent, but Codex has the edge, as Emergent is superficial on this matter.

CONS :

The cons are the usual ones we have with all agentic coding systems. Heavy reliance on LLMs means a bunch of errors, so it took me 3-5 iterations on average to build functional products without errors. Memory management is a bit loose, so some of the memories and chat history may spill over between sessions unless you delete them. It needs a slight improvement in the way it reasons about a larger number of files, so I think one pass over all files in the system would help. Finally, I think in the next iterations they should make a multi-agent system. Just as the user has the option to choose between different models, there should be an option to choose between a larger number of agents specialized for different tasks. Right now the user can choose between different levels of autonomy for an agent; this should be expanded more…

EXAMPLE BUILT

So I used the Codex agent to create another dual-agent web app, where one agent proposes code, the other critiques it, and they figure out the next step together. For fun I called them Neon Proposes and Ghost Critic.

https://github.com/DarkoMedin/DualAgentP

It took about 2-3 hours of communication and instructions to Codex to create the app, fix bugs and turn it into something functional. PS: you can find the app at the link above and play around with it. (Note: you need to add your LLM API key for it to work.)

What can the agentic app do? Basically, you ask the agents to code something, and they will talk, propose, critique, look at terminal outputs, improve the code in iterations and give you the final code. It’s interesting that I was able to build this in a bit more than 2 hours. That speaks for itself about Codex, I’d say. I do have to say there was a substantial number of errors I had to point the agent to fix, so it wasn’t an easy job and far from full automation. I’d call it semi-automation with some inaccuracies, but Codex is definitely better than most of its competition at this moment.
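The propose-and-critique loop described above can be sketched in a few lines. This is a simplified illustration and not the code from the DualAgentP repo: the `refine` function and the two stub agents are hypothetical stand-ins, and in a real app each stub would be an LLM call.

```python
from typing import Callable

Proposer = Callable[[str, str], str]  # (task, feedback) -> code
Critic = Callable[[str, str], str]    # (task, code) -> feedback, "" means approved


def refine(task: str, propose: Proposer, critique: Critic,
           max_rounds: int = 5) -> tuple[str, int]:
    """Alternate proposal and critique until the critic approves or rounds run out."""
    feedback = ""
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = propose(task, feedback)
        feedback = critique(task, code)
        if not feedback:
            return code, round_no
    return code, max_rounds


def stub_propose(task: str, feedback: str) -> str:
    """Stand-in for the proposing agent: adds a docstring once asked to."""
    if "docstring" in feedback:
        return 'def add(a, b):\n    """Return a + b."""\n    return a + b\n'
    return "def add(a, b):\n    return a + b\n"


def stub_critique(task: str, code: str) -> str:
    """Stand-in for the critic agent: approves only code with a docstring."""
    return "" if '"""' in code else "Please add a docstring."
```

With these stubs, `refine("write add", stub_propose, stub_critique)` converges in the second round. The `max_rounds` cap matters in practice, because two LLM agents can otherwise keep disagreeing indefinitely.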

FINAL VERDICT :

Codex is one of the top 3 software development and coding IDEs right now and is the number 1 choice for my personal coding. I think that tells enough… The latest upgrades in automation and models like Codex-MAX give it an edge over most of the competition. The resources they give you for a 20 USD subscription are again just outcompeting others. The pain points are the usual ones we see with AI coder IDEs: struggling with session and memory management, understanding complex file structures, and a number of error-feedback iterations before you actually have your AI-coded web app or other product.

So how would I rank Codex now? Among the top 3 coding agents on my list. Definitely miles ahead of Cursor and Kiro (I don’t know about Antigravity, as I am yet to fully test it). Why is it better? Well, you get pretty much similar accuracy of development and reasoning as with Cursor, but for the same money, a 20 USD subscription, you actually get to code as much as you want per month, while Cursor limits you after a few days of coding. So yes, Codex is quite good now and I am actively using it in my stack. Compared to Kiro, it’s just more automated currently.

In 2-3 hours you can build production-grade agentic apps with the Codex agent, given some practice and experience. This timeline will probably go down in the future…

My name is Darko Medin. I hope this review was useful to you; let me know your opinion in the comments or otherwise…
