I've been using Sonnet whenever I run into the Codex limit, and the difference is stark. Twice yesterday I had to get Codex to fix something Sonnet just got entirely wrong.
I registered a domain a year ago (pine.town) and it came up for renewal, so I figured that, instead of letting it lapse, I'd build something on it. I came up with the idea of an infinite collaborative pixel canvas with a "cozy town" vibe. I have ZERO experience with frontend, yet Codex just built me the entire damn thing over two days of coding:
It's the first model I can work with and be reasonably assured that the code won't go off the rails. I keep adding and adding code, and it hasn't become a mess of spaghetti yet. That having been said, I did catch Codex writing some backend code that could have been a few lines simpler, so I'm sure it's not as good as me at the stuff I know.
Then again, I wouldn't even have started this without Codex, so here we are.
I wonder how much of it comes down to how models "train us" to work in ways they are most effective.
In the web interface press the "+" button next to the repo it is working on. Not obvious at all though!
After all these years, maybe even decades, of seeing your blog posts and projects on here, surely you must have had more experience with frontend than ZERO since you first appeared here? :)
I'm at the point where I have so much built up around claude code workflows that claude feels very good. But when I don't use them, I find that I immensely prefer gpt-5 (and for harder, design influencing questions, grok-4 heavy which is not available behind an API)
It's noticeable when you set up a semi-fixed workflow against one model: when you try to switch to a different family of models, the performance and accuracy change notably.
It is however slow, and more expensive. You can either pay the $20 and get maybe 2 days of work out of it, or $200 for "Pro." But there's nothing in between, like the $100 USD Claude Code tier.
Context window is too small though, and it sometimes has problems with compacting. But I was having that with Sonnet 4.5 as well.
They're still lacking slash commands, subagents, etc. (since they don't own their own model), but they do integrate language servers, which seems handy on larger codebases.
Crush + GLM-4.6 is one of the three I use regularly along with Claude and Codex
Codex's tool use is trash compared to Sonnet's. So still not a one-stop shop.
It's really easy to steer both Claude Code and Codex against that, though: plop "Don't make any changes other than the ones requested" in the system prompt/AGENTS.md and they mostly do well with that.
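As a sketch, the kind of scoping rule mentioned above could live in an AGENTS.md at the repo root; the wording and section names here are just illustrative (AGENTS.md is free-form markdown, not a fixed schema):

```markdown
# AGENTS.md

## Scope of changes
- Don't make any changes other than the ones requested.
- If a refactor or cleanup seems worthwhile, propose it in your reply
  instead of doing it.
- Never touch files outside the area named in the task.
```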
I've tried the same with Gemini CLI, and Gemini seems to mostly ignore the overall guidelines you set up for it; not sure why it's so much worse at that.
Sonnet is much less successful.
So lately I'll start with Sonnet for everything but the most complex tasks and then switch to Codex when needed.
I do look at the backend code it writes, and it seems moderately sane. Sometimes it overcomplicates things, which makes me think that there are a few dragons in the frontend (I haven't looked), but by and large it's been ok.
Oh.
Not good enough for you?
If I skip 5 Pro but still have a large task, I have Codex write a spec file to use as a task list and to review for completeness as it works.
This is how you can use Codex without a plan mode.
I have a similar workflow to the parent: GPT-5 Pro for helping with specifications and deep troubleshooting, and Codex to ground it in my actual code and project, and to execute the changes.
Yes, Codex is still very early. We use it because it's the best model. The client experience will only get better from here. I noticed they onboarded a bunch of devs to the Codex project on GitHub around the time of 5's release.
That hasn't been my experience at all, neither first with the Codex UI since it was available to Pro users, nor since the CLI was available and I first started using that. GPT 5 Pro will (can, to be precise) only read what you give it, Codex goes out searching for what it needs, almost always.
What my quote meant is that once you have the context Codex needs to do its work, and you give it that context, it'll start the work right away without going and reading all those files again. That can help minimize context use within a Codex session: have 5 Pro (or just another Codex) read in a lot of context to identify what's relevant, instead of having the working Codex waste precious context headroom on discovery in a session that's dedicated to doing the work.
On the web, press the "+" button next to the repo
I'll write a post when I finish Pine Town, but I don't know what I could say about Codex in it. I think a big issue is that I don't know what others don't know, as the way I use LLMs (obviously) feels natural to me. Here are some tips that you may or may not already know:
* Reset the context as often as you can. LLMs like short contexts, so when you reach a point where the information has converged into something (e.g. the LLM has done a lot of work and you want it to change one of the details), reset the context, summarize what you want, and continue.
* Give the LLM small tasks that are logically coherent. Don't give it large, sprawling, open-ended tasks, but also don't give it chunks so tiny that it doesn't know what they're for.
* Explain the problem in detail, and don't dictate a solution. The LLM, like a person, needs to know why it's doing what it's doing, and maybe it can recommend better solutions.
* Ask it to challenge you. If you try to shoehorn the LLM too much, it might go off the rails trying to satisfy an impossible request. I've had a few times where it did crazy things because I didn't realize the thing I was asking for wasn't actually possible with the way the project was set up.
That's what I can think of off the top of my head, but maybe I'll write a general "how to work with LLMs" post. I don't think there's anything specifically different about Codex, and there must be a million such posts already, so I don't know if anyone will find value in the above... For me, it Just Worked™, but maybe that's just because I stumbled upon some specific technique that most people don't use.
So I was going to write a commiseration and a screed about what a colossal UI failure this is, that you can so easily lose such work. But FWIW, before posting I searched to see if there are any extensions to address this. There are several for Chrome, but on Firefox I ended up trying "Textarea Cache", and sure enough if you close the page, and reopen it later, you can click the icon to recover your words.
I like that Codex commits using your identity as if it was your changes. And I like that you can interact with it directly from the PR as if it was a team member.
Like this icon tool by @simonw: https://tools.simonwillison.net/icon-editor
Or I had an idea for a learning tool for my kids:
1) take a picture of the word list from the study book, give it with a prompt to an LLM, which produces a JSON Anki-style card set from the words
2) a simple web UI for a basic spaced repetition model that can ingest the JSON generated in step 1
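The scheduling logic behind step 2 can be sketched in a few lines of Python. This is a simplified, SM-2-flavored scheduler, not the exact algorithm the parent used; the JSON field names (`front`/`back`) are also my assumption about what step 1 would emit:

```python
import json
from dataclasses import dataclass

@dataclass
class Card:
    front: str
    back: str
    interval: int = 1   # days until the next review
    ease: float = 2.5   # SM-2-style ease factor

    def review(self, quality: int) -> None:
        """Update scheduling after a review; quality runs 0 (forgot) to 5 (easy)."""
        if quality < 3:
            self.interval = 1  # lapse: start the card over
        else:
            self.interval = round(self.interval * self.ease)
            # easier recalls nudge the ease factor up, harder ones down
            self.ease = max(1.3, self.ease + 0.1 - (5 - quality) * 0.08)

def load_deck(raw_json: str) -> list[Card]:
    """Ingest the step-1 JSON: a list of {"front": ..., "back": ...} objects."""
    return [Card(c["front"], c["back"]) for c in json.loads(raw_json)]

deck = load_deck('[{"front": "cat", "back": "kissa"}]')
deck[0].review(5)  # an easy review pushes the card further out
```

A real UI would just sort cards by due date and show the earliest ones; the point is that the whole "spaced repetition model" is small enough to vibe-code in an evening.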
All this went from idea to MVP while we were watching the first Downton Abbey movie.
After the movie was over, I could come to my desktop, open Claude Code with the previous chat and "teleport" it to my local machine to test it.
https://cookbook.openai.com/examples/gpt-5-codex_prompting_g...
The "Codex" model requires different prompting for the best results. You may also find, depending on your task, that the standard non-Codex model works better.
Remote work has been a thing for more than a decade now. I always have the feeling that most of the people commenting on the web are new to the industry.
More than 10 years ago we had the same setup. We would say "deploy app_name" in the chat and it would just do that. With a VPN we worked as if we were in the office from anywhere in the world (though most people, to be realistic, just worked from home).
Needing a web-based IDE seems like a step backwards. You are already connected to the internet; any IDE will have access to all the needed services through an internet connection.
Our world is becoming more and more fragile as corporations look to concentrate all services in just one place. I do not see a good ending to all this.
creating container -> cloning repo -> making change -> test -> send PR
is too slow of a loop for me to do anything much useful. It's only good for trivial "one-shot" stuff.
Codex is for when you want to one-shot something and have the specs ready. It just keeps puttering away without giving much feedback (the VS Code version especially is really quiet...).
Claude is more like a pair-programmer, you kinda need to watch what it does most of the time and it will tell you what it's doing (by default) and doesn't mind if you hit Esc and tell it to go another way.
Claude will Get Stuff Done.
Codex will find the subtle bugs and edge cases Claude left in its wake =)
1. claude code CLI, generally works, great tool use
2. codex on the web, feels REALLY smart, but can’t use tools
3. codex CLI, still smarter than claude but less situational awareness
4. codex via iphone app, buggier than the web app
5. claude code on the web, worst of all worlds
Gemini is really good at convincing you it knows what you're talking about. Sadly it hallucinates, and it does this confidently. You end up thinking "well, it confirmed x is greppable in y," but in reality it never ran grep.
Only a closed set of languages is supported, and the hook for installing additional software at startup seems not to be fully functional at the moment.
Is the 1.5 years that I have left worth it? (I already have an Associate's Degree).
1. The degree is useful. Having a Bachelor's opens up a lot of career paths because it shows that you committed to the Data Analytics program for four years. It also helps HR check off the "has a bachelor's" item on their list.
2. What you learn is useful. At the end of the day, you will be responsible for the code that the AI produces. How will you understand, explain, and justify your code to your colleagues and managers? "SQL, Python, JavaScript" and "theoretical Data Analytics knowledge" are both tools that will help you.
3. So far, senior engineers tend to have the most productivity boosts with AI. These engineers became "senior" before AI coding agents became mainstream, which means they know how to program. So based on this pattern, if you know how to program, then you will benefit more from AI.
Maybe you have other factors you are considering (e.g. money). My response is primarily based on the "existence of AI coding agents in the industry" factor.
I think what you say makes sense. There are times when you hear advice and you just know it's true and on point, and that's exactly what I saw in your words of advice.
I'm going to stick it out and just finish. Aside from career, it is also helping me with random side interests that I have like making my house smart, setting up media servers, creating my own Raspberry Pi surveillance system, automating work tasks. So like you said, the things I'm learning are useful in and of themselves.
Thanks a bunch, friend! You made a real difference!
I love the feature set of Claude Code and my entire workflow has been fine-tuned around it, but I had to switch to Codex this month. Hopefully the Claude Code team slows down and spends some time focusing on bugs.
Everything Anthropic does from an engineering standpoint is bad, they're a decent research lab and that's it.
This may be true, but then I wonder why it is still the case that no other agentic coding tool comes close to Claude Code.
Take Gemini Pro: excellent model let down by a horrible Gemini CLI. Why are the major AI companies not investing heavily in tooling? So far all the efforts I've seen from them are laughable. Every few weeks there is an announcement of a new tool, I go to try it, and soon drop it.
It seems to me that the current models are as good as they are going to be for a long time, and a lot of the value to be had from LLMs going forward lies in the tooling.
Claude is a very good model for "vibe coding" and content creation. It's got a highly collapsed distribution that causes it to produce good output with poor prompts. The problem is that collapsed distribution means it also tends to disobey more detailed prompts, and it also has a hard time with stuff that's slightly off manifold. Think of it like the car that test drives great but has no end of problems under atypical circumstances. It's also a naturally very agentic, autonomous model, so it does well in low information scenarios where it has to discover task details.
They're running an offer for 9€/quarter for the model, and the results are promising.
I'd like to build an integration with Whisper Memos (https://whispermemos.com/)
Then I'd be able to dictate a note on my Apple Watch such as:
> Go into repository X and look at the screen Y, and fix bug Z.
That'd be so cool.
That specific part doesn't have anything to do with Claude Web though, does it? When I use Codex and Claude they repeatedly look up stuff in the local git history when working on things I've mentioned I've worked on a branch or similar. As long as you make any sort of mention that you've used git, directly or indirectly, they'll go looking for it, is my feeling.
- Time to start your container (or past project) is ~1 sec to 1 min.
- Fully supported NixOS container with an isolated, cloned agent layer. Most tools are available locally to cut download times and AI web-access risk.
- GitHub connections are persistent. Agents do a reasonable job with clean local commits.
- Very fast dev loops (plan/build/test/architect/fix/test/document/git commit/push to user layer) with adjustable user involvement.
- Phone app is fully featured... I've never built apps on road trips before Replit.
- Uses Claude Code currently (has used ChatGPT in the past).

Tips:
- Consider tig to help manage git from the CLI before you push to GitHub.
- GitLab can be connected but is clumsy, with occasional server state refreshes.
- Startups that haven't committed to an IDE yet and expect compatibility with NixOS would have strong reason to consider this. It should save them the need to build their own OS-local AI code through early builds.
Also, future IDEs will have prompts and interfaces to better manage the LLM.
Interested to give this a go. But I would also need it to be able to run docker compose and playwright, to keep things on the rails.
It is a different thing, in a sense, because you can install command line tools that far surpass the Claude client’s tooling. Pandoc, curl, imagemagick, etc. Without these tools, CC will often write ad hoc scripts. The tools you have installed (provided you tell it) will always be better and more efficient.
I think among these tools we're gonna see the equivalents of Vim, Emacs, JetBrains, VS Code. For now CC web seems to be the Sublime Text of that world, and Terragon/Sculptor have yet to differentiate enough to be a JetBrains.
We try to be the JetBrains of this, which is not a smart move for a bigger co like Anthropic to take.
Codex handles this much better. You choose when to make a PR, and you can also just copy a .patch to your clipboard and `git apply` it.
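The patch route is just plain git. A minimal sketch of the workflow (the repo, branch, and file names here are made up for illustration): export the agent's branch as a patch, dry-run it, then apply it to your working tree.

```shell
# Throwaway repo standing in for your project
cd "$(mktemp -d)"
git init -q -b main demo && cd demo
git config user.email you@example.com
git config user.name you
echo 'hello' > app.txt
git add app.txt && git commit -qm 'initial commit'

# Pretend the agent did its work on this branch
git switch -qc codex/fix-bug
echo 'world' >> app.txt
git commit -qam 'agent change'

# Back on main: export the branch diff as a patch and apply it manually
git switch -q main
git diff main...codex/fix-bug > change.patch  # diff from the merge base
git apply --check change.patch                # dry run: confirm it applies cleanly
git apply change.patch                        # apply to the working tree, no commit
```

The nice part is that `git apply` leaves committing (and PR creation) as an explicit manual step, which is exactly the complaint about the auto-created branch.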
EDIT. They might have fixed this. Just testing. Does the Android mobile app have Claude Code support yet, or is it still annoyingly iOS-only?
EDIT2. It creates a public branch but not a PR. I'd still prefer that was a manual step.