The overall approach I now have for medium-sized tasks is roughly as follows (a rough sketch in code appears after the list):
- Ask the agent to research the particular area of the codebase relevant to the task at hand, listing all relevant/important files and functions, and putting all of this in a "research.md" markdown file.
- Clear the context window
- Ask the agent to put together a project plan, informed by the previously generated markdown file. Store that project plan in a new "project.md" markdown file. Depending on complexity I'll generally do multiple revs of this.
- Clear the context window
- Ask the agent to create a step-by-step implementation plan, leveraging the previously generated research & project files, and put that in a plan.md file.
- Clear the context window
- While there are unfinished steps in plan.md:
-- While the current step needs more work
--- Ask the agent to work on the current step
--- Clear the context window
--- Ask the agent to review the changes
--- Clear the context window
-- Ask the agent to update the plan with their changes and make a commit
-- Clear the context window
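A rough sketch of that loop in code, assuming a hypothetical `run_agent` helper that starts each invocation in a fresh context window; the prompts, stubs, and checkbox convention are all illustrative:

```python
# Rough sketch of the phased workflow above. run_agent() is a hypothetical stub;
# in practice each call would shell out to your coding agent non-interactively,
# which also gives you the "clear the context window" step for free.
from pathlib import Path

def run_agent(prompt: str) -> str:
    print(f"[agent] {prompt}")          # stub: replace with a real agent invocation
    return f"(output for: {prompt})"

def phase(prompt: str, output_file: str) -> None:
    # Each phase runs in a clean context and serializes its result to markdown.
    Path(output_file).write_text(run_agent(prompt))

phase("Research the parts of the codebase relevant to the task; list key files and functions.", "research.md")
phase("Using research.md, draft a project plan.", "project.md")
phase("Using research.md and project.md, write a step-by-step implementation plan.", "plan.md")

def plan_has_unfinished_steps() -> bool:
    return "[ ]" in Path("plan.md").read_text()   # e.g. unchecked markdown checkboxes

def current_step_needs_work() -> bool:
    return False                                  # stub: ask the review agent instead

while plan_has_unfinished_steps():
    while current_step_needs_work():
        run_agent("Work on the current step in plan.md.")    # then clear context
        run_agent("Review the changes against plan.md.")     # then clear context
    run_agent("Update plan.md for the finished step and make a commit.")
```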
I also recommend having specialized sub-agents for each of those phases (research, architecture, planning, implementation, review). Less in terms of telling the agent what to do, and more as a way to add guardrails and structure to the way they synthesize/serialize back to markdown.
I actually think it works better that way: the agent doesn't have to spend as much time rereading code it has just read. I do have several "agents" like you mention, but I just use them one by one in the same chat so they share context. They all write to markdown in case I do want to start fresh if things go in the wrong direction, but that doesn't happen very often.
When you run llama.cpp on your home computer, it holds onto the key-value cache from previous runs in memory. Presumably Claude does something analogous, though on a much larger scale. Maybe Claude holds onto that key-value cache indefinitely, but my naive expectation would be that it only holds onto it for however long it expects you to keep the context going. If you walk away from your computer and resume the context the next day, I'd expect Claude to re-read your entire context all over again.
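For what it's worth, Anthropic's public API does expose explicit prompt caching, which at least hints at the mechanism; a minimal sketch with the `anthropic` Python SDK, where the model id and file name are illustrative (cached prefixes expire after a short TTL, minutes rather than days):

```python
# Sketch only: Anthropic's Messages API lets you mark a stable prefix as cacheable.
# The model id and file name below are placeholders.
import anthropic
from pathlib import Path

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

big_stable_context = Path("big_stable_context.md").read_text()

response = client.messages.create(
    model="claude-sonnet-4-20250514",    # illustrative model id
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": big_stable_context,
        # Later calls that reuse this exact prefix can hit the server-side cache
        # instead of re-reading (re-prefilling) the whole thing.
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "Summarize the open questions."}],
)

# On a follow-up call, usage.cache_read_input_tokens > 0 indicates a cache hit.
print(response.usage)
```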
At best, you're getting some performance benefit from keeping this context going, but you are subjecting yourself to context rot.
Someone familiar with running Claude or industrial-strength SOTA models might have more insight.
That said, the fact that we're all curating these random bits of "LLM whisperer" lore is... concerning. The product is at the same time amazingly good and terribly bad.
As someone who definitely doesn’t know what they’re talking about, I’m going to guess that some analogous optimizations might apply to Claude.
Something something… TPU slice cache locality… gestures vaguely
Not trolling, true question.
Also, while Claude Code is crunching floats, you can do other things (maybe direct another agent instance).
The author literally talks about managing a team of multiple agents, and LLM services' requirement to purchase "tokens" is similar to popping a token into an arcade machine.
"Hacker culture never took root in the AI gold rush because the LLM 'coders' saw themselves not as hackers and explorers, but as temporarily understaffed middle-managers"
Also hacking really doesn’t have anything to do with generating poorly structured documents that compile into some sort of visual mess that needs fixing. Hacking is the analysis and circumvention of systems. Sometimes when hacking we piece together some shitty code to accomplish a circumvention task, but rarely is the code representative of the entire hack. Llms just make steps of a hack quicker to complete. At a steep cost.
My workflow for any IDE, including Visual Studio 2022 w/ Copilot, JetBrains AI, and now Zed w/ Claude Code baked in, is to start a new convo altogether when I'm doing something different, or changing up my initial instructions. It works way better. People are used to keeping a window until the model loses its mind on apps like ChatGPT, but for code, the context window gets packed a lot sooner (remember the tools are sending some code over too), so you need to start over or it starts getting confused much sooner.
I didn't mention it in the blog post but actually experimented a bit with using Claude Code to create specialized agents such as an expert-in-Figma-and-frontend "Design Engineer", but in general found the results worse than just using Claude Code as-is. This also could be a prompting issue though and it was my first attempt at creating my own agents, so likely a lot of room to learn and improve.
Like, I'm sorry, but when I see how much work the advocates are putting into their prompts, the METR paper comes to mind... you're doing more work than coding the "old-fashioned way".
If there's adequate test coverage, and the tests emit informative failures, coding agents can be used as constraint solvers to iterate and make changes, provided you stage your prompts properly, much like staging PRs.
Claude Code is really good at this.
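A tiny illustration of what "informative failures" can look like, in Python/pytest; `apply_discount` and the messages are hypothetical, but the point is that the failure output states the violated constraint instead of a bare "assert failed":

```python
# Hypothetical example: the failure message tells the agent exactly which
# constraint it violated, so it can iterate without guessing.
import pytest
from pricing import apply_discount  # hypothetical module under test

@pytest.mark.parametrize(
    "subtotal, code, expected",
    [
        (100.00, "SAVE10", 90.00),   # 10% off
        (100.00, None, 100.00),      # no code, no discount
        (10.00, "SAVE10", 9.00),
    ],
)
def test_apply_discount(subtotal, code, expected):
    result = apply_discount(subtotal, code)
    assert result == pytest.approx(expected), (
        f"apply_discount({subtotal!r}, {code!r}) returned {result!r}, expected {expected!r}; "
        "discounts apply to the pre-tax subtotal and must never increase the total"
    )
```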
It's highly likely that if you're working with one of the commercial models tuned for code tasks, on one of the commercial platforms marketed to SWEs, instructions to the effect of "you're an expert/experienced engineer" will already be part of the context window.
What does work is to provide clues for the agent to impersonate a clueless idiot on a subject, or a bad writer. It will at least sound like it in the responses.
Those models have been heavily trained with RLHF; if anything, today's LLMs are even more likely to throw out authoritative-sounding predictions, if not in accuracy, then at least in tone.
https://github.com/0xeb/TheBigPromptLibrary/tree/main/System...
Every time I read about people using AI I come away with one question. What if they spent hours with a pen and paper and brainstormed about their idea, and then turned it into an actual plan, and then did the plan? At the very least you wouldn't waste hours of your life and instead enjoy using your own powers of thought.
OP here - I am a bit confused by this response. What are you trying to say or suggest here?
It's not like I didn't have a plan when making changes; I did, and when things went wrong, I tried to debug.
That said, if what you mean by having a plan (which again, I might not be understanding!) is to write myself a product spec and then go build the site by learning to code or using a no/low-code tool, I think that would arguably have been far less efficient and achieved a less ideal outcome.
In this case, I had Figma designs (from our product designer) that I wanted to implement, but I don't have the programming experience or knowledge of Remix as a framework to have been able to "just do it" on my own in a reasonable amount of time without pairing with Claude.
So while I had some frustrating hours of debugging, I still think overall I achieved an outcome (being able to build a site based on a detailed Figma design by pairing with Claude) that I would never have been able to achieve otherwise to that quality bar in that little amount of time.
I find my first branch more and more being `ask claude`. Having to actually think up organic solutions feels more and more annoying.
I’d rather put hours in figuring out what works and what doesn’t to get more value out of my future use.
Embrace that you aren't learning anything useful. Everything you are learning will be redundant in a year's time. Advice on how to make AI effective from 1 year ago is gibberish today. Today you've got special keywords like ultrathink, or advice on when to compact context, that will be gibberish in a year.
Use it, enjoy experimenting and seeing the potential! But no FOMO! There's a point when you need to realize it's not good enough yet, use the few useful bits, put the rest down, and get on with real work again.
Why would I have FOMO? I am literally not missing out.
> All you are doing is learning to deal with the very flaws that have to be fixed for it to be worth anything.
No it is already worth something.
> Embrace that you aren't learning anything useful
No, I am learning useful things.
> There's a point when you need to realize it's not good enough yet
No, it’s good enough already.
Interesting perspective I guess.
If it takes you hours to figure out what's working and what's not, then it isn't good enough. It should just work or it should be obvious when it won't work.
It’s just that you don’t like AI lol.
When LLMs ever reach that point I'll certainly hear about it and gladly use them. In the meantime I let the enthusiasts sort out the problems and glitches first.
> And my expectation on tools is that they help me
LLMs do this for me. You just don’t seem to get the same benefit that I do.
> and not make things more complicated than they already are.
LLMs do not do this for me. Things are already complicated. Just because they’re still complicated with LLMs does not mean LLMs are bad.
> When LLMs ever reach that point I'll certainly hear about it and gladly use them
You are hearing about it now. You’re just not listening because you don’t like LLMs.
I need substance and clear explanations of models, methodology, and concepts, with some visual support. Screenshots of the product are great, but a quick reel or two showing different examples or scenarios may be better.
I'm also skeptical many people who are already technical and already using AI tools will now want to use YOUR tool to conduct simulation based testing instead of creating their own. The deeper and more complex the simulation, the less likely your tool can adapt to specific business models and their core logic.
This is part of the irony of AI and YC startups: LOTS of people creating these interesting pieces of software with AI, when part of the huge moat that AI provides is being able to more quickly create your own software. As it evolves, the SaaS model may face serious trouble except for the most valuable (e.g. complex and/or highly scalable) solutions already available with good value.
However simulations ARE important and they can take a ton of time to develop or get right, so I would agree this could be an interesting market if people give it a chance and it's well designed to support different stacks and business logic scenarios.
> If your ICP is technical, the frontend and marketing shouldn't be overdone IMO.
Great point. The ICP is technical, so this is certainly valid.
> I need substance and clear explanations of models, methodology, and concepts, with some visual support. Screenshots of the product are great, but a quick reel or two showing different examples or scenarios may be better.
We're working hard to get to something folks can try out more easily (hopefully one day Show HN-worthy) and better documentation to go with it. We don't have it yet unfortunately, which is why the site is what it is (for now).
>I'm also skeptical many people who are already technical and already using AI tools will now want to use YOUR tool to conduct simulation based testing instead of creating their own.
Ironically, we'd first assumed simulations would be easy to generate with AI (that's part of why we attempted to do this!), but 18+ months of R&D later, it's turned out to be very challenging to do, never mind replicate.
I do think AI will continue to make building SaaS easier but I think there are certain complex products, simulations included (although we'll see), that are just too difficult to build yourself in most cases.
To some extent, as I think about this, I suppose build vs. buy has always been a question for SaaS, and it's a matter of cost versus effort (and what else you could do with that effort). E.g. do you architect your own database solution or just use Supabase?
> However simulations ARE important and they can take a ton of time to develop or get right, so I would agree this could be an interesting market if people give it a chance and it's well designed to support different stacks and business logic scenarios.
I appreciate this, and it's certainly been our experience! We're still working to get it right, but it's something I'm quite excited about.
> Still, I wouldn’t trust Claude, or any AI agent, to touch production code without close human oversight.
My experience has been similar, and it's why I prefer to keep LLMs separate from my code base. It may take longer than providing direct access, but I find it leads to fewer hidden/obscure bugs that can take hours (and result in a lot of frustration) to fix.
I'm curious how you're managing this - is it primarily by inputting code snippets or abstract context into something like Claude or ChatGPT?
I found that I was usually bad at providing sufficient context when trying to work with the LLM separately from the codebase, though I also might lack the technical background or the appropriate workflow.
I usually provide the initial context by describing the app that I'm working on (language, framework, etc) as well as the feature I want to build, and then add the files (either snippets or upload) that are relevant to build the feature (any includes or other files it will be integrating with).
This keeps the chat context focused, and the LLM still has access to the code it needs to build out the feature without having access to the full code base. If it needs more context (sometimes I'll ask the LLM if it wants access to other files), I'll provide additional code until it feels like it has enough to work with to provide a solution.
It's a little tedious, but once I have the context set up, it works well to provide solutions that are (mostly) bug free and integrate well with the rest of my code.
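If the copy-paste part gets too tedious, the assembly step is easy to script; a minimal sketch, where the preamble and file list are placeholders for your own project:

```python
# Minimal sketch: bundle a hand-picked set of files plus a short app/feature
# description into one paste-ready context block. Paths and preamble are placeholders.
from pathlib import Path

PREAMBLE = "PHP 8 / vanilla JS app; I want to add CSV export to the reports page."
FILES = [
    "src/controllers/ReportsController.php",
    "src/models/Report.php",
]

def build_context(preamble: str, files: list[str]) -> str:
    parts = [preamble]
    for name in files:
        parts.append(f"\n--- {name} ---\n{Path(name).read_text()}")
    return "\n".join(parts)

if __name__ == "__main__":
    print(build_context(PREAMBLE, FILES))  # pipe into your clipboard, e.g. `| pbcopy` on macOS
```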
I primarily work with Perplexity Pro so that I have access to and can switch between all pro level models (Claude, ChatGPT, Grok, etc) plus Google search results for the most up-to-date information.
I haven’t used Perplexity (Pro or otherwise) much at all yet but will have to try.
It indexes files in your repo, but you can control which specific files to include when prompting and keep it very limited/controlled.
However, I do applaud you for being transparent about the AI use by posting it here.
I've never liked the free-tier Claude (Sonnet/Opus) chat sessions I've attempted with code snippets. Claude non-coding chat sessions were good, but I didn't detect anything magical about the model and the code it churned out for me to decide on a Claude Max plan. Nor did Cursor (I'm also a customer), with its partial use of Claude, seem that great. Maybe the magic is mostly in CC the agent...
So, I've been using a modified CC [1] with a modified claude-code-router [2] (on my own server), which exposes an Anthropic endpoint, and a Cerebras Coder account with qwen-3-coder-480b. No doubt Claude models + CC are well greased together, but I think the folks on the Qwen team trained (distilled?) a coding model that is Sonnet-inspired, so maybe that's the reason. I don't know. But the sheer 5x-10x inference speed of Cerebras makes up for any loss in quality versus Sonnet, or from the FP8 quantization of Qwen on the Cerebras side. If starting from zero every few agentic steps is the strategy to use, doing that with Cerebras is just incredible because it's ~instantaneous.
I've tried my Cerebras Coder account with way too many coding agents, and for now CC, Cline (VS Code) and Qwen Code (a Gemini Code fork) are the ones that work best. CC beats the pack as it compresses the context just right and recovers well from Cerebras 429 errors (tpm limit), due to the speed (hitting ~1500 tps typically) clashing with Cerebras' unreasonably tight request limits. When a 429 comes through, CC just holds its breath a few seconds and then goes at it again. Great experience overall!
[1] I've decompiled CC and modified some constants for Cerebras to fix some hiccups
[2] I had to remove some invalid request JSON keys sent by CC through CCR, and add others that were missing
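If anyone wants to replicate this, a quick way to sanity-check an Anthropic-compatible proxy before pointing a coding agent at it is to hit it with the standard `anthropic` Python SDK; the URL, key, and model name below are placeholders for wherever your router is listening:

```python
# Sketch: smoke-test an Anthropic-compatible endpoint (e.g. one exposed by
# claude-code-router). base_url, api_key, and model are placeholders.
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:3456",     # wherever the router is listening
    api_key="not-checked-by-local-proxy",
)

resp = client.messages.create(
    model="qwen-3-coder-480b",            # whatever model your router maps to
    max_tokens=64,
    messages=[{"role": "user", "content": "Reply with the single word: ok"}],
)
print(resp.content[0].text)
```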
> for now CC, Cline (VS Code) and Qwen Code (a Gemini Code fork) are the ones that work best
Thanks for sharing how you set this up, as well as which agents you've found work best.
I tried a handful before settling on CC (for now!) but there are so many new ones popping up and existing ones seem to be rapidly changing. I also had a good experience with Cline in VS Code, but not quite as good as CC.
Haven't tried Qwen Code yet (I tried the Gemini CLI but had issues with the usability; the content would frequently strobe while processing, which was a headache to look at).
Yeah, it's definitely a coding agent zoo out there. But you can actually notice the polish in CC. Codex looks promising, with a more bare-bones look; I hope they invest in it. It's OSS and built in Rust, nice.
Qwen has that same jumpy scrolling as Gemini, and too many boxes, but it works well with Cerebras.
Coding agent product managers out there: stop putting boxes around text! There's no need! In Gemini, the boxes are all of different sizes, and it's really ugly to look at. Think about copy-paste: multiline selections get all messed up with vertical box lines. Argh! Codex, which only has a delicate colored border to the left, has a ctrl-t shortcut that should be mandatory in TUIs: transcript mode, a ready-to-copy-paste printout that's fully togglable.
Another area of improvement is how fast and usable the tooling can be. File read/write and patching can really make a difference. Also using the model for different stages of tool calling in parallel, especially if they all get faster like Cerebras. And better compression algos!
The speed and usability points you make are so critical. I'm sure these will continue to improve - and hope they do so soon!
Also, it had some cutoffs with Cerebras - every once in a while it would get a reply and then nothing happens; it just stops there. I think 4xx errors weren't handled well either. The same happens with Codex with a Cerebras provider. Unfortunately there isn't a compelling reason for me to debug that, although I like that Codex is now Rust and OSS, much more fun than decompiling Claude for sure.
That said, I liked that it has sessions, undo, and Plan vs. Code modes ("build", I think it was called); although that's already a non-pattern for most coding agents, it allows me to have, say, an OpenAI API o3 or gpt-5 do some paid planning. But even that is not needed with something like Cerebras that just shoots code out of its butt like there's no tomorrow. Just rinse and repeat until it gets it right.
EDIT: just recalled another thing about opencode that messed me up: not exiting on ctrl-d blows up my mental Unix cognition.
This seems to be a bad practice LLMs have internalized; there should be some indication that there’s more content below the fold. Either a little bit of the next section peeking up, or a little down arrow control.
I vibe coded a marketing website and hit the same issue.
Here’s a decent writeup on the problem and some design options: https://uxdesign.cc/dear-web-designer-let-s-stop-breaking-th...
The scroll bar behavior is an OS-level setting. The default in macOS is to not show scroll bars unless you're actively scrolling.
If you’re using an input device that doesn’t support gestures, or you changed the setting, you’ll see them always.
> "Approved - Ship It!” and ‘Great work on this!”
This pat on the head from an algorithm gives me the creeps, and I'm really struggling to put my finger on why.
Maybe it's because it's emulated approval, yet generating real feelings of pleasure in the author?
I've come to just terminate the session if a phrase like that turns up.
Tbf to Nadia, however, the comment supposedly came from the "code reviewer" agent? So the prompt might've explicitly asked it to make this statement, and it would (hopefully) not be reusing the context of the development (and not the other way around either).
I feel like Claude specifically uses language in what seems like a more playful way. I notice this also when instead of just a loader there are things like “Baking… Percolating…” etc.
I do get the ick if it feels like the agent is trying to be too “human” but in this case I just thought it was funny language for a response to my ask specifically to play a PR reviewer role.
Claude in reviewer mode also had a funny (at least to me) comment about it being interesting that Claude put itself as a PR contributor. I think it's in the screenshot in my blog post (if it got cut off for others, let me know and I can fix it), but it's not called out in the text.
"Good boy! Good PR!"
I've heard increasingly good things about Cursor and Codex, but haven't tried them as recently. Cline (as a VS Code extension) might also be helpful here.
If you need designs, something like v0 could work well. There are a ton of alternatives (Base44, Figma Make, etc.) but I've found v0 works the best personally, although it probably takes a bit of trial and error.
For SEO support specifically, I might just try asking some of the existing AI tooling to help you optimize there, although I'm not sure how good the results would be. I briefly experimented with this and early results seemed promising, but I didn't push on it a lot.
I run Dropbox on my laptop almost entirely as insurance against my laptop breaking or getting stolen before I've committed and pushed my work to git.
If for some hypothetical reason we still were in the era of tarballs, I doubt they'd be as useful.
(also yeah, I have iCloud pretty much for the same reason)
It's been a bit since I tried Cursor and I may need to revisit that as well.
I am converting a WordPress site to a much leaner custom one, including the functionality of all plugins and migrating all the data. I've put in about 20 hours or so and I'd be shocked if I have another 20 hours to go. What I have so far looks and operates better than the original (according to the owner). It's much faster and has more features.
The original site took more than 10 people to build, and many months to get up and running. I will have it up single-handedly inside of 1 month, and it will have much faster load times and many more features. The site makes enough money to fully support 2 families in the USA very well.
My Stack: Old school LAMP. PHPstorm locally. No frameworks. Vanilla JS.
Original process: webchat-based since Sonnet 3.5 came out; I used Gemini a lot after 2.5 Pro came out, but primarily Sonnet.
- Use Claude projects for "features". Give it only the files strictly required to do the specific thing I'm working on.
- Have it read the files closely, "think hard", and make a plan.
- Then write the code.
- MINOR iteration if needed. Sometimes bounce it off of Gemini first.
- The trick was to "know when to stop" using the LLM and just get to coding.
- Copy code into PHPStorm and edit/commit as needed.
- Repeat for every feature (refresh the Claude project each time).
Evolution: Finally take the CLI plunge: Claude Code.
- Spin up a KVM: I'm not taking any chances.
- Run PHPStorm + CC in the KVM as a "contract developer".
- The "KVM developer" cannot push to main.
- Set up claude.md carefully.
- Carefully prompt it with structure, bounds, and instructions.
- Run into lots of quirks with lots of little "fixes":
-- Too verbose.
-- Does not respect "my coding style".
-- Poor adherence to claude.md instructions when over halfway through context, etc.
- Start looking into subagents. It feels like it's not really working?
- Instead: I break my site into logical "features":
-- Terminal Tab 1: "You may only work in X folder"
-- Terminal Tab 2: "You may only work in Y folder"
-- THIS WORKS WELL. I am finally in "HOLY MOLY, I am now unquestionably more productive" territory!
Codex model comes out:
- I open another tab and try it.
- I use it until I hit the "You've reached your limit. Wait 3 hours" message.
- I go back to Claude (Man is this SLOW! and Verbose!). Minor irritation.
- Go back to Codex until I hit my weekly limit.
- Go back to Claude again. "Oh wow, Codex works SO MUCH BETTER for me."
- I actually haven't fussed with the AGENTS.md, nor do I give it a bunch of extra hand-holding. It just works really well by itself.
- Buy the OpenAI Pro plan and haven't looked back.
I haven't "coded" much since switching to Codex and couldn't be happier. I just say "Do this" and it does it. Then I say "Change this" and it does it. On the rare occasions it takes a wrong turn, I simply add a coding comment like "Create a new method that does X and use that instead" and we're right back on track.
We are 100% at a point where people can just "Tell the computer what you want in a web page, and it will work".
And I am SOOOO Excited to see what's next.
I await the good software. Where is the good software?
> I await the good software. Where is the good software?
Exactly this. It looks great on the surface until you dig in and find it using BlinkMacSystemFont and absolute positioning because it can't handle a proper grid layout.
You argue with it and it adds !important everywhere because the concept of cascading style is too much for its context window.
Someone once quipped that AI is like a college kid who has studied a few programming courses, has access to all of Stack Overflow, lives in a world where hours go by in the blink of an eye, has an IQ of 80, and is utterly incapable of learning.
Oh, also, when it broke down and I tried to restart (the data model rewrite) using a context summary, it started going backwards and migrating back to the old data model because it couldn't tell which one was which... sigh.
What languages do you use?
What kind of projects?
Do you maintain these projects or is this for greenfield development?
Could you fix any bugs without Claude?
Are these projects tested, and who writes the tests? If it's Claude, how do you know these tests actually test something sensible?
Is anybody using these projects and what do users think of using these projects?
- HTML, JavaScript, Python, PHP, Rust
What kind of projects?
- Web apps (consumer and enterprise), games
Do you maintain these projects or is this for greenfield development?
- Both, I have my own projects and client projects
Could you fix any bugs without Claude?
- Yes, I have decades of software development experience
Are these projects tested, and who writes the tests? If it's Claude, how do you know these tests actually test something sensible?
- For serious projects yes, I will define the test cases and have Claude build them out along with any additional cases it identifies. I use planning mode heavily before any code gets written.
Is anybody using these projects and what do users think of using these projects?
- Yes, these are real projects in production with real users :) They love them
If you want to keep features in separate Git worktrees, https://conductor.build/ is pretty nice.
> Since our landing page is isolated from core product code, the risk was minimal.
The real question to ask is why your landing page is so complex; it is a very standard landing page with sign-ups, pretty graphics, and links to the main bits of the website, not anything connected to a demo instance of your product or anything truly interactive.
Also, you claim this saved you from having to hire another engineer, but you then reference human feedback catching the LLM garbage being generated in the repo. It sounds like the credit is appropriately shared between yourself, the LLM, and especially the developer who shepherded this behind the scenes.
That said, I was working on implementing a redesign for my startup's website as the project for the experiment - there's no way around that as context.
> The real question to ask is why your landing page is so complex
I disagree on this; I don't think that was an issue. Our landing page would have been very easy for a developer on our team to build, that was never a question.
That said, we're a small startup team with myself, my cofounder / CTO, one engineer, and a design contractor. The two technical folks (my cofounder / CTO and the engineer) are focused on building our core product for the most part. I absolutely agree credit is due to them both for their work!
For this project, they helped me review a couple of my bigger PRs and also helped me navigate our CI/CD, testing, and build processes. I believe I mentioned their help in my blog post explicitly, but if it wasn't clear enough definitely let me know.
My goal in attempting this project was in no way to belittle the effort of actual developers or engineers on our team, whom I highly respect and admire. Instead, it was to share an experiment and my learnings as I tried to tackle our website redesign which otherwise would not have been prioritized.
Also loved how, in CTO mode, it went right away to "approve with minor comments" in the code review. This is too perfectly in character.