Instead of doing what the author does here, sending messages back and forth so the conversation grows longer and each reply gets worse until the LLM seems like a dumb rock, rewrite your initial message to cover everything that went wrong or was misunderstood, and aim to have whatever you want solved in the first message; you'll get much higher quality answers. If the LLM misunderstood, don't reply "No, what I meant was..." but instead rewrite the first message so it's clearer.
This is true at least for the ChatGPT, Claude and DeepSeek models; YMMV with other models.
Inasmuch as these are collaborative document generators at their core, "minimally ambiguous prompt and conforming reply" is a strongly represented document structure and so we benefit by setting them up to complete one.
Likewise, "tragi-comic dialog between increasingly frustrated instructor and bumbling pupil" is also a widely represented document structure that we benefit by trying to avoid.
Chatbot training works to minimize the chance of an LLM engaging in the latter, because dialog is an intuitive interface that users enjoy, but we can avoid the problem more successfully by just providing a new and less ambiguous prompt in a new session, as you suggest.
Do people enjoy chat interfaces in their workflows?
I always thought that cursor/copilot/copy.ai/v0.dev were so popular because they break away from the chat UI.
Dialog is cool when exploring but, imo, really painful when trying to accomplish a task. LLMs are far too slow to make a real fluid conversation.
I'm not 100% sure either, I think it might just be a first-iteration UX that is generally useful, but not specifically useful for use cases like coding.
To kind of work around this, I generally keep my prompts as .md files on disk and treat them like templates, with variables like $SRC that get replaced with the actual code when I "compile" them. So I write a prompt, paste it into ChatGPT, notice something is wrong, edit my template on disk, then paste it into a new conversation. Iterate until it works. I ended up putting the CLI I use for this here, in case others wanna try the same approach: https://github.com/victorb/prompta
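If anyone wants the gist without installing the CLI, a minimal sketch of the template-substitution idea could look like the following (the file names and the single $SRC variable are just examples for illustration, not how prompta actually works):

```python
# compile_prompt.py -- minimal sketch of the "prompt template" workflow
# (not prompta's actual implementation; file names are examples only).
import sys
from pathlib import Path
from string import Template

def compile_prompt(template_path: str, src_path: str) -> str:
    # Replace $SRC in the prompt template with the actual source code.
    template = Template(Path(template_path).read_text())
    return template.substitute(SRC=Path(src_path).read_text())

if __name__ == "__main__":
    # e.g. python compile_prompt.py refactor.md main.py | pbcopy
    print(compile_prompt(sys.argv[1], sys.argv[2]))
```

Iterating then means editing the .md template and re-running, rather than replying in the same chat.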
If one branch doesn't work out, you go back to the last node that gave good results (or to the top) and create another branch from there with a different prompt.
Or if you want to ask something in a different direction but don't want all the baggage from recent nodes.
Example: https://exoloom.io/trees
The reasonable alternative is a chat interface that lets you edit any text, the AI response or your prompts, and regenerate from any point. This is why I use the API "playground" interfaces or something like LibreChat. Deepseek at least has prompt editing/regeneration.
For coding, I'd agree. Seemingly people use LLMs for a lot more than that, though I don't have any experience with those uses myself. But I agree with the idea that we haven't found the right UX for programming with LLMs yet. I'm getting even worse results with Aider, Cursor and all of those than with my approach outlined above, so that doesn't seem like the right way either.
From personal experience, I agree with you, but I wouldn't make that critique here, as it is far from a magic bullet. Honestly, for the first examples it seems faster to learn Mermaid and implement them yourself. Mermaid can be learned in a rather short time; the basic syntax is fairly trivial and essentially obvious. As an added benefit, you then get to keep this knowledge and use it later on. This will certainly feel slower than the iterative back and forth with an LLM -- either through follow-up conversations or by refining your one shot -- but I'm not convinced it will be a huge difference in time as measured by the clock on the wall.[0]
[0] idk, going back and forth with an LLM and refining my initial messages feels slow to me. It reminds me of print statement debugging in a compiled language. Lots of empty time.
It doesn't seem like that to me. At one point in the article: "There are also a few issues [...] Let’s fix with the prompt", and then a prompt that refers to the previous message. Almost all prompts after that seem to depend on the context before them.
My point is that instead of doing that, revise the original message so the very first response from the LLM doesn't contain any errors, because (in my experience) that's way easier and faster than trying to correct errors by adding more messages, since they all (even O1 Pro) seem to lose track of what's important in the conversation really fast.
To be honest, this would help a lot with person-implemented iteration too, if it were biologically feasible to erase a conversation from a brain.
It has `log` and `rewind` commands that allow you to easily back up to any previous point in the conversation and start again from there with an updated prompt. Plandex also has branches, which can be helpful for not losing history when using this approach.
You’re right that it’s often a way to get superior results. Having mistakes or bad output in the conversation history tends to beget more mistakes and bad output, even if you are specifically directing the LLM to fix those things. Trial and error with a new prompt and clean context avoids this problem.
P.S. I wrote a bit about the pros and cons of this approach vs. continuing to prompt iteratively in Plandex’s docs here: https://docs.plandex.ai/core-concepts/prompts#which-is-bette...
I guess you can do that too, as long as you start a new conversation afterwards. Personally I found it much easier to keep prompts in .md files on disk, paste them into the various interfaces when needed, and then iterate on my local files if I notice the first answer misunderstood or got something wrong. It also lets you compose prompts, which is useful if you deal with many different languages/technologies and so on.
Then we injected the generated Mermaid diagrams back into subsequent requests. Reasoning performance improved for a whole variety of applications.
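A minimal sketch of what that injection could look like (the model name, prompt wording, and use of the OpenAI client are my assumptions for illustration, not the original setup):

```python
# Sketch: feed a previously generated Mermaid diagram back into a follow-up request,
# so the model reasons over a compact structured summary instead of raw code.
from openai import OpenAI

client = OpenAI()

def ask_with_diagram(question: str, mermaid_diagram: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": "Use this architecture diagram as ground truth:\n"
                           f"```mermaid\n{mermaid_diagram}\n```",
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```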
Could you go into a bit more detail on how you encode the intent?
Sketching backed by automated cleanup can be good for entering small diagrams. There used to be an iOS app based on graphviz: http://instaviz.com
Constraint-based interactive layout may be underinvested, as a consequence of too many disappointments and false starts in the 1980s.
LLMs seem ill-suited to solving the optimization of combinatorial and geometric constraints and objectives required for good diagram layout. Overall, one has to admire the directness and simplicity of mermaid. Also, it would be great to someday see a practical tool with the quality and generality of the ultra-compact grid layout prototype from the Monash group, https://ialab.it.monash.edu/~dwyer/papers/gridlayout2015.pdf (2015!!)
>LLMs seem ill-suited to solving the optimization of combinatorial and geometric constraints and objectives required for good diagram layout.
I think this is where an LLM's distant NLP cousin can be of help, namely CUE, since it's fundamentally based on feature structures from the deterministic approach to NLP, unlike LLMs, which are stochastic NLP [1],[2],[3].
Based on the Monash paper, Constraint Programming (CP) is one of the popular approaches being used for automatic grid layout.
Since CUE is a constraint configuration language belonging to CP, its NLP background should make it easier and more seamless to integrate with LLMs. If someone can somehow crack this, it will be a new generation of LLM that can perform good and accurate diagramming via prompts, and it will be a boon for architects, designers and engineers. Speaking of engineers, if this approach can also be used for IC layout design (analog and digital), not only for diagrams, it could easily disrupt the multi-billion-dollar industry of very expensive software and manpower for IC design.
I hope I'm not getting ahead of myself, but ultimately this combo can probably solve the "holy grail" problem mentioned towards the end of the paper's conclusions, regarding a layout model that somehow incorporates routing in a way that is efficiently solvable to optimality. After all, some people in computer science consider CP the "holy grail" of programming [4].
Please, someone, somehow make a startup around this; or an existing YC startup like JITX (hi Patrick) could look into this potentially fruitful endeavor of a hybrid LLM combo for automated IC design [5].
Perhaps your random thoughts are not so random but deterministic non-random in nature, pardon the pun.
[1] Cue – A language for defining, generating, and validating data:
https://news.ycombinator.com/item?id=20847943
[2] Feature structure:
https://en.m.wikipedia.org/wiki/Feature_structure
[3] The Logic of CUE:
https://cuelang.org/docs/concept/the-logic-of-cue/
[4] Solving Combinatorial Optimization Problems with Constraint Programming and OscaR [video]:
https://m.youtube.com/watch?v=opXBR00z_QM
[5] JITX: Automatic circuit board design:
I thought that LLMs are great at compressing information, so I figured I'd put that to good use by compressing a large codebase into a single diagram. Since the entire codebase doesn't fit in the context window, I built a recursive LLM tool that calls itself.
It takes two params:

* current diagram state
* new files it needs to expand the diagram
The seed is an empty diagram and an entry point into the source code. I also extended it to complexity analysis.
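In rough pseudocode, the recursion looks something like this (names like `llm_expand` are hypothetical stand-ins, not the tool's actual API):

```python
def llm_expand(diagram: str, sources: dict[str, str]) -> tuple[str, list[str]]:
    """Hypothetical single LLM call: returns (updated Mermaid diagram, files to expand next)."""
    raise NotImplementedError("plug your LLM call in here")

def build_diagram(diagram: str, files_to_read: list[str], seen: set[str] | None = None) -> str:
    # Recursively grow the diagram until the LLM stops asking for new files.
    seen = seen or set()
    new_files = [f for f in files_to_read if f not in seen]
    if not new_files:
        return diagram
    seen.update(new_files)
    sources = {f: open(f).read() for f in new_files}
    diagram, next_files = llm_expand(diagram, sources)
    return build_diagram(diagram, next_files, seen)

# Seed: empty diagram plus the codebase entry point.
# final_diagram = build_diagram("", ["src/main.py"])
```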
It worked magically well. Here are a couple of diagrams it generated:

* https://gist.github.com/priyankc/27eb786e50e41c32d332390a42e...
* https://gist.github.com/priyankc/0ca04f09a32f6d91c6b42bd8b18...
If you are interested in trying out, I've blogged here: https://updates.priyank.ch/projects/2025/03/12/complexity-an...
Make sure it is allowed to think before doing (not necessarily in a dedicated thinking mode; it can be a regular prompt to design the graph before implementing it). Also make sure to state in the prompt who the graph is for (e.g. "a clean graph, suitable for a blog post for a technical audience").
I do like the idea of another commenter here who takes a photo of their whiteboard and instructs the AI tool to turn it into a structured diagram. That seems to be well within reach of these tools.
It also really depends on the printing.
depends on the prompting I guess :D
sorry
They have icons for common things like cloud services.
But it was good at arranging the elements in timeline order for example.
Interesting perspective but it’s a bit incomplete without a comparison of various models and how they perform.
Kind of like Simon Willison’s now-famous “pelican on a bicycle” test, these diagrams might be done better by some models than others.
Second, this presents a static picture of things, but AI moves really fast! It’d also be great to understand how this capability is improving over time.
https://www.heise.de/ratgeber/Prozessvisualisierung-mit-gene...
I also experimented with BPMN markup (XML) and realized there are already repos on GitHub that create BPMN diagrams from prompts.
You can also ask LLMs to create SVG.
I'm mainly speaking to the ability to read IaC code (probably of any library, but at least in my case: CDK, Pulumi, Terraform, CloudFormation, Serverless) and to infer architectural flow from it. It's really not conducive to that use case.
I could also, kidding/not kidding, be speaking to the range of abilities for "mid" and "senior" developers to know and convey such flows in diagrams.
But really my point is this feels like more validation that AI doesn't provide increased ability, it provides existing (and demonstrated) ability faster with less formalized context. The "less formalized context" is what distinguishes it from programs/code.
Rather than relying on end-user products like ChatGPT or Claude.ai, this article is based on the "pure" model offerings via API and frontends that build on these. While the Ilograph blog ponders "AI's ability to create generic diagrams", I'd conclude: do it, but avoid the "open" models and low-cost offerings.
Simon Willison has shown that current models aren't very good at creating an SVG of a pelican on a bicycle, but drawing a box diagram in SVG is a much simpler task.
Can't see it working without letting the model output an intermediary form in PlantUML or Mermaid or Dot - going straight to SVG is cramming too much work into too few tokens.
For the same reason, textual diagram formats are better for iterative work. SVG is too open-ended, and carries little to no semantics. Diagramming languages are all about semantics, have fewer degrees of freedom, and much less spurious token noise.
Aside from "try Inkscape", that sounds like a human problem not an LLM problem.
LLMs output what they input, and if diagrams in blog articles or docs are SVG, they merrily input SVG, and associate it with the adjacencies.
One might as well say MidJourney won't work because few ever make paintings using pixels. You're asking it to translate whatever you're asking about (e.g. scenes and names of painters you'd expect the model to know, like Escher or DaVinci), into some hidden imagined scene, render that as brush strokes of types of paint on textured media, and then generate a PNG out of it.
Absolutely do not "try Inkscape", unless you like your LLM choking on kilobytes of tokens it takes to describe the equivalent of "Alice -> Bob" in PlantUML. 'robjan is correct in comparing SVG to a compiled program binary, because that's what SVG effectively is.
Most SVG is made through graphics programs (or through conversion of other formats made in graphics programs), which add tons of low-level noise to the SVG structure (Inkscape, in particular). And $deity forbid you then minify / "clean up" the SVG for publication - this process strips what little semantic content is there (very little, like with every WYSIWYG tool), turning SVG into the programming equivalent of assembly opcodes.
All this means: too many degrees of freedom in the format, and a dearth of quality examples the model could be trained on. Like with the assembly of a compiled binary, an LLM can sort of reason about it, but it won't do a very good job, and it's a stupid idea in the first place.
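To make the verbosity gap concrete, here's a hand-written comparison (the SVG below is a heavily simplified stand-in; a real Inkscape export of the same arrow would be far noisier):

```python
plantuml = "Alice -> Bob: hello"

# Simplified, hand-written SVG for roughly the same arrow-with-label.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="300" height="80">'
    '<text x="10" y="20">Alice</text><text x="250" y="20">Bob</text>'
    '<defs><marker id="arrow" markerWidth="10" markerHeight="10" refX="6" refY="3" orient="auto">'
    '<path d="M0,0 L6,3 L0,6 z"/></marker></defs>'
    '<line x1="40" y1="40" x2="240" y2="40" stroke="black" marker-end="url(#arrow)"/>'
    '<text x="120" y="35">hello</text>'
    '</svg>'
)

print(len(plantuml), "chars of PlantUML vs", len(svg), "chars of minimal SVG")
```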
> One might as well say MidJourney won't work because few ever make paintings using pixels.
One might say that about asking an LLM to output a raster image (say, in PPM/PBM format, which is made of tokenizer-friendly text!) - and predictably, LLMs suck at outputting such images, and suck even worse at understanding them.
One might not say that about Midjourney. Midjourney is not an LLM; it's (backed by) a diffusion model. Those are two entirely different beasts. An LLM is a sequential next-token predictor; a diffusion model is not - it does something more like global optimization across a fixed-size output, in many places simultaneously.
In fact, I bet a textual diffusion model (there are people working on diffusion-based language models) would work better for outputting SVG than LLMs do.
It was a well-defined domain, so I guess the training-data argument doesn't apply to stuff that is within a "natural" domain like graphs. LLMs can infer the behavior from naming quite well.
It's disingenuous to conclude that AI is no good at diagramming after using an impotent prompt AND refusing to iterate with it. A human would do no better with the same instructions, LLMs aren't magic.
This is the same as my previous comment https://news.ycombinator.com/item?id=42524125
That being said, I think part of the potential is repeatability. Once you've done the work of properly prompting for the desired result, you can often save the adjusted prompts (or a variation of them) for later use, giving you a flying start on subsequent occasions.
That said, I wouldn't expect things to change too drastically. TFA goes into details, but in short LLMs are already quite good at whiteboarding (where you interactively describe the diagram you want). They're also really bad at generating a diagram from an existing system. In either case, small, incremental improvements won't really help; you'd need a large change to move the needle.