On the other hand, the amount of flip-flopping they go through is unreal. I've seen numerous instances where either Cursor's Bugbot or Claude found a bug and recommended a reasonable fix. The fix gets implemented, and then the LLM argues against it and asks for the code to be reverted. Out of curiosity, I've reverted the code just to see what happens, only to be given the exact same recommendation as in the first pass.
I can foresee this becoming a circus for less experienced devs, so I turned off the automatic code reviews and switched them to request-only mode with a GH Action. That way I retain some semblance of sanity and keep the PR comment history from getting cluttered with overly verbose comments from an agent.
Bugbot is now a valuable part of our SD process. If you have genuine examples showing that we're just being delusional or simply haven't hit a roadblock yet, I would love to see them.
Relative quality is better, but absolute quality is not. I only care about absolute quality.