Ask HN: How are you vibe coding in an established code base?
9 points | 11 hours ago | 3 comments
Here’s how we’re working with LLMs at my startup.

We have a monorepo with scheduled Python data workflows, two Next.js apps, and a small engineering team. We use GitHub for SCM and CI/CD, deploy to GCP and Vercel, and lean heavily on automation.

Local development: Every engineer gets Cursor Pro (plus Bugbot), Gemini Pro, OpenAI Pro, and optionally Claude Pro. We don’t really care which model people use. In practice, LLMs add roughly the output of 1.5 excellent junior/mid-level engineers per engineer, so paying for multiple subscriptions is an easy call.

We rely heavily on pre-commit hooks: ty, ruff, TypeScript checks, tests across all languages, formatting, and other guards. Everything is auto-formatted. LLMs make types and tests much easier to write, though complex typing still needs some hand-holding.
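For reference, a trimmed sketch of what a .pre-commit-config.yaml like that can look like (the pins and the exact uv/pnpm commands here are illustrative, not our literal config):

    repos:
      - repo: https://github.com/astral-sh/ruff-pre-commit
        rev: v0.6.9   # pin to whatever ruff version you actually use
        hooks:
          - id: ruff          # lint
          - id: ruff-format   # auto-format
      - repo: local
        hooks:
          - id: ty
            name: ty type check
            entry: uv run ty check
            language: system
            types: [python]
            pass_filenames: false
          - id: tsc
            name: TypeScript check
            entry: pnpm exec tsc --noEmit
            language: system
            files: '\.(ts|tsx)$'
            pass_filenames: false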

GitHub + Copilot workflow: We pay for GitHub Enterprise primarily because it allows assigning issues to Copilot, which then opens a PR. Our rule is simple: if you open an issue, you assign it to Copilot. Every issue gets a code attempt attached to it.

There’s no stigma around lots of PRs. We frequently delete ones we don’t use.

We use Turborepo for the monorepo and are fully uv on the Python side.

All coding practices are encoded in .cursor/rules files. For example: “If you are doing database work, only edit Drizzle’s schema.ts and don’t hand-write SQL.” Cursor generally respects this, but other tools struggle to consistently read or follow these rules no matter how many agent.md-style files we add.
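For context, the kind of file that rule points agents at: a minimal Drizzle schema.ts sketch (the table and columns are made up for illustration):

    // db/schema.ts: the only file agents may touch for database changes.
    import { pgTable, serial, text, timestamp } from "drizzle-orm/pg-core";

    export const jobs = pgTable("jobs", {
      id: serial("id").primaryKey(),
      name: text("name").notNull(),
      status: text("status").notNull().default("queued"),
      createdAt: timestamp("created_at").defaultNow().notNull(),
    });

drizzle-kit then generates the SQL migrations from that file, which is why the rule bans hand-written SQL.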

My personal dev loop: If I’m on the go and see a bug or have an idea, I open a GitHub issue (via Slack, mobile, or web) and assign it to Copilot. Sometimes the issue is detailed; sometimes a single sentence. Copilot opens a PR, and I review it later.

If I’m at the keyboard, I start a Cursor agent in a Git worktree, using whatever the current best model is. I iterate until I’m happy, ask the LLM to write tests, review everything, and push to GitHub. Before any human review, I let Cursor Bugbot, Copilot, and GitHub CodeQL look at the code, and ask Copilot to fix anything they flag.
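The worktree step itself is just plain git (paths and branch names here are illustrative):

    git worktree add ../monorepo-agent -b agent/issue-123   # fresh checkout for the agent
    # ...let the agent work, commit, push...
    git worktree remove ../monorepo-agent                   # clean up once the PR is open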

Things that are still painful: To really know if code works, I need to run Temporal, two Next.js apps, several Python workers, and a Node worker. Some of this is Dockerized, some isn’t. Then I need a browser to run manual checks.

AFAICT, there’s no service that will take a prompt, write the code, spin up all this infra, run Playwright, handle database migrations, and then let me manually poke at the system. We approximate this with GitHub Actions, but that doesn’t help with manual verification or DB work.
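For reference, the kind of Playwright check I’d want such a service to run once it brings everything up; a minimal sketch with a placeholder URL and assertion:

    // e2e/smoke.spec.ts: coarse smoke test; baseURL and selectors are placeholders.
    import { test, expect } from "@playwright/test";

    test("dashboard renders once the stack is up", async ({ page }) => {
      await page.goto("http://localhost:3000/dashboard");
      // A rough signal that the Next.js app and its backing services responded.
      await expect(page.getByRole("heading", { name: "Dashboard" })).toBeVisible();
    });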

Copilot doesn’t let you choose a model when assigning an issue or during code review, and the model it defaults to is noticeably weak. You can pick a model in Copilot chat, but not in issues, PRs, or reviews.

Cursor + worktrees + agents suck. Worktrees get created from the source repo, unstaged files included, so if you want a clean agent environment, your main checkout has to be clean. At times it feels simpler to just clone the repo into a fresh directory instead of using a worktree.

What’s working well: Because we constantly spin up agents, our monorepo setup scripts are well-tested and reliable. They also translate cleanly into CI/CD.

Roughly 25% of “open issue → Copilot PR” results are mergeable as-is. That’s not amazing, but better than zero, and it gets to ~50% with a few comments. This would be higher if Copilot followed our setup instructions more reliably or let us use stronger models.

Overall, for roughly $1k/month, we’re getting the equivalent of 1.5 additional junior/mid engineers per engineer. Those “LLM engineers” always write tests, follow standards, produce good commit messages, and work 24/7. There’s friction in reviewing and context-switching across agents, but it’s manageable.

What are you doing for vibe coding in a production system?

dazamarquez | 9 hours ago
I use AI to write specific types of unit tests that would be extremely tedious to write by hand but are easy to verify for correctness. That aside, it’s pretty much useless: context windows are never big enough to cover anything that isn’t a toy project, and/or the costs build up fast, and/or the project is legacy with so many obscure, concurrently moving parts that the AI can’t understand it correctly, and/or it takes significantly more time to get the AI to generate something passable and double-check it than to just do it myself from the get-go.
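For example, the table-driven kind, where the cases are tedious to enumerate but each one is trivial to eyeball. A sketch against a hypothetical slugify() helper:

    // Hypothetical example: exhaustive cases for a slugify() helper.
    // Tedious to type out by hand, trivial to verify by reading.
    import { test } from "node:test";
    import assert from "node:assert/strict";
    import { slugify } from "./slugify"; // hypothetical helper under test

    const cases: Array<[input: string, expected: string]> = [
      ["Hello World", "hello-world"],
      ["  trim me  ", "trim-me"],
      ["Crème brûlée", "creme-brulee"],
      ["already-slugged", "already-slugged"],
    ];

    for (const [input, expected] of cases) {
      test(`slugify(${JSON.stringify(input)})`, () => {
        assert.equal(slugify(input), expected);
      });
    }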

Rarely, I'm able to get the AI to generate function implementations for somewhat complex but self-contained tasks that I then copy-paste into the code base.

Sevii | 8 hours ago
Can you set up automated integration/end-to-end tests and find a way to feed the results back into your AI agents before a human looks at it? Either via an MCP server or just a comment on the pull request, if the AI has access to PR comments. Not only is your lack of an integration testing pipeline slowing you down, it’s also slowing your AI agents down.
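A sketch of the simplest version of that loop: CI runs the e2e suite, then posts the output as a PR comment the agent can read (owner/repo, PR number, and token wiring are placeholders):

    // post-e2e-report.ts: run after the e2e suite in CI.
    // Owner, repo, PR number, and token plumbing are placeholders.
    import { Octokit } from "@octokit/rest";
    import { readFile } from "node:fs/promises";

    const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
    const report = await readFile("e2e-report.txt", "utf8");

    await octokit.rest.issues.createComment({
      owner: "my-org",
      repo: "monorepo",
      issue_number: Number(process.env.PR_NUMBER), // PR comments go through the issues API
      body: "E2E results:\n\n" + report,
    });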

"AFAICT, there’s no service that lets me"... Just make that service!

bitbasher | 8 hours ago
I generally vibe code with vim and my playlist in Cmus.