Generate tests from GitHub pull requests
5 points | 3 hours ago | 1 comment
I’ve been experimenting with something interesting.

AI coding tools generate code very quickly, but they almost never generate full end-to-end test coverage. They create a ton of tests, mostly unit and integration, but real user scenarios are missing. In many repos we looked at, the ratio of high-quality e2e tests to new code dropped dramatically once teams started using Copilot-style tools, or e2e testing was simply left to testers as a separate job.

So I tried a different approach.

The system reads a pull request and:

• analyzes changed files
• identifies uncovered logic paths using a dependency graph (single repo or multi-repo)
• understands the context via a user story or requirements (given as a comment in the PR)
• generates test scenarios
• produces e2e automated tests tied to the PR
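A rough sketch of the first step, assuming a unified diff string already fetched from the GitHub API (all names here are illustrative, not the actual implementation):

```typescript
// Walk a unified diff and record, per changed file, the line numbers
// that were added in the new version. These are the candidate spots
// where uncovered logic paths can appear.
interface ChangedFile {
  path: string;
  addedLines: number[]; // line numbers in the new file
}

function parseDiff(diff: string): ChangedFile[] {
  const files: ChangedFile[] = [];
  let current: ChangedFile | null = null;
  let newLine = 0;
  for (const line of diff.split("\n")) {
    if (line.startsWith("+++ b/")) {
      current = { path: line.slice(6), addedLines: [] };
      files.push(current);
    } else if (line.startsWith("@@")) {
      // hunk header "@@ -a,b +c,d @@": start counting at c
      const m = /\+(\d+)/.exec(line);
      newLine = m ? parseInt(m[1], 10) : 0;
    } else if (current && line.startsWith("+")) {
      current.addedLines.push(newLine++);
    } else if (current && !line.startsWith("-")) {
      newLine++; // context line advances the new-file counter
    }
  }
  return files;
}
```

From there, the added lines get cross-referenced against the dependency graph and existing coverage data.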

In addition, if a user connects their CMS or TMS, that data can be pulled in as well. (Internally I use GraphRAG, but that's for another post.)

Example workflow:

1. Push a PR
2. System reads the diff + linked Jira ticket
3. System generates missing tests and a coverage report
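A minimal sketch of the record behind step 3 (the field names are my own, chosen to match the example output further down):

```typescript
// One row of the traceability report: code location, the requirement
// it satisfies, and the test that exercises it.
interface TraceRow {
  codeRef: string;       // e.g. "src/api/auth.js:45-78"
  requirementId: string; // e.g. "GITHUB-234 / JIRA-API-102"
  criteria: string;
  testType: "Unit" | "Integration" | "E2E";
  testId: string;
  description: string;
  status: "Pass" | "Fail" | "Missing";
}

// Render a row as a markdown table line for the PR comment.
function toMarkdownRow(r: TraceRow): string {
  return `| ${r.codeRef} | ${r.requirementId} | ${r.criteria} | ${r.testType} | ${r.testId} | ${r.description} | ${r.status} |`;
}
```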

In early experiments the system consistently found edge cases that developers missed.

Example output:

| Code Reference | Requirement ID | Requirement / Acceptance Criteria | Test Type | Test ID | Test Description | Status |
|---|---|---|---|---|---|---|
| src/api/auth.js:45-78 | GITHUB-234 / JIRA-API-102 | API should return 400 for invalid token | Integration | IT-01 | Validate response for invalid token | Pass |

Curious how others are thinking about this kind of traceability. I'm a developer too, so I'm sensitive to showing this only to the developer first; only the developer can make it visible to other folks, so they get the chance to just take corrective action themselves.

jmathai
1 hour ago
I think Claude Code can write very good end to end tests given the right constructs.

I have been building a desktop app (electron-based) which interacts with Anthropic’s AgentSDK and the local file system.

It’s 100% spec-driven and Claude Code has written every line. I do large features instead of small ones (the spec lives in an issue, around 300 lines of markdown).

I have had it generate Playwright tests from the start. It was doing okay, but one change made it amazing: I created a spec-driven pull request to use data-testid attributes for selectors.
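For what it's worth, the data-testid convention is easy to make uniform; a minimal sketch (the helper name is hypothetical) of building stable selectors the way a generated Playwright spec would:

```typescript
// Build a CSS selector from a data-testid value, so generated tests
// never depend on visible text or DOM structure that refactors break.
function byTestId(id: string): string {
  return `[data-testid="${id}"]`;
}

// In a generated Playwright spec this reads as, e.g.:
//   await page.locator(byTestId("login-submit")).click();
// Playwright also ships a built-in for this: page.getByTestId("login-submit").
console.log(byTestId("login-submit")); // prints [data-testid="login-submit"]
```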

Every new feature adds tests, and verifies it hasn’t broken existing features.

I don’t even bother with unit tests. It’s working amazingly well.
