It turns any public GitHub repository into a text extract that you can easily give to your favourite LLM.
Today I added this URL trick to make it even easier to use!
How I use it myself:
- Quickly generate a README.md boilerplate for a project
- Ask LLMs questions about an undocumented codebase
It is still very much a work in progress, and I plan to add many more options (file size limits, exclude patterns, ...) and a public API.
I hope this tool can help you. Your feedback is very valuable to help me prioritize, and contributions are welcome!
I made https://uithub.com 2 months ago. Its speciality is that seeing a repo's raw extract is just a matter of changing the 'g' to a 'u' in the URL. It also works for subdirectories, so if you just want the docs of Upstash QStash, for example, go to https://uithub.com/upstash/docs/tree/main/qstash
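For anyone who wants to script the trick rather than edit the address bar, here is a minimal Python sketch (not uithub's own code) that rewrites a GitHub URL and fetches the extract. The Accept: text/plain header is my assumption, based on the curl behaviour mentioned further down the thread.

    import urllib.request

    # The "trick": github.com -> uithub.com, path (including /tree/...) unchanged.
    def uithub_url(github_url: str) -> str:
        return github_url.replace("github.com", "uithub.com", 1)

    req = urllib.request.Request(
        uithub_url("https://github.com/upstash/docs/tree/main/qstash"),
        headers={"Accept": "text/plain"},  # assumption: ask for the plain-text variant
    )
    print(urllib.request.urlopen(req).read().decode("utf-8")[:500])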
Great to see this continues to be worthwhile!
I think it makes sense.
YAML is shorter and easier to read, and Markdown code blocks add no extra syntax between the lines compared to normal code.
But for JSON vs. JSONL I can't come up with any big advantage for the LLM; they're mostly the same.
The previous example, but in JSON:
https://uithub.com/upstash/docs/tree/main/qstash?accept=appl...
Is there any reason to prefer JSONL besides it being more efficient to edit? I'm happy to add it to my backlog if you think it has any advantages for LLMs
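To make the comparison concrete, here is the same pair of hypothetical files serialized both ways in a short Python snippet; the field names (path, content) are illustrative, not uithub's actual schema.

    import json

    # Two hypothetical files; the field names are made up for illustration.
    files = [
        {"path": "README.md", "content": "# QStash docs\n"},
        {"path": "qstash/index.mdx", "content": "..."},
    ]

    as_json = json.dumps({"files": files}, indent=2)       # one JSON document
    as_jsonl = "\n".join(json.dumps(f) for f in files)     # one record per line

    print(as_json)
    print(as_jsonl)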
// Fetch stars when page loads
fetchGitHubStars();
I do not understand why in the world so much of the code is related to poking the GH API to fetch the star count.

I made a similar CLI tool[0] with the added feature that you can pass `--outline` and it'll omit function bodies (while leaving their signatures). I've found it works really well for giving a high-level overview of huge repos.
You can then progressively expand specific functions as the LLM needs to see their implementation, without bloating up your context window.
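To illustrate the idea (this is not the linked tool's implementation, just a rough Python sketch using the standard ast module): parse a source file, drop every function body, and keep the signatures.

    import ast

    def outline(source: str) -> str:
        tree = ast.parse(source)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Keep the signature and decorators, replace the body with `...`
                node.body = [ast.Expr(ast.Constant(...))]
        return ast.unparse(tree)

    # "some_module.py" is a placeholder file name
    print(outline(open("some_module.py").read()))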
A few observations from building large-scale repo analysis systems:
1. Simple text extraction often misses critical context about code dependencies and architectural decisions
2. Repository structure varies significantly across languages and frameworks - what works for Python might fail for complex C++ projects
3. Caching strategies become crucial when dealing with enterprise-scale monorepos
The real challenge is building a universal knowledge graph that captures both explicit (code, dependencies) and implicit (architectural patterns, evolution history) relationships. We've found that combining static analysis with selective LLM augmentation provides better context than pure extraction approaches.
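As a toy example of what selective static analysis can add on top of plain text extraction, here is a Python sketch that collects import edges from a checkout; a real system would obviously need per-language parsers and far richer relationships.

    import ast
    import pathlib
    from collections import defaultdict

    # Collect "file -> imported module" edges from every Python file in a checkout.
    edges = defaultdict(set)
    for path in pathlib.Path(".").rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except SyntaxError:
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                edges[str(path)].update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                edges[str(path)].add(node.module)

    for src, deps in sorted(edges.items()):
        print(src, "->", ", ".join(sorted(deps)))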
Curious about others' experiences with handling cross-repository knowledge transfer, especially in polyrepo environments?
Ctrl-a + ctrl-c would remain fast.
- for browsers it shows HTML
- for curl it gets raw text
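Presumably that's done with content negotiation. A minimal Python sketch of the behaviour, assuming the Accept header is what decides (I haven't checked uithub's actual logic):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            accept = self.headers.get("Accept", "")
            if "text/html" in accept:        # browsers advertise text/html
                body, ctype = b"<pre>repo extract...</pre>", "text/html"
            else:                            # curl defaults to Accept: */*
                body, ctype = b"repo extract...", "text/plain"
            self.send_response(200)
            self.send_header("Content-Type", ctype)
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("", 8080), Handler).serve_forever()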
What would you say are the differences compared to using something like Cursor, which already has access to your codebase?
I actually use txtar with a custom CLI to quickly copy multiple files to my clipboard and paste them into an LLM chat. I try not to stray too far from the chat paradigm so I can stay flexible about which LLM provider I use.
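For context, txtar is Go's simple one-file archive format ("-- name --" separator lines between files). A small Python sketch of bundling files that way (not the commenter's actual CLI, just the general shape):

    import pathlib
    import sys

    # Concatenate files in the txtar layout: a "-- name --" line, then the contents.
    def to_txtar(paths):
        chunks = []
        for p in paths:
            text = pathlib.Path(p).read_text(encoding="utf-8")
            if not text.endswith("\n"):
                text += "\n"
            chunks.append(f"-- {p} --\n{text}")
        return "".join(chunks)

    print(to_txtar(sys.argv[1:]))  # e.g. python bundle.py main.go go.mod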
It's quite useful, with some filtering options (hidden files, gitignore, extensions) and support for Claude-style tags.
Do you have any plans to expand it?
What you can do with something like this is store it in a database and then query it for relevant chunks, which you then feed to the LLM as needed.
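A naive sketch of that store-and-retrieve loop in Python, with keyword overlap standing in for a real embedding search and a hypothetical repo_extract.txt as input; a real setup would use embeddings and a vector database.

    # Split the extract into fixed-size chunks.
    def chunk(text: str, size: int = 1500):
        return [text[i:i + size] for i in range(0, len(text), size)]

    # Score chunks against the question by keyword overlap and keep the top k.
    def top_chunks(chunks, question, k=3):
        q = set(question.lower().split())
        return sorted(chunks, key=lambda c: -len(q & set(c.lower().split())))[:k]

    extract = open("repo_extract.txt").read()   # hypothetical dump of the repo text
    context = "\n\n".join(top_chunks(chunk(extract), "how is auth handled?"))
    # `context` then goes into the prompt alongside the question.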
I also really like this idea in general of APIs being domains, eventually making the web a giant supercomputer.
Edit: There is literally nothing wrong with this comment but feel free to keep downvoting, only 5,600 clicks to go!