Believing that LLM-enabled tools can play a role in solving the problems below, I've built a tool that automatically generates documentation for legacy codebases using the Model Context Protocol (MCP) and Claude Sonnet. At first glance, I think this approach has merit. Some samples are in the README. I welcome your thoughts.
The Problem:

- Legacy codebases are notoriously difficult to understand and navigate
- Onboarding new developers takes months
- Making changes safely requires deep knowledge of the system
- Business stakeholders lack visibility into system architecture
The Solution: an MCP-based tool that:

- Scans your codebase
- Generates README files at each level of the directory structure
- Creates C4 architecture diagrams showing system components and relationships
- Builds a complete documentation hierarchy, from high-level architecture to implementation details
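The README-generation pass could be sketched as a bottom-up directory walk, so each level's README can draw on its children's summaries. This is only a hypothetical illustration, not the tool's actual implementation: `summarise` here is a plain string fold standing in for the LLM call the real tool would make via MCP.

```python
import os

def summarise(filenames, child_summaries):
    # Stand-in for an LLM call: fold file names and child-directory
    # summaries into a one-line description of the directory.
    parts = sorted(filenames) + [f"{d}/: {s}" for d, s in child_summaries]
    return "; ".join(parts) if parts else "empty directory"

def document_tree(root):
    # Walk bottom-up (topdown=False) so a directory's README can
    # reference the summaries already produced for its children,
    # building the documentation hierarchy from the leaves up.
    summaries = {}
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        rel = os.path.relpath(dirpath, root)
        if any(p.startswith(".") for p in rel.split(os.sep) if p != "."):
            continue  # skip hidden trees such as .git
        visible = [d for d in dirnames if not d.startswith(".")]
        children = [(d, summaries[os.path.join(dirpath, d)]) for d in visible]
        summary = summarise([f for f in filenames if f != "README.md"], children)
        summaries[dirpath] = summary
        with open(os.path.join(dirpath, "README.md"), "w") as fh:
            fh.write(f"# {os.path.basename(dirpath) or root}\n\n{summary}\n")
    return summaries
```

The bottom-up order is the key design choice: the root README ends up as a digest of digests, which is the "high-level architecture down to implementation details" hierarchy described above.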
The tool aims to help teams:

- Onboard developers faster with clear system documentation
- Make changes confidently with a better understanding of components
- Communicate system architecture to stakeholders
- Maintain living documentation that evolves with the codebase
Have a look / try it out!
GitHub: https://github.com/jonverrier/McpDoc License: MIT
To credit various other similar works:

https://news.ycombinator.com/item?id=43154065 (jtwaleson's post)
https://news.ycombinator.com/item?id=42521769
https://news.ycombinator.com/item?id=41393458
Are you aware of bela.live? It also creates C4 diagrams from code using AI.
I'm building a tool to help developers understand legacy code: it uses AI to identify the features implemented in the code, then displays those features on a map along with their hierarchy and their traceability to code. It's a visual, feature-first approach to documenting software.
Link: https://product-map.ai/
I would love to hear your thoughts about it!
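The feature map described above, with its hierarchy and code traceability, could be represented with a structure like the following. This is purely an illustrative sketch, not product-map.ai's actual data model; the `Feature` shape and the `(path, start, end)` reference format are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    # Hypothetical node in a feature map: a named feature, its
    # sub-features, and traceability links to the code implementing it.
    name: str
    code_refs: list = field(default_factory=list)   # [(path, start_line, end_line)]
    children: list = field(default_factory=list)    # [Feature]

def traceability(feature, trail=()):
    # Flatten the feature hierarchy into (feature path, code ref) pairs,
    # which is the traceability view a map UI could render.
    path = trail + (feature.name,)
    for ref in feature.code_refs:
        yield " > ".join(path), ref
    for child in feature.children:
        yield from traceability(child, path)
```

For example, a "Billing > Invoicing" feature backed by `src/invoice.py` would flatten to a single traceability row linking that feature path to the file and line range.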
I had a look at Bela.live. Thanks for the link; it's obviously very comprehensive. Philosophically, though, I don't think these tools should store things in their own database: the diagrams and markup should go back into the repo, in a format the LLMs can then use for further context.
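One concrete form "going back to the repo" could take is emitting the C4 diagram as Mermaid markup inside a plain markdown file: it renders in the repo browser and stays readable to an LLM as text context. This is a minimal sketch under that assumption; the `docs/architecture.md` path and the helper's signature are invented for illustration.

```python
from pathlib import Path

def write_c4_context(repo_root, systems, relations):
    # Hypothetical emitter: render a C4 context diagram as a Mermaid
    # block in markdown, so the diagram is versioned with the code and
    # available as plain text for further LLM context.
    lines = ["# Architecture", "", "```mermaid", "C4Context",
             "  title System Context"]
    for ident, label in systems:
        lines.append(f'  System({ident}, "{label}")')
    for src, dst, verb in relations:
        lines.append(f'  Rel({src}, {dst}, "{verb}")')
    lines += ["```", ""]
    out = Path(repo_root) / "docs" / "architecture.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(lines))
    return out
```

Because the output is ordinary markdown in the working tree, a developer can review and edit the diagram in a pull request like any other file, which is exactly the validation loop argued for here.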
Your tool is nice, and there is definitely mileage here. If I am honest, though, my view is the same as with Bela.live: once the developer or product engineer has validated the features you extract, they should go back into the repo and be made available to the models.
I think that tools that can’t open up fully to bring the models into the workflow as first class participants will be overtaken by those that do.