You can either chat directly with the page or add the config to Cursor/Claude to pipe the website/docs straight into your context.
Why MCP? Using MCP is better than raw scraping or copy-pasting because it converts the page into clean Markdown. This helps the AI understand the structure better and uses significantly fewer tokens.
How it works: It is a proxy that fetches the URL, removes ads and navigation, and exposes the clean content as a standard MCP Resource.
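To make the "fetch, strip, expose as Markdown" step concrete, here is a minimal sketch using only the Python standard library. It is not toMCP's actual code: the list of tags treated as noise, the `MarkdownExtractor` class and the `html_to_markdown` helper are all assumptions made for illustration, and a real readability parser does much more.

```python
from html.parser import HTMLParser

# Assumed noise list for this sketch; not taken from toMCP.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
# Block-level tags that start a new Markdown block.
BLOCK_TAGS = {"h1", "h2", "h3", "p", "li"}
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}


class MarkdownExtractor(HTMLParser):
    """Collects text outside SKIP_TAGS and emits simple Markdown blocks."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped subtree
        self.prefix = ""      # Markdown prefix for the current block
        self.buffer = []      # text pieces of the current block
        self.blocks = []      # finished Markdown blocks

    def _flush(self):
        # Collapse whitespace and store the block if it has any text.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.buffer = []
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self._flush()
            self.prefix = HEADINGS.get(tag, "- " if tag == "li" else "")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any trailing text
    return "\n\n".join(parser.blocks)


page = (
    "<html><head><script>var x=1;</script></head><body>"
    "<nav><a href='/'>Home</a></nav>"
    "<h1>Title</h1><p>Hello <b>world</b>.</p>"
    "<footer>Legal notice</footer></body></html>"
)
print(html_to_markdown(page))
```

On the sample page the nav, script and footer text disappear and only the heading and paragraph survive, which is where the token savings would come from.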
Repo: https://github.com/Ami3466/tomcp (Inspired by GitMCP, but for the general web)
I know this is pointing to the GH repo, but I’d love to know more about why the author chose to build it this way. I suspect it keeps costs low/free. But why CF workers? How much processing can you get done for free here?
I’m not sure how you could do much more in a CF worker, but this might be too simple to be useful on many sites.
Example: I had to pull in a docs site that was built for a project I’m working on. We wanted an LLM to be able to use the docs in its responses. However, the site was built with VitePress and I didn’t have access to the source Markdown files, so I wrote an MCP fetcher that uses a dockerized headless Chrome instance to load the page. I then pull the innerHTML directly from the rendered DOM. It’s probably overkill, but it’s an example of where this tool might not work.
But if you have a static site, this tool could be a very simple way to configure MCP access. It’s a nice idea!
Also, by treating this as an MCP Resource rather than a Tool, the docs are pinned permanently instead of relying on the model to "decide" to fetch them.
Cloudflare Workers handle this perfectly for free (100k reqs/day) without the overhead of managing a dockerized browser instance.
But I do think the lack of a JavaScript loader will be a problem for many sites. In my case, I still run the innerHTML through a Markdown converter to get rid of the extra cruft. You’re right that this helps a lot. It would be even better if you could choose which #id element to load. Wikipedia surrounds the main article with a lot of extra info that, even after MD conversion, still adds fluff. But without JS loading, you still won’t be able to process a lot of sites in the wild.
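The "choose which #id element to load" idea can be sketched with the standard library as well. This is only an illustration under assumptions: `IdExtractor` and `text_of_element` are hypothetical names, the sample ids are made up, and the depth counting assumes reasonably well-formed HTML (stray closing tags would throw it off).

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# After these tags close, insert a space so adjacent blocks do not run together.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3"}


class IdExtractor(HTMLParser):
    """Collects the text inside the first element whose id matches."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0    # 0 means we are outside the target element
        self.done = False  # True once the target element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1
            if tag in BLOCK_TAGS:
                self.parts.append(" ")
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def text_of_element(html: str, element_id: str) -> str:
    parser = IdExtractor(element_id)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(parser.parts).split())


sample = (
    '<body><div id="sidebar">Languages</div>'
    '<div id="content"><p>Main <i>article</i> text.</p><br><p>More.</p></div>'
    '<div id="footer">Legal</div></body>'
)
print(text_of_element(sample, "content"))
```

This only helps when the content is already in the served HTML; for a JS-rendered page the target element would be empty or missing, which is the limitation discussed above.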
Now, I would personally argue that’s an issue with those sites. I’m not a big fan of dynamic JS loaded pages. Sadly, I think that that ship has sailed…
From what I can see, if the content I want to enrich is static, the web fetch tool seems sufficient. Is this tool capable of extracting information from dynamic websites or sites behind login walls, or is it essentially the same as a web fetch tool that only works with static pages?
1. Standard web_fetch tools usually dump raw HTML into the context (including navbars, scripts, and footer noise). This wastes a huge amount of tokens and distracts the model. toMCP runs the page through a readability parser and converts it to clean markdown before sending it to the AI.
2. Adding a website as an MCP Resource pins it as a permanent, read-only context, making it ideal for keeping documentation constantly available. This differs from the web_fetch tool, which is an on-demand action the AI only triggers when it decides to, meaning the data isn't permanently attached to your project.
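For readers unfamiliar with the distinction in point 2, the two paths look roughly like this on the wire. In the MCP specification resources are read with `resources/read` and are application-controlled, while tools are invoked with `tools/call` and are model-controlled. The sketch below only builds the JSON-RPC messages; the URI, the tool name `fetch` and its `url` argument are hypothetical and not taken from toMCP.

```python
import json


def resource_read_request(request_id, uri):
    # Sent by the client/host when it attaches a resource to the context.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


def resource_read_result(request_id, uri, markdown):
    # The server answers with the resource contents, here as Markdown text.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "contents": [
                {"uri": uri, "mimeType": "text/markdown", "text": markdown}
            ]
        },
    }


def tool_call_request(request_id, tool_name, arguments):
    # Sent only when the model decides to invoke a tool.
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical URI and tool name, for illustration only.
uri = "https://example.com/docs"
read_req = resource_read_request(1, uri)
read_res = resource_read_result(1, uri, "# Docs\n\nClean content.")
call_req = tool_call_request(2, "fetch", {"url": uri})

print(json.dumps(read_req, indent=2))
print(json.dumps(call_req, indent=2))
```

The payload can be the same Markdown either way; what differs is who initiates the request, the host application or the model.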
Does this skirt the robots.txt by chance? Not being able to fetch any web page is really bugging me and I'm hoping to use a better web_fetch that isn't censored. I'm just going to copy/paste the content anyway.
Different use cases, I think.