Ah, now I see it's all in https://github.com/microsoft/markitdown/blob/main/src/markit... – BeautifulSoup for HTML-ish stuff, pdfminer for PDFs, docx via mammoth. (And compared to pandoc, it only outputs Markdown and has far fewer options.)
On a similar note, pandoc can now run completely browser-side; there's a very bare-bones demo at https://tweag.github.io/pandoc-wasm/ (try -f html -t markdown and type in some HTML).
This is a blatant lie, as anyone can verify by pressing F12, going to the Network tab, entering a URI into the WebPage section, and pressing Enter:
POST https://markitdown.pro/api/markitdown
-----------------------------238398091440825138514056309576
Content-Disposition: form-data; name="url"
https://wnd.sh
-----------------------------238398091440825138514056309576--
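For anyone who wants to look at this outside the browser, here is a rough Python sketch that builds the same kind of multipart/form-data POST using only the standard library. The endpoint and the `url` field name are copied from the capture above; the function name `build_markitdown_request` and the boundary string are my own inventions, and I'm not assuming anything about what the server sends back. It only builds the request and prints it, it doesn't send anything.

```python
import urllib.request
import uuid


def build_markitdown_request(page_url, endpoint="https://markitdown.pro/api/markitdown"):
    """Build (but do not send) a multipart/form-data POST with a single "url" field."""
    # Any string that does not appear in the payload works as a boundary.
    boundary = "-" * 27 + uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="url"\r\n'
        "\r\n"
        f"{page_url}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


req = build_markitdown_request("https://wnd.sh")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it you would call urllib.request.urlopen(req);
# the response format is not something I know, so it is left out here.
```

The point is just that the page URL travels to the server in the request body, which is what the Network tab shows.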
I understand that cross-site fetching might not work in this case, but please do not blatantly lie in your FAQ page. It makes me (and others) trust you infinitely less.
I wanted to share a free online tool I created to let everyone easily test Microsoft's new open-source project, MarkItDown. It enables rapid conversion of different file formats and web pages into clean Markdown text. It's surprisingly fast and versatile.
Give it a try here: https://markitdown.pro/
Any thoughts or suggestions are welcome!
How does it run a Python library entirely browser side? Just curious.
(Given the faff of setting up a Python environment, this is a great idea.)