Show HN: PDF2MD – Rust+Redis+ClickHouse+VLLM conversion pipeline for PDFs
If you just want to use it, try it here: https://pdf2md.trieve.ai . I think LLMs are astoundingly good at converting complex PowerPoint-style infographics.

I wouldn't normally expect folks on HN to find this interesting, since the general concept has already been posted about in the past few months. We were heavily inspired by Zerox[1].

However, the stack we went with was fun and over-engineered, which is more likely to create interesting discussion. We use all the same tools at Trieve (our main product), but wanted to see if they would be a good fit for something that needed to get built on a tighter timeline, and we think they were!

It took us 2 weeks to get this set up end-to-end, and it's by no means complete (see the roadmap in the linked README). However, it's cool that a relatively cookie-cutter web service like this can be created so quickly with pure open-source dependencies and non-standard Rust tooling. Rust won't kill your startup!

- Minijinja templates for the UI[2]

- PDFObject for doc display in-browser[3]

- actix/actix-web HTTP server framework[4]

- Redis queue macro for worker async processing[5]

- ClickHouse for task storage[6]

- chm CLI to handle ClickHouse migrations[7]

- MinIO S3 for object storage[8]
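
For anyone wondering how the queue and storage pieces in that list could fit together, here is a minimal, std-only sketch of the general pattern. It is not the PDF2MD code: the names `Pipeline`, `PageJob`, `submit`, `work_once`, and `result` are hypothetical, a `VecDeque` stands in for the Redis queue, a `HashMap` stands in for the ClickHouse task table, the `convert` closure stands in for the LLM call, and splitting a task into one job per page is an assumption.

```rust
use std::collections::{HashMap, VecDeque};

/// Lifecycle of a conversion task (hypothetical states, not PDF2MD's schema).
#[derive(Debug, Clone, PartialEq)]
enum Status {
    Queued,
    Done,
}

/// One queued job: convert a single page of an uploaded document.
#[derive(Debug, Clone)]
struct PageJob {
    task_id: u32,
    page: usize,
    content: String,
}

/// In-memory stand-ins: `queue` plays the Redis queue, `tasks` the task table.
#[derive(Default)]
struct Pipeline {
    queue: VecDeque<PageJob>,
    tasks: HashMap<u32, (Status, Vec<Option<String>>)>,
    next_id: u32,
}

impl Pipeline {
    /// What an upload handler might do: record the task, enqueue one job per page.
    fn submit(&mut self, pages: Vec<String>) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        self.tasks.insert(id, (Status::Queued, vec![None; pages.len()]));
        for (page, content) in pages.into_iter().enumerate() {
            self.queue.push_back(PageJob { task_id: id, page, content });
        }
        id
    }

    /// What a worker loop might do: pop a job, convert it, store the result.
    /// Returns false when the queue is empty.
    fn work_once(&mut self, convert: impl Fn(&str) -> String) -> bool {
        let Some(job) = self.queue.pop_front() else { return false };
        let markdown = convert(&job.content);
        if let Some((status, pages)) = self.tasks.get_mut(&job.task_id) {
            pages[job.page] = Some(markdown);
            // The task is done once every page slot has been filled.
            if pages.iter().all(Option::is_some) {
                *status = Status::Done;
            }
        }
        true
    }

    /// What a polling endpoint might return: status plus finished pages in order.
    fn result(&self, id: u32) -> Option<(Status, String)> {
        let (status, pages) = self.tasks.get(&id)?;
        let md = pages.iter().flatten().cloned().collect::<Vec<_>>().join("\n\n");
        Some((status.clone(), md))
    }
}
```

To try it, call `submit` with a `Vec<String>` of page contents, run `while pipeline.work_once(|s| format!("# {}", s)) {}` to drain the queue, and then read `result(id)`. In a real deployment the worker would be a separate process polling the queue, which is what makes the handler and the conversion step independently scalable.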

[1]: https://news.ycombinator.com/item?id=41048194

[2]: https://github.com/mitsuhiko/minijinja

[3]: https://github.com/pipwerks/pdfobject

[4]: https://github.com/actix/actix-web

[5]: https://github.com/devflowinc/trieve/blob/main/pdf2md/server...

[6]: https://github.com/ClickHouse/ClickHouse

[7]: https://docs.rs/chm/latest/chm/index.html

[8]: https://github.com/minio/minio
