Show HN: PlutoPrint – Generate PDFs and PNGs from HTML with Python

155 points | 1 day ago | 15 comments

Hi everyone, I built PlutoPrint because I needed a simple way to generate beautiful PDFs and images directly from HTML with Python. Most of the tools I tried felt heavy, tricky to set up, or produced results that didn’t look great, so I wanted something lightweight, modern, and fast. PlutoPrint is built on top of PlutoBook’s rendering engine, which is designed for paged media, and then wrapped with a Python API that makes it easy to turn HTML or XML into crisp PDFs and PNGs. I’ve used it for things like invoices, reports, tickets, and even snapshots, and it can also integrate with Matplotlib to render charts directly into documents.

I’d be glad to hear what you think. If you’ve ever had to wrestle with generating PDFs or images from HTML, I hope this feels like a smoother option. Feedback, ideas, or even just impressions are all very welcome, and I’d love to learn how PlutoPrint could be more useful for you.
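
Since invoices and reports are the headline use case, here is a minimal, stdlib-only sketch of the step that usually comes before rendering: building the HTML from data. `render_invoice_html` and the line items are hypothetical and not part of PlutoPrint; the sketch uses floats for brevity, where real billing code would use `decimal.Decimal`.

```python
from html import escape


def render_invoice_html(number, items):
    """Build a small invoice page (hypothetical helper).

    items is a list of (description, qty, unit_price) tuples.
    """
    rows = []
    total = 0.0
    for description, qty, unit_price in items:
        line_total = qty * unit_price
        total += line_total
        # escape() keeps customer-supplied text from breaking the markup
        rows.append(
            f"<tr><td>{escape(description)}</td><td>{qty}</td>"
            f"<td>{unit_price:.2f}</td><td>{line_total:.2f}</td></tr>"
        )
    # Doubled braces are literal braces in the CSS inside this f-string
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
  @page {{ size: A4; margin: 20mm; }}
  table {{ width: 100%; border-collapse: collapse; }}
  td, th {{ border-bottom: 1px solid #ccc; padding: 4px; }}
</style>
</head>
<body>
<h1>Invoice {escape(str(number))}</h1>
<table>
<thead><tr><th>Item</th><th>Qty</th><th>Unit</th><th>Total</th></tr></thead>
<tbody>
{''.join(rows)}
</tbody>
</table>
<p>Total due: {total:.2f}</p>
</body>
</html>"""
```

The resulting string can be written to a file and handed to PlutoPrint. The thread below quotes a `book.load_url("input.html")` call, but check the PlutoPrint README for the exact API rather than relying on this sketch.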

phonon

1 day ago

It would be great if you could run it against the tests at https://www.print-css.rocks/

They would give a much better idea of its complex printing capabilities.

decafb

13 hours ago

I'm surprised not to see the properties I'm most interested in, either in these tests or in PlutoPrint's supported CSS. I'm talking about `text-wrap: pretty` (which can potentially avoid rivers and orphans, depending on the implementation), `orphans`, `widows`, and the various `break-*` properties.
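
As a concrete starting point for checking these by hand, here is a sketch that generates a long test page using the properties mentioned. `orphans`, `widows`, `break-before`, `break-after`, `break-inside`, and `text-wrap: pretty` are standard CSS; whether any particular engine honors them is exactly what rendering such a page would reveal. `build_fragmentation_test` is a hypothetical helper.

```python
# Standard CSS Fragmentation / Text properties; engine support varies.
FRAGMENTATION_CSS = """
p { orphans: 3; widows: 3; text-wrap: pretty; }
h2 { break-after: avoid; }              /* keep a heading with the text after it */
table, figure { break-inside: avoid; }  /* never split these across pages */
.chapter { break-before: page; }        /* start each chapter on a fresh page */
"""


def build_fragmentation_test(paragraphs=40):
    """Return an HTML page long enough to force several page breaks (hypothetical helper)."""
    filler = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 12
    blocks = []
    for i in range(paragraphs):
        if i % 10 == 0:
            # Every tenth block opens a new chapter to exercise break-before
            blocks.append(
                f'<section class="chapter"><h2>Chapter {i // 10 + 1}</h2>'
                f"<p>{filler}</p></section>"
            )
        else:
            blocks.append(f"<p>{filler}</p>")
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<style>{FRAGMENTATION_CSS}</style></head><body>"
        + "\n".join(blocks)
        + "</body></html>"
    )
```

Write the string to a file, render it with each engine under test, and compare where the page breaks fall.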

pac0

1 day ago

It should be required to run these tests for these libraries. It's really frustrating to have to discover the gaps while trying to make it work.

tannhaeuser

1 day ago

CSS coverage is stated in [1]. It should be required to do at least a minimal assessment before posting so entitledly on HN.

[1]: https://github.com/plutoprint/plutobook/blob/main/FEATURES.m...

phonon

1 day ago

I'm not sure what your point is. CSS print conformance is extremely complex. There is a fairly well-known open-source test suite that's fully dynamic and can be run with visual outputs, pass/fail metrics, etc. That would give an interested new user a much better sense of whether the library is worth using, considering there are already several other options.

socalgal2

1 day ago

Maybe this isn't the same, but it's relatively few lines of code to use Puppeteer to drive an actual browser and render pages to PDFs/PNGs. The advantage is that everything is supported: every new feature in CSS, HTML, SVG, Canvas2D, WebGL, WebGPU, etc. (though for WebGL/WebGPU you might need to pass some flags to use llvmpipe/Mesa/WARP, etc.).

Asking your favorite LLM will give you da codez

PS: I'm not trying to discount this tool. I'm only pointing out an alternative that might be useful
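
For reference, a rough Python counterpart of the browser approach. This swaps in Playwright for Python (a different library from Puppeteer, which is Node) and assumes `pip install playwright` followed by `playwright install chromium`; `render_with_browser` is a hypothetical name. Note that `page.pdf()` only works with headless Chromium.

```python
def render_with_browser(url, pdf_path=None, png_path=None):
    """Render a page with headless Chromium via Playwright (assumed installed)."""
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless by default
        try:
            page = browser.new_page()
            # Wait until network activity settles so fonts and scripts finish
            page.goto(url, wait_until="networkidle")
            if pdf_path:
                # prefer_css_page_size lets an @page size rule override format
                page.pdf(
                    path=pdf_path,
                    format="A4",
                    print_background=True,
                    prefer_css_page_size=True,
                )
            if png_path:
                page.screenshot(path=png_path, full_page=True)
        finally:
            browser.close()
```

Each call launches a full Chromium process, which is the runtime cost discussed in the replies below.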

strbean

2 hours ago

Gotenberg[0] wraps up Chromium nicely for this purpose. However, at scale, spinning up Chromium for each PDF you want to generate is painful. I'd definitely consider a solution like PlutoPrint.

0: https://gotenberg.dev/

sammycage

1 day ago

That’s a good point. Using Puppeteer or a headless browser gives you essentially full web platform support. The tradeoff is that it comes with a heavier runtime and more moving parts (Chromium, Node, etc.). PlutoPrint aims to be much lighter: no browser dependency, just a compact C++ engine with a Python wrapper. It does not cover the entire browser feature set but it is fast, portable, and easy to drop into projects without the overhead of a full browser.

nicoburns

1 day ago

Interesting. I was not aware of PlutoBook!

We're doing a very similar thing (custom lightweight engine) over at https://github.com/DioxusLabs/blitz. We have more of a focus on UI, but there's definitely overlap (we support rendering to image, but don't have pagination/fragmentation implemented).

Have you run the WPT tests against your engine to test spec conformance?

specproc

1 day ago

I did this for a project recently, using Firefox and Selenium. It totally worked, but was very heavy on the dependencies and felt very clumsy.

This is exactly what I was looking for a few months ago. I might revisit that project with it.

realitysballs

1 day ago

Excellent response

nutjob2

1 day ago

Your approach is also more predictable. Trying to figure out why Chromium is doing something strange with a complicated page is not practical, while a simple, lean package like this means you can look at the code, trace it and patch it if need be.

slig

1 day ago

Exactly what I was wondering. I use Puppeteer to render these [1] printable puzzle pages, and I use SVG and JavaScript to dynamically resize the text to fit a page, etc. It just works.

[1]: https://ahapdf.nyc3.cdn.digitaloceanspaces.com/samplers/logi... (PDF)
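
The usual "shrink text until it fits" trick is a binary search over font sizes. A minimal sketch in Python, assuming a crude model where every glyph is half an em wide and lines are 1.2 em tall (in a browser you would measure the rendered element instead); `fits` and `fit_font_size` are hypothetical names. The search is valid because the height the text needs never decreases as the font size grows.

```python
def fits(text, font_size, box_width, box_height,
         char_width_ratio=0.5, line_height_ratio=1.2):
    """Crude check: does text wrapped at font_size fit inside the box?"""
    char_width = font_size * char_width_ratio  # assumed average glyph width
    chars_per_line = max(1, int(box_width // char_width))
    lines = -(-len(text) // chars_per_line)  # ceiling division
    return lines * font_size * line_height_ratio <= box_height


def fit_font_size(text, box_width, box_height, lo=4.0, hi=200.0, iterations=40):
    """Binary-search the largest font size in [lo, hi] that still fits."""
    if not fits(text, lo, box_width, box_height):
        return lo  # even the minimum size overflows; caller decides what to do
    if fits(text, hi, box_width, box_height):
        return hi
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if fits(text, mid, box_width, box_height):
            lo = mid  # mid fits, so the answer is at least mid
        else:
            hi = mid  # mid overflows, so the answer is below mid
    return lo
```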

okm

1 day ago

This is so efficient. I just tested it and it's far better than WeasyPrint, and it has both Python and C++ repos. Bro, I'm amazed. Are you open to sponsorship?

leetrout

1 day ago

What OS did you test on? It completely crashed my Python process on Mac.

sammycage

1 day ago

Thanks for the feedback. We’ve tested PlutoPrint on multiple platforms, including Windows and Linux, and it generally works well there. Mac-specific issues like crashes or empty outputs are definitely on our radar, and we’re investigating potential causes such as font handling, reverse mtime warnings, or system library differences. We’re also tracking bugs and improvements on the GitHub repo: https://github.com/plutoprint/plutoprint/issues. Contributions, bug reports, and additional test results from different environments are very helpful and appreciated as we continue to improve stability across all platforms.

edarchis

1 day ago

I tried it on Mac too. From Python, it just outputs an empty file, even with their own samples. Run directly from the command line, it complains about a "reverse mtime" on /Library/Fonts.

This is the kind of thing that might get fixed as more people try to use it, or it could be another PITA, like having to install an old wkhtmltopdf for Odoo to use.

rrr_oh_man

1 day ago

That looked like an astroturfing account anyway

okm

1 day ago

It is working perfectly for my use case, on Windows.

nutjob2

1 day ago

https://github.com/sponsors/plutoprint

lewisjoe

13 hours ago

Hi Samuel,

Building an HTML/CSS rendering engine is no easy job. Congratulations! I'm curious how you were able to pull this off. What documentation was helpful, and what was your inspiration? I'm in awe and want to learn more about this initiative.

sammycage

11 hours ago

Thank you for your kind words and for noticing the work behind this. Building an HTML and CSS rendering engine has been a long journey with many surprises. I have been maintaining https://github.com/sammycage/lunasvg for years, so I was familiar with interpreting specs and rendering engines. That experience gave me the confidence to tackle HTML.

At first, my plan was simple. I wanted to make an HTML rendering library. But soon, I realized it could be even more useful if it focused on paged output so I could make PDFs directly. C and C++ do not have an HTML-to-PDF library that is not a full web engine. I started coding and thought I could finish in a year by working a few hours each day. But reality came fast. HTML and CSS are much harder than SVG, and even small things caused big problems.

I studied KHTML and WebKit to see how real HTML and CSS engines work. The official specs were very helpful. Slowly, everything started to come together. It felt like discovering a hidden world behind the web pages we see every day.

The hardest part has been TableLayout. Tables look simple, but handling row and column spans, nested tables, alignment, page breaks, and box calculations was very hard. I spent many hours fixing layout bugs that only appeared in some situations. It was frustrating, humbling, and also very satisfying when it worked.

I am still learning and improving. I hope other people enjoy PlutoPrint and PlutoBook as much as I do.

lewisjoe

9 hours ago

Sounds like a wild ride! Thanks for making this open-source.

Quick questions:

1. I see you've hand-written both the CSS and HTML parsers yourself. Why not use existing parsers? Was minimizing dependencies one of your goals?

2. Does the project recognize headers/footers and other such @page CSS rules?

3. Fragmentation(pagination) logic has a huge set of challenges (at least from what I read about Chrome implementing fragmentation) - did you come across this? - https://developer.chrome.com/docs/chromium/renderingng-fragm....

Was fragmentation logic really that difficult to implement?

sammycage

7 hours ago

Thanks for your questions!

1. The documentation for the HTML and CSS parsers is pretty straightforward, and the parsers are easy to implement, so I thought it was better to write them myself.

2. It fully supports margin boxes (headers and footers) using at-rules like @top-left and @bottom-center inside @page rules. You can see more here: https://github.com/plutoprint/plutobook/blob/main/FEATURES.m...

3. Yes, I did come across this. Fragmentation logic is as difficult as it sounds. Right now PlutoBook works with a single, consistent page size throughout a document and does not support named pages, which simplifies things a lot.

Feel free to contact me via email if you have more questions.
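
For readers unfamiliar with margin boxes, a page setup using them might look like this. The rules (`@page`, `@page :first`, `@top-left`, `@bottom-center`, `counter(page)`, `counter(pages)`) are standard CSS Paged Media; the header text and the `wrap_with_page_css` helper are made up, and, per the answer above, a single page size is used for the whole document. Check PlutoBook's FEATURES list linked above for which of these it actually supports.

```python
# Standard CSS Paged Media: one page size for the whole document (no named pages),
# with running header/footer text placed in margin boxes.
PAGE_CSS = """
@page {
  size: A4;
  margin: 25mm 20mm;
  @top-left      { content: "ACME Corp"; font-size: 9pt; }
  @top-right     { content: "Quarterly report"; font-size: 9pt; }
  @bottom-center { content: "Page " counter(page) " of " counter(pages); font-size: 9pt; }
}
@page :first {
  @top-left { content: none; }  /* no running header on the cover page */
}
"""


def wrap_with_page_css(body_html):
    """Embed PAGE_CSS into a minimal HTML document (hypothetical helper)."""
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<style>{PAGE_CSS}</style></head><body>{body_html}</body></html>"
    )
```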

tannhaeuser

1 day ago

Isn't the interesting part the CSS renderer PlutoBook (C++) rather than the Python wrapper here?

SigmundA

1 day ago

Yes, definitely. It does the heavy lifting and is essentially a new, from-scratch HTML rendering engine.

It needs JavaScript support so that charting libraries work, but they mention working toward that in the roadmap.

It's more like PrinceXML than a browser. This is great: Prince is the gold standard for HTML printing and the only engine to fully support Paged Media Level 3 last time I looked. Normal browsers don't seem to care as much about full print CSS support, so Prince has a monopoly here and is not cheap.

https://www.princexml.com
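
Until JavaScript support lands, one common workaround is to pre-render charts as static SVG in Python and inline them into the HTML, so no script has to run at print time. A minimal stdlib-only sketch; `bar_chart_svg` and the sample data are hypothetical. The Show HN post mentions a Matplotlib integration, which would be the more capable route for real charts.

```python
from html import escape


def bar_chart_svg(data, width=400, height=200, pad=20):
    """Return an inline <svg> bar chart for a list of (label, value) pairs."""
    max_value = max((v for _, v in data), default=0) or 1  # avoid dividing by zero
    slot = (width - 2 * pad) / max(len(data), 1)  # horizontal space per bar
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for i, (label, value) in enumerate(data):
        bar_h = (height - 2 * pad) * value / max_value
        x = pad + i * slot
        y = height - pad - bar_h  # SVG's y axis points down, so bars grow upward from here
        parts.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" width="{slot * 0.8:.1f}" '
            f'height="{bar_h:.1f}" fill="steelblue"/>'
        )
        parts.append(
            f'<text x="{x + slot * 0.4:.1f}" y="{height - 5}" font-size="10" '
            f'text-anchor="middle">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "".join(parts)
```

The returned string can be dropped straight into the document body, since inline SVG needs no external file.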

eterps

1 day ago

How does it differ from https://weasyprint.org ?

sammycage

1 day ago

WeasyPrint is great, but PlutoPrint takes a different angle: the engine is all C++, so it’s faster and lighter on memory. It can render directly to PNG as well as PDF, and has stronger SVG support.

masfuerte

1 day ago

PlutoBook looks very impressive. Is it based on another renderer?

skipnup

1 day ago

Doesn't look like it:

> PlutoBook depends on the following external libraries:

> Required: cairo, freetype, harfbuzz, fontconfig, expat, icu

> Optional: curl, turbojpeg, webp (enable additional features)

Humphrey

1 day ago

Does anybody have any experience migrating to PlutoPrint from WeasyPrint? Is it seamless? Faster? Any teething issues? Are there reasons to stay with WeasyPrint?

1 day ago

This isn't theoretical. In my 20 years in retail and logistics, I've seen these libraries repeatedly fail in production. Real world examples include:

* Invoices: Totals get pushed to a new page with no repeated <thead> header. This is a classic failure of CSS table rendering across page breaks. Properties like page-break-inside: avoid are notoriously inconsistent in browser print-to-PDF engines. Line items get split mid-row because the engine doesn't understand the semantic integrity of the data.

* Bills of Lading & Manifests: These documents are infamous for unpredictable page breaks. One page cuts a row in half, the next duplicates headers, the next drops content entirely. This often stems from complex flexbox or grid layouts that the PDF rendering engine struggles to paginate deterministically.

* Shipping Labels: A barcode or QR code shifting by a few pixels is often a DPI or scaling artifact. The browser rendering at a logical 96 DPI doesn't translate perfectly to a 300 or 600 DPI thermal printer format, introducing rounding errors that are catastrophic for scanners. Addresses drift outside the printable area because CSS margins (margin, padding) can be interpreted differently by the print media engine versus the screen engine.

* Digital Forms: This is a classic failure of absolute vs. relative positioning. When you overlay HTML form fields on a scanned PDF background (a common requirement), the HTML box model's flow layout simply cannot guarantee pixel-perfect alignment with the fixed grid of the underlying image. I've seen teams resort to printing, using white-out, and hand-filling forms because the software couldn't align (x, y) coordinates.

* Tickets & Passes: Scanner rejection due to incorrect sizing is often due to the browser engine's "print scaling" or "fit-to-page" logic, which can be difficult to disable and varies between environments (e.g., a local Docker container vs. an AWS Lambda function with different system fonts or libraries installed).

This always turns into a long tail of support tickets. The only truly reliable solution is to bypass the HTML/CSS rendering model entirely and build the document on a canvas with an absolute coordinate system. This means using libraries like FPDF (PHP), ReportLab (Python), or lower-level tools like iText/PDFBox (Java), where you aren't "converting" a document, you are drawing it. You place text at (x, y), draw a line from (x1, y1) to (x2, y2), and manage page breaks and object placement explicitly.

It's not cheap. The initial build cost is high because every layout is effectively a small "programmatic CAD project". You can't just "throw HTML at it". But the payoff in reliability is immense. It becomes a set-and-forget system that produces identical documents every time, which stops the endless firefighting.

Yes, two years later it can be painful to update when the original developer is gone. But I would take that trade-off any day over constantly battling imprecise, nondeterministic tools. In twenty years of building systems where documents are mission-critical, "close enough" rendering was almost never good enough.
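
A minimal sketch of the "draw it, don't convert it" approach: compute absolute positions up front, repeat the header row on every page, and break before a row rather than inside it. It is pure Python with hypothetical names and an assumed A4 height of 842 pt; a PDF library such as ReportLab would then emit each `(x, y, text)` operation (e.g. via its canvas `drawString`, with `showPage()` between pages), which is left out to keep the sketch dependency-free.

```python
def layout_table(header, rows, page_height=842, top_margin=50, bottom_margin=50,
                 row_height=18, column_x=(50, 300, 400, 480)):
    """Place table cells at absolute (x, y) positions, page by page (hypothetical helper).

    Returns a list of pages; each page is a list of (x, y, text) draw operations.
    Coordinates follow the PDF convention: origin at bottom-left, y grows upward.
    """
    pages = []
    ops = None
    y = None

    def start_page():
        nonlocal ops, y
        ops = []
        pages.append(ops)
        y = page_height - top_margin
        for x, cell in zip(column_x, header):  # repeat the header on every page
            ops.append((x, y, cell))
        y -= row_height

    start_page()
    for row in rows:
        if y - row_height < bottom_margin:  # this row would cross the bottom margin,
            start_page()                    # so break before it, never inside it
        for x, cell in zip(column_x, row):
            ops.append((x, y, str(cell)))
        y -= row_height
    return pages
```

Because every position is computed explicitly, the same input always yields the same pages, which is the determinism the comment is arguing for.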

aszen

1 day ago

Yeah, exactly. We were using FPDF heavily but have now switched to Typst, since it's faster to iterate on complex documents.

itsgabriel

1 day ago

Have you looked at something like LaTeX or Typst? They come with their own layout engines, so there's potentially less tedious work, like specifying exact positions.

hbcondo714

1 day ago

> book.load_url("input.html")

Shouldn’t this be a URL, like https://example.com?

Also, is there support for creating a linkable table of contents?
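
On the first question: a bare filename is presumably resolved as a relative path, but if an explicit absolute URL is wanted, `pathlib` can build a `file://` URI. For the table of contents, the sketch only generates standard HTML `#fragment` links; whether they become clickable links in the PDF output is an assumption to verify against PlutoPrint. Both helper names are hypothetical.

```python
from html import escape
from pathlib import Path


def file_url(path):
    """Turn a local path into an absolute file:// URL."""
    return Path(path).resolve().as_uri()  # as_uri() requires an absolute path


def toc_html(headings):
    """Build a linked table of contents from (anchor_id, title) pairs."""
    items = "".join(
        f'<li><a href="#{escape(anchor)}">{escape(title)}</a></li>'
        for anchor, title in headings
    )
    return f'<nav class="toc"><ol>{items}</ol></nav>'
```

Each anchor id must match an `id` attribute on the corresponding heading, e.g. `<h2 id="intro">`.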

iamgopal

1 day ago

How does it compare to Typst?

sammycage

1 day ago

Typst and PlutoPrint serve somewhat different purposes. Typst is more like a modern typesetting language, focusing on fully programmatic document layouts with its own syntax, while PlutoPrint is a Python library built on a C++ rendering engine that converts HTML or XML into PDFs and PNGs. PlutoPrint’s strengths are fast rendering, strong SVG support, and integration with existing Python workflows, whereas Typst is great if you want a typesetting DSL with precise layout control from the ground up.

richfreedman

1 day ago

Nice! I think that it would be great if this could take markdown as input, without having to convert to HTML first

sammycage

1 day ago

Interesting. I will give it a try. By the way, why is converting to HTML first a problem for you?

pac0

1 day ago

Does this support full flexbox styling?

What are the known issues or the unsupported css this library has?

sammycage

1 day ago

PlutoPrint supports a large subset of CSS, including flexbox for most common layouts, but it’s not a full browser engine, so there are some limitations. You can see a more complete list of supported features here: https://github.com/plutoprint/plutobook/blob/main/FEATURES.m.... We’re also actively tracking bugs and improvements on the GitHub repo: https://github.com/plutoprint/plutoprint/issues, and contributions or test cases are always appreciated to help expand coverage.

moelf

1 day ago

For a second I thought it was this Pluto (note)book: https://plutojl.org/

ge96

1 day ago

Might need this, with wkhtmltopdf being stuck on bookworm.

klaxce

1 day ago

I’m also looking at this as a replacement for wkhtmltopdf. I had reimplemented it with Puppeteer, but it’s very RAM-heavy for the 200-500 page PDFs I generate. I’m hoping this renders what I need properly.

andrea76

1 day ago

The problem is that it's still using unsupported Qt 4 libraries.

_giorgio_

1 day ago

Hi, have you tested it with Google Colab notebooks?

Printing those is really difficult. I constantly get split cells (with some rows not printed) and all kinds of problems (like broken word wrap, etc.).

sammycage

1 day ago

Hi! We haven’t specifically tested Colab notebooks. They’re tricky because of dynamic layouts and tables, but simpler notebook exports to HTML might work better. Any feedback or test cases would be super helpful to improve support.