Show HN: Unsiloed AI – #1 on olmOCR-Bench

5 points | 1 hour ago | 2 comments

Most document parsers fail on real-world challenges like complex tables, handwritten documents, historical document scans, equations, multi-column layouts, and complex reading order. We built Unsiloed Parser to handle exactly these cases.

Our latest parser, v3.1, ranked #1 on olmOCR-Bench with a strict pass rate of 88.0. We ran the evaluation across 1,403 PDFs and 8,413 unit tests using the unmodified upstream Allen AI scorer (olmocr==0.4.27) and found that Unsiloed beats 18 other OCR services, including GPT-5.5, Claude Opus 4.7, LlamaParse, Reducto, Azure Document Intelligence, AWS Textract, and Unstructured.

When we dug into the failure cases, we found that many were not OCR errors but formatting differences: \frac vs \dfrac, whitespace, or otherwise equivalent LaTeX renderings. We ran a secondary LLM-as-Judge evaluation to separate real misses from semantically equivalent outputs, which lifts the corrected score to 94.8 (explained in detail in the blog post).
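To make the strict vs corrected distinction concrete, here is a minimal Python sketch. It is a rule-based stand-in, not the LLM-as-Judge used in the post and not the olmOCR scorer: the functions `normalize_latex`, `classify`, and `pass_rates` and the toy test strings are hypothetical. A "pass" is an exact match, an "equivalent" differs only by \dfrac/\tfrac vs \frac or by whitespace, and anything else is a "miss". The sketch pools all tests into a single ratio, which may not match how the upstream scorer aggregates results.

```python
import re

# Hypothetical normalization rules for illustration only (an assumption, not
# the olmOCR scorer's or Unsiloed's actual logic): treat display-style and
# text-style fractions as \frac, and ignore all whitespace.
_FRAC_VARIANTS = re.compile(r"\\[dt]frac(?![A-Za-z])")
_WHITESPACE = re.compile(r"\s+")


def normalize_latex(expr: str) -> str:
    """Map a few equivalent LaTeX spellings onto one canonical form."""
    expr = _FRAC_VARIANTS.sub(r"\\frac", expr)
    return _WHITESPACE.sub("", expr)


def classify(expected: str, predicted: str) -> str:
    """Label one test as 'pass', 'equivalent', or 'miss'."""
    if predicted == expected:
        return "pass"
    if normalize_latex(predicted) == normalize_latex(expected):
        return "equivalent"
    return "miss"


def pass_rates(results: list[str]) -> tuple[float, float]:
    """Return (strict, corrected) pass rates as percentages of all tests."""
    total = len(results)
    strict = results.count("pass")
    corrected = strict + results.count("equivalent")
    return 100 * strict / total, 100 * corrected / total


# Toy (expected, predicted) pairs, invented for this example.
tests = [
    (r"\frac{a}{b}", r"\frac{a}{b}"),   # exact match
    (r"\frac{a}{b}", r"\dfrac{a}{b}"),  # display-style fraction only
    (r"x + y", r"x+y"),                 # whitespace only
    (r"\frac{a}{b}", r"\frac{b}{a}"),   # genuinely wrong
]
labels = [classify(exp, pred) for exp, pred in tests]
print(labels)              # ['pass', 'equivalent', 'equivalent', 'miss']
print(pass_rates(labels))  # (25.0, 75.0)
```

Stripping all whitespace is deliberately crude: in real LaTeX it can merge tokens (`\alpha b` becomes `\alphab`), so a more careful checker would need a LaTeX tokenizer or a judge model to decide equivalence.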

Blog with full methodology and examples: https://www.unsiloed.ai/blog/unsiloed-ai-achieves-1-rank-on-...

Evaluation code for reproducibility: https://github.com/Unsiloed-AI/unsiloed-olmocr-benchmark

Feel free to post your messiest PDFs in the comments and we'll run them through Unsiloed Parser and share the output here.

pshishodia

7 minutes ago

Damn, unreal that OCR was so unsolved until now.

adnan9999

1 hour ago

Founder here. If you've got a notorious PDF you'd like us to try, please drop it in the comments. We'll run it and share the output here.

warthog

51 minutes ago

The website has no self-serve sign-up.

adnan9999

45 minutes ago

Yeah, we're not fully self-serve yet. Shoot me an email at adnan@unsiloed.ai with some info on the use case and I'll get you set up on the platform.