More examples: https://www.getanchorgrid.com/developer/docs/endpoints/drawi...
Main website: https://www.getanchorgrid.com/developer
Why we did it: https://www.getanchorgrid.com/developer/docs/changelog/const...
I'm reminded of the Xerox JBIG2 bug from around 2013, where certain scan settings could silently replace numbers inside documents. Bad construction plans were one of the cases that led to it being discovered. [0]
It wasn't overt OCR per se: end users weren't intending to convert pixels to characters or vice versa.
Full context and details: https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...
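For anyone who hasn't seen how this class of bug can happen, here is a toy sketch of lossy pattern matching and substitution, the general technique JBIG2 allows. It is not Xerox's code or a real JBIG2 encoder; the glyph bitmaps, the `max_diff` threshold, and the function names are all made up for illustration.

```python
# Toy model of lossy symbol substitution (NOT a real JBIG2 encoder).
# Glyphs are tiny bitmaps stored as tuples of strings; all names are hypothetical.

EIGHT = ("111",
         "101",
         "111",
         "101",
         "111")

SIX = ("111",
       "100",
       "111",
       "101",
       "111")


def hamming(a, b):
    """Count the pixels that differ between two same-sized glyphs."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def encode(glyphs, max_diff):
    """Store each glyph once; reuse a stored glyph if it is 'close enough'."""
    dictionary, indices = [], []
    for glyph in glyphs:
        for i, stored in enumerate(dictionary):
            if hamming(glyph, stored) <= max_diff:
                indices.append(i)  # substitution happens here
                break
        else:
            dictionary.append(glyph)
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decode(dictionary, indices):
    """Rebuild the page from the dictionary and the index list."""
    return [dictionary[i] for i in indices]


dictionary, indices = encode([EIGHT, SIX], max_diff=1)
page = decode(dictionary, indices)
print(page[1] == EIGHT)  # the 6 came back as an 8
```

With `max_diff=0` the round trip is exact. With a looser threshold the 6 is quietly swapped for the 8 that was stored first, and nothing in the output flags the change.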
The generalization problem you're pointing at is real and it's the hardest part of this. Our approach is to keep the detection scope tight — rather than trying to generalize across every firm's conventions, we train on a small but high-quality set of fixtures and optimize for precision within that scope.
The result is high-confidence outputs on the elements we support, rather than mediocre coverage across everything.
We're expanding the detection surface incrementally as we validate accuracy division by division!
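As a rough illustration of that precision-over-coverage trade-off, here is a minimal sketch. It is not the actual pipeline: the `Detection` type, the supported labels, and the 0.9 threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str         # hypothetical element class, e.g. "door"
    confidence: float  # hypothetical model score in [0, 1]


# Assumed values for illustration only.
SUPPORTED_LABELS = frozenset({"door", "window"})
MIN_CONFIDENCE = 0.9


def keep_high_confidence(detections,
                         supported=SUPPORTED_LABELS,
                         min_confidence=MIN_CONFIDENCE):
    """Drop anything outside the supported scope or below the threshold."""
    return [
        d for d in detections
        if d.label in supported and d.confidence >= min_confidence
    ]


raw = [
    Detection("door", 0.97),
    Detection("door", 0.55),      # in scope, but not confident enough
    Detection("stair", 0.99),     # confident, but out of scope
    Detection("window", 0.93),
]
kept = keep_high_confidence(raw)
print([d.label for d in kept])
```

Widening the scope then means adding a label to the supported set only after accuracy on it has been validated, rather than lowering the threshold across the board.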
A world in which metadata is commonly attached to every file doesn't exist, and probably never will, no matter how much you try to improve CAD workflows.
I have to make a BOM and oh boy I hate my job
A lot of them are "archival" so I'm pretty OOL
It is telling that so many of the comments here assume that the person stuck with an impractical format could easily request it in a different one. The assumption that they never thought to ask whether a more convenient format was available, and are just willfully toiling with the inconvenient one, is kind of insulting.
Also do doors, windows, and mechanical equipment.
DM me, and I can include you in the next preview.
Let me know if you find it useful or have any questions, happy to help.
I'd love to give it to an arc client, but I'm not sure who the right person to implement this would be. Hmm…
https://cal.com/anchorgrid/anchorgrid-external-meeting?durat...
There's nothing about PDFs or image formats that prevents anyone from doing OCR. Construction documents are difficult to OCR because OCR models are not well trained on them, and because they're very technical documents where small details are significant. It doesn't have anything to do with the file format.