Whenever any progress is made, this is the logical conclusion. And yet those who decide how your time is used hold the opposing view.
IMO, cyber security, for example, will have to become a government mandate with real penalties for non-compliance (like seat belts in cars were mandated) in order to force organizations to slow down, and make sure systems are built carefully and as correctly as possible to protect data.
This is in conflict with the hurtling pace of garbage in/garbage out AI generated stuff we see today.
Same here, for diaries/journals written in mixed Swedish/English/Spanish and with absolutely terrible handwriting.
I'd love to see the day when the writing is on the wall for handwriting recognition, which is something I bet on when I started my journals, but it seems that day has yet to come. I'm eager to get there though, so I can archive all of it!
They're very good at it.
Ideally something that I can train with my own handwriting. I had a look at Tesseract, wondering if there’s anything better out there.
For historical handwriting, Gemini 3 is the only one that gave a decent result on 19th-century minutes from a town court in Northern Norway (Danish Gothic handwriting with bleed-through). I'm not 100% sure it's correct, but that's because it's so dang hard to read the original to verify it. At least I can see it gets many names, dates and locations right.
I've been waiting a long time for this.
Please share. I am out of the loop and my searches have not pointed me to the state of the art, which has seen major steps forward in the past 3 or 4 years but most of it seems to be closed or attached to larger AI products.
Is it even still called OCR?
Personally I found magistral-small-2509 to be the most accurate overall, but it completely fails on some samples, while qwen3-vl-30b doesn't struggle at all with those same samples. So it seems the training data is really uneven depending on what exactly you're trying to OCR.
And the trade-off of course is that these are LLMs, so they're neither lightweight nor fast on consumer hardware, but at least by running several models and comparing their outputs you greatly increase the accuracy.
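For anyone wondering what "using multiple" could look like in practice, here is a minimal sketch, not the exact setup described above. It assumes you already have one transcription per model as plain strings (the model names and sample texts below are made up for illustration) and simply keeps the candidate that agrees most with the others, using only the standard library's difflib.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two transcriptions."""
    return SequenceMatcher(None, a, b).ratio()


def pick_consensus(candidates: dict[str, str]) -> tuple[str, str]:
    """Return (model_name, text) of the candidate that agrees most with the rest."""
    if not candidates:
        raise ValueError("need at least one candidate transcription")
    if len(candidates) == 1:
        return next(iter(candidates.items()))

    def mean_agreement(name: str) -> float:
        # Average similarity of this candidate to every other candidate.
        others = [text for other, text in candidates.items() if other != name]
        return sum(similarity(candidates[name], t) for t in others) / len(others)

    best = max(candidates, key=mean_agreement)
    return best, candidates[best]


# Hypothetical outputs from three models for the same line of handwriting.
candidates = {
    "model_a": "Meeting held on 3 March 1887",
    "model_b": "Meeting held on 3 March 1887",
    "model_c": "Mectinq hcld on 8 Narch 1387",
}
winner, text = pick_consensus(candidates)
print(winner, "->", text)
```

This only picks among whole outputs; it won't repair a line that every model gets wrong, and with just two models it can't break a disagreement, so a third opinion (or a human) is still needed there.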
Am I nuts or is this wrong, not “perfect”?
It doesn’t look crossed out at all to me in the image, just some bleeding?
Still very impressive, of course
https://g.co/gemini/share/e173d18d1d80
This is a random image from Twitter with no transcript or English translation provided, so it's not going to be in the training data.
The result from Gemini 3 Pro using the default media resolution (the medium one): "(Заголовок / Header): Арсеньев (Фамилия / Surname - likely "Arsenyev")
Состояние удовл-
t N, кожные
покровы чистые,
[л/у не увел.]
В зеве умерен. [умеренная]
гипер. [гиперемия]
В легких дыха-
ние жесткое, хрипов
нет. Тоны серд-
[ца] [ритм]ичные.
Живот мяг-
кий, б/б [безболезненный].
мочеисп. [мочеиспускание] своб. [свободное]
Ds: ОРЗ [или ОРВИ]" and with the translation: "Arsenyev
Condition satisfactory.
Temp normal, skin coverings [skin] are clean, lymph nodes not enlarged.
In the throat [pharynx], moderate hyperemia [redness].
In the lungs, breathing is rigid [hard], no rales [crackles/wheezing].
Heart tones are rhythmic.
Abdomen is soft, painless.
Urination is free [unhindered].
Diagnosis: ARD (Acute Respiratory Disease)."
Transkribus has a new model architecture around the corner, and the results look impressive. Not only for trivial cases like text, but also for table structures and layout analysis.
Best of all, you can train it on your own corpus of text to support obscure languages and handwriting systems.
Really looking forward to it.
Ah, maybe I'll pick up Qin seal when I retire, if I retire.
Hopefully next generations will feel the same about legal contracts, law in general, and Java code bases. They're incomprehensible not because of fonts but because of unfathomable complexity.
Not a chance, sorry.