When memory was measured in kilobytes: The art of efficient vision
155 points | 2 months ago | 4 comments | softwareheritage.org
cyberax
2 months ago
[-]
One approach that blew my mind was the use of FFT to recognize objects.

FFT has this property that object orientation or location doesn't matter. As long as you have the signature of an object, you can recognize it anywhere!
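
A minimal numpy illustration of the location part (not the article's method; the frame size and the 8x8 "object" are made up): the FFT magnitude of a frame is unchanged when the object is translated, because a shift only changes the phase.

    import numpy as np

    rng = np.random.default_rng(1)
    obj = rng.standard_normal((8, 8))

    # Place the same "object" at two different positions in a larger frame
    frame_a = np.zeros((64, 64)); frame_a[5:13, 10:18] = obj
    frame_b = np.zeros((64, 64)); frame_b[30:38, 40:48] = obj

    # The magnitude spectrum ("signature") is the same for both frames
    sig_a = np.abs(np.fft.fft2(frame_a))
    sig_b = np.abs(np.fft.fft2(frame_b))
    print(np.allclose(sig_a, sig_b))  # True: translation only changes the phase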

reply
changoplatanero
2 months ago
[-]
I believe orientation still matters but you’re right that position doesn’t.
reply
Legend2440
2 months ago
[-]
FFT is equivalent to convolution, which is widely used today for object recognition in CNNs.
reply
bobmcnamara
2 months ago
[-]
> FFT is equivalent to convolution

What do you mean by that? Could you give me an example?

reply
kragen
2 months ago
[-]
The FFT, composed with pointwise multiplication, composed with the inverse FFT, is equivalent to convolution. The FFT is not.
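
A minimal numpy sketch of that statement, assuming 1-D circular convolution (the sizes and signals are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    N = 64
    x = rng.standard_normal(N)   # signal
    h = rng.standard_normal(N)   # kernel

    # Circular convolution by definition: y[n] = sum_m x[m] * h[(n - m) mod N]
    direct = np.zeros(N)
    for n in range(N):
        for m in range(N):
            direct[n] += x[m] * h[(n - m) % N]

    # Convolution theorem: FFT -> pointwise multiply -> inverse FFT
    via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

    print(np.allclose(direct, via_fft))  # True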
reply
timewizard
2 months ago
[-]
reply
bobmcnamara
2 months ago
[-]
That is something else entirely.
reply
timewizard
2 months ago
[-]
Then if you knew what the OP meant, why did you ask?
reply
Grimblewald
2 months ago
[-]
Because they made a nonsensical claim that doesn't align with my (and likely their) understanding of what the FT is and does.

The FT is _NOT_ just a convolution; under certain conditions, a specific operation on FT terms is equivalent to a convolution.

reply
bobmcnamara
2 months ago
[-]
I didn't know what they meant. There are so many FFT tricks. I was hoping this was another.
reply
kmoser
2 months ago
[-]
I want to believe that, however obsolete these old algorithms are today, at least some aspects of the underlying code and/or logic will prove useful to LLMs as they try to generate modern code.
reply
monkeyelite
2 months ago
[-]
The idea that ML is the only way to do computer vision is a myth.

Yes, it may not make sense to use classical algorithms to try to recognize a cat in a photo.

But there are often virtual or synthetic images which are produced by other means or sensors for which classical algorithms are applicable and efficient.

reply
sokoloff
2 months ago
[-]
I worked (as an intern) on autonomous vehicles at Daimler in 1991. My main project was the vision system, running on a network of transputer nodes programmed in Occam.

The core of the approach was “find prominent horizontal lines, which exhibit symmetry about a vertical axis, and frame-to-frame consistency”.

Finding horizontal lines was done by computing variances in pixel values. Finding symmetry about a vertical axis was relatively easy. Ultimately, a Kalman filter worked best for frame-to-frame tracking. (We processed video at around 120x90, the output of the variance algorithm, which ran on a PAL video stream.)
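
A rough Python/numpy sketch of the row-variance and symmetry idea (not the original Occam code; the threshold, helper names, and toy frame are made up):

    import numpy as np

    def find_candidate_rows(gray, var_thresh=500.0):
        """Score each row by the variance of its pixel values; rows crossing
        strong horizontal structure tend to have high variance."""
        row_var = gray.astype(np.float32).var(axis=1)
        return np.flatnonzero(row_var > var_thresh)

    def symmetry_score(gray, axis_col):
        """Compare the image to its mirror about a candidate vertical axis;
        a lower score means more symmetric."""
        w = min(axis_col, gray.shape[1] - axis_col)
        left = gray[:, axis_col - w:axis_col]
        right = gray[:, axis_col:axis_col + w][:, ::-1]
        return float(np.mean(np.abs(left.astype(np.float32) - right)))

    # Toy usage on a synthetic 90x120 frame (the resolution mentioned above)
    frame = np.zeros((90, 120), dtype=np.uint8)
    frame[40:45, 30:90] = 255   # a bright horizontal bar, symmetric about column 60
    rows = find_candidate_rows(frame)
    best_axis = min(range(20, 100), key=lambda c: symmetry_score(frame, c))
    print(rows, best_axis)      # [40 41 42 43 44] 60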

There’s probably more computing power on a $10 ESP32 now, but I really enjoyed the experience and challenge.

This was our vehicle: https://mercedes-benz-publicarchive.com/marsClassic/en/insta...

reply
digdugdirk
2 months ago
[-]
That's awesome! What kind of hardware was needed to pull that off? And was the size of the bus any indication of the answer?
reply
godelski
2 months ago
[-]
You could even argue that ML does classical vision in addition to other stuff.

CNNs learn Gabor filters; the AlexNet paper even shows this [0].

Or if you look at the work ViT built on, they show attention heads will also learn these filters [1]. That's actually a big part of how ViTs work: the heads integrate this type of information.

[0] https://papers.nips.cc/paper_files/paper/2012/hash/c399862d3...

[1] https://arxiv.org/abs/1911.03584
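
For reference, a small numpy sketch of a Gabor filter bank (the parameters are arbitrary); first-layer CNN weights tend to converge to kernels that look much like these:

    import numpy as np

    def gabor_kernel(size=15, wavelength=6.0, theta=0.0, sigma=3.0):
        """A single Gabor filter: a sinusoid windowed by a Gaussian."""
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
        yr = -x * np.sin(theta) + y * np.cos(theta)
        envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
        carrier = np.cos(2 * np.pi * xr / wavelength)
        return envelope * carrier

    # A small bank at four orientations, like a classical edge/texture detector set
    bank = [gabor_kernel(theta=t) for t in (0, np.pi/4, np.pi/2, 3*np.pi/4)]
    print(bank[0].shape)  # (15, 15)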

reply
thatcat
2 months ago
[-]
Any recommendations on background reading for classical CV for radar?
reply
monkeyelite
2 months ago
[-]
I don’t know anything about radar. I have a book called “Machine Vision” (Jain, Kasturi, Schunck), easy undergrad level, but also very useful. It’s $6 on Amazon.
reply
ipunchghosts
2 months ago
[-]
Kasturi was my undergraduate honors advisor!
reply
monkeyelite
2 months ago
[-]
Small world! These are always just names on a book to me.
reply
thatcat
2 months ago
[-]
Awesome, thanks!
reply
sceadu
2 months ago
[-]
Don't know about radar but here's a good book on classical CV https://udlbook.github.io/cvbook/

even though I think Simon admits that most of it is obsolete after DL computer vision came about

reply
monkeyelite
2 months ago
[-]
> is obsolete after DL computer vision came about

I just don’t understand this. Why would new technology invalidate real understanding and useful computer algorithms?

reply
klodolph
2 months ago
[-]
Maybe… some of these algorithms from the 1980s struggled to do basic OCR, so they may need a lot of modification to be useful.
reply
PaulHoule
2 months ago
[-]
That whole approach of "find edges, convert to line drawing, process a line drawing" in the 1980s struggled to do anything at all.
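
For concreteness, a minimal numpy sketch of just the "find edges" step (a Sobel gradient magnitude; the later stages of thinning edges into a line drawing and reasoning over it were where things got hard):

    import numpy as np

    def sobel_edges(gray):
        """Convolve with Sobel kernels and return the gradient magnitude."""
        kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
        ky = kx.T
        h, w = gray.shape
        gx = np.zeros((h, w), dtype=np.float32)
        gy = np.zeros((h, w), dtype=np.float32)
        g = gray.astype(np.float32)
        for i in range(1, h - 1):
            for j in range(1, w - 1):
                patch = g[i - 1:i + 2, j - 1:j + 2]
                gx[i, j] = np.sum(patch * kx)
                gy[i, j] = np.sum(patch * ky)
        return np.hypot(gx, gy)

    # Toy usage: a bright square yields strong responses along its border
    img = np.zeros((32, 32)); img[8:24, 8:24] = 255
    edges = sobel_edges(img)
    print(edges.max() > 0)  # True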
reply
Retric
2 months ago
[-]
There was a surprising amount of useful OCR happening in the 70’s.

High error rates and significant manual rescanning can be acceptable in some applications, as long as there’s no better alternative.

reply
GuB-42
2 months ago
[-]
I find that modern OCR, audio transcription, etc... are beginning to have the opposite problem: they are too smart.

It means that they make far fewer mistakes, but when they do, the mistakes can be subtle. For example, if the text is "the bat escaped by the window", a dumb OCR may write "dat" instead of "bat". When you read the resulting text, you notice it and, using outside clues, recover the original word. A smart OCR will notice that "dat" isn't a word and may change it to "cat", and "the cat escaped by the window" is a perfectly good sentence; unfortunately, it is wrong and confusing.

reply
devilbunny
2 months ago
[-]
Thankfully, most speech misrecognition events are still obvious. I have seen this in OCR and, as you say, it is bad. There are enough mistakes in the sources; let us not compound them.
reply
taeric
2 months ago
[-]
I'm not sure I can sign on to this. In particular, this sounds kind of like an indictment of many algorithms. But how many were there? And did any go on to give good results?

Consider: OCR was a very new field, such that a lot of the struggle was getting data into a place where you could even try recognition against it. It should be no surprise that they were not able to succeed that often. It would be more surprising if they had had a lot of different algorithms.

reply
alightsoul
2 months ago
[-]
Amazing. I wonder how fast it would be on a modern computer.
reply
Hydration9044
2 months ago
[-]
+1. I wonder which is faster when compared to OpenCV's findContours.
reply
mrheosuper
2 months ago
[-]
I still deal with <128 KB RAM systems every day.
reply
DaSHacka
2 months ago
[-]
Ah, Mac user?
reply
mrheosuper
2 months ago
[-]
more like STMicroelectronics user
reply
weareregigigas
2 months ago
[-]
I too need a coffee in the morning before I can do anything
reply