Pipeline: select area and zoom level, split the region into mercantile tiles, run each tile with the prompt through a VLM, convert predicted bounding boxes to geographic coordinates (WGS84), and render the results back on the map.
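The tile-pixel-to-WGS84 step can be sketched roughly like this (pure stdlib; `tile_pixel_to_wgs84` and `bbox_to_wgs84` are hypothetical names for illustration, not the project's actual code):

```python
import math

TILE_SIZE = 256  # standard slippy-map / web-mercator tile size in pixels

def tile_pixel_to_wgs84(tx, ty, zoom, u, v):
    """Convert a pixel (u, v) inside tile (tx, ty) at a given zoom to
    WGS84 (lon, lat), via the inverse Web Mercator projection.
    Hypothetical helper mirroring the bbox-conversion step."""
    n = TILE_SIZE * (2 ** zoom)      # world width in pixels at this zoom
    px = tx * TILE_SIZE + u          # global pixel x
    py = ty * TILE_SIZE + v          # global pixel y
    lon = px / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * py / n))))
    return lon, lat

def bbox_to_wgs84(tx, ty, zoom, box):
    """Map a predicted pixel box (x0, y0, x1, y1) within one tile to
    geographic corners (west, south, east, north)."""
    x0, y0, x1, y1 = box
    west, north = tile_pixel_to_wgs84(tx, ty, zoom, x0, y0)
    east, south = tile_pixel_to_wgs84(tx, ty, zoom, x1, y1)
    return west, south, east, north
```

Sanity check: the center pixel of tile (0, 0) at zoom 0 should land on (0°, 0°), and its top-left corner on (-180°, ~85.051°), the Web Mercator latitude limit.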
It works reasonably well for distinct structures in a zero-shot setting; occluded objects are still better handled by specialized detectors such as YOLO models.
There is a public demo and no login required. I am mainly interested in feedback on detection quality, performance tradeoffs between VLMs and specialized detectors, and potential real-world use cases.
The AI struggles a bit with less generic terms. It correctly realised the Radcliffe Camera was a building, but tagged another building as well, and guessed wrong for the Balliol Library (I guess the models haven't seen it from above). On the other hand, I was pleased that it tagged narrowboats and didn't tag them as fish when I asked it to find fish on that tile...
Once I figured out how to use the UI, I did two scans. On the first one I had to zoom in before the identification boxes popped up; at first I thought it hadn't done anything.
For the second scan I picked a local aviation museum with a mix of helicopters, unusual planes, cars, buildings, and other equipment. I was surprised to see everything identified correctly, though it missed a single helicopter.
I'd love a little bell or notification when the scan completes, as I hit "scan", switched to a different tab, and then forgot I was waiting.
https://github.com/nabetse00/webnova_submision/blob/main/Pyt...
And for tracking, I didn't indicate a frequency; it's not per minute, but say hourly.
https://www.planet.com/pulse/illuminate-the-dark-fleet-with-...
Cool concept though.