It got me thinking it'd be cool to track this somehow, so I built a website! I take a sidewalk livestream, feed it into a YOLO model for people tracking, then send a frame of each detected person to Gemini 2.0 Flash, which returns structured JSON about each person's clothing and whether they're holding an umbrella. I also had fun making the site look like a TV weather channel.
I showed some friends this project and someone mentioned how the legendary Tasks xkcd comic (https://xkcd.com/1425) is out of date now. If you want to check whether a photo has birds in it (or if someone is holding an umbrella), you can just ask an inexpensive vision model for JSON.
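To illustrate the "just ask for JSON" step, here is a minimal sketch of validating one reply from the vision model before trusting it. The post doesn't give the site's real schema, so the field names (`top`, `bottom`, `holding_umbrella`) and the `PersonReport` type are assumptions, and the call to the model itself is left out.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonReport:
    # Hypothetical fields; the real schema used by the site is not given.
    top: str
    bottom: str
    holding_umbrella: bool


def parse_person_report(raw: str) -> Optional[PersonReport]:
    """Return a PersonReport, or None if the model's reply is not usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    top = data.get("top")
    bottom = data.get("bottom")
    umbrella = data.get("holding_umbrella")
    # Reject missing or wrongly typed fields instead of guessing at them.
    if not (isinstance(top, str) and isinstance(bottom, str)
            and isinstance(umbrella, bool)):
        return None
    return PersonReport(top=top, bottom=bottom, holding_umbrella=umbrella)
```

Dropping malformed replies rather than repairing them keeps a bad response from skewing the counts; whether the real site does this is not stated.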
1vLIazn0SQog6ol/6SZBv3hrHT7ADILN78/4KaCM5ShI1E2OQViqjnfCZtJGZCvmZC843+OuzgKMQ9+whTW1XoVxN0cn6XQTCIDTkxjFXZmNvAi65B73bie4ir1kJ5CcQ2Kikxr286a0IhNGZSBvx5+BQYBeXb265b9YxCiboARXdUb81fFV5kRFGUroYKtdSVaiDLiWeryTijWmOFtL2Q==
https://encode-decode.com/encryption-functions/ (`aes-128-cbc`)
Secret hint: three word catch phrase of the shop, all lower case
For a while I thought about recording the cars driving down my quiet street and categorising them by colour and make. I never got around to running the length of camera cable needed or finding a good weatherproofing solution.
I didn't realise you could get Gemini to respond that fast. We live in the science-fiction times.
What's amusing to me is that if a mugger were to mug someone in front of the camera, your system would happily report what they're wearing, blissfully ignorant of the situation.
Consider open sourcing this mangled up solution!
I now luckily have a window that looks out into the city and I use what other people are wearing as an indicator for what to wear that day. Definitely helpful on the marginal days where it's maybe shorts, maybe pants, maybe light jacket, maybe sweater-weather. Temperature/wind/humidity tell most of the story, but there's cloud cover, wind direction, morning-to-night temperature swings, etc, that make the decision a bit more iffy.
Cool project! I may need to look into doing something similar.
What privacy standard is being broken here?
I thought I'd let you know that the web page is rotated to the right on my desktop monitor so I have to tilt my head or drag the browser window to the landscape-oriented monitor. I'm guessing that this is an optimisation for mobile devices but I doubt I'm the only one with a portrait-oriented monitor (viewport dimensions are currently 1200×1779).
The song: https://www.youtube.com/watch?v=_whvVXX0hCk A translation of the lyrics: https://lyricstranslate.com/en/coton-ouate-sweater.html
What would happen if someone geolocates your camera and just plants a bunch of umbrellas in the frame? Does the counter require the umbrella to be held by a human? What if the same person walks past the camera multiple times? Are they considered unique counts, or are you recognizing people and logging that?
TL;DR: how robust is your system against mischief?
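One way to frame the double-counting question is sketched below. It is not the author's code: it assumes the tracker assigns each person an integer track ID, and `UmbrellaCounter` is a hypothetical helper that counts each ID once.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class UmbrellaCounter:
    # Track IDs already counted (assumed to come from the people tracker).
    seen_tracks: Set[int] = field(default_factory=set)
    people: int = 0
    umbrellas: int = 0

    def record(self, track_id: int, holding_umbrella: bool) -> bool:
        """Count a track once; return True only if it was new."""
        if track_id in self.seen_tracks:
            return False
        self.seen_tracks.add(track_id)
        self.people += 1
        if holding_umbrella:
            self.umbrellas += 1
        return True
```

Under this scheme, a person who leaves the frame and walks past again would most likely get a new track ID and be counted twice, unless some form of re-identification is added. Since the post says only frames of detected people are sent to the model, an umbrella planted with nobody holding it would presumably never reach a counter like this.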