Also, the privacy and the finding are in direct opposition to each-other, which isn’t always a comfortable system dynamic.
On the one hand, a simple hash of ESSIDs near you, if you take all of them, is highly unlikely to ever match - radio strength varies and you’ll see stuff on the edges of your device pop in and out if you look at radio traces. So you need to limit the list.
However, if you limit the list, to say 3 ESSIDs, you’re well into rainbow table attack territory - if there are 100mm WiFi access points in the world, then you need 100mm^3 hashes - doable - and if you rough geoloc them first so that you’re not hashing out stuff more than a mile away from other APs, you’re down to “very manageable”.
At the same time, the question of “which 3” means that it’s going to be hard to ever get the same list, or at least you’re not in the one 9s territory of loc matching.
To do this without some sort of either trusted server or some sort of group key sharing (and therefore a totally different threat model) you’ll need to get some sort of location-aware hashing together, and I think also you’ll want to be able to get some sort of data from the local APs that’s not easily accessible elsewhere. Not sure what that is off the top of my head, but I bet there’s something in the WPS spec that you could hang off of.
So if you had the ability to be like “my hash puts me somewhere within this square (area) and only those of us here know that the secret salt for this minute is XXX” then I think you’d get back to the original goals of the project.
I bet that’s doable! Looking forward to v2
I thought about geo largely because it radically changes the order of magnitude of work necessary; it lets you segment ‘possible’ subsets of APs down to sets of say 100, not millions, and changes the combinatorics. A side effect is knowing a rough spatial location.
Off the top of my head, I don’t think that epochs alone make a big difference. If I want to see if you’ve been somewhere, or tell you I’m somewhere, why not take the 3-4 networks you mentioned, and forward hash them for the next million epochs?
Or, more ambitiously, why not take 3-4 networks each from the geo indexed clusters available at https://wigle.net/ and do the forward and backward epochs, letting me track where you’ve been and pretend to be near you any time in the future?
Wigle reports 1.7bn networks; a rough look at a suburban street near me shows most places have 10 in a reasonable range boundary; so call it 200mm “locations” with 128 segmented hashes, 250 billion hashes per epoch — I think we’re in the “seconds per epoch” range for a reasonable compute heavy server to cover the entire space.
Upshot - I think the salting needs to be something local / not predictable or stored remotely.
Hopefully these comments hit you right - I like the idea a lot - and I don’t fully understand the system - but as I understand it, the system does not offer privacy — I could replay any phone’s hashes against a system that cost a few dollars to reconstruct your location and time, if my understanding is correct.
Edit: Maybe some preshared group hash (kinda beats the point), or combining multiple modalities (eg bluetooth, shared interests) or some kind of proof of work token could help mitigate some of these issues. I guess anything to reduce the time to attack helps in this case? Or anything that really pins down environment + time, like what smath described in his comment. In essence, the core idea of minhash + lsh works and it doesn't limit you to just wifi networks. The key is being able to grab a fingerprint that is unique enough and different enough each epoch. Wifi networks are just easy enough to grab vs something more low level like an APs beacon timing interval jitter or something.
I think this is the issue, is these datasets are out there and at least big tech companies have them since they're used to assist with GPS. I was about to post the same thing as above but saw vessenes beat me to it.
Without thinking about it too hard, the two directions I see are either making observations of the environment in real-time that is only relevant at that time (IE sniffing actual wireless frames, even if they're encrypted and making observations on them, however, most devices won't let you go into promiscuous mode and do this) or encrypting the messages in flight so only participants can decrypt them (IE a model like the signal protocol with E2E message encryption).
Anyways, this is a cool approach, but that risk occurred to me as well about the ability to just brute force the entire dataset to decode every location.
(disclaimer: I am co-inventor at a previous employer, I don't get royalties for it, just reporting)
It opens you up to legal risk for knowingly infringing patents. If possible you never should look at a patent.
GPS does not work at all. I've always thought using the WiFi access points that have been installed on the underground could be a great addition to something like Citymapper to figure out where you are located even when there is no GPS.
I thought "environmental fingerprints" were referring to something more elaborate, like a fingerprint of the local audio environment, or using the accelerometer to measure the local spacetime curvature.