All of the USDS work is published with "No Copyright".
The SAT filters, however, still do not support incremental building, which is one of Bloom filters' fun features when you use them in distributed databases (you can build N of them and then OR them together to get a single one).
I imagine they will still be incredibly useful where you can iterate over them and do the OR the old-fashioned way, but at higher accuracy for the same size.
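To make the "build N of them and then OR" point concrete, here is a minimal sketch of a Bloom filter merge. It is a toy, not any particular database's implementation: the `BloomFilter` class, its parameters, and the SHA-256-based hashing are all assumptions made for illustration. The only requirement for the merge is that both filters share the same size and hash functions.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; the bit array is stored in a single Python int."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index
        # (an illustrative choice, not a tuned hash scheme).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other):
        # OR-merging is only valid when both filters use identical parameters.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share size and hash count")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


# Two shards build their filters independently, then merge with one OR.
shard_a = BloomFilter()
shard_a.add("alice")
shard_b = BloomFilter()
shard_b.add("bob")
merged = shard_a.union(shard_b)
```

The merged filter answers membership for everything added on either shard, with no false negatives; the false-positive rate is that of a single filter holding the combined set.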
- Google Scholar turned up this link to a PDF of one of the papers cited in the repo [1]
[0] https://en.wikipedia.org/wiki/Cuckoo_filter
[1] http://t-news.cn/Floc2018/FLoC2018-pages/proceedings_paper_4...
Why is this an obvious application? How does this application benefit from a "very efficient" first pass? Just the boarding process on an airplane takes 20-30 minutes; you can easily check the entire passenger manifest in an error-free way in much less time than that. People have to buy their tickets before the boarding process begins.
Your post is really weird to me, talking about boarding times? You start out skeptical of the example, and I'm confused how you think this is anything but a fine example. Ultimately there's some service running in the cloud somewhere that needs to have checks run against it. 2.9m people fly a day in the US, and whether the servers doing that work can do it efficiently or whether they do it in a dogshit bad manner seems like an obvious concern to me? https://www.faa.gov/air_traffic/by_the_numbers
I suspect the actual usage for this is in much broader, higher-traffic systems: things that watch sizable chunks of the internet for patterns and traffic. But checking passengers against no-fly lists sounds like a pretty reasonable example use to me, and the criticism seems off base & weird in a number of dimensions that straight up don't make sense.
Even if we check them at both ends, and effectively double the load, that's only ~100 reqs/second. A single machine would happily handle that.
That's a strange assumption. The airports that have significant traffic are operating 24 hours.
Under the assumption that airports close between 11 and 6, there would be no such thing as a redeye flight.
To me, 47 or 37 req/s seems like a fantastically immaterial difference. It's just not a big enough change in magnitude to really affect the situation.
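For what it's worth, the arithmetic is easy to sketch. This assumes the 2.9m passengers/day figure quoted upthread, one check per passenger, and a 17-hour operating day as my reading of the "closed between 11 and 6" assumption; the variable names are made up for illustration.

```python
PASSENGERS_PER_DAY = 2_900_000  # FAA figure quoted upthread
SECONDS_PER_HOUR = 3600

# Average checks per second if traffic is spread over the whole day.
rate_24h = PASSENGERS_PER_DAY / (24 * SECONDS_PER_HOUR)

# Same load squeezed into 17 hours (assumed: airports closed 11pm to 6am).
rate_17h = PASSENGERS_PER_DAY / (17 * SECONDS_PER_HOUR)

# Checking at both ends of the flight doubles the load.
rate_17h_both_ends = 2 * rate_17h

print(f"24h: {rate_24h:.1f} req/s")
print(f"17h: {rate_17h:.1f} req/s")
print(f"17h, both ends: {rate_17h_both_ends:.1f} req/s")
```

Whichever operating window you pick, the average lands in the tens of requests per second, and doubling it stays under about 100.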
Accurate qualm, and technically correct. Personally, I'd try to take a more liberal-minded approach to the question of what efficient set membership might be good for.
As FridgeSeal points out, both numbers are very small, but that's not a reason you'd want to set up an inaccurate triage system on top of the accurate one. If you don't have very much work to do, you don't need to invest much in optimizing it.
It is likely true that "most airports" are not operating 24/7, but how is that relevant? It could be just as true that "most airports" don't serve commercial flights at all. The airports that have a lot of passengers are operating 24/7. We're talking about a metric assessed per passenger.
The only size problem is with non-ordered MPHFs, where you also need to look the index up through an index order table.
The SAT approach is cute, but doesn't scale. It might have better runtime costs, as you can spare one additional table lookup. Efficient MPHFs are miles better at construction time.
The second paper was originally from a conference; I found this link to it through Google Scholar (also listed in another comment): http://t-news.cn/Floc2018/FLoC2018-pages/proceedings_paper_4...