Fine-grained caching strategies of dynamic queries
76 points | 11 months ago | 4 comments | jensrantil.github.io | HN
m104
11 months ago
One aspect of this type of problem I missed from the article is whether the data mutations were applied evenly across transaction time. Data sets like these tend to be very active for recent transactions, while the updates fall off quickly as the data ages. If that's the case, applying a single query caching solution may not be a good fit and may always suffer from major tuning/balance issues.

If the data is in fact updated with clear hot/warm/cold sets, caching the cold sets should be extremely effective, the warm set moderately effective, and it may not even be worth caching the hot set at all, given the complexity proposed. Additionally, you should be able to offload the cold sets to persistent blob storage, away from your main database, and bulk load them as needed.
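
A rough sketch of that tiering, for illustration only. The age thresholds, the TTL values, and the names `tier_for` and `CACHE_TTL_SECONDS` are all assumptions, not something from the article; real boundaries would come from measuring how quickly updates fall off.

```python
from datetime import date

# Assumed boundaries; tune them from observed update rates.
HOT_DAYS = 7
WARM_DAYS = 90

# Assumed policy: 0 means "do not cache", None means "cache without expiry".
CACHE_TTL_SECONDS = {"hot": 0, "warm": 300, "cold": None}


def tier_for(tx_date: date, today: date) -> str:
    """Classify a transaction date as hot, warm, or cold by its age in days."""
    age_days = (today - tx_date).days
    if age_days < HOT_DAYS:
        return "hot"
    if age_days < WARM_DAYS:
        return "warm"
    return "cold"
```

A query spanning several tiers could then be split by date range, with each part cached (or not) under its own policy.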

Finally, it can be faster and simpler to keep track of deltas to cold sets (late mutations that happen to "invalidate" the previously immutable data), by simply storing those updates in a separate table, loading the cold set data, and applying the delta corrections in code as an overlay when queried. Cron jobs can read those deltas, and fold them back into the cold set aggregations, making clean validated cold set data again.
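
To make the overlay idea concrete, here is a minimal sketch. It assumes the cold set is a mapping from transaction id to amount and that each delta is an `(id, new_amount)` pair, with `None` marking a late deletion. The names `overlay` and `fold` are hypothetical, and plain Python structures stand in for the real tables.

```python
def overlay(cold_rows, deltas):
    """Return the cold snapshot with late corrections applied on top.

    cold_rows: dict of id -> amount (the previously immutable data).
    deltas: list of (id, new_amount) in arrival order; None deletes the row.
    The inputs are not modified.
    """
    merged = dict(cold_rows)
    for row_id, new_amount in deltas:
        if new_amount is None:
            merged.pop(row_id, None)
        else:
            merged[row_id] = new_amount
    return merged


def fold(cold_rows, deltas):
    """Periodic job: bake the deltas into a new snapshot and empty the delta log."""
    return overlay(cold_rows, deltas), []


cold = {"tx1": 100, "tx2": 250}
late = [("tx2", 240), ("tx3", 15), ("tx1", None)]
current = overlay(cold, late)    # what a query sees right now
new_cold, new_late = fold(cold, late)  # what the cron job would store
```

Queries stay cheap as long as the delta list is short, which is why folding it back periodically matters.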

Great article, BTW! There are entire database technologies and products dedicated to addressing these use cases, particularly as the data sets grow very large.

macca321
11 months ago
About 15 years ago I implemented "cache namespacing" for memcached, where you build a final cache key for a stored item (e.g. "profile_page") by doing an initial multiget cache query for all the "namespace" version values (e.g. "user_123", "team_456" might be needed for "profile_page"), which you combine together as a prefix for the final cache key.

You can then invalidate any final cache key that uses one of the namespaces by incrementing the namespace key.
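
Roughly, the idea looks like the sketch below. `DictCache` is a hypothetical in-memory stand-in, not a real memcached client, and the `ns:` key prefix and the helper names `namespaced_key` and `invalidate` are assumptions made for illustration.

```python
class DictCache:
    """Hypothetical in-memory stand-in for a memcached-style client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def get_multi(self, keys):
        # Return only the keys that are present, like a multiget.
        return {k: self._data[k] for k in keys if k in self._data}

    def set(self, key, value):
        self._data[key] = value

    def add(self, key, value):
        # Store only if the key is absent.
        self._data.setdefault(key, value)

    def incr(self, key):
        # Simplification: a missing counter starts at 0 here.
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


def namespaced_key(cache, name, namespaces):
    """Build the final cache key from the current version of each namespace."""
    ns_keys = [f"ns:{ns}" for ns in namespaces]
    versions = cache.get_multi(ns_keys)  # one round trip for all versions
    for k in ns_keys:
        if k not in versions:
            cache.add(k, 1)              # first use: start the namespace at 1
            versions[k] = cache.get(k)
    prefix = "|".join(f"{k}={versions[k]}" for k in ns_keys)
    return f"{prefix}|{name}"


def invalidate(cache, namespace):
    """Bump the namespace version so every key built from it is abandoned."""
    cache.incr(f"ns:{namespace}")


cache = DictCache()
spaces = ["user_123", "team_456"]
k1 = namespaced_key(cache, "profile_page", spaces)
cache.set(k1, "rendered page v1")
invalidate(cache, "team_456")
k2 = namespaced_key(cache, "profile_page", spaces)  # differs from k1, so a miss
```

Nothing is deleted on invalidation: the old entry under `k1` simply stops being looked up and is left for the cache to evict.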

I haven't come across this technique mentioned elsewhere since, but it's very useful.

See the namespaces section in the now 404ing memcached FAQ https://web.archive.org/web/20090227062915/http://code.googl...

I guess nosql, edge caching and materialised views make it less applicable than it used to be (when inelastically scaling single/replicated SQL instances were the only game in town and taking load off them was vital).

Or is this technique now a first-class feature of various cache client SDKs?

anonymoushn
11 months ago
Edge caching often has to support purge-by-tag, which ends up working similarly.
NickM
11 months ago
This example is a great use case for partial incremental view maintenance systems like ReadySet: you automatically get something like the “prepopulating the cache” section (toward the end of the blog) while only caching the data the application is using, and avoiding the need to manually implement any sort of invalidation logic.

(Disclaimer: I used to work for them, but don’t anymore. It’s all available for free on GitHub though for anyone interested: https://github.com/readysettech/readyset)

bawolff
11 months ago
Seems kind of like something CouchDB would be good at.