I had a look for this and it turns out it's slightly mis-described there: it's not a window counter, it's GCRA (Generic Cell Rate Algorithm), a leaky-bucket algorithm. Code here: https://github.com/redis/redis/blob/unstable/src/gcra.c
The code comments say it was heavily influenced by https://github.com/brandur/redis-cell by Brandur Leach.
It's a neat algorithm (I just learned about it today): it only needs to store a single integer for each rate-limited key, the "Theoretical Arrival Time" (TAT), i.e. the time at which the bucket would next be empty.
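To see why one integer per key is enough, here's a minimal in-memory sketch of GCRA in Python. It assumes a plain dict as storage and a caller-supplied clock in integer milliseconds; `GCRALimiter`, `emission_interval`, and `burst` are hypothetical names, not the Redis command's API or the internals of gcra.c, which may parameterize things differently.

```python
from dataclasses import dataclass, field


@dataclass
class GCRALimiter:
    """In-memory GCRA sketch; all times are integer milliseconds."""

    emission_interval: int  # ms between requests at the sustained rate (period / rate)
    burst: int  # requests allowed back-to-back when the key is idle
    tat: dict[str, int] = field(default_factory=dict)  # key -> Theoretical Arrival Time

    def allow(self, key: str, now: int) -> bool:
        # The bucket "drains" in real time: a TAT in the past means an empty bucket.
        tat = max(self.tat.get(key, now), now)
        new_tat = tat + self.emission_interval
        # Reject if accepting this request would push the TAT more than
        # `burst` emission intervals ahead of the current time.
        if new_tat - now > self.emission_interval * self.burst:
            return False  # stored TAT stays unchanged on rejection
        self.tat[key] = new_tat
        return True
```

With `emission_interval=100` and `burst=3`, three calls at `now=0` are allowed and the fourth is rejected; one more slot frees up every 100 ms. The only per-key state is the stored TAT, and `(tat - now) / emission_interval` is roughly how full the bucket currently is.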
One, it would be cool to be able to embed it directly into applications, similar to SQLite.
Two, the HA story is so much more complicated than it should be. I totally acknowledge that concurrency and distributed computing are hard, but it should not require reading heaps of documentation and understanding two entirely separate multi-node approaches, only to figure out that there are lots of subtle strings attached that make it impractical for many applications.
I'm asking as a non-webdev who never quite got what Redis actually does, but would love to learn.
It would also be useful because of the ability to switch modalities. When running a multi-node service, you can use Redis to share data between nodes and Redis pub/sub as a communication bus. If you wanted to support a simple single-node configuration too, it wouldn't need to be a special case; it could go through the same mechanism, just with an embedded Redis instance.
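A rough Python sketch of the "same mechanism, different backend" idea, assuming the app codes against a small pub/sub interface of its own. `PubSub`, `InProcessPubSub`, and `start_app` are hypothetical names, and the in-process class stands in for an embedded Redis rather than being one. A Redis-backed class exposing the same two methods would be swapped in for multi-node deployments; it's omitted to keep the example dependency-free.

```python
from collections import defaultdict
from typing import Callable, Protocol

Handler = Callable[[str], None]


class PubSub(Protocol):
    """Hypothetical interface the application codes against."""

    def publish(self, channel: str, message: str) -> int: ...
    def subscribe(self, channel: str, handler: Handler) -> None: ...


class InProcessPubSub:
    """Single-node backend: delivers messages synchronously within one process."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Handler]] = defaultdict(list)

    def publish(self, channel: str, message: str) -> int:
        handlers = list(self._subs.get(channel, ()))
        for handler in handlers:
            handler(message)
        # Like Redis PUBLISH, return how many subscribers received the message.
        return len(handlers)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subs[channel].append(handler)


def start_app(bus: PubSub) -> list[str]:
    # Application code only sees the interface, not the backend.
    received: list[str] = []
    bus.subscribe("events", received.append)
    return received
```

The single-node path is then just `start_app(InProcessPubSub())`, with no special-casing in the application code itself.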
It's pretty similar to SQLite: being able to embed more or less a complete storage engine into your app can be very convenient and powerful.
If you use multiple nodes, then you probably want your Redis lifecycle not to be tied to the application lifecycle.
The entire value of Redis IMO is that it ISN'T inside your normal application, but rather some shared storage that all nodes can use to coordinate and that survives deploys, but that provides more ergonomic data structures than SQL databases. Caches are only one type of such shared data, but things like feature flags, circuit breakers and rate limiters are also super common (and super useful).
However, Mnesia seems like it is quite a bit more of a complete distributed database engine than Redis. To me the nicest thing about Redis is just the convenience of what it offers: very fast data structures, serialized, optimized (at least by default) for cases where speed is more important than durability. It is simple on many levels and somewhat constrained in scope. Mnesia seems to be aiming more generally in the distributed database category.
So how do you feel they compare?
It's similar in effect to what busybox does to shell utilities, though the motives are different.
- HyperLogLog, bloom filter, other probabilistic data structures
- Geospatial operations on stored points and polygons
- Expiring keys, for creating caches
These aren't in most standard libraries, and the Redis implementations tend to be fast, robust and well understood.
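For comparison, this is roughly what a hand-rolled version of just the "expiring keys" item looks like in Python. It's a minimal sketch: `ExpiringCache` is a hypothetical name, TTLs are in seconds, and expiry is lazy (checked on read only). Redis also expires keys actively in the background and handles memory limits and eviction, none of which is covered here.

```python
from __future__ import annotations

import time
from typing import Callable, Generic, TypeVar

V = TypeVar("V")


class ExpiringCache(Generic[V]):
    """Minimal in-process sketch of keys with a time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._data: dict[str, tuple[V, float]] = {}

    def set(self, key: str, value: V, ttl: float) -> None:
        # Store the value together with its absolute expiry time.
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> V | None:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazy expiry: evict on access
            return None
        return value
```

Even this small version glosses over thread safety and unbounded growth for keys that are never read again, which is part of the argument for reusing a well-tested implementation.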
Every language that can talk to Redis most likely has a library for it, and it probably works much better with the rest of the application than "embedded Redis". If it doesn't, it probably has a C FFI, and there are "fast, robust and well understood" implementations in C.
(I'm not personally sold on embedded Redis myself, but the question was "Aren’t your own programming language’s constructs much more well-defined / understood?")
Embedding would make local dev/CI integration testing convenient.
Embedding replicated Redis with each application instance would give you HA benefits without the infra-management complexity.
Embedded Redis (even via local RPC) is still going to be faster than a lot of languages' or frameworks' built-in data structures. Large array operations in, say, Python are gonna be slower than RPCing to Redis (assuming the data structures are built gradually rather than all at once); to beat Redis you'd have to use numpy or something, which is definitely preferable, but is extra work if your app already uses Redis for other things.
Just like choosing SQLite over e.g. LMDB or RocksDB, embedded Redis would be a nice future proofing option for small apps during the prototype phase; less would have to be changed to move Redis out of the app than if a different cache or persistence service were chosen.
A key-value database, or key-value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary.
it's not a relational database.
It’s the same use case with a different api.
A typical (meaningful) example might be communication between threads or actors in a single process, or idempotent tests.
As with SQLite, an external xxx that does this for you is certainly better, etc. but it’s convenient sometimes, to have an application that doesn’t go “now before you run this install Postgres…”.
It’s seldom useful for a web app where you control everything.
I've found myself wanting this on several occasions too, i.e. wanting all my Rust backend processes (k8s pods) to have some minimal shared state without having to spin up a Redis cluster. I've talked to Claude about it a couple of times, and it descends into something like, "you gotta use Raft or CRDTs, and pick 2 out of 3 from CAP". Which honestly seems pretty fair, and indicates to me that I'm dreaming of something magical.
Nonetheless, it is nice to hear someone else asking for this. If this is indeed feasible (even if simple/limited), then I'd be interested to try it.
Spinning up an in-memory (no persistence) Redis cluster in your k8s should be easy enough, hopefully?
If you haven't come across Kvrocks yet, it may be worth a look: https://github.com/apache/kvrocks https://kvrocks.apache.org/ . It's a database with a Redis-compatible wire protocol, but the database is stored on disk. This means your working set is not limited by RAM and can be a few orders of magnitude larger! On modern SSDs this is still very fast. I think it improves the durability story as well. But the big win is the orders of magnitude larger database space.
As I've been improving my side project https://totalrealreturns.com/ recently I've ended up using both Redis and Kvrocks together. Redis is great for small global state that needs to be super fast. Kvrocks is great for larger bulk data storage (large precomputed datasets), but also supports a lot of the Redis data structures as well as Lua scripts.
For example if you use it for session storage, you can't have your application read from a random instance that may or may not contain the session.
Now you could solve this specific case by sharding by prefix, or by querying all instances, but then you still do not have high availability: if the instance a specific session is on is down, these users cannot authenticate. At that point you’re better off with a single instance.
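To make the availability point concrete, here's a small Python sketch of client-side sharding over independent, non-replicated instances. It routes by a hash of the session ID rather than a prefix (same idea, different routing key), and `ShardedSessions` and its `down` set are hypothetical stand-ins for real Redis instances and network failures.

```python
from __future__ import annotations

import hashlib


class ShardedSessions:
    """Sessions spread over independent, non-replicated stores (hypothetical)."""

    def __init__(self, shard_count: int) -> None:
        self.shards: list[dict[str, str]] = [{} for _ in range(shard_count)]
        self.down: set[int] = set()  # indices of unreachable instances

    def shard_for(self, session_id: str) -> int:
        # Stable hash routing: a given session always maps to the same instance.
        digest = hashlib.sha256(session_id.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def _store(self, session_id: str) -> dict[str, str]:
        idx = self.shard_for(session_id)
        if idx in self.down:
            # No replica holds this key, so there is nowhere else to look.
            raise ConnectionError(f"instance {idx} is down")
        return self.shards[idx]

    def put(self, session_id: str, data: str) -> None:
        self._store(session_id)[session_id] = data

    def get(self, session_id: str) -> str | None:
        return self._store(session_id).get(session_id)
```

Sharding spreads the load, but when one instance goes down, every session routed to it becomes unreadable, while sessions on the other instances keep working.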
You, obviously, don't commit important data only to a session that you can lose, if the application does not allow it.
We use Redis as infrastructure: to route events and as a cache.
For us redis could go down and we would merely see a degradation of our service with no data loss.
I recommend using redis like that. And then use a database that supports transactions for real data problems.
But we are different. And that's OK.
But that requires running multiple instances, which in turn requires sharing the data across all replicas.
Just because it works for your use case right now doesn’t mean there isn’t room for improvements to support others too.
Oh good, then you don't need to do any of the stuff that you suggested to do
The app would look up in both databases. If it exists in any, there would be a session.
This is strictly different from partitioning, which I think you are mixing it up with.
Partitioning is for performance, not HA.
And if you find the session with differing values in both databases, how do you know which one is up-to-date?
You need an algorithm to pick which data is right, such as electing a master instance.
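A short Python sketch of the "look up in both" scheme shows two failure modes: divergent values with nothing to break the tie, and a deletion (e.g. a logout) applied to only one store that the other store then resurrects. Plain dicts stand in for two independent Redis instances; `read_from_both` and `ConflictError` are hypothetical names.

```python
from __future__ import annotations


class ConflictError(Exception):
    """Raised when independent stores disagree and nothing says which is newer."""


def read_from_both(a: dict[str, str], b: dict[str, str], key: str) -> str | None:
    va, vb = a.get(key), b.get(key)
    # "If it exists in any, there is a session."
    if va is None:
        return vb
    if vb is None:
        return va
    if va == vb:
        return va
    # Without versions, timestamps, or a designated primary there is no
    # principled way to choose; this is what replication and failover provide.
    raise ConflictError(f"{key!r}: {va!r} vs {vb!r}")
```

For example: a session is written to both stores, `b` becomes unreachable while an update lands only in `a`, and later reads hit a conflict. Worse, if the user then logs out and the delete also only reaches `a`, reading from both still finds the stale session in `b`.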
And that brings us back to the original discussion: to manage sessions (unlike caches) in a highly available way, you need to setup HA (or reimplement it, which obviously is a bad idea). You can't read round robin from multiple non-HA instances.
There is a whole slew of downstream things you need to take into consideration.
With millions of users, high availability is critical for this functionality.
This is entirely different than what Redis is and tries to solve.
Sqlite is embedded. It's not a distributed SQL. Redis is a distributed data structure store and concurrency primitive. These are worlds apart.
> HA story is so much more complicated than it should be
It is precisely as complicated as it needs to be. You don't want data loss.
If you're in the business of high available fault tolerance, you read the manual and learn how to Redis.
A high availability protocol should not leak into the client. It should be able to discover other nodes. It should not land in broken states so easily. It should not limit the number of writers. It should not error during failover.
Are these hard problems? Yes. Should we just accept that things are hard because that’s how the gods have given them to us? No.
But most of the cloud providers now offer Valkey because of the license changes. Of course, cloud providers not offering Redis was the intention of the license change from the Redis point of view. So mission accomplished for Redis.
But the flip side of course is that if you want to deploy on standard infrastructure rather than self hosting Redis, Valkey is now the easy, low risk path that probably should be the default for most companies that target AWS, Azure, GCP, etc. Same with Elasticsearch vs. Opensearch and a few other products where the community forked because of license changes.
Mentioning Elasticsearch because I know people in both communities and I'm deeply familiar with the stack. A few years on, Opensearch has taken a lot of the momentum from Elasticsearch.
https://aws.amazon.com/blogs/database/reduce-your-amazon-ela...
I feel like we're using about 1% of its features at this point - really just as a fast K/V store - so it would be easy to switch if needed, but I can't see a case where we would.
So we’ve stayed with Valkey.
https://redis.io/blog/diving-deep-into-rediss-new-array-data...
This is awesome!
And arrays look great too. Lots to play with.
The website looks like openclaw's website.