> This paper presents Amazon MemoryDB, a fast and durable in-memory, cloud-based storage service. A core design choice behind MemoryDB is to decouple durability from the in-memory execution engine by leveraging an internal AWS transaction log service. In doing so, MemoryDB is able to separate consistency and durability concerns from the engine, allowing it to scale performance and availability independently. To achieve that, a key challenge was ensuring strong consistency across all failure modes while maintaining performance and full compatibility with Redis. MemoryDB solves this by intercepting the Redis replication stream, redirecting it to the transaction log, and converting it into synchronous replication. MemoryDB builds a leadership mechanism atop the transaction log which enforces strong consistency. MemoryDB unlocks new capabilities for customers that do not want to trade consistency or performance while using the Redis API, one of the most popular data stores of the past decade.
[0] https://assets.amazon.science/e0/1b/ba6c28034babbc1b18f54aa8...
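The "leadership mechanism atop the transaction log" is worth dwelling on. Here's a minimal sketch of how leader leases can be built on a log that supports conditional append; this is my reading of the design, not the paper's actual protocol, and every name below is hypothetical:

    import time
    import uuid

    LEASE_SECONDS = 5  # hypothetical lease length

    class ConditionalLog:
        """Stand-in for a transaction log whose append succeeds only
        if the caller names the current tail entry (effectively a
        compare-and-swap on the log tail)."""
        def __init__(self):
            self.entries = []  # list of (entry_id, payload)

        def tail_id(self):
            return self.entries[-1][0] if self.entries else None

        def append_if_tail(self, expected_tail, payload):
            if self.tail_id() != expected_tail:
                return None  # lost the race: someone appended first
            entry_id = uuid.uuid4().hex
            self.entries.append((entry_id, payload))
            return entry_id

    def try_acquire_leadership(log, node_id):
        # Append a lease entry conditioned on the observed tail. Two
        # nodes racing for leadership cannot both succeed, so the log
        # itself is the single source of truth for who leads.
        tail = log.tail_id()
        lease = {"leader": node_id, "expires": time.time() + LEASE_SECONDS}
        return log.append_if_tail(tail, lease) is not None

The nice property, if this is indeed how it works, is that leadership and data share one ordered medium: a deposed leader's stale appends fail the same conditional check its lease renewal did.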
> MemoryDB solves this by intercepting the Redis replication stream, redirecting it to the transaction log, and converting it into synchronous replication
Replication is eventually consistent in Redis - is it saying that it’s intercepting the stream at the source and blocking the write from completing until replication completes? Cause intercepting it at the point it’s going out (which is what the word interception implies to me) wouldn’t get you strong consistency I would think.
"Due to our choice of using passive replication, mutations are executed on a primary node before being committed into the trans- action log. If a commit fails, for example due to network isolation, the change must not be acknowledged and must not become visible. Other database engines use isolation mechanisms like Multi-Version Concurrency Control (MVCC) to achieve this, but Redis data struc- tures do not support this functionality, and it cannot be readily decoupled from the database engine itself. Instead, MemoryDB adds a layer of client blocking. After a client sends a mutation, the reply from the mutation operation is stored in a tracker until the transaction log acknowledges persistence and only then sent to the client. Meanwhile, the Redis workloop can process other operations. Non-mutating operations can be executed immediately but must consult the tracker to determine if their results must also be delayed until a particular log write completes. Hazards are detected at the key level. If the value or data-structure in a key has been modified by an operation which is not yet persisted, the responses to read operations on that key are delayed until all data in that response is persisted. Replica nodes do not require blocking as mutations are only visible once committed to three AZs"
What are the consistency and durability properties of the tracker datastore?
Are replies from tracker mutations stored in a tracker-tracker until the tracker-transaction-log acknowledges persistence?
Is it trackers all the way down?
I also think the decoupled design is kind of elegant: it allows the logical implementation to be developed independently of the durability bits. It's open-core, but someone else is building the core.
If not, then your proposed design seems pretty different from MemoryDB; yours doesn't persist data in the event of machine loss or reboot.
For instance: it's hard to scale concurrent writes with SQLite. I read they have an enterprise paid version with higher write concurrency support, but I have no idea how it works or how it'd compare with Redis or MemoryDB's write concurrency.
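For what it's worth, stock SQLite allows exactly one write transaction at a time, even in WAL mode; a second concurrent writer fails with "database is locked" rather than queueing up behind the first. Easy to see with nothing but the standard library:

    import sqlite3

    # Two connections to the same file; isolation_level=None gives us
    # explicit transaction control, timeout=0 means "fail, don't wait".
    a = sqlite3.connect("demo.db", timeout=0, isolation_level=None)
    b = sqlite3.connect("demo.db", timeout=0, isolation_level=None)
    a.execute("PRAGMA journal_mode=WAL")
    a.execute("CREATE TABLE IF NOT EXISTS t (x)")

    a.execute("BEGIN IMMEDIATE")         # A takes the single write lock
    a.execute("INSERT INTO t VALUES (1)")

    try:
        b.execute("BEGIN IMMEDIATE")     # B can't get a second one
    except sqlite3.OperationalError as e:
        print(e)                         # "database is locked"

    a.execute("COMMIT")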
so that's 99.(eleven 9s)?
Where would this sort of database be used? Streaming financial instrument ticks? Do you point Kinesis at it, and it's able to write/read super quickly?
It'd be more interesting if they talked about what log they used (Kinesis? Something on another DB?), what they used for a locking service, how they handled failure cases, etc.
Why is the community better off without this option?
BTW AWS can't possibly have been delivering on 11 9's given their previous outages; 9 9's is 0.031s over a year.
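The arithmetic, for anyone checking (and note the 11 9s figures in AWS marketing are usually durability, i.e. the probability of not losing data, not availability, which is what downtime measures):

    SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

    def downtime_per_year(nines):
        # "N nines" availability = 1 - 10**-N uptime fraction
        return SECONDS_PER_YEAR * 10 ** -nines

    for n in (4, 9, 11):
        print(f"{n} nines: {downtime_per_year(n):.6f} seconds/year")
    # -> 4 nines: 3153.600000 (~52.6 min), 9 nines: 0.031536, 11 nines: 0.000315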
You mean four nines:
> AWS will use commercially reasonable efforts to make Amazon EC2 available for each AWS region with a Monthly Uptime Percentage of at least 99.99%, in each case during any monthly billing cycle
You're welcome to make your substantive points thoughtfully of course.
That said, it's not bad. I'd keep in mind that the paper is one thing; putting your money where your mouth is means having an SLA for latency. So far, Google's BigTable is the only service with a read-latency SLA:
- 99.99% availability
- 3 ms p50 / 6 ms p99 read-only(!) latency
SLAs just mean "if you can prove that we didn't meet our SLA, we'll give you a refund, and by refund we mean some % of your bill for some duration".
It's not nothing - it's obviously $$s, and so teams get measured and have goals about improving their availability and latency.
But most customers don't seek out those refunds, so there is no real pressure connecting the SLA to true performance (which is often much BETTER than the SLA, but not because of it).
I'm not sure I understand this. Regardless of the refund, if a provider cuts enough corners with SLAs, won't the customers eventually raise a stink about it and make use of the (thankfully robust) competition? Plus there's the support overhead of tracking the performance and issuing the refund, which might exceed the cost of the refund itself.
I think to most people an SLA is an indicator that a company is serious about this aspect of the product's performance. Serious enough to write it into long term contracts and align its incentives to fulfill it.