Horizontally scalable, geo-distributable, strong consistency systems currently in use are increasingly complex. They either need to be maintained by world-class experts or used as a service, leading to vendor lock-in and potentially extortionate prices. They also need at least 3 replicas to work reliably in case of node failures and either need careful clock-synchronization between instances or don't scale well geographically. They're hard to debug should a problem arise. Tracing problems across multiple computers is an inherently complex task, adding that to the workload of an engineer is a big chore.
The algorithm establishes a consistency model between independent consensus groups, so nodes in the cluster only need their event logs replayed to them until a specific offset to be in the desired state. This approach means that we only need (but allow for more than-) a single instance to be reliable, and in case of a node failure, the instance only needs to be restarted for the operations to resume. By establishing a new kind of logical clock, we enable nodes closer to each other to expose changes faster, and for more distant ones to reveal them slower and in larger batches. Andras will show how this allows for a more flexible geo-distribution model.
The algorithm provides a solution to the problem through event sourcing alone, so it's not alien to developers. It also means that it's much simpler to co-deploy nodes with microservices, tying them up for a simpler developer workflow.
Download presentation