The network is reliable

I recently ran across an interesting series of posts by Kyle Kingsbury about Jepson, a framework for testing the behaviour of distributed databases in the face of a network partition.

The initial set starts with the wonderfully-titled Call me maybe: Carly Rae Jepsen and the perils of network partitions, and details the behaviour of Postgres, Redis, MongoDB, and — one that was new to me — Riak; later posts cover Zookeeper and Cassandra, among others.

If you’re already familiar with distributed databases, the CAP theorem, and CRDTs, I’d suggest you go over there and enjoy. However, if (like me) you aren’t up-to-date on distributed storage theory, a little background material follows.

Consistency, Availability, Partitioning: pick two

First up, the CAP theorem. This was first described by Eric Brewer, and was formalised in a 2002 paper called Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services. In summary, the paper proves that it is impossible to provide any kind of distributed read-write storage that offers all three of the following properties:

Consistency: (referred to as “atomicity” in the paper): reads and writes to distributed storage return the same results as a linear ordering of those operations.
Availability: all requests to a non-failing node will eventually complete.
Partition tolerance: tolerance to the presence of (total or partial) network partitions, in which messages between nodes are dropped.

The basic proof is fairly straightforward: Consider a distributed system that becomes partitioned. If a write succeeds against one side of the partition, a read against the other side cannot return the updated data: it can either complete and return stale data (no consistency), or not complete (no availability).

In practice, unless you’re running a single-node system, partitions are inevitable, so you get to pick one of consistency (“CP”) or availability (“AP”).

For more detail, see Henry Robinson’s FAQ and Eric Brewer’s excellent article, CAP Twelve Years Later: How the “Rules” Have Changed.

CRDwut

The other term you might not be familiar with is “CRDT”. CRDT stands for “Commutative [or Convergent¹] Replicated Data Type”, and provides a way to side-step the “consistency” requirement described above.

CRDTs were formalised in the 2009 paper “CRDTs: Consistency without concurrency control”. In summary: if you can define a data type where write operations are equivalent under commutativity, then it becomes less important that different clients agree upon the order of those operations.

A trivial example would be an (unbounded, signed) counter with “increment” and “decrement” operations, where no matter what order the operations are expressed in, the result will be the same. The original paper also describes “Treedoc”, an implementation of an ordered set with commutative insert-at-position and delete operations.

The CRDT idea appears to be similar to Operational Transformation (OT) in that writes are expressed as operations that can be combined out-of-order, except that OT does not actually guarantee that all operations are commutative.

While CRDTs aren’t universally applicable, they allow you to effectively layer eventual consistency over a distributed storage system that provides availability but not linearisability.

Coda

The title of this post is taken from Peter Deutsch’s Eight fallacies of distributed computing.

The original paper talks only about “Commutative” data types; a later paper, A comprehensive study of Convergent and Commutative Replicated Data Types, introduces a differentiation between Commutative (operation-based, “CmRDT”) and Convergent (state-based, “CvRDT”) types². ↩
I’m still not entirely sure I understand what a CvRDT is, or how it works, so I’m going to largely pretend that it doesn’t exist. Fortunately, the second paper also proves that any CvRDT can be emulated with a CmRDT, and vice versa, so I don’t think I’m missing much. ↩