Replication and Consistency in Cassandra
A great presentation on this subject available here.
Replication is copying data so instead of only on one node, exact copies of the data are in multiple nodes. The purpose of doing this is so that if one node goes down, the data is still available in other nodes.
Let's call our replication factor N, and say N = 3
.
We can also set the Consistency Level, which is for a read or write, how many nodes must respond with a 'success' status before the read or write is committed to the journal.
Let's call the Consistency Level W for writes or R for reads. If R = 2
, then data must come from 2 nodes and be equal, or else it cannot be read. Similarly for writes.
Consistency Level Options
For writes, Consistency Level Options may be one of the following:
Level | Description |
---|---|
ZERO | Cross Fingers |
ANY | 1st Response (including HH) |
ONE | 1st Response |
QUORUM | N/2 + 1 replicas must respond |
ALL | All replicas must respond |
For reads, Consistency Level Options may be one of the following:
Level | Description |
---|---|
ONE | 1st Response |
QUORUM | N/2 + 1 replicas must respond, takes most recent one |
ALL | All replicas must respond, takes most recent one |
Read Repair
Official definition here.
Read repair means that when a query is made against a given key, we perform a digest query against all the replicas of the key and push the most recent version to any out-of-date replicas. If a lower ConsistencyLevel than ALL was specified, this is done in the background after returning the data from the closest replica to the client; otherwise, it is done before returning the data.
Remember that if the Consistency Level is ALL, this is done before returning the data (slower but more consistent data), otherwise it is done in the background.
Network replication strategies
A replication strategy determines the nodes where replicas are placed.
Please read the official documentation for this.
There are two strategies:
- SimpleStrategy; and
- NetworkTopologyStrategy.
SimpleStrategy
Use only for a single data center.
SimpleStrategy
places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology (rack or data center location).
NetworkTopologyStrategy
Use
NetworkTopologyStrategy
when you have (or plan to have) your cluster deployed across multiple data centers. This strategy specify how many replicas you want in each data center.
Snitches
Snitches maps nodes (IPs) to physical locations (data centres and racks).
They inform Cassandra about the network topology so that requests are routed efficiently and allows Cassandra to distribute replicas by grouping machines into data centers and racks.