Replication
Replication is when data is copied to two or more nodes, so that each of them holds an exact copy of the data. If one node goes offline, the system still has a copy of the data on another node.
There are three strategies for replication:
- Data sent to all replicas at the same time.
  - Each node may apply the data to its own set in a different order, which can introduce some inconsistency.
- Data updates sent to an agreed-upon location first.
  - Master node resolves the update.
  - Potentially a single point of failure.
  - Potentially increased latency if the master node slows down or has too much traffic.
- Data updates sent to an arbitrary location first.
  - May go out of sync or introduce latency (e.g. if update_1 is sent to a slow node, and then update_2 is sent to a fast node, which one gets applied first?).
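The ordering question raised by these strategies can be illustrated with a minimal Python sketch. The `Replica` class and the two updates are hypothetical and exist only for illustration; they are not taken from any particular database. The sketch shows two replicas diverging when they apply the same updates in different orders, and staying consistent when an agreed-upon node numbers the updates first.

```python
class Replica:
    """A toy replica holding a small key-value store (hypothetical)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        # Last write wins: whichever update is applied last overwrites the value.
        self.data[key] = value


update_1 = ("x", "first")
update_2 = ("x", "second")

# No agreed order: each replica applies the updates in the order it receives them.
fast, slow = Replica(), Replica()
for key, value in (update_1, update_2):
    fast.apply(key, value)
for key, value in (update_2, update_1):
    slow.apply(key, value)
diverged = fast.data != slow.data  # the replicas now disagree on "x"

# Agreed-upon location: a master assigns a sequence number to each update,
# and every replica applies updates in sequence order, whatever the arrival order.
log = list(enumerate([update_1, update_2]))
a, b = Replica(), Replica()
for replica, arrival in ((a, log), (b, list(reversed(log)))):
    for _, (key, value) in sorted(arrival, key=lambda entry: entry[0]):
        replica.apply(key, value)
consistent = a.data == b.data
```

The price of the second approach is the one listed above: every update has to pass through the node that assigns the sequence numbers.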
Sharding
Sharding is partitioning the data across nodes, so that each node holds only a portion of the data and performs only its portion of a query. It is unlike replication, where each node holds a full copy of the data.
Think of replication like RAID 1, and sharding as RAID 0, if we were talking about disks.
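As a rough sketch of how keys might be spread across shards, the Python example below uses simple hash-based partitioning. This is an assumption made for illustration: real systems may use range-based or consistent-hashing schemes instead, and `NUM_SHARDS`, `shard_for`, `put`, and `get` are hypothetical names.

```python
import hashlib

NUM_SHARDS = 3  # assumed cluster size, for illustration only


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each shard holds only the keys that hash to it, not a full copy.
shards = [dict() for _ in range(NUM_SHARDS)]


def put(key: str, value) -> None:
    shards[shard_for(key)][key] = value


def get(key: str):
    return shards[shard_for(key)].get(key)


for user in ("alice", "bob", "carol", "dave"):
    put(user, {"name": user})
```

Every key lives on exactly one shard here, which is why losing a shard means losing its keys.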
Sharding does not help redundancy
If a node goes down, we lose access to the portion of the data held on that node, because no other node has a copy of it.
So it is a good idea to combine sharding with replication. For example, the MongoDB official documentation suggests each shard be composed of a 3-node replica set. That way if one of the nodes in a shard goes down, the shard still has the data in two other nodes, so it will not lose the data.
For it to lose the data in that scenario, all three nodes would have to die at the same time. Unlikely!
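The combination of sharding and replication can be sketched in Python as follows. This is a toy model under stated assumptions, not MongoDB's actual replication protocol: the `ReplicaSet` class, the write-to-all-live-nodes behavior, and the hash-based `shard_for` routing are all hypothetical simplifications.

```python
import hashlib


class ReplicaSet:
    """A toy shard made of several nodes that each hold a full copy (hypothetical)."""

    def __init__(self, size: int = 3):
        self.nodes = [dict() for _ in range(size)]
        self.alive = [True] * size

    def put(self, key, value):
        # Write to every live node in the shard.
        for node, up in zip(self.nodes, self.alive):
            if up:
                node[key] = value

    def get(self, key):
        # Read from the first live node; fail only if every node is down.
        for node, up in zip(self.nodes, self.alive):
            if up:
                return node.get(key)
        raise RuntimeError("all nodes in this shard are down")


def shard_for(key: str, num_shards: int) -> int:
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


cluster = [ReplicaSet(size=3) for _ in range(2)]  # 2 shards, 3 nodes each


def put(key, value):
    cluster[shard_for(key, len(cluster))].put(key, value)


def get(key):
    return cluster[shard_for(key, len(cluster))].get(key)


put("alice", 1)
shard = cluster[shard_for("alice", len(cluster))]
shard.alive[0] = False             # one node in the shard dies
after_one_failure = get("alice")   # still readable from another node
shard.alive[1] = False             # a second node dies
after_two_failures = get("alice")  # still readable from the last node
```

Only when the third node of the same shard is also marked down does `get` raise, which matches the scenario described above.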