Replication

Replication is when data is copied in two nodes, so they both have exact copies of the data. If one node were to go offline, the system would still have a copy of the data in the other node.

There are three strategies for replication:

Data sent to all replicas at the same time;
- Each node may apply the data to its own set in different order, so can introduce some inconsistency.
Data updates sent to an agreed upon location first;
- Master node resolves the update.
- Potentially Single Point of Failure.
- Potentially latency increased if master node slows down or has too much traffic.
Data updates sent to an arbitrary location first.
- May go out of sync or introduce latency (e.g. if update_1 sent to slow node, and then update_2 sent to fast node, which one gets applied first?).

Sharding

Sharding is distributing the load across nodes, so they can each perform a portion of the query. It is unlike replication, where each node holds a copy of the data.

Think of replication like RAID 1, and sharding as RAID 0, if we were talking about disks.

Sharding does not help redundancy

If a node goes down, we will lose that data.

So it is a good idea to combine sharding with replication. For example, the MongoDB official documentation suggests each shard be composed of a 3-node replica set. That way if one of the nodes in a shard goes down, the shard still has the data in two other nodes, so it will not lose the data.

For it to lose the data in that scenario, all three nodes would have to die at the same time. Unlikely!