Big Data Strategies
How can we handle Big Data, given its volume and its velocity?
Big Data has some basic properties:
- It's raw (you can answer more questions from raw data);
- It's not cleaned (we don't want to lose any data, even data we think is not useful); and
- It should be immutable: corrupted or accidentally overwritten data is a common source of problems in a database.
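One way to make immutability concrete is sketched below, assuming each raw fact is stored as an `(id, attribute, value, timestamp)` record. `Fact` and `FactLog` are hypothetical names for illustration, not part of any particular Big Data system. The log only supports appending and reading, and each fact is a frozen dataclass, so an attempt to overwrite a field raises an error.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Fact:
    """One raw observation: entity `id` had `attribute` = `value` at `timestamp`."""
    id: int
    attribute: str
    value: str
    timestamp: int  # Unix time in seconds


class FactLog:
    """Hypothetical append-only store: facts can be added and read, never updated or deleted."""

    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def append(self, fact: Fact) -> None:
        self._facts.append(fact)

    def facts(self) -> tuple[Fact, ...]:
        # A tuple snapshot, so callers cannot modify the underlying list.
        return tuple(self._facts)


log = FactLog()
log.append(Fact(1, "status", "Pending Enroll", 1449180525))

first = log.facts()[0]
try:
    first.value = "Enrolled"  # try to overwrite an existing fact in place
    overwrite_blocked = False
except FrozenInstanceError:
    overwrite_blocked = True

print(overwrite_blocked)  # True
```

To record a change, you append a new `Fact` with a later timestamp; the earlier one stays untouched.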
Timestamps and Small Tables
Instead of this:
id | name | gender | status | town |
---|---|---|---|---|
1 | Jose | M | Pending Enroll | Dundee |
2 | Yago | M | Enrolled | Dundee |
3 | Stuart | M | Not Enrolled | Broughty Ferry |
4 | Helen | F | Pending Acceptance | London |
We would split each attribute into its own small table, with a timestamp recording when each fact was stored:

id | name | timestamp |
---|---|---|
1 | Jose | 1449180525 |
2 | Yago | 1448185325 |
3 | Stuart | 1431297337 |
4 | Helen | 1429171731 |

id | gender | timestamp |
---|---|---|
1 | M | 1449180525 |
2 | M | 1448185325 |
3 | M | 1431297337 |
4 | F | 1429171731 |

id | status | timestamp |
---|---|---|
1 | Pending Enroll | 1449180525 |
2 | Enrolled | 1448185325 |
3 | Not Enrolled | 1431297337 |
4 | Pending Acceptance | 1429171731 |

id | town | timestamp |
---|---|---|
1 | Dundee | 1449180525 |
2 | Dundee | 1448185325 |
3 | Broughty Ferry | 1431297337 |
4 | London | 1429171731 |
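Changing a value for any ID is now substantially easier: each attribute lives in its own small table, so a change touches only that table, and the timestamp records when each value was stored.
If we want to add more data about one of the IDs, we just add rows.
If we want to remove a field for one of the people, we can just remove their row from, e.g., the "town" table.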
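A minimal sketch of reading these small tables back, assuming each table is held in memory as a list of `(id, value, timestamp)` rows and that, when an ID has several rows in one table, the row with the newest timestamp is the current value. `tables` and `current_state` are hypothetical names, only a subset of the rows above is used, and the timestamp `1449200000` is made up for illustration.

```python
# Each attribute has its own small table of (id, value, timestamp) rows,
# using a subset of the name/status/town tables above.
Row = tuple[int, str, int]

tables: dict[str, list[Row]] = {
    "name": [(1, "Jose", 1449180525), (2, "Yago", 1448185325)],
    "status": [(1, "Pending Enroll", 1449180525), (2, "Enrolled", 1448185325)],
    "town": [(1, "Dundee", 1449180525), (2, "Dundee", 1448185325)],
}


def current_state(tables: dict[str, list[Row]]) -> dict[int, dict[str, str]]:
    """Rebuild the wide one-row-per-person view; for each (id, attribute) the newest row wins."""
    latest: dict[tuple[int, str], tuple[int, str]] = {}
    for attribute, rows in tables.items():
        for id_, value, ts in rows:
            key = (id_, attribute)
            if key not in latest or ts > latest[key][0]:
                latest[key] = (ts, value)
    state: dict[int, dict[str, str]] = {}
    for (id_, attribute), (_, value) in latest.items():
        state.setdefault(id_, {})[attribute] = value
    return state


# Changing Jose's status: append a newer row (hypothetical timestamp) instead of editing the old one.
tables["status"].append((1, "Enrolled", 1449200000))

state = current_state(tables)
print(state[1])  # {'name': 'Jose', 'status': 'Enrolled', 'town': 'Dundee'}
```

Appending rather than overwriting keeps Jose's earlier "Pending Enroll" row, so his history is preserved while the wide view still shows the latest value.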
Add and Never Delete
Your data sets will get bigger, but never delete a fact, even an outdated one! By keeping the full history, you can reconstruct the state of affairs at any point in time.
Instead of this:
Person | Friend Count |
---|---|
Jose | 25 |
Yago | 8794 |
Do this:
Person | Friend Action | Timestamp |
---|---|---|
Jose | +1 | 1449180525 |
Jose | +1 | 1449180520 |
Jose | +1 | 1449180515 |
Jose | +1 | 1449180423 |
Jose | +1 | 1449151335 |
Jose | -1 | 1449131751 |
Jose | +1 | 1449051881 |
Jose | +1 | 1447481823 |
Jose | -1 | 1443945446 |
That way, you can work out how many friends Jose had at any timestamp, e.g. 1449051881, by adding up his friend actions up to that time. You could go further and save which friend was added or removed at what time.
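A minimal sketch of that point-in-time calculation, using the rows from the table above. `friend_count_at` is a hypothetical helper, and it assumes `+1` and `-1` are the only action values. The table shows only part of Jose's history (the summary table lists 25 friends), so the results below are the net change over the rows shown, not his real friend count.

```python
# (person, friend_action, timestamp) rows copied from the table above.
events = [
    ("Jose", +1, 1449180525),
    ("Jose", +1, 1449180520),
    ("Jose", +1, 1449180515),
    ("Jose", +1, 1449180423),
    ("Jose", +1, 1449151335),
    ("Jose", -1, 1449131751),
    ("Jose", +1, 1449051881),
    ("Jose", +1, 1447481823),
    ("Jose", -1, 1443945446),
]


def friend_count_at(events: list[tuple[str, int, int]], person: str, t: int) -> int:
    """Sum every +1/-1 action for `person` whose timestamp is at or before `t`."""
    return sum(action for who, action, ts in events if who == person and ts <= t)


print(friend_count_at(events, "Jose", 1449051881))  # 1 (net change over the rows shown)
print(friend_count_at(events, "Jose", 1449180525))  # 5
```

With the complete history in the log, the "Friend Count" table becomes a derived view: it can be recomputed (or cached) from the events at any time instead of being stored and overwritten.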