Tables vs Big Data
Business data is often tabular and atomic, but doesn't necessarily have to be relational.
But what is atomic data?
Is a Date field atomic? If it is stored as a count of seconds, yes. If it is an ISO date, then possibly not. For example, the timestamp 2015-10-08 15:36:11 can be split into a separate date and time.
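The difference between the two representations can be sketched in a few lines of Python. This is only an illustration: the example timestamp carries no time zone, so UTC is assumed here, and the variable names are made up for this sketch.

```python
from datetime import datetime, timezone

# The timestamp from the text. It has no zone information, so UTC is an
# assumption made purely for this illustration.
iso_text = "2015-10-08 15:36:11"
parsed = datetime.strptime(iso_text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)

# Composite view: the value splits cleanly into a date part and a time part.
date_part = parsed.date().isoformat()
time_part = parsed.time().isoformat()

# Atomic view: a single count of seconds since the Unix epoch.
epoch_seconds = int(parsed.timestamp())

print(date_part, time_part, epoch_seconds)
```

The epoch count cannot be split without first interpreting it as a calendar date, which is why it behaves as an atomic value, while the ISO form exposes its parts directly.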
In some instances, it may not make sense to split the data, but what is important to remember is that not all data is truly atomic data.
Real-world data such as images, videos, music, and machine data can all be split into smaller parts.
What is structured data?
Is music unstructured? Is a picture unstructured?
Is anything unstructured?
We could argue that all data is structured data, but that depends on who (or what) is interpreting the data. Let's listen to the experts on this one.
So to conclude, data can be one of these types:
Structured
Relational databases:
- ACID (Atomicity, Consistency, Isolation, Durability);
- Referential integrity;
- Strong typing; and
- Schema support.
Semi-structured
Self-describing data:
- Formats that include metadata (e.g. netCDF and HDF5); and
- XML data defined by a schema.
Quasi-structured
Web clickstream data, with some inconsistencies in data values and format.
Unstructured
- Text documents susceptible to text analytics; and
- Images and video.
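The four categories above can be contrasted with a small Python sketch. Everything in it is illustrative: the table, the XML snippet, the clickstream lines, and the sample sentence are invented for this example. SQLite is used only because it ships with Python, and its typing is more lenient than that of most relational databases.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET
from collections import Counter

# Structured: a relational table with a declared schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
conn.execute("INSERT INTO sales (amount) VALUES (?)", (19.99,))
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
conn.close()

# Semi-structured: the XML carries its own field names with the values.
xml_text = "<reading><sensor>t1</sensor><celsius>21.5</celsius></reading>"
reading = {child.tag: child.text for child in ET.fromstring(xml_text)}

# Quasi-structured: clickstream lines whose formats do not quite agree,
# so a tolerant pattern is needed to pull out the common fields.
click_lines = [
    "2015-10-08 15:36:11 GET /home",
    "08/10/2015 15:36 get /cart?item=42",
]
pattern = re.compile(r"\b(get|post)\s+(\S+)", re.IGNORECASE)
clicks = []
for line in click_lines:
    match = pattern.search(line)
    if match:
        clicks.append((match.group(1).upper(), match.group(2)))

# Unstructured: free text has no fields at all; any structure (here,
# word counts) is imposed by the analysis.
text = "big data is big because data keeps growing"
word_counts = Counter(text.split())

print(total, reading, clicks, word_counts.most_common(2))
```

The amount of work needed before the data can be queried grows from top to bottom: the table can be queried directly, the XML needs a parser, the clickstream needs a forgiving pattern, and the free text needs an analysis step to produce any fields at all.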
What is Big Data then?
From http://hortonworks.com/blog/7-key-drivers-for-the-big-data-market/, we can get an overview of what Big Data is.
So, really, give me a definition
From Wikipedia:
"In information technology, big data consists of data sets that grow so large, they become awkward to work with using on-hand database management tools."
Remember that any definition of Big Data that uses static sizes (e.g. 1 petabyte) becomes obsolete quickly as data storage capabilities grow.
The Three V's
Instead of defining Big Data as "awkwardness", which is essentially what Wikipedia does above, we can define it in terms of the 3 V's, a concept that Doug Laney originally came up with in 2001.
Volume, Variety, and Velocity.
Gartner's original post about the 3 V's
Are there more than 3 V's? Could we add V's like:
- Viability;
- Value;
- Veracity?
Some would say the original 3 V's as envisioned by Doug Laney (from Gartner) are still the only V's (read article here).
Here's a good infographic about the 3 V's that suggests adding the V "validity", possibly incorrectly.