Next Stop: End of the Line for Human-Scale Data
Big Data is often defined by the characteristics of volume, velocity, and variety. These three V's have become shorthand for the difference between business-as-usual and the next generation of data management practices.
But perhaps there is an even more fundamental way to describe the shift to a Big Data world.
Big Data means the end of the line for human-scale data.
Explosive Growth of Data
- In 2008, Google was processing 20,000 terabytes of data (20 petabytes) a day.
- In 2011, 493 U.S. electric utilities had 37,290,374 advanced (“smart”) meters installed.
- Smart meters in the U.S. alone now generate more than 1 billion data points per day.
- An estimated 87 million smart TVs were sold in 2013.
- Current estimates hold that the 9 billion devices already connected to the Internet will swell to 24 billion devices by 2020.
- The new IPv6 standard for IP addresses allows for roughly 4.8×10^28 addresses for each of the seven billion people alive in 2011.
- Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data.
Seriously? Each person could have 48,000,000,000,000,000,000,000,000,000 devices attached to the Internet?
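The arithmetic behind these figures is easy to check. A quick back-of-the-envelope sketch follows; the 128-bit IPv6 address space and the 7-billion population come from the text, while the 15-minute smart-meter reading interval is an assumption of this example:

```python
# Back-of-the-envelope checks for the figures above.

# IPv6 is a 128-bit address space.
ipv6_addresses = 2 ** 128          # ~3.4e38 total addresses
population_2011 = 7_000_000_000

addresses_per_person = ipv6_addresses / population_2011
print(f"{addresses_per_person:.1e}")   # → 4.9e+28, i.e. roughly 4.8×10^28

# Smart meters: 37,290,374 meters, assuming a reading every 15 minutes
# (a common interval -- the interval is this example's assumption).
meters = 37_290_374
readings_per_day = 24 * 4          # 96 fifteen-minute intervals per day
print(meters * readings_per_day)   # well over 1 billion data points per day
```

Even at a coarser hourly interval, the meter fleet alone would near the billion-points-per-day mark, which is why "human-scale" review of such streams is off the table.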
Add to this the massive amounts of data generated via web behaviors (see insert) and it is not hyperbolic to say that companies and individuals are drowning in data.
The question becomes, how will we humans keep up?
The news is not all bad. Advances in machine-scale data processing tools give the enterprise a fighting chance. On the supply side, tools like GlobalIDs and Splunk can be used for discovery, pattern mining, and contextual mapping. Meanwhile, demand-side tools such as Attivio, Chorus, and Tableau can help us with federated queries, self-service provisioning, and visualization. Still, unless we re-align our data management practices…
Business-as-usual data management practices cannot scale to meet the growth of data.
Enterprises must leverage crowd-sourcing, micro-stewardship, and gamification to achieve automation, contextual awareness, and democratization of data at scale.
Human intelligence must transition from the role of executor to that of guide and aligner. Work must be crowd-sourced from a broader labor base, possibly even from outside the enterprise.
Intelligent automation must take over rote stewardship tasks (assessing quality, raising anomalies) first, then more sophisticated responsibilities (constructing data definitions) later. Governance and stewardship tasks must be codified and factored to be as granular as possible (micro-stewardship) so as to derive maximum value from human intuition and creativity.
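As a sketch of what codified, granular stewardship might look like in practice, the fragment below automates the rote checks and escalates only flagged records to a human steward. All field names and rules here are illustrative assumptions, not a reference to any particular tool:

```python
# Illustrative micro-stewardship sketch: rote quality checks run
# automatically; only records with anomalies reach a human steward.
# Field names ("customer_id", "amount") and rules are hypothetical.

def check_record(record, required_fields=("customer_id", "amount")):
    """Return a list of anomaly descriptions for one record (empty = clean)."""
    anomalies = []
    for field in required_fields:
        if record.get(field) in (None, ""):
            anomalies.append(f"missing {field}")
    amount = record.get("amount")
    if isinstance(amount, (int, float)) and amount < 0:
        anomalies.append("negative amount")
    return anomalies

def triage(records):
    """Split records into an auto-approved queue and a human-review queue."""
    clean, review = [], []
    for record in records:
        issues = check_record(record)
        (review if issues else clean).append((record, issues))
    return clean, review

clean, review = triage([
    {"customer_id": "C1", "amount": 19.99},
    {"customer_id": "",   "amount": -5.00},   # escalated to a steward
])
print(len(clean), len(review))   # → 1 1
```

The point of the factoring is that each check is tiny and independently codified, so the human steward's attention is spent only where intuition is actually needed.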
The user interface must evolve to meet the growing demand for access to all available information, zero learning curve, and predictive answers.
In a world where all questions are valid, Big Data volume, velocity, and variety necessitate that our data management practices and toolsets must move from being artisanal to industrial.