What is Big Data?
Before I continue my “Big Data Ramifications for Baseball” subject, there is a topic that I feel even more compelled to address. I recently facilitated at EMC’s IT Leadership Council, where I had the chance to mingle with a hundred-plus EMC customers. This gave me the opportunity to poll a wide variety of industry leaders about the importance of, and opportunity for, big data within their organizations. It became immediately obvious that everyone has a different definition of the term “Big Data” and that for the majority of folks, it mostly means large data volumes.
Gartner, Forrester, IDC, McKinsey, and many of the other soothsayers in our industry have done a solid job of articulating that “Big Data” is more than just data volume. The big data discussion must also contemplate data velocity (i.e., the speed at which the data is being generated and consumed), data variety (i.e., different data types including semi-structured and unstructured data) and data complexity (i.e., different data types, standards, domain rules, and data locations).
Well, if that’s not confusing enough, I’m going to also tell you that “Big Data” is more than just data, and that a big data discussion, strategy, and plan need to consider many other variables.
So to help folks out (including myself), I started to keep a running list of the different dimensions of big data that popped up during my week of discussions. Here’s that list. Feel free to respond to this blog (or to me directly) with other dimensions that we will need to take into consideration as we continue these big data discussions.
“Big Data” is about more than data volumes; it also encompasses data velocity, variety, and complexity. And big data is about more than just data. Below are some of the drivers of change that need to be considered in any “big data” discussion or envisioning exercise.
- Detailed Structured Transactional Data: POS transactions, call detail records, credit card transactions, shipping status updates, purchase orders, payments, shipments, account transactions
- Unstructured Data: Web logs/clickstream, newsfeeds, social media, geo-location, mobile, consumer comments, claims write ups, doctor’s notes, clinical studies, image analysis, video analysis, audio analysis (Shazam)
- Machine or Device-generated Data: RFID sensors, smart meters, smart grids, GPS spatial (Progressive Snapshot), micro-payments
- Data Exchanges/Data Aggregators: Financial, credit, market, geographical, weather, automotive (Polk), legal (LexisNexis), Government data (Agriculture, Commerce, Defense, Labor, Health Services)
- Technology Drivers: MPP architectures, columnar databases, in-database analytics, in-memory computing, NoSQL, parallel & distributed processing, advanced data visualization, mobile BI, collaboration, data mashups, search, cloud
- Advanced Analytic Tools: SAS, R, statistical analytics, predictive analytics, data mining, machine learning, Hadoop (HDFS, Hive, HBase)
- Advanced Architectural Design: Agile data warehousing, data virtualization, data fabric, high-performance data analysis (e.g., algorithmic trading), iPhone/iPad apps user experience, analytic sandboxes, experimentation, Data-as-a-Service, BI-as-a-Service
Yikes, no wonder folks are confused and really don’t know where and how to start on their big data journey!!
By the way, for those of you who happen to be in the Bay Area on the evening of October 27th, I’d encourage you to stop by for a Big Data discussion event. The event is titled “Big Data: Path to Profit” and is being sponsored by the ACG Silicon Valley. Here’s a little blurb about the event and the panelists.
Big Data: Path to Profit
An exponentially increasing information wave towers above human and many machines’ capacity to process and understand. Big Data surpasses terabytes – venturing deep into the murky abyss of exabytes and zettabytes. Discover what your organization has been missing as fresh tools reveal hidden profits from buried data treasures.
Moderator: John Furrier, CEO, SiliconAngle
- Dr. Partha Bhattacharya, Co-Founder, CTO & VP Engineering, AccelOps
- Sai Gundavelli, Founder & CEO, Solix Technologies
- David Lyle, VP Product Strategy, Office of the CTO, Informatica
- Bassel Y. Ojjeh, Co-founder & CEO, nPario, Inc.
- Stacy M. Passeri, CEO & President, KiteTale, LLC
- Bill Schmarzo, CTO, Enterprise Information Management, EMC Consulting
- George Symons, Chief Strategy Officer, Solix
I will also be speaking on an EMC Live Webcast on November 2nd at 12:00 pm EST. The title of the webinar is “How to Successfully Exploit Big Data for Business Advantage.” Topics covered will include how to use big data to discover business insights, how to identify where big data analytics can yield business value, how to ensure success on a big data project, and how to lead a big data journey with a solid plan in place.