What is Big Data?

By Bill Schmarzo, October 26, 2011

Before I continue my “Big Data Ramifications for Baseball” subject, there is a topic that I feel even more compelled to address. I recently facilitated at EMC’s IT Leadership Council, where I had the chance to mingle with a hundred-plus EMC customers. This gave me the opportunity to poll a wide variety of industry leaders about the importance of, and opportunity for, big data within their organizations. It became immediately obvious that everyone has a different definition of the term “Big Data” and that, for the majority of folks, it mostly means large data volumes.

Gartner, Forrester, IDC, McKinsey, and many of the other soothsayers in our industry have done a solid job of articulating that “Big Data” is more than just data volume.  The big data discussion must also contemplate data velocity (i.e., the speed at which the data is being generated and consumed), data variety (i.e., different data types including semi-structured and unstructured data) and data complexity (i.e., different data types, standards, domain rules, and data locations).

Well, if that’s not confusing enough, I’m going to also tell you that “Big Data” is more than just data, and that a big data discussion, strategy, and plan need to consider many other variables.

So to help folks out (including myself), I started to keep a running list of the different dimensions of big data that popped up during my week of discussions. Here’s that list. Feel free to respond to this blog (or to me directly) with other dimensions that we are going to need to take into consideration as we continue these big data discussions.

“Big Data” is about more than just data volumes, as it needs to encompass data velocity, variety, and complexity. But big data is also about more than just data. Below are some of the drivers of change that need to be considered in any “big data” discussion or envisioning exercise.

  • Detailed Structured Transactional Data:  POS transactions, call detail records, credit card transactions, shipping status updates, purchase orders, payments, shipments, account transactions
  • Unstructured Data:  Web logs/clickstream, newsfeeds, social media, geo-location, mobile, consumer comments, claims write-ups, doctor’s notes, clinical studies, image analysis, video analysis, audio analysis (Shazam)
  • Machine or Device-generated Data:  RFID sensors, smart meters, smart grids, GPS spatial (Progressive Snapshot), micro-payments
  • Data Exchanges/Data Aggregators:  Financial, credit, market, geographical, weather, automotive (Polk), legal (LexisNexis), Government data (Agriculture, Commerce, Defense, Labor, Health Services)
  • Technology Drivers:  MPP architectures, columnar databases, in-database analytics, in-memory computing, NoSQL, parallel & distributed processing, advanced data visualization, mobile BI, collaboration, data mashups, search, cloud
  • Advanced Analytic Tools:  SAS, R, statistical analytics, predictive analytics, data mining, machine learning, Hadoop (HDFS, Hive, HBase)
  • Advanced Architectural Design:  Agile data warehousing, data virtualization, data fabric, high-performance data analysis (e.g., algorithmic trading), iPhone/iPad app user experience, analytic sandboxes, experimentation, Data-as-a-Service, BI-as-a-Service

Yikes, no wonder folks are confused and really don’t know where and how to start on their big data journey!!

By the way, for those of you who happen to be in the Bay Area on the evening of October 27th, I’d encourage you to stop by for a Big Data discussion event. The event is titled “Big Data: Path to Profit” and is sponsored by ACG Silicon Valley. Here’s a little blurb about the event and the panelists.

Big Data: Path to Profit

An exponentially increasing information wave towers above the capacity of humans and many machines to process and understand. Big Data surpasses terabytes – venturing deep into the murky abyss of exabytes and zettabytes. Discover what your organization has been missing as fresh tools reveal hidden profits from buried data treasures.

Moderator: John Furrier, CEO, SiliconAngle

  • Dr. Partha Bhattacharya, Co-Founder, CTO & VP Engineering, AccelOps
  • Sai Gundavelli, Founder & CEO, Solix Technologies
  • David Lyle, VP Product Strategy, Office of the CTO, Informatica
  • Bassel Y. Ojjeh, Co-founder & CEO, nPario, Inc.
  • Stacy M. Passeri, CEO & President, KiteTale, LLC
  • Bill Schmarzo, CTO, Enterprise Information Management, EMC Consulting
  • George Symons, Chief Strategy Officer, Solix

I also will be speaking on an EMC Live Webcast on November 2nd at 12:00pm EST. The title of the webinar is “How to Successfully Exploit Big Data for Business Advantage.” Topics covered will include how to use big data to discover business insights, how to identify where to utilize big data analytics to yield business value, how to ensure success on a big data project, and how to lead a big data journey with a solid plan in place.

Join the Conversation


  1. Pingback: What is Big Data?

  2. How about the data pattern in its raw form, the data usage pattern in its consumption form, and the data flow pattern in its transmission state? Instead of trying to model the never-ending changes in the nature of data, why not focus on shaping each transitive state toward a pre-determined ultimate goal?

  3. Pamela, I agree with your take, and think that’s why we’re seeing the evolution of “schema-less” data models and why technologies like Hadoop have such appeal. Using Hadoop on the raw data, you sort of just build the most appropriate schema based upon the questions being asked. And while this will not likely replace the role of BI in most companies, it should complement your BI investments.

    Thanks for posting!!

  4. Great piece, Bill. Finally good to see the industry adopting the volume-variety-velocity construct that Gartner first published 11 years ago. Unfortunately, many other analyst orgs and vendors have claimed the ideas as their own. For future reference and proper attribution, the original piece I wrote back in 2001 was entitled “Three Dimensional Data Management: Controlling Volume, Velocity and Variety”. Since then we’ve recognized and written about other dimensions of Big Data as well. –Doug Laney, VP Research, Gartner. @doug_laney

  5. Hey Doug, thanks for reading and commenting on the blog! Does that mean I now have star appeal??

    I’m not surprised that you guys at META Group were the first to coin the Big Data phrase, or at least 3 of the 4 elements of big data. I always had a great deal of respect for guys like you and Mark Smith who were doing leading-edge schtuff at META Group, and that’s why I worked closely with you guys at places like Sequent and DecisionPoint in the 1990’s.

    I like the original “3D” positioning better than what we have today. You could easily put complexity into the variety classification. Plus “3D Data” sounds so much cooler than just Big Data.

    BTW, I like to share with folks my first experience with “big data,” which was in the late 1980’s when the Consumer Package Goods and Retail industries shifted from using bi-monthly Nielsen audit data to point-of-sale scanner data to run their businesses. The jump in data volumes was only part of the challenge, as the latency of data (we didn’t refer to it as velocity) and diversity of data opened up all sorts of opportunities. It enabled a whole new generation of analytics-powered applications (trade spend effectiveness, store assortment optimization, in-store merchandising, supply chain and inventory optimization, store operations, loyalty programs, etc.), built on next-generation data platforms with next-generation analytic tools. Yes, exciting times, and I expect that we’re going to see another generation of analytics-powered applications arise out of this current big data movement as well.

    Good times to be in the data business!!