In April 2012, VisibleTechnologies.com (a social media monitoring company) reported a 1,211% increase in use of the term “Big Data” from March 2011 to March 2012 in a survey of English-language social media channels. Big Data is certainly one of the key buzzwords of our time.
In a 2001 META Group article, Doug Laney presented the three “V”s of Big Data: Volume, Velocity, and Variety. Others have proposed additional “V”s, such as Vulnerability, Veracity, and Value. None of these contributes to the fundamental definition; they are consequences of it.
When people think of Big Data they often focus on the first “V”, Volume; after all, it is called Big Data. But large data volumes are nothing new: data has always been “big” relative to the technology available to make use of it.
The original Big Data was the Library of Alexandria, which contained the combined experiences and learnings of ten centuries. In 1944, the concern was that American university libraries were doubling in size every 16 years and that the number of published volumes would outpace the ability to physically store them, let alone access and derive value from them.
Data has always been big, but never nearly as massive as it is today. For over a decade we have heard about the early pioneers of this generation’s Big Data: Wal*Mart, Google, eBay, Amazon, the Human Genome Project, and the newer trailblazers such as Internet giants Facebook, Twitter, eHarmony, and comScore. Add to these the ubiquitous sensor-based data generators: hospital intensive care units, radio-frequency IDs tracking products and assets, GPS systems, smart meters, factory production lines, satellites and meteorology; the list continues to grow.
Market research firm IDC estimated that 1,800 exabytes of data would be generated in the year 2011. An exabyte is a unit of information equal to one quintillion (10^18) bytes, or one billion gigabytes. Estimates report that the world produced 14.7 exabytes of new data in 2008, triple the amount generated in 2003. Cisco Systems estimates that by 2016, annual Internet traffic will reach 1.3 zettabytes; a zettabyte is 10^21 bytes, or one trillion gigabytes. To put that in perspective, all Internet traffic in the years 1984 to 2012 generated a total of 1.2 zettabytes. We will soon generate in one year what took 26 years to accumulate.
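The unit relationships and the year-over-years comparison above can be checked with a quick sketch (decimal SI prefixes assumed, i.e. powers of 10 rather than powers of 2):

```python
# Decimal (SI) storage units, assuming powers of 10 rather than binary powers of 2.
GIGABYTE = 10**9    # one billion bytes
EXABYTE = 10**18    # one quintillion bytes
ZETTABYTE = 10**21  # one sextillion bytes

# An exabyte is one billion gigabytes.
print(EXABYTE // GIGABYTE)    # prints 1000000000

# A zettabyte is one trillion gigabytes.
print(ZETTABYTE // GIGABYTE)  # prints 1000000000000

# 1.3 zettabytes of projected annual traffic in 2016 versus the
# 1.2 zettabytes accumulated over all of 1984-2012:
annual_2016 = 1.3 * ZETTABYTE
accumulated_1984_2012 = 1.2 * ZETTABYTE
print(annual_2016 > accumulated_1984_2012)  # prints True
```

In other words, under Cisco's projection a single year of traffic would exceed roughly 26 years of accumulated traffic.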
The focus on the size attribute of Big Data is understandable in the face of these statistics. It stems from the limitations of the technology available at the time to acquire, process, and deliver such large volumes of data quickly enough to make them meaningful to decision makers in the business. Traditional relational database technologies and methods of loading, storing, and retrieving data were incapable of keeping pace with the speed necessary to analyze and act on the data.
With the advent of new storage and query technologies such as Hadoop, MapR, Cloudera, Teradata Aster, IBM Netezza, and the NoSQL family (NuoDB, MongoDB, CouchDB, HBase, etc.), volume becomes the least important of the three “V”s.
Volume alone does not define Big Data. Big Data is more about the second and third “V”s: Velocity and Variety. Part two of this series will delve into the Velocity factor.