Definitions of Big Data
Q: Can you please provide me with a definition of Big Data?
A: The definition of Big Data is a moving target.
To make it possible to follow the discussion as it evolves, we have started a list of definitions as we read them on the internet.
Author names: Andrew Brust (ZDNet), Bill Franks (FCW article), PCmag encyclopedia, John Rauser (Networkworld), John Weathington (TechRepublic Blog), Cory Janssen (Techopedia.com), Mike Gualtieri (Forrester), John Ebbert (Adexchanger), Edd Dumbill (O’Reilly Strata), Boyd & Crawford (cited by Leslie Johnston on the Library of Congress), Tim Gasper (cited on TechCrunch), Margaret Rouse (TechTarget), Mike Loukides (O’Reilly Radar), Jimmy Guterman (O’Reilly), Wikibon, Steven Burke (CRN), Urbandictionary.com, Slashdot (SAP survey), George Dyson (personal correspondence Tim O’Reilly), Doug Laney (Stackexchange), Brian Hopkins and Boris Evelson (Forrester), Bob Gourley (Smartdatacollective), SAS, Stephane Hamel.
- “Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.” Cited from Wikipedia
- “Big data is the term increasingly used to describe the process of applying serious computing power – the latest in machine learning and artificial intelligence – to seriously massive and often highly complex sets of information.” Cited from 4/2013 the Microsoft Enterprise Insight Blog
- “We can safely say that Big Data is about the technologies and practice of handling data sets so large that conventional database management systems cannot handle them efficiently, and sometimes cannot handle them at all.” Cited from 1/2012 ZDNet Blog by Andrew Brust.
- “An easily scalable system of unstructured data with accompanying tools that can efficiently pull structured datasets.” Cited from a 4/2013 post on the FCW Blog.
- “The definition of big data? “Who cares? It’s what you’re doing with it,”” Cited from 3/2013 FCW article, quoting Bill Franks.
- “The definition of big data refers to groups of data that are so large and unwieldy that regular database management tools have difficulty capturing, storing, sharing and managing the information.” Cited from yourdictionary.com
- “Big Data refers to the massive amounts of data that collect over time that are difficult to analyze and handle using common database management tools. Big Data includes business transactions, e-mail messages, photos, surveillance videos and activity logs (see machine-generated data). Scientific data from sensors can reach mammoth proportions over time, and Big Data also includes unstructured text posted on the Web, such as blogs and social media.” Cited from the pcmag encyclopedia
- “Any amount of data that’s too big to be handled by one computer.” John Rauser cited 5/2012 at networkworld.com
- “To define big data in competitive terms, you must think about what it takes to compete in the business world. Big data is traditionally characterized as a rushing river: large amounts of data flowing at a rapid pace. To be competitive with customers, big data creates products which are valuable and unique. To be competitive with suppliers, big data is freely available with no obligations or constraints. To be competitive with new entrants, big data is difficult for newcomers to try. To be competitive with substitutes, big data creates products which preclude other products from satisfying the same need.” Cited from 9/2012 John Weathington on the TechRepublic Blog
- “Big data refers to a process that is used when traditional data mining and handling techniques cannot uncover the insights and meaning of the underlying data. Data that is unstructured or time sensitive or simply very large cannot be processed by relational database engines. This type of data requires a different processing approach called big data, which uses massive parallelism on readily-available hardware.” Cited from a Cory Janssen post on Techopedia.com
- “Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is big data.” Cited from IBM.com
- “A more pragmatic definition of big data must acknowledge that: Exponential data growth makes it continuously difficult to manage — store, process, and access. Data contains nonobvious information that firms can discover to improve business outcomes. Measures of data are relative; one firm’s big data is another firm’s peanut. A pragmatic definition of big data must be actionable for both IT and business professionals. The Definition Of Big Data: Big Data is the frontier of a firm’s ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.” Cited from a 5/2012 Mike Gualtieri Forrester Blog post
- “The world has always had ‘big’ data. What makes ‘big data’ the catch phrase of 2012 is not simply about the size of the data. ‘Big data’ also refers to the size of available data for analysis, as well as the access methods and manipulation technologies to make sense of the data.” Cited from 12/2012 Adexchanger.com article by John Ebbert
- “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.” Cited from 1/2012 post by Edd Dumbill on O’Reilly Strata
- “We define Big Data as a cultural, technological, and scholarly phenomenon that rests on the interplay of: (1) Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large data sets. (2) Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims. (3) Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy.” From Critical Questions for Big Data Boyd & Crawford (2012) as cited by Leslie Johnston on the Library of Congress website
- “The definition of Big Data is very fluid, as it is a moving target — what can be easily manipulated with common tools — and specific to the organization: what can be managed and stewarded by any one institution in its infrastructure. One researcher or organization’s concept of a large data set is small to another.” Cited from 10/2011 Leslie Johnston Library of Congress
- “Big Data is presently synonymous with technologies like Hadoop, and the “NoSQL” class of databases including Mongo (document stores) and Cassandra (key-values).” Tim Gasper cited 10/2012 on TechCrunch
- “Big data (also spelled Big Data) is a general term used to describe the voluminous amount of unstructured and semi-structured data a company creates — data that would take too much time and cost too much money to load into a relational database for analysis. Although Big data doesn’t refer to any specific quantity, the term is often used when speaking about petabytes and exabytes of data.” Cited from a 3/2011 post by Margaret Rouse on TechTarget
- “But I do like Roger Magoulas’ definition of “big data”: big data is when the size of the data becomes part of the problem.” Cited from a 2/2013 post by Mike Loukides on O’Reilly Radar
- “Big Data: when the size and performance requirements for data management become significant design and decision factors for implementing a data management and analysis system. For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration.” Cited from 6/2009 Jimmy Guterman O’Reilly
- Big data has the following characteristics:
  - Very large, distributed aggregations of loosely structured data – often incomplete and inaccessible:
    - Petabytes/exabytes of data
    - Millions/billions of people
    - Billions/trillions of records
  - Loosely-structured and often distributed data
  - Flat schemas with few complex interrelationships
  - Often involving time-stamped events
  - Often made up of incomplete data
  - Often including connections between data elements that must be probabilistically inferred
  - Applications that involved Big-data can be:
    - Transactional (e.g., Facebook, PhotoBox), or
    - Analytic (e.g., ClickFox, Merced Applications).
  Cited from Wikibon.org
- “We think at the end of the day, big data is not just about analytics, it is about data-centric applications. It is about driving some experience to a customer and causing them to do things in realtime.” Paul Maritz quoted in a 4/2013 Steven Burke article on CRN
- “Modern day version of Big Brother. Online searches, store purchases, Facebook posts, Tweets or Foursquare check-ins, cell phone usage, etc. is creating a flood of data that, when organized and categorized and analyzed, reveals trends and habits about ourselves and society at large.” urbandictionary.com
- “A new survey by SAP suggests that nearly 76 percent of executives see “Big Data” as an opportunity. However, respondents’ definition of “Big Data” varied to a considerable degree. Nearly a quarter of the 154 C-suite executives felt that “Big Data” was the technologies designed to handle the massive amounts of data swamping organizations. Another 28 percent defined “Big Data” as that flood of data itself. Still another group (19 percent) equated “Big Data” with storing data for regulatory compliance. Around 18 percent viewed “Big Data” as the increase in data sources, including social networks and mobile devices.” slashdot 6/2012 citing an SAP survey
- “Big data is what happened when the cost of storing information became less than the cost of making the decision to throw it away.” Tim O’Reilly quoting personal correspondence via email from George Dyson 20 March 2013, regarding Dyson’s talk at the Long Now Foundation 19 March 2013.
- The recently updated Gartner definition also recognizes the value aspect: “Big Data are information assets with volumes, velocities and/or variety requiring innovative forms of information processing for enhanced insight discovery, decision-making and process automation.” – Doug Laney posting at stackexchange citing his original piece outlining the 3Vs of big data now republished. Here is the original 2001 paper entitled “3-D Data Management: Controlling Data Volume, Velocity and Variety” by Laney.
- “Big data: techniques and technologies that make handling data at extreme scale economical.” by Brian Hopkins and Boris Evelson at Forrester 8/2011. In diagram form.
- “To date, our key message has been that it is the enterprise CTO who is responsible for defining how the term should be used.” Bob Gourley (who originally posted the big data definition on Wikipedia) posting on smartdatacollective.com 12/2012.
- “A phenomenon defined by the rapid acceleration in the expanding volume of high velocity, complex, and diverse types of data. Big Data is often defined along three dimensions — volume, velocity, and variety.” [Big Data requires] “advanced techniques and technologies to enable the capture, storage, distribution, management, and analysis of the information.” The TechAmerica Foundation’s Federal Big Data Commission Comprehensive Guide to Best Practices for Big Data, cited by Bob Gourley here 10/2012.
- “Big data is a popular term used to describe the exponential growth, availability and use of information, both structured and unstructured. Ultimately, regardless of the factors involved, we believe that the term big data is relative; it applies (per Gartner’s assessment) whenever an organization’s ability to handle, store and analyze data exceeds its current capacity.” SAS.
- The simplest definition of “Big Data” is “it doesn’t fit in Excel”, from the full quote: “I have joked that the simplest definition of “Big Data” is “it doesn’t fit in Excel” – and when you think of it, it’s true for most people who wonder how to make the shift from a traditional approach to a Big Data one.” Stephane Hamel, comment 8/2012 on Big Data – What It Means For The Digital Analyst.
- More to follow…