Understanding Big Data
Part 1: Introduction - Age of Big Data
Picture the Second Coming of the Internet: we are just entering the second inning.
The whole point of the discussion, and of Big Data, is how to ask questions of the data. The data is only valuable if it is used to make decisions.
The key is to understand data as it is (unstructured), using a scalable platform for analysis, processing, and action in order to unlock its value.
Businesses wishing to remain competitive must continuously learn how to use technology to give meaning to the data they collect.
Online customers generate a lot of data (data trails) which, combined with social media, can be used to add value by generating leads and sales.
Effective decision-making should be based on current, real-time data.
Information loses its value quickly and should be used efficiently.
Keeping up with the demands of changing clients and conditions requires a structural solution (such as real-time management and utilization).
Scale is rapidly increasing: the current internet environment consists of millions of users and their associated data points, managed by large websites and combined with round-the-clock smartphone activity, and it is still growing.
You can read a summary of these points in our blog post on Unlocking the Value of Data.
Definition of Big Data
Wikipedia tells us: “In information technology, big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.”
In other words, more data than ever before is available as more people and things are connected via the internet.
What Opentracker does to solve this problem
In a nutshell, we’ve built a distributed database system that will collect and store anything you throw at it. In keeping with our tradition of simplifying things, we’ve got a powerful API you can use to ask the data any questions you like. Click here for the Opentracker API.
An endless stream of data
An example would be a program to manage exercising. Sounds simple enough, but think about all the data: the individual accounts and separate datastreams; every step taken, start times, finish times, distances, average speed, calories burned, sessions, temperatures, weight, BMI calculations, milestones, etc. Now imagine that every piece of data is a single entry or signal, every footstep shouting “count me”, until there is a tremendous amount of information in a very short time. It takes a large effort to collect, store, manage, and maintain this data and to keep it available.
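To get a feel for how quickly those signals add up, here is a rough back-of-the-envelope sketch in Python. The function name and every number in it (user count, steps per session, and so on) are illustrative assumptions, not figures from Opentracker or from any real exercise app.

```python
def estimate_daily_events(users, sessions_per_user, steps_per_session,
                          other_signals_per_session):
    """Estimate how many individual entries a tracking app records per day.

    Each step counts as one entry, plus a handful of other signals per
    session (start time, finish time, distance, calories, and so on).
    All inputs are hypothetical.
    """
    events_per_session = steps_per_session + other_signals_per_session
    return users * sessions_per_user * events_per_session


# Hypothetical scenario: 100,000 users, one session each per day,
# 5,000 steps per session, and 10 other signals per session.
daily_events = estimate_daily_events(
    users=100_000,
    sessions_per_user=1,
    steps_per_session=5_000,
    other_signals_per_session=10,
)
print(f"{daily_events:,} events per day")  # 501,000,000 events per day
```

Even with these modest assumptions, a single simple app would produce roughly half a billion entries a day, which is why collecting and storing them is a job in itself.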
Tsunami of analytics
There is talk of data scientists, MapReduce, Hadoop, and big data analytics.
With so many people uploading endless streams of photos, videos, music, content, consumer choices, likes, tweets, and chatter into the cloud, it is no wonder there seems to be too much information to act on. This is sometimes referred to as a ‘Data Tsunami’: the datastream for even a single user, whether from social media such as Facebook, Instagram, and Twitter, networking via LinkedIn, or a consumer site such as Amazon, contains innumerable pieces of information to be counted and put to use.
Some examples – Big Data in action
What industries are collecting and using this data? One example is the health care and insurance industry. They collect large amounts of data in order to derive predictive models for people, costs, treatments, and propensity for disease. The airline industry has successfully developed very complex real-time ticketing systems.
The automotive industry: receiving datastreams from cars, navigation systems, fuel consumption, and oil quality.
B2C retailers: consumption patterns, stock, ordering, returns, and sales, and how all of this ties in to online advertising campaigns, conversion, and ad delivery.
Outsourcing data management - IaaS
In the past, companies kept this data themselves: everything needed was to be found in-house, and many workstations were not connected to the internet. Now it’s a requirement to have an internet connection as a resource while developing. Despite this, many companies are hesitant to outsource data management. Data management means managing and storing the data and, more importantly, being able to query it. Traditionally, data has been stored in databases and only later, if ever, consulted.
What’s happened recently though, with the onslaught of data, is that a company has to become specialized in data management just to be able to cope. That translates into infrastructure (hence the new IaaS, or Infrastructure-as-a-Service). The alternative is for a company to hire its own engineers and purchase its own infrastructure.
next: Part 2 Ownership, Terminology, & What questions to ask