Start your free, no-risk, 4 week trial!

Big Data Orientation


Ownership of data

This is especially an issue for larger, more traditional companies, for whom data has always been kept on-site. Using IaaS (Infrastructure-as-a-Service) means letting somebody else host, store, and manage your data in order to generate reports. There are different ways to resolve this, although it's usually more cost-efficient to let an IaaS provider manage the server park. Data ownership can be stipulated in a contract, so that Company A owns its data while Company B hosts and manages it. Access to the source code of the software used to manage the data can also be an issue; this can be resolved by placing the code in escrow or by using open-source software.


Some helpful terminology:

3V's: volume, velocity, and variety.

Volume (lots of gigabytes or terabytes), velocity (data arriving faster than conventional methods can handle or process), and variety (unstructured data from too many sources to fit pre-defined tables). For example, in terms of variety, Opentracker collects URLs, countries, IP info, user tagging, conversions, tech specs, OS, and custom events (meaning any piece of information that can be defined and sent). This information comes from a variety of sources: location databases, carriers, browsers, device specs, cookies, JavaScript, social networks, networking sites, user profiles, company databases, etc.

4S's: source, size, speed, and structure.

IaaS: Infrastructure-as-a-Service. An Infrastructure-as-a-Service provider typically delivers the combination of hosting, hardware, provisioning, and basic services needed to operate a cloud. To quote Wikipedia (again): in the most basic cloud-service model, providers of IaaS offer computers - physical or (more often) virtual machines - and other resources... Cloud providers typically bill IaaS services on a utility-computing basis: cost reflects the amount of resources allocated and consumed.
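The "variety" problem described above - events arriving from many sources in many shapes - usually means a normalization step before anything can be analyzed. A minimal sketch of that idea, where all field names, sources, and event shapes are hypothetical examples rather than any real Opentracker schema:

```python
# Toy illustration of handling "variety": raw events from different
# sources have different shapes, and are mapped onto one common record.

def normalize(event: dict) -> dict:
    """Map differently-shaped raw events onto a common structure."""
    return {
        "source":  event.get("source", "unknown"),
        # different sources report country in different places
        "country": event.get("country") or event.get("geo", {}).get("country"),
        # some sources say "url", others say "page"
        "url":     event.get("url") or event.get("page"),
        # keep everything else as unstructured extras
        "extras":  {k: v for k, v in event.items()
                    if k not in ("source", "country", "geo", "url", "page")},
    }

raw_events = [
    {"source": "browser", "url": "/home", "country": "NL", "os": "Linux"},
    {"source": "mobile", "page": "/cart", "geo": {"country": "US"}, "device": "phone"},
]

records = [normalize(e) for e in raw_events]
```

The point is not the specific fields but the pattern: the structured part of each record is small and uniform, while everything else stays unstructured until some process needs it.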


Why are we talking about this & what questions should I be asking?

The point of all this data is to enable decision-making. That means that an answer is needed in time to make a decision, not two weeks later. What's interesting is the trend from looking over our shoulders towards looking ahead: we used to study data from the past >> real-time processing >> predictive analysis.
So it's a bit like a camera which starts by panning backwards towards where we came from, then swings around to show the runner, and finally swings to show the path disappearing towards the horizon.
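The three stages of that camera swing - looking back, looking at the present, looking ahead - can be sketched in a few lines of code. The numbers are invented and the "prediction" is deliberately naive, just to make the contrast concrete:

```python
# A hypothetical series of hourly request counts.
counts = [100, 120, 130, 125, 140]

# 1. Looking back: summarize the past.
historical_avg = sum(counts) / len(counts)

# 2. Real-time: react to the latest observation.
latest = counts[-1]

# 3. Looking ahead: naive linear extrapolation from the last two points.
predicted_next = counts[-1] + (counts[-1] - counts[-2])

print(historical_avg, latest, predicted_next)  # 123.0 140 155
```

Real predictive analysis involves proper models, of course; the sketch only shows that each stage asks a different question of the same data.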

From a corporate perspective, you may hear questions like these:

Am I going to set up data-processing infrastructure?
Do I want to own the data?
Do I want to put data in the cloud? (hint: if it absolutely needs to stay PRIVATE, don't put it in the cloud)
How do I get the data into my own company’s infrastructure?

There are a few household names that have become experts at handling large flows of data: Twitter, Google, Facebook, and Amazon. They've solved the big data problem for their own organizations. The next step is making those solutions available to all the other enterprises that exist today.


Where is all the data coming from?

It used to come (only) from people. Now it's coming from both people and machines (sensors). Take the temperature of a room: the reading comes from a sensor in the room, with no person sending a signal.

At the moment, data is being generated mostly by people, through clicks, swipes, and taps. Increasingly in the future, servers and sensors (non-people) will be generating the data. Smartphones (Android, iPhone), self-service ticketing machines, card transactions, etc. - almost everything we do and touch generates data.

This data is increasingly unstructured, but still needs to be processed. The fact that it is unstructured is what led Charles Fan to call it CRAP (create, replicate, append, process) - referring to the mind-boggling amount of data which is created, stored, and generated, and often left for roadkill. Why left for roadkill? Because the amount of know-how and resources needed to derive meaningful conclusions from all the data collected is prohibitive.

This leads to the great challenge we are facing: not just to collect and store all the data coming in, but to organize it. Nobody cares about deleting or updating it (hence the CRAP description), and so whoever designs a new data center for CRAP will be the winner.
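The create/replicate/append/process pattern is essentially an append-only log: records are created and appended, copied around, and scanned for answers, but never updated or deleted in place. A toy sketch under that assumption (the class and all names are illustrative, not any particular product's API):

```python
# Toy append-only event store in the CRAP (create, replicate,
# append, process) spirit: no update, no delete.

class AppendOnlyLog:
    def __init__(self):
        self._records = []

    def append(self, record: dict) -> None:
        """Create/append: new data is only ever added at the end."""
        self._records.append(record)

    def replicate(self) -> "AppendOnlyLog":
        """Replicate: copy the whole log, e.g. to another node."""
        clone = AppendOnlyLog()
        clone._records = list(self._records)
        return clone

    def process(self, predicate) -> list:
        """Process: derive results by scanning, never by mutating."""
        return [r for r in self._records if predicate(r)]

log = AppendOnlyLog()
log.append({"event": "click", "page": "/home"})
log.append({"event": "view", "page": "/pricing"})
clicks = log.process(lambda r: r["event"] == "click")
```

Because nothing is ever rewritten, the hard problems move elsewhere: how to organize, index, and eventually expire the ever-growing log - which is exactly the data-center challenge described above.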


Conclusion: and the winner is...

So the goal is to give enterprises access to data storage & management. Enterprises, which tend to be larger and more traditional organizations, require flexibility in terms of infrastructure location; they may not want their data in the cloud.
From the point of view of companies who provide Big Data solutions, the winner will be the one who structures their service in a way that is accessible for more traditional organizations.

From the point of view of business and retail, the winners, simply put, will be the businesses and companies that learn how to apply Big Data analysis techniques (read: innovative data analysis, or connecting the dots) to the data they possess about their clients or market conditions.


