Website Market Position

Start your free, no-risk, 4 week trial!

Understanding your website and business environment

In part one of Change Creates Growth we promised practical tips for increasing traffic in current internet conditions (characterized as Web 2.0).

To provide tips specific to your website we have developed a worksheet: a series of questions you need to ask. The answers will help determine your needs and choices.

Article navigation

Executive summary
Web 2.0 – new business models
Study your market sector
Trendwatch: why do big companies buy small companies?
Staying competitive

Executive summary: in a nutshell

The goal of this article is to stimulate growth and change on your website.

In Web 2.0 spirit we compiled a list of the best resources we found on subjects covered here. The resource list is located in the worksheet linked below.

You will also find a list of links to websites that provide insight to these issues.
Click here to proceed directly to the worksheet.

General introduction: Your internet market sector

Study statistics for your sector of the internet. An analysis of your environment will help you plan, for example, where to advertise: go where people are. Ask why people go where they go, and determine whether there is overlap with what your website does. Incorporate information about your environment into your strategy. How does your website fit into your market sector? How do you stack up against the competition?

Example: do you have a yoga studio, or sell yoga products? Are there social networking sites or forums about yoga? Where do players in your field advertise? The way people exchange and obtain information is evolving. Evolve with it.

Industry trends: Why do big companies buy the interesting new sites?

Why should you be interested in where people go and what they do there?

Why do large companies such as Google, Microsoft and Yahoo buy the popular networking and social sites? These companies are interested in where people are going because they want to control what happens there (i.e. advertising and content management) and to be close to where the action is. They use statistics and tracking to understand and influence what is happening. These companies want to sell ads, and they need traffic in order to do so. Advertising is central to all of these activities and to their acquisition decisions: the big players must control traffic in order to fulfill their business goals.

How Can Small Companies Compete In This Environment?

Use your understanding of the environment to make decisions: where to advertise, how to participate, how to find overlap between your website mission and current trends. One current trend is social networking sites (Facebook, MySpace, LinkedIn, Last.fm, etc.). What are people doing on these sites, and how can webmasters strategise and react to the new ways people are using the internet?

Bottom line: How can information about what is changing be useful to you? Research and learn where and how traffic moves, for free (Alexa) or paid (Hitwise). Large companies use this information to make decisions. Information about what is working and what is not working is important to the industry.

Are you located in the best environment? Your environment will help determine your success. Note that quality remains very important. See, for example, recent interviews with Google CEO Eric Schmidt, who emphasizes that Google still sees higher conversion rates and larger revenue when focusing on quality, as opposed to quantity. It is still important to bring traffic to your site; however, it is equally if not more important to draw those visitors from the right part of the internet.

Ask questions: answers will provide direction

Click here for the worksheet Website Marketing Questions with related links.

Identify and track your visitors

Tracking vs log analyzers

Summarized overview

In this article you will find technical definitions of:

  • Unique visitor tracking
  • Log analysis
  • Human events

You will also find information about:

  • The difference between unique visitor tracking and log analysis
  • Why log analyzers show higher numbers
  • The difference between browser events and server events
  • Tracking unique visitors from behind corporate firewalls and ISPs
  • Advantages of using cookies to track unique visitors
  • Measuring page views instead of hits
  • Tracking spiders and bots
  • Opentracker specialization in human events and unique visitor behavior

Human events versus server activity

Why do tracking services show a lower number of visitors than statistics recorded by log analyzers? The answer lies in the difference between unique visitor tracking and log analyzing. Log analyzers record all measurable activity, whereas tracking services distinguish between human activity and server activity.

Tracking service stats will show lower numbers than log analyzer stats. This is not because tracking services record fewer visitors. The reason is that tracking services are stricter in their definitions of a visitor. A tracking service should do its best to ensure that no visitor is recorded twice, and that only human clicks are counted as visits.

The reason that tracking services will report lower traffic numbers than log files is because good tracking services do not recognize the following factors as unique visits or human events:

  • repeat unique visitors (after 24 hours)
  • hits
  • robot and spider traffic
  • rotating IP numbers (e.g. AOL)

Equally important is the ability to distinguish how many unique visitors are visiting from either:

  • the same ISP (Earthlink, AT&T, Comcast, Cox, etc.)
  • corporate firewalls, large organizations (Microsoft, IBM, Apple, etc.)

Otherwise all of these users will be counted as the same visitor. This is a differentiation that can only be made with tracking cookies.
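As a minimal sketch of this idea (not Opentracker's actual implementation; the cookie name `ot_visitor` is made up for illustration), a server-side tracker can mint a random ID for each new browser and reuse it on later requests. Two visitors behind the same corporate firewall share an IP address but never share a cookie, so they are counted separately:

```python
import uuid

def identify_visitor(request_cookies: dict) -> tuple:
    """Return a (visitor_id, is_new) pair.

    If the browser already sent our tracking cookie, reuse its value;
    otherwise mint a fresh ID to be set in the response.
    """
    existing = request_cookies.get("ot_visitor")  # hypothetical cookie name
    if existing:
        return existing, False
    return str(uuid.uuid4()), True

# Two browsers arriving from the same IP, neither with a cookie yet:
id_a, new_a = identify_visitor({})
id_b, new_b = identify_visitor({})
# They receive distinct IDs, so they count as two unique visitors.
# A returning browser presents its cookie and keeps its identity:
id_a2, new_a2 = identify_visitor({"ot_visitor": id_a})
```

The same logic is why an IP-only count would collapse both browsers into one visitor: the only per-browser state available is the cookie.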

Where possible, tracking systems should only measure human events.

For years now, the standard measurement of website traffic on the internet has been ‘hits’. Hits are not a reliable indicator of website traffic. A hit is a single request from a browser to a server. When a visitor looks at a single page, many hits can be generated, both for the request itself, and for each component of the page.
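The gap between hits and page views can be made concrete with a toy server log. The rule used here (classifying a request as a page view by its file extension) is a deliberate simplification for the sketch; real tracking services use more robust signals:

```python
# Toy server log: each entry is one request, i.e. one "hit".
requests = [
    "/index.html",
    "/css/style.css",
    "/img/logo.png",
    "/img/banner.jpg",
    "/products.html",
    "/img/widget.png",
]

PAGE_EXTENSIONS = (".html", ".htm")  # simplistic rule for this sketch

hits = len(requests)
page_views = sum(1 for path in requests if path.endswith(PAGE_EXTENSIONS))
# Six hits were recorded, but a human only viewed two pages.
```

A log analyzer counting hits would report 6 events here; a tracker counting human page views would report 2.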

Opentracker measures page views, not hits

Opentracker tracks page views. A page view is a single human event. A page view is also known as an impression. Each impression, or page view, represents an actual person who has viewed a specific web page. In this way, Opentracker differentiates between human events, and server-browser dialogues.

Opentracker specializes in human events and visitor behavior.

Opentracker tracks visitors over the long-term, and has the ability to recognize if a visitor has been at a site before. Opentracker uses browser cookies to track unique visitors over long periods of time. Examining a unique visitor’s clickstream, for example, can tell you how quickly new users adjust to site layout.

Drawbacks: we can miss visitors if a visitor clicks away too quickly, i.e. does not wait for a page to load.

Case study

Example of a discrepancy between Opentracker and log analyzer

Log analyzers do not distinguish between humans and spiders or bots. Spiders and bots are the programs sent out by search engines to scour and document all pages on the internet. This means that a log analyzer might record an extra several hundred visits for a given period, depending on the popularity of your site. The more popular your site is, the more often it will be visited by search engine robots. This is especially true if your content is frequently updated.
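A rough sketch of the kind of filtering involved: well-known crawlers identify themselves in the user-agent string, so a tracker can discard them before counting visits. The signature list below is illustrative, not exhaustive, and user-agent matching is only one of several signals a real service would use:

```python
# A few user-agent substrings that identify well-known crawlers.
BOT_SIGNATURES = ("googlebot", "bingbot", "slurp", "spider", "crawler")

def is_bot(user_agent: str) -> bool:
    """Crude bot detection by user-agent substring match."""
    ua = user_agent.lower()
    return any(sig in ua for sig in BOT_SIGNATURES)

visits = [
    "Mozilla/5.0 (Windows NT 10.0) Firefox/119.0",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; bingbot/2.0)",
    "Mozilla/5.0 (Macintosh) Safari/605.1.15",
]

human_visits = [ua for ua in visits if not is_bot(ua)]
# A log analyzer would report 4 visits; only 2 were human.
```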

Using Statistics for Website Management

How to get the most out of Opentracker

A customer question raised an interesting point: what is the purpose of a website tracking system? The point of using a visitor tracking and website stats program is to aid the process of website management and decision-making. A good web metrics program should help you to manage and make decisions on a daily basis.

Q: “So I’ve added my 4 week trial to our web store pages; now how do I get the most out of it? Any help would be greatly appreciated.”

A: To get the most out of your trial, and out of a site statistics and tracking program, you need to ask questions. The answers will provide you with a plan of action for your website management and help you to make decisions. The process of asking and answering questions is valuable and informative.

Ask yourself some basic questions about your website. The 3 most important answers will tell you:

  • who your visitors are
  • how they find you
  • what they are doing on your site

The answers will help you to make sure that your visitors find what they want when they arrive. Most importantly, the answers will tell you how to manage your content.

Accountability and Decision-Making in Website Management

Since the ‘bubble burst’, IT budgets have been shrinking. It is no longer acceptable to simply put up a website and leave it there. Daily management is necessary. As the old maxim prescribes: ‘prevention is better than cure’. In this case it is better to utilize visitor tracking to anticipate traffic trends and be pro-active, instead of just reacting.

Accountability is important, as measuring results and effectiveness has become commonplace. “Show me the money” has become a part of webmaster vocabulary. The point is that decision-making needs to occur on a daily basis. Decisions need to be based upon what is known, not upon guessing. Updating your site, search engine optimization, and implementing a search engine marketing strategy are continuous activities that require micro-management.

Equally important is accountability from your Pay-Per-Click (PPC) advertising campaigns. Make sure that you are getting what you pay for. If your advertising campaign does not work, look at your data and change your management strategy.

Important questions visitor tracking can answer

Below, we have drawn up a list of what we feel to be 10 important questions that you can answer using your website statistics. The answers will help you to make website content management decisions.

  1. Where do your visitors come from, where do they go, what do they look at, and what pages do they exit from? What do the clickstreams tell you?
  2. Where do people start and stop viewing, where do they lose interest?
  3. How sticky is your site? Overall site and page stickiness is important. Do you have problematic leaks or drop-off points?
  4. In other words, how good is your site navigability and usability?
  5. Which referrers and PPC advertising campaigns are the most effective? Are you paying for ineffective traffic? Are you using the best search terms possible? Determine which advertising campaigns are most effective and concentrate your investment strategy there.
  6. What are your conversion and retention rates?
  7. When you make changes to your site, what is the effect? If you are trying to get people’s attention, it is good to know how they react.
  8. What is the best time of week to start an email campaign or newsletter?
  9. Are your visitors looking for something you don’t sell? Perhaps you should consider selling it!
  10. Are your customers as satisfied as they can be?

Use your statistics for site management and decision-making

Your statistics are not only numbers. They are numbers that should help you to make actual site management decisions. Make changes to your website where necessary based on what your visitors are doing. If everybody leaves from one page, ask yourself why. If visitors are not surfing to the page you want them to see, make better links.

In conclusion, and to repeat the essential point of this article: the best way to make use of the data is to ask yourself questions about your statistics reports and to answer them. This process itself is valuable because it will tell you what information is being collected and lead you to ask important questions about

a) why the information is collected, and
b) what to do with it

To orient yourself to Opentracker, we recommend taking a look at the documentation.

Clickstream or clickpath analysis

Summarized overview

In this article you will find discussion and technical definitions of:

  • Clickstream analysis
  • Interactive clickstream graphing

And information about:

  • What a clickstream will tell you
  • How to use clickstream analysis to improve your site
  • Why analyze clickstreams
  • Questions that clickstream analysis can answer
  • Opentracker clickstream tool
  • Tracking individual clickstreams

What is clickstream analysis?

A clickstream, also known as a clickpath, is the route that a visitor chooses when clicking or navigating through a site.

A clickstream is a list of all the pages viewed by a visitor, presented in the order the pages were viewed, also defined as the ‘succession of mouse clicks’ that each visitor makes.

A clickstream will show you when and where a person came in to a site, all the pages viewed, the time spent on each page, and when and where they left.

Taken all together, as aggregated statistics, clickstream info will tell you, on average, how long people spend on your site, and how often they return. It will also tell you which pages are the most frequently viewed.

An interactive clickstream is a graphic representation of a clickstream; a list of pages seen in the order in which they were visited. The graphic allows you to click on the pages, and see what the visitor saw, hence the label ‘interactive’.

The most obvious reason for examining clickstreams is to extract specific information about what people are doing on your site. Examining individual clickstreams will give you the information you need to make content-related decisions without guessing.

There is a wealth of information to be analyzed: you can examine visitor clickstreams in conjunction with any of the information provided by a good stats program, such as visit durations, search terms, ISPs, countries, and browsers. The process will give you insight into what your visitors are thinking.
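As a sketch of the aggregate numbers a clickstream yields (the data shape below is invented for illustration; a real stats program records much richer detail), average pages per visit, average visit duration, and the most-viewed page can all be derived directly from ordered page-view records:

```python
# Each clickstream: ordered (page, seconds_spent) tuples for one visit.
clickstreams = [
    [("/home", 30), ("/pricing", 90), ("/signup", 45)],
    [("/home", 10)],                      # a bounce: one page, then gone
    [("/blog", 120), ("/home", 20)],
]

visits = len(clickstreams)
avg_pages = sum(len(cs) for cs in clickstreams) / visits
avg_duration = sum(sum(t for _, t in cs) for cs in clickstreams) / visits

# Tally views per page to find the most frequently viewed page.
page_view_counts = {}
for cs in clickstreams:
    for page, _ in cs:
        page_view_counts[page] = page_view_counts.get(page, 0) + 1
most_viewed = max(page_view_counts, key=page_view_counts.get)
```

The same records, viewed individually rather than aggregated, are what let you re-enact a single visitor's path through the site.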

Examples of questions answered by clickstream analysis:

Q: What are people who enter my site with specific search terms doing when they get there?

A: Clickstream analysis will answer this question, and give you the opportunity to identify the search terms that are the most valuable for your site, by actually telling you how they perform. For example, if you sell widgets, and notice that a lot of people type in ‘blue widgets’ but leave without buying any, then you need to figure out why. Clickstream analysis will tell you where they come in, what they look at, and where they leave. It is up to you to figure out why they leave (also known as ‘shopping cart abandonment’). Maybe it’s because you don’t sell the particular model of widget they are after, which you may be able to see from the search term they entered.

You might also notice, for example, that the visitors who leave all have screen resolutions of 800 x 600. If you re-design your product display page with that resolution in mind, those visitors will be able to see the product pictures more easily.

Q: Why is my site not giving me the results I expect?

A: Perhaps you have a newsletter and you would like your visitors to sign up, but nobody is signing up. Clickstream analysis will allow you to re-enact visitor clickstreams. This ability to see exactly what your visitors see, and the order in which they see it, is a great way to troubleshoot. You might notice, for example, that most visitors only spend a few seconds on the newsletter sign-up page, or on the page before it. The information that nobody spends any time on a page tells you that an update is necessary. It tells you whether or not you are including the correct amount of information on your pages. This is a crucial aspect of any website.

Opentracker’s clickstream analysis tool

Our clickstream analysis gives you access to visitor clickstreams live, in real-time, while they are happening. If you log in to our demo now, you will be able to see your own clickstream through our site (as you read this!) by going to visitors online.

We have built an interactive tool that lets you see all the visitors on your site, both online and offline, in real-time. Every visitor is represented by an icon. If you click on any visitor’s icon, you will see a graphic representation of their clickstream. You will also see that visitor’s profile, which consists of their country of origin, their ISP, technical specs, the frequency of their visits to your site, and the search engine and search terms that they might have used. You will also know if they are a first-time visitor, and can view the details of their visit, i.e. the times they entered and left.

From all this information, it is possible to extrapolate any number of conclusions and understandings of what visitors are doing.

Glossary:
Clickstream

Buying Traffic – PPC Ad campaigns

Perhaps not a surprise, but our research has led us to believe that PPC is the “safest” way to get targeted traffic, as you have a high level of control and accountability.

Part 2: Pay-Per-Click (PPC) ad campaigns

This is the revolution created by the internet: you only pay for visitors who actually click through to your site. This remains a very attractive way of generating leads, due to the accountability factor. You build your own campaign, target your visitors with keywords, and set your budget by bidding on keywords. We started with a wide range of campaigns spread over a range of companies (FindWhat, Kanoodle, ePilot, Enhance, LookSmart, etc.) and ended up narrowing our focus to one or two well-honed campaigns. We are now back with the few sources that have consistently brought us well-targeted traffic, i.e. Google, Yahoo, and Bing. We have written about this subject in another article located here.

Our site statistics are open to the public, so you can login and take a look for yourself.

The strategy we use is to identify successful keywords, using our statistics reports to show which words bring us the most visitors, and what these visitors do. We bid on these words, and drop the lesser-performing words. Google and Overture have well-built administration systems that allow you to manage campaigns. Both systems allow you to customize the URLs, meaning that you can see exactly who comes in on each individual advertisement. Both systems also show you:

a) how many times each specific keyword ad was clicked, and
b) how many times each ad was seen (impressions)

These are the two most important variables.
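From those two variables, plus your spend, you can derive the ratios that make campaigns comparable. A minimal sketch (the numbers below are hypothetical):

```python
def campaign_metrics(clicks: int, impressions: int, spend: float) -> dict:
    """Derive click-through rate and cost per click from the two
    variables every PPC console reports: clicks and impressions."""
    ctr = clicks / impressions if impressions else 0.0
    cpc = spend / clicks if clicks else 0.0
    return {"ctr": ctr, "cpc": cpc}

# Hypothetical keyword ad: 50 clicks out of 2,000 impressions for $25.
m = campaign_metrics(clicks=50, impressions=2000, spend=25.0)
# A 2.5% click-through rate at $0.50 per click.
```

Computing these per keyword is what lets you bid up the words that perform and drop the ones that do not.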

We would recommend that you periodically review your campaigns. In other words, don’t just set them up and forget about them. They need regular updating. The two part strategy involves using the PPC campaign to get traffic, and a statistics or tracking program to see what the traffic does.

next: Part 3 Purchasing bulk leads / traffic / clicks
previous: Part 1 Buying Traffic – paid newsletter inclusion

Paid Adwords & free Google traffic

Executive Summary and Article Navigation

Every day your site gets traffic from Google. The problem is how to tell whether the visits are paid AdWords clicks or referrals from free organic results.

Here are some sample questions:

“How can we tell if google traffic is adwords paid traffic or free organic?”

“Is there a way to split google adwords from google organic traffic and see which visitors clicked on which ads?”

“We would like to separate our statistics to see which customers come from adwords and which from organic results.”

Why measure Google Adwords versus Google organic traffic?

Why would you want to measure AdWords against organic traffic?
Because part of the traffic is paid, and part is free. A comparison is necessary to see how the paid traffic is performing, and whether it is worth paying for. Are you paying too much? Should you invest more money in paid traffic?
Will it really matter if you reduce your budget by $1000 for a month? Keep track of budgets by calculating how they perform.

The quality of traffic is also important. Is there a quality difference between paid and free traffic?
The quality of paid traffic can be influenced by ad text and search terms. Additional control is available in the form of bids and pricing.
For further reading, see our article on Choosing Search Terms.

How can I see the difference between paid & organic traffic?

The question is: How much should I be paying for paid traffic?
The answer is that you can see the difference between paid and organic traffic with Opentracker.

With Conversion and ROI reporting all your traffic is automatically sorted by referrer. This is done using URLs.

In Google AdWords, when you create an ad, you fill in a Destination URL.
By looking at this Destination URL, Opentracker can automatically identify whether the click was paid or organic.
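As a rough illustration of the principle (not Opentracker's actual implementation), a classifier can separate paid from organic Google visits by checking whether the landing URL carries a campaign tag. The `src=adwords` parameter below is a made-up tagging convention for this sketch:

```python
from urllib.parse import urlparse, parse_qs

def classify_google_visit(referrer: str, landing_url: str) -> str:
    """Classify a Google-referred visit as 'paid' or 'organic'.

    Simplified rule: paid Destination URLs carry a campaign parameter
    (here 'src=adwords', an invented convention), while organic clicks
    arrive on untagged URLs.
    """
    host = urlparse(referrer).netloc
    if "google." not in host:
        return "other"
    params = parse_qs(urlparse(landing_url).query)
    return "paid" if params.get("src") == ["adwords"] else "organic"

paid = classify_google_visit("https://www.google.com/search",
                             "https://example.com/widgets?src=adwords")
organic = classify_google_visit("https://www.google.com/search",
                                "https://example.com/widgets")
```

Because the tag travels in the URL itself, the classification works for any ad network that lets you customize Destination URLs, not just Google.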

Traffic sources automatically filtered

The conversion sources are automated: a lot of paid and organic traffic is already filtered automatically in Opentracker. We call referrers “sources” because they are traffic sources.
If you would like ROI information per goal, you can add this by creating goals.

Click here to read about seeing All Your Online Advertising in One Report.

Here is an article explaining Conversion and ROI reporting.

Goal measurement – who came to what page

Goal measurement: we can generate a report for any page on your website that shows you all traffic sources per page and the total return-on-investment for any traffic conversion goal.

How do we do this? We combine several pieces of information. If you use Destination URLs we get the cost per click from the URL. Define the pages you want to be conversion goals in the report. Place a line of code in each goal page and Opentracker will do the rest.

How do visitors from different sources behave?

You can see the difference in two ways:
1. By using Conversion goals, you can compare how many people convert from paid adwords traffic versus how many people convert from organic traffic.
2. Individual behavior: because Opentracker is a clickstream analysis software, you can look at the clickstreams of all individual visitors and see how traffic from different sources behaves.
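The first comparison can be sketched as a per-source conversion rate over labeled visit records (the data below is invented for illustration):

```python
# One record per visit: its traffic source and whether it reached the goal page.
visits = [
    {"source": "adwords", "converted": True},
    {"source": "adwords", "converted": False},
    {"source": "adwords", "converted": False},
    {"source": "organic", "converted": True},
    {"source": "organic", "converted": True},
    {"source": "organic", "converted": False},
    {"source": "organic", "converted": False},
]

def conversion_rates(visits):
    """Fraction of visits per source that reached the conversion goal."""
    totals, wins = {}, {}
    for v in visits:
        totals[v["source"]] = totals.get(v["source"], 0) + 1
        wins[v["source"]] = wins.get(v["source"], 0) + int(v["converted"])
    return {s: wins[s] / totals[s] for s in totals}

rates = conversion_rates(visits)
# Here organic converts at 50% versus 33% for paid AdWords traffic.
```

Comparing the two rates against the cost of the paid clicks tells you whether the paid traffic is earning its budget.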

Unbiased 3rd-party verification

What is also good about using Opentracker is that we are an unbiased 3rd party. That offers two advantages over the AdWords and Google Analytics solutions:
1. You can see traffic sources and ROI from all of your traffic sources, not just Google
2. As a 3rd party that does not sell traffic, Opentracker is neutral, i.e. there is no desired outcome

Traffic sources automatically filtered

The conversion sources are automated – a lot of paid and organic traffic is already filtered in Opentracker automatically. We call referrers “sources” because they are traffic sources.
if you would like ROI information per goal, you can add this by creating goals.

Click here to read about seeing All Your Online Advertising in One Report.

Here is an article explaining Conversion and ROI reporting.

Goal measurement – who came to what page

Goal measurement: we can generate a report for any page on your website that shows you all traffic sources per page and the total return-on-investment for any traffic conversion goal.

How do we do this? We combine several pieces of information. If you use Destination URLs we get the cost per click from the URL. Define the pages you want to be conversion goals in the report. Place a line of code in each goal page and Opentracker will do the rest.

How do visitors from different sources behave?

You can see the difference in two ways:
1. By using Conversion goals, you can compare how many people convert from paid adwords traffic versus how many people convert from organic traffic.
2. Individual behavior: because Opentracker is a clickstream analysis software, you can look at the clickstreams of all individual visitors and see how traffic from different sources behaves.

Unbiased 3rd-party verification

What is also good about using Opentracker is that we are an unbiased 3rd-party. That offers two advantages over the adwords and google analytics solutions:
1. You can see traffic sources and roi from all your traffic sources – not just Google
2. As a 3rd-party who does not sell traffic, Opentracker is neutral, i.e, no desired outcome

Web Metrics 101

Executive Summary and Article Navigation

If you are looking for information about how to improve your site with stats, please see our article Making Stats Work For You.

What are they, and why measure them?

There are various terms used to describe the science of recording and interpreting website statistics. Web metrics, web analytics, web stats and site stats are examples. ‘E-metrics’ refers to analysis of electronic businesses.

Web Metrics

The ‘metrics’ of web metrics refers to measurement: the science of measuring websites, specifically measuring website events and extracting trends. For Opentracker, those events are human clicks.

Web Analytics

Web Analytics is the act of distinguishing categories within recorded stats, and analyzing for patterns. The process of analytics means, literally, taking apart the whole of something in order to study its component parts.

Website Statistics

Statistics are a scientific tool. The goal is to base actions, for example website content management decisions, on the data that are recorded.

Apply statistics in order to reduce guesswork. Simple questions can be answered, for example something very basic: are there more or fewer people coming to your site this week than last week? Is your site doing better or worse this week?

What should your stats tell you? They will inform you about numerous aspects of your traffic; the number of (returning) visitors, and how visitors surf through your pages. This information tells you about the content of your site and how visitors use it. Your traffic statistics are an indicator of website performance. Thus applied, stats can be effectively used to make updates.

Comparing different types of measurement is very useful

When comparing different types of measurement, the classic scenario of “the difference between apples and oranges” often arises. In the same way, different website statistics programmes have unique ways of measuring important variables such as pageviews, unique visitors, and visits.

Therefore it is not always easy to compare the results generated by two statistics programmes tracking the same site. The process itself can nevertheless be very useful, in terms of thinking through the differences in results and determining what is actually being measured. We encourage the use of multiple programmes, for example combining a tracking service with log analysis.

If the method of measurement stays the same through time, then the results will be perfect for purposes of comparison. Therefore, choosing the method of measurement is important. Scientifically speaking, changing the method of measurement during an experiment invalidates the process.

If you compare results from two types of measurement you will find differences in numbers. For example, measuring pageviews vs. unique visitors, or the whole site vs. specific pages. If you compare the same statistics over time, you are not changing the method of measurement. This is the most accurate way of recording statistics. This will allow you to find patterns and definitive answers, for instance if traffic is growing or diminishing. Is your “Generate new leads” campaign working, are visitors returning over time? Do your efforts to bring targeted traffic through a PPC campaign lead to conversions? Do returning visitors generate more revenue than the first-time visitors?

Statistics and determining what to measure

In any statistical endeavour, the first step is to define what is being measured. In website cookie tracking, the common denominator is human events, clicks on a website, which are defined as pageviews.

Specifically, the statistics discussed here are a translation from raw data, clicks, and server-browser dialogues, into a user interface from which patterns can be discerned. The goal of web metrics is to extract patterns which tell you what is happening. The next step is to create actions, i.e. what to do about your traffic patterns.

Web metrics and analytics is an exciting field at this moment, because many patterns have yet to be explored. An example might be comparing ‘bounce rate for first-time visitors’ with ‘bounce rate for returning visitors’, which has not become a standard of analysis (aggregate bounce rate stats tell you how far into your site visitors are clicking).
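The comparison mentioned above can be sketched directly, assuming visit records that flag first-time visitors (the data shape and numbers here are invented for illustration):

```python
# One record per visit: whether the visitor was new, and pages viewed.
visits = [
    {"first_time": True, "pages": 1},
    {"first_time": True, "pages": 1},
    {"first_time": True, "pages": 4},
    {"first_time": False, "pages": 1},
    {"first_time": False, "pages": 3},
    {"first_time": False, "pages": 5},
    {"first_time": False, "pages": 2},
]

def bounce_rate(visits):
    """A bounce is a single-page visit; return the bounced fraction."""
    if not visits:
        return 0.0
    return sum(1 for v in visits if v["pages"] == 1) / len(visits)

new_rate = bounce_rate([v for v in visits if v["first_time"]])
ret_rate = bounce_rate([v for v in visits if not v["first_time"]])
# In this sample, first-time visitors bounce far more often than
# returning visitors, which might point at a confusing landing page.
```

Note that segmenting like this requires cookie-based visitor recognition: without knowing who is returning, only the single aggregate bounce rate is available.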

How to use realtime stats: ongoing fine-tuning

For a practical guide, please see our article Making Stats Work For You.

Note: nothing can be measured with 100% accuracy. The skill lies in keeping measurements useful despite the inability to reach 100% accuracy. An acceptable margin of inaccuracy within the scientific discipline of statistics is 5%. That does not make the world an uncertain place; it means that you have to be specific in knowing what is important: for example, whether trends are rising or declining over time.

The process of determining what to measure involves the creation of definitions. There are always elements being under- or over-measured. That is why the system requires constant calibration, in terms of what people really want to know, which in turn determines what should be measured. An example is the question “what constitutes a search engine?” Should the Yellow Pages and White Pages be included? New search engines and portals appear every day. What criteria should be used to classify them? Our list of officially recognised search engines, located on our forum, requires constant calibration.

Marketing strategy: it is important to focus on the most important variables for you, and to locate an application that presents these measurements in a clear format: for example, the performance of specific keywords that you purchase for your Pay-Per-Click (PPC) campaigns.

Statistical needs vary depending on site size. Therefore it is up to statistics programmes to present the statistics in a way that is useful for webmasters of different sized sites.

Large sites, for example, are more interested in trends. They generate higher volumes of data, in which individual clickstreams may not be very interesting unless usability is being improved. Because there are too many clickstreams to examine one by one (e.g. on sites which receive several thousand visitors a day), aggregates are often more helpful for large sites, while smaller sites are interested in discrete data.

Trends are aggregate statistics. For example, a site’s bounce rate is an aggregate statistic. Bounce rates are designed to identify patterns which would otherwise remain hidden within the stats.

Discrete stats, such as clickstreams, will tell you what individual people are doing on your site. Discrete stats are not aggregates: you are actually seeing what the data is “built” of.

This type of information (clickstream analysis) is very useful for development purposes and understanding user reaction (aka usability). If you are designing a new site, knowing how first-time visitors navigate will help to determine how successful the site is, and what changes need to be made.

Opentracker and statistical accuracy

Opentracker is a best-of-breed solution. We offer a high degree of statistical accuracy because we use cookies to measure unique visitors. Human events, in the form of page views, are used to generate the statistics we present. One click is equal to one page view, a one-to-one correlation.
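The one-to-one correlation between clicks and page views can be illustrated with a small sketch. The cookie values below are made up; the point is that every request is counted as exactly one page view, with no sampling, while the cookie IDs yield the unique visitor count.

```python
# Sketch of one-to-one counting: each request carries a cookie ID, and
# every click is counted as exactly one page view, with no sampling.
# The cookie values below are hypothetical.

requests = [
    {"cookie": "a1", "page": "/home"},
    {"cookie": "a1", "page": "/about"},
    {"cookie": "b2", "page": "/home"},
    {"cookie": "c3", "page": "/home"},
    {"cookie": "a1", "page": "/contact"},
]

page_views = len(requests)                           # every click counted
unique_visitors = len({r["cookie"] for r in requests})

print(page_views)       # 5
print(unique_visitors)  # 3
```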

We do not sample or extrapolate: We count unique visitors

Website traffic data presented in Google Analytics is sampled.

To illustrate some of the difficulties associated with counting and measuring, consider a statistic that tells you how many people voted in an election. Counting votes is a difficult process; re-counts are often undertaken, and it is not unusual to reach different totals every time.

When polls are released, the number presented is an extrapolation, based on a percentage of people contacted by phone, or asked at the door for whom they voted.
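The difference between a full count and an extrapolated estimate can be demonstrated with synthetic data. In this sketch the “votes” are invented; repeated samples produce different estimates, while the full count is always the same.

```python
import random

# Sketch contrasting a full count with a sampled extrapolation.
# The votes are synthetic; the point is that repeated samples produce
# different estimates, while the full count does not vary.

votes = ["yes"] * 6000 + ["no"] * 4000
exact = votes.count("yes")  # full count: always 6000

random.seed(0)
for _ in range(3):
    sample = random.sample(votes, 500)   # "poll" 5% of the voters
    estimate = sample.count("yes") * 20  # extrapolate back to 10,000
    print(exact, estimate)               # exact stays 6000; estimate varies
```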

Opentracker presents trends derived from actual clicks. This is how we narrow the margin of error: when trends are derived from actual clicks rather than samples, and traffic is measured with cookie-based techniques, accuracy improves.

Our point is that data (i.e. statistics) are numbers created by people. It is therefore important to understand how these numbers are defined and generated.

The data collected with cookies gives insight into site visitors over time: traffic is deduced from unique visitors, and there is minimal ‘double-counting’ of visitors. We are often asked why Opentracker’s traffic numbers are lower than those recorded by log files, and this is why.

We believe Opentracker to be at least 30% more accurate (and probably much more) than the standard web tracking and statistics solutions currently available.

Related pages

Building Online Community
Social Media Advertising

Identify and track your visitors

Big Data Orientation

Big Data Orientation

Ownership of data
Some helpful terminology:
3V’s: volume, velocity and variety.
4S’s: source, size, speed, and structure.
Why are we talking about this & what questions should I be asking?
Where is all the data coming from?
Conclusion: and the winner is…

Ownership of data

This is especially an issue for larger, more traditional companies, for whom data is something that is always kept on-site. Using IaaS (Infrastructure-as-a-service) means letting somebody else host, store, and manage your data, so that they can generate reports. It is possible to resolve this in different ways, although it’s probably more cost-efficient to let an IaaS manage the server park. Data ownership can be stipulated in a contract, so that while Company A owns their data, Company B hosts and manages the data. Access to the source code of the software used to manage the data can also be an issue. This can be resolved by placing the code in escrow or using open source software.

Some helpful terminology:

3V’s: volume, velocity and variety.

3V’s: volume (lots of gigabytes or terabytes), velocity (coming in quickly, faster than normal methods can handle or process), and variety (unstructured data from multiple sources, too many for pre-defined tables). For example, in terms of variety, Opentracker collects urls, countries, ip info, user-tagging, conversion, tech specs, OS, and custom events (meaning any piece of information that can be defined and sent). This information comes from a variety of sources: location databases, carriers, browsers, device specs, cookies, javascript, social networks, networking sites, user profiles, company databases, etc.
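A hypothetical tracking event makes the “variety” point concrete: a single record mixes fields from many sources (browser, location database, custom tags defined by the site owner), and the set of fields differs from one event to the next, which is why fixed, pre-defined tables fit poorly. All values below are invented.

```python
import json

# Hypothetical example of "variety": one tracking event mixing fields
# from many sources. The field set can differ per event, so rigid
# pre-defined tables are a poor fit.

event = {
    "timestamp": "2024-05-01T09:30:00Z",
    "url": "/pricing",
    "country": "NL",           # from an IP location database
    "os": "Android",           # from browser / device specs
    "screen": "1080x2400",
    "user_tag": "trial-user",  # custom tag, defined by the site owner
    "conversion": True,        # custom event
}

print(json.dumps(event, indent=2))
```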

4S’s: source, size, speed, and structure.

IaaS: Infrastructure-as-a-service. An Infrastructure-as-a-Service provider typically delivers a combination of hosting, hardware, provisioning and basic services needed to operate a cloud. To quote wiki (again): In the most basic cloud-service model, providers of IaaS offer computers – physical or (more often) virtual machines – and other resources… Cloud providers typically bill IaaS services on a utility computing basis: cost reflects the amount of resources allocated and consumed.

 

Why are we talking about this & what questions should I be asking?

The point of all this data is to enable decision-making. That means that an answer is needed in time to make a decision, not two weeks later. What’s interesting is that there is a trend from looking over our shoulders towards looking ahead >> we used to study data from the past >> real-time processing >> predictive analysis.
So it’s a bit like a camera which starts by panning backwards towards where we came from, then swings around to show the runner, and finally swings to show the path disappearing towards the horizon.

From a corporate perspective, you may hear questions like these:

Am I going to set up data-processing infrastructure?
Do I want to own the data?
Do I want to put data in the cloud? (hint: if it absolutely needs to stay PRIVATE, don’t put it in the cloud)
How do I get the data into my own company’s infrastructure?

There are a few household names that have become experts at handling large flows of data; Twitter, Google, Facebook, Amazon. They’ve solved the big data problem for their own organizations. The next step is how to make those solutions available to all the other enterprises that exist today.

 

Where is all the data coming from?

It used to come (only) from people. Now it comes from both people and machines (sensors). Take, for example, the temperature of a room: that data comes from a sensor, with no person sending a signal.

At the moment, most data is generated by people, through clicks, swipes, and taps. Increasingly, servers and sensors (non-people) will generate the data. Smartphones (Android, iPhone), self-service ticketing machines, card transactions, etc.: almost everything we do and touch is generating data.

This data is increasingly unstructured, but still needs to be processed. The fact that it is unstructured is what led Charles Fan to name it crap (create, replicate, append, process) – when referring to the mind-boggling amount of data which is created / stored / generated and often left for roadkill. Why left for roadkill? Because the amount of know-how and resources needed to derive meaningful conclusions from all the data collected is prohibitive.

This leads to the great challenge we are facing: not just to collect and store all the data coming in, but to organize it. Nobody cares about deleting it or updating it (hence the crap description), and so whoever designs a new data center for crap will be the winner.

 

Conclusion: and the winner is…

So the goal is to give enterprises access to data storage & management. Enterprises, which tend to be larger and more traditional organizations, require flexibility in terms of infrastructure location; they may not want their data in the cloud.
From the point of view of companies who provide Big Data solutions, the winner will be the one who structures their service in a way that is accessible for more traditional organizations.

From the point of view of business and retail, the winners, simply put, will be the businesses and companies who learn how to apply Big Data analysis techniques (read: innovative data analysis, or connecting the dots) to the data they possess about their clients or market conditions.

Identify and track your visitors

Online Privacy Issues

Start your free, no-risk, 4 week trial!

Online Privacy Issues

We receive many questions asking us about what tracking services can and can’t do, questions about ‘online profiling’, ‘digital blueprints’ and leaving a ‘data trail’. We are also posting numerous articles on the site explaining what tracking services can do.

Summarised overview of online privacy issues

In this article you will find definitions of:

  • Anonymity
  • Merging clickstream data & personal information
  • Personal contact information
  • Personally identifiable information
  • ‘Computer information’
  • Internet protocol (ip) addresses

In this article you will find discussion of:

  • Why did we write this article on online privacy issues
  • Collecting clickstream data
  • What are we doing with this data
  • Capturing email addresses
  • Tracking of individuals
  • The trade-off in privacy

Online Privacy issues

We receive many questions asking us about what tracking services can and can’t do, questions about ‘online profiling’, ‘digital blueprints’ and leaving a ‘data trail’. We are also posting numerous articles on the site explaining what tracking services are doing. In this article, we explain what tracking services, and Opentracker in particular, cannot do.

Online privacy is an important topic on the internet. Much of the discussion is characterised by hype, and preys on fear. This is apparent from the wide range of ‘spyware protection’ products available on the internet, and the language used to promote them. Many internet users are concerned because they do not know how their surfing patterns are tracked, or what is done with that information.

The essential point, repeated throughout this article, is that the vast majority of information collected is in no way connected to personal contact information.

The primary reason for this is that email addresses are not transmitted by surfing.

What is the purpose of this article?

We have written this article in the hope of increasing public awareness of what tracking services do with your information.
The specific issues we address are anonymity, email addresses, and personal contact information.

Privacy is a topic of great concern on the internet. This is especially the case as many privacy and surfing issues are non-regulated.

At the moment technology is changing very quickly, so that it is difficult for rules and procedures to be established and enforced, as change is the only constant. Perhaps the greatest cause for concern is the unknown. Surfers do not know when and if they are being tracked, who collects that information, how it is done, and for what purposes.

We hope that by explaining what tracking services in general, and Opentracker in particular, can and cannot do, that we can help to dispel some myths. We feel that fear, while a good way to sell protection products, is not a rational basis for developing privacy guidelines or stimulating discussion. Technically speaking, the ‘anonymous surfing’ that many protection products guarantee is already the status quo.

Of course there are many legitimate security concerns, particularly in terms of viruses, but in terms of privacy the dangers are often over-hyped. The primary concerns, as we see them, are information security, in terms of safe data transferal, back-up, and storage of data, and the encryption and safety of information such as credit card info, passwords, etc.

The main information that tracking services collect: clickstream data

In terms of individual information relating to surfing habits and patterns: clickstreams, or click-paths, comprise the essential data that we collect.

The clickstreams that we record on behalf of our clients are not attached to physical or electronic contact information of the people who are visiting the websites. In other words, there is no information that connects people to the statistics we are recording. We do not collect email addresses of surfers. This means that there remains an essential element of anonymity.

The possible exception to this is the IP (internet protocol) address. IP addresses, however, are owned by companies and the ISPs who provide them to their customers. This means that in the great majority of cases this information cannot be used to locate a specific user, unless the ISP or company itself makes that information public.

Opentracker couples the visitor’s profile with the clickstream. Each profile contains technical stats of visitors, also known as ‘computer information’. Computer information is different from ‘individual profiling’ and ‘online contact information’. Computer information tells us the technical specifications of a user’s browser: their screen resolution, operating system, router, ISP, etc. This information is not linked to personal contact information.

On our site, we provide a link to ARIN, a public IP lookup database. The contact information provided by ARIN can put you in touch with the owner of the IP address of your visitor. Most often, this is the ISP corporation that owns the IP number. The exception is larger companies that do not outsource their internet infrastructure.

Full screen Facebook user-data in Opentracker clickstream

We provide an example of a clickstream and personal profile to the right, which you can enlarge by clicking. If you would like to interact with a clickstream, please login to our demo and take a look. For starters, you will be able to see your own clickstream across our site.

Capturing email addresses

The question we receive most is about the possibility of capturing the email addresses of people who surf on a website. As far as we know: it is not possible to automatically collect the email address of a person who surfs to a website. That does not mean that this technology does not exist, or that somebody is not developing it, but that we have not heard about it.

The technical reason that we are not able to capture a visitor’s email address is that this piece of information is not listed in a user’s browser. The information that tracking services record comes from the user’s browser.
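A hypothetical snapshot of the request data a tracking server receives makes this concrete. The header values below are invented, but they are representative of what a browser transmits: user agent, language, referrer, cookie. Note that no field contains an email address.

```python
# Hypothetical snapshot of what a tracking server receives with a page
# request. The values are invented, but representative: nothing here
# contains an email address, because the browser does not transmit one.

request_headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0",
    "Accept-Language": "en-US,en;q=0.5",
    "Referer": "https://www.example.com/home",
    "Cookie": "visitor_id=a1b2c3",
}

# No header value even contains an "@" character.
assert not any("@" in value for value in request_headers.values())
print("No email address in any header")
```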

What can and does happen is that a person voluntarily enters their email address for one reason or another. The obvious examples are logging in, entering contact info for an online purchase, signing up for newsletters, and “unsubscribing” to spam. Again, to our knowledge, this is the only way that email addresses are captured.

It is possible to purchase email address lists that have been compiled by companies who sell this information.

As a precaution, if you are concerned about your privacy, set up an email account that you always use to fill in required email fields when you are not sure where the information is going. Do not connect your physical contact information to this email address.

It is important to keep in mind that once a person has entered their email address at any point on a site, it can be stored with their clickstream in a process called tagging. This means that a connection can be made between, for example, login info and clickstreams. This creates a direct connection between surfing habits and personal contact information. It means that Amazon.com, for example, has the potential to keep a record of every page a visitor has looked at on their site, and to combine this information with purchase history and billing details.
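The tagging process can be sketched in miniature. All identifiers below are invented: the anonymous clickstream is keyed by a cookie ID, and the moment the visitor volunteers an email address (e.g. at login), the site can attach it to the clickstream it has already recorded.

```python
# Sketch of "tagging": a site links a voluntarily supplied email address
# to the clickstream already recorded under a cookie ID. All identifiers
# and helper names here are hypothetical.

clickstreams = {"a1b2c3": ["/home", "/products", "/login"]}
tags = {}  # cookie ID -> voluntarily supplied email address

def on_login(cookie_id, email):
    """Store the email the visitor entered, keyed by their cookie ID."""
    tags[cookie_id] = email

on_login("a1b2c3", "jane@example.com")

# The formerly anonymous clickstream is now linked to contact info.
for cookie, pages in clickstreams.items():
    print(tags.get(cookie, "anonymous"), pages)
```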

An important aspect of this potentiality to remember is that each site can only see what visitors are doing on their site, not across the entire internet. That means that the internet is still highly compartmentalised, in terms of tracking surfers.

What happens in the scenarios presented by privacy advocates is that ‘personally identifiable information’ is collected so that ‘online contact information’ (email address) may or may not be merged with ‘physical contact information’ (billing address). This is called ‘merging clickstream data with personally identifiable information’. It is an understandably worrying scenario, in which a person might receive a catalogue in the mail advertising products similar to those viewed online. In this sense, it is mainly sexual products and information related to adult-content websites that call for safeguards to individual privacy.

So what is the information that we collect designed to do?

The scenario presented above is a worst-case scenario. In the case of Opentracker, there is no personal contact information linking a particular person or email address to a clickstream. We do not collect email addresses. The only personal piece of information captured is the IP number. IP addresses are owned by the companies (e.g. AOL, Sprint, Earthlink) that provide them to their customers. Additionally, some companies and corporations are introducing round-robin IP numbers, whereby IP addresses are re-assigned on a regular basis.

This means that in the case of tracking services similar to Opentracker, the user’s anonymity is preserved. Anonymity is defined as a condition in which ‘your true identity is not known’.

The information that we are collecting on behalf of our clients is designed to be aggregated and used to identify traffic patterns. This activity is referred to by one privacy group as ‘affirmative customisation’. We do not engage in ‘individual profiling’, nor do we provide ‘online contact information’.

The information that we collect and present is passively generated by users browsing through the sites of our clients.

The information that we collect is designed for various purposes. Essentially, it tells webmasters what is happening on their sites. The information is designed for purposes of marketing, advertising, updates, and ad campaigns; essentially, content management. By studying clickstreams, webmasters learn which pages are important and which pages need help. They learn about their traffic, e.g. which countries it comes from. We aggregate the data to give a range of averages: average number of pages viewed, time spent, etc.
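The aggregation step described above can be sketched with a handful of synthetic visit records: individual visits go in, and site-level averages and country breakdowns come out.

```python
# Sketch of aggregating individual visits into site-level averages.
# The visit records below are synthetic.

visits = [
    {"pages": 3, "seconds": 120, "country": "US"},
    {"pages": 1, "seconds": 15,  "country": "NL"},
    {"pages": 5, "seconds": 300, "country": "US"},
    {"pages": 3, "seconds": 95,  "country": "DE"},
]

avg_pages = sum(v["pages"] for v in visits) / len(visits)
avg_time = sum(v["seconds"] for v in visits) / len(visits)

by_country = {}  # count visits per country of origin
for v in visits:
    by_country[v["country"]] = by_country.get(v["country"], 0) + 1

print(f"Avg pages/visit: {avg_pages}")  # Avg pages/visit: 3.0
print(f"Avg time (s): {avg_time}")      # Avg time (s): 132.5
print(by_country)                       # {'US': 2, 'NL': 1, 'DE': 1}
```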

Additionally, we do not sell, lease, trade, etc, the information that we collect to anybody. It ‘belongs’ to the webmasters of the sites that we measure.

 

Tracking of individuals

Specific to individuals, we track visitors over the long term. That means that for each visitor to a site, we maintain a record of every click they have made on a website. We can only do this for the pages on which our code is installed. It is possible for webmasters to inspect these clickstreams, and see what an individual did over many months. The only ‘name’, or ‘tag’ that these visitors have is the time of the last click that they made.

Therefore, technically, visitors remain anonymous, as there is no contact information linking a person to their clickstream. Visitors remain statistics collected together into aggregated site stats. These site stats reveal, for example, that the average visitor comes to a site’s homepage 2 times a week, and stays there for X amount of time.

The trade-off in privacy

This is a quote from a privacy advocate group:

“However great the potential benefits of online tracking, they remain incomparable to the grave implications of Internet users’ loss of privacy.”

(http://www.cdt.org/privacy/guide/start/track.html)

While we acknowledge the potential for concern, we feel that by using the aggregated statistics that we provide, our clients can make their websites responsive to the surfing and clicks made by their visitors. The point here is that the internet can become increasingly interactive when traffic statistics and analysis are applied. Also, if webmasters do not know what is happening on their sites, there is simply too much guesswork involved.

Obviously there is a very real concern for a lot of people that their privacy is somehow being abused. We would like to respond to these concerns, primarily through education, but also by opening up a dialogue on any related questions or ideas. Please feel free to write to us, or post any feedback on our forum.

The Surprising Way Data Science Helps in Cancer Research

The Surprising Way Data Science Helps in Cancer Research

Summary

In this article, you will learn about data science and its effects on cancer research. In particular, you will learn about:

  1. How data science helps in early cancer diagnosis
  2. How data science can help find the cure for cancer

Introduction


In light of World Cancer Day, which took place on February 4, we thought it would be a good moment to show you how data science is actually helping in the research for the cure and treatment of cancer. Millions of people lose their lives or loved ones to cancer each year, and while scientists are doing their best to cure it, there has been only limited progress in finding a cure. One of the main reasons cancer is not detected in its early stages, when most types of cancer are curable, has been the lack of technology and information available (until now) to help doctors diagnose it at the right time.

The problem with cancer treatment is that while a lot of valuable data is available to doctors around the world (the internet has a lot to do with this), only a few patients have experience with clinical trials. This is because of the general attitudes of patients once they hear that they have cancer: it is almost as though just hearing the diagnosis is enough to make many people lose their will to fight the disease at all. Another problem is that symptoms are not the same for everyone. The same goes for medicines: a medicine that works for one patient might not work for another. Thousands of pharmaceutical companies are pushing new medicines out every day and, in spite of the massive data available, not every doctor will have heard about every new release.

This is where Big Data comes in. The American Society of Clinical Oncology (ASCO) has started an initiative called CancerLinQ which collects data about these new medicines, their usage and results in real time which doctors can use to look at the various symptoms and the required medicine to increase the chances of treatment.

Data Science in Early Cancer Treatment

Data science can be integral to the early diagnosis and subsequent treatment of cancer in many patients. Knowing a patient’s symptoms and prognosis, and then comparing them to a database of people with the same symptoms, can help medical teams decide how to treat the cancer and begin the treatment process. Big Data helps analyse this massive amount of information and categorize it by age, race and gender, so that more detailed information is available, forming patterns which doctors can use to treat cancer. Big Data can also predict long-term outcomes based on previous cases, both successful and not, and thus help doctors determine the best course of action.
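The cohort-matching idea described above can be illustrated with a toy sketch (not a medical tool): match a new patient's profile against a database of past cases and report which treatment had the highest success rate among similar patients. All data and names here are invented.

```python
# Illustrative toy sketch (not a medical tool) of cohort matching:
# compare a patient's profile to past cases and pick the treatment
# with the best success rate in that cohort. All data is invented.

cases = [
    {"age_group": "50-60", "gender": "F", "treatment": "A", "success": True},
    {"age_group": "50-60", "gender": "F", "treatment": "A", "success": True},
    {"age_group": "50-60", "gender": "F", "treatment": "B", "success": False},
    {"age_group": "30-40", "gender": "M", "treatment": "B", "success": True},
]

def best_treatment(age_group, gender):
    """Return the treatment with the highest success rate in the cohort."""
    rates = {}  # treatment -> (successes, total) within the cohort
    for c in cases:
        if c["age_group"] == age_group and c["gender"] == gender:
            wins, total = rates.get(c["treatment"], (0, 0))
            rates[c["treatment"]] = (wins + c["success"], total + 1)
    return max(rates, key=lambda t: rates[t][0] / rates[t][1])

print(best_treatment("50-60", "F"))  # A
```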


Researchers also use Big Data to help understand the genetic changes that lead to the formation of these cancers, studying a large pool of case files of cancer patients from diverse backgrounds. This, combined with the ability to sequence the DNA of different kinds of tumours, provides a very strong framework for researchers to build on. It helps to develop medication that can target the various kinds of tumours, and to understand how to fix the genetic change that leads to them in the first place. By doing so, not only can cancer be treated from the very beginning, but the continuous additions to the database will help future scientists with their research as well.

 

Curing Cancer

Big Data cannot cure cancer on its own. However, scientists can make use of it, along with other intelligent machines, to study the complex ways in which cancer cells multiply and form tumours. Before Big Data, it took scientists and researchers decades to realize the link between lung cancer and cigarettes. Now, with the help of modern technology, research facilities only need a hospital’s approval to check its records with histories of cancer cases. Instead of putting a lot of manpower and work into looking for patterns, these scientists can easily rely on Big Data and AI to help analyse and understand patterns.

Many government organizations dedicated to finding the cure for cancer have realized that it is going to be near impossible to find a cure without using AI and Big Data. Because of this, there is a lot of development happening, and organizations such as the Million Veteran Program and the Cancer Genome Atlas in the United States are working towards using Big Data to build databases of human genomes, open to researchers for analysis via the cloud. The point is to study as many cases as possible in order to get new insights as quickly as possible. The sooner we understand how all types of cancer are formed, the sooner we will get to actually curing them, once and for all.

Conclusion


As you can see, data science is helping scientists all around the world look for the cure and treatment of cancer, as we speak. Future projections show that investment in Big Data and related technologies by companies and governments around the world will help further develop this technology. A couple of decades ago, sequencing a human genome cost around $10 million; now it can be done for less than $1,000. Research and development in this field is accelerating as pressure rises on pharmaceutical companies and researchers to find the cure. Whether this motivation to help save the lives of millions of people comes from good intentions or from the desire to control a future pharmaceutical monopoly is unknown, but a growing number of clinics and companies have started to use Big Data to analyse and then determine the symptoms, causes and medication required to treat cancer.