But as always, you should be cautious with data from Google Analytics. It’s crucial that you understand how the data is captured and processed before you can judge how accurate it is.
Before you can activate demographics or interests reports in your own Google Analytics profile, you must switch to Google’s DoubleClick cookie. This cookie is used within Google’s Display Network to store information on your browsing behaviour.
Based on your web surfing, Google tries to predict your gender and age. Basically, this is the same as what traditional media have been doing for years: you like to watch The Simpsons, so we assume that you’re a 15-to-35-year-old male. Therefore we will show you advertisements that target males in this age category. The only difference is that Google manages to collect much more data than any publisher before it.
They combine the estimates based on browsing habits with information that you provide yourself. So be aware that Google may use the information that you share on other websites (such as social networking sites) or in your Google profile. Take a look at the AdWords Help Center for more information on Google’s targeting methods.
It’s normal to be sceptical about the demographic data that Google provides. After all, it’s an estimate that is based on a sample of your visitors. There are three reasons why you should be careful when interpreting the demographics and interests reports.
1) A cookie is just a cookie
Ad blockers tend to prevent the DoubleClick cookie from being set. This means that Google will not capture any data from visitors with ad blockers installed. Besides, when you clear your cookies, all data is lost and Google has to start assembling your profile all over again.
2) Subject to thresholds
Google applies a threshold to protect the privacy of every individual. They say: thresholds are applied when data might allow the recipient of the report to infer the characteristics of an individual visitor. When this occurs, you will be warned by a yellow notice below the report title. This shrinks the data behind the report even further, on top of data sampling.
3) Data Sampling
As you probably already know, Google often uses only a subset of the data to compile reports. Data sampling happens automatically when your report includes more than 500,000 visits. Medium to big websites have to deal with this very often. We noticed that demographics reports in particular are exposed to heavy sampling. Often, the reports you see are based on fewer than 10% of your total visits.
Based on these three obstacles it’s easy to conclude that the demographics and interests reports are worthless. However, our web analysts don’t draw any conclusions until they have had the chance to test things themselves. We noticed that other analysts had doubts about the accuracy too. Neil Moree from Provenance, for instance, set up his own test, but he had only a small sample size. We decided that we had to find a bigger data sample before we could draw any conclusions.
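To make the sampling figures concrete, here is a minimal Python sketch that checks how much of your traffic a report is based on. The function name and the example numbers are our own illustration, not something Google Analytics provides; only the 500,000-visit limit and the 10% figure come from the text above.

```python
# Illustrative helper; names and example figures are assumptions, not a GA API.
SAMPLING_LIMIT = 500_000      # visits above which reports get sampled (from the article)
HEAVY_SAMPLING_SHARE = 0.10   # "less than 10% of your total visits"


def sampling_summary(total_visits: int, visits_in_sample: int) -> dict:
    """Describe how much of the traffic a sampled report is based on."""
    if total_visits <= 0 or not 0 <= visits_in_sample <= total_visits:
        raise ValueError("visits_in_sample must be between 0 and total_visits")
    share = visits_in_sample / total_visits
    return {
        "over_limit": total_visits > SAMPLING_LIMIT,
        "share": share,
        "heavily_sampled": share < HEAVY_SAMPLING_SHARE,
    }


# Hypothetical report: 1.5 million visits, of which 120,000 ended up in the sample.
summary = sampling_summary(1_500_000, 120_000)
print(f"Report based on {summary['share']:.0%} of visits")
```

If `heavily_sampled` comes back true, treat small differences between segments with extra caution.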
One of our clients participates in the CIM Internet study. The CIM is a Belgian organization that provides objective and independent figures on the number of “visitors”, “visits” and “page requests” of the participating sites. This data is public and is mainly used for selling advertising space.
During the CIM Internet study, visitors are asked directly for their age, sex, residence, etc. It seems logical that this approach would deliver more accurate data than Google’s estimates. But as with every survey, they only reach a part of the entire visitor population. So we also have to take sampling into account here.
We decided to use the data from the CIM study for our test, as this is the closest we’re able to come to trustworthy numbers on the demographics of large visitor samples. The client we’re talking about has an audience of more than 1.5 million visits per month.
The results were stunning! When we compared both data sources, Google Analytics reported the same tendencies as the CIM Internet study. The ratios for the variables sex (47% women and 53% men) and language (64% Dutch and 36% French) were identical in both. For the variable age we noticed some small differences. These were mainly caused by the different division into age groups: CIM includes minors (under 18), while GA doesn’t. If we exclude the minors from the CIM data, the ratios of the age groups are very similar: the maximum deviation was only 2%.
The only variable where we saw clear anomalies was location. But this discrepancy has a logical explanation: the CIM study asked people for their residence, while GA reports the location from which your visitor is surfing. The difference between the two only shows that visitors often visit our client’s website while they’re not at home.
We may conclude that Google is surprisingly good at making accurate estimates of your visitors’ profile. The only requirement is that you have access to a large sample of data. Be aware of this when you make use of demographic data in Google Analytics. Do not draw conclusions based on short time ranges or low-traffic sites.
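For readers who want to repeat the age comparison on their own data, the adjustment can be sketched in Python as follows. The age brackets and percentages below are invented purely for illustration (they are not the CIM or GA figures from our test), and the function names are our own.

```python
def renormalize_without(shares: dict, excluded: str) -> dict:
    """Drop one group and rescale the remaining shares so they sum to 1."""
    kept = {group: share for group, share in shares.items() if group != excluded}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("nothing left after excluding the group")
    return {group: share / total for group, share in kept.items()}


def max_deviation(a: dict, b: dict) -> float:
    """Largest absolute difference between two distributions over the same groups."""
    if a.keys() != b.keys():
        raise ValueError("both sources must use the same groups")
    return max(abs(a[group] - b[group]) for group in a)


# Hypothetical shares, NOT the real study data.
survey = {"under 18": 0.10, "18-34": 0.36, "35-54": 0.36, "55+": 0.18}
analytics = {"18-34": 0.41, "35-54": 0.39, "55+": 0.20}

adjusted = renormalize_without(survey, "under 18")
print(f"Maximum deviation: {max_deviation(adjusted, analytics):.1%}")
```

Rescaling matters here: without it, every adult age group in the survey would look smaller than in GA simply because the minors take up part of the total.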
Jente De Ridder | 17 March 2014