Clean Data, Clear Analytics

Jonny Darling, Insight

Spammers, a bit like mosquitoes, are a plague. Email is probably where we first encountered spam. Remember those old Hotmail and Yahoo! Mail accounts? More bulk mail than actual inbox. Then we were inundated by spammy websites with endless pop-ups and flashing banners. Just exiting the page proved a puzzle! What’s even more perplexing is the latest wave of unwelcome guests to our own websites and, more specifically, those invading our analytics!

* Updated 04/09/2016

Google Analytics Referrer Spam

2015 is the battle of the bots. Webmasters have been fighting a losing battle to keep their Google Analytics reports free of spam referrals. Any SEO’er worth their salt will undoubtedly have uncovered swaths of bogus data in their traffic acquisition reports. Not quite household names, but to industry insiders the referrer spam names darodar, 100dollars-seo, free-social-buttons, 4webmasters, success-seo and semalt send shivers down our spines! War, what is it good for? Absolutely nothing!

Referral traffic from search engines and external websites is a key benchmark for measuring SEO performance. We stand over Google Analytics reports as justification for what we do. They tell it like it is, up or down, good or bad. Lately, however, this weapon, our first line of defence, has been blunted. Analytics reports are only as good as the data captured. When the data becomes corrupted, the reports tarnish. Is this the aim of these ghost referrers and crawler bots: to muddy the data stream?

Spam, the reason why?

It’s a little more malicious than a prank. Piwik, an open-source competitor to Google Analytics, has not been immune either. They recently explained why analytics spammers might be inclined to spam.

Spam has always been a numbers game and the analytics software vendors would have us believe that they are simply the involuntary carrier of the spam virus. Benign as a mosquito, analytics tracking by its very nature penetrates its host, the website in which it is embedded. It’s the host website, not the analytics software, that suffers as spam bots breed and multiply across web domains, feeding off healthy websites’ domain rank.

Granted, spam bots benefit from their infection of analytics log files as the parasite thrives in the website host environment, but the carrier too grows fat on the life-blood of the site. Analytics reports become bloated with bad blood.

[Image: spam referrals in Google Analytics reports]

Bad Blood

At first glance, you’d suggest that rogue data would dilute the value proposition of Google Analytics. What good are reports if they don’t show the full picture, or show an inflated vision of reality? Sure, from a purist perspective we want the truth, the whole truth and nothing but the truth, but do we really, or more to the point, does Google? What harm if Google Analytics reports massage the ego of website owners about their traffic stats? In fact, isn’t it in Google’s interest to convince website owners of their own self-worth and to encourage them to invest more in their website traffic? Google certainly hasn’t been quick to stamp out bugs in the system, as witnessed in their own analytics product forum! Is it too far-fetched to suggest that with the roll-out of Google Analytics Premium, Google is building in a certain redundancy on the freemium product?

Dirty Work

On the other side of the coin, it’s not a stretch to see how dishing the dirt on Google washes its competitors clean. If Google Analytics becomes obsolete, or at least devalued, the value of competitor analytics platforms rises. Piwik are quick to point the finger at the profiteering spammers, but there’s profit to be made in analytics data visualisation and it wouldn’t be beyond rival commercial analytics companies to hang Google’s dirty laundry out in public. Go a step further: could there be some underhand tactics at play whereby spammers are being actively recruited by these rival firms to undermine Google Analytics? A stretch too far?

Clean Up Operation

Regardless of the reason why website analytics data is fudged, the more important question is how we fix analytics referrer spam. It’s a two-step process: the first part is on the website’s server side, the second is in the Google Analytics admin settings.

1. Server side

In the case of “crawler referrer spam”, bots visit the website in order to create sessions that appear in Google Analytics reports. Not only is that a bad thing for your analytics reporting data, it is also unwanted traffic to the website server that can strain CPU performance and slow page loads. Bottom line, we want to stop bad bots from ever crawling the site in the first place. There are myriad ways to turn attacks away from the site using .htaccess rules and config files, but given that 1 in 4 websites is powered by WordPress, let’s do this the easy way! I’ve experimented with a number of plugin options and, through trial and error, favoured a plugin called Bot Blocker.

[Image: Bot Block WordPress plugin]

WordPress plugin developer Ricky Dawn explains it in plain English

Following on from that, Ricky highlights the need to refine the settings in Google Analytics.
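For sites that are not on WordPress, the server-side idea can be sketched in a few lines of Python. This is only an illustration of the principle (check the referrer of each request against a blocklist and refuse the ones that match), not how the plugin works internally. The blocklist uses the spam names mentioned earlier in this post; matching on the bare name rather than a full domain, and the `status_for_request` helper, are assumptions made for the example.

```python
from urllib.parse import urlparse

# Referrer spam names listed earlier in this post. Matching on the bare
# name (rather than a full domain) is an assumption made for this sketch.
BLOCKED_NAMES = {
    "darodar", "100dollars-seo", "free-social-buttons",
    "4webmasters", "success-seo", "semalt",
}


def is_spam_referrer(referer, blocked_names=BLOCKED_NAMES):
    """Return True if any label of the referrer's hostname is a blocked name."""
    referer = (referer or "").strip().lower()
    if not referer:
        return False
    # urlparse only fills in the hostname when a scheme or "//" is present.
    if not referer.startswith(("http://", "https://", "//")):
        referer = "//" + referer
    try:
        host = urlparse(referer).hostname or ""
    except ValueError:
        # Malformed referrer header: treat as not matching the blocklist.
        return False
    return any(label in blocked_names for label in host.split("."))


def status_for_request(referer):
    """Hypothetical helper: 403 for spam referrers, 200 otherwise."""
    return 403 if is_spam_referrer(referer) else 200
```

A check like this only helps against crawler spam, because those bots actually request pages from the server. Ghost spam never touches the site, which is why the analytics settings in the next step are still needed.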

2. Google Analytics Settings

The most definitive guide to Google Analytics filters for referrer spam comes from Carlos Escalera of oHow. He visually introduces ghost spam and crawler referrer spam, provides a step-by-step tutorial on how to find spam referral traffic in Google Analytics reports and, crucially, shows how to stop analytics spam using two filters in the analytics admin settings. His detailed yet concise blog post, How to stop spam referral traffic in Google Analytics, demonstrates how to create a valid hostname filter to remove ghost referrer spam and how to set up a campaign source filter for crawler spam. Though exhaustive and a little exhausting in the first instance, these configurations are actually quite simple to implement and, once you are familiar with the set-up, very quick to replicate across your analytics properties.

[Image: Google Analytics filter settings for spam referrers]
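To make the logic of the two filters concrete, here is a rough Python sketch of what they do to incoming hits. It is not the Google Analytics configuration itself, only a model of it: an include filter keeps hits whose hostname matches your own site, and an exclude filter then drops hits whose campaign source matches a known spam name. The `example.com` hostname, the `hostname`/`source` keys and the sample hits are all made up for illustration.

```python
import re

# Hypothetical hostname that legitimately serves your tracking code.
VALID_HOSTNAME = re.compile(r"(^|\.)example\.com$", re.IGNORECASE)

# Crawler spam names from this post, joined into one exclude pattern.
SPAM_SOURCE = re.compile(
    r"darodar|100dollars-seo|free-social-buttons|4webmasters|success-seo|semalt",
    re.IGNORECASE,
)


def keep_hit(hit):
    """Apply the include filter on hostname, then the exclude filter on source."""
    if not VALID_HOSTNAME.search(hit.get("hostname", "")):
        return False  # ghost spam: hostname is missing or not ours
    if SPAM_SOURCE.search(hit.get("source", "")):
        return False  # crawler spam: our hostname, but a spammy source
    return True


# Illustrative hits only; the field names are assumptions for this sketch.
hits = [
    {"hostname": "www.example.com", "source": "google"},
    {"hostname": "(not set)", "source": "free-social-buttons.com"},
    {"hostname": "www.example.com", "source": "semalt.semalt.com"},
]
clean = [h for h in hits if keep_hit(h)]
```

The hostname filter does most of the work against ghost spam because it is an allowlist: new spam names are dropped without the pattern needing an update. The source filter is a blocklist, so it has to be extended whenever a new crawler turns up.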


Clean data, Now & Then

By this point, we have clean data flowing into our analytics reporting, free of website referral traffic spam. This is reassuring for the future, but how about past performance? We need to separate the wheat from the chaff. Filters only apply to data collected after they are set up, so if Google Analytics has been running on the website up to this time without spam blockers and filters, the only way to get a true insight on our traffic data to date is to use Google Analytics segments. Thankfully, our friend over at oHow has put together yet another illustrated guide to removing referrer spam from historical analytics data using segments. This is so important, and nowhere is it better illustrated than by the spam detection tool built into the segment set-up of Google Analytics.

[Image: Google Analytics segment spam tool]

When we segment our reports for a past time period to include only genuine web traffic and exclude spam referrals, it’s clearly visible that anything from 5 to 15% of the website’s overall traffic can be diseased. Thankfully, it’s reasonably straightforward to follow Carlos’s way to declutter the data.
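If you export sessions by source for a past period, the size of the problem is simple arithmetic. The sketch below is an assumption-laden illustration rather than anything taken from the oHow guide: the `spam_share` function and the session counts are invented for the example, and spam is identified purely by the names listed earlier in this post.

```python
# Referrer spam names from this post.
SPAM_NAMES = (
    "darodar", "100dollars-seo", "free-social-buttons",
    "4webmasters", "success-seo", "semalt",
)


def spam_share(sessions_by_source, spam_names=SPAM_NAMES):
    """Return (clean_sessions, spam_percentage) for a source -> sessions mapping."""
    total = sum(sessions_by_source.values())
    spam = sum(
        count
        for source, count in sessions_by_source.items()
        if any(name in source.lower() for name in spam_names)
    )
    share = 100 * spam / total if total else 0.0
    return total - spam, share


# Made-up figures for illustration only.
report = {
    "google": 620,
    "(direct)": 230,
    "semalt.semalt.com": 90,
    "darodar.com": 60,
}
clean_sessions, spam_pct = spam_share(report)
```

With these invented numbers, 150 of 1,000 sessions match a spam name, so the share is 15% and the clean session count is 850.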

Clean data, Clean Conscience

This post exposes the skew of false data in analytics and provides the remedy for rogue variables. Analytics as an afterthought is a fast-track to the afterlife. Analytics is the life-blood of website SEO, just as content is the food that nourishes a website. Ignorance is bliss, and it’s all too easy to be aware of a problem yet not act, particularly when the issue only inflates reporting numbers. Do yourself a favour. Clean up your act, clean up your data and deliver analytics intelligence with a clean conscience.

* Analytics Edge provides a thoroughly comprehensive and regularly updated guide to removing all Google Analytics Spam