This article is an extension of my previous articles about referrer spam in Google Analytics.
This third installment is another step in the fight against referrer spam, this time targeting “Ghost Referrers”.
What Is A Ghost Referrer?
“Ghost Referrer” spam is a technique in which spammers send fake HTTP requests using the Google Analytics Measurement Protocol. These fake HTTP requests are sent directly to the Google Analytics servers and never actually hit your website. Ghost referrers can show up as referral traffic, direct traffic, and even organic traffic.
Credit: Carlos Escalera, Ohow.co created this simple flow chart on how ghost spam works.
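To make the mechanism a little more concrete, here is a minimal sketch that parses a hit payload of the kind a spammer might send. The payload string is made up, `UA-XXXXX-Y` is a placeholder property ID, and the parameter names (`tid`, `dr`, `dh`) are the Universal Analytics Measurement Protocol names as I understand them. The only point is that the referrer and hostname are plain fields supplied by whoever sends the hit.

```python
from urllib.parse import parse_qs

# Hypothetical hit payload sent straight to Google Analytics.
# UA-XXXXX-Y is a placeholder property ID, not a real one.
# Note that no "dh" (document hostname) parameter is present.
payload = "v=1&tid=UA-XXXXX-Y&cid=555&t=pageview&dr=http%3A%2F%2Ftheguardlan.com%2F&dp=%2F"


def describe_hit(raw):
    """Return the referrer and hostname carried by a raw hit payload."""
    fields = parse_qs(raw)
    return {
        # Missing fields are labelled "(not set)" here, mirroring how
        # the reports display an empty dimension.
        "referrer": fields.get("dr", ["(not set)"])[0],
        "hostname": fields.get("dh", ["(not set)"])[0],
    }


hit = describe_hit(payload)
print(hit)
```

Because the sender controls every field, nothing stops a spammer from adding a `dh` value of their choosing.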
So how do you actually spot ghost referrer spam?
Head over to your Google Analytics account and navigate to the “Acquisition” section. Under the “All Traffic” section, select “Source/Medium”. Set the secondary dimension to “Hostname”. You should see this view.
According to Suso, “a hostname is a specific name pointing to a specific host”. In the screenshot above, skystats.com is the hostname for our website. Note that there could be other valid hostnames like help.skystats.com or even www.youtube.com (if you added your Google Analytics code to your YouTube account).
The typical referrer spam is highlighted in red. Take a look at our secondary dimension, Hostname, where I have highlighted the hostname for each of these referrer spam sites. The spammer did not set the hostname for the #3, #5, #6, and #7 sites. But look at #9: the spammer is using a legitimate hostname, theguardian.com. Notice that the source is actually theguardlan.com, with an “l” instead of an “i”. This spammer is trying to be sneaky, but the hostname still does not match our hostname for this website.
So the real weakness here is that the spammer could ratchet it up another level and actually spoof the hostname for the website. This is why a Google Analytics filter on hostname is not 100% effective, but it will help remove the majority of the ghost referrers that we are seeing today.
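The same check can be done on exported report rows. The sketch below assumes a hypothetical export with `source` and `hostname` columns and a hand-maintained set of valid hostnames; the sample rows are invented for illustration.

```python
# Hostnames we consider valid for this property (an assumption for the example).
VALID_HOSTNAMES = {"skystats.com", "help.skystats.com", "www.youtube.com"}

# Invented sample rows, shaped like a Source/Medium report with Hostname added.
rows = [
    {"source": "google", "hostname": "skystats.com"},
    {"source": "spam-site.example", "hostname": "(not set)"},
    {"source": "theguardlan.com", "hostname": "theguardian.com"},
]


def suspicious_rows(report_rows, valid_hostnames):
    """Return the rows whose hostname is not one of our own."""
    return [
        row for row in report_rows
        if row["hostname"].lower() not in valid_hostnames
    ]


flagged = suspicious_rows(rows, VALID_HOSTNAMES)
for row in flagged:
    print(row["source"], "->", row["hostname"])
```

A hit that spoofs one of the valid hostnames would pass this check, which is the weakness described above.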
Using Hostname As Filter
The logic for setting up a hostname filter in Google Analytics is to record all the true traffic and ignore the rest. Be warned: if you do not identify all of your valid hostnames, you could be excluding true traffic. For example, we could have ppc.skystats.com or a third-party shopping cart (e.g. ecommcheckout.com) that should be included in the hostname filter.
Some other common hostnames that should be added include translate.googleusercontent.com (Google Translate), www.youtube.com (if you configured it to use your Google Analytics tracking code), and webcache.googleusercontent.com (Google cache version of your site).
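Once you have the list, it has to be turned into a single regular expression. Here is a minimal sketch of one way to do that, assuming the filter treats the pattern as a partial match, so that `skystats\.com` also covers subdomains such as ppc.skystats.com. The helper names are hypothetical.

```python
import re

# Valid hostnames gathered for this property (taken from the examples above).
valid_hostnames = [
    "skystats.com",
    "ecommcheckout.com",
    "translate.googleusercontent.com",
    "webcache.googleusercontent.com",
    "www.youtube.com",
]


def build_filter_pattern(hostnames):
    """Join hostnames into one alternation, escaping the dots."""
    # An unescaped "." matches any character, so escape it to match a literal dot.
    return "|".join(host.replace(".", r"\.") for host in hostnames)


pattern = build_filter_pattern(valid_hostnames)
print(pattern)


def is_included(hostname):
    """Check locally whether a hostname would match the include pattern."""
    return re.search(pattern, hostname, re.IGNORECASE) is not None
```

Running `is_included` against the hostnames in your unfiltered view is a cheap way to catch a valid hostname you forgot before the filter starts discarding its traffic.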
You should have set up your Google Analytics account with three views: an unfiltered view, a test view, and a master view. Your unfiltered view has no filters applied to it and represents your raw data. The test view is where you test new filters before creating them in your master view.
It is better to make a mistake in the test view, since there is no way to retroactively retrieve any filtered data.
Setting Up The Hostname Filter In Google Analytics
- So let’s head over to the Admin section and select your test view.
- Navigate down to the Filters section and hit the “+New Filter” button.
- Select “Create new Filter”
- Enter “Hostname” as the “Filter Name”
- Select “Custom” under the “Filter Type”
- Select “Include”
- Choose “Hostname” in the “Filter Field”
- In the “Filter Pattern” add your regular expression
- See The Filter Expression [Really Simplified] section on this page for help
- Hit “Save”
Let the filter run for a day or two and compare the data with your master view. If it is working, recreate the hostname filter in your master view.
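One simple way to compare the two views is to look at how much traffic the filter removed. The sketch below uses invented session counts; substitute the totals from your own master and test views for the same date range.

```python
def percent_removed(master_sessions, test_sessions):
    """Share of master-view sessions that the test-view filter excluded."""
    if master_sessions == 0:
        return 0.0
    return (master_sessions - test_sessions) * 100 / master_sessions


# Invented totals for the same date range in each view.
master_total = 1250
test_total = 1100

share = percent_removed(master_total, test_total)
print(f"The hostname filter removed {share:.1f}% of sessions")
```

A large share is worth a second look: check the excluded hostnames in the unfiltered view to confirm that they are spam rather than a valid hostname you missed.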
This should give you a better representation of your actual site traffic. Remember, “You treasure what you measure”!