Guide to Backlink Analysis: Identifying Problematic Backlinks

Brandon Millard
Reading time: 13 minutes
20th June 2013

Let’s say you’ve just received a new hot client lead. As part of your initial research, you should be interested in future proofing the site’s backlink profile against unnatural link warnings from Google. You need to quickly figure out if there are links pointing to this domain that could create future havoc in terms of Google’s Penguin algorithm and massive decreases in organic traffic. It’s time to investigate the site’s backlinks, also known as backlink analysis.

You may now feel a little overwhelmed about how exactly to go about this. After all, you are a busy person and the site may have tens of thousands to millions of links pointing to it. In this guide, I will walk you through analyzing a site’s backlink profile to look for problems. Working for Ayima, I would normally use our internal backlink information to read a site’s backlink tea leaves:

ahref-1

Since this data isn’t publicly available, I am going to use Ahrefs and Excel. Feel free to use another source of backlink data such as OSE to conduct your backlink analysis. Before getting started, I would recommend asking your site contact if they know about any previously purchased or unnaturally acquired links pointing at their site and if they can provide you a list of such links for investigation. Reviewing these links with a critical eye will greatly supplement your analysis below.

In this guide, my goals are to share tips around the following topics:

  • Building Excel skills for SEO
  • Using pivot tables to answer data questions
  • Identifying problematic/unnatural links
  • Gaining confidence around backlink analysis

For this guide, I needed an example site to analyze. I wanted a site that was growing quickly in a lucrative industry. I combed through the fastest growing businesses on INC’s 5000 list for companies that are likely dependent on organic traffic for their success. The site I choose is from the competitive online credit industry. For the purposes of this article, I have changed their name to brand.com. If you can figure out what site I am discussing below then please keep it to yourself:

“You have a grand gift for silence, Watson. It makes you quite invaluable as a companion.”

– Sherlock Holmes

Our site:
se-traffic
Nice organic growth! – source: semrush.com

Step 1: Collecting raw link data for backlink analysis

Visit https://ahrefs.com/, login, and type your domain in the homepage main search box. Use ‘yoursite.com’ rather than a specific URL such as http://www.brand.com/

backlinks-analysis-3

Turns out this site has some 110,000 backlink pointing to it. Of course, no one has the time to review a 100,000 links or even 1,000 links so how are you going to look through all of these links to spot troublesome backlinks that you can highlight with the potential new client? The short answer is you aren’t. You are going to rely on sampling, judgment, and organizing data.

Side note: You may be wondering why we are downloading data into Excel rather than looking at the top anchors tab in ahref.com. I have done this because drilling down to specific linking URLs for domains with big backlink profiles tends to load slowly in an ajax, web-based environment. Excel gives you added flexibility and speed to make the data work for you specific requirements.

ajax-4

We will be going to the ‘Raw export’ tab and downloading the maximum amount of backlinks to Excel. In this case, we are talking about 110,000 backlinks. With really large site, you may not want to download the maximum amount, as 1 million rows of data in Excel is a royal pain in the arse. Also, since Ahrefs orders these links by their value, 100,000 or so backlinks is plenty for this type of analysis.

download-5

Once you have this data downloaded and open in Excel, we are going to do a couple things. First we are going to trim our data down to one link per domain to help us maintain sanity. Then we will create a pivot table to look at data in different ways.

Step 2: Trimming data down to one link per domain

In your raw export spreadsheet, copy the column of URLs pointing at your site (i.e. UrlFrom – column C) into the first available blank column (i.e. column V). Change the heading of this new column to DomainFrom. Now highlight the entire column DomainFrom:

excel-6

Our goal now is to trim these URLs down to their root domain (e.g. from “https://www.example” to “example.com”). I do this with ‘Find and Replace’ and then use the tool ‘Text to Columns’.

First open Find and Replace (command+F on Mac):

find-replace-7

We are going to use Find and Replace 4 times to get rid of the first part of the URL (HTTP).

Leave ‘Replace with’ blank and click ‘Replace All’ after using each of the ‘Find what’ patterns below. Work your way through the 4 http patterns to effectively delete this part of the URL. Make sure to do this in the same order as shown below to make sure this works correctly.

In the ‘Find what’ field:

  • http://www.
  • https://www.
  • http://
  • https://

Here is what you will be left with:

domain-from-2

Now highlight this column again, go to your Excel ribbon > Data > Text to Columns. In the wizard, choose Delimited then hit Next. On step 2, choose the Delimiter: Other and type “/” in the Other box then hit Next and then hit Finish:

text-to-column-9

Excel has just taken your URLs and broken them out into different cells each time it encountered a forward slash. Our DomainFrom column should just have root domains now (ex. linkingsite.com). The other URL fragments are not important to us anymore so feel free to delete those columns (e.g. W – CC).

Last step is to remove duplicates from our DomainFrom column so that we are left with just one representative URL for each domain. Highlight just your columns of data (e.g. A – V) and go to Excel ribbon > Data > Remove Duplicates. Tip: You need to only select the columns you have data in rather then selecting the whole spreadsheet. In the popup window, select the DomainFrom column (i.e. Column V) and click Remove Duplicates. Tip: Uncheck the “Select All” option and then scroll down and check the column containing your DomainFrom URLs.

pivot-table-10

We have just trimmed our data down to 6,305 links, one from each domain.

Step 3: Using a pivot table to show the most common anchors pointing at a site

To create your first pivot table with this data, in the Excel ribbon go to Data then click on PivotTable.

pivot-table-3

This will open a new spreadsheet with a most unhelpful default pivot table. Using the PivotTable Builder, we can massage our data quickly into an actionable format.

  • Add the Field names: NoFollow and Site-wide to the Report Filter area
  • Add the Field names: Anchor and URLFrom to the Row Labels area
  • Add Field name: Index to the Values area

pivot-builder-12

You will want your Index values to be displayed as a Count rather than a Sum. If your Values area above says ‘Sum of Index’ then click on the ‘i’ info circle to open up a new window that will give you the option to summarize your data by Count instead of Sum.

pivot-count-13

Now you have a Pivot table created and are almost ready for action. To make this pivot table more navigable, I would recommend clicking in one of the cells containing an ‘Anchor’ (e.g. A6) then going to your Excel ribbon > Pivot Table and clicking the Collapse button so that you just see the various anchors pointing at your site and not all the specific URLs that use that anchor.

Now you will want to reorder your anchors by frequency. To order your anchors from most common anchor to least, you need to highlight all the cells under Total from the cell B6 to the very last anchor before your Grand Total.

pivot-table-4

Highlight from the first anchor total to the last:

pivot-table-5

Then click the A to Z sorting button twice to get your data properly in descending order.

pivot-table-6

Yay, finally we are getting somewhere. You now have a list of the most common anchors pointing at your site. Here is my resulting pivot table. I have censored out all brand-related anchors as we want to concentrate on ‘commercial’ anchors. Commercial anchors are the anchors that are most likely to create problems with Google. We will be learning more about commercial anchors below.

pivot-table-7

Step 4: Identify your most common commercial anchors

What’s a commercial anchor? A commercial anchor is a phrase that other sites use when linking to your site that has commercial value. In other words, this is a keyword phrase that sites want to rank for because it will result in search engine traffic and revenue. Other anchor types include: brand anchors which include your brand name and noise anchors which are phrases that no one is really trying to rank for such as ‘click here’.

Above we find the following commercial anchors:

  • Bad Credit
  • Free Credit Score
  • Best Credit Cards
  • Free Credit Score
  • Free credit report and score
  • Credit Score
  • Your free credit score
  • Cards for bad credit

You will also want to review commonly used long anchors such as:

  • Free Credit Score & Credit Report Data. No Credit Card Required.

These long anchors may be okay as they often they represent the title tag of a page on your site and will appear on scraper sites outside your control. While these links from scraper sites aren’t typically helpful, we can safely ignore them in most cases. In other situations long anchors may occur unnaturally such as in conjunction with a link building widget or in sitewide text links.

You should also look for patterns in the commercial anchors. For example, another site I recently reviewed has many commercial anchors in the following pattern ‘[city] cheap flights’:

  • Jasper cheap flights
  • Richmond cheap flights
  • Burlington cheap flights
  • Barrie cheap flights

This pattern went on and on – cities big and small around the world. Seems a little fishy don’t you think? Do you think sites would commonly choose to link to a site with the anchor ‘Barrie cheap flights’? Where the heck is Barrie anyway? (Turns out its Canada’s 34th largest city). For a sanity check, I personally love to search for some of these oddities in Google to see if the site is ranking for the abused anchor text.

http://www.google.com/search?pws=0&gl=us&q=barrie+cheap+flights

Turns out this powerful site doesn’t even rank in the first few pages of Google for ‘barrie cheap flights’! Uh oh. Two things spring to mind, either they have overdone things or these links all reside on weak or networked domains.

Step 5: Expand your commercial anchors and start looking at backlinks

Here are some types of links you should keep an eye out for:

  • Links or images on pages marked as sponsored or advertorial
  • Links in articles that look editorial but the article talks about compensation
  • Links in articles that use commercial anchors unless you believe it truly happened naturally
  • Links on pages that have lots of commercial anchor links to other sites
  • Links in forum profile pages (forum spam)
  • Commercial anchors in blog comment sections (comment spam)
  • Links on pages that just don’t feel good… go with your gut
  • Links from link networks
  • Commercial anchors on article directory sites

There are lots of other types of problematic types of links but they are outside the scope of this article. Feel free to mention other link types you look out for in the comments below.

I started by looking at the URLs under the anchor: ‘Bad Credit’:

pivot-table-8

Yikes! Here we have a whole bunch of domains in the form of keyword+city.com that all use the anchor text ‘Bad Credit’ to link to our site. As you may suspect, all 512 domains are hosted on the same IP address and all of the pages look basically the same (i.e. a very lazy link network). The linking pages are setup as directories and link to about 30 different sites using commercial anchors. Below you will see a portion of the page that actually links to two different pages on our example site.

financial-center-9

These links should be flagged for removal for several reasons:

  • Links from a link network or series of mirror sites on different domains
  • Links from a page full of outbound commercial anchor links
  • Links from a directory full of commercial anchors

Moving on in our backlink analysis, you should open the anchor “(blank)”, which is going to contain images or banners linking to your site. Here you will want to visit a good number of linking sites to see how exactly they are linking to your site.

Problems you should be looking out for include:

  • Followed, directly linking banners on sites under Sponsored-type headings (did they ‘sponsor’ the site to buy PageRank? = bad. Or did they simply get a link as a by-product of a large involved sponsorship? = typically okay.)
  • Followed, directly linking images on poor quality, sketchy websites
  • Lots of banner links using commercial alt text

Next, I looked at the anchor ‘Free Credit Score’ which I suspect to find more problematic links. First off, this anchor is capitalized when most natural commercial anchors are typically lowercase (i.e. ‘free credit score’).

Here I quickly found an article about packing and moving with this friendly tip:

credit-21

My guess is this link was unnaturally acquired for since the site’s brand name is not
“Free Credit Score” and therefore the sentence lacks proper grammar. If the tip said either:

  • Use a free credit score to check your credit score.
  • Use Brand.com to check your score.

Then I would not be able to tell if the link was editorially earned or unnaturally placed.

In this anchor section I also found another peculiar pattern of URLs that look like this:

http://www.randomcrappydomain.com/wp-content/uploads/includes/getfreecreditscore.html

These pages steal themes from university websites and then link out to a variety of sites using commercial anchors.

university-20

While I have no clue if our site has anything to do with these links or if spammers created them, they could create problems with Google. Steps should be taken to try and get these links removed. In this case, I found that most of these spammy sites all linked to the same internal page on our client’s site. I then looked at the other links pointing at this internal page and found only spammy commercial anchors pointing to it. Here I would recommend removing this page from the client’s site if possible and returning a 404 Not Found response header.

In the next section I reviewed some links with the anchor: ‘Firstname @ Brand’

blog-comment-22

Here we have someone leaving comments on blogs using the same anchor each time. This can be okay if the person is leaving genuine, thoughtful comments on blogs related to the Brand. I found this to be true sometimes and other times I found this person leaving generic comments or comments on non-related articles. While there is no malicious spam happening here, I would want to set some guidelines with this commenter from the client company about relevancy and thoughtfulness if they are going to continue to leave comments in the future.

From here you can continue exploring the remaining common commercial anchors pointing to a site for signs of unnatural links. If you’ve made it this far, I want to thank you! I hope you have learned a thing or two about using Excel and managing big data. At this point, you have a foundation for trimming down hundreds of thousands of links into a manageable number. Using a simple pivot table, you can further filter links by follow or nofollow and by sitewide or not sitewide.

Once you have tried this system a few times, you will see that everything gets easier and faster. With experience you will constantly learn more about identifying problematic links and keeping your sites out of trouble with Google’s Penguin updates. Remember – don’t be blissful about the links pointing to your sites because preventing Penguin penalties is much more rewarding then recovering from them.

Written By Brandon Millard
Asset 1 Asset 1 Asset 3