Conquering Pagination – A Guide to Consolidating your Content
A topic sure to make any SEO neophyte’s head spin, pagination can seem a daunting prospect to approach and handle at first. Pagination is a wily shapeshifter, rearing its ugly head in contexts ranging from e-commerce, to newspapers, to forums. The bottom line is: if you’re in the business of on-page optimization, it’s not a question of if you’ll have to deal with pagination problems – it’s a question of when. Luckily, we’re here to give you some advice to get you started, and answer some of the more thought-provoking questions that can arise in tricky situations.
So what exactly is pagination, you ask? In a very basic sense, pagination occurs when a website segments content over the span of multiple pages. On an e-commerce site, this will take the form of product and category listings. On a news site, articles may be divided up across multiple pages or arranged in the form of a slideshow. On forums, groups and topic threads will typically span at least 2-3 pages. Even blogs, which tend to feature article previews ordered from latest to oldest, will run into pagination issues on their homepage.
“Big deal”, you may say. “I see this happening all over the place, so what is the problem with paginated content?” From an SEO perspective, pagination can cause serious issues with Google’s ability to index your site’s content. Let’s explore a few of the potential issues that arise when you paginate your content without taking the proper precautions:
- Crawler Limitations
When Googlebot is crawling your site, the depth (or levels of clicks deeper into the content) it travels will vary depending on the site’s authority and other factors. If you have a tremendous amount of paginated content, the odds that Googlebot will travel through all paginated content to reach and index the final pages decrease significantly.
- Duplicate Problems
Depending on the context of the pagination, it is very likely that some elements across the series of pages may contain similar or identical content. In addition to this, you’ll often find that identical title tags and meta descriptions tend to propagate across a span of paginated content. Duplicate content can cause massive confusion for Googlebot when it comes time to determine which pages to return for search queries.
- Thin Content
In situations (such as the aforementioned news sites) where articles or product reviews tend to be segmented into multiple pages, you run the risk of not providing enough original content for the individual pages to be indexed separately. More importantly, this also creates the risk of ending up with too low a content-to-advertisement ratio, which can set your site up for devastating Panda penalties further down the road.
So how do you deal with Pagination?
Your best option is always optimal site design. There are a number of ways that these problems can be prevented before they begin. When planning the design of an e-commerce or similar site, consider the following measures you can take to cut down on large-scale pagination issues:
- Increasing the number of categories, which will decrease the depth of each paginated series
- Increasing the number of products per page, which will decrease the number of total pages in the paginated series
- Linking to all pages within the now manageable paginated series from the first page, which will alleviate any crawl-depth and link authority flow problems
However, in many real world scenarios, the damage has already been done and a site structure overhaul is not an option. Luckily, Google has given us a variety of methods to better steer the crawlers through our deep crypts of paginated content. As an SEO, you have three weapons in your arsenal to preemptively deal with any problems that may arise out of pagination:
Option 1: Remove your paginated content from the index
There are many situations where simply taking the paginated content off the table is the best solution. If there are no particular advantages to having this content indexed and searchable, then the easiest solution is to implement a <META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"> tag within the <head> section of every page in the paginated series, excluding the first page. You’ll want to make sure to include the “FOLLOW” directive here if this is a listing series of any kind – this will ensure that page authority will travel into the individual destination pages throughout the list, despite the list itself being excluded from Google’s index. Including “FOLLOW” may also help some link authority that is arriving at pages within the paginated series to travel back to the indexed first page and the rest of the site.
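As a sketch, the <head> of any page in the series other than the first might look like this (the URL and title here are illustrative):

```html
<!-- e.g. http://www.site.com/page2.html – any page in the series except the first -->
<head>
  <title>Dresses – Page 2</title>
  <!-- Keep this page out of the index, but let authority flow through its links -->
  <META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
</head>
```

The first page of the series carries no robots meta tag at all, so it remains indexable.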
Pros:
- The least complex of all solutions.
- Great for situations in which there is no logical reason to index the paginated content.

Cons:
- While it does solve potential pagination problems, it also eliminates the paginated content from Google’s index.
Option 2: View-All Page and rel=“canonical”
Google’s preferred first choice for handling most pagination issues is to create a separate “View-All” page apart from the paginated series and include all of the items within this single page. Once you’ve created the View-All page, you can then place a rel="canonical" tag within the <head> section of each paginated component page, pointing to the View-All page (e.g. <link rel="canonical" href="http://www.site.com/view-all-page"/>). This essentially tells Google to treat each page in a paginated series as a segment of the View-All page, and queries will return the View-All page as opposed to a relevant segment page of the pagination chain.
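For instance, if the series spans page1.html through page4.html (URLs illustrative), each component page’s <head> would carry the same canonical tag:

```html
<!-- Identical on page1.html, page2.html, page3.html and page4.html: -->
<!-- every segment page canonicalizes to the single View-All page -->
<link rel="canonical" href="http://www.site.com/view-all-page"/>
```

The View-All page itself carries no canonical tag pointing elsewhere; it is the destination that consolidates the series.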
Google states that this is their preferred method of guiding Googlebot through paginated content, and that users typically prefer a view-all page. Whether users actually prefer a view-all page is debatable and certainly depends on the context of each situation. There is one large caveat to this method – the View-All page has to be manageable enough to load within a “reasonable amount of time”, which is generally regarded as 2-4 seconds. This makes it a great option for consolidating text-only product and category listings that exist within 5-20 pages of paginated content. Conversely, it makes this a poor choice for consolidating paginated articles with many images and product or category listings with hundreds of pages.
Pros:
- Relatively simple implementation.
- Google’s first choice solution.
- All content within the pagination sequence will be represented on the search engine via the View-All page.
- Can present a more user-friendly navigation method.

Cons:
- Not a solution for massive or image-heavy series of paginated content.
- Some businesses may be unwilling or unable to implement a View-All page for product listings.
Option 3: Rel=“prev”/“next”
Our final option for dealing with pagination problems may be the most complicated, but it is arguably the most versatile. Google now recognizes the rel="prev" and rel="next" link elements as a method of indicating a sequence of paginated pages. The implementation can be tricky, and you have to be exceptionally careful when applying this method. Let’s take a look at how this works.
You have four pages of paginated content:

Page 1: http://www.site.com/page1.html
Page 2: http://www.site.com/page2.html
Page 3: http://www.site.com/page3.html
Page 4: http://www.site.com/page4.html

By implementing rel="prev"/"next", you’re essentially creating a chain between all pages in the pagination series. You’ll begin the chain with Page 1, adding the following code to the <head> section of the page’s HTML:
<link rel="next" href="http://www.site.com/page2.html">
That’s the only step we have to take for the beginning of the chain. Now we move on to Page 2. Consider that Page 2 is now in the middle of the chain, so we have to attach it both to the page before it, and to the next page in the sequence. Page 2 would have the following code in the <head>:
<link rel="prev" href="http://www.site.com/page1.html">
<link rel="next" href="http://www.site.com/page3.html">
Now just as you might have assumed, since Page 3 is also in the center of this sequence of linked pages, we have to continue to implement the code in a similar manner:
<link rel="prev" href="http://www.site.com/page2.html">
<link rel="next" href="http://www.site.com/page4.html">
And so we’ve reached Page 4, the last in our chain of paginated content. The last page should only contain a rel="prev" attribute in the <head>, as there are no further pages within the sequence:

<link rel="prev" href="http://www.site.com/page3.html">
Using this complete sequence of rel="prev"/"next", Google is able to consolidate the group of paginated content into a single entry in its index, essentially treating the whole sequence as one unit. Typically, the first page will be returned to the user, as it is usually the most relevant to a query regarding the paginated series. However, Google has noted there are scenarios where a more relevant page within the sequence is returned if the query is particularly centered around the content on that page.
Pros:
- Unparalleled flexibility.
- Allows resolving pagination issues without the use of a View-All page.
- Can be executed properly with only minor changes to HTML.

Cons:
- Implementation can be complicated.
- Requires proper execution of the chain in order to be effective.
An important thing to note with rel=”prev”/”next” implementations is that they can be used alongside canonical tags. While this will become particularly useful in the advanced concepts section, it is worth noting that if you’re in the practice of using self-referential canonical tags, they will function the same way within a rel=”prev”/”next” chain.
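As a sketch, Page 2 of our earlier four-page example would combine the two like so – note that the canonical points at the page itself, not at a View-All page:

```html
<!-- http://www.site.com/page2.html -->
<link rel="prev" href="http://www.site.com/page1.html">
<link rel="next" href="http://www.site.com/page3.html">
<!-- self-referential canonical: points back at this same URL -->
<link rel="canonical" href="http://www.site.com/page2.html">
```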
Advanced Pagination Concepts
Now that we’ve tackled the basics, it’s time to take a look at some of the more interesting questions and scenarios you’ll run into once you start getting comfortable with pagination.
Setting a Benchmark
If you have access to your server logs, it’s fairly simple to determine the success with which Googlebot is currently crawling your unadjusted paginated content. Before any changes are implemented, we recommend choosing a few paginated series within your site and determining how many pages deep into the series Googlebot is crawling. Once you’ve determined this you can then perform search queries to investigate how many of these pages Google is choosing to include in the index.
This will give you a starting point benchmark that will enable you to determine the success of your efforts. After you have implemented your changes, you can revisit the server logs upon Googlebot’s return to determine whether crawl-depth and indexation rates have improved.
Relevancy Signals: View-All Pages vs. rel=“prev”/“next”
You may find yourself in the fortunate position of being able to choose whether to implement a View-All page or rel="prev"/"next". Although we do have indicators from Google suggesting that View-All is the preferred method for handling these pagination issues, there are certain contexts in which a rel="prev"/"next" implementation could prove more beneficial, as far as relevancy signals are concerned.
Consider for a moment that Google has stated that both View-All page canonicalization and rel="prev"/"next" sequences consolidate all incoming link authority into the pages that will ultimately rank for queries related to them. The View-All page naturally consolidates this link authority via the canonical tags pointed towards it, and the ranking page in a rel="prev"/"next" sequence will inherit the link authority via the properties Google uses to link the component pages together in the index.
Now that we’ve established link authority will be similar in both methods, we’re left with one very interesting question: What about the other relevancy signals that affect the page’s ability to rank? What happens to the unique URLs, the title tags, the meta descriptions, the H1/H2s and other factors? We know that the canonicalization that occurs when using the View-All method will effectively render these factors moot – Google knows to look to the canonical page for these items.
But if a series of pages linked together via rel="prev"/"next" contains unique title tags and URLs, and any one of these pages has the opportunity to rank for a query based on them, then they could potentially retain these relevancy signals as opposed to having them washed away via canonicalization.
Clearly, this is not a consideration for a simple paginated product or category listing with similar content across the series of pages. There is no unique relevancy factor to be found with “page1.html” vs. “page2.html”, and no ranking advantages to “Dresses Page 1” as opposed to “Dresses Page 2”. But what about a multi-page article where each page covers a distinct subtopic, with its own descriptive title and URL?
The truth is, no one really knows exactly how Google treats the rel="prev"/"next" sequence within the index. However, since we do know that in at least some cases, pages deeper into the sequence than the first page will be returned in the SERPs, it’s safe to assume that the URL, title tag, and other factors will still play some role in determining relevancy to any given query.
Parameters and rel=“prev”/“next”
In some cases when dealing with rel="prev"/"next", your paginated URLs will contain parameters that do not change the content of the page, such as unique session IDs. An experienced SEO will tell you these are bad news – if you don’t give Google specific instructions on how to deal with these situations, you may wind up with duplicate content problems.

You always have the option of simply telling Googlebot not to crawl certain URLs using “URL Parameters” in Webmaster Tools, but what if you’d like to preserve the link authority that is coming in to these parameterized URLs? We can make that happen, using rel="prev"/"next" in conjunction with canonical tags.
First, you have to make sure that all pages within a paginated rel="prev"/"next" sequence are using the same parameter. Second, each parameterized URL can also canonicalize to the non-parameterized version of the URL. For example, we’ve got the same 4 pages of paginated content, but this time the user is being tracked via session ID 55:
Filtered Content and rel=“prev”/“next”
Now let’s say you’re working with parameters that filter the content within a paginated series. For example, say we’ve got a parameter on a paginated set of product listing URLs that filters by brand, such as:
Page 1: http://www.site.com/page1.html?brand=nike
In this situation, the content on each page will depend on this variable. For example:
Page 1: http://www.site.com/page1.html?brand=adidas
Page 2: http://www.site.com/page2.html?brand=adidas
will return a completely different set of products than:
Page 1: http://www.site.com/page1.html?brand=reebok
Page 2: http://www.site.com/page2.html?brand=reebok
If you believe there is value in having each filtered product type in Google’s index, your best plan of action is to create separate paginated sequences for each brand filter. You won’t be using canonical tags in this situation, since the content will be unique depending on the parameter. Here’s an example of how to handle this scenario:
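A minimal sketch of the adidas sequence, assuming it spans just the two pages shown above – each brand filter gets its own self-contained chain, with no canonical tags:

```html
<!-- http://www.site.com/page1.html?brand=adidas -->
<link rel="next" href="http://www.site.com/page2.html?brand=adidas">

<!-- http://www.site.com/page2.html?brand=adidas -->
<link rel="prev" href="http://www.site.com/page1.html?brand=adidas">
```

The reebok URLs would form their own parallel chain in exactly the same way, so Google can index each filtered sequence as a separate entity.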
Sorted Content and rel=“prev”/“next”
The last type of parameterized URL we’re going to look at is sorted content. You’re more likely to find this type of parameter in a forum or blog setting, though it frequently exists on e-commerce sites as well. For example:
When you first arrive at the page, the URL might read:
Page 1: http://www.news-site.com/page1.html?order=oldest
But there may be an option to view the newest items first, resulting in this URL:
Page 1: http://www.news-site.com/page1.html?order=newest
There’s currently a fair amount of debate in the SEO community as to how to treat this type of situation. Though some would suggest attempting a separate rel=”prev”/”next” sequence for both “newest” and “oldest” sort method URLs, in our opinion this would essentially be indicating to Google that you would like them to index multiple paginated sequences of identical content. The only difference between these two paginated groups would be that the content is displayed in a different order, still putting you in dangerous territory for duplicate content.
Ayima recommends taking the safe route on this, and presenting only one sorted paginated sequence to Google for indexing. The default sort method should carry the rel="prev"/"next" pagination markup:
The alternate sorting method, in this case newest, should be blocked from indexation. This is most quickly accomplished using URL Parameters in Webmaster Tools, specifying the parameter and allowing Googlebot to crawl only the default value.
These solutions may seem complicated at first, but they are easily manageable if you address each instance of pagination separately and apply the proper rule for each scenario. It may be helpful to sketch out a simple flow chart of these decisions in order to simplify the decision-making process.
We’ve seen many situations in which rel="prev"/"next" is implemented incorrectly, so be sure to double-check your chains upon completion. Dealing with these problems can be painful, but with careful planning and thorough implementation you’ll be successfully guiding Google through your site before pagination has a chance to ruin your day.