Conquering Pagination – A Guide to Consolidating your Content

A topic sure to make any SEO neophyte’s head spin, approaching and handling pagination can seem a daunting prospect at first. Pagination is a wily shapeshifter, rearing its ugly head in contexts ranging from e-commerce, to newspapers, to forums. Bottom line is, if you’re in the business of on-page optimization, it’s not a question of if you’ll have to deal with pagination problems – it’s a question of when. Luckily, we’re here to give you some advice to get you started, and answer some of the more thought-provoking questions that can arise in tricky situations.

[Figure: a typical pagination example]

So what exactly is pagination, you ask? In a very basic sense, pagination occurs when a website segments content over the span of multiple pages. On an e-commerce site, this will take the form of product and category listings. On a news site, articles may be divided up across multiple pages or arranged in the form of a slideshow. On forums, groups and topic threads will typically span at least 2-3 pages. Even blogs, which tend to feature article previews ordered from latest to oldest, will run into pagination issues on their homepage.

“Big deal”, you may say. “I see this happening all over the place, so what is the problem with paginated content?” From an SEO perspective, pagination can cause serious issues with Google’s ability to index your site’s content. Let’s explore a few of the potential issues that arise when you paginate your content without taking the proper precautions:

  • Crawler Limitations

    When Googlebot is crawling your site, the crawl depth (the number of clicks deep into the content) it travels will vary depending on the site’s authority and other factors. If you have a tremendous amount of paginated content, the odds that Googlebot will travel through all of it to reach and index the final pages decrease significantly.

  • Duplicate Problems

    Depending on the context of the pagination, it is very likely that some pages across the series will contain similar or identical elements. In addition, you’ll often find that identical title tags and meta descriptions tend to propagate across a span of paginated content. Duplicate content can cause massive confusion for Googlebot when it comes time to determine which pages to return for search queries.

  • Thin Content

    In situations (such as the aforementioned news sites) where articles or product reviews tend to be segmented into multiple pages, you run the risk of not providing enough original content for the individual pages to be indexed separately. More importantly, this also creates the risk of your content-to-advertisement ratio dropping too low, which can set your site up for devastating Panda penalties further down the road.

So how do you deal with Pagination?

Your best option is always optimal site design. There are a number of ways that these problems can be prevented before they begin. When planning the design of an ecommerce or similar site, consider the following measures you can take to cut down on large-scale pagination issues:

  1. Increasing the number of categories, which will decrease the depth of each paginated series
  2. Increasing the number of products per page, which will decrease the number of total pages in the paginated series
  3. Linking to all pages within the now manageable paginated series from the first page, which will alleviate any crawl-depth and link authority flow problems (see the sketch below)
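
As a rough sketch of the third point, the first page of a now-manageable series might link directly to every page in the sequence (the URLs here are hypothetical):

(Page 1):

<nav>
<a href="http://www.site.com/page1.html">1</a>
<a href="http://www.site.com/page2.html">2</a>
<a href="http://www.site.com/page3.html">3</a>
<a href="http://www.site.com/page4.html">4</a>
</nav>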

However, in many real-world scenarios, the damage has already been done and a site structure overhaul is not an option. Luckily, Google has given us a variety of methods to better steer the crawlers through our deep crypts of paginated content. As an SEO, you have three weapons in your arsenal to deal with any problems that arise out of pagination:

Option 1: Remove your paginated content from the index

There are many situations where simply taking the paginated content off the table is the best solution. If there are no particular advantages to having this content indexed and searchable, then the easiest solution is to implement a <META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"> tag within the <head> section of every page in the paginated series, excluding the first page. You’ll want to make sure to include the “FOLLOW” directive here if this is a listing series of any kind – this will ensure that page authority will travel into the individual destination pages throughout the list, despite the list itself being excluded from Google’s index. Including the “FOLLOW” directive may also help some link authority that is arriving at pages within the paginated series to travel back to the indexed first page and the rest of the site.
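
As a minimal sketch, every page in the series after the first (page 2 shown here) would carry the tag in its <head> – the lowercase form below is equivalent to the uppercase version above:

(Page 2 and onward):

<meta name="robots" content="noindex, follow">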

[Figure: the noindex solution]

Advantages:

• The least complex of all solutions.
• Great for situations in which there is no logical reason to index the paginated content.

Disadvantages:

• While it does solve potential pagination problems, it also eliminates the paginated content from Google’s index.


Option 2: View-All Page and rel=“canonical”

Google’s first choice for handling most pagination issues is to create a separate “View-All” page apart from the paginated series and include all of the items within this single page. Once you’ve created the View-All page, you can then place a rel="canonical" tag within the <head> section of each paginated component page, pointing to the View-All page (e.g. <link rel="canonical" href="http://www.site.com/view-all-page"/>). This essentially tells Google to treat each page in the paginated series as a segment of the View-All page; queries will then return the View-All page rather than an individual page from the pagination chain.
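
As a sketch, using the example URL above, each component page of the series would then carry the same tag in its <head>:

(Page 1, Page 2, Page 3, and so on):

<link rel="canonical" href="http://www.site.com/view-all-page"/>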

[Figure: View-All page canonicalization]

Google states that this is their preferred method of guiding Googlebot through paginated content, and that users typically prefer a view-all page. Whether users actually prefer a view-all page is debatable and certainly depends on the context of each situation. There is one large caveat to this method – the View-All page has to be manageable enough to load within a “reasonable amount of time”, which is generally regarded as 2-4 seconds. This makes it a great option for consolidating text-only product and category listings that exist within 5-20 pages of paginated content. Conversely, it makes this a poor choice for consolidating paginated articles with many images and product or category listings with hundreds of pages.

Advantages:

• Relatively simple implementation.
• Google’s first-choice solution.
• All content within the pagination sequence will be represented in the search engine via the View-All page.
• Can present a more user-friendly navigation method.

Disadvantages:

• Not a solution for massive or image-heavy series of paginated content.
• Some businesses may be unwilling or unable to implement a View-All page for product listings.


Option 3: Rel=“prev”/“next”

Our final option for dealing with pagination problems may be the most complicated, but it is arguably the most versatile. Google now recognizes the rel="prev" and rel="next" link attributes as a method of indicating a sequence of paginated pages. The implementation can be tricky, and you have to be exceptionally careful when applying this method. Let’s take a look at how it works.

You have four pages of paginated content:

[Figure: four pages of paginated content]

By using rel="prev"/"next", you’re essentially creating a chain between all pages in the pagination series. You’ll begin the chain with Page 1, adding the following code to the <head> section of the page’s HTML:

(Page 1):

<link rel="next" href="http://www.site.com/page2.html">

That’s the only step we have to take for the beginning of the chain. Now we move on to Page 2. Consider that Page 2 is now in the middle of the chain, so we have to attach it both to the page before it, and to the next page in the sequence. Page 2 would have the following code in the <head>:

(Page 2):

<link rel="prev" href="http://www.site.com/page1.html">

<link rel="next" href="http://www.site.com/page3.html">

Now, just as you might have assumed, since Page 3 is also in the middle of this sequence of linked pages, we have to continue to implement the code in a similar manner:

(Page 3):

<link rel="prev" href="http://www.site.com/page2.html">

<link rel="next" href="http://www.site.com/page4.html">

And so we’ve reached Page 4, the last in our chain of paginated content. The last page should only contain a rel="prev" attribute in the <head>, as there are no further pages within the sequence:

(Page 4):

<link rel="prev" href="http://www.site.com/page3.html">

Using this complete sequence of rel="prev"/"next", Google is able to consolidate the group of paginated content into a single entry in its index, treating the sequence as one entity. Typically, the first page will be returned to the user, as it is usually the most relevant to a query regarding the paginated series. However, Google has noted there are scenarios where a more relevant page within the sequence is returned if the query is particularly centered around the content on that page.

Advantages:

• Unparalleled flexibility.
• Allows resolving pagination issues without use of a View-All page.
• Can be executed properly with only minor changes to HTML.

Disadvantages:

• Implementation can be complicated.
• Requires proper execution of the chain in order to be effective.

An important thing to note about rel="prev"/"next" implementations is that they can be used alongside canonical tags. This becomes particularly useful in the advanced concepts section below: if you’re in the practice of using self-referential canonical tags, they will function the same way within a rel="prev"/"next" chain.
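
For example, page 2 of the four-page series above could combine a self-referential canonical tag with its chain attributes in the <head>:

(Page 2):

<link rel="canonical" href="http://www.site.com/page2.html">
<link rel="prev" href="http://www.site.com/page1.html">
<link rel="next" href="http://www.site.com/page3.html">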

Advanced Pagination Concepts

Now that we’ve tackled the basics, it’s time to take a look at some of the more interesting questions and scenarios you’ll run into once you start getting comfortable with pagination.

Setting a Benchmark

If you have access to your server logs, it’s fairly simple to determine how successfully Googlebot is currently crawling your unadjusted paginated content. Before any changes are implemented, we recommend choosing a few paginated series within your site and determining how many pages deep into each series Googlebot is crawling. Once you’ve determined this, you can perform search queries to investigate how many of these pages Google is choosing to include in the index.

This will give you a starting point benchmark that will enable you to determine the success of your efforts. After you have implemented your changes, you can revisit the server logs upon Googlebot’s return to determine whether crawl-depth and indexation rates have improved.

AJAX and Javascript scroll setups

You’ve likely run into infinite scroll setups on ecommerce sites, in which content continuously loads as you scroll towards the bottom of the screen. While this is a nice feature that improves the user experience, AJAX and Javascript-reliant navigation functions should always be implemented using Progressive Enhancement.

Ensuring that the site will function properly for users that have Javascript disabled is not only considerate to your users, but it also allows you to implement the pagination solutions discussed in this guide beneath the enhanced user experience features. This will enable Googlebot to properly crawl and index your content while you provide advanced Javascript navigation features for your visitors.
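
As a rough sketch of this approach (the script name is hypothetical): the server renders plain, crawlable pagination links, and a script layers infinite scroll on top for capable browsers:

<!-- Plain pagination markup that works without Javascript -->
<nav class="pagination">
<a href="http://www.site.com/page2.html">Next page</a>
</nav>
<!-- Enhancement layer: replaces the link above with infinite scroll when Javascript is available -->
<script src="/js/infinite-scroll.js"></script>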

Relevancy Signals: View-All Pages vs. rel=“prev”/“next”

You may find yourself in the fortunate position of being able to choose whether to implement a View-All page or rel="prev"/"next". Although we do have indicators from Google suggesting that View-All is the preferred method for handling these pagination issues, there are certain contexts in which a rel="prev"/"next" implementation could prove more beneficial as far as relevancy signals are concerned.

Consider for a moment that Google has stated that both View-All page canonicalization and rel="prev"/"next" sequences consolidate all incoming link authority into the pages that will ultimately rank for queries related to them. The View-All page will naturally consolidate this link authority via the canonical tags pointing towards it, and the ranking page in the rel="prev"/"next" sequence will inherit the link authority via the properties Google uses to link the component pages together in the index.

Now that we’ve established that link authority will be similar in both methods, we’re left with one very interesting question: what about the other relevancy signals that affect a page’s ability to rank? What happens to the unique URLs, the title tags, the meta descriptions, the H1/H2s and other factors? We know that the canonicalization that occurs when using the View-All method effectively renders these factors moot – Google knows to look to the canonical page for these items.

But if a series of pages linked together via rel="prev"/"next" contains unique title tags and URLs, and any one of these pages has the opportunity to rank for a query based on them, then they could potentially retain these relevancy signals rather than having them washed away via canonicalization.

Clearly, this is not a consideration for a simple paginated product or category listing with similar content across the series of pages. There is no unique relevancy factor to be found with “page1.htm” vs. “page2.htm”, and no ranking advantages to “Dresses Page 1” as opposed to “Dresses Page 2”. But what about a situation like the one below?

[Figure: a paginated article with separate pages for sport, football, tennis, and hockey]

The truth is, no one really knows exactly how Google treats the rel="prev"/"next" sequence within the index. However, since we know that in at least some cases pages further into the sequence than the first page will be returned in the SERPs, it’s safe to assume that the URL, title tag, and other factors will still play some role in determining relevancy to any given query.

Parameters and rel=“prev”/“next”

In some cases when dealing with rel="prev"/"next", your paginated URLs will contain parameters that do not change the content of the page, such as unique session IDs. An experienced SEO will tell you these are bad news – if you don’t give Google specific instructions on how to deal with these situations, you may wind up with duplicate content problems.

You always have the option of simply telling Googlebot not to crawl certain URLs using “URL Parameters” in Webmaster Tools, but what if you’d like to preserve link authority that is coming into these parameterized URLs? We can make that happen by using rel="prev"/"next" in conjunction with canonical tags.

First, you have to make sure that all pages within a paginated rel="prev"/"next" sequence are using the same parameter. Second, each parameterized URL can also canonicalize to the non-parameterized version of the URL. For example, we’ve got the same 4 pages of paginated content, but this time the user is being tracked via session ID 55:

[Figure: a rel="prev"/"next" chain carrying a session ID parameter, with canonical tags to the clean URLs]
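
As a sketch of what the figure illustrates – assuming the session parameter is named "sessionid" – page 2 keeps the parameter within the chain but canonicalizes to the clean URL:

(Page 2, served as http://www.site.com/page2.html?sessionid=55):

<link rel="canonical" href="http://www.site.com/page2.html">
<link rel="prev" href="http://www.site.com/page1.html?sessionid=55">
<link rel="next" href="http://www.site.com/page3.html?sessionid=55">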

Filtered Content and rel=“prev”/“next”

Now let’s say you’re working with parameters that filter the content within a paginated series. For example, say we’ve got a parameter on a paginated set of product listing URLs that filters by brand, such as:

Page 1: http://www.site.com/page1.html?brand=nike

In this situation, the content on each page will depend on this variable. For example:

Page 1: http://www.site.com/page1.html?brand=adidas

Page 2: http://www.site.com/page2.html?brand=adidas

will return a completely different set of products than:

Page 1: http://www.site.com/page1.html?brand=reebok

Page 2: http://www.site.com/page2.html?brand=reebok

If you believe there is value in having each filtered product type in Google’s index, your best plan of action is to create separate paginated sequences for each brand filter. You won’t be using canonical tags in this situation, since the content will be unique depending on the parameter. Here’s an example of how to handle this scenario:

[Figure: separate rel="prev"/"next" chains for each brand filter]
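
As a sketch, page 2 of the adidas chain links only to other adidas-filtered pages; the reebok chain is built the same way with its own parameter value, and no canonical tag points to an unfiltered URL:

(Page 2, served as http://www.site.com/page2.html?brand=adidas):

<link rel="prev" href="http://www.site.com/page1.html?brand=adidas">
<link rel="next" href="http://www.site.com/page3.html?brand=adidas">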

Sorted Content and rel=“prev”/“next”

The last type of parameterized URL we’re going to look at is sorted content. You’re more likely to find this type of parameter in a forum or blog setting, though it frequently appears on ecommerce sites as well. For example:

When you first arrive at the page, the URL might read:

Page 1: http://www.news-site.com/page1.html?order=oldest

But there may be an option to view the newest items first, resulting in this URL:

Page 1: http://www.news-site.com/page1.html?order=newest

There’s currently a fair amount of debate in the SEO community as to how to treat this type of situation. Though some would suggest attempting a separate rel="prev"/"next" sequence for both the “newest” and “oldest” sort-method URLs, in our opinion this would essentially indicate to Google that you would like them to index multiple paginated sequences of identical content. The only difference between these two paginated groups would be that the content is displayed in a different order, still putting you in dangerous territory for duplicate content.

Ayima recommends taking the safe route here and presenting only one sorted paginated sequence to Google for indexing. The default sort method should carry the rel="prev"/"next" pagination chain:

[Figure: the default sort order carrying the rel="prev"/"next" chain]
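
A minimal sketch of that default chain, using the “oldest” sort order from the example above:

(Page 2, served as http://www.news-site.com/page2.html?order=oldest):

<link rel="prev" href="http://www.news-site.com/page1.html?order=oldest">
<link rel="next" href="http://www.news-site.com/page3.html?order=oldest">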

The alternate sorting method, in this case “newest”, should be blocked from indexation. This is most quickly accomplished using URL Parameters in Webmaster Tools: specify the parameter and allow Googlebot to crawl only the default value.

[Screenshot: URL Parameters configuration in Google Webmaster Tools]

These solutions may seem complicated at first, but they are easily manageable if you address each instance of pagination separately and apply the proper rule for each scenario. It may be helpful to consult the flow chart provided below to simplify the decision-making process:

[Figure: pagination decision flow chart]

We’ve seen many situations in which rel="prev"/"next" are implemented incorrectly, so be sure to double-check your chains upon completion. Dealing with these problems can be painful, but with careful planning and thorough implementation you’ll be successfully guiding Google through your site before pagination has a chance to ruin your day.

Ryan Huser

Both the technical side and aesthetics of web design have always come naturally to Ryan, which probably has something to do with being the son of a United States Air Force communications specialist and an interior design...

Showing 17 comments

Michael Cottam

Super job on this, Ryan!

Thinking about next/prev vs. view all, isn’t it true that a view all page is likely to accumulate much more link juice/page authority than any one of a set of next/prev pages, since you’re rel=canonicalling the link juice from the entire set of pages to the view all page?

It seems to me that in most cases, it would be more beneficial to have a stronger view all page than to get each of the sequence of pages in the index.

Ricky Shah

How about a combination of them? I usually prefer noindexing page 2, page 3, etc. while keeping the rel=prev and rel=next tags intact. Does that sound like a good idea? The general idea is to keep the content of the first page more relevant and emphasized, while indicating that page 2, page 3, etc. are just supporting pages which don’t need to be indexed.

Ryan Huser

Thanks for reading guys.

Michael, we’re actually told specifically by Google in the Webmaster Central post I link to within the article that when we use rel=prev/next we’re suggesting that Google: “consolidate indexing properties, such as links, from the component pages/URLs to the series as a whole (i.e., links should not remain dispersed between page-1.html, page-2.html, etc., but be grouped with the sequence).” To me, this sounds like a clear indication that the link authority the canonical tags consolidated within the View All page would be equal to the total link authority assigned to the complete rel=next/prev chain. That being said, Google’s recommendation to use View All pages whenever viable is still probably your best bet.

Ricky, that’s an interesting suggestion and is certainly worth some testing. Based on what Maile says on the subject here: http://googlewebmastercentral.blogspot.com/2012/03/video-about-pagination-with-relnext-and.html, it would appear that the chain can indeed be completed while noindexing certain pages within it. Maile also notes that the majority of the time the first page is returned for the query, so I would think this is only worth doing when you absolutely do not want specific pages within the series to be indexed.

Sean

Great post man,

I’ve learned that using URL Parameters in Google Webmaster Tools has sufficed perfectly well for most sites I have worked on. Do you think it is a better option to have the solution hardcoded than to hope that Google’s paginated settings are taken into account?

Ari

Incredible post, Ryan. Can you elaborate a bit on Progressive Enhancement? Ajax endless scrolling has always stumped me when it came to pagination / indexation of deeper products on a view-all setup for an ecommerce site whose categories are quite large.

Any resources or best practices you can link to?

Jamie Knop

Great read Ryan. Really explains things in an easy to follow manner. I’ve not come to use rel=“prev” and “next” yet but I can see it will be useful in certain instances. The flow chart is really helpful also – the best read I have come across on pagination by far.

Serbay Arda Ayzit

This is a great post about pagination. Congrats. I was thinking of creating a presentation about pagination in e-commerce. May I use some images from your article?

Terry Van Horne

A lot of this should be picked up by developers, especially where blogs are concerned. A lot of pagination can be eliminated by just using excerpts on tag and category pages, which is as simple as choosing the right setting for feeds – always use summary. I also try to mix up how I pull data on category and brand pages, using order and other query variations to make pages as unique as possible. In the end, the canonical and pagination smoothing tags can become a hassle to manage, especially in a multi-user environment.

Jelle

Fantastic post. This couldn’t have come at a better time. We are currently reworking our website which is centered around news and has been around for over 8 years. With an average of about 25-30 posts a day you can see what a nightmare this is for pagination issues. I’ve been really trying to come to grips with how to handle this and this post is pretty much handing me the possible solutions on a platter!

Jeremy Wallace

Good post. I have been contemplating a way to deal with this exact issue on a website I manage. I am going to try out the steps you recommended and see if it works. More than likely I will go with the view all page.

Félix Blanco

There is another disadvantage to the rel=“prev”/“next” solution: Mozilla link prefetching (https://developer.mozilla.org/en-US/docs/Link_prefetching_FAQ). It makes a request to the next page, and in some contexts this behavior can be negative.

Kathy Alice Brown

Wonderful post. I’ve mostly used rel="prev"/"next", as for some dev shops it is easier to implement than the “view all” solution. Great tip on handling parameters within the rel="prev"/"next" framework. Here’s a question: you mentioned that one could use the GWMT URL parameter configuration to control indexation. However, what about over-indexation scenarios (where you didn’t have the luxury of setting it up correctly initially)? Have you found that configuring GWMT URL parameters will drop the URLs with the parameter out of the index? I have run into cases (especially when all we want is the base URL indexed) where it seems to have no effect.

Dan Shure

Loving the flowchart Ryan!!

I’m just curious about parameters that filter (faceted nav) AND sort in the URL at the same time… ie:

/product?facet=green&orderby=newest

Where any facet (or multiple facets) can be applied to different sorting options?

It would seem to me that you would BOTH apply rel = prev/next when filters are present AND knock out the sorting parameters?

-Dan

Ryan Huser

Félix: That bit about prefetching is fascinating – I had never heard about it before. I’m having trouble determining any large-scale problems that prefetching would cause. Do you have any examples of the negative outcomes that would result from the rel=next being read by Firefox as a command to prefetch the page? I’d be curious to hear about the potential contexts where this should be avoided.

Kathy: It can take a while for Google to re-crawl the parameterized content, but I believe in all situations where we’ve used parameter handling on clients’ sites, the URLs are eventually removed from the index. However, Google does tend to note that parameter handling is a “suggestion” (similar to how they speak of rel=canonical), so if you’re in a hurry to get these parameters de-indexed, you can always fall back on blocking them via a noindex meta tag or robots.txt, coupled with submitting the URLs for removal in WMT.

Dan: Thanks, glad the flowchart is proving useful. The way to handle that filter&sort situation will definitely depend on the context, but I would usually choose to handle it the way you described. The idea is to essentially pretend that only one version of the order/sort variable exists in every situation, and knock out the rest. Then you can proceed to use a rel=prev/next chain for each separate facet type – it’s basically like combining the sort and filter plans. Was that what you were asking, essentially?

Fahad Bin Zafar

Ryan, this post is the most detailed publication regarding pagination I’ve seen, but I would like to get some more expertise from you regarding another pagination scenario:
1. Suppose I have 10 pages that contain unique questions on a particular topic, and I want all of them indexed in search engines.
2. My idea is that whichever of those 10 indexed pages someone lands on becomes the starting point, and the rest are sorted accordingly. E.g., a visitor lands on page 6; it now becomes the 1st, and the rest will be sorted after it. (I will be using a script to achieve this.)
3. I want to implement pagination for this scenario, so what can I do? Will implementing rel=next/rel=prev dynamically be OK for me? The rel=next, rel=prev, and starting and ending pages would change according to the sorting order.

Ryan Huser

Fahad, that sounds like a really interesting setup. Unfortunately at this time Google has made it relatively clear that the chain has to be in a defined order starting with the first page beginning the chain, and the last page ending it. Creating a rel=prev/next chain for each foreseeable ordering is not advisable either, as this would just result in many groups of duplicate pagination chains.

If you want to go this route you’ll have to pick a single ordering for your rel=prev/next chain, and trust Google to deliver the most relevant results based on the query. At this point you could conceivably link dynamically in whatever order you wanted – just leave the pagination chain in one standard formation. Just keep in mind that in the majority of situations, the first page of the series is the one most likely to end up in the SERPs.

Here’s a better plan: why not follow the site design advice at the beginning of the article? You could just link to all ten pages on every component page, provide unique and sufficient content for each page, and you wouldn’t even have to worry about implementing a pagination solution. Each of the pages could stand strong on their own, and it would allow you to target them each specifically to different queries.

Fahad Bin Zafar

Ryan, thank you so much. Your reply really helped me make my decision. I am planning to give the standard formation a try, hoping that Google will not mind. I wish Google would show all pages in the SERPs. (A wish!)