In the digital marketing community, crawling, indexing, and ranking are considered more of an art than a strict science. This is because they depend on search engine algorithms, which are constantly changing.
The first step to getting your website’s pages indexed and served to users in the search engine results pages (SERPs) is crawling. In this article, we’ll address the question of “how often does Google crawl a site?” and dive into what crawling is, how it works, and how to get Google to crawl your site.
What is crawling?
Google constantly looks across the internet for new and updated web pages. When it finds them, it adds them to its list of “known” pages. This process is called “URL Discovery.”
Once Google discovers a web page, it may visit or “crawl” the page to see what it contains. It’s important to note that Google crawls page by page, not a whole website as a single unit.
Google’s program that crawls pages on the internet is called Googlebot, but it’s also known as a Google crawler, robot, bot, or spider. Googlebot is the generic name for Google’s two main web crawlers: “Googlebot Desktop” and “Googlebot Smartphone.” These crawlers simulate user experience either on desktop or mobile devices.
Because most users browse the internet on their mobile devices these days, the majority of websites have Googlebot Smartphone as their primary crawler.
How often does Google crawl a site?
There are no hard and fast rules when it comes to how often Google crawls a website—frequency can vary wildly per domain. Why? Different websites have different crawling requirements.
Server bandwidth and content freshness both impact crawling frequency for a website.
When Google visits a website, it typically crawls as many pages as it can without overwhelming a website’s bandwidth. Google tends to regularly crawl websites with frequently added or updated content because it continuously searches for high-quality, new content.
In a webmaster hangout where the hosts discussed crawl frequency, John Mueller (head of the Google Search Relations team) stated:
“[...] we don’t crawl URLs with the same frequency all the time. So some URLs we will crawl daily. Some URLs maybe weekly. Other URLs every couple of months, maybe even every once half year or so”
As an example, let’s compare a news website and the website for a local landscaping business.
How often Google crawls a site plays a crucial role in delivering accurate, up-to-date information to users. A news website needs frequent crawls to stay relevant, while a local landscaping business website with infrequent page updates does not need to be crawled nearly as often.
How do Google crawlers work?
Like all search engines, Google uses an algorithmic process to determine which sites to crawl, how often to crawl them, and how many pages to fetch from each site.
Google doesn’t necessarily crawl all the pages it discovers, and the reasons why include the following:
- The page is blocked from crawling (robots.txt)
- The page is behind a login wall
- Server problems (Google is not able to access the site’s content due to network or connectivity errors)
- The page is part of a long chain of redirects
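The first reason in the list, a robots.txt block, is easy to check yourself. The sketch below uses Python's standard `urllib.robotparser` module against a made-up robots.txt file. The rules, the `example.com` URLs, and the `is_crawlable` helper are all assumptions for illustration, and Python's parser does not replicate every detail of how Google matches rules, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /private/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/blog/post", "/private/page"):
        url = "https://example.com" + path
        print(path, is_crawlable(ROBOTS_TXT, "Googlebot", url))
```

In this sample file, Googlebot has its own group of rules, so only the `/private/` rule applies to it. Other crawlers fall back to the `*` group.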
Googlebot also has a file size limit. Once it reaches the first 15 MB of a file, it stops crawling and considers only that amount for indexing. The limit applies to uncompressed data, which may matter if your pages are very large.
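To see whether a page comes close to that limit, measure it after decompression rather than as it travels over the network. The following is a minimal sketch: the helper names are hypothetical, and treating 15 MB as 15 × 1024 × 1024 bytes is an assumption, since the exact byte count is not stated here.

```python
import gzip

# Assumption: "15 MB" is read as 15 * 1024 * 1024 bytes.
SIZE_LIMIT_BYTES = 15 * 1024 * 1024


def uncompressed_size(body: bytes, is_gzipped: bool) -> int:
    """Return the size of a response body after any gzip compression is undone."""
    return len(gzip.decompress(body)) if is_gzipped else len(body)


def exceeds_crawl_limit(body: bytes, is_gzipped: bool = False,
                        limit: int = SIZE_LIMIT_BYTES) -> bool:
    """Return True if the uncompressed body is larger than the limit."""
    return uncompressed_size(body, is_gzipped) > limit


def portion_within_limit(uncompressed_body: bytes,
                         limit: int = SIZE_LIMIT_BYTES) -> bytes:
    """Return the leading bytes of the body that fall inside the limit."""
    return uncompressed_body[:limit]
```

A heavily compressed page can look small in transit and still exceed the limit, which is why the check decompresses first.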
What happens after Google crawls my site?
A web page goes through three stages before it is presented to users in the SERPs. These stages are crawling, indexing, and website ranking.
Google’s automated crawler, Googlebot, downloads the text, images, and videos it finds on pages across the internet.
After downloading the items it crawls, Google analyzes them and stores them in the “Google Index,” which is a large database.
Pages are then served in Google’s search results. These are the links shown to users when they type in a specific query.
Misconceptions about Google crawlers
There are two widespread misconceptions when it comes to website crawling—let’s clear them up.
Google needs to crawl your (entire) website as frequently as possible in order to rank well.
Reality—Google only needs to crawl a website frequently enough to discover new pages and updated pages. This will vary for every website, as webmasters publish content at their own cadence. Google does tend to crawl popular pages more frequently to keep them fresh in the index.
Crawl budget should be a top-of-mind concern.
Reality—For websites with fewer than several thousand pages, Google’s crawl budget and crawl rate do not need to be a concern, as Google can generally crawl efficiently at this level.
Webmasters should be more concerned about having too many low-value URLs (like duplicate pages, soft error pages, hacked pages, and low-quality or spam content), as these can negatively affect crawling.
Why? If Google crawls too many of these low-value pages, it may assume the rest of the website contains low-value pages too, and crawl it less frequently.
How to know if Google crawled my site
Using Google Search Console (formerly Webmaster Tools) to review your website's performance allows you to understand the crawling patterns of your pages and identify potential issues that hinder their crawlability.
Let’s explore the key areas to look into when reviewing.
Index Coverage Report
- To make sure Google can find and read your pages
- Will tell you what pages are indexed or not indexed (and why)
URL Inspection Tool
- See current index status of a page
- See when it was last crawled
Google Crawl Stats Report
- Data covering the last 90 days
- Displays daily crawl requests
- Crawl request breakdown (by purpose, response, file type, Googlebot type)
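If you also have access to your server's access logs, a rough version of the same breakdown can be reproduced locally. The sketch below is not how Search Console computes its report. It assumes logs in the common "combined" format, identifies Googlebot only by the user agent string (which can be spoofed), and treats any Googlebot user agent containing "Mobile" as Googlebot Smartphone. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Matches the date part of a timestamp such as [10/Oct/2023:13:55:36 +0000].
LOG_DATE = re.compile(r"\[(\d{2}/\w{3}/\d{4}):")


def googlebot_type(log_line: str) -> Optional[str]:
    """Classify a log line as 'Smartphone' or 'Desktop' Googlebot, or None."""
    if "Googlebot" not in log_line:
        return None
    return "Smartphone" if "Mobile" in log_line else "Desktop"


def daily_googlebot_requests(log_lines: Iterable[str]) -> Counter:
    """Count Googlebot requests per (date, crawler type) pair."""
    counts: Counter = Counter()
    for line in log_lines:
        bot = googlebot_type(line)
        match = LOG_DATE.search(line)
        if bot is not None and match is not None:
            counts[(match.group(1), bot)] += 1
    return counts
```

Passing an open log file to `daily_googlebot_requests` returns a counter keyed by date and crawler type, which you can compare against the daily figures in the Crawl Stats Report.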
How to get Google to crawl your site
While there is no way to force Google to crawl your site, there are several ways to encourage a crawl of your website’s pages—both manually and via search engine optimization (SEO).
1. Manually request a Google crawl or recrawl
Using the URL Inspection Tool, webmasters can manually request that Google add a specific URL to its crawl queue, either to be crawled for the first time or to be recrawled.
This does not guarantee crawling or indexing, but it can get the URL on Google’s radar. After you request a Google crawl, it may take anywhere from a few days to a few weeks for Google to address it. This solution is best suited for just a few URLs.
2. Submit a sitemap
Submitting an XML sitemap through Google Search Console and including a link to it in your website’s robots.txt file is the best way to “tell” Google about many URLs on your website.
A single sitemap can hold up to 50,000 URLs, and most CMS platforms (e.g. WordPress, Wix, or Blogger) generate one automatically.
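If your platform does not generate a sitemap for you, building one by hand is straightforward. Here is a minimal sketch using Python's standard library. It writes only the required `<loc>` element for each URL and splits long URL lists into groups of 50,000. The function names and the `example.com` URLs are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls: List[str], size: int = MAX_URLS_PER_SITEMAP) -> List[List[str]]:
    """Split a URL list into groups that each fit in a single sitemap file."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap(urls: List[str]) -> str:
    """Return the XML for one sitemap file listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_element = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL automatically.
        ET.SubElement(url_element, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def robots_txt_sitemap_line(sitemap_url: str) -> str:
    """Return the robots.txt line that points crawlers at a sitemap."""
    return f"Sitemap: {sitemap_url}"


if __name__ == "__main__":
    pages = ["https://example.com/", "https://example.com/blog/"]
    print(build_sitemap(pages))
    print(robots_txt_sitemap_line("https://example.com/sitemap.xml"))
```

Each group returned by `chunk_urls` would be saved as its own file, and each file would be submitted in Search Console and referenced in robots.txt.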
3. Backlinks and internal links
Backlinks are an external way to help Google discover a new site, validate the importance of web pages, and boost the domain authority of an entire website. As high-quality backlinks are not entirely in your control and can take a lot of time to build, a more straightforward approach is to start with internal linking.
Internal linking helps Google make sense of your site structure and find new pages to crawl. There is no magic number for how many internal links a page should include, but a good rule of thumb is that every page on your website that you want crawled and indexed should have at least one link pointing to it from another page on your website.
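The rule of thumb above can be checked with a few lines of code once you have a map of which pages link to which. The sketch below assumes you already have that map, for example from a site crawl export. The `find_orphan_pages` name and the sample paths are hypothetical.

```python
from typing import Dict, Set


def find_orphan_pages(internal_links: Dict[str, Set[str]]) -> Set[str]:
    """Return pages that no other page on the site links to.

    internal_links maps each page to the set of pages it links to.
    A page linking to itself does not count as an inbound link.
    """
    linked_to: Set[str] = set()
    for source, targets in internal_links.items():
        linked_to.update(target for target in targets if target != source)
    return set(internal_links) - linked_to


if __name__ == "__main__":
    site = {
        "/": {"/about", "/blog"},
        "/about": {"/"},
        "/blog": {"/", "/blog/post-1"},
        "/blog/post-1": {"/blog"},
        "/landing": {"/"},  # links out, but nothing links to it
    }
    print(find_orphan_pages(site))
```

Any page the function returns has no internal link pointing to it, so it is a candidate for a new link from a relevant page.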
Monitoring the crawling and indexing of a website’s pages is not a one-time task. It’s one part of a long-term, ever-evolving, comprehensive SEO strategy.
If you need help managing your website's SEO strategy, our experienced technical SEO team at Ayima is here to assist you.