The robots.txt file has always been the handy little text file that allows SEOs and devs to control which URLs crawlers can and cannot access.
On July 1st, Google announced their intention to make the REP (Robots Exclusion Protocol) an internet standard. As part of this effort, Google open-sourced its own robots.txt parser, making the C++ library it uses for parsing and matching rules in robots.txt files available to all.
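For anyone curious about the open-sourced library itself, here is a minimal sketch of how a single URL can be checked against a set of robots.txt rules with its matcher, based on the usage shown in Google's github.com/google/robotstxt repository. The robots.txt content, user-agent and URL are illustrative only, and the build setup (having `robots.h` on the include path) is assumed:

```cpp
// Minimal sketch: checking one URL against robots.txt rules using
// Google's open-sourced parser (https://github.com/google/robotstxt).
// Assumes the library has been built and "robots.h" is on the include path.
#include <iostream>
#include <string>

#include "robots.h"

int main() {
  // Example robots.txt content and URL; both are illustrative only.
  const std::string robots_content =
      "User-agent: *\n"
      "Disallow: /search\n";
  const std::string user_agent = "Googlebot";
  const std::string url = "https://www.example.com/search?q=test";

  googlebot::RobotsMatcher matcher;
  const bool allowed =
      matcher.OneAgentAllowedByRobots(robots_content, user_agent, url);

  std::cout << url << " is " << (allowed ? "allowed" : "disallowed")
            << " for " << user_agent << std::endl;
  return 0;
}
```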
As part of our efforts to build useful tools for the SEO community, our SEO and development teams have collaborated to build a robots.txt testing tool based on Google's own parsing engine. We're very excited to share this tool with the SEO community. Here are some of its key features:
- Based on Google's own parser code
- Bulk check up to 100,000 URLs
- Use your own custom robots.txt rules
- Export to CSV
- Reports the line number that applies to the URL
- Reports the applied rule
Validating URLs against your robots.txt file is easy: simply paste up to 100,000 URLs or select "Upload CSV", choose a user-agent, and the tool will reveal whether each URL is allowed or disallowed by the rules in the robots.txt file. If you would like to test against different rules, toggle the "Custom robots.txt rules" switch and paste in your own directives. Exporting to CSV provides additional information, namely the line number in the robots.txt file and the rule that is being applied.
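To give a feel for what a bulk check like this involves under the hood, here is a hedged sketch of looping over a list of URLs and recording an allowed/disallowed result per URL to CSV with the same open-sourced matcher. This is a simplified illustration of the general approach, not our tool's actual implementation, and the file names are hypothetical:

```cpp
// Sketch: bulk-checking a list of URLs against one robots.txt file and
// writing the results to CSV. Illustrative only; "urls.txt" and
// "results.csv" are hypothetical file names.
#include <fstream>
#include <iostream>
#include <string>

#include "robots.h"

int main() {
  // Illustrative rules and user-agent; a real check would fetch the
  // site's live robots.txt or accept custom rules pasted by the user.
  const std::string robots_content =
      "User-agent: *\n"
      "Disallow: /search\n";
  const std::string user_agent = "Googlebot";

  std::ifstream urls("urls.txt");        // one URL per line
  std::ofstream results("results.csv");  // CSV output
  results << "url,result\n";

  std::string url;
  while (std::getline(urls, url)) {
    if (url.empty()) continue;
    // A fresh matcher per URL keeps the sketch simple.
    googlebot::RobotsMatcher matcher;
    const bool allowed =
        matcher.OneAgentAllowedByRobots(robots_content, user_agent, url);
    results << url << "," << (allowed ? "allowed" : "disallowed") << "\n";
  }
  return 0;
}
```

In the tool itself, the exported CSV also reports the robots.txt line number and the specific rule applied to each URL, as described above.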
In this example, we copy and paste a handful of URLs from bbc.co.uk, show you where to select the user-agent and where to paste in your custom rules, and then click on a disallowed URL in the results to show the rule being applied:
To upload URLs via CSV, switch to the “Upload CSV” tab in the top-right and either drag and drop your CSV or upload it from your computer. You can then download your results by clicking on “Get CSV”.
Our development team are keen to share how they built the tool, so watch out for a blog from the team coming soon!