WordPress robots.txt: Best-practice example for SEO

Jono Alderson 7 November 2019 Crawl directives, Technical SEO, Webmaster tools, WordPress

Your robots.txt file is a powerful tool when working on a website’s SEO – but you should handle it with care. It allows you to deny search engines access to different files and folders, but often that’s not the best way to optimize your site. Here, we’ll explain how we think site owners should use their robots.txt file and propose a ‘best practice’ approach suitable for most websites.

You’ll find a robots.txt example that works for most WordPress websites further down this page. If you want to know more about how your robots.txt file works, you can read our ultimate guide to robots.txt.

What does a “best practice” look like?

Search engines continually improve how they crawl the web and index content. That means what used to be best practice a few years ago might not work anymore or may even harm your site.

Today, best practice means relying on your robots.txt file as little as possible. It’s only really necessary to block URLs in your robots.txt file when you have complex technical challenges (e.g., a large eCommerce website with faceted navigation) or no other option.

Blocking URLs via robots.txt is a ‘brute force’ approach and can cause more problems than it solves.

For most WordPress sites, the following example is best practice:

User-Agent: *
Disallow:

Sitemap: https://www.example.com/sitemap_index.xml

We use this approach in our own robots.txt file, although you may occasionally notice that we are testing something there.

What does this code do?

The User-agent: * instruction states that any following instructions apply to all crawlers.
The Disallow: directive comes without further instructions, so we’re saying, “all crawlers can freely crawl this site without restrictions.”
In the robots.txt file, we also link to the location of the XML sitemap, making it easier for Google, Bing, and other search engines to find it.
In our own robots.txt file, we also include a comment for humans looking at the file (linking to this very page) so that they understand why we set it up the way we did.
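To check this reading of the file, here is a minimal sketch that feeds the example to Python's standard-library urllib.robotparser module. The crawler name and the page URL are placeholders chosen for illustration, and real search engines use their own parsers, so treat this as a sanity check rather than a guarantee of how any given crawler behaves.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: *
Disallow:

Sitemap: https://www.example.com/sitemap_index.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An empty Disallow value blocks nothing, so any crawler may fetch any URL.
allowed = parser.can_fetch("ExampleBot", "https://www.example.com/any/page/")

# site_maps() (Python 3.8+) returns the Sitemap URLs listed in the file.
sitemaps = parser.site_maps()

print(allowed)
print(sitemaps)
```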

If you have to disallow URLs

If you want to prevent search engines from crawling or indexing certain parts of your WordPress site, it’s almost always better to do so by adding meta robots tags or robots HTTP headers.
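As a rough illustration of the two alternatives, the sketch below builds a meta robots tag and an X-Robots-Tag response header. The helper names and the "noindex, follow" value are assumptions made for this example; on a WordPress site you would normally let a plugin such as Yoast SEO output these for you rather than write them by hand.

```python
from html import escape


def meta_robots_tag(directives):
    """Return a meta robots tag for the <head> of an HTML page."""
    content = ", ".join(directives)
    return f'<meta name="robots" content="{escape(content, quote=True)}">'


def x_robots_header(directives):
    """Return a (name, value) header pair, useful for non-HTML files."""
    return ("X-Robots-Tag", ", ".join(directives))


# Hypothetical choice: keep the page out of the index, but follow its links.
tag = meta_robots_tag(["noindex", "follow"])
header = x_robots_header(["noindex", "follow"])

print(tag)
print(header)
```

Unlike a robots.txt rule, both signals only work if the crawler can fetch the URL, so a page carrying them should not also be blocked in robots.txt.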

Our ultimate guide to meta robots tags explains how you can manage crawling and indexing ‘the right way,’ and our Yoast SEO plugin provides the tools to help you implement those tags on your pages.

If your site has crawling or indexing challenges that you can’t fix via meta robots tags or HTTP headers, or if you need to prevent crawler access for other reasons, you should read our ultimate guide to robots.txt.

Note that WordPress and Yoast SEO already automatically prevent indexing of some sensitive files and URLs, like your WordPress admin area (via an X-Robots-Tag HTTP header).

Why is this a best practice for WordPress SEO?

Robots.txt creates dead ends

Search engines need to discover, crawl and index your pages before you can compete for visibility in the search results. If you’ve blocked specific URLs via robots.txt, search engines can no longer crawl through those pages to discover others. That might mean that key pages don’t get discovered.
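The dead-end effect can be sketched with a toy crawler. The link graph, the paths, and the crawler name below are all hypothetical, and real crawlers also discover URLs through sitemaps and external links, so this only illustrates the link-following part of discovery.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical site: each path maps to the paths it links to.
LINKS = {
    "/": ["/blog/", "/archive/"],
    "/blog/": ["/blog/post-a/"],
    "/blog/post-a/": [],
    "/archive/": ["/guides/key-page/"],  # the only link to the key page
    "/guides/key-page/": [],
}


def discover(robots_lines, start="/"):
    """Breadth-first crawl that only follows links on pages it may fetch."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    seen, queue, crawled = {start}, deque([start]), []
    while queue:
        path = queue.popleft()
        if not parser.can_fetch("ExampleBot", path):
            continue  # blocked: the links on this page are never seen
        crawled.append(path)
        for target in LINKS[path]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return crawled


open_crawl = discover(["User-agent: *", "Disallow:"])
blocked_crawl = discover(["User-agent: *", "Disallow: /archive/"])

print(open_crawl)
print(blocked_crawl)
```

In the second crawl, /guides/key-page/ is not blocked itself, yet it is never reached because the only page linking to it is.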

Robots.txt denies links their value

One of the basic rules of SEO is that links from other pages can influence your performance. If a URL is blocked, search engines won’t crawl it, and they also might not distribute any ‘link value’ pointing to that URL, or passing through that URL to other pages on the site.

Google fully renders your site

People used to block access to CSS and JavaScript files to keep search engines focused on those all-important content pages. Nowadays, Google fetches all your styling and JavaScript and renders your pages entirely. Understanding your page’s layout and presentation is a crucial part of how it evaluates quality. So Google doesn’t like it when you deny it access to your CSS or JavaScript files.

The previous best practice of blocking access to your wp-includes directory and your plugins directory via robots.txt is no longer valid, which is why we worked with WordPress to remove the default disallow rule for wp-includes in version 4.0.
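To see what such legacy rules do in practice, the sketch below runs two illustrative asset URLs against old-style disallow rules using Python's urllib.robotparser. The rules are an assumed example of that older practice, and the plugin name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Old-style rules of the kind described above (an assumed example).
LEGACY_RULES = [
    "User-agent: *",
    "Disallow: /wp-includes/",
    "Disallow: /wp-content/plugins/",
]

# Illustrative asset URLs; the plugin name is hypothetical.
ASSETS = [
    "https://www.example.com/wp-includes/js/jquery/jquery.min.js",
    "https://www.example.com/wp-content/plugins/example-plugin/style.css",
]

parser = RobotFileParser()
parser.parse(LEGACY_RULES)

# Collect every asset a crawler would be forbidden to fetch.
blocked = [url for url in ASSETS if not parser.can_fetch("Googlebot", url)]

print(blocked)
```

Both files end up blocked, which means a crawler that renders pages would have to do so without that script and stylesheet.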

Many WordPress themes also use asynchronous JavaScript requests, so-called AJAX, to add content to web pages. WordPress used to block Google from accessing these requests by default, but we fixed this in WordPress 4.4.

Linking to your XML sitemap helps discovery

The robots.txt standard supports adding a link to your XML sitemap(s) to the file. This helps search engines discover where your sitemap lives and, through it, the contents of your site. Bing needs this link to verify your site, unless you added a link to the sitemap via its Webmaster Tools.

It might feel redundant because you should already add your sitemap to your Google Search Console and Bing Webmaster Tools accounts to access analytics and performance data. However, having that link in the robots.txt gives crawlers a foolproof way of discovering your sitemap.

Yoast SEO automatically adds a link to your XML sitemap if you haven’t generated a robots.txt file yet. If you already have a robots.txt file, you can add the rule Sitemap: https://www.example.com/sitemap_index.xml to your file via the file editor in the Tools section of Yoast SEO. Keep in mind that you should add the full URL to your XML sitemap. Multiple sitemaps go on multiple lines and all need full URLs.
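A quick way to check the "full URL" requirement is sketched below. The second and third sitemap file names are hypothetical, and the third is deliberately written as a relative path so the check has something to flag.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap_index.xml
Sitemap: https://www.example.com/news-sitemap.xml
Sitemap: /video-sitemap.xml
"""


def is_full_url(url):
    """True when the URL has an http(s) scheme and a host name."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns None when the file lists no sitemaps (Python 3.8+).
sitemaps = parser.site_maps() or []
relative = [url for url in sitemaps if not is_full_url(url)]

print(relative)
```

Any entry that shows up in the result should be rewritten with the scheme and host name included.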

Assess your technical SEO fitness

Being mindful of your robots.txt file is an essential part of technical SEO. Curious how fit your site’s overall technical SEO is? We’ve created a technical SEO fitness quiz that helps you figure out what you need to work on!

Jono Alderson

Jono was the Head of SEO and part of the leadership team at Yoast. He's a digital strategist, marketing technologist, and full stack developer. He's into technical SEO, emerging technologies, and brand strategy.