Yoast SEO settings: Crawl optimization
Crawlability is essential in SEO. If you want search engines to find your site and show it in the search results, your site must be crawlable. Not only that, but you must ensure that search engines get a chance to crawl the pages that you want to rank with. There is no easy way to do that. But, with Yoast SEO, you can clear all the URLs that don’t have any SEO value out of the search engine’s way.
There is also another, less talked about, side to this story. Crawling requires a lot of resources. Search engines and other parties, such as various apps, need a lot of electricity to crawl the growing number of sites and their URLs. Website owners also need powerful servers to handle both human visitors and the robots that crawl their sites. So, by making crawling more efficient, you contribute not only to your site’s SEO but also to consuming less electricity!
By using the crawl optimization settings in Yoast SEO, you can easily clear out URLs that don’t have any SEO value. This makes crawling more efficient and reduces your site’s carbon footprint. In this article, we’ll explain all of the crawl optimization settings one by one.
Table of contents
- Video: How to use the crawl optimization settings in Yoast SEO
- Where to find the crawl optimization settings
- Remove unwanted metadata
- Disable unwanted content formats
- Remove unused resources
- Internal site search cleanup
- Advanced: URL cleanup
- Will using the crawl settings affect my site’s rankings?
- Read more
Video: How to use the crawl optimization settings in Yoast SEO
Where to find the crawl optimization settings
You can find the crawl optimization settings by following these steps:
- Log into your WordPress site.
You will be in your WordPress dashboard.
- Click “Yoast SEO”.
In the menu on the left-hand side, find the “Yoast SEO” menu item.
- Click “Settings”.
In the menu that unfolds when clicking “Yoast SEO”, click “Settings”.
- Navigate to the “Advanced” heading and click “Crawl optimization”.
On the Yoast SEO settings page, navigate to the “Advanced” heading and click “Crawl optimization” to open the crawl optimization settings.
- That’s it!
You’ll be on the crawl optimization settings page in Yoast SEO.
The crawl optimization settings are divided into five sections. We’ll explain them one by one.
Remove unwanted metadata
The first section is called “Remove unwanted metadata”. What can you do here? Unlike humans, who read what’s on the front end of your site, robots read what they find in the source code. If you open the source code of your site, you will notice many URLs there. When crawlers come to crawl your site, they’ll visit each one of the URLs they find. And they will do that tens or hundreds of times per day.
So, why is that a problem? Well, WordPress adds a lot of URLs and tags to your website’s header and <head> section. A lot of those additions are unnecessary, and they don’t have any SEO value. So, we’ve created multiple toggles that allow you to disable a specific piece of output. Below, you can read more about what each of the toggles does.
Remove shortlinks
In the <head> section of a single post, WordPress creates a shortlink output (see example below).
<link rel='shortlink' href='http://testsite.com/?p=1' />
The shortlink is basically a shortened version of the URL of the same page. With this toggle, you can remove that output.
Remove REST API links
The WordPress REST API is a developer-oriented feature that lets applications interact with your WordPress site. WordPress automatically adds a REST API link to the <head> section of your site for discoverability.
<link rel="https://api.w.org/" href="http://testsite.com/wp-json/" />
However, most sites don’t use the WordPress REST API. If your site is one of those, you can safely remove the link with this feature.
Remove RSD/WLW links
The RSD (Really Simple Discovery) link in the <head> section of your site lets external blogging clients discover the publishing services your site offers. If you do not use such clients, it is safe to remove the link with this toggle.
<link rel="EditURI" type="application/rsd+xml" title="RSD" href="http://testsite.com/xmlrpc.php?rsd" />
The WLW link is intended for users of the discontinued Windows Live Writer. If you do not use it, you can safely remove this link as well.
<link rel="wlwmanifest" type="application/wlwmanifest+xml" href="http://testsite.com/wp-includes/wlwmanifest.xml" />
Remove oEmbed links
With this toggle, you can remove the oEmbed links from the <head> section of all your single posts.
<link rel="alternate" type="application/json+oembed" href="http://testsite.com/wp-json/oembed/1.0/embed?url=http%3A%2F%2Ftestsite.com%2F2022%2F05%2Fhello-world%2F" />
<link rel="alternate" type="text/xml+oembed" href="http://testsite.com/wp-json/oembed/1.0/embed?url=http%3A%2F%2Ftestsite.com%2F2022%2F05%2Fhello-world%2F&format=xml" />
These links help other sites consume your content. You won’t harm any of your content by removing them.
Remove generator tag
The generator tag displays the WordPress version your site is using.
<meta name="generator" content="WordPress 6.0" />
This tag has no SEO value, and, in fact, it can potentially be a security threat. So, you can easily remove it with this toggle.
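To see which of these tags a page still outputs, you could scan its <head> markup. The following is a minimal sketch in Python, assuming the tags look like the examples above; the names `HeadTagAudit`, `audit_head`, and `sample_head` are made up for this illustration and are not part of Yoast SEO or WordPress.

```python
from html.parser import HTMLParser

# rel values copied from the examples above.
REMOVABLE_RELS = {"shortlink", "https://api.w.org/", "EditURI", "wlwmanifest"}
OEMBED_TYPES = {"application/json+oembed", "text/xml+oembed"}


class HeadTagAudit(HTMLParser):
    """Collects the tags that the metadata toggles are described as removing."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> also end up here.
        attributes = dict(attrs)
        if tag == "link":
            rel = attributes.get("rel")
            if rel in REMOVABLE_RELS:
                self.found.append(rel)
            elif rel == "alternate" and attributes.get("type") in OEMBED_TYPES:
                self.found.append("oembed")
        elif tag == "meta" and attributes.get("name") == "generator":
            self.found.append("generator")


def audit_head(html):
    """Return the removable tags found in the given markup, in document order."""
    parser = HeadTagAudit()
    parser.feed(html)
    return parser.found


# Hypothetical markup built from the examples in this article.
sample_head = """
<link rel='shortlink' href='http://testsite.com/?p=1' />
<link rel="https://api.w.org/" href="http://testsite.com/wp-json/" />
<meta name="generator" content="WordPress 6.0" />
<link rel="stylesheet" href="http://testsite.com/style.css" />
"""

print(audit_head(sample_head))
```

An empty list after you switch the toggles on would suggest the tags are gone from that page.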
Pingback HTTP header
Pingbacks are used to notify you when someone has added a link to your site. However, this standard is very old, and you are most likely not using it anymore. If you switch the toggle to remove, it will remove the X-Pingback: http://testsite.com/xmlrpc.php header from the response headers.
Powered by HTTP header
With this toggle, you remove the information about the PHP version your site is using from the response header. This information is not required for your site to function properly, so you can safely remove it.
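A quick way to confirm that both headers are gone is to inspect a response's headers. The sketch below assumes the PHP version is exposed through the commonly used X-Powered-By header (the article does not name it); `leftover_headers` and `sample_headers` are hypothetical names for this example.

```python
# Assumption: X-Pingback is named in the text above; X-Powered-By is the
# header PHP commonly uses to expose its version.
UNWANTED_HEADERS = ("x-pingback", "x-powered-by")


def leftover_headers(headers):
    """Return the unwanted headers still present, matching names case-insensitively."""
    present = {name.lower(): value for name, value in headers.items()}
    return {name: present[name] for name in UNWANTED_HEADERS if name in present}


# Hypothetical response headers from a site with both toggles switched off.
sample_headers = {
    "Content-Type": "text/html; charset=UTF-8",
    "X-Pingback": "http://testsite.com/xmlrpc.php",
    "X-Powered-By": "PHP/8.1.0",
}

print(leftover_headers(sample_headers))
```

For a live check, the headers of a response opened with `urllib.request.urlopen` could be passed in the same way.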
Disable unwanted content formats
The next section is called “Disable unwanted content formats”. Your site probably has more URLs than you realize. For instance, WordPress creates feeds for a lot of content on your site, which can be a problem for crawlers. A crawler will start crawling the URLs, and, at some point, it might run out of crawl budget. As a result, there won’t be any budget left for your important posts and pages. That’s why it’s wise to remove those URLs and let search engines crawl your site more efficiently.
In the crawl optimization settings in Yoast SEO, you can toggle multiple switches that let you keep or remove the various feeds. We don’t automatically remove them for each site because we can’t predict the needs of all Yoast SEO users. But, if you don’t get any value from them, we recommend you switch the toggles on. Below, you can see exactly which feeds you can remove with the crawl optimization settings.
Remove global feed
The “Remove global feed” toggle lets you remove the global feed, which is an overview of your recent posts.
- Type of page: any page
- Example feed: https://www.example.com/feed/
Remove global comments feed
The “Remove global comments feed” toggle lets you remove the global comments feed, an overview of recent comments on your site.
- Type of page: any page
- Example feed: https://www.example.com/comments/feed/
Note: Disabling this feed will also disable the post comments feeds.
Remove post comments feed
The “Remove post comments feed” toggle is for removing the feed for recent comments on each post. If you enable or disable the “Remove global comments feed” toggle, the “Remove post comments feed” toggle will automatically be enabled or disabled too.
- Example feed: https://www.example.com/example-post/feed/
Remove post author feeds
The “Remove post author feeds” toggle is for removing the feeds for recent posts by specific authors.
- Type of page: author archive, e.g., https://www.example.com/author/admin/
- Example feed: https://www.example.com/author/admin/feed/
Remove post type feeds
The “Remove post type feeds” toggle lets you remove post type feeds, which provide information about your recent posts, for each post type.
- Type of page: post type archive, e.g., https://www.example.com/my-books/
- Example feed: https://www.example.com/my-books/feed/
Remove category feeds
The “Remove category feeds” toggle lets you remove category feeds, which provide information about your recent posts, for each category.
- Type of page: category archive, e.g., https://www.example.com/category/fiction/
- Example feed: https://www.example.com/category/fiction/feed/
Remove tag feeds
The “Remove tag feeds” toggle lets you remove tag feeds, which provide information about your recent posts, for each tag.
- Type of page: tag archive, e.g., https://www.example.com/tag/fantasy/
- Example feed: https://www.example.com/tag/fantasy/feed/
Remove custom taxonomy feeds
The “Remove custom taxonomy feeds” toggle lets you remove custom taxonomy feeds, which provide information about your recent posts, for each custom taxonomy.
- Type of page: custom taxonomy archive, e.g., https://www.example.com/book-genre/crime/
- Example feed: https://www.example.com/book-genre/crime/feed/
Search results feeds
The “Remove search results feeds” toggle lets you remove search results feeds, which provide information about your search results.
- Type of page: search results, e.g., https://www.example.com/?s=world
- Example feed: https://www.example.com/search/world/feed/rss2/
Atom/RDF feeds
The final toggle allows you to remove Atom/RDF feeds, which are specific formats for feeds.
- Type of page: any page
- Example feed: any feed listed above, with /atom or /rdf added at the end, e.g.:
- https://www.example.com/feed/atom
- https://www.example.com/feed/rdf
- https://www.example.com/comments/feed/atom
- https://www.example.com/comments/feed/rdf
- https://www.example.com/hello-world/feed/atom
- https://www.example.com/hello-world/feed/rdf
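The feed URLs in this section mostly follow one pattern: the page URL, followed by /feed/, optionally followed by atom or rdf. As a rough sketch of that pattern in Python (the helper name `feed_variants` is invented for this example, and the search results feed shown earlier follows a different path, so it is not covered here):

```python
def feed_variants(page_url):
    """Build the default, Atom, and RDF feed URLs for a page, per the examples above."""
    base = page_url.rstrip("/") + "/feed/"
    return [base, base + "atom", base + "rdf"]


for url in feed_variants("https://www.example.com/author/admin/"):
    print(url)
```

Listing the variants like this gives an idea of how many extra crawlable URLs each archive page adds when the feeds stay enabled.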
Remove unused resources
In the “Remove unused resources” section, you can remove the resources that WordPress usually loads but that your site doesn’t always need.
Remove emoji scripts
If you don’t use emojis in your content, you can safely remove the JavaScript used for converting emoji characters in older browsers. You can do that by switching the toggle next to “Remove emoji scripts”.
Remove WP-JSON API
The “Remove WP-JSON API” toggle allows you to prevent robots from crawling the WordPress JSON API endpoints. Unless you’re using the WordPress REST API to output important content, you can switch this toggle on to improve your crawl efficiency.
This adds a “disallow” rule to your robots.txt file to prevent the crawling of the WordPress JSON API endpoints, e.g., https://www.example.com/wp-json/ and https://www.example.com/?rest_route=/.
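To get a feel for what such a rule does, you can test URLs against a robots.txt file with Python's standard `urllib.robotparser` module. The robots.txt content below is an assumption made for this sketch; the exact rules Yoast SEO writes are not shown in this article, and `is_crawlable` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules for illustration only, based on the two example URLs above.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-json/
Disallow: /?rest_route=
"""


def is_crawlable(url, robots_txt=ASSUMED_ROBOTS_TXT, user_agent="*"):
    """Return True if the given robots.txt rules allow the user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in (
    "https://www.example.com/wp-json/",
    "https://www.example.com/?rest_route=/",
    "https://www.example.com/hello-world/",
):
    print(url, is_crawlable(url))
```

Note that robots.txt only asks well-behaved crawlers to stay away; the endpoints themselves keep working.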
Internal site search cleanup
The next section is called “Internal site search cleanup”. Spammers sometimes target internal site search URLs on your site for their own purposes. Those URLs might get crawled by search engines and might be seen by users. That can harm your SEO (and your branding)! This feature identifies some common spam patterns and stops them in their tracks.
Filter search terms
First of all, you can choose to filter search terms by switching the toggle next to “Filter search terms”. If you enable this option, you get more specific options, which we will discuss below.
Max number of characters to allow in searches
If you choose to filter search terms, then you can set a maximum number of characters to allow in searches. This reduces the impact of spam attacks and confusing URLs.
Filter searches with emojis and other special characters
You can also decide whether you want to block searches with emojis and other special characters, as these searches may be part of a spam attack.
Filter searches with common spam patterns
Finally, you can choose to filter searches with common spam patterns. The common spam patterns our plugin cleans up are: TALK: QQ: [:()【】[]].
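The filtering options above can be pictured as a few simple checks on the search term. This is only an illustrative sketch in Python: the patterns are read off the list above, the 50-character limit is an arbitrary placeholder, and the plugin's actual rules and its emoji handling are not shown in this article. `is_suspicious_search` is a made-up name.

```python
import re

# Patterns inferred from the list above; treat them as assumptions.
SPAM_PATTERNS = [
    re.compile(r"(TALK|QQ):"),      # "TALK:" and "QQ:" markers
    re.compile(r"[:()【】\[\]]"),    # the listed punctuation characters
]


def is_suspicious_search(term, max_chars=50):
    """Flag a search term that is too long or matches one of the assumed spam patterns."""
    if len(term) > max_chars:
        return True
    return any(pattern.search(term) for pattern in SPAM_PATTERNS)


for term in ("wordpress seo", "QQ: 12345", "【promo】", "a" * 60):
    print(repr(term[:20]), is_suspicious_search(term))
```

Checking the length first keeps the sketch cheap for very long inputs, which is also where spam terms tend to show up.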
Redirect pretty URLs to ‘raw’ format
Next, you can choose to redirect pretty URLs for search pages to the raw format. WordPress supports two endpoint formats for site search queries:
- A raw format: example.com/?s=example
- A pretty format: example.com/search/example
The pretty format will only be supported when pretty permalinks are enabled. When both formats exist, this can lead to problems, because this doubles the number of URLs that search engines can crawl. In addition, it can increase the number of ways in which your site can be attacked by spammers.
Therefore, we provide an option to turn off one of these formats in Yoast SEO. When you switch the toggle behind “Redirect pretty URLs for search pages to raw format” to on, Yoast SEO disables the pretty format. The plugin then redirects requests from the pretty format to the raw format, while maintaining any query parameters and/or pagination. We disable the pretty format because the raw format is relatively universal and language- and territory-agnostic and more (natively) interoperable with most analytics and tracking systems.
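The mapping from the pretty format to the raw format can be sketched as a small URL rewrite. The Python below is an illustration under stated assumptions, not the plugin's implementation: it assumes pagination appears as /page/N/ in the pretty URL and maps it to WordPress's `paged` query variable, and it keeps any existing query parameters. `pretty_to_raw` is a hypothetical helper.

```python
import re
from urllib.parse import parse_qsl, unquote, urlencode, urlsplit, urlunsplit

# Matches /search/<term>/ with an optional /page/<number>/ suffix (assumed layout).
PRETTY_SEARCH = re.compile(r"/search/([^/]+)(?:/page/(\d+))?/?")


def pretty_to_raw(url):
    """Rewrite a pretty search URL to the raw ?s= format; return other URLs unchanged."""
    parts = urlsplit(url)
    match = PRETTY_SEARCH.fullmatch(parts.path)
    if match is None:
        return url
    term, page = match.groups()
    query = [("s", unquote(term))]
    if page:
        query.append(("paged", page))
    # Keep whatever query parameters were already on the URL.
    query.extend(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit((parts.scheme, parts.netloc, "/", urlencode(query), parts.fragment))


print(pretty_to_raw("https://example.com/search/example"))
print(pretty_to_raw("https://example.com/search/example/page/2/?orderby=date"))
```

A real redirect would also need to send the appropriate HTTP status code, which this sketch leaves out.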
Prevent crawling of internal site search URLs
The final option in this section is to prevent crawling of internal site search URLs. This adds a disallow rule to your robots.txt file so your internal site search URLs won’t be crawled.
In general, blocking your internal search pages via your robots.txt would not be our advice. It’s better to allow search engines to crawl these pages, but to prevent them from indexing them by using a noindex tag, which Yoast SEO automatically does for your site. However, if your search results pages are being crawled excessively and there’s evidence that that’s harmful, for example, for your crawl budget, or if your search results pages are under attack, you should enable this option.
Advanced: URL cleanup
These are advanced settings that you should only use if you know what you are doing! To learn more, read the Advanced crawl settings article.
Will using the crawl settings affect my site’s rankings?
We understand that this all might sound a bit scary. But don’t worry, using the crawl settings in Yoast SEO will not harm your website’s crawlability or rankings. The crawl settings are there to help you clean up unnecessary URLs and can help search engines crawl your site more efficiently.
In addition to this, it’s important to note that the crawl settings in Yoast SEO do not have any effect on the crawl rate of a website. This means that the speed at which search engines, like Google, crawl and index a website is not impacted by the crawl settings in Yoast SEO.
Read more
Want to know more about crawling and how it affects the environment? Check out these links: