When we talk about website “visitors” on our blog, we’re almost always talking about people reaching your website through a browser. Nothing surprising there.

But, there’s another important visitor you should be thinking about when running a WordPress website: bots. According to a 2020 report, it’s estimated that bot traffic accounts for around 40% of all website traffic, comprising 25% “bad bots” and 15% “good bots.”

[Infographic: share of web traffic from humans (59.2%), good bots (15.2%), and bad bots (25.6%)]

To deal with the bad bots, see our guide to WordPress security. In this post, we’ll focus on the good bots — namely those used by search engines to crawl your pages and index your content so that your website appears in search results.

There are many ways to maintain and optimize your WordPress site for good bots, one of which is understanding how your robots.txt file works. This file can instruct bots to go to some parts of your website while ignoring the parts you want hidden from search. This way, only your relevant content is crawled and shown in organic search results.

In this guide, we’ll introduce you to the robots.txt file in WordPress. You’ll learn what it’s meant to do, where to find it, what it contains, and how to tailor its contents to your SEO needs.

To display your web pages in search results, search engines need to understand the structure and content of your website: what information your web pages contain, and how these pages are connected. They do this by deploying search bots, also called “crawlers,” to crawl and index your web pages. With this indexed information, a search engine determines your site’s rank for a given query.

However, you may not want crawlers to visit certain parts of your WordPress website, such as pages under maintenance, a staging area, your plugin, theme, and admin folders, or any other pages you don’t want users to find through search engines.

Here’s where robots.txt matters: Before crawling your site, a search engine bot will first look for a robots.txt file to tell it what pages and/or areas of your site to ignore. If you’ve included one, the bot will crawl your site based on the instructions provided in this file. Otherwise, it will crawl your site as usual.

What’s in robots.txt?

WordPress automatically generates a simple robots.txt for new websites and serves it from the site’s root URL. This default file is virtual: WordPress builds it on request, so you won’t find a physical robots.txt in your root directory until you create one. You can view the content of your robots.txt by adding “/robots.txt” to the end of your website’s main URL. For example, if your homepage URL is “https://example.com”, the URL “https://example.com/robots.txt” will display your robots.txt in the browser window. It probably looks something like this:

 
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/wp-sitemap.xml
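
If you would rather look this up from a script than a browser, the URL rule above can be sketched in Python. This is a minimal illustration, not an official tool: the function names `robots_txt_url` and `fetch_robots_txt` are made up for this example, and the second one performs a real network request, so it is defined but not called here.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_txt_url(site_url: str) -> str:
    """Return the robots.txt URL for the site that serves site_url."""
    parts = urlsplit(site_url)
    # robots.txt sits at the root of the host, whatever page you start from.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots_txt(site_url: str) -> str:
    """Download robots.txt and return its text (makes a network request)."""
    with urlopen(robots_txt_url(site_url), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


print(robots_txt_url("https://example.com/blog/some-post/"))
```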

Let’s unpack the contents of this file. A robots.txt file consists of one or more code blocks, each of which contains instructions for one or more bots. A code block consists of:

  • A user-agent, which identifies the bot targeted by the rule. In the above example, the user-agent is an asterisk (*), meaning the block applies to any user-agent that visits the website. However, you can target a specific bot with its user-agent. For example, Google’s user-agent is Googlebot. Replacing the asterisk with Googlebot would make the block apply only to Google.
  • One or more of the following commands followed by the associated file or directory:
    • Disallow instructs search engine bots to ignore the specified file or files. In the example above, robots.txt prevents bots from crawling contents of the wp-admin directory.
    • Allow lets search engine bots access the specified file or files. This isn’t necessary unless you want to allow access to a file or subdirectory within a disallowed parent directory. In the example above, bots may crawl the file admin-ajax.php, even though it is inside the wp-admin directory.
    • Crawl-delay specifies how much time a search engine bot should wait between requests. It is paired with a number value in seconds. Not every crawler supports it; Googlebot, for example, ignores this directive.

You can also place a forward slash (/) next to Disallow instead of a file or directory name to disallow all pages on your website.

After these blocks, a robots.txt file may also include one or more Sitemap lines, each paired with the URL of a sitemap for the website. This isn’t required, and it’s unnecessary if you regularly submit sitemaps to search engines through services like Google Search Console.
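
To make this structure concrete, here is a rough Python sketch that splits a robots.txt file into its blocks and sitemap lines. It is a simplified reading of the format described above, not the parser any search engine uses, and the names `Group` and `parse_robots_txt` are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Group:
    """One block: the bots it targets and the rules that apply to them."""
    user_agents: List[str] = field(default_factory=list)
    rules: List[Tuple[str, str]] = field(default_factory=list)  # (directive, value)


def parse_robots_txt(text: str) -> Tuple[List[Group], List[str]]:
    groups: List[Group] = []
    sitemaps: List[str] = []
    current: Optional[Group] = None
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if ":" not in line:
            continue  # blank or malformed line
        name, value = line.split(":", 1)
        name, value = name.strip().lower(), value.strip()
        if name == "user-agent":
            # A user-agent line that follows rules starts a new block.
            if current is None or current.rules:
                current = Group()
                groups.append(current)
            current.user_agents.append(value)
        elif name in ("allow", "disallow", "crawl-delay"):
            if current is not None:
                current.rules.append((name, value))
        elif name == "sitemap":
            sitemaps.append(value)  # sitemap lines belong to no block
    return groups, sitemaps


SAMPLE = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/wp-sitemap.xml
"""

groups, sitemaps = parse_robots_txt(SAMPLE)
print(groups[0].user_agents, groups[0].rules, sitemaps)
```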

To sum up, here’s a template for what a robots.txt file may look like:

 
User-agent: [user-agent]
Disallow: [URL you don’t want the user-agent to crawl]

User-agent: [user-agent]
Disallow: [URL you don’t want the user-agent to crawl]
Allow: [URL you want the user-agent to crawl]

Sitemap: [URL of sitemap]

Why use robots.txt in WordPress?

If you run a smaller WordPress website with minimal content that can be indexed, you likely won’t need to worry about robots.txt. On larger websites, a modified robots.txt file can potentially help your rankings and page speed.

For every website they index, search engines set a crawl budget, which is the number of pages on a website that the search engine crawls over a period of time. If your site has more pages than your crawl budget covers, the remaining pages won’t be crawled until a later crawling session. This hurts your organic search rankings if bots spend their crawl budget on irrelevant or private sections of your site and miss your most important pages.

Your robots.txt file lets you tell search engines which areas to skip over, thereby prioritizing your primary content. With some minor edits to this file, you help point search bots to the content you want indexed every time.

robots.txt can also improve your site’s performance and, as a result, its user experience. Search bots are like human visitors in that they request pages from your server and use up resources. By instructing search engine crawlers to avoid large sections of your site, you’ll free up server resources for human visitors. You might even choose to turn certain bots away from your site altogether. Even a marginal gain in speed can help engagement on your site.

Note that bots aren’t required to follow all or any rules in your robots.txt file. It’s up to the bot to decide whether to obey your instructions, though crawlers for popular search engines will generally adhere to most commands you place in robots.txt. But, “bad” bots, as you might assume, will ignore your robots.txt file. For this reason, you should never rely on robots.txt as a security measure to guard private information.

Additionally, disallowing a page or group of pages doesn’t eliminate the possibility of a search engine indexing these pages. If a crawler finds a link on another website that leads to your disallowed page, the search engine may index that URL anyway. To prevent indexing of a page, it’s better to use the noindex meta tag, password-protect the page, or remove the page from your server files. Note that a crawler has to fetch a page to see its noindex tag, so don’t also disallow that page in robots.txt.

How to Create and Edit a robots.txt File in WordPress

To edit your WordPress robots.txt file, you have the option of using a plugin or editing your file manually and uploading it to your server via FTP. The former method is better suited for beginners, but the latter doesn’t require you to install an extra plugin to make changes.

Create and Edit robots.txt With a Plugin

Many WordPress SEO plugins offer a way to modify your robots.txt without editing the file directly. All in One SEO (AIOSEO) is one such plugin. It’s also well-received, with over three million downloads. The free version of AIOSEO lets you easily add rules to robots.txt.

To edit your robots.txt file in AIOSEO:

1. Install and activate the All in One SEO plugin.

2. You’ll be brought to the AIOSEO setup wizard. You can continue setup with the wizard, or return to your dashboard.

3. In your dashboard, go to All in One SEO > Tools.

4. Under the Robots.txt Editor tab, turn on Enable Custom Robots.txt.

[Screenshot: the toggle that enables a custom robots.txt in WordPress]

5. Further down under Robots.txt Preview, you’ll see your current robots.txt file. It will contain the rules added by WordPress. By default, these keep search bots out of your wp-admin directory, with the exception of your admin-ajax.php file.

[Screenshot: a preview of robots.txt in WordPress]

6. To add a rule, add your User Agent and Directory Path in the appropriate fields, and select either Allow or Disallow as needed.

You’ll see the robots.txt preview change as you add this information.

[Screenshot: a preview of a modified robots.txt file in WordPress]

7. To add another rule, click Add Rule and fill in the fields as you did in the previous step.

8. When finished, click Save Changes in the bottom right corner.

Create and Edit robots.txt Manually

If you don’t want to use an SEO plugin to change your file, you can also modify robots.txt directly using a File Transfer Protocol (FTP) client. Here’s how:

1. In your text editor of choice, create a file called robots.txt.

2. Add your desired rules to this file in the proper format — provided below — and save the file:

 
User-agent: [user-agent]
Disallow: [URL you don’t want the user-agent to crawl]

User-agent: [user-agent]
Disallow: [URL you don’t want the user-agent to crawl]
Allow: [URL you want the user-agent to crawl]

Sitemap: [URL of sitemap]

3. Using an FTP client, connect to your WordPress hosting.

4. Upload robots.txt to your server and place it in your site’s root directory. If there is already a robots.txt file in this directory, your new file should replace it.

5. To modify this file in the future, download it from your server, make modifications in your text editor, then re-upload the file to your server. Or, you can make edits directly through your FTP client.
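
If you maintain your rules in a script, you could also generate the file before uploading it. The Python sketch below is one possible approach under simple assumptions: `render_robots_txt` and `save_robots_txt` are hypothetical helper names, and the rules shown are the WordPress defaults from earlier in this post. It only writes a local file; uploading over FTP is still a separate step.

```python
from pathlib import Path


def render_robots_txt(groups, sitemaps=()):
    """Build robots.txt text from (user_agent, [(directive, path), ...]) pairs."""
    blocks = []
    for user_agent, rules in groups:
        lines = [f"User-agent: {user_agent}"]
        lines += [f"{directive}: {path}" for directive, path in rules]
        blocks.append("\n".join(lines))
    if sitemaps:
        blocks.append("\n".join(f"Sitemap: {url}" for url in sitemaps))
    # A blank line separates blocks, and the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


def save_robots_txt(content, path="robots.txt"):
    """Write the rendered text to a local file, ready to upload."""
    Path(path).write_text(content, encoding="utf-8")


GROUPS = [("*", [("Disallow", "/wp-admin/"), ("Allow", "/wp-admin/admin-ajax.php")])]
SITEMAPS = ["https://example.com/wp-sitemap.xml"]

content = render_robots_txt(GROUPS, SITEMAPS)
print(content, end="")
```

Calling `save_robots_txt(content)` would then write robots.txt to the current folder.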

Test Your New robots.txt File

After completing your edits, either manually or with a plugin, you should test your new robots.txt file to ensure your instructions work properly. A typo in this file could direct bots away from your important content and harm your SEO.

To evaluate your robots.txt file, you can use the testing tool inside Google Search Console. Once you’ve connected your site to Google Search Console, open the robots.txt Tester and choose your domain from the dropdown. Google scans your file and alerts you to any errors or warnings. If the tool shows zero errors and zero warnings, you’re all set:

[Screenshot: the robots.txt review tool in Google Search Console]
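
For a quick local check before you upload anything, Python’s standard library includes `urllib.robotparser`. The sketch below is only a rough sanity test: the `/staging/` path and the `check_paths` helper are assumptions made up for this example, and Python’s parser may resolve overlapping Allow and Disallow rules differently than Google does, so treat it as a complement to Google’s tool rather than a replacement.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules to test; /staging/ is an assumed path for illustration.
RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /staging/
"""


def check_paths(robots_txt, user_agent, urls):
    """Return {url: True/False} showing whether user_agent may crawl each URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


results = check_paths(
    RULES,
    "Googlebot",
    [
        "https://example.com/blog/hello-world/",
        "https://example.com/wp-admin/options.php",
        "https://example.com/staging/new-homepage/",
    ],
)
for url, allowed in results.items():
    print("ALLOWED" if allowed else "BLOCKED", url)
```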

WordPress robots.txt Example Rules

Here are some simple examples of what a robots.txt block can look like, which you can add to your own file according to your needs.

Allow a File in a Disallowed Folder

You may want to prevent bots from crawling all files in a directory except for one file. In that case, implement the following rule:

 
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Replace the directory and file names above with your own.
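
Why does the Allow line win here? Google’s documentation and the Robots Exclusion Protocol standard (RFC 9309) say that the most specific matching rule, meaning the longest matching path, takes precedence, and that Allow wins a tie. The Python sketch below illustrates that idea under simplifying assumptions: it ignores wildcards (`*` and `$`), and `is_allowed` is a name invented for this example.

```python
def is_allowed(rules, path):
    """Decide whether path may be crawled under a list of (directive, prefix) rules.

    The longest matching prefix wins; if Allow and Disallow tie, Allow wins.
    """
    best_length = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty value matches nothing
        is_allow = directive.lower() == "allow"
        longer = len(prefix) > best_length
        tie_for_allow = len(prefix) == best_length and is_allow
        if longer or tie_for_allow:
            best_length = len(prefix)
            allowed = is_allow
    return allowed


RULES = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]

for path in ("/wp-admin/admin-ajax.php", "/wp-admin/options.php", "/blog/"):
    print(path, is_allowed(RULES, path))
```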

Ban One or All Bots

To keep a bot from crawling your pages, use the following rule:

 
User-agent: [user-agent]
Disallow: /

Or, prevent all bots (that follow your rules) from crawling your pages with the following:

 
User-agent: *
Disallow: /

Ban All Bots Except for One

These rules restrict access to your site for all bots except for one:

 
User-agent: [user-agent]
Disallow:

User-agent: *
Disallow: /
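
To confirm that rules like these behave as intended, you could run them through Python’s `urllib.robotparser`. In this sketch, Googlebot stands in for the one permitted bot and `ExampleBot` is a made-up user-agent representing everyone else; both choices are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Googlebot gets an empty Disallow (nothing blocked); all other bots are banned.
RULES = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "https://example.com/blog/"
googlebot_ok = parser.can_fetch("Googlebot", url)
other_bot_ok = parser.can_fetch("ExampleBot", url)  # hypothetical bot name

print("Googlebot allowed:", googlebot_ok)
print("ExampleBot allowed:", other_bot_ok)
```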

Add Crawl Delay

Another way to reduce search bot traffic on your site is by adding a crawl delay rule to robots.txt. Crawl-delay sets an amount of time (in seconds) a bot is asked to wait before crawling the next page. In the example below, I’ve added a crawl delay of one minute (60 seconds):

 
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Crawl-delay: 60

Note that bots may not obey your crawl delay. If your performance continues to be affected by these bots, consult the service’s documentation or contact support (for good bots), or implement a firewall (for bad bots).
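
From the bot’s side, honoring this rule means reading the value and pausing between requests. As a rough sketch, a well-behaved crawler written in Python could read it with `urllib.robotparser`. The `ExampleBot` name and the `pause_between_requests` helper are hypothetical, and the one-second fallback is an arbitrary choice for this example.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Crawl-delay: 60
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def pause_between_requests(robot_parser, user_agent, fallback_seconds=1):
    """Return how many seconds a polite bot should wait between page requests."""
    delay = robot_parser.crawl_delay(user_agent)  # None if no Crawl-delay applies
    return delay if delay is not None else fallback_seconds


wait_seconds = pause_between_requests(parser, "ExampleBot")
print("Wait", wait_seconds, "seconds between requests")
```

A crawler would then call something like `time.sleep(wait_seconds)` between page requests.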

Point bots in the right direction.

As your website scales and you increase your number of pages, robots.txt becomes a valuable asset to your search engine optimization. It can steer search engine bots toward your most valuable content, resulting in higher organic traffic and potentially faster load times.

Plus, robots.txt is easy to understand, so any WordPress admin can make a couple of basic tweaks, sit back, and enjoy the results (the search results, that is).

Originally published Nov 23, 2021 7:00:00 AM, updated November 23 2021