How to fix: Google Blocked by Robots.txt

If you get the following errors:

  • Sitemap.xml: Couldn’t fetch
  • Sitemap could not be read
  • Indexing request rejected: During live testing, indexing issues were detected with the URL.
  • URL is not available to Google: it cannot be indexed. Availability Blocked by robots.txt.
  • …

And you receive a notification from Google Search Console:

To the owner of Your domain:

Search Console has identified that your site is affected by xxx Coverage issues:

Top Errors

Errors can prevent your page or feature from appearing in Search results. The following errors were found on your site:

Submitted URL blocked by robots.txt

Top Warnings

Warnings are suggestions for improvement. Some warnings can affect your appearance on Search; some might be reclassified as errors in the future. The following warnings were found on your site:

Indexed, though blocked by robots.txt

We recommend that you fix these issues when possible to enable the best experience and coverage in Google Search.

Fix Coverage issues

The cause of the error:

These are “examples” because the table might not include all instances on your site. Instances can be omitted for various reasons, including:

  • The table row limit (1,000 items) has been exceeded
  • An instance occurred after the last crawl

So here is how to fix “Google Blocked by Robots.txt”.

Google Search Console Reports: Crawl Blocked by Robots.txt

If you notice your web traffic decrease substantially in a short period of time, you should see what Google thinks:

  1. surf to Google Search Console (formerly named Google Webmaster Tools)
  2. click URL INSPECTION
  3. in the INSPECT ANY URL bar, type in your root domain

If you see the following, you have a serious problem:

Indexed, though blocked by robots.txt
Crawled as Googlebot desktop
Crawl allowed? No: blocked by robots.txt
Page fetch Failed: Blocked by robots.txt

HOW TO CHECK YOUR ROBOTS.TXT FILE?

A robots.txt file is a plain text file in the root of your site that tells robots (i.e. Google’s search bot) what it should be looking at and what it should not be looking at. In fact, most sites do not need a robots.txt file anymore because:

  1. Robots.txt is only a SUGGESTION to bots.
    • Malicious bots will ignore it
  2. Google, Yahoo, Microsoft and other bots already know what to index and what to avoid on most websites
    • For instance, GoogleBot is smart enough to ignore WordPress readme files and the WP-ADMIN folder by default without Robots.txt telling it to skip them

If you want to see your Robots.txt file from your browser right now, just surf to <your domain>/robots.txt.  For instance, to see this site’s robots file, just surf to https://vinadomain.vn/robots.txt.  As you can see, it is wide open for everyone and every bot to read… or ignore.  You can also check how Google reads it with the robots.txt testing tool: https://www.google.com/webmasters/tools/robots-testing-tool?hl=en&siteUrl=https://vinadomain.vn/
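
For reference, a minimal, permissive robots.txt of the kind a stock WordPress site serves looks like the sketch below. The Sitemap line is optional, and the path shown is just an assumed example, not necessarily this site’s real sitemap location:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
# the Sitemap path below is an assumed example
Sitemap: https://vinadomain.vn/sitemap.xml

The Allow line is the exception WordPress itself ships so that crawlers can still fetch admin-ajax.php, which some themes use when rendering the front end.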

If you see something like:

Disallow: /wp-admin/

your Robots.txt file is telling GoogleBot it may crawl the entire site EXCEPT the items in the WP-ADMIN folder.

If you see something like:

Disallow: /

the lone slash disallows the root, which means your entire site: compliant bots will stop crawling your pages and your traffic will likely grind to a halt.  You need to correct this immediately.  (A bare “Disallow:” with nothing after it is the opposite: it blocks nothing at all.)
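
If you cannot simply delete the file, replacing its contents with an explicitly permissive sketch like the following will unblock everything (add specific Disallow paths back later only if you actually need them):

# allow all compliant bots to crawl everything
User-agent: *
Disallow:

Once the live file has changed, re-run URL INSPECTION in Google Search Console and confirm the “Crawl allowed?” field no longer says blocked by robots.txt.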

HOW TO EDIT A ROBOTS.TXT FILE:

Robots.txt is just a plain text file and can be easily edited through your web hosting company’s file manager, or with an FTP program like FileZilla that lets you view (and edit!) the files on the server hosting your site.

If you are not sure how to do this, just call your web hosting company and they will walk you through the easy steps in just a few minutes.

I MODIFIED MY ROBOTS.TXT FILE BUT IT IS NOT SHOWING THE CHANGES

If you delete your robots.txt file (a good first step) or modify it, but find that when you surf to <your domain>/robots.txt it has not updated, you have a problem.

It is possible that the file is just cached, so you should clear your browser cache, clear your website’s cache (e.g. you may be using a performance accelerator like WP Super Cache) and possibly even your Content Delivery Network’s cache (e.g. we use Cloudflare to replicate our site globally and provide additional security, but most sites don’t use a CDN).

If caching is not your problem, your .htaccess file is likely redirecting requests for robots.txt to a different location, and that means you have probably been hacked.
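
What does that look like? A purely hypothetical example (the rule and file name below are invented for illustration, not taken from a real attack):

# hypothetical hijack: serve the attacker's file for every robots.txt request
RewriteEngine On
RewriteRule ^robots\.txt$ /wp-content/uploads/fake-robots.txt [L]

With a rule like that in place, every request for <your domain>/robots.txt silently returns the attacker’s file, which is why the edits you make to the real file never show up in your browser.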

MOST LIKELY YOU HAVE BEEN HACKED; NOW WHAT?

WHAT IS AN HTACCESS FILE?

If your website is like most, it will be hosted on a Linux server running the Apache web server. The .HTACCESS file contains your site’s core configuration.  We could explain more, but if you really care, this article from WordPress explains htaccess very well.

When someone goes to your site, before Apache does anything else, it reads your .HTaccess file, and that file is the likely target of the hack.

HOW TO VIEW AND EDIT MY HTACCESS FILE

Your .HTaccess file can be easily edited through your web hosting company’s file manager, or with an FTP program like FileZilla that lets you view (and edit!) the files on the server hosting your site.

If you are not sure how to do this, just call your web hosting company and they will walk you through the easy steps in just a few minutes.

HOW TO TELL IF YOUR .HTACCESS FILE IS HACKED

If you open your .HTACCESS and find “Rewrite” instructions like the following, you are likely hacked:

RewriteEngine On

RewriteBase /
RewriteCond %{HTTP_USER_AGENT} (google|yahoo|msn|aol|bing) [OR]
RewriteCond %{HTTP_REFERER} (google|yahoo|msn|aol|bing)
RewriteCond %{HTTP_HOST} vinadomain\.vn$
RewriteRule . check-caveat.php [L,S=10000]

This is almost plain English.  You can see that any visitor who arrives from (or identifies as) Google, Yahoo, MSN, AOL or Bing is being rewritten to check-caveat.php instead of the page they asked for, and that is bad.
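
For comparison, on a typical single-site install the stock .HTACCESS that WordPress writes contains only the block below (this is the standard WordPress block, not our exact file); conditions keyed to search-engine user agents or referers, like the ones above, have no business being in it:

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress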

(Screenshot: the GoDaddy File Manager used to restore files)

The simple thing to do is just delete those instructions from your .HTACCESS, but that file contains a lot of cryptic commands that most people will not want to risk playing with, so the easier thing is to replace the file with a backed-up version.  If you don’t have a backup of that file, your web host probably does.

In our case we have plenty of backups, but since we are hosted with GoDaddy, we just used their File Manager to restore an .HTACCESS file from a few days before we believe we were hacked.

We then resubmitted to Google via the GOOGLE SEARCH CONSOLE and, bingo, our traffic returned and we were happy.

Bonus: after the fix, you can double-check with Google’s Rich Results Test: https://search.google.com/test/rich-results

The tool fetches the page and runs the test. If you see a green tick, everything is okay: “Page is eligible for rich results. All structured data on the page can generate rich results.”

Eventually, you will also receive a notification from Google:

To the owner of Your Domain:

Google has started validating your fix of Coverage issues on your site. Specifically, we are checking for ‘Submitted URL blocked by robots.txt’, which currently affects xxx pages.

Validation can take a few days; we will send you a message when the process is complete. You can monitor the progress of the test by following the link below.
