Robots.txt checks

How DomainCare monitors robots.txt for missing files, invalid directives, and content changes on your domain.

Last updated 7/12/2026 · 3 min read


DomainCare fetches robots.txt on every check cycle, verifies the file is reachable and structurally valid, and detects when its content changes. A missing or invalid robots.txt can cause search engines to misinterpret crawl permissions for your domain.

What it monitors

Existence — whether https://<domain>/robots.txt returns a 2xx response
HTTP status — any non-2xx response other than 404 is flagged as an error (a 404 is reported as a missing file)
User-agent directive presence — a file with no User-agent: line is considered invalid
Content changes — the full file body is compared against the previous successful fetch; any difference triggers an info alert
File size — recorded in bytes for each run
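The body-level checks in the list above can be sketched as a single pass over each fetched file. This is a minimal Python illustration under stated assumptions, not DomainCare's implementation: `inspect_body` and `BodyReport` are hypothetical names, comments are assumed not to count toward the `User-agent:` requirement, and the first fetch is assumed to have no previous version to compare against. The sketch compares the raw bytes of the current body with the last successful one and builds a unified diff with the standard-library `difflib` module. That diff is the kind you would review after a `robots_content_changed` alert.

```python
import difflib
from dataclasses import dataclass
from typing import Optional


@dataclass
class BodyReport:
    size_bytes: int       # file size recorded for this run
    has_user_agent: bool  # False means the file counts as invalid
    changed: bool         # True if the body differs from the previous successful fetch
    diff: str             # unified diff for review; empty when unchanged


def has_user_agent_line(text: str) -> bool:
    """Return True if any line, ignoring comments, is a `User-agent:` field."""
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]  # assumption: commented-out lines never count
        field, sep, _value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":  # field names are case-insensitive
            return True
    return False


def inspect_body(body: bytes, previous: Optional[bytes]) -> BodyReport:
    """Hypothetical per-run body check; `previous` is None on the first fetch."""
    text = body.decode("utf-8", errors="replace")
    changed = False
    diff = ""
    if previous is not None and body != previous:  # compare the full body
        changed = True
        old = previous.decode("utf-8", errors="replace")
        diff = "".join(
            difflib.unified_diff(
                old.splitlines(keepends=True),
                text.splitlines(keepends=True),
                fromfile="previous/robots.txt",
                tofile="current/robots.txt",
            )
        )
    return BodyReport(len(body), has_user_agent_line(text), changed, diff)
```

Comparing full bodies rather than parsed rules means even a whitespace or comment edit counts as a change, which matches the "any difference" behaviour described above.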

How often it runs

The robots.txt check runs every 12 hours (43,200 seconds) by default. Paid plans can override this per domain via per-check controls. Redirects are followed automatically (up to the platform redirect limit) so that http:// to https:// redirects do not generate false-positive failures.
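Redirect handling can be illustrated with a small, self-contained loop. The sketch below simulates server responses with a dictionary instead of making network requests, resolves relative `Location` values with `urllib.parse.urljoin`, and gives up after a fixed number of hops. `follow_redirects` is a hypothetical name, and `MAX_REDIRECTS = 5` is an assumed value; this page does not state the actual platform redirect limit.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical value: the real platform redirect limit is not documented here.
MAX_REDIRECTS = 5
REDIRECT_STATUSES = {301, 302, 303, 307, 308}

# Simulated server: URL -> (status code, Location header or None).
Responses = Dict[str, Tuple[int, Optional[str]]]


def follow_redirects(
    url: str, responses: Responses, max_redirects: int = MAX_REDIRECTS
) -> Tuple[str, int]:
    """Follow Location headers and return (final_url, final_status)."""
    hops = 0
    while True:
        status, location = responses[url]
        if status not in REDIRECT_STATUSES or location is None:
            return url, status  # not a redirect: this is the response the check evaluates
        if hops >= max_redirects:
            raise RuntimeError(f"gave up at {url} after {max_redirects} redirects")
        url = urljoin(url, location)  # Location may be relative
        hops += 1
```

With a plain `http://` to `https://` 301, the function returns the `https://` URL and its final status, so the check judges the file itself rather than failing on the redirect.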

A 404 response is treated as a distinct "file not found" failure rather than as a network error. Some frameworks omit robots.txt intentionally, so an absent file is a separate state from an unreachable server and is worth alerting on in its own right.
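One way to model the distinction between a 404 and other failures is a small classifier over the final HTTP status after redirects. This is an assumed sketch, not DomainCare's code: the `FetchState` names and the `classify` function are hypothetical, and `None` stands in for a fetch that produced no HTTP response at all, such as a DNS failure or a timeout.

```python
from enum import Enum
from typing import Optional


class FetchState(Enum):
    OK = "ok"                        # 2xx: go on to validate the body
    NOT_FOUND = "not_found"          # 404: the file is absent, reported as its own state
    HTTP_ERROR = "http_error"        # any other non-2xx, e.g. 403 or 500
    NETWORK_ERROR = "network_error"  # no HTTP response at all


def classify(status: Optional[int]) -> FetchState:
    """Map the final status (None = no response received) to a fetch state."""
    if status is None:
        return FetchState.NETWORK_ERROR
    if 200 <= status < 300:
        return FetchState.OK
    if status == 404:
        return FetchState.NOT_FOUND
    return FetchState.HTTP_ERROR
```

Keeping `NOT_FOUND` separate from `NETWORK_ERROR` lets an alert say "the file is gone" instead of "the server could not be reached", which call for different fixes.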

Alerts this check produces

| Event | Tone | When it fires |
| --- | --- | --- |
| `robots_missing` | Failure | The file returned 404 or another non-2xx status, or contains no `User-agent:` directive |
| `robots_content_changed` | Info | The file body differs from the last recorded version |
| `robots_recovered` | Recovery | The file is reachable and valid again after a `robots_missing` event |
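The three events form a small state machine. The sketch below is one hypothetical reading of the alert table, not DomainCare's documented logic. It assumes that `robots_missing` fires once when the check starts failing rather than on every failing run, that `robots_recovered` fires on the first valid fetch after that, and that content is compared only between successful, valid fetches. `CheckState`, `RunResult`, and `events_for` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunResult:
    ok: bool               # reachable (2xx) and contains a User-agent line
    body: Optional[bytes]  # body of this fetch when ok, else None


@dataclass
class CheckState:
    failing: bool = False              # True after robots_missing until recovery
    last_body: Optional[bytes] = None  # body from the last successful, valid fetch


def events_for(state: CheckState, result: RunResult) -> List[str]:
    """Update `state` in place and return the events this run would emit."""
    events: List[str] = []
    if not result.ok:
        if not state.failing:  # assumption: alert only on the transition to failing
            events.append("robots_missing")
        state.failing = True
        return events
    if state.failing:
        events.append("robots_recovered")
        state.failing = False
    if state.last_body is not None and result.body != state.last_body:
        events.append("robots_content_changed")
    state.last_body = result.body
    return events
```

Under these assumptions, a file that is wiped and later restored with different rules produces `robots_missing`, then `robots_recovered` together with `robots_content_changed` on the run that sees the new file.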

What to do when alerts fire

robots_missing — 404 response. Create a robots.txt file at your web root. A minimal valid file that allows all crawlers is:
```
User-agent: *
Allow: /
```
robots_missing — no User-agent directive. Your file exists but contains only comments, or Disallow lines with no User-agent line above them. Add at least one User-agent: line before any Allow or Disallow directives.
robots_missing — non-2xx HTTP status. Check your web server configuration for errors at the /robots.txt path. A 500 or 403 on this path will prevent crawlers from reading your rules.
robots_content_changed. Review the diff between the previous and current content. Changes may be intentional (a deployment added new Disallow rules) or accidental (a misconfigured deploy wiped the file or replaced it with a default template).
Wait for robots_recovered. After fixing the file, DomainCare will emit robots_recovered on the next successful check run (up to 12 hours later at the default interval).
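Before deploying a fix, you can sanity-check a robots.txt file locally with Python's standard-library `urllib.robotparser`. Its matching rules differ in detail from Google's and other crawlers', so treat it as a quick smoke test rather than an authoritative verdict; the `allowed` helper is a hypothetical convenience wrapper. The second sample shows why a file without a `User-agent:` line is flagged: the `Disallow` rule belongs to no group, so the parser ignores it and the "private" path stays fetchable.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text locally and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # no network access needed
    return parser.can_fetch(agent, url)


# The minimal file from the first remedy: every crawler may fetch everything.
minimal = "User-agent: *\nAllow: /\n"
# A Disallow rule with no User-agent line: it belongs to no group and is ignored.
no_group = "Disallow: /private/\n"

print(allowed(minimal, "Googlebot", "https://example.com/any/page"))
print(allowed(no_group, "Googlebot", "https://example.com/private/report"))
```

Both calls print `True`. Adding `User-agent: *` above the `Disallow` line makes the second URL return `False`, which is the behaviour the file's author most likely intended.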

Catch robots.txt regressions before Google does

DomainCare alerts you the moment your robots.txt goes missing or changes unexpectedly.
