Robots.txt checks
How DomainCare monitors robots.txt for missing files, invalid directives, and content changes on your domain.
DomainCare fetches robots.txt on every check cycle, verifies the file is reachable and structurally valid, and detects when its content changes. A missing or invalid robots.txt can cause search engines to misinterpret crawl permissions for your domain.
What it monitors
- Existence: whether `https://<domain>/robots.txt` returns a 2xx response
- HTTP status: any non-200, non-404 response is flagged as an error
- `User-agent` directive presence: a file with no `User-agent:` line is considered invalid
- Content changes: the full file body is compared against the previous successful fetch; any difference triggers an info alert
- File size: recorded in bytes for each run
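The checks in this list can be pictured as one function over the HTTP status and the response body. The following is a minimal sketch, not DomainCare's actual implementation: the names `RobotsResult` and `inspect_robots` are hypothetical, any 2xx status is assumed to count as success, and a SHA-256 fingerprint stands in for the full-body comparison.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Optional

# Matches a "User-agent:" directive at the start of a line, ignoring case and
# leading whitespace. Lines that start with "#" are comments and never match.
USER_AGENT_RE = re.compile(r"^\s*user-agent\s*:", re.IGNORECASE | re.MULTILINE)


@dataclass
class RobotsResult:
    exists: bool                 # True when the fetch returned a 2xx status
    valid: bool                  # True when the file exists and has a User-agent line
    error: bool                  # True for statuses that are neither 2xx nor 404
    size_bytes: int              # length of the body in bytes
    content_hash: Optional[str]  # fingerprint for change detection (assumption)


def inspect_robots(status: int, body: bytes) -> RobotsResult:
    """Hypothetical helper: classify one robots.txt fetch."""
    if not 200 <= status < 300:
        # 404 is a clean "file not found"; every other failure is an error.
        return RobotsResult(False, False, status != 404, 0, None)
    text = body.decode("utf-8", errors="replace")
    has_user_agent = USER_AGENT_RE.search(text) is not None
    digest = hashlib.sha256(body).hexdigest()
    return RobotsResult(True, has_user_agent, False, len(body), digest)
```

Comparing `content_hash` across two runs gives the same answer as comparing the bodies directly, while keeping the stored state small.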
How often it runs
The robots.txt check runs every 12 hours (43,200 seconds) by default. Pro and Business plans can override this per domain via per-check controls. Redirects are followed automatically (up to the platform redirect limit) so that http:// to https:// redirects do not generate false-positive failures.
A 404 response is treated as a clean "file not found" failure rather than a network error, because some frameworks intentionally omit robots.txt and that is a distinct state worth alerting on.
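The difference between "the server answered with 404" and "the request never completed" can be sketched with Python's standard library. This is an illustration under stated assumptions rather than DomainCare's fetcher: `fetch_robots` and its `opener` parameter are hypothetical, and `urllib` follows redirects using its own built-in limit, which stands in for the unspecified platform limit.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

CHECK_INTERVAL_SECONDS = 12 * 60 * 60  # 43,200 seconds, the documented default

# (HTTP status or None, body, network error message or None)
FetchResult = Tuple[Optional[int], bytes, Optional[str]]


def fetch_robots(
    url: str,
    timeout: float = 10.0,
    opener: Callable = urllib.request.urlopen,
) -> FetchResult:
    """Hypothetical helper: fetch robots.txt, following redirects."""
    try:
        # urlopen follows http:// to https:// redirects automatically.
        with opener(url, timeout=timeout) as resp:
            return resp.status, resp.read(), None
    except urllib.error.HTTPError as exc:
        # The server responded with a 4xx/5xx status, for example 404 or 500.
        return exc.code, b"", None
    except OSError as exc:
        # DNS failure, refused connection, timeout: no HTTP status exists.
        return None, b"", str(getattr(exc, "reason", exc))
```

`HTTPError` is caught first because it is a subclass of `URLError`, which is itself an `OSError`. A result with a status of 404 and no error message is the "file not found" state, while a result with no status is a network failure.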
Alerts this check produces
| Event | Tone | When it fires |
|---|---|---|
| `robots_missing` | Failure | The file returned 404, a non-2xx status, or has no `User-agent:` directive |
| `robots_content_changed` | Info | The file body differs from the last recorded version |
| `robots_recovered` | Recovery | The file is reachable and valid again after a `robots_missing` event |
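One way to read this table is as a small state machine over consecutive runs. The sketch below is illustrative only: `derive_events` is a hypothetical function, and it assumes that `robots_missing` fires once on the transition into a failing state rather than on every failing run, and that the first successful fetch records a baseline without raising a change alert. Neither detail is stated above.

```python
from typing import List, Optional


def derive_events(
    was_failing: bool,
    now_ok: bool,
    previous_body: Optional[str],
    current_body: Optional[str],
) -> List[str]:
    """Hypothetical helper: map two consecutive check results to event names."""
    events: List[str] = []
    if not now_ok:
        # Assumption: alert only when the check newly starts failing.
        if not was_failing:
            events.append("robots_missing")
        return events
    if was_failing:
        events.append("robots_recovered")
    # Compare against the last successful fetch; None means no baseline yet.
    if previous_body is not None and current_body != previous_body:
        events.append("robots_content_changed")
    return events
```

Under these assumptions a file that comes back with different content after an outage produces both `robots_recovered` and `robots_content_changed` in the same run.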
What to do when alerts fire
- `robots_missing` (404 response). Create a `robots.txt` file at your web root. A minimal valid file that allows all crawlers contains two lines: `User-agent: *` followed by `Allow: /`.
- `robots_missing` (no `User-agent` directive). Your file exists but contains only comments or `Disallow` lines without a corresponding `User-agent` header. Add at least one `User-agent:` line before any `Allow` or `Disallow` directives.
- `robots_missing` (non-2xx HTTP status). Check your web server configuration for errors at the `/robots.txt` path. A `500` or `403` on this path will prevent crawlers from reading your rules.
- `robots_content_changed`. Review the diff between the previous and current content. Changes may be intentional (a deployment added new `Disallow` rules) or accidental (a misconfigured deploy wiped the file or replaced it with a default template).
- Wait for `robots_recovered`. After fixing the file, DomainCare will emit `robots_recovered` on the next successful check run.
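If you keep your own copy of the previous file (for example from version control), the review step for `robots_content_changed` can be reproduced locally. This is a small sketch using Python's standard `difflib` module; the function name `robots_diff` and the file labels are hypothetical, and it makes no claim about how DomainCare renders its own diff.

```python
import difflib


def robots_diff(previous: str, current: str) -> str:
    """Hypothetical helper: unified diff between two robots.txt versions."""
    lines = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile="robots.txt (previous)",
        tofile="robots.txt (current)",
        lineterm="",  # inputs have no trailing newlines after splitlines()
    )
    return "\n".join(lines)


if __name__ == "__main__":
    before = "User-agent: *\nAllow: /\n"
    after = "User-agent: *\nDisallow: /\n"
    # Removed lines are prefixed with "-", added lines with "+".
    print(robots_diff(before, after))
```

An empty result means the two versions are identical line for line. A diff that replaces `Allow: /` with `Disallow: /` is the kind of accidental change worth catching quickly.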