The one-line robots.txt mistake that blocks your whole site

robots.txt is not authentication. It tells well-behaved crawlers what you prefer they skip. It also happens to be the fastest way to accidentally tell them to skip everything.

How crawlers read robots.txt

Well-behaved bots fetch `/robots.txt` before they crawl anything else on a host. The file is plain text: `User-agent` groups, `Allow` / `Disallow` path rules, `Sitemap` URLs, and an optional `Crawl-delay` that is non-standard and that Google ignores. It lives at the root of each host: `https://example.com/robots.txt`, not tucked under `/blog/`.

Rules are prefix matches on paths, not regular expressions. The major crawlers and RFC 9309 also recognize `*` as a wildcard and `$` as an end-of-path anchor, but nothing richer. `Disallow: /` means “do not fetch anything on this host.” That is correct for a staging mirror; it is catastrophic on production if you forgot to swap files during deploy.
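To make the prefix behavior concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `allowed` and both sample files are made up for illustration. This parser is one implementation among many: it does plain prefix matching with first-match ordering and no wildcard support, so treat it as a rough check rather than a model of any particular search engine.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The staging-style file that should never reach production.
DENY_ALL = """\
User-agent: *
Disallow: /
"""

# A scoped rule. Note there is no trailing slash on /admin.
SCOPED = """\
User-agent: *
Disallow: /admin
"""

print(allowed(DENY_ALL, "/"))              # False
print(allowed(DENY_ALL, "/pricing"))       # False: every path starts with "/"
print(allowed(SCOPED, "/pricing"))         # True
print(allowed(SCOPED, "/admin/users"))     # False
print(allowed(SCOPED, "/administrator"))   # False: "/admin" is a prefix of it
```

The last line is the usual surprise: without a trailing slash, `/admin` also covers `/administrator`. Writing `Disallow: /admin/` keeps the rule to that directory.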

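Google may still index a URL that is disallowed if other sites link to it: robots.txt blocks crawling, not necessarily listing. For sensitive content you need auth, `noindex`, or both, not robots alone. Keep in mind that a crawler has to fetch a page to see its `noindex`, so a URL that is also disallowed never gets that directive read.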

Mistakes we keep seeing

Copy-pasting a “block everything” template from an old project is the classic. So is leaving `Disallow: /api/` on a site where your entire app routes through `/api/` because of a framework quirk. Another favorite: two `User-agent: *` blocks with conflicting rules. RFC 9309 says crawlers should merge them, but not every parser does, so the outcome depends on who is reading the file.
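Duplicate groups are easy to catch before they ship. The following is a small lint sketch, not a full robots.txt parser; the function name `duplicate_agents` is hypothetical, and it only counts `User-agent` lines after stripping comments.

```python
from collections import Counter


def duplicate_agents(robots_txt: str) -> list[str]:
    """Return user-agent tokens that appear on more than one User-agent line."""
    counts: Counter[str] = Counter()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() == "user-agent":
            counts[value.lower()] += 1
    return sorted(agent for agent, n in counts.items() if n > 1)


sample = """\
User-agent: *
Disallow: /admin/

User-agent: *
Disallow: /
"""

print(duplicate_agents(sample))  # ['*']
```

Running a check like this in CI and failing the build on a non-empty result is one way to keep a pasted template from quietly overriding the rules you meant to ship.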

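Trailing spaces and wrong line endings rarely matter, but typos in `User-agent` names do: a group applies only to bots whose name matches it, so a misspelled `Googlebot` group is silently ignored. A blanket `*` group is what most people want for global defaults, with specific groups for individual bots wherever your generator puts them. Group order does not matter to a standards-following crawler, which picks the most specific group that matches its name. A bot that matches its own group does not also apply the `*` rules, so repeat any shared rules there.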
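Group selection is easier to see with a concrete file. This sketch again leans on Python's `urllib.robotparser` as a stand-in for a crawler; the helper `may_fetch`, the bot name `OtherBot`, and the sample rules are assumptions for illustration, and real crawlers may match names slightly differently.

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /
"""

# Same file with the bot name misspelled.
TYPO = ROBOTS.replace("Googlebot", "Googelbot")


def may_fetch(robots_txt: str, agent: str, path: str) -> bool:
    """Parse robots.txt text and ask whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


print(may_fetch(ROBOTS, "Googlebot", "/pricing"))   # True: its own group applies
print(may_fetch(ROBOTS, "Googlebot", "/drafts/x"))  # False
print(may_fetch(ROBOTS, "OtherBot", "/pricing"))    # False: falls back to *
print(may_fetch(TYPO, "Googlebot", "/pricing"))     # False: typo, so * applies
```

The last line shows the cost of the typo: the group written for one bot never matches it, and that bot inherits the deny-all default instead.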

Forgetting the `Sitemap:` line does not block indexing, but it slows discovery of new URLs. After a redesign, we always regenerate sitemap and robots together so Search Console stops guessing.

Build robots.txt deliberately

Start from intent: allow all public marketing pages, disallow admin paths, staging hosts, and raw export endpoints. Write that down before touching syntax.
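One way to keep intent and syntax separate is to hold the intent as data and render the file from it. The sketch below is a hypothetical helper, not the output format of any particular generator; `build_robots`, its parameters, and the example paths are invented for illustration. It refuses to emit a deny-all unless asked explicitly.

```python
def build_robots(
    disallow: list[str],
    sitemaps: list[str],
    agent: str = "*",
    allow_deny_all: bool = False,
) -> str:
    """Render a single-group robots.txt from a list of blocked path prefixes."""
    if "/" in disallow and not allow_deny_all:
        raise ValueError("'Disallow: /' blocks the whole host; pass allow_deny_all=True")
    lines = [f"User-agent: {agent}"]
    if disallow:
        lines += [f"Disallow: {path}" for path in disallow]
    else:
        lines.append("Disallow:")  # an empty value blocks nothing
    lines.append("")  # blank line before the Sitemap entries
    lines += [f"Sitemap: {url}" for url in sitemaps]
    return "\n".join(lines) + "\n"


print(build_robots(["/admin/", "/export/"], ["https://example.com/sitemap.xml"]))
```

Because the output is deterministic, the rendered file diffs cleanly in git, and the staging variant has to opt in with `allow_deny_all=True` rather than appear by accident.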

The Robots.txt Generator on DroidXP outputs standards-style groups with presets (allow all, disallow all, common private paths) plus custom lines. Everything runs locally — paste into your deploy artifact, diff in git, ship.

After deploy, verify with Search Console’s robots.txt report (which replaced the older robots.txt Tester) and a real `curl https://yoursite.com/robots.txt`. We have caught CDN caches serving an old deny-all file days after the repo was fixed.
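The same check can run as a post-deploy step. This is a rough sketch under stated assumptions: `blocks_everything` and `fetch_robots` are hypothetical names, the check only asks whether the homepage is fetchable for a given agent, and Python's parser does not understand `*` or `$` patterns.

```python
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser


def blocks_everything(robots_txt: str, agent: str = "*") -> bool:
    """Return True if `agent` is not allowed to fetch the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


def fetch_robots(url: str, timeout: float = 10.0) -> str:
    """Download the live robots.txt as text (makes a network request)."""
    request = Request(url, headers={"User-Agent": "robots-check/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example wiring for a deploy script (not executed here):
#   live = fetch_robots("https://yoursite.com/robots.txt")
#   if blocks_everything(live):
#       raise SystemExit("live robots.txt is deny-all")
```

Fetching the live URL rather than reading the repo copy is the point: it tests what the CDN is actually serving, which is where the stale deny-all tends to hide.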

robots.txt in a larger SEO habit

Pair robots with an XML sitemap and sensible canonical tags. Robots tells crawlers where not to spend budget; sitemaps highlight what you want discovered. They solve different problems.

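When you migrate domains, update robots and sitemap on both hosts during the redirect window, and keep the old host crawlable: a crawler that is told not to fetch the old URLs never sees their 301s. Old host deny-all plus forgotten 301s is a recipe for a quiet quarter.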

Treat robots.txt like firewall config: small file, high impact, deserves a checklist on every launch.