Robots.txt controls crawler access, not whether a sensitive page should appear in search

Treating robots.txt as a privacy or deindexing tool creates a false sense of protection. A blocked crawler cannot fetch the page, but that is not the same as making the page private or reliably removing it from search: the URL can still be indexed if other pages link to it, and anyone who has the link can still open it.

Google Search Central explains that robots.txt tells crawlers which URLs they can access and says it is mainly used to avoid overloading a site with requests. The same guidance warns that robots.txt is not a mechanism for keeping a web page out of Google, and points to noindex or password protection for pages that should not appear.
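The distinction can be seen with Python's standard-library robots.txt parser. This is a minimal sketch, assuming a made-up rule set, a made-up crawler name (ExampleBot), and example.com URLs; none of these come from the guidance itself. The only question the parser can answer is whether a crawler may fetch a URL. It has nothing to say about whether that URL is indexed or private.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /internal/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# ExampleBot and the example.com URLs are assumptions, not real endpoints.
blocked_url = "https://example.com/internal/report.html"
open_url = "https://example.com/blog/post.html"

may_fetch_blocked = parser.can_fetch("ExampleBot", blocked_url)
may_fetch_open = parser.can_fetch("ExampleBot", open_url)

# The answer is about fetch permission only. The parser exposes no notion of
# "indexed" or "private", because robots.txt does not carry that information.
print(f"{blocked_url}: may fetch = {may_fetch_blocked}")
print(f"{open_url}: may fetch = {may_fetch_open}")
```

A well-behaved crawler would skip the first URL, but the page itself stays publicly reachable. Keeping it out of search calls for noindex or authentication, as the guidance says.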

A source trail should therefore record the intent of each rule. Is the rule reducing crawl load for faceted URLs? Blocking low-value generated pages? Keeping staging paths away from crawlers? Or trying to hide something that should instead be noindexed, authenticated, removed, or never published? These are different decisions even if they all fit into a robots.txt file.

A practical robots note includes path pattern, crawler group, reason, expected indexability state, owner, review date, and safer alternative when the intent is privacy. The test should be plain: if the page must not be seen, robots.txt is the wrong primary control.
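One way to keep such notes consistent is to store them as structured records and run the plain test automatically. The sketch below is an assumption about how a team might do this, not an established format: the RobotsNote class, the Intent values, the review function, and the sample entries are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class Intent(Enum):
    """Why the rule exists (hypothetical categories taken from the questions above)."""
    CRAWL_LOAD = "reduce crawl load"
    LOW_VALUE = "block low-value generated pages"
    STAGING = "keep staging paths away from crawlers"
    PRIVACY = "hide content from view"


@dataclass
class RobotsNote:
    path_pattern: str
    crawler_group: str
    intent: Intent
    reason: str
    expected_indexability: str
    owner: str
    review_date: date
    safer_alternative: Optional[str] = None


def review(note: RobotsNote) -> List[str]:
    """Return a list of problems; an empty list means the note passes."""
    problems = []
    # The plain test: if the page must not be seen, robots.txt is the wrong
    # primary control, so a privacy rule needs a safer alternative on record.
    if note.intent is Intent.PRIVACY and not note.safer_alternative:
        problems.append(
            f"{note.path_pattern}: privacy intent with no safer alternative "
            "(noindex, authentication, removal)"
        )
    return problems


# Sample entries with made-up paths, owners, and dates.
facets = RobotsNote(
    path_pattern="/search?filter=",
    crawler_group="*",
    intent=Intent.CRAWL_LOAD,
    reason="Faceted URLs multiply crawl requests",
    expected_indexability="not crawled; may still be indexed if linked",
    owner="seo-team",
    review_date=date(2025, 1, 15),
)
reports = RobotsNote(
    path_pattern="/internal/",
    crawler_group="*",
    intent=Intent.PRIVACY,
    reason="Reports should not be visible",
    expected_indexability="must not appear in search",
    owner="platform-team",
    review_date=date(2025, 1, 15),
)

facets_problems = review(facets)
reports_problems = review(reports)
for problem in facets_problems + reports_problems:
    print(problem)
```

Only the second note is flagged: the rule's intent is privacy, yet no stronger control is recorded alongside it.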
