Robots Directives Reference

Every robots.txt and meta-robots directive with scope and search-engine support.

Searchable reference for robots.txt rules and HTML meta-robots / X-Robots-Tag directives, with per-crawler support notes for Google, Bing, and other engines. Understand crawl vs. index control.

What is the difference between robots.txt and meta robots?

robots.txt controls crawling — whether a bot may request a URL at all. The meta-robots tag and X-Robots-Tag header control indexing and link following after the page is fetched. A page blocked in robots.txt can still be indexed by URL, so use noindex (not Disallow) to keep a page out of the index.

Robots directives are how a site tells crawlers what they may fetch and what may appear in a search index. They are spread across three surfaces (the robots.txt file, the HTML <meta name="robots"> tag, and the X-Robots-Tag HTTP response header), and not every directive is supported everywhere. Strictly speaking, the Robots Exclusion Protocol as specified in RFC 9309 covers only robots.txt; the meta tag and the header are separate conventions adopted by search engines. This reference lists each directive, what it does, and which engines honour it.

How it works

Crawling and indexing are two separate stages. robots.txt is consulted before a URL is fetched, so it governs crawling only. Directives like noindex and nofollow are read after the page is fetched, from the meta tag or the X-Robots-Tag header, so they govern what happens to the content once a bot already has it.

A critical consequence: a page that is Disallow-ed in robots.txt is never fetched, so the bot never sees a noindex tag inside it. Such a page can still appear in results as a bare URL, typically because other pages link to it. To reliably remove a page from an index, leave it crawlable and serve a noindex directive.
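The two stages can be sketched with Python's standard urllib.robotparser module. This is a minimal illustration, not how any particular search engine is implemented; the robots.txt content, the ExampleBot user-agent, the example.com URLs, and the helper names can_crawl and indexing_outcome are all assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Stage 1 (crawling): may this user-agent request the URL at all?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def indexing_outcome(robots_txt: str, user_agent: str, url: str,
                     page_has_noindex: bool) -> str:
    """Stage 2 (indexing) only happens if stage 1 allowed the fetch."""
    if not can_crawl(robots_txt, user_agent, url):
        # The page body and headers are never read, so noindex is invisible.
        return "not fetched: noindex unseen, URL may still be listed"
    if page_has_noindex:
        return "fetched: noindex seen, page kept out of the index"
    return "fetched: page eligible for indexing"


blocked_url = "https://example.com/private/report.html"
open_url = "https://example.com/public/report.html"

print(indexing_outcome(ROBOTS_TXT, "ExampleBot", blocked_url, True))
print(indexing_outcome(ROBOTS_TXT, "ExampleBot", open_url, True))
```

The first call reports that the page was not fetched even though it carries noindex, which is the trap described above; the second shows the directive taking effect because the page stays crawlable.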

Major engines extended the original 1994 convention with pattern matching: * for any character sequence and $ to anchor the URL end. Both are honoured by Google and Bing. They were absent from the 1994 document but are now standardised: RFC 9309 (2022) lists * and $ as special characters that compliant crawlers must support.
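As a rough sketch of how these two characters behave, a rule can be translated into a regular expression. The helpers rule_to_regex and rule_matches are hypothetical names for this example, and the code deliberately ignores other details a real crawler handles, such as percent-encoding and choosing between competing Allow and Disallow rules.

```python
import re


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regular expression.

    '*' matches any sequence of characters; a trailing '$' anchors the
    match to the end of the URL. Everything else is treated literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    """Rules match from the start of the path; without '$' they are prefixes."""
    return rule_to_regex(pattern).match(path) is not None


print(rule_matches("/*.pdf$", "/docs/file.pdf"))      # ends in .pdf
print(rule_matches("/*.pdf$", "/docs/file.pdf?x=1"))  # does not end in .pdf
print(rule_matches("/fish", "/fishing"))              # plain prefix match
print(rule_matches("/fish$", "/fishing"))             # '$' requires exact end
```

With these inputs the first and third calls match and the second and fourth do not, which shows why $ is needed when a rule should apply to one exact URL rather than to everything sharing a prefix.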

Tips and examples

  • To keep a page out of search, use noindex (meta or header), not Disallow.
  • Disallow: with an empty value allows everything; Disallow: / blocks the whole site for that user-agent.
  • Combine directives in one tag: <meta name="robots" content="noindex, nofollow">.
  • Use X-Robots-Tag for non-HTML files (PDFs, images) where you cannot add a meta tag.
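  • Always test changes before deploying, for example with the robots.txt report in Google Search Console (which replaced the older robots.txt tester) or by running the file through a parser. A stray Disallow: / blocks crawling of the entire site for the matching user-agents, and pages can then drop out of results.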
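To see how the meta tag and the X-Robots-Tag header carry the same directives, here is a small sketch that collects them from both places using only the standard library. The names MetaRobotsParser, split_directives, and page_directives are hypothetical, and the code makes simplifying assumptions: it reads only name="robots" (not crawler-specific names such as googlebot) and does not handle user-agent prefixes inside the header value.

```python
from html.parser import HTMLParser


def split_directives(value: str) -> set:
    """Split a comma-separated directive list into a lowercase set."""
    return {d.strip().lower() for d in value.split(",") if d.strip()}


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives |= split_directives(attr.get("content", ""))


def page_directives(html: str, headers: dict) -> set:
    """Union of directives from the HTML meta tag and the response header."""
    parser = MetaRobotsParser()
    parser.feed(html)
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return parser.directives | split_directives(lowered.get("x-robots-tag", ""))


html_page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(page_directives(html_page, {})))

# A PDF has no HTML to hold a meta tag, so the header is the only option.
print(sorted(page_directives("", {"X-Robots-Tag": "noindex"})))
```

The second call models the non-HTML case from the list above: with an empty body, the only source of directives is the response header.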