What is the difference between robots.txt and meta robots?

robots.txt controls crawling: whether a bot may request a URL at all. The meta-robots tag and X-Robots-Tag header control indexing and link following after the page is fetched. A page blocked in robots.txt can still be indexed as a bare URL (for example when other pages link to it), so use noindex (not Disallow) to keep a page out of the index.

Does noindex work in robots.txt?

No. Google dropped support for an unofficial Noindex directive in robots.txt in 2019. Use a meta-robots noindex tag or an X-Robots-Tag: noindex HTTP header instead, and make sure the page is crawlable so the directive can be read.

Do all search engines support crawl-delay?

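Bing and Yandex honour the Crawl-delay directive in robots.txt; Google ignores it. Google also retired the crawl-rate limiter in Search Console in early 2024, so Googlebot now adjusts its rate automatically based on how the server responds. Always check the specific engine's current documentation before relying on Crawl-delay.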
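For a crawler you write yourself, reading the directive might look like the sketch below. It uses Python's standard urllib.robotparser; the robots.txt content and the bot name "ExampleBot" are illustrative assumptions, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: only the bingbot group sets a delay.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow:
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


bing_delay = crawl_delay_for(ROBOTS_TXT, "bingbot")      # group with a delay
other_delay = crawl_delay_for(ROBOTS_TXT, "ExampleBot")  # falls back to the * group
```

A value of None means no delay was declared for that crawler, which is different from a declared delay of zero.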

What does the wildcard * mean in robots.txt?

In User-agent lines, * matches all crawlers. In path patterns (absent from the original 1994 standard, later formalised in RFC 9309), * matches any sequence of characters and $ anchors the end of the URL, so Disallow: /*.pdf$ blocks every URL that ends in .pdf.

Can I block one bot but allow others?

Yes. Each User-agent block applies only to the matching crawler. A bot uses the most specific matching User-agent group, so a named group for Googlebot overrides the catch-all * group for that bot.
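As a rough illustration of group selection, the sketch below feeds a hypothetical robots.txt to Python's standard urllib.robotparser. The bot names "BadBot" and "ExampleBot" and the example.com URL are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: everyone may crawl, except a bot calling itself "BadBot".
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/articles/1"
badbot_allowed = parser.can_fetch("BadBot", url)     # the named group applies
other_allowed = parser.can_fetch("ExampleBot", url)  # falls back to the * group
```

Only one group is applied to each bot: the BadBot group replaces the * group for that crawler rather than being merged with it.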

What is the Robots Directives Reference?

A searchable reference for robots.txt rules and HTML meta-robots / X-Robots-Tag directives, with per-crawler support notes for Google, Bing, and other engines. It is organised around the difference between crawl control and index control. It runs free in your browser on Gera Tools, with nothing uploaded.

Robots Directives Reference

Name: Robots Directives Reference
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Robots directives are how a site tells crawlers what they may fetch and what may appear in a search index. They are spread across three surfaces (the robots.txt file, the HTML <meta name="robots"> tag, and the X-Robots-Tag HTTP response header), and not every directive is supported everywhere. Strictly, the Robots Exclusion Protocol (RFC 9309) defines only robots.txt; the meta tag and the header are conventions documented by the individual engines. This reference lists each directive, what it does, and which engines honour it.

How it works

Crawling and indexing are two separate stages. robots.txt is consulted before a URL is fetched, so it governs crawling only. Directives like noindex and nofollow are read after the page is fetched, from the meta tag or the X-Robots-Tag header, so they govern what happens to the content once a bot already has it.

A critical consequence: a page that is Disallow-ed in robots.txt is never fetched, so the bot never sees a noindex tag inside it. Such a page can still appear in results as a bare URL. To reliably remove a page from an index, leave it crawlable and serve a noindex directive.
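The interaction between the two stages can be written out as a small decision function. This is a simplified model for illustration, not a description of any engine's actual pipeline; the function name and the outcome strings are invented here.

```python
def index_outcome(crawl_allowed, noindex_served):
    """Simplified model of how a compliant engine treats a single URL."""
    if not crawl_allowed:
        # The page is never fetched, so any noindex on it goes unread.
        return "not crawled; may still be listed as a bare URL"
    if noindex_served:
        return "crawled; kept out of the index"
    return "crawled; eligible for indexing"


# Disallow plus noindex: the noindex has no effect because it is never seen.
blocked_with_noindex = index_outcome(crawl_allowed=False, noindex_served=True)
crawlable_with_noindex = index_outcome(crawl_allowed=True, noindex_served=True)
```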

Major engines extended the original 1994 standard with pattern matching: * for any character sequence and $ to anchor the URL end. Google and Bing honour both, and both were formalised in RFC 9309 (2022), although simpler parsers may still treat them as literal characters.
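One way to picture the two pattern characters is to translate a rule path into a regular expression. The sketch below is a minimal illustration under that assumption; the helper names are hypothetical, and real crawlers add further steps (percent-encoding normalisation, longest-match precedence between Allow and Disallow) that are left out here.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")  # $ is only special at the very end
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk, then let * stand for "any characters".
    body = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def rule_matches(pattern, path):
    """Return True if the URL path (plus any query string) matches the rule."""
    return pattern_to_regex(pattern).match(path) is not None


pdf_hit = rule_matches("/*.pdf$", "/docs/report.pdf")
pdf_with_query = rule_matches("/*.pdf$", "/docs/report.pdf?download=1")
prefix_hit = rule_matches("/private/", "/private/notes.html")
```

Without the trailing $, a rule is a prefix match, which is why /*.pdf would also match the URL with the query string while /*.pdf$ does not.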

The three surfaces and when to use each

robots.txt

Use for crawl-budget management and to keep bots away from entire directory trees (staging areas, internal search results, account dashboards). It is the right tool when you want to stop a crawler from fetching the content at all, for example to reduce server load or to keep parameterised query-string URLs from being crawled. It is advisory only: it does not restrict access, so it is no substitute for authentication on private areas.

meta robots tag

Use for indexing control on individual HTML pages. A <meta name="robots" content="noindex"> tag tells any crawler that fetched the page not to include it in the index. Because the tag lives in the HTML, it is per-page and requires the bot to fetch the page first.
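To check which directives a page actually serves, the tag can be read back out of the HTML. The sketch below uses Python's standard html.parser; the class and function names are made up for the example, and it only looks at the generic name="robots" tag, not crawler-specific variants such as name="googlebot".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from every <meta name="robots"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated and case-insensitive.
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def meta_robots_directives(html):
    """Return the set of meta-robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
found = meta_robots_directives(page)
```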

X-Robots-Tag HTTP header

Use for non-HTML assets — PDFs, images, documents — where you cannot add an HTML meta tag. The header is served with the file and instructs bots in the same way the meta tag does for HTML.
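How the header gets attached depends on the server. As one possible sketch, the minimal WSGI application below adds it to responses for PDF paths only. The function names, the file-extension rule, and the empty response body are assumptions for illustration; in practice the same effect is usually achieved in the web server or CDN configuration.

```python
def x_robots_headers(path):
    """Return extra response headers for a request path (hypothetical rule)."""
    if path.lower().endswith(".pdf"):
        return [("X-Robots-Tag", "noindex")]
    return []


def app(environ, start_response):
    """Minimal WSGI app that serves an empty body with the chosen headers."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "application/octet-stream")]
    headers.extend(x_robots_headers(path))
    start_response("200 OK", headers)
    return [b""]
```

Because the directive travels in the HTTP response rather than in the file, the PDF itself does not need to be modified.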

Common directive combinations

Goal	Correct approach
Keep a page out of search results	`noindex` in meta tag or X-Robots-Tag; keep the page crawlable
Stop crawlers fetching a folder	`Disallow: /folder/` in robots.txt
Stop crawlers following any link on a page	`nofollow` in meta tag or X-Robots-Tag
Prevent a PDF from being indexed	`X-Robots-Tag: noindex` in the HTTP response
Keep a page indexable but exclude its images	`noimageindex` in meta tag

Tips and examples

To keep a page out of search, use noindex (meta or header), not Disallow.
Disallow: with an empty value allows everything; Disallow: / blocks the whole site for that user-agent.
Combine directives in one tag: <meta name="robots" content="noindex, nofollow">.
Use X-Robots-Tag for non-HTML files (PDFs, images) where you cannot add a meta tag.
Google dropped support for an unofficial noindex directive in robots.txt in 2019. Using it there no longer works.
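Always test changes before deploying. Google retired the standalone robots.txt tester in Search Console in 2023; the robots.txt report that replaced it shows how the live file was fetched and parsed, so check proposed rules with a parser beforehand. A stray Disallow: / blocks crawling of the entire site, and search visibility can degrade quickly as a result.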
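A pre-deployment check along the lines of the last tip could be sketched as follows, using Python's standard urllib.robotparser. The function name, the default user agent, and the example.com URLs are assumptions for illustration. urllib.robotparser does plain prefix matching and is not expected to interpret the * and $ pattern characters, so treat the result as a rough safety net rather than a model of any specific engine.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the proposed robots.txt would block for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical pages that must stay crawlable.
MUST_CRAWL = ["https://example.com/", "https://example.com/products/widget"]

# An empty Disallow allows everything; Disallow: / blocks the whole site.
safe = blocked_urls("User-agent: *\nDisallow:\n", MUST_CRAWL)
broken = blocked_urls("User-agent: *\nDisallow: /\n", MUST_CRAWL)
```

Failing a deployment whenever the returned list is non-empty catches the stray Disallow: / case before it reaches production.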