Paste your robots.txt and a list of URLs to instantly see which paths are blocked or allowed for any user-agent. Uses Google's longest-match precedence rule with full wildcard (*) and end-of-string ($) support — same as RFC 9309. Runs 100% in your browser.
Real crawlers do not just read robots.txt top-to-bottom. They follow a 3-stage algorithm defined by RFC 9309 (the standardized version of Google's original robots.txt spec):
Crawlers pick the User-agent group whose token most specifically matches their own. Googlebot-Image selects a Googlebot-Image group if one exists, then falls back to Googlebot, then to *. Once a group is selected, ALL other groups are ignored.
Every Allow/Disallow line in the selected group is compared against the URL path. Among the lines that match, the one whose path string is longest (most specific) wins. This is why Allow: /admin/public/ overrides Disallow: /admin/ for the URL /admin/public/login.
If a matching Allow and a matching Disallow have exactly the same path length, Allow wins. An empty Disallow value (Disallow:) means "allow everything" and is treated as a zero-length match.
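A minimal TypeScript sketch of these three stages, assuming rules have already been parsed into groups. The Group/Rule shapes and the matchLength helper (a wildcard-aware matcher sketched after the wildcard rules below) are illustrative assumptions, not the tool's actual implementation:

```typescript
// Minimal sketch of the 3-stage evaluation described above.
type Rule = { type: "allow" | "disallow"; path: string };
type Group = { agents: string[]; rules: Rule[] };

// Stage 1: pick the most specific matching User-agent group; "*" is the fallback.
function selectGroup(groups: Group[], userAgent: string): Group | undefined {
  const ua = userAgent.toLowerCase();
  let best: Group | undefined;
  let bestLen = -1;
  for (const group of groups) {
    for (const agent of group.agents) {
      const token = agent.toLowerCase();
      // "*" counts as a zero-length match; a named token that prefixes the
      // crawler's token (e.g. "googlebot" for "googlebot-image") is more specific.
      const len = token === "*" ? 0 : ua.startsWith(token) ? token.length : -1;
      if (len > bestLen) {
        bestLen = len;
        best = group;
      }
    }
  }
  return best;
}

// Stages 2 and 3: longest matching path wins; on a tie, Allow wins.
function isAllowed(group: Group | undefined, urlPath: string): boolean {
  if (!group) return true; // no applicable group: crawling is allowed
  let verdict = true;      // implicit zero-length Allow (the default)
  let bestLen = 0;
  for (const rule of group.rules) {
    const len = matchLength(rule.path, urlPath); // -1 when the rule does not match
    if (len > bestLen || (len === bestLen && rule.type === "allow")) {
      bestLen = len;
      verdict = rule.type === "allow";
    }
  }
  return verdict;
}
```

Starting the search from an implicit zero-length Allow is what makes an empty Disallow harmless: it ties at length zero and the tie goes to Allow.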
- * matches any sequence of characters (including slashes): /*.pdf matches /whitepaper.pdf and /blog/2024/report.pdf.
- $ anchors the match to the end of the URL: /*.pdf$ matches /file.pdf but NOT /file.pdf?download=1.
- Without * or $, a rule is a plain prefix match: Disallow: /admin blocks /admin, /admin/, /admin-panel and everything underneath.
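A sketch of how a single rule can be matched under these wildcard rules, by converting the rule path to a regular expression. The helper name matchLength and its length-as-specificity return value are assumptions used by the algorithm sketch above, not the tool's actual code:

```typescript
// Returns the rule's specificity (its path length) when it matches the URL
// path, or -1 when it does not. "*" becomes ".*", a trailing "$" anchors the
// match, everything else is treated as a literal prefix.
function matchLength(rulePath: string, urlPath: string): number {
  if (rulePath === "") return 0; // empty value: zero-length match
  const anchored = rulePath.endsWith("$");
  const pattern = anchored ? rulePath.slice(0, -1) : rulePath;
  // Escape regex metacharacters, then restore "*" as ".*".
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  const re = new RegExp("^" + escaped + (anchored ? "$" : ""));
  return re.test(urlPath) ? rulePath.length : -1;
}

// matchLength("/*.pdf$", "/file.pdf")            -> 7  (matches)
// matchLength("/*.pdf$", "/file.pdf?download=1") -> -1 (the $ anchor fails)
// matchLength("/admin", "/admin-panel")          -> 6  (prefix match)
```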
| Mistake | Effect | Fix |
|---|---|---|
| Blocking CSS/JS with Disallow: /assets/ | Googlebot cannot render the page, hurting Core Web Vitals signals and structured data parsing | Add Allow: /assets/*.css and Allow: /assets/*.js |
| Using noindex directive in robots.txt | Ignored since Sep 2019 (no effect, page still indexed) | Move noindex to HTML meta tag or HTTP X-Robots-Tag |
| Forgetting trailing slash in Disallow: /admin | Also blocks /admin-panel, /administrators.html | Use Disallow: /admin/ or Disallow: /admin$ |
| Case-sensitive paths | Disallow: /Private/ does NOT block /private/ | Paths are case-sensitive — match URL casing exactly |
| Whitespace before directive | Crawlers may ignore indented lines | Keep User-agent:, Allow:, Disallow: at column 1 |
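Putting the fixes from the table together, an illustrative robots.txt fragment (the /admin/ and /assets/ paths are examples, not recommendations for your site):

```
User-agent: *
Disallow: /admin/        # trailing slash, so /admin-panel is not caught
Disallow: /assets/
Allow: /assets/*.css     # longer match than Disallow: /assets/, so CSS stays crawlable
Allow: /assets/*.js
```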
Robots.txt is a crawling directive, not an indexing directive. URLs blocked by robots.txt can still appear in search results if they receive external links — Google just shows them without a snippet (the famous "no information is available for this page" SERP entry).
To remove URLs from the index, use the noindex meta tag or the X-Robots-Tag HTTP header. To prevent both crawling AND indexing, put the content behind authentication or submit a removal request in Search Console.
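As an example, a minimal Node.js + TypeScript sketch (the /internal/ route and port are illustrative assumptions) of serving a page that remains crawlable but is excluded from the index via the X-Robots-Tag header:

```typescript
import { createServer } from "node:http";

const server = createServer((req, res) => {
  if (req.url?.startsWith("/internal/")) {
    // Equivalent to <meta name="robots" content="noindex"> in the HTML head.
    res.setHeader("X-Robots-Tag", "noindex");
  }
  res.setHeader("Content-Type", "text/html; charset=utf-8");
  res.end("<!doctype html><title>Internal report</title><p>Not for the index.</p>");
});

server.listen(3000);
```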
Google's longest-match rule per RFC 9309. When a path matches multiple Allow and Disallow lines, the one with the longest path string wins. Ties default to Allow.
Yes. The asterisk (*) matches any sequence of characters and the dollar sign ($) anchors to end of URL, exactly as Google and Bing implement them.
Any user-agent token. The tester picks the most specific User-agent group that matches, falling back to the wildcard (*) group if no specific match exists — identical to real crawler behavior.
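Reusing the selectGroup sketch from earlier, the fallback looks like this (the groups are illustrative):

```typescript
const groups: Group[] = [
  { agents: ["*"], rules: [{ type: "disallow", path: "/" }] },
  { agents: ["Googlebot"], rules: [{ type: "allow", path: "/" }] },
];
selectGroup(groups, "Googlebot-Image"); // -> the "Googlebot" group, not "*"
```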
No. The tester runs 100% in your browser. Your robots.txt and URLs are never uploaded, stored, or logged.
So you can debug. Seeing exactly which line triggered the verdict makes it trivial to fix unintended blocks — the #1 cause of indexing problems we see in audits.