Robots.txt Tester

Paste your robots.txt and a list of URLs to instantly see which paths are blocked or allowed for any user-agent. The tester applies the longest-match precedence rule with full wildcard (*) and end-of-string ($) support, as standardized in RFC 9309 and implemented by Google. Runs 100% in your browser.

How the Robots.txt Matching Algorithm Works

Real crawlers do not just read robots.txt top to bottom. They follow a three-stage algorithm defined by RFC 9309 (the IETF standardization of Google's original spec):

1. Select the most specific User-agent group

Crawlers pick the User-agent line that matches their token most specifically. Googlebot-Image matches a Googlebot-Image group, then falls back to Googlebot, then to *. Once a group is selected, ALL other groups are ignored.
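
A minimal sketch of group selection, assuming robots.txt has already been parsed into { userAgent, rules } records (a hypothetical shape, not this tool's actual internals):

```typescript
interface Group {
  userAgent: string; // token from the User-agent line, e.g. "googlebot-image"
  rules: { allow: boolean; path: string }[];
}

// Pick the group whose user-agent token is the longest prefix of the
// crawler's token; fall back to the "*" group only if nothing else matches.
function selectGroup(groups: Group[], crawlerToken: string): Group | undefined {
  const token = crawlerToken.toLowerCase();
  let best: Group | undefined;
  for (const g of groups) {
    const ua = g.userAgent.toLowerCase();
    if (ua !== "*" && token.startsWith(ua)) {
      // A longer matching token means a more specific group.
      if (!best || best.userAgent === "*" || ua.length > best.userAgent.length) {
        best = g;
      }
    } else if (ua === "*" && !best) {
      best = g; // the wildcard group is the last resort
    }
  }
  return best;
}

// selectGroup(groups, "Googlebot-Image") prefers a Googlebot-Image group,
// then a Googlebot group, then the * group.
```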

2. Among Allow and Disallow lines, find longest match

Every Allow and Disallow line in the selected group is compared against the URL path. Among the lines that match, the one with the longest path string (the most specific) wins. This is why Allow: /admin/public/ overrides Disallow: /admin/ for the URL /admin/public/login.
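
A sketch of the longest-match rule using plain prefix matching only (wildcard expansion is covered below); the helper name and Rule shape are illustrative:

```typescript
type Rule = { allow: boolean; path: string };

// The longest matching rule decides the verdict; a URL nothing matches is allowed.
function checkPath(rules: Rule[], urlPath: string): "allow" | "disallow" {
  let winner: Rule | undefined;
  for (const rule of rules) {
    if (!urlPath.startsWith(rule.path)) continue; // rule must match as a prefix
    if (!winner || rule.path.length > winner.path.length) winner = rule;
  }
  return winner && !winner.allow ? "disallow" : "allow";
}

checkPath(
  [
    { allow: false, path: "/admin/" },        // 7 characters
    { allow: true,  path: "/admin/public/" }, // 14 characters, more specific
  ],
  "/admin/public/login",
); // "allow"
```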

3. Tie-breaker: Allow wins

If a matching Allow and a matching Disallow have exactly the same path length, Allow wins. An empty Disallow value (Disallow:) means "allow everything" and is treated as a zero-length match, so any real rule outranks it.
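
Extending the same sketch with the tie-break (again a hypothetical helper, not the tool's internals):

```typescript
type Rule = { allow: boolean; path: string };

// Longest match wins; on equal length, Allow beats Disallow. An empty
// Disallow ("") matches every URL with length 0, so any real rule outranks it.
function checkPathWithTieBreak(rules: Rule[], urlPath: string): "allow" | "disallow" {
  let bestLength = -1;
  let allowed = true; // no matching rule at all means the URL is allowed
  for (const rule of rules) {
    if (!urlPath.startsWith(rule.path)) continue;
    if (rule.path.length > bestLength) {
      bestLength = rule.path.length;
      allowed = rule.allow;
    } else if (rule.path.length === bestLength && rule.allow) {
      allowed = true; // tie between Allow and Disallow: Allow wins
    }
  }
  return allowed ? "allow" : "disallow";
}

checkPathWithTieBreak(
  [
    { allow: false, path: "/shop/" },
    { allow: true,  path: "/shop/" }, // same length as the Disallow
  ],
  "/shop/cart",
); // "allow"
```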

Wildcard and End-of-String Rules

Two pattern characters extend plain prefix matching: the asterisk (*) matches any sequence of characters, including slashes, and the dollar sign ($) anchors the pattern to the end of the URL. Disallow: /*.pdf$ therefore blocks /docs/report.pdf but not /docs/report.pdf?download=1, because the query string continues past the anchor. All other characters in a path are matched literally, as a prefix.
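
One common way to implement both rules is to compile each path pattern into a regular expression. This hypothetical helper is a sketch of that approach, not the tester's actual code:

```typescript
// Compile a robots.txt path pattern with * and $ into an anchored RegExp.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&")) // escape regex chars
    .join(".*"); // each * may span any characters, including slashes
  const anchored = escaped.endsWith("\\$")
    ? escaped.slice(0, -2) + "$" // a trailing $ anchors to the end of the URL
    : escaped;
  return new RegExp("^" + anchored);
}

patternToRegExp("/*.php$").test("/index.php");     // true
patternToRegExp("/*.php$").test("/index.php?x=1"); // false: $ anchors the end
patternToRegExp("/fish*").test("/fishing/bait");   // true: * spans any characters
```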

Common Mistakes This Tester Catches

Mistake | Effect | Fix
Blocking CSS/JS with Disallow: /assets/ | Googlebot cannot render the page, hurting Core Web Vitals signals and structured data parsing | Add Allow: /assets/*.css and Allow: /assets/*.js
Using a noindex directive in robots.txt | Ignored since September 2019 (no effect, page still indexed) | Move noindex to an HTML meta tag or the X-Robots-Tag HTTP header
Forgetting the trailing slash in Disallow: /admin | Also blocks /admin-panel and /administrators.html | Use Disallow: /admin/ or Disallow: /admin$
Case-sensitive paths | Disallow: /Private/ does NOT block /private/ | Paths are case-sensitive; match URL casing exactly
Whitespace before a directive | Crawlers may ignore indented lines | Keep User-agent:, Allow:, and Disallow: at column 1

What Robots.txt Cannot Do

Robots.txt is a crawling directive, not an indexing directive. URLs blocked by robots.txt can still appear in search results if they receive external links — Google just shows them without a snippet (the famous "no information is available for this page" SERP entry).

To remove URLs from the index, use a noindex meta tag or the X-Robots-Tag HTTP header, and leave the URL crawlable: a crawler blocked by robots.txt never sees the directive. To keep content out of search entirely, put it behind authentication, or file a temporary removal request in Google Search Console.
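
For illustration, a minimal Node.js sketch of the header approach (the route and port are made up; the X-Robots-Tag header itself is the standard mechanism):

```typescript
import { createServer } from "node:http";

// Serve a page that crawlers may fetch (so it must NOT be blocked in
// robots.txt) but are told not to index.
createServer((req, res) => {
  res.setHeader("X-Robots-Tag", "noindex");
  res.setHeader("Content-Type", "text/html");
  res.end("<!doctype html><title>Internal report</title>");
}).listen(3000);
```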

Frequently Asked Questions

Which precedence rule does this tester follow?

Google's longest-match rule per RFC 9309. When a path matches multiple Allow and Disallow lines, the one with the longest path string wins. Ties default to Allow.

Does it support wildcards and end-of-string anchors?

Yes. The asterisk (*) matches any sequence of characters and the dollar sign ($) anchors to end of URL, exactly as Google and Bing implement them.

What user-agents are supported?

Any user-agent token. The tester picks the most specific User-agent group that matches, falling back to the wildcard (*) group if no specific match exists — identical to real crawler behavior.

Does the tool send data anywhere?

No. The tester runs 100% in your browser. Your robots.txt and URLs are never uploaded, stored, or logged.

Why is the matched rule shown for each URL?

So you can debug. Seeing exactly which line triggered the verdict makes it trivial to fix unintended blocks — the #1 cause of indexing problems we see in audits.

Related Tools & Reading