Paste your robots.txt and a list of URLs to instantly see which paths are blocked or allowed for any user-agent. Uses Google's longest-match precedence rule with full wildcard (*) and end-of-string ($) support — same as RFC 9309. Runs 100% in your browser.
Real crawlers do not just read robots.txt top-to-bottom. They follow a 3-stage algorithm defined by RFC 9309 (the standardized version of Google's original robots.txt spec):
Crawlers pick the User-agent group whose token most specifically matches their own. Googlebot-Image selects a Googlebot-Image group if one exists, then falls back to Googlebot, then to *. Once a group is selected, ALL other groups are ignored.
Every Allow/Disallow line in the selected group is compared against the URL path. Among the lines that match, the one whose path string is longest (most specific) wins. This is why Allow: /admin/public/ overrides Disallow: /admin/ for the URL /admin/public/login.
If a matching Allow and a matching Disallow have exactly the same path length, Allow wins. An empty Disallow value (Disallow:) means "allow everything" and is treated as a zero-length match.
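A minimal TypeScript sketch of these three stages, assuming rules have already been parsed into groups. The Group/Rule shapes and the matchLength helper (a wildcard-aware matcher sketched after the wildcard rules below) are illustrative assumptions, not the tool's actual implementation:

```typescript
// Minimal sketch of the 3-stage evaluation described above.
type Rule = { type: "allow" | "disallow"; path: string };
type Group = { agents: string[]; rules: Rule[] };

// Stage 1: pick the most specific matching User-agent group; "*" is the fallback.
function selectGroup(groups: Group[], userAgent: string): Group | undefined {
  const ua = userAgent.toLowerCase();
  let best: Group | undefined;
  let bestLen = -1;
  for (const group of groups) {
    for (const agent of group.agents) {
      const token = agent.toLowerCase();
      // "*" counts as a zero-length match; a named token that prefixes the
      // crawler's token (e.g. "googlebot" for "googlebot-image") is more specific.
      const len = token === "*" ? 0 : ua.startsWith(token) ? token.length : -1;
      if (len > bestLen) {
        bestLen = len;
        best = group;
      }
    }
  }
  return best;
}

// Stages 2 and 3: longest matching path wins; on a tie, Allow wins.
function isAllowed(group: Group | undefined, urlPath: string): boolean {
  if (!group) return true; // no applicable group: crawling is allowed
  let verdict = true;      // implicit zero-length Allow (the default)
  let bestLen = 0;
  for (const rule of group.rules) {
    const len = matchLength(rule.path, urlPath); // -1 when the rule does not match
    if (len > bestLen || (len === bestLen && rule.type === "allow")) {
      bestLen = len;
      verdict = rule.type === "allow";
    }
  }
  return verdict;
}
```

Starting the search from an implicit zero-length Allow is what makes an empty Disallow harmless: it ties at length zero and the tie goes to Allow.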
- * matches any sequence of characters (including slashes): /*.pdf matches /whitepaper.pdf and /blog/2024/report.pdf.
- $ anchors the match to the end of the URL: /*.pdf$ matches /file.pdf but NOT /file.pdf?download=1.
- Without * or $, a rule is a plain prefix match: Disallow: /admin blocks /admin, /admin/, /admin-panel and everything underneath.
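A sketch of how a single rule can be matched under these wildcard rules, by converting the rule path to a regular expression. The helper name matchLength and its length-as-specificity return value are assumptions used by the algorithm sketch above, not the tool's actual code:

```typescript
// Returns the rule's specificity (its path length) when it matches the URL
// path, or -1 when it does not. "*" becomes ".*", a trailing "$" anchors the
// match, everything else is treated as a literal prefix.
function matchLength(rulePath: string, urlPath: string): number {
  if (rulePath === "") return 0; // empty value: zero-length match
  const anchored = rulePath.endsWith("$");
  const pattern = anchored ? rulePath.slice(0, -1) : rulePath;
  // Escape regex metacharacters, then restore "*" as ".*".
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  const re = new RegExp("^" + escaped + (anchored ? "$" : ""));
  return re.test(urlPath) ? rulePath.length : -1;
}

// matchLength("/*.pdf$", "/file.pdf")            -> 7  (matches)
// matchLength("/*.pdf$", "/file.pdf?download=1") -> -1 (the $ anchor fails)
// matchLength("/admin", "/admin-panel")          -> 6  (prefix match)
```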
| Mistake | Effect | Fix |
|---|---|---|
| Blocking CSS/JS with Disallow: /assets/ | Googlebot cannot render the page, hurting Core Web Vitals signals and structured data parsing | Add Allow: /assets/*.css and Allow: /assets/*.js |
| Using noindex directive in robots.txt | Ignored since Sep 2019 (no effect, page still indexed) | Move noindex to HTML meta tag or HTTP X-Robots-Tag |
| Forgetting trailing slash in Disallow: /admin | Also blocks /admin-panel, /administrators.html | Use Disallow: /admin/ or Disallow: /admin$ |
| Case-sensitive paths | Disallow: /Private/ does NOT block /private/ | Paths are case-sensitive — match URL casing exactly |
| Whitespace before directive | Crawlers may ignore indented lines | Keep User-agent:, Allow:, Disallow: at column 1 |
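Putting the fixes from the table together, an illustrative robots.txt fragment (the /admin/ and /assets/ paths are examples, not recommendations for your site):

```
User-agent: *
Disallow: /admin/        # trailing slash, so /admin-panel is not caught
Disallow: /assets/
Allow: /assets/*.css     # longer match than Disallow: /assets/, so CSS stays crawlable
Allow: /assets/*.js
```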
Robots.txt is a crawling directive, not an indexing directive. URLs blocked by robots.txt can still appear in search results if they receive external links — Google just shows them without a snippet (the famous "no information is available for this page" SERP entry).
To remove URLs from the index, use the noindex meta tag or the X-Robots-Tag HTTP header. To prevent both crawling AND indexing, put the content behind authentication or submit a removal request in Search Console.
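As an example, a minimal Node.js + TypeScript sketch (the /internal/ route and port are illustrative assumptions) of serving a page that remains crawlable but is excluded from the index via the X-Robots-Tag header:

```typescript
import { createServer } from "node:http";

const server = createServer((req, res) => {
  if (req.url?.startsWith("/internal/")) {
    // Equivalent to <meta name="robots" content="noindex"> in the HTML head.
    res.setHeader("X-Robots-Tag", "noindex");
  }
  res.setHeader("Content-Type", "text/html; charset=utf-8");
  res.end("<!doctype html><title>Internal report</title><p>Not for the index.</p>");
});

server.listen(3000);
```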
Google's longest-match rule per RFC 9309. When a path matches multiple Allow and Disallow lines, the one with the longest path string wins. Ties default to Allow.
Yes. The asterisk (*) matches any sequence of characters and the dollar sign ($) anchors to end of URL, exactly as Google and Bing implement them.
Any user-agent token. The tester picks the most specific User-agent group that matches, falling back to the wildcard (*) group if no specific match exists — identical to real crawler behavior.
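Reusing the selectGroup sketch from earlier, the fallback looks like this (the groups are illustrative):

```typescript
const groups: Group[] = [
  { agents: ["*"], rules: [{ type: "disallow", path: "/" }] },
  { agents: ["Googlebot"], rules: [{ type: "allow", path: "/" }] },
];
selectGroup(groups, "Googlebot-Image"); // -> the "Googlebot" group, not "*"
```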
No. The tester runs 100% in your browser. Your robots.txt and URLs are never uploaded, stored, or logged.
So you can debug. Seeing exactly which line triggered the verdict makes it trivial to fix unintended blocks — the #1 cause of indexing problems we see in audits.