Crawl Budget

Crawl budget is the cap on how many URLs Googlebot will fetch from your site within a given time window. For large sites it can be the difference between new content being discovered in six hours or in six weeks, and it is often the tightest constraint on how quickly changes reach the index.

What is Crawl Budget?

Crawl budget is the number of URLs Googlebot is willing and able to fetch from a given site within a given time window. It is shaped by two limits Google computes per host: crawl rate limit (how many parallel connections can be opened without slowing the server) and crawl demand (how badly Google wants to refresh known URLs and discover new ones).

Google's own guidance says crawl budget is rarely a concern for sites under roughly 1 million URLs. Above that threshold (large e-commerce catalogs, classifieds, news archives), crawl budget directly determines how quickly new content gets discovered and how often important pages get re-crawled.

The Two Components

Crawl Rate Limit

Googlebot watches server response times and HTTP error rates. If your server is fast and stable, Google opens more parallel connections. If it sees 5xx errors, slow time to first byte (TTFB), or 429 (Too Many Requests) responses, it backs off aggressively, sometimes for days. A single bad deploy that introduces timeouts can depress your crawl rate for a week.

Crawl Demand

Demand is driven by two signals: popularity (URLs with more inbound links and traffic get fetched more often) and staleness (URLs Google has not seen in a while get refetched periodically). A 5-year-old page with no inbound links may be crawled once every 6 months.

What Wastes Crawl Budget

| Waste Source | Impact | Fix |
| --- | --- | --- |
| Faceted navigation (filter combinations) | Effectively infinite URL variations from sort + color + size combinations | Block the offending parameters via robots.txt and canonical to the clean URL (robots.txt check sketched below the table) |
| Internal duplicate content (printer-friendly pages, legacy m. subdomain) | Multiple URLs serving the same content | 301 redirect or canonical tag |
| Soft 404s | Pages returning 200 with no real content | Return a real 404 or noindex |
| Infinite calendar pages | Date archives with no end date | Block far-future archives (e.g. /calendar/2030/) via robots.txt |
| Redirect chains | Each hop consumes a fetch | Collapse to a single 301 (chain-check sketch below the table) |
| Slow server response (TTFB > 1 s) | Lowers the crawl rate limit | CDN, caching, database indexes |
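Before shipping robots.txt rules like the ones in the table, it helps to test them against representative URLs. The sketch below is a minimal check using Python's standard urllib.robotparser; the disallow paths and test URLs are illustrative assumptions, not recommendations for any particular site. Note that urllib.robotparser only does prefix matching, so rules that rely on Googlebot's * and $ wildcard extensions (often needed for faceted parameters) must be verified with a wildcard-aware tester instead.

```python
# Minimal sketch: sanity-check draft robots.txt rules against sample URLs
# with Python's standard urllib.robotparser. Paths and URLs are hypothetical.
# Note: urllib.robotparser only does prefix matching, so Googlebot's * and $
# wildcard extensions are not evaluated here.
from urllib.robotparser import RobotFileParser

DRAFT_ROBOTS = """\
User-agent: *
Disallow: /calendar/
Disallow: /print/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(DRAFT_ROBOTS.splitlines())

checks = {
    "https://www.example.com/shoes/": True,               # clean category page stays crawlable
    "https://www.example.com/search?q=red+shoes": False,  # internal search results blocked
    "https://www.example.com/calendar/2030/01/": False,   # far-future date archive blocked
    "https://www.example.com/print/shoes.html": False,    # printer-friendly duplicate blocked
}

for url, expected in checks.items():
    allowed = parser.can_fetch("Googlebot", url)
    verdict = "ok" if allowed == expected else "RULE MISMATCH"
    print(f"{verdict:14s} crawlable={allowed!s:5s} {url}")
```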
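The redirect-chains row is also easy to audit programmatically. The following rough sketch (using the third-party requests package and a hypothetical starting URL) follows redirects one hop at a time; every hop it prints is a fetch Googlebot would otherwise spend before reaching the final page.

```python
# Minimal sketch: follow redirects one hop at a time so chains can be
# collapsed to a single 301. Requires the third-party `requests` package;
# the starting URL below is a hypothetical example.
from urllib.parse import urljoin

import requests


def redirect_chain(url: str, max_hops: int = 10) -> list[str]:
    """Return the list of URLs visited, starting with `url`."""
    chain = [url]
    while len(chain) <= max_hops:
        resp = requests.head(chain[-1], allow_redirects=False, timeout=10)
        if resp.status_code not in (301, 302, 303, 307, 308):
            break
        location = resp.headers.get("Location")
        if not location:
            break
        # Location may be relative; resolve it against the current URL.
        chain.append(urljoin(chain[-1], location))
    return chain


if __name__ == "__main__":
    chain = redirect_chain("http://example.com/old-category/")
    print(" -> ".join(chain))
    hops = len(chain) - 1
    if hops > 1:
        print(f"{hops} hops: redirect the first URL straight to {chain[-1]}")
```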

How to Monitor Crawl Budget

  1. Search Console › Settings › Crawl Stats: Shows total requests per day, average response time, file types fetched. Track week-over-week.
  2. Server log analysis: Filter requests by Googlebot user-agent strings, then group them by URL pattern to see where fetches are being wasted (see the log-parsing sketch after this list).
  3. Indexed vs total URLs ratio: If you have 100k URLs and Google indexes only 30k, the rest are crawl budget candidates for pruning.
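For the server-log step, a rough version of the grouping looks like the sketch below. It assumes an Apache/nginx combined log format and a file named access.log; both are placeholders to adapt, and in a rigorous audit you would also verify Googlebot hits via reverse DNS rather than trusting the user-agent string alone.

```python
# Minimal sketch: summarize Googlebot requests from a combined-format access
# log to spot crawl waste. The file name, log format, and bucketing rules are
# assumptions to adapt; user-agent strings can be spoofed, so a rigorous audit
# also verifies hits via reverse DNS.
import re
from collections import Counter

LINE_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

hits = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE_RE.search(line)
        if not match or "Googlebot" not in match.group("ua"):
            continue
        path = match.group("path")
        # Bucket by first path segment, flagging parameterized URLs separately
        # so faceted-navigation combinations stand out.
        section = "/" + path.lstrip("/").split("/", 1)[0].split("?", 1)[0]
        bucket = section + (" (parameterized)" if "?" in path else "")
        hits[bucket] += 1

for bucket, count in hits.most_common(15):
    print(f"{count:8d}  {bucket}")
```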
Pro tip: Crawl budget is not a fixed pool. Improving page quality and speed expands it organically. A site that fixes 5xx errors and adds a CDN can see crawl rate double within 30 days.

Frequently Asked Questions

Do small sites need to worry about crawl budget?

Generally no. Google has stated that sites under 1 million URLs rarely face crawl budget constraints. Focus on quality and page speed instead. Above 1 million URLs, crawl budget becomes a real operational metric.

How is crawl budget different from crawl rate?

Crawl rate is one of two components of crawl budget. The other is crawl demand. Crawl rate caps how many simultaneous fetches Googlebot makes; crawl demand determines whether Googlebot wants to fetch a URL at all.

Does blocking pages in robots.txt save crawl budget?

Yes. Blocked URLs are not fetched, so they consume no crawl budget. However, a blocked URL can still end up in Google's index without a snippet if it has backlinks; Search Console reports these as 'Indexed, though blocked by robots.txt'.

Can I increase crawl budget?

Indirectly. Improving server response time, reducing error rates, gaining inbound links, and refreshing content all increase crawl rate and demand. There is no direct dial.

What is a healthy indexed-to-total URL ratio?

For most content-driven sites, 70-90% of submitted URLs should be indexed. Below 50% suggests significant crawl waste or quality issues that warrant a content audit.

Related Terms & Resources

Part of the PositiveBacklink SEO Glossary. Updated May 2026.