How one HTTP header can feed Googlebot a stale, fragmented copy of your page.

By Search Agency | Jun 26, 2026 | 6 min read

One response header decides how many separate copies of each URL your CDN keeps. Set it carelessly and a single page becomes hundreds of cache entries, your hit rate collapses, and Googlebot starts receiving slow, inconsistent, sometimes stale HTML that it then tries to index. The header is Vary, and almost nobody in SEO audits it.

Vary is the instruction a cache reads to build its cache key. Per MDN's reference, it lists the request headers a cache must take into account before deciding whether a stored response can be reused. Vary: Accept-Encoding means the cache keeps a gzip copy and a Brotli copy separately, which is correct and cheap, because there are only a few encodings. The trouble starts when the thing you vary on has thousands of possible values.
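To make the cache-key idea concrete, here is a minimal sketch of how a cache might combine the URL with the headers named in Vary. The function name `cache_key` and the sample header values are illustrative assumptions; real caches add more inputs (method, host, their own normalization rules), so treat this as a simplified model rather than any particular CDN's behavior.

```python
def cache_key(url: str, request_headers: dict, vary: str) -> tuple:
    """Simplified model: the key is the URL plus the value of each header named in Vary."""
    # Header names are case-insensitive, so normalise them before lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    varied = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    return (url,) + tuple((name, headers.get(name, "")) for name in varied)


# Hypothetical requests from two different browsers, both accepting Brotli.
chrome = {"User-Agent": "Chrome/126", "Accept-Encoding": "br"}
safari = {"User-Agent": "Safari/17", "Accept-Encoding": "br"}

# Keyed on encoding only: both browsers share one cache entry.
same = cache_key("/pricing", chrome, "Accept-Encoding") == cache_key("/pricing", safari, "Accept-Encoding")
# Keyed on User-Agent: each browser gets its own cache entry.
split = cache_key("/pricing", chrome, "User-Agent") != cache_key("/pricing", safari, "User-Agent")
print(same, split)  # True True
```

The only thing that changed between the two comparisons is the Vary value, which is why a single header line can multiply the number of stored copies.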

The fragmentation math

Vary: User-Agent tells the cache to store a distinct copy of the URL for every distinct User-Agent string it sees. User-Agent strings are very nearly unbounded. Every browser version, OS build, device model, and bot produces a different string, which is why caching guides warn that varying on User-Agent shatters your cache into near-duplicate entries. One URL becomes a separate cached object for Chrome on Windows, Chrome on Android, Safari on iOS, last month's Safari on iOS, Googlebot, and a long tail of everything else.

Two things break at once. Your cache hit ratio drops, because each new visitor variant is a cache miss that goes to origin, so origin load climbs and time to first byte rises for everyone. And Googlebot, which sends its own User-Agent, gets pushed into its own cache partition. Its requests are more likely to miss the cache, more likely to hit a cold origin, and more likely to receive a copy generated at a different moment than the one users see. Inconsistent snapshots across crawl sessions are how you get indexation instability and stale snippets in the results.
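The hit-ratio effect can be illustrated with a toy replay. This sketch assumes made-up traffic (1,000 requests for one URL from 500 distinct User-Agent strings) and ignores expiry, eviction, and the skew of real traffic, so the numbers show the direction of the effect, not a forecast for any real site.

```python
def simulate(requests, key_fn):
    """Replay requests against an empty cache; return (entries, hit_ratio)."""
    cache = set()
    hits = 0
    for req in requests:
        key = key_fn(req)
        if key in cache:
            hits += 1       # a stored copy could be reused
        else:
            cache.add(key)  # miss: fetched from origin, then stored
    return len(cache), hits / len(requests)


# Hypothetical traffic: 1,000 requests for one URL, 500 distinct UA strings.
requests = [{"url": "/pricing", "ua": f"Browser/{i % 500}"} for i in range(1000)]

no_vary = simulate(requests, lambda r: r["url"])             # key on URL only
vary_ua = simulate(requests, lambda r: (r["url"], r["ua"]))  # key on URL + raw UA

print(no_vary)  # (1, 0.999)
print(vary_ua)  # (500, 0.5)
```

In this toy model the same traffic goes from one origin fetch to five hundred, and any client with a rare User-Agent, such as a crawler, is the most likely to land on a miss.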

Why this is an SEO problem, not just a performance one

Crawl efficiency is sensitive to response time. Slow time to first byte means Googlebot fetches fewer URLs per crawl session, and a fragmented cache makes Googlebot's fetches disproportionately slow because its variant is rarely warm. You are spending crawl budget on cache misses.

It gets worse downstream of the fetch. Google's own JavaScript documentation notes that its Web Rendering Service caches resources aggressively and "may ignore caching headers," which means a stale JavaScript or CSS bundle can persist in Google's pipeline on top of whatever your CDN served. Combine a fragmented CDN cache that hands Googlebot an old HTML variant with a render service holding an old script, and the rendered page Google indexes can be two layers of stale away from what your users actually get.

The Cookie trap is worse

If Vary: User-Agent fragments your cache, Vary: Cookie detonates it. Cookies are per-session and frequently per-user, so varying on Cookie can create a unique cache key for nearly every visitor. The page becomes effectively uncacheable at the shared layer, every request falls through to origin, and you have turned a CDN into an expensive passthrough.

There is a correctness risk on top of the performance one. If a personalized or logged-in variant ever gets cached and served to Googlebot, you can leak the wrong content into the index. Most sites that ship Vary: Cookie did not mean to. It arrives by default from an analytics or session middleware and never gets reviewed.

What Google actually does with the Vary User-Agent header

This is the part that surprises people. Google does not use Vary: User-Agent as a signal to figure out your mobile versus desktop versions for indexing. John Mueller has been explicit that the header is not something Google made up for SEO; it is a general networking mechanism that lets every cache in the path recognize that the same URL can return different content. So adding it does not help you rank, and removing it does not hurt rankings on a responsive site.

There is exactly one case where you should keep it. If you do dynamic serving, where the same URL returns genuinely different HTML to mobile and desktop based on User-Agent sniffing, then Google recommends Vary: User-Agent so that intermediary caches do not serve the desktop HTML to a mobile user or the reverse. The header is protecting cache correctness for a setup that is itself fragile under mobile-first indexing. If your site is responsive, one HTML for all devices with CSS doing the adapting, you are serving the same bytes to everyone and Vary: User-Agent buys you nothing but a shredded cache. Drop it.

The cache-control half people skip

Vary decides how many copies exist. Cache-Control decides how long each copy is served, and the directive that matters for crawlers is the shared-cache one. Per MDN, max-age sets the freshness lifetime for any cache that stores the response, the user's browser included, while s-maxage applies only to shared caches like your CDN and overrides max-age there. Googlebot fetches through that shared layer. If your s-maxage is long, the CDN can keep serving Googlebot a cached HTML copy well after you have published changes, and the crawler sees stale content until the entry expires or is purged. Setting s-maxage deliberately, and purging on publish, is what keeps crawl freshness under your control rather than your CDN's default.
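As a rough illustration of which lifetime the shared layer ends up using, the sketch below parses a Cache-Control value and applies the standard precedence: a shared cache prefers s-maxage and otherwise falls back to max-age. The helper names are hypothetical, and the parser deliberately ignores other directives (no-cache, must-revalidate, and so on) and any CDN-specific overrides.

```python
from typing import Optional


def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_ttl(value: str) -> Optional[int]:
    """Seconds a shared cache may treat the response as fresh (simplified)."""
    d = parse_cache_control(value)
    if "no-store" in d or "private" in d:
        return 0  # a shared cache must not store this response
    for name in ("s-maxage", "max-age"):  # s-maxage wins for shared caches
        if d.get(name) is not None:
            return int(d[name])
    return None  # no explicit lifetime; the cache applies its own heuristics


print(shared_cache_ttl("public, max-age=60, s-maxage=3600"))  # 3600
print(shared_cache_ttl("max-age=60"))                         # 60
print(shared_cache_ttl("private, max-age=600"))               # 0
```

In the first example a browser would revalidate after a minute, but the CDN could keep serving the same HTML to crawlers for an hour unless it is purged.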

For static assets, the fix Google names directly is content fingerprinting, filenames like main.2bb85551.js whose hash changes when the file changes. Fingerprinted assets can be cached forever safely, because an update produces a new filename and a guaranteed fresh fetch, which sidesteps the WRS stale-resource problem entirely.
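Most build tools handle fingerprinting for you, but the idea is small enough to sketch by hand. The function name `fingerprint`, the choice of SHA-256, and the 8-character hash length below are assumptions for illustration, not what any particular bundler does.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprint(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the extension: main.js -> main.<hash>.js"""
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = fingerprint("static/main.js", b"console.log('v1');")
new = fingerprint("static/main.js", b"console.log('v2');")

print(old)
print(new)
print(old != new)  # True: changed content yields a new URL
```

Because the URL changes whenever the bytes change, a long cache lifetime on these files cannot pin anyone, crawler or user, to an outdated bundle.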

A reference for the common Vary values

| Header | What it keys the cache on | Cache effect | SEO risk |
| --- | --- | --- | --- |
| `Vary: Accept-Encoding` | Compression method, a handful of values | Tiny, correct fragmentation | None. Keep it. |
| `Vary: User-Agent` | Every distinct UA string, thousands | Severe fragmentation, low hit rate | Slow crawler fetches, inconsistent snapshots. Only justified for true dynamic serving. |
| `Vary: Cookie` | Per-session, often per-user | Near-total, page barely cacheable | Origin overload plus risk of serving personalized HTML to Googlebot. |
| No `Vary` on HTML | One entry per URL | Maximum hit rate | Correct default for a responsive site. |

How to audit it this week

Start by reading your headers. Request your top page templates and look at what comes back, either with curl -I https://example.com/ or the Network tab in DevTools. Flag any HTML response carrying Vary: User-Agent or Vary: Cookie.
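If you have more than a handful of templates, the same check can be scripted. The sketch below is one possible approach using only the Python standard library; the function names and the `RISKY` set are assumptions, and some servers answer HEAD requests differently from GET, so spot-check the results against DevTools.

```python
import urllib.request

# Vary tokens this article flags on HTML responses.
RISKY = {"user-agent", "cookie"}


def fetch_headers(url: str) -> dict:
    """HEAD a URL and return lower-cased headers, joining repeated ones with commas."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        merged = {}
        for name, value in response.headers.items():
            key = name.lower()
            merged[key] = f"{merged[key]}, {value}" if key in merged else value
        return merged


def risky_vary(headers: dict) -> list:
    """Return the risky Vary tokens on an HTML response, or [] if there are none."""
    lowered = {k.lower(): v for k, v in headers.items()}
    if not lowered.get("content-type", "").startswith("text/html"):
        return []  # only HTML responses are in scope for this audit
    tokens = [t.strip().lower() for t in lowered.get("vary", "").split(",")]
    return [t for t in tokens if t in RISKY]


# Offline example with hand-written headers (no network needed):
sample = {"Content-Type": "text/html; charset=utf-8", "Vary": "Accept-Encoding, User-Agent"}
print(risky_vary(sample))  # ['user-agent']
```

To run it against a live page, pass the result of `fetch_headers("https://example.com/")` to `risky_vary` and repeat for one URL per template.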

Then ask the only question that decides the fix. Do you actually serve different HTML per device or per cookie at that URL? If the site is responsive and the HTML is identical for everyone, the Vary value is pure overhead and should be removed so the cache collapses back to one entry per URL. If you genuinely run dynamic serving, keep Vary: User-Agent but move the cache key normalization into your CDN so it buckets the thousands of UA strings into a few device classes rather than keying on the raw string. Most modern CDNs support this, and the cleaner pattern, used in AWS's bot-aware CDN setup, is to detect crawlers out of band so you never put User-Agent in the cache key at all.
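The bucketing idea can be sketched as follows. The substring rules here are deliberately crude and purely illustrative; a real CDN would use its own device-detection feature, and the function names and sample User-Agent strings are assumptions made for this example.

```python
def device_class(user_agent: str) -> str:
    """Collapse a raw UA string into a coarse device class (crude heuristic)."""
    ua = user_agent.lower()
    if "ipad" in ua or "tablet" in ua:
        return "tablet"
    if "mobi" in ua or "iphone" in ua:
        return "mobile"
    return "desktop"


def normalized_key(url: str, user_agent: str) -> tuple:
    """Cache key built from the device class instead of the raw UA string."""
    return (url, device_class(user_agent))


user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_5 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Mobile/15E148 Safari/604.1",
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
]

keys = {normalized_key("/pricing", ua) for ua in user_agents}
print(len(keys))  # 2: one desktop entry, one mobile entry
```

Note that the smartphone Googlebot string lands in the same mobile bucket as the iPhone visitor, so the crawler reuses an entry that regular traffic keeps warm instead of sitting in a partition of its own.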

Finally, set s-maxage on your HTML with intent, purge the CDN on publish, and fingerprint your static assets. Then watch the average response time in your Search Console crawl stats, and check your server logs to confirm Googlebot is getting the same HTML across fetches rather than a different cached variant each time.

The goal is one warm, fresh copy of each URL that every client, Googlebot included, hits consistently. Vary is how you accidentally end up with hundreds of cold ones.

If your crawl stats show slow average response times or your snippets keep going stale, your cache configuration is a good place to look, and it rarely gets audited. Our free AI Visibility Audit covers how search engines and AI assistants fetch and see your pages, including the delivery layer most SEO reviews skip. Talk to us if you want a look at what your CDN is handing the crawlers.
