How the Post-Redirect-Get pattern hides billions of junk filter URLs from Googlebot.

By Search Agency · Jun 25, 2026 · 9 min read

A store with 32 filter parameters can generate 4,294,967,296 unique URLs. That is 2 raised to the 32nd power, and it assumes the laziest possible math, one on/off state per facet. Add a second value to any facet and the count climbs past four billion fast. Every one of those URLs is a door Googlebot has to open before it can decide the room behind it is empty.

This is the core problem with faceted navigation. The feature is good for users and ruinous for crawlers. Google says so directly in its own guidance on faceted navigation URLs, which describes the default URL-parameter implementation as generating "infinite URL spaces" that harm a site two ways. First is overcrawling, because crawlers cannot tell a useful filter combination from a useless one without fetching it. Second is slower discovery, because every request spent on `?color=green&size=tiny&sort=price` is a request not spent on a new product page.

Most teams reach for a defense that treats the symptom and leaves the disease. The Post-Redirect-Get pattern treats the disease, at the protocol layer, by exploiting one fixed fact about how crawlers behave.

The combinatorial math is worse than it looks

Take a category with these facets. Color with eight options. Size with six. Brand with twenty. Price band with five. Plus three sort orders and twelve pages of results. Treat each facet as single-select, one value or off, and the filters alone give 9 × 7 × 21 × 6 = 7,938 combinations, or 23,814 once the three sort orders apply, and pagination multiplies the lot. None of this requires a large catalog. It requires a normal one.

The formula people quote is `C = 2^n`, where `n` is the number of binary facets. It is a floor, not a ceiling. Real facets carry multiple values, so the true count is the product, across all facets, of each facet's value count plus one for its off state. That makes `2^n` useful for a quick scare number and useless for planning. The planning question is not how many URLs exist. It is how many you are willing to let a crawler fetch.
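As a worked version of that count, the sketch below multiplies out the example category from this section. It assumes every facet is single-select (one value or off), that exactly one sort order is always active, and that every combination really paginates to twelve pages, which makes the last figure an upper bound. The names `FACETS`, `filter_combinations`, and `binary_floor` are illustrative only.

```python
from math import prod

# Value counts for the example category; each facet can also be switched off.
FACETS = {"color": 8, "size": 6, "brand": 20, "price_band": 5}
SORT_ORDERS = 3  # assumption: exactly one sort order is always active
PAGES = 12       # assumption: every combination paginates to 12 pages


def filter_combinations(facets: dict[str, int]) -> int:
    """Product of (value count + 1) per facet; the +1 is the off state."""
    return prod(count + 1 for count in facets.values())


def binary_floor(n_facets: int) -> int:
    """The quoted 2^n figure: every facet treated as one on/off switch."""
    return 2 ** n_facets


combos = filter_combinations(FACETS)  # 9 * 7 * 21 * 6
with_sort = combos * SORT_ORDERS
with_pages = with_sort * PAGES

print(combos, with_sort, with_pages)  # prints 7938 23814 285768
print(binary_floor(32))               # prints 4294967296
```

Multi-select facets, where a shopper can tick green and blue together, would push every one of these numbers higher.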

Why the usual defenses leak

The `noindex` meta tag keeps a page out of the index. It does nothing for crawling. Googlebot has to fetch the page and render the HTML to read the directive in the first place, so the crawl budget is already spent by the time the instruction is obeyed. You have controlled indexation and paid full price for the crawl you were trying to avoid.

A `robots.txt` disallow is the opposite trade. It stops the crawl, which is the right instinct, and Google's documentation recommends it as the primary tool when you do not need filter URLs indexed. But a blocked URL can still appear in search as a bare link with no snippet if other pages point to it, and any internal or external link equity flowing into a disallowed URL is equity that goes nowhere. Re:signal's Kevin Gibbons makes the same point in Search Engine Journal, noting that robots.txt is "more of a polite request than a strict rule" and that linked pages can still get picked up.

The `rel="canonical"` tag is a consolidation hint, not a crawl control. Google's own doc is candid that canonical and `rel="nofollow"` are "generally less effective in the long term" than blocking, and that the canonical may, over time, reduce crawl volume on non-canonical variants. May. Over time. Meanwhile the crawler keeps fetching the parameter-bloated versions to check.

AJAX filtering hides parameters by updating the page in the browser without a full reload. It works against crawling only when the state never becomes a real URL. The moment your framework writes filter state into the address bar with `pushState`, you have manufactured a crawlable link, and Google renders JavaScript, so a client-side approach that leaks URLs leaks them to a bot that can read them. Fragment-based state after the `#` is genuinely ignored by Google, but most modern filter UIs do not stop at fragments.

Add it up. The tag makes you pay for a crawl you were trying to skip. The block strands your link equity. The canonical is slow and only ever a hint. And the script stays safe right up until it writes its first URL. Post-Redirect-Get answers all four at the protocol layer, before any of those tradeoffs come up.

The one crawler fact PRG is built on

Googlebot discovers pages by fetching links with HTTP GET. It does not submit POST forms as a matter of routine. When Google first experimented with form crawling back in 2008, it restricted itself to GET forms, and the rare exception, where it plugs keywords into something that looks like a search box, is still limited to GET forms with a couple of input fields and no personal-information signals. Google has since said it may issue POST requests in narrow cases, such as fetching a resource a page needs in order to render, but submitting a POST filter form to see where it leads is not part of link discovery.

That is the whole foundation. If a filter is triggered by a POST request, the crawler never sees the destination, because it never sends the request that would reveal it. The filter pathway becomes invisible to discovery without a single robots.txt line or noindex tag.

The four-step lifecycle

Post-Redirect-Get turns one filter click into a short, deliberate sequence of HTTP exchanges. Walk through a single color-filter click.

Step one, the POST. The facet controls are real form elements, styled to look like ordinary links or checkboxes, submitting with `method="post"`. The selected parameters travel in the request body, not the URL.

<form method="post" action="/category/running-shoes/filter">
  <button name="color" value="green">Green</button>
  <button name="size" value="42">Size 42</button>
</form>

Because the parameters live in the body of a POST, no crawlable URL is created and no crawler follows the action.

Step two, the 303. The server reads the filter criteria, works out the canonical destination, and replies with an HTTP `303 See Other` whose `Location` header points at the clean filtered view.

HTTP/1.1 303 See Other
Location: /category/running-shoes/green/42

Step three, the GET. The browser obeys the redirect automatically and issues a fresh GET for that target. This is the step that matters for the redirect choice, and the next section explains why 303 and not 302 or 307.

Step four, the 200. The server returns the filtered category page with a `200 OK` under a clean, shareable, bookmarkable URL. The user gets a normal page they can copy, link, and reload. The crawler, had it been watching, would only ever have seen the final GET target, never the POST that produced it.
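The four steps can be sketched server-side with nothing but the Python standard library. This is a minimal illustration under stated assumptions, not a production handler: `CATEGORY`, `FILTER_ORDER`, `redirect_target`, and `FilterHandler` are hypothetical names, the fixed color-then-size order is an assumption, and a real implementation would validate submitted values against the catalog before redirecting.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, quote

CATEGORY = "/category/running-shoes"  # matches the example form action
FILTER_ORDER = ("color", "size")      # assumption: fixed path segment order


def redirect_target(form_body: str) -> str:
    """Map a urlencoded POST body to the clean GET path."""
    fields = parse_qs(form_body)
    segments = [
        quote(fields[name][0], safe="")  # escape anything path-unsafe
        for name in FILTER_ORDER
        if name in fields
    ]
    return "/".join([CATEGORY, *segments])


class FilterHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Steps one and two: read the body, answer 303 with a Location.
        if self.path != f"{CATEGORY}/filter":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        self.send_response(303)
        self.send_header("Location", redirect_target(body))
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Steps three and four: the follow-up GET receives a normal 200.
        page = f"<h1>Filtered view: {escape(self.path)}</h1>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(page)))
        self.end_headers()
        self.wfile.write(page)
```

To try it locally, `HTTPServer(("127.0.0.1", 8000), FilterHandler).serve_forever()` starts the sketch; the host and port are arbitrary.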

The user experience is standard browser routing. Full page, real URL, working back button. The difference is entirely below the surface.

Why 303 and not 302 or 307

The redirect code is not interchangeable. Pick the wrong one and the pattern breaks. Per the HTTP specification documented by MDN, a 303 response changes the request method to GET regardless of the method that triggered it. That is exactly what you want after a POST. The browser drops the POST body and fetches the destination cleanly with GET.

A 307 does the opposite. It preserves the original method and body, so the browser would re-POST to the destination, which defeats the purpose and can resubmit on refresh. A 302 is ambiguous by history, with clients varying in whether they switch to GET. 303 was added in HTTP/1.1 precisely to remove that ambiguity, and it is the status code that makes Post-Redirect-Get safe against the duplicate-submission warning. RFC 9110 carries the current wording. Use 303.
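The difference between the codes can be written down as a small lookup. The function below is a simplified model of how a client picks the method for the redirected request, based on the RFC 9110 wording and on common browser behavior for 301 and 302. It is an illustration, not a browser implementation, and `method_after_redirect` is a hypothetical name.

```python
def method_after_redirect(status: int, method: str) -> str:
    """Method a client uses for the follow-up request (simplified model)."""
    method = method.upper()
    if status == 303:
        # 303 See Other: the follow-up is a GET (a HEAD request stays HEAD).
        return "HEAD" if method == "HEAD" else "GET"
    if status in (307, 308):
        # Method and body are preserved, so a POST is sent again.
        return method
    if status in (301, 302):
        # Clients are allowed to rewrite POST to GET and browsers typically
        # do, but the specification does not require it.
        return "GET" if method == "POST" else method
    raise ValueError(f"{status} is not a redirect status handled here")


for code in (302, 303, 307):
    print(code, method_after_redirect(code, "POST"))
```

Only the 303 row gives a GET that the specification requires rather than merely permits, which is the guarantee the pattern depends on.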

Implementation details that decide whether it holds

The pattern is simple. Getting it to behave on a real catalog takes a few rules.

Set an inventory threshold before you ever expose a facet combination as an indexable page. A filtered view backed by two products is thin content that earns nothing and risks a soft 404. A reasonable floor is a minimum of three active products before a combination gets a clean crawlable URL of its own. Below the floor, it stays behind POST.

Decide which facets deserve to be found at all. Not every filter should be hidden. Brand, product type, and popular attributes attract real search demand and should live at clean, crawlable, indexable paths. Sort order, availability, and oddball value mixes attract nobody and belong behind the POST wall. Gibbons lays out the same three buckets in his SEJ piece, index, noindex, and block, and Post-Redirect-Get is how you implement the block bucket without spending crawl budget to enforce it.
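Those two rules, the inventory floor and the facet allow-list, reduce to one decision per filter combination. The sketch below is one possible encoding, assuming a floor of three products and an allow-list of brand and product type. `Exposure`, `INDEXABLE_FACETS`, `MIN_PRODUCTS`, and `exposure_for` are hypothetical names, and the right allow-list depends on where your search demand actually is.

```python
from enum import Enum


class Exposure(Enum):
    CRAWLABLE_LINK = "render as <a href> to a clean path"
    POST_FORM = "render as a POST form control"


INDEXABLE_FACETS = {"brand", "product_type"}  # assumption: facets with demand
MIN_PRODUCTS = 3                              # inventory floor from the text


def exposure_for(selected: dict[str, str], product_count: int) -> Exposure:
    """Decide how a filter combination is exposed in the page markup."""
    if not selected:
        # The unfiltered category page is always an ordinary link.
        return Exposure.CRAWLABLE_LINK
    only_indexable = set(selected) <= INDEXABLE_FACETS
    if only_indexable and product_count >= MIN_PRODUCTS:
        return Exposure.CRAWLABLE_LINK
    return Exposure.POST_FORM


print(exposure_for({"brand": "nike"}, 40).name)
print(exposure_for({"brand": "nike", "sort": "newest"}, 40).name)
print(exposure_for({"brand": "nike"}, 2).name)
```

Keeping the decision in one function means the templates, the sitemap generator, and the redirect handler can all ask the same question and get the same answer.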

For the facets you do want indexed, rewrite the URLs into clean readable paths. `/category/running-shoes/green/42` rather than `?color=green&size=42`. Google asks for the industry-standard `&` separator when you must use parameters, and for a consistent filter order so `/green/42` and `/42/green` never resolve to two URLs for the same set. Pick an order, enforce it server-side, and keep it stable.
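Enforcing the order server-side can be as small as the function below. It is a sketch under assumptions: the value sets in `FACET_VALUES` stand in for a real catalog lookup, the color-then-size order is arbitrary, and `canonical_segments` is a hypothetical name.

```python
from typing import Optional

FACET_ORDER = ("color", "size")  # assumption: the one order you committed to
FACET_VALUES = {                 # hypothetical stand-in for a catalog lookup
    "color": {"green", "red", "blue"},
    "size": {"41", "42", "43"},
}


def canonical_segments(segments: list[str]) -> Optional[list[str]]:
    """Reorder filter path segments into FACET_ORDER.

    Returns None when a segment is not a known value or a facet repeats.
    """
    chosen: dict[str, str] = {}
    for segment in segments:
        facet = next(
            (name for name, values in FACET_VALUES.items() if segment in values),
            None,
        )
        if facet is None or facet in chosen:
            return None
        chosen[facet] = segment
    return [chosen[name] for name in FACET_ORDER if name in chosen]


print(canonical_segments(["42", "green"]))  # prints ['green', '42']
print(canonical_segments(["green", "42"]))  # prints ['green', '42']
```

A request whose segments differ from the returned list could be permanently redirected to the canonical order, and a `None` result treated as not found, so only one URL per filter set ever answers with a page.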

Handle the empty set honestly. When a filter combination returns nothing, Google's guidance is explicit that you should serve a real `404` at that URL, not a 200 with a "no results" message and not a redirect to a generic error page. Same for nonsensical or duplicate filter combinations. The 404 tells the crawler the door leads nowhere and to stop trying it.

Where the pattern strains

Post-Redirect-Get is a crawl-control tool, not a religion, and a few costs come with it.

The extra redirect hop adds a round trip to every filter action. On a fast stack it is invisible. On a slow one it stacks up, so the filtered GET target needs to be quick and cacheable. There is also the question of users without JavaScript or with assistive technology, where form-driven filtering has to degrade to something operable, which means the forms must be real, semantic forms and not divs wired up with click handlers.

Analytics shifts too. A POST-then-redirect flow does not look like a normal link click, so filter usage tracking needs to be wired to the form submission or the resulting pageview rather than to anchor clicks. And because the pattern deliberately keeps combinations out of the crawl, you lose the chance to rank the long-tail combinations you hid. That is the right call for `price=low&sort=newest` and the wrong call for `nike-running-shoes`, which is why the indexable-path decision earlier is the part that actually determines your organic ceiling. Get that classification right before you build anything, because the pattern will enforce whatever you decide.

How to validate it is working

Ship it, then prove it. Log file analysis is the ground truth. Pull your server logs, filter to Googlebot, and confirm the parameter and POST-target patterns have dropped out of the crawl while your clean indexable paths are still being fetched. If Googlebot is still hammering filtered URLs, something is leaking, usually an internal link or a sitemap entry pointing at a URL the POST flow was supposed to bury.
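A first pass at that log check might look like the sketch below. It assumes access logs in the common combined format, uses the filter endpoint from the form example as the POST target, and matches Googlebot on the user-agent string alone. That string can be spoofed, so a serious audit would also verify the requesting IPs. `LOG_LINE`, `classify`, and `googlebot_crawl_profile` are hypothetical names.

```python
import re
from collections import Counter

# Combined log format:
# ip - - [time] "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

POST_TARGET = "/category/running-shoes/filter"  # form action from the example


def classify(path: str) -> str:
    """Bucket a requested path by the kind of URL it represents."""
    if path.startswith(POST_TARGET):
        return "post_target"
    if "?" in path:
        return "parameter_url"
    return "clean_path"


def googlebot_crawl_profile(lines) -> Counter:
    """Count Googlebot requests per URL bucket across log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match["agent"]:
            continue
        counts[classify(match["path"])] += 1
    return counts
```

Run it over a log with something like `with open("access.log") as fh: print(googlebot_crawl_profile(fh))`, where the file name is a placeholder. A healthy result is dominated by `clean_path`, with `parameter_url` and `post_target` trending toward zero across successive weeks.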

Run a crawler such as Screaming Frog against the site and check that the facet controls do not surface as followable links. A POST button should not appear in the outlink report the way an anchor does. Then watch the Search Console page indexing (coverage) report over the following weeks for the count of discovered filter URLs to fall, and run a `site:` search to confirm the parameter junk is leaving the index rather than entering it.
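The same check can be approximated in a few lines before a full crawl is scheduled. The sketch below parses a page's HTML and separates anchor links from POST form actions. It is a rough stand-in for an outlink report, not a description of how Screaming Frog or Googlebot parse pages, and `LinkAudit` and the sample markup are illustrative.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collect followable anchors and POST form actions from one page."""

    def __init__(self):
        super().__init__()
        self.followable: list[str] = []
        self.post_forms: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.followable.append(attributes["href"])
        elif tag == "form":
            method = (attributes.get("method") or "get").lower()
            if method == "post":
                self.post_forms.append(attributes.get("action") or "")


SAMPLE_PAGE = """
<a href="/category/running-shoes/nike">Nike</a>
<form method="post" action="/category/running-shoes/filter">
  <button name="color" value="green">Green</button>
  <button name="size" value="42">Size 42</button>
</form>
"""

audit = LinkAudit()
audit.feed(SAMPLE_PAGE)
print(audit.followable)  # prints ['/category/running-shoes/nike']
print(audit.post_forms)  # prints ['/category/running-shoes/filter']
```

If a facet you meant to hide shows up in `followable`, some template is still rendering it as an anchor.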

How it compares to the alternatives

| Metric | Robots.txt block | AJAX navigation | Post-Redirect-Get |
| --- | --- | --- | --- |
| Crawl budget | Stops the crawl, but link equity into blocked URLs is wasted | Reduces crawling, leaks if filter state becomes a real URL | Filter pathway never enters the crawl at all |
| Indexation control | Blocked URLs can still appear as bare links | Strong while state stays in fragments only | Only clean GET targets are eligible, full control |
| Link equity | Equity into disallowed URLs goes nowhere | Internal link paths preserved | Consolidated onto the indexable paths you choose |
| User experience | Standard browser routing | Zero-reload dynamic refresh | Standard routing with bookmarkable, shareable URLs |

No single method wins on every row. Robots.txt is the fastest thing to ship and the right tool when you have nothing else and need bleeding stopped today. AJAX is the smoothest experience when it is built to keep state out of the URL. Post-Redirect-Get is the one that gives you protocol-level certainty about what a crawler can and cannot reach, at the cost of more engineering up front.

For a small catalog with a handful of filters, that engineering is overkill, and Google itself says active crawl-budget management is a large-site and fast-changing-site concern. For a large ecommerce property where filtered URLs already outnumber product pages by an order of magnitude, the POST wall is the difference between Googlebot discovering your new products this week and finding them next month.
