Post-Redirect-Get for Faceted Navigation Crawl Control


When we pull server logs for a new ecommerce client, the first number we look at is the share of crawler hits landing on filter and sort URLs that no human would ever type or bookmark. It is usually the majority. One 2026 audit set found that four out of five ecommerce sites burned more than 60% of their Googlebot requests on exactly this kind of combinatorial junk, and the sites we inherit rarely look better. A catalog of five thousand products routinely shows a crawlable URL space in the millions once you count every filter, sort, and pagination permutation.

Faceted navigation causes this, and it is not a bug you can design away, because the feature genuinely helps shoppers narrow a large catalog. Google describes the default URL-parameter implementation in its own guidance on faceted navigation as generating "infinite URL spaces" that hurt a site in two ways. Crawlers overcrawl, because they cannot tell a useful filter combination from a useless one without fetching it first, and discovery of genuinely new pages slows down, because every request spent on ?color=green&size=tiny&sort=price is a request not spent on a product you actually want indexed. The same waste now hits Bingbot and the AI crawlers feeding ChatGPT and Perplexity, which makes filter bloat a problem across every engine that matters, not just Google.

Most teams reach for a fix that controls indexing but still pays for the crawl. Post-Redirect-Get works further down the stack, at the protocol layer, by relying on one fixed fact about how every mainstream crawler behaves. We do not use it everywhere, and the second half of this piece is about when it is the wrong tool. But on a large catalog where filter URLs already outnumber products by an order of magnitude, it is the most reliable crawl-control method we have.

The math behind the bloat

Picture a normal category, not a giant one. Eight colors, six sizes, twenty brands, five price bands, three sort orders, and twelve pages of results. The filterable combinations run into the tens of thousands before pagination, and pagination multiplies all of it. The popular formula C = 2^n, where n is the number of binary facets, is only a floor, because real facets carry several values each. The true count is the product, across all facets, of each facet's value count plus one for its off state. A modest fifteen-attribute category with eight values per attribute comes to 9^15 combinations on paper, on the order of 200 trillion.
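
The arithmetic is easy to check. The sketch below reproduces the counts for the example category, under two assumptions of ours: each facet is single-select (one value or off), and every filtered view paginates to the full twelve pages. Multi-select filters would push the totals higher, and the variable names are illustrative only.

```python
from math import prod

# Value counts for the example category described above.
facet_values = {"color": 8, "size": 6, "brand": 20, "price_band": 5, "sort": 3}
pages = 12

# Binary-facet floor: every facet is treated as simply on or off.
binary_floor = 2 ** len(facet_values)

# Single-select count: each facet takes one of its values or stays off.
single_select = prod(count + 1 for count in facet_values.values())

# Pagination multiplies every filtered view (upper bound: 12 pages each).
with_pagination = single_select * pages

# Fifteen attributes with eight values each, same single-select assumption.
fifteen_by_eight = (8 + 1) ** 15

print(binary_floor)      # 32
print(single_select)     # 31752
print(with_pagination)   # 381024
print(fifteen_by_eight)  # 205891132094649
```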

The exact number is a poor planning tool, though. What matters operationally is not how many URLs could exist but how many a crawler is actually willing to fetch before it gives up on the rest of your site for the day. That is the figure you find in the logs, and it is the one Post-Redirect-Get is designed to move.

Why the common fixes leak

The noindex meta tag is the reflex most teams start with, and it controls the wrong thing. It keeps a page out of the index, but a crawler has to fetch the URL and parse the HTML before it can read the directive, so the crawl budget is already spent by the time indexing is suppressed. You end up with clean index coverage and the same crawl bill you were trying to avoid.

A robots.txt disallow makes the opposite trade and is the right instinct, which is why Google recommends it as the primary tool when you do not need filter URLs indexed. The catch is that a blocked URL can still surface in search as a bare link with no snippet when other pages link to it, and any internal or external link equity pointing at a disallowed URL has nowhere to flow. Re:signal's Kevin Gibbons makes the same point in Search Engine Journal, where he calls robots.txt "more of a polite request than a strict rule" and notes that linked pages can still get picked up.

A rel="canonical" tag is a consolidation hint rather than a crawl control, and Google's documentation is candid that canonical and rel="nofollow" are "generally less effective in the long term" than blocking, with any reduction in crawl volume on the non-canonical variants arriving slowly if at all. The crawler keeps fetching the parameter-bloated versions in the meantime to confirm they still point home.

AJAX filtering looks like the clean answer because it updates the page in the browser without a full reload, and it does keep URLs out of the crawl right up until the moment your framework writes filter state into the address bar with pushState. At that point you have manufactured a real crawlable link, and since the major search engines render JavaScript, a client-side approach that leaks URLs leaks them to bots fully capable of reading them. State kept after the # as a fragment is genuinely ignored, but most modern filter interfaces do not stop at fragments.

None of these four is wrong, exactly. Each leaves a gap, and Post-Redirect-Get closes that gap by never creating the crawlable URL in the first place.
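
The fragment-versus-pushState difference can be shown in a few lines. This is a simplified model, not any engine's actual normalization code: it assumes a crawler drops the fragment before fetching, since fragments are never sent to the server, and the shop.example URLs are invented for illustration.

```python
from urllib.parse import urldefrag

base = "https://shop.example/category/running-shoes"

# Filter state kept after the # versus written into the query string by pushState.
fragment_states = [f"{base}#color=green", f"{base}#color=green&size=42"]
pushstate_states = [f"{base}?color=green", f"{base}?color=green&size=42"]

def distinct_fetchable(urls):
    """Return the distinct URLs left once fragments are dropped."""
    return {urldefrag(url).url for url in urls}

print(len(distinct_fetchable(fragment_states)))   # 1
print(len(distinct_fetchable(pushstate_states)))  # 2
```

Both fragment states collapse to the one category URL, while every pushState combination is a separate URL a crawler can request.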

The crawler behavior the pattern relies on

Search crawlers fetch links with HTTP GET and do not submit POST forms as a matter of routine. Google has restricted itself to GET for crawling since it first experimented with form crawling in 2008, and the rare exception, where it types a keyword into something that looks like a search box, is limited to GET forms with a couple of fields and no personal-information signals. Bingbot behaves the same way, fetching pages, images, and assets over GET, and the AI crawlers do too. POST submissions are simply not part of how any of them discover content.

That single fact is the whole foundation. If a filter is triggered by a POST request, the crawler never sees the destination, because it never sends the request that would reveal it. The filter pathway becomes invisible to discovery without a robots.txt line, a noindex tag, or any directive the crawler has to spend a fetch to read.
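
A toy model makes the point concrete. The parser below is our own simplification of link discovery, not how any real crawler is implemented: it collects anchor hrefs and ignores forms entirely. The sample markup is hypothetical.

```python
from html.parser import HTMLParser

class GetOnlyDiscovery(HTMLParser):
    """Toy link discovery: follows anchors, never submits forms."""

    def __init__(self):
        super().__init__()
        self.discovered = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.discovered.append(attributes["href"])
        # A POST form is skipped: the filter values live in the request
        # body, so there is no filtered URL here to queue for a GET.

page = """
<a href="/category/running-shoes">Running shoes</a>
<a href="/category/running-shoes?color=green">Green (anchor link)</a>
<form method="post" action="/category/running-shoes/filter">
  <button name="color" value="green">Green (POST button)</button>
</form>
"""

crawler = GetOnlyDiscovery()
crawler.feed(page)
print(crawler.discovered)
# ['/category/running-shoes', '/category/running-shoes?color=green']
```

The anchor version of the green filter is queued for crawling. The button version exposes nothing to queue.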

The four-step lifecycle

Post-Redirect-Get turns one filter click into a short, deliberate sequence of HTTP exchanges. Walk through a single color-filter click, start to finish.

The facet controls are real form elements styled to look like ordinary links or checkboxes, and they submit with method="post", so the selected parameters travel in the request body rather than the URL.

<form method="post" action="/category/running-shoes/filter">
  <button name="color" value="green">Green</button>
  <button name="size" value="42">Size 42</button>
</form>

The server reads the filter criteria, works out the canonical destination, and replies with an HTTP 303 See Other whose Location header points at the clean filtered view.

HTTP/1.1 303 See Other
Location: /category/running-shoes/green/42

The browser obeys that redirect automatically and issues a fresh GET for the target, which is the step the next section is about. The server then returns the filtered category page with a 200 OK under a clean, shareable, bookmarkable URL, and the shopper gets a normal page they can copy, link, and reload. A crawler never sends the POST, so the only part of the exchange it can ever see is the final GET target, and only if something links to it. The user experience stays standard browser routing, with a full page, a real URL, and a working back button, while the crawl-control work happens entirely below the surface.
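
The server side of that lifecycle can be sketched with nothing but the Python standard library. This is a minimal sketch under our own assumptions, not a production handler: the FACET_ORDER tuple, the filter_redirect_target helper, and the FilterHandler class are illustrative names, and the form action is assumed to end in /filter as in the markup above.

```python
from html import escape
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, quote

# Assumed fixed facet order, so one filter set always maps to one path.
FACET_ORDER = ("color", "size")

def filter_redirect_target(action_path: str, form_body: str) -> str:
    """Map a POSTed filter form to the clean GET path for the filtered view."""
    base = action_path.removesuffix("/filter")
    chosen = parse_qs(form_body)
    segments = [
        quote(chosen[name][0], safe="") for name in FACET_ORDER if name in chosen
    ]
    return "/".join([base, *segments])

class FilterHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the filter criteria from the request body, then answer with 303.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        self.send_response(303)
        self.send_header("Location", filter_redirect_target(self.path, body))
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # The browser's follow-up GET receives an ordinary 200 page.
        page = f"<h1>Filtered view: {escape(self.path)}</h1>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(page)))
        self.end_headers()
        self.wfile.write(page)

print(filter_redirect_target("/category/running-shoes/filter", "color=green&size=42"))
# /category/running-shoes/green/42
```

To try it locally, pass FilterHandler to http.server.HTTPServer and call serve_forever(). A real implementation would also validate facet values against the catalog before building the path.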

Why 303 specifically and not 302 or 307

The redirect code is not interchangeable, and picking the wrong one breaks the pattern. Per the HTTP specification documented by MDN, a 303 response changes the request method to GET regardless of what triggered it, which is exactly the behavior you want after a POST, because the browser drops the POST body and fetches the destination cleanly. A 307 does the reverse and preserves the original method and body, so the browser would re-POST to the destination and could resubmit on refresh, which defeats the purpose. A 302 has been ambiguous historically, with clients differing on whether they switch to GET, and 303 was added in HTTP/1.1 precisely to remove that ambiguity. RFC 9110 carries the current wording, so the rule is short. Use 303.
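
As a compact reference, the method rules can be written as one function. This is our own summary of the RFC 9110 redirect semantics for illustration, not code taken from any browser or HTTP library.

```python
def method_after_redirect(status: int, original_method: str) -> str:
    """Describe the method a client uses when following a redirect."""
    if status == 303:
        # 303 always becomes a retrieval request: GET, or HEAD if it was HEAD.
        return "HEAD" if original_method == "HEAD" else "GET"
    if status in (307, 308):
        # 307 and 308 must keep the original method and body.
        return original_method
    if status in (301, 302):
        # The spec lets clients rewrite POST to GET but does not require it.
        if original_method == "POST":
            return "GET or POST, depending on the client"
        return original_method
    raise ValueError(f"{status} is not a redirect status covered here")

print(method_after_redirect(303, "POST"))  # GET
print(method_after_redirect(307, "POST"))  # POST
print(method_after_redirect(302, "POST"))  # GET or POST, depending on the client
```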

The implementation rules that decide whether it holds

The pattern itself is simple, and making it behave on a real catalog comes down to a few rules we apply every time.

Set an inventory threshold before any facet combination is allowed to become an indexable page. A filtered view backed by two products is thin content that earns nothing and risks a soft 404, so we keep a floor of at least three active products before a combination gets its own clean crawlable URL, and anything below the floor stays behind POST.

Decide which facets deserve to be found at all, because not every filter should be hidden. Brand, product type, and popular attributes attract real search demand and belong at clean, crawlable, indexable paths, while sort order, availability, and oddball value mixes attract nobody and belong behind the POST wall. Gibbons sorts facets into the same three buckets in his SEJ piece, index, noindex, and block, and Post-Redirect-Get is how you enforce the block bucket without spending crawl budget to do it. This decision sets your organic ceiling, so it is worth more care than the engineering that follows it.

For the facets you do want indexed, rewrite the URLs into clean readable paths, so /running-shoes/green/size-42 rather than ?color=green&size=42. Google asks for the standard & separator when you must use parameters and for a consistent filter order so that /green/42 and /42/green never resolve to two URLs for one set, so pick an order, enforce it server-side, and keep it stable.
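
One way to enforce a stable order is to build every path from a fixed list, whatever order the shopper picked the filters in. The facet order and segment templates below are hypothetical choices of ours. The point is only that they are defined once, server-side.

```python
# Hypothetical facet order and segment formats; define them once and keep them stable.
FACET_SEGMENTS = (("color", "{}"), ("size", "size-{}"))

def canonical_path(category: str, selected: dict[str, str]) -> str:
    """Build the single clean path for a set of selected facets."""
    segments = [
        template.format(selected[name])
        for name, template in FACET_SEGMENTS
        if name in selected
    ]
    return "/" + "/".join([category, *segments])

# The same filter set yields the same path regardless of selection order.
print(canonical_path("running-shoes", {"color": "green", "size": "42"}))
# /running-shoes/green/size-42
print(canonical_path("running-shoes", {"size": "42", "color": "green"}))
# /running-shoes/green/size-42
```

A request that arrives at a differently ordered path can then be redirected to whatever this function returns.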

Handle the empty set honestly. When a filter combination returns nothing, Google's guidance is explicit that you serve a real 404 at that URL rather than a 200 with a "no results" message or a redirect to a generic error page, and the same goes for nonsensical or duplicate combinations. A clean 404 tells the crawler the door leads nowhere and to stop trying it.
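
Taken together, the inventory floor, the facet buckets, and the empty-set rule amount to one routing decision per filter combination. The sketch below shows one way to encode it. The set of indexable facets and the Route names are assumptions made for illustration, and the floor of three matches the threshold described above.

```python
from enum import Enum

class Route(Enum):
    INDEXABLE_PATH = "serve 200 at a clean crawlable URL"
    POST_ONLY = "keep behind the POST form"
    NOT_FOUND = "serve 404"

# Assumed list of facets with real search demand; every site decides its own.
INDEXABLE_FACETS = {"brand", "product_type", "color"}
MIN_ACTIVE_PRODUCTS = 3

def route_for(selected_facets: set[str], active_products: int) -> Route:
    """Decide how one filter combination is exposed."""
    if active_products == 0:
        return Route.NOT_FOUND  # empty set: a real 404, not a 200 "no results"
    if not selected_facets <= INDEXABLE_FACETS:
        return Route.POST_ONLY  # includes a facet nobody searches for, e.g. sort
    if active_products < MIN_ACTIVE_PRODUCTS:
        return Route.POST_ONLY  # thin content, below the inventory floor
    return Route.INDEXABLE_PATH

print(route_for({"brand"}, 40))          # Route.INDEXABLE_PATH
print(route_for({"brand", "sort"}, 40))  # Route.POST_ONLY
print(route_for({"brand", "color"}, 2))  # Route.POST_ONLY
print(route_for({"color"}, 0))           # Route.NOT_FOUND
```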

Where it is the wrong tool

Post-Redirect-Get is a crawl-control technique with real costs, and we talk clients out of it as often as we recommend it. The extra redirect hop adds a round trip to every filter action, invisible on a fast stack and noticeable on a slow one, so the filtered GET target has to be quick and cacheable or the whole interface feels sluggish. Accessibility needs attention too, because form-driven filtering has to degrade to something operable for users on assistive technology or without JavaScript, which means the controls must be genuine semantic forms and not divs wired up with click handlers.

Analytics shifts as well, since a POST-then-redirect flow does not register as a normal link click, so filter-usage tracking has to hook into the form submission or the resulting pageview rather than anchor clicks. And because the pattern deliberately keeps combinations out of the crawl, you forfeit the chance to rank the long-tail combinations you hid, which is the right call for price=low&sort=newest and the wrong call for nike-running-shoes. That is why the indexable-path decision earlier matters so much, because the pattern will enforce whatever you choose, good judgment or bad.

For a small catalog with a handful of filters, none of this engineering is worth it, and Google itself frames active crawl-budget management as a concern for large or fast-changing sites. Reach for robots.txt first when you simply need the bleeding stopped today. Post-Redirect-Get earns its complexity on the large ecommerce property where filtered URLs already swamp the product pages, and there the difference shows up as Googlebot and Bingbot finding your new products this week instead of next month.

Proving it works

Ship the pattern, then prove it from the logs, because log file analysis is the only ground truth here. Pull your server logs, filter to each crawler in turn, and confirm the parameter and POST-target patterns have dropped out of the crawl while your clean indexable paths are still being fetched. If a bot is still hammering filtered URLs, something is leaking, almost always an internal link or a sitemap entry still pointing at a URL the POST flow was supposed to bury.
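
A first pass at that check does not need a log platform. The sketch below assumes access logs in the common combined format and treats any request with a query string as a filter URL. Both are assumptions to adjust for your own stack. The sample lines are made up, and matching on the user-agent string alone does not verify that a hit really came from the named crawler.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Request line, status, bytes, referrer, user agent (combined log format).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BINGBOT = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

# Made-up sample lines for illustration.
sample_log = [
    f'66.249.66.1 - - [10/Mar/2026:06:25:14 +0000] "GET /category/running-shoes?color=green&sort=price HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Mar/2026:06:25:15 +0000] "GET /category/running-shoes?size=42 HTTP/1.1" 200 5090 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Mar/2026:06:25:17 +0000] "GET /category/running-shoes/green HTTP/1.1" 200 6200 "-" "{GOOGLEBOT}"',
    f'157.55.39.1 - - [10/Mar/2026:06:26:02 +0000] "GET /category/running-shoes HTTP/1.1" 200 7300 "-" "{BINGBOT}"',
    '203.0.113.9 - - [10/Mar/2026:06:27:40 +0000] "GET /category/running-shoes?color=green HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

def crawl_breakdown(lines, bot_token):
    """Count one crawler's hits on filter URLs versus clean paths."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or bot_token.lower() not in match["agent"].lower():
            continue
        has_query = bool(urlsplit(match["target"]).query)
        counts["filter" if has_query else "clean"] += 1
    return counts

print(crawl_breakdown(sample_log, "Googlebot"))  # Counter({'filter': 2, 'clean': 1})
print(crawl_breakdown(sample_log, "bingbot"))    # Counter({'clean': 1})
```

Run the same breakdown before and after launch. The filter count for each bot should fall toward zero while the clean count holds or grows.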

Back that up with a crawl of your own. Run Screaming Frog against the site and confirm the facet controls do not surface as followable links, since a POST button should not appear in the outlink report the way an anchor does. Then watch the discovered-URL counts fall over the following weeks in both Google Search Console and Bing Webmaster Tools, and run a site: query on each engine to confirm the parameter junk is leaving the index rather than entering it. Bing Webmaster Tools matters more than teams assume now, because the Bing index is what grounds ChatGPT, so a filter mess that confuses Bingbot also costs you AI visibility on top of crawl budget.

How it compares to the alternatives

Consideration	Robots.txt block	AJAX navigation	Post-Redirect-Get
Crawl budget	Stops the crawl, but equity into blocked URLs is wasted	Reduces crawling, leaks once filter state becomes a real URL	Filter pathway never enters the crawl at all
Indexation control	Blocked URLs can still appear as bare links	Strong while state stays in fragments only	Only clean GET targets are eligible, so control is full
Link equity	Equity into disallowed URLs goes nowhere	Internal link paths preserved	Consolidated onto the indexable paths you choose
User experience	Standard browser routing	Zero-reload dynamic refresh	Standard routing with bookmarkable, shareable URLs
Effort to ship	Lowest, a few lines	Moderate, depends on the framework	Highest, server-side forms and redirects

No method wins every row. Robots.txt is the fastest thing to ship and the right tool when you need waste stopped today and have nothing else in place. AJAX gives the smoothest experience as long as it is built to keep state out of the URL. Post-Redirect-Get is the one that buys protocol-level certainty about what a crawler can and cannot reach, at the cost of more engineering up front, and on a catalog where filter URLs already outnumber products it is the trade we make almost every time.

If filter bloat is eating your crawl across Google and Bing, see where your site stands in an audit or talk to us about implementing the POST wall without losing the facets that actually earn search traffic.