Information Architecture for Ecommerce Category Pages


Take any ecommerce store and rank its pages by the revenue they touch. Product pages convert, but they rank for narrow branded and model queries. The home page pulls navigational traffic that was already coming. The category page is the one asset that ranks for the high-intent commercial head term, the query a buyer types when they know what they want but not which one to buy. "Running shoes." "Air conditioner 1 PK." "Office chair ergonomic." That is where the money enters, and on most stores it is also the page nobody owns. It inherits whatever structure the platform shipped with, and that structure quietly decides whether the page ranks at all.

The reason category architecture is mishandled so often is that it sits between two teams that rarely talk. The merchandising team designs the filters for shoppers. The platform vendor ships whatever URL behaviour was easiest to build. Neither is thinking about how a crawler reads the outcome, which usually lands in one of two failure modes: a clean category page surrounded by millions of crawlable filter URLs that drown it, or a thin category page with no path to the genuine long-tail demand sitting one filter deep. Both are architecture problems, and both are fixable before a single line of content is written.

A category page is doing three jobs at once

The trap is treating the category page as one thing. It is three.

For the shopper, it is a filtering surface. They land on "women's dresses" and immediately narrow by size, occasion, and price until the grid is small enough to scan. Faceted navigation exists for them, and it is not optional; a store that buries its catalogue behind weak filters loses the sale to one that does not.

For the crawler, the same page is a hub that distributes authority. Google does not read your URL folders to understand your site; it reads the links between pages to infer which ones matter. Google's own guidance on ecommerce site structure is explicit that pages earn importance through the links pointing at them, from menus to category pages, then down to subcategories and products. The category page is the junction where that authority either flows cleanly to the pages that should rank or leaks into a maze of filter combinations that should not. The same structural clarity is what lets AI answer engines resolve which URL represents a category when they assemble a response, which is why optimisation for AI search starts from the architecture rather than running as a separate project.
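To make "leaks" concrete, here is a minimal sketch using the classic PageRank formula as a stand-in. This is a simplification chosen for illustration, not a description of how Google actually weighs links. The toy site and its page names (`home`, `category`, `subcategory`, `filter-N`) are hypothetical: the same category either links only to one promoted subcategory or also to 50 crawlable filter URLs.

```python
# Simplified model: classic PageRank by power iteration. Google's real
# ranking signals are not public; this only illustrates link dilution.

def pagerank(links: dict, damping: float = 0.85, iterations: int = 100) -> dict:
    """Score each page; `links` maps a page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Pages with no outlinks spread their score evenly (standard fix).
        dangling = sum(rank[p] for p in pages if not links[p])
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new[target] += damping * rank[page] / len(targets)
        rank = new
    return rank


def toy_site(filter_links: int) -> dict:
    """Hypothetical site: home -> category -> one promoted subcategory,
    plus `filter_links` crawlable filter URLs that all link back up."""
    filters = [f"filter-{i}" for i in range(filter_links)]
    graph = {
        "home": ["category"],
        "category": ["subcategory", *filters],
        "subcategory": ["category"],
    }
    for f in filters:
        graph[f] = ["category"]
    return graph


clean = pagerank(toy_site(0))
leaky = pagerank(toy_site(50))
print(f"subcategory, no filter links: {clean['subcategory']:.3f}")
print(f"subcategory, 50 filter links: {leaky['subcategory']:.3f}")
```

In the leaky version the category splits whatever it passes on across 51 links instead of one, so the subcategory built to rank receives only a small fraction of what it got in the clean version.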

For the long-tail searcher, a slice of that same page is a landing page in waiting. "High-rise skinny jeans" is a real query with real volume, and it lives one filter deep inside the "jeans" category. Whether that slice becomes an indexable, rankable page or stays an ephemeral filter state is the single highest-leverage architecture decision on the whole template.

The work is to serve all three without letting one sabotage the others. The shopper wants every filter combination available instantly. The crawler needs almost none of them in the index. The long-tail searcher needs a specific, deliberately chosen few promoted to real pages. Architecture is how you reconcile those three demands on purpose instead of by accident.

The first move is not technical. It is research, and skipping it is why most faceted setups go wrong. Before deciding which filters generate indexable pages, you have to know which filter combinations correspond to real search demand and which do not.

The method is concrete. Take a parent category, "office chairs," and pull its keyword universe: the matching terms, the modifier patterns, the grouped parent topics. You are looking for the modifiers that buyers actually search as standalone queries. "Ergonomic office chair" has demand. "Mesh office chair" has demand. "Office chair with lumbar support" has demand. "Blue office chair under 2 million rupiah with armrests and a headrest" has none, because nobody searches a five-attribute string. The pattern holds across every catalogue: a small number of single-attribute and occasionally two-attribute filters carry almost all of the long-tail demand, and the deep combinations carry effectively zero.

This matters more than it looks because of how search demand is shaped. Ahrefs' analysis of its keyword database found that the overwhelming majority of search terms, roughly 95 percent, draw ten or fewer searches a month each, yet the long tail in aggregate is a large share of all demand. The lesson is not "chase everything." It is that the demand worth capturing through facets is the band of modifiers that have enough volume to justify a real indexable page, and your keyword map is what tells you where that band ends. Every facet above the line becomes a candidate landing page. Everything below it stays a filter state and nothing more.
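The "band" can be expressed as two thresholds applied to a keyword export: how many facet values a query stacks, and how much combined volume the queries mapping onto that combination carry. The sketch below assumes you have already tagged each keyword with the facet values it corresponds to; the `Keyword` type, the threshold defaults, and every volume figure are hypothetical and made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Keyword:
    phrase: str
    attributes: tuple[str, ...]  # facet values the query maps onto
    monthly_volume: int


def demand_band(keywords, min_volume=100, max_attributes=2):
    """Return {facet combination: summed monthly volume} for combinations
    that stack at most `max_attributes` values and clear `min_volume`."""
    totals: dict[tuple[str, ...], int] = {}
    for kw in keywords:
        # No attributes = the parent head term; too many = a deep stack.
        if not kw.attributes or len(kw.attributes) > max_attributes:
            continue
        combo = tuple(sorted(kw.attributes))
        totals[combo] = totals.get(combo, 0) + kw.monthly_volume
    return {combo: vol for combo, vol in totals.items() if vol >= min_volume}


# Illustrative, made-up volumes for the "office chairs" category.
keywords = [
    Keyword("office chair", (), 9900),
    Keyword("ergonomic office chair", ("ergonomic",), 2400),
    Keyword("office chair ergonomic", ("ergonomic",), 900),
    Keyword("mesh office chair", ("mesh",), 1300),
    Keyword("office chair with lumbar support", ("lumbar-support",), 700),
    Keyword("mesh ergonomic office chair", ("ergonomic", "mesh"), 40),
    Keyword(
        "blue office chair under 2 million rupiah with armrests and a headrest",
        ("blue", "under-2m", "armrests", "headrest"),
        0,
    ),
]

band = demand_band(keywords)
for combo, volume in sorted(band.items(), key=lambda item: -item[1]):
    print(volume, combo)
```

Summing volumes per combination matters because buyers phrase the same intent several ways ("ergonomic office chair", "office chair ergonomic"); judged separately, a combination with real demand could fall below the line.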

Done properly, this map becomes the specification for the entire architecture. It tells the merchandising team which subcategories deserve their own URL and their own copy, and it tells the engineering team which filter patterns to allow into the index and which to wall off. Get the map right and the technical decisions downstream become mechanical.

The indexation decision that governs everything

Once you know which filter combinations have demand, every URL the navigation can generate falls into one of three buckets. The discipline is assigning each one deliberately rather than letting the platform default decide.
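One way to make the assignment deliberate is to encode it as a single function that every filter state passes through, so the platform default never decides. This is a minimal sketch under stated assumptions: the `PROMOTED` set stands in for the keyword map's output, and the blocked parameter names and `MAX_STACK` depth are hypothetical choices, not platform settings.

```python
from __future__ import annotations

from enum import Enum


class Bucket(Enum):
    PROMOTE = "promote"  # real demand: static URL, indexable, in the sitemap
    CONTAIN = "contain"  # useful to shoppers, never given a crawlable link
    BLOCK = "block"      # should not be crawlable at all


# Hypothetical outputs of the keyword map, stored as sorted "facet:value" tuples.
PROMOTED = {
    ("capacity:1-pk",),
    ("capacity:2-pk",),
    ("brand:acme", "capacity:1-pk"),
}
BLOCKED_PARAMS = {"sort", "sessionid", "colour", "price"}  # assumed names
MAX_STACK = 3  # assumed: deeper attribute stacks are never worth crawling


def classify(filters: dict[str, str]) -> Bucket:
    """Assign one filter state (facet -> selected value) to a bucket."""
    if not BLOCKED_PARAMS.isdisjoint(filters) or len(filters) > MAX_STACK:
        return Bucket.BLOCK
    combo = tuple(sorted(f"{facet}:{value}" for facet, value in filters.items()))
    return Bucket.PROMOTE if combo in PROMOTED else Bucket.CONTAIN


print(classify({"capacity": "1-pk"}))
print(classify({"capacity": "1-pk", "efficiency": "5-star"}))
print(classify({"capacity": "1-pk", "sort": "price-asc"}))
```

Sorting the combination before the lookup means `brand + capacity` and `capacity + brand` resolve to the same decision, which matters because platforms often emit the same filters in either order.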

Promote: real demand becomes a real page

A filter combination with verified search volume should not stay a filter state. It should be a proper subcategory page with a clean, static URL, its own title and heading, and a short block of unique copy. The target shape is `/jeans/high-rise/` and never `/jeans/?fit=high-rise`. Google's ecommerce URL structure guidance favours readable, stable paths precisely because they are easier to crawl, easier to link, and easier for a human to trust in a result. These promoted pages get into the XML sitemap, get linked from the parent category body, and get treated as first-class landing pages, because that is what they are.
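The promote step has two mechanical parts: turning a category plus facet value into a static path, and listing only those paths in the XML sitemap. Below is a minimal sketch using the standard library and the sitemaps.org `urlset` format; the helper names and the `https://shop.example` domain are hypothetical, and a production sitemap would usually carry more than `<loc>`.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def slugify(value: str) -> str:
    """Lowercase and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")


def promoted_path(category: str, *facet_values: str) -> str:
    """Static path such as /jeans/high-rise/ instead of /jeans/?fit=high-rise."""
    segments = [slugify(category), *(slugify(v) for v in facet_values)]
    return "/" + "/".join(segments) + "/"


def sitemap_xml(base_url: str, paths: list) -> str:
    """Minimal sitemaps.org <urlset> listing only the promoted pages."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
    return ET.tostring(urlset, encoding="unicode")


paths = [promoted_path("Jeans", "High Rise"), promoted_path("Air Conditioner", "1 PK")]
print(paths)
print(sitemap_xml("https://shop.example", paths))
```

Generating the sitemap from the same promoted list that drives the paths keeps the two in sync: a filter state that is not promoted never reaches the sitemap by accident.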

Contain: useful to shoppers, invisible to the index

The large middle bucket is filter combinations that genuinely help shoppers but have no standalone search demand. "Samsung silver large-capacity quick-wash" is a useful narrowing for the person buying a washing machine and a worthless page for search. These should remain fully usable in the interface while being kept out of the index. The cleanest way to achieve that, as Google's documentation on managing faceted navigation lays out, is to not generate crawlable links to them at all: render the filtering with a method that updates the listings without minting a fresh `<a href>` for every combination. No crawlable link means no crawl, no index bloat, and no authority leaking into pages that will never rank.
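At the template level, "no crawlable link" can be as simple as rendering contained options as buttons rather than anchors. The sketch below assumes a hypothetical `render_facet_option` helper and a client-side script (not shown) that reads `data-filter` and refreshes the product grid in place; only promoted facets receive an `href`.

```python
from __future__ import annotations

from html import escape


def render_facet_option(label: str, filter_value: str, promoted_href: str | None) -> str:
    """Render one filter option in the facet sidebar.

    Promoted facets get a real <a href> that crawlers can follow.
    Contained facets get a <button> with no URL; client-side script
    (assumed, not shown) reads data-filter and refreshes the grid in place.
    """
    if promoted_href is not None:
        return f'<a href="{escape(promoted_href)}">{escape(label)}</a>'
    return (
        f'<button type="button" data-filter="{escape(filter_value)}">'
        f"{escape(label)}</button>"
    )


print(render_facet_option("1 PK", "capacity=1-pk", "/air-conditioner/1-pk/"))
print(render_facet_option("Silver", "colour=silver", None))
```

Shoppers see identical controls either way; the difference is only whether the markup hands a crawler a new URL to follow.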

Block: the combinations that should never have existed

The third bucket is the combinatorial explosion, the filter-on-filter-on-filter URLs that multiply into the millions and exist only because the platform allowed them to. Sorting orders, session parameters, and stacked attribute strings belong here. If these are already crawlable on your site, robots.txt disallow rules on the parameter patterns stop crawlers from spending their requests on them. If they are not yet crawlable, the answer is simply to never link them. The goal across both contain and block is the same: the only category URLs a crawler can reach are the ones you chose on purpose.
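Parameter-pattern rules are easy to get subtly wrong, so it helps to generate them and check them against sample URLs before deploying. This is a minimal sketch assuming hypothetical parameter names; the matcher is a simplified reading of RFC 9309 (`*` wildcard, trailing `$` anchor) and deliberately ignores user-agent groups, `Allow` precedence, and percent-encoding.

```python
import re

# Hypothetical parameter names for sort orders, session IDs and blocked filters.
BLOCKED_PARAMS = ["sort", "sessionid", "colour", "price"]


def robots_rules(params: list) -> str:
    """Disallow each parameter whether it appears first (?) or later (&)."""
    lines = ["User-agent: *"]
    for name in params:
        lines.append(f"Disallow: /*?{name}=")
        lines.append(f"Disallow: /*&{name}=")
    return "\n".join(lines)


def rule_matches(pattern: str, path_and_query: str) -> bool:
    """Simplified RFC 9309 matching: '*' matches any run of characters,
    a trailing '$' anchors the end, and the pattern matches from the start."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    match = re.fullmatch if anchored else re.match
    return match(regex, path_and_query) is not None


def is_blocked(robots_txt: str, path_and_query: str) -> bool:
    """True if any Disallow line matches; ignores groups and Allow precedence."""
    for line in robots_txt.splitlines():
        if line.startswith("Disallow:"):
            pattern = line.split(":", 1)[1].strip()
            if pattern and rule_matches(pattern, path_and_query):
                return True
    return False


rules = robots_rules(BLOCKED_PARAMS)
print(rules)
print(is_blocked(rules, "/air-conditioner/?capacity=1-pk&sort=price-asc"))
print(is_blocked(rules, "/air-conditioner/1-pk/"))
```

Writing two rules per parameter, one after `?` and one after `&`, avoids the looser `/*?*sort=` form, which would also catch an unrelated parameter such as `resort=`. Keep in mind that robots.txt controls crawling rather than indexing, which is why never linking these URLs in the first place remains the cleaner fix.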

The reason to be strict is that the cost is not neutral. Every crawlable filter URL competes for the same finite crawl attention and divides the internal authority of the page across links that lead nowhere useful. Left unmanaged, a single category template can generate hundreds of thousands of low-value URLs, and the pages you actually want ranked get a thinner and thinner slice of the site's authority.
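The scale follows from simple multiplication. The sketch below counts distinct filter states for six facets using option counts that are hypothetical, loosely modelled on the air-conditioner filters in the example that follows, and assumes each sort order produces its own URL variant.

```python
from math import prod

# Hypothetical option counts, loosely modelled on the six air-conditioner
# filters in the example that follows.
FACET_OPTIONS = {
    "brand": 12,
    "capacity": 6,
    "energy_efficiency": 4,
    "inverter": 2,
    "price_range": 6,
    "colour": 5,
}
SORT_ORDERS = 4  # assumed: each sort order is its own URL variant


def filter_states(options: dict, multi_select: bool = False) -> int:
    """Distinct filter states, excluding the unfiltered category page.

    Single-select: each facet is unset or one of n values -> n + 1 choices.
    Multi-select: each facet is any subset of its n values -> 2 ** n choices.
    """
    return prod(2 ** n if multi_select else n + 1 for n in options.values()) - 1


states = filter_states(FACET_OPTIONS)
urls = (states + 1) * SORT_ORDERS - 1  # every state x every sort, minus the category itself
print(f"{states:,} filter states, {urls:,} crawlable URL variants")
print(f"{filter_states(FACET_OPTIONS, multi_select=True):,} states with multi-select")
```

With these assumed counts, single-select filters already produce 57,329 states and 229,319 URL variants; allowing multi-select pushes the state count past 34 billion, before parameter-order permutations multiply it further.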

The example

Picture a mid-sized Indonesian electronics retailer with an "air conditioner" category. The platform shipped with filters for brand, capacity (PK rating), energy efficiency, inverter type, price range, and colour, every one of them generating a crawlable parameter URL. The audit finds 140,000 indexable filter URLs against a real catalogue of 600 products. Organic traffic to the category itself is flat, and the head term "AC 1 PK" sits on page two.
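An audit like this usually starts by grouping crawled URLs by the query parameters they carry, which shows at a glance which facets are generating the bulk of the surface. A minimal sketch using the standard library; the crawl rows and domain are hypothetical stand-ins for a real crawler export.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def audit_parameters(urls) -> Counter:
    """Count URLs by the sorted set of query-parameter names they carry."""
    counts: Counter = Counter()
    for url in urls:
        query = urlsplit(url).query
        names = sorted({name for name, _ in parse_qsl(query, keep_blank_values=True)})
        counts[tuple(names) or ("(no parameters)",)] += 1
    return counts


# Hypothetical crawl export rows.
crawl = [
    "https://shop.example/air-conditioner/",
    "https://shop.example/air-conditioner/?capacity=1-pk",
    "https://shop.example/air-conditioner/?capacity=1-pk&colour=white",
    "https://shop.example/air-conditioner/?colour=white&capacity=1-pk",
    "https://shop.example/air-conditioner/?capacity=1-pk&sort=price-asc",
]

for pattern, count in audit_parameters(crawl).most_common():
    print(count, pattern)
```

Grouping by the sorted parameter names also surfaces duplicates: the two `capacity` plus `colour` rows are the same filter state reached through different parameter orders.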

The keyword map changes the picture in an afternoon. Capacity is the dominant buyer modifier: "AC 1/2 PK," "AC 1 PK," "AC 2 PK" each carry meaningful monthly demand. Inverter is a real query too. Brand-plus-capacity has demand for the major brands. Everything else, the colour filters, the price bands, the four-attribute stacks, has none.

The rebuild promotes the capacity tiers and the top brand-plus-capacity pages to clean static subcategories with their own copy, roughly twenty pages in total. It contains every other useful filter behind a non-crawlable interaction so shoppers keep them. It blocks the colour, price, and sort parameters at robots.txt. The indexable surface drops from 140,000 URLs to a few hundred deliberate ones, the parent category stops competing with its own filter noise, and the promoted capacity pages start catching the long-tail demand that was previously dissolving across thousands of competing filter combinations. No new products, no new content budget, just authority pointed where demand already was. This is the kind of structural rework where the payoff is disproportionate to the effort, because the levers are all reallocation rather than new investment.

Where authority actually flows

The last piece is internal linking, and it is where the strategic and the tactical meet. Once the indexable set is clean, the parent category page should link to its promoted subcategories from within the body content, not only from a sidebar filter, using descriptive anchor text that names the subcategory. That body link is a far stronger signal of importance than a filter widget, and it is the mechanism by which the parent passes authority down to the pages built to rank.
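In the template, this can be a short paragraph of in-body links generated from the same promoted list that drives the URLs and the sitemap. A minimal sketch; the `subcategory_links` helper, the intro wording, and the paths are hypothetical.

```python
from html import escape


def subcategory_links(intro: str, promoted) -> str:
    """Render an in-body paragraph linking to promoted subcategories.

    `promoted` holds (anchor text, path) pairs; the anchor text names the
    subcategory rather than saying "click here" or "view more".
    """
    links = [f'<a href="{escape(path)}">{escape(text)}</a>' for text, path in promoted]
    return f"<p>{escape(intro)}: " + ", ".join(links) + ".</p>"


html_block = subcategory_links(
    "Shop by capacity",
    [
        ("AC 1/2 PK", "/air-conditioner/half-pk/"),
        ("AC 1 PK", "/air-conditioner/1-pk/"),
        ("AC 2 PK", "/air-conditioner/2-pk/"),
    ],
)
print(html_block)
```

Driving these links from the promoted list means that adding or retiring a subcategory updates the body links automatically, so the parent never points authority at a page that was demoted.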

This is also where category architecture stops being a purely technical concern and becomes a competitive one. A store that has mapped its demand, promoted the right twenty pages, and walled off the rest is concentrating its entire site's authority onto a small, deliberate set of commercial URLs. A competitor running the platform defaults is spreading the same authority across hundreds of thousands of junk URLs. Over a year, that difference compounds into rankings that look like a budget gap and are actually an architecture gap, the same pattern we have written about in several teardowns of how smaller teams outrank larger ones. The structure is the moat.

For enterprise teams, the takeaway to carry upstairs is simple to state and hard to argue with. Category architecture is not a UX detail to be left to the platform vendor; it is the layer that decides how much of your hard-won authority reaches the pages that earn revenue. It deserves to be specified deliberately, audited regularly, and owned by someone. If your category pages are not ranking for their head terms, the content is rarely the problem. The architecture underneath them almost always is, and if you want that layer rebuilt to point authority where your demand already sits, Search Agency does exactly this as specialist, measurable technical SEO work.