Methodology, version 1.0 (May 2026)

The methodology we publish openly instead of keeping it proprietary.

How Search Agency measures whether a brand is winning the answer inside AI assistants. The prompt sets, the assistants we cover, the four KPIs, the formulas, and the limitations. Published openly so enterprise buyers can evaluate the rigor before they sign, and so the rest of the industry can build on it.

Why publish

Why we publish what most agencies keep behind a login.

Most agencies treat measurement methodology as a trade secret. They show clients a dashboard and ask them to trust the numbers behind it. That works in a category where the buyer already understands the underlying mechanics, like classic SEO. It does not work for AI visibility, because the category is new and the buyer has no reference point for what good measurement looks like.

We publish ours for three reasons.

The first is trust. Enterprise procurement teams cannot evaluate a vendor whose method they cannot see. Black-box measurement makes a CMO nervous because the dashboard could be measuring anything. Open measurement lets the buyer's analyst, agency partner, or board ask hard questions before the contract is signed, and lets the agency answer them with citation rather than salesmanship.

The second is durability. AI assistants change. Models update, citation formats shift, new players arrive. A methodology document that lives in public can be versioned, debated, and improved by anyone who works in the space. A methodology that lives in a slide deck cannot. We would rather be wrong in public and correct quickly than be right in private and never know.

The third is citation. AI engines reward citable, definitional content. By publishing definitions, formulas, and worked examples for AI visibility measurement, this page is written to become one of the sources an AI assistant pulls from when a user asks how to measure AI search performance. The method we use to measure citation becomes the thing that gets cited. That recursion is the point.

Prompt sets

How we choose the prompts we track for a client.

Every engagement starts with a prompt set of 20 to 50 named questions. We choose them with the client, anchor them to real buyer behavior, and keep them stable across measurement cycles so trends are interpretable. We cluster prompts into five categories.

Category 01
Brand discovery
Direct questions about the client's brand and category position. "What is [brand] known for?" "Is [brand] a good fit for enterprise?"
Category 02
Category comparison
Comparative questions against named competitors. "Best alternatives to [competitor]." "Which is better for X, [brand] or [competitor]?"
Category 03
Product specification
Feature-level questions where a customer is evaluating fit. "Which platform handles [requirement]?" "Recommend a tool for [job]."
Category 04
Problem solving
Unbranded questions framing the customer's underlying problem. "How do I solve X?" The questions a buyer asks before they know the brands.
Category 05
Decision validation
Late-funnel questions a buyer asks an AI to validate a shortlist. "Is [brand] worth the price?" "What do customers complain about?"

Branded prompts (categories 1 and 2) usually represent 30 to 40 percent of the set. Problem-solving prompts (category 4) are weighted higher in early-funnel categories. Prompt sets are revisited every quarter.
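The composition norms above (20 to 50 prompts, branded categories 1 and 2 at roughly 30 to 40 percent of the set) are concrete enough to check mechanically whenever a prompt set is revisited. The sketch below is a minimal illustration, assuming each prompt is stored as a hypothetical (text, category number) pair; because the text says branded prompts "usually" fall in that band, it returns warnings rather than rejecting the set.

```python
from collections import Counter

# Categories 1 (brand discovery) and 2 (category comparison) count as branded.
BRANDED_CATEGORIES = {1, 2}
VALID_CATEGORIES = {1, 2, 3, 4, 5}


def check_prompt_set(prompts: list[tuple[str, int]]) -> list[str]:
    """Return warnings for a prompt set given as hypothetical (prompt_text, category) pairs.

    The 20-50 size and 30-40 percent branded band are the stated norms;
    a non-empty result is a prompt for review, not a hard failure.
    """
    warnings: list[str] = []
    n = len(prompts)
    if not 20 <= n <= 50:
        warnings.append(f"set has {n} prompts; expected 20 to 50")
    if n == 0:
        return warnings

    counts = Counter(category for _, category in prompts)
    unknown = set(counts) - VALID_CATEGORIES
    if unknown:
        warnings.append(f"unknown categories: {sorted(unknown)}")

    # Share of the set made up of branded prompts, in percent.
    branded = sum(counts[c] for c in BRANDED_CATEGORIES)
    share = 100 * branded / n
    if not 30 <= share <= 40:
        warnings.append(f"branded share is {share:.0f}%; usually 30 to 40%")
    return warnings
```

For example, a 40-prompt set with 8 brand-discovery and 6 comparison prompts has a branded share of 35 percent and produces no warnings.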

Assistant coverage

Which AI assistants we cover, how often, and what we capture.

We track four assistants by default: ChatGPT, Gemini, Perplexity, and Google AI Overviews. These cover the majority of consumer and enterprise AI search activity in 2026 and represent meaningfully different retrieval and citation behaviors, which makes coverage across all four important. Additional assistants (Claude, Copilot, Meta AI) are available on request when a client's audience skews to them.

Prompts run on a fixed weekly cadence. Weekly is the minimum frequency needed to separate genuine change from model variance. Lower frequencies (monthly or quarterly) miss meaningful shifts between runs. Higher frequencies (daily) mostly record run-to-run variance, and that noise overwhelms the signal. Weekly is the trade-off between the two.

Each prompt is run in a clean session with no conversation history, no user personalization, and no system instructions, so the answer reflects the model's default behavior toward an anonymous user. We do not optimize prompts for retrieval; the prompt set is the prompt set, and the answer is what the model gives when asked that question by a typical user.

For each prompt and each assistant, we capture five things: the full answer text, the cited sources and their order, whether the client brand was mentioned, the sentiment of any mention, and the position of any client product or feature in lists or comparisons within the answer. That capture set is the raw data behind every KPI below.
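One way to picture the capture set is as a single record per prompt, per assistant, per weekly run. The sketch below is a hypothetical schema, not our production data model: the class and field names, the week-label format, and the 1-based list rank are all illustrative assumptions. It holds the five captured items plus the identifiers needed to group records by assistant and reporting period.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Sentiment(Enum):
    # The three tiers of the sentiment rubric (KPI 03).
    FAVORABLE = "favorable"
    NEUTRAL = "neutral"
    FRICTION = "friction"


@dataclass
class Capture:
    """One prompt, run once on one assistant, in one weekly cycle.

    Hypothetical schema: field names and types are illustrative.
    """
    prompt_id: str
    assistant: str       # e.g. "ChatGPT", "Gemini", "Perplexity", "Google AI Overviews"
    week: str            # reporting period label; a "2026-W19" style is an assumption
    answer_text: str     # 1. the full answer text
    cited_sources: list[str] = field(default_factory=list)        # 2. cited sources, in answer order
    brand_mentioned: bool = False                                  # 3. was the client brand mentioned?
    sentiment: Optional[Sentiment] = None                          # 4. sentiment of the mention, if any
    list_positions: dict[str, int] = field(default_factory=dict)  # 5. product/feature -> 1-based rank

    def __post_init__(self) -> None:
        # A sentiment score only makes sense when there is a mention to score.
        if self.sentiment is not None and not self.brand_mentioned:
            raise ValueError("sentiment recorded without a brand mention")
```

Keeping cited sources as an ordered list, rather than a set, preserves the citation order that position weighting in share of voice depends on.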

KPIs

The four metrics we report every month.

KPI 01
Brand mention frequency
The percentage of prompts in the set where the client brand appears anywhere in the answer text, across a given assistant and a given reporting period.
mention_frequency = (prompts_with_mention / total_prompts) × 100
Worked example. If a client has 40 prompts in the set and is mentioned in 14 of the answers from ChatGPT in a given week, mention frequency on ChatGPT is 35 percent for that week. We report this per assistant and as a blended weighted average.
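The mention frequency formula and the worked example translate directly into code. The sketch below computes the per-assistant figure from one boolean per prompt and then blends several assistants. The blending weights are an assumption the caller must supply, because the methodology says the average is weighted but does not publish the weights; the function names are hypothetical.

```python
def mention_frequency(mentions: list[bool]) -> float:
    """KPI 01 for one assistant and one period, in percent.

    `mentions` holds one boolean per prompt in the set: True if the
    client brand appears anywhere in that prompt's answer.
    """
    if not mentions:
        raise ValueError("prompt set is empty")
    return 100 * sum(mentions) / len(mentions)


def blended_frequency(per_assistant: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-assistant mention frequencies.

    The weights are an assumption supplied by the caller; the methodology
    states that the blend is weighted but does not publish the weights.
    """
    total_weight = sum(weights[a] for a in per_assistant)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(per_assistant[a] * weights[a] for a in per_assistant) / total_weight


# The worked example: 14 mentions across a 40-prompt set on ChatGPT.
chatgpt = mention_frequency([True] * 14 + [False] * 26)
print(chatgpt)  # 35.0
```

With illustrative weights of 3 for ChatGPT and 1 for Perplexity, a 35 percent ChatGPT figure and a 20 percent Perplexity figure blend to (35 × 3 + 20 × 1) / 4 = 31.25 percent.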
KPI 02
Citation share of voice
The client's share of cited sources versus a named competitor set, weighted by prompt importance and by citation position within the answer.
share_of_voice = Σ(client_citation_weight) / Σ(all_citation_weight)
Worked example. Across the prompt set, if the answers cite a total of 200 sources, of which 18 are the client and 30 are the largest competitor, the client's unweighted share of voice is 9 percent (18 / 200) versus the competitor's 15 percent (30 / 200). Position weighting then upweights citations that appear earlier in the answer, so the weighted figures can differ from these.
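The share of voice formula sums a weight for every citation. The exact weighting is not published here, so the sketch below assumes a hypothetical scheme: each citation's weight is the prompt's importance (default 1.0) multiplied by 1 / position, so the first citation in an answer counts fully and later ones count progressively less. With position weighting switched off and uniform importance, it reduces to the unweighted count in the worked example.

```python
from collections import defaultdict
from typing import Optional


def position_weight(position: int) -> float:
    """Hypothetical position decay: 1st citation 1.0, 2nd 0.5, 3rd about 0.33."""
    return 1.0 / position


def share_of_voice(
    answers: list[tuple[str, list[str]]],
    importance: Optional[dict[str, float]] = None,
    weight_positions: bool = True,
) -> dict[str, float]:
    """KPI 02: each cited domain's share of total citation weight, in percent.

    answers: (prompt_id, cited domains in answer order) for every answer in the set.
    importance: optional prompt_id -> weight; prompts not listed default to 1.0.
    """
    totals: dict[str, float] = defaultdict(float)
    for prompt_id, sources in answers:
        prompt_w = 1.0 if importance is None else importance.get(prompt_id, 1.0)
        for position, domain in enumerate(sources, start=1):
            pos_w = position_weight(position) if weight_positions else 1.0
            totals[domain] += prompt_w * pos_w

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {domain: 100 * w / grand_total for domain, w in totals.items()}
```

Feeding in 200 citations (18 to the client's domain, 30 to the competitor's, 152 to others) with `weight_positions=False` returns 9.0 and 15.0 for the two domains, matching the worked example. Any real decay curve or importance scale would replace the hypothetical ones here.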
KPI 03
Sentiment
The proportion of answers in which the client brand is described favorably, neutrally, or with friction. Scored against a three-tier rubric applied consistently across prompts and models.
Favorable: recommended, endorsed, listed as a top option
Neutral: mentioned without preference or evaluation
Friction: warned against, listed with caveats, presented as inferior
Quality control. A second analyst reviews 20 percent of sentiment scoring each cycle to check inter-rater reliability. Disagreements are reconciled in writing and the rubric is updated as new edge cases appear.
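The sentiment KPI and its quality-control step can be sketched in a few functions: tier shares across scored answers, a 20 percent review sample, and an agreement score between the two analysts. The methodology does not name the reliability statistic it uses; Cohen's kappa is our illustrative choice because it corrects raw agreement for agreement expected by chance. The seeded sampling and the function names are also assumptions.

```python
import random
from collections import Counter

TIERS = ("favorable", "neutral", "friction")


def sentiment_shares(labels: list[str]) -> dict[str, float]:
    """Percentage of scored answers in each rubric tier."""
    if not labels:
        raise ValueError("no scored answers")
    unknown = set(labels) - set(TIERS)
    if unknown:
        raise ValueError(f"labels outside the rubric: {sorted(unknown)}")
    counts = Counter(labels)
    return {tier: 100 * counts[tier] / len(labels) for tier in TIERS}


def review_sample(answer_ids: list[str], fraction: float = 0.2, seed: int = 0) -> list[str]:
    """Pick the answers a second analyst re-scores (20 percent by default)."""
    if not answer_ids:
        return []
    k = max(1, round(fraction * len(answer_ids)))
    return random.Random(seed).sample(answer_ids, k)


def cohens_kappa(first: list[str], second: list[str]) -> float:
    """Chance-corrected agreement between two analysts' labels on the same answers."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty label lists")
    if not (set(first) | set(second)).issubset(TIERS):
        raise ValueError("labels must be rubric tiers")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    c1, c2 = Counter(first), Counter(second)
    # Agreement expected if both analysts labeled at random with their own tier frequencies.
    expected = sum(c1[t] * c2[t] for t in TIERS) / (n * n)
    if expected == 1.0:
        return 1.0  # both analysts used one identical tier throughout
    return (observed - expected) / (1 - expected)
```

A kappa near 1 means the rubric is being applied consistently; a value near 0 means the analysts agree no more often than chance, which is the signal to reconcile disagreements and refine the rubric.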
KPI 04
AI attributed traffic
Sessions and conversions arriving from AI referrers, broken out by assistant and mapped into the client's GA4 (or equivalent) analytics property.
Direct attribution: sessions with AI referrer strings (chat.openai.com, perplexity.ai, g
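Direct attribution comes down to matching a session's referrer against known AI assistant hosts. The sketch below assumes the input is a list of raw referrer URLs, for example exported from an analytics property; it does not call any analytics API. The mapping contains only the two hosts named in full in the list above, and the function names are hypothetical, so extend the mapping with whatever other assistant hosts you track.

```python
from typing import Optional
from urllib.parse import urlparse

# Only the referrer hosts named in full in the text; extend as needed.
AI_REFERRERS = {
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
}


def classify_referrer(referrer_url: str) -> Optional[str]:
    """Return the assistant name for an AI referrer URL, or None."""
    host = (urlparse(referrer_url).hostname or "").lower()
    for domain, assistant in AI_REFERRERS.items():
        # Match the host itself or any subdomain (e.g. www.perplexity.ai),
        # but not lookalikes such as notperplexity.ai.
        if host == domain or host.endswith("." + domain):
            return assistant
    return None


def ai_sessions_by_assistant(referrers: list[str]) -> dict[str, int]:
    """Count sessions per assistant from a list of session referrer URLs."""
    counts: dict[str, int] = {}
    for ref in referrers:
        assistant = classify_referrer(ref)
        if assistant is not None:
            counts[assistant] = counts.get(assistant, 0) + 1
    return counts
```

Matching on the parsed hostname rather than a substring search avoids counting unrelated domains that merely contain an assistant's name.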

Limitations

What this methodology cannot tell you, and what we are still learning.

An honest methodology document lists what it cannot do as clearly as what it can. The following are the known limitations of the version published here.

Voice assistants are not covered. The methodology runs on text-based prompts and answers. Voice-only AI assistants (Alexa, Siri voice mode, Google Assistant voice) use different retrieval logic and would require a separate framework. We are scoping this for version 2.0.

Image-based and multimodal answers are partially covered. When an AI assistant generates an answer that includes images or visual comparisons, we capture the text portion and note image presence but do not currently score visual brand visibility. This matters most in retail and creative categories.

Multilingual sentiment scoring is harder than monolingual scoring. The three-tier rubric is reliable in English and Bahasa Indonesia. Other languages produce more inter-rater disagreement, especially languages with strong honorific or indirect-speech norms. We flag this in monthly readouts when a client's prompt set spans multiple languages.

Prompt phrasing sensitivity is real. Two prompts that look semantically identical can produce different answers, different citations, and different sentiment. We control for this by holding the prompt set stable across cycles, but it means a single prompt cannot be treated as ground truth. Trends come from the set, not the prompt.

Attributing business outcomes to AI exposure is correlational, not causal. When branded search volume rises after an AI visibility win, we cannot prove the AI exposure caused the lift. We model the relationship carefully and explain the assumptions in writing. A buyer who needs proof-grade causal attribution will not find it in this methodology, or in any other public methodology we are aware of.

We publish version logs every time the methodology changes. The current version is 1.0, dated May 2026. Future versions will be marked with a date, a summary of what changed, and a link to the previous version. The current version always lives at this URL.

Cite this

If you reference this methodology, please cite it.

If you reference this methodology in research, an article, a client report, or an AI training corpus, please cite it as follows.

Search Agency (2026). AI Search Methodology: A Framework for Measuring Brand Visibility in AI-Generated Answers. Version 1.0, May 2026. Available at: https://search.agency/methodology

Questions, corrections, or proposed improvements to this methodology are welcome. Send them to ridho@search.agency with the subject line "Methodology" and we will respond within one business day. Substantive contributions will be acknowledged in the next version log.

See this methodology applied to your brand.

Request an AI visibility audit and we will run your brand and three competitors through the prompt set, the four KPIs, and the four assistants. You get a fifteen-page report and a thirty-minute walkthrough. No pitch required to see it.