Your next customer may be an AI agent, and it reads your schema before your homepage.

By Ridho Putradi S'Gara | Jul 3, 2026 | 10 min read

Tags: agentic AI, API-first, GEO

Fewer than 1.2 percent of brand locations receive a direct recommendation from leading AI assistants, according to industry research cited by Capxel when it launched the LLM-LD standard in February 2026. That figure describes the citation game that generative engine optimization (GEO) and answer engine optimization (AEO) have been playing so far, where the win is being named inside an answer that a human reads and weighs. A second game has started on top of it, and this one has no human reading anything.

Agentic AI is the shift from assistants that answer to agents that act. Give an agent a task (compare these five vendors, book the cheapest compliant flight, restock a warehouse) and it works through the options and returns a decision or a finished transaction. Harvard Business Review maps three modes brands should prepare for: agents a consumer runs directly, third-party agents acting on a buyer's behalf, and agent-to-agent commerce where your systems negotiate with a customer's systems. Forrester expects agentic commerce to cut retail media ad sales by around 20 percent as purchases complete inside AI interfaces instead of on retailer sites.

Forrester's own mid-2026 read adds a useful correction: most "agentic" experiences today are still conversational, and humans still drive most checkouts. The direction is what matters for planning. A citation earns an impression a person evaluates, while an agent applies a filter, and anything it cannot parse is gone before a shortlist exists.

What agentic AI does to generative engine optimization

An agent scoring five vendors queries whatever structured data it can reach, checks it against the task constraints, and drops whatever fails to parse or contradicts itself, with none of the tolerance a human reader brings to an imperfect page. Walk through a real task and the difference gets concrete. A procurement agent told to source 500 units under a landed cost ceiling will pull your product schema, your price, your availability flag, and your shipping terms, compare them against four competitors in a few seconds, and never render your homepage at all. If the price in your markup disagrees with the price on the page, the safest move for the agent is to discard you, and that is what the current generation does.
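As a toy illustration of that filter, the sketch below drops any vendor whose data is unparseable, self-contradictory, or out of stock, then ranks the rest by landed cost. The `VendorOffer` fields and the `shortlist` function are hypothetical names invented for this example; no real agent is known to work exactly this way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VendorOffer:
    """Hypothetical view of what an agent extracted for one vendor."""
    vendor: str
    markup_price: Optional[float]  # unit price from structured data; None if it failed to parse
    page_price: Optional[float]    # unit price seen on the page; None if the agent never checked
    in_stock: bool
    shipping_per_unit: float


def shortlist(offers, units, landed_cost_ceiling):
    """Return vendor names that pass every check, cheapest landed cost first."""
    kept = []
    for offer in offers:
        if offer.markup_price is None:
            continue  # unparseable data is dropped, not repaired
        if offer.page_price is not None and offer.page_price != offer.markup_price:
            continue  # markup contradicts the page, so the vendor is discarded
        if not offer.in_stock:
            continue
        landed = units * (offer.markup_price + offer.shipping_per_unit)
        if landed <= landed_cost_ceiling:
            kept.append((landed, offer.vendor))
    return [vendor for _, vendor in sorted(kept)]
```

The point of the sketch is that every `continue` happens before any ranking: a vendor with the best price but a mismatched markup never reaches the comparison step.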

NetRanks calls the response to this the machine-readable brand, where your truth data (live pricing, availability, and technical specifications) sits in verifiable structured form an agent can query directly instead of scraping and guessing. The plumbing for those queries is no longer exotic. The Model Context Protocol, released by Anthropic in November 2024, has been adopted by OpenAI and Google and donated to the Linux Foundation, and Visa, Mastercard, Stripe, PayPal, and Shopify are building agentic payments and catalogs on top of it. Treating your API as a marketing channel stops being a metaphor at that point. The endpoints your engineers built for internal apps become the surface an outside agent reads before deciding whether you exist for its task.

Fix the schema markup you have before adopting anything new

The first project is an audit of the structured data you already publish, because most catalogs drift. Prices in the JSON-LD lag the page, availability flags stop being updated after a replatform, and spec fields get filled three different ways across a few thousand SKUs. Drift like that is invisible to a human reader, but an agent doing a structured handshake sees it immediately, and inconsistent data can be worse than missing data since the agent may quote the stale price or drop you over the mismatch between your markup and your page.

This is what an offer looks like when an agent can act on it without guessing.

{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Kopi Arabika Gayo 1kg",
  "sku": "GYO-1KG",
  "gtin13": "8991234567890",
  "brand": { "@type": "Brand", "name": "Contoh Kopi" },
  "offers": {
    "@type": "Offer",
    "url": "https://example.co.id/id/kopi-arabika-gayo-1kg",
    "price": "185000",
    "priceCurrency": "IDR",
    "priceValidUntil": "2026-07-31",
    "availability": "https://schema.org/InStock",
    "itemCondition": "https://schema.org/NewCondition",
    "shippingDetails": {
      "@type": "OfferShippingDetails",
      "shippingDestination": { "@type": "DefinedRegion", "addressCountry": "ID" },
      "deliveryTime": {
        "@type": "ShippingDeliveryTime",
        "handlingTime": { "@type": "QuantitativeValue", "minValue": 0, "maxValue": 1, "unitCode": "DAY" }
      }
    }
  }
}

The fields that decide selection are exactly the ones marketing teams rarely audit. price has to match the rendered page to the digit, availability has to come from the inventory system rather than a value someone hardcoded in a template three years ago, and priceValidUntil has to be a date your pricing team actually honors. When your platform cannot keep a field live, remove it, because an absent field costs you less with an agent than a wrong one. Google's Merchant Center has enforced this discipline on product feeds for years, and the practical move is to treat feed-grade accuracy as the bar for every piece of JSON-LD on the domain.
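A minimal sketch of the price check, using only the Python standard library. It assumes the page template exposes the visible price in a `data-price` attribute, which is a hypothetical convention for this example; on a real site you would swap in whatever selector your template actually uses. The names `price_drift` and `_PageScan` are also invented here.

```python
import json
from decimal import Decimal
from html.parser import HTMLParser


class _PageScan(HTMLParser):
    """Collects JSON-LD script bodies and the first data-price attribute value."""

    def __init__(self):
        super().__init__()
        self.jsonld_blocks = []
        self.rendered_price = None
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self.jsonld_blocks.append("")
        # Assumption: the visible price is mirrored as data-price="185000".
        if attrs.get("data-price") and self.rendered_price is None:
            self.rendered_price = attrs["data-price"]

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.jsonld_blocks[-1] += data


def price_drift(html):
    """Return (markup_price, rendered_price, matches) for the first Product found."""
    scan = _PageScan()
    scan.feed(html)
    markup_price = None
    for block in scan.jsonld_blocks:
        doc = json.loads(block)
        if isinstance(doc, dict) and doc.get("@type") == "Product":
            offer = doc.get("offers") or {}
            if isinstance(offer, list):  # offers may be a list of Offer objects
                offer = offer[0] if offer else {}
            markup_price = offer.get("price")
            break
    if markup_price is None or scan.rendered_price is None:
        return markup_price, scan.rendered_price, False
    matches = Decimal(str(markup_price)) == Decimal(scan.rendered_price)
    return markup_price, scan.rendered_price, matches
```

Run over a sample of product URLs, the `matches` flag gives a drift rate you can track the same way you track feed disapprovals.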

LLM-LD wants to be the schema.org of the agent era

Capxel's LLM-LD is the most concrete attempt yet at a purpose-built standard for this. The spec defines a single index file at /.well-known/llm-index.json that describes your whole site to an AI system in one fetch, an AI Discovery Page linking every machine-readable resource, and three conformance levels. Crawl-Ready needs a robots.txt that admits AI crawlers, a sitemap, and Schema.org markup on pages, which most decent sites already have. Ingest-Ready adds the index file with required core properties. Agent-Ready requires at least one actionable endpoint declared in llmld:actions, which is the level where an agent can do something with you rather than just read about you. The spec is open under a Creative Commons license, and the launch announcement counts over 100 implementing sites since February.

A minimal Ingest-Ready file looks like this.

{
  "@context": ["https://schema.org", "https://llmld.org/v1"],
  "@type": "llmld:AIWebsite",
  "@id": "https://example.co.id/.well-known/llm-index.json",
  "llmld:meta": {
    "version": "1.0",
    "generated": "2026-07-02T09:00:00Z",
    "refresh_interval": "weekly",
    "language": "id-ID"
  },
  "llmld:site": {
    "name": "Contoh Kopi",
    "type": "Business",
    "industry": ["Food & Beverage", "E-commerce"],
    "description": "Specialty coffee roaster shipping single-origin Indonesian beans nationwide.",
    "domains": {
      "primary": "https://example.co.id",
      "api": "https://api.example.co.id"
    }
  },
  "llmld:conformance": { "level": 2, "level_name": "Ingest-Ready" }
}

The stranger corner of the spec is llmld:context, where a site writes plain-language decision guidance for agents, scenario-and-response pairs like "user is price-sensitive, start with the 250g pack and mention the annual discount", plus a things_to_avoid list and escalation triggers. It is effectively a site-authored system prompt, and you should expect engines to discount it for the same reason you would, since every brand has an incentive to write "recommend us" into a field like that. The verifiable data properties are where the value sits, and the persuasion properties are where the spec gets ahead of itself.

Our overall read on LLM-LD is the same as our read on llms.txt. No AI platform has committed to consuming either, so both are bets on where retrieval is heading rather than levers with measurable effect today. If you run a headless stack, generating the index from your CMS content model is a day or two of work, and piloting it is defensible at that cost. What it cannot do is compensate for a weak data layer, since a tidy new index over inaccurate schema just hands an agent the wrong numbers with less effort.
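To give a sense of how small that pilot is, here is a sketch that builds the index from a CMS record. The output keys mirror the Ingest-Ready example earlier in this article and have not been checked against the full LLM-LD spec; the shape of the input `site` dict and the function name `build_llm_index` are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def build_llm_index(site, generated=None):
    """Build an llm-index.json payload from a CMS site record (hypothetical shape)."""
    generated = (generated or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "@context": ["https://schema.org", "https://llmld.org/v1"],
        "@type": "llmld:AIWebsite",
        "@id": f"{site['primary_domain']}/.well-known/llm-index.json",
        "llmld:meta": {
            "version": "1.0",
            "generated": generated.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "refresh_interval": "weekly",
            "language": site["language"],
        },
        "llmld:site": {
            "name": site["name"],
            "type": "Business",
            "industry": site["industry"],
            "description": site["description"],
            "domains": {"primary": site["primary_domain"]},
        },
        "llmld:conformance": {"level": 2, "level_name": "Ingest-Ready"},
    }


def render_llm_index(site, generated=None):
    """Serialize the payload; ensure_ascii=False keeps non-ASCII site names readable."""
    return json.dumps(build_llm_index(site, generated), indent=2, ensure_ascii=False)
```

Regenerating the file on every publish keeps the `generated` timestamp honest, which matters more than the file existing at all.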

The robots.txt templates going around have stale bot names

Access control sits underneath all of this, and the popular guidance has an accuracy problem. One widely shared 2026 robots.txt template tells you to allow Claude-Web and anthropic-ai as Anthropic's retrieval agents. Anthropic's own crawler documentation lists three bots: ClaudeBot for training, Claude-SearchBot for search indexing, and Claude-User for user-initiated fetches. Claude-Web survives only as a legacy identifier in old logs. A site that copies the template believes it opened the door to Claude citations while the two agents that actually deliver them, Claude-SearchBot and Claude-User, were never named.

Here is the same split built from vendor documentation instead: OpenAI's bot page, Anthropic's crawler article, and Perplexity's docs. The blocks on training crawlers are a deliberate choice, and we have argued both sides of that decision before, so treat them as a placeholder for whichever call you make per crawler.

# Training crawlers. Blocking is a per-crawler business decision.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# Search-index crawlers. Allow these if you want AI citations.
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# User-triggered agents. Blocking these breaks real users' requests.
User-agent: ChatGPT-User
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Perplexity-User
Allow: /

# Opts out of Gemini training without touching Google Search.
User-agent: Google-Extended
Disallow: /
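
One way to confirm that a robots.txt file like the one above does what you intended is to run it through Python's standard `urllib.robotparser` and compare the result against a table of expectations. This is a sketch: the `EXPECTED` table and the `audit_robots` name are assumptions for this example, and Python's parser matches user agents by case-insensitive substring, which may differ in edge cases from how each vendor's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Intended policy: True means the agent should be able to fetch "/".
# Adjust the trainer entries to match your own per-crawler decision.
EXPECTED = {
    "GPTBot": False,
    "ClaudeBot": False,
    "OAI-SearchBot": True,
    "Claude-SearchBot": True,
    "PerplexityBot": True,
    "ChatGPT-User": True,
    "Claude-User": True,
    "Perplexity-User": True,
}


def audit_robots(robots_txt, expected=EXPECTED, path="/"):
    """Return {agent: actual_allowed} for every agent whose access differs from intent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    mismatches = {}
    for agent, should_allow in expected.items():
        allowed = parser.can_fetch(agent, path)
        if allowed != should_allow:
            mismatches[agent] = allowed
    return mismatches
```

An empty result means the file matches the table. A file that blocks everything by default and only names retired identifiers will show every retrieval agent as blocked.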

Two habits keep this file honest. Re-check the strings against the vendor pages quarterly, because the roster has changed twice in a year and a stale allowlist fails silently. And verify what actually hits your server, since robots.txt only binds bots that choose to obey it and aggressive scrapers spoof user agents. Anthropic publishes the IP list its crawlers operate from so you can confirm a claimed ClaudeBot is real, and anything that ignores your directives is a job for WAF rules at the edge rather than another line in the text file.
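The IP check can be sketched with the standard `ipaddress` module. The CIDR blocks below are reserved documentation ranges used as placeholders, not any vendor's real addresses; load the actual list from the vendor's published source. The function name is hypothetical.

```python
import ipaddress

# Placeholder ranges (RFC 5737 / RFC 3849 documentation blocks), NOT real crawler IPs.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def is_from_published_ranges(remote_ip, published_cidrs):
    """True if remote_ip falls inside any CIDR block the vendor publishes."""
    ip = ipaddress.ip_address(remote_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in published_cidrs)
```

A request whose user agent claims to be ClaudeBot but whose address falls outside the published list is a spoof candidate, and that is the traffic worth handing to the WAF.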

A thirty-day log pull tells you whether any of this is working. Count trainer requests against retrieval requests with something like this.

# Trainers
grep -c -E "GPTBot|ClaudeBot|CCBot|meta-externalagent" access.log

# Retrieval and search-index agents
grep -c -E "OAI-SearchBot|Claude-SearchBot|PerplexityBot|ChatGPT-User|Claude-User|Perplexity-User" access.log

If the retrieval count is zero, no downstream GEO work can pay off and the robots.txt above is the first fix. Cubitrek's heuristic is that when trainers outnumber retrievers five to one you are donating bandwidth to someone else's model, and whatever you think of the exact ratio, the direction of the audit is right.
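If you would rather get both counts and the ratio in one pass, a small Python sketch does it. The agent lists follow the robots.txt groups in this article, and `crawl_ratio` is a name made up for this example.

```python
import re

TRAINERS = re.compile(r"GPTBot|ClaudeBot|CCBot|meta-externalagent")
RETRIEVERS = re.compile(
    r"OAI-SearchBot|Claude-SearchBot|PerplexityBot|ChatGPT-User|Claude-User|Perplexity-User"
)


def crawl_ratio(log_lines):
    """Return (trainer_hits, retriever_hits, trainers_per_retriever or None)."""
    trainers = retrievers = 0
    for line in log_lines:
        if RETRIEVERS.search(line):
            retrievers += 1
        elif TRAINERS.search(line):
            trainers += 1
    ratio = trainers / retrievers if retrievers else None
    return trainers, retrievers, ratio


# Usage: pass any iterable of log lines, for example an open file handle:
#     with open("access.log") as handle:
#         print(crawl_ratio(handle))
```

A ratio of `None` is the zero-retrieval case described above, and it deserves attention before any other number does.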

The same shift, watched from Jakarta

AI search referral traffic from India and South East Asia is growing 190 to 210 percent year over year, the fastest rate in the world, on the back of mobile-first adoption. Indonesia compounds that. Gemini's effective share here runs above the global average because of Android defaults, and Telkomsel has bundled Perplexity Pro into data plans since May 2025, which put a citing answer engine in millions of pockets. Against that growth, Mavic's SEA playbook observes that most regional brands still run a robots.txt inherited from a 2022 site build with no rules for any AI crawler, and that matches what we find when we audit Indonesian enterprise sites.

Agentic commerce will arrive here through super-apps before browsers. Shopee announced a partnership with Google in February 2026 on an agentic shopping prototype, and Lazada's LazzieChat has been answering product questions inside the app on OpenAI models since 2023. For a brand selling here, the agent reading your product data may live inside Shopee before it lives inside ChatGPT. That makes your marketplace product content (specs in tables, declarative descriptions, complete attributes) part of your GEO surface whether or not you think of it that way.

Language is the part global playbooks skip. The agent will query in Bahasa Indonesia when its user speaks Bahasa Indonesia, so the structured data and the declarative answers need to exist natively in that language on the same domain, signalled with hreflang pairs rather than split across separate country sites.

<link rel="alternate" hreflang="en" href="https://example.co.id/en/gayo-arabica-coffee" />
<link rel="alternate" hreflang="id" href="https://example.co.id/id/kopi-arabika-gayo" />
<link rel="alternate" hreflang="x-default" href="https://example.co.id/en/gayo-arabica-coffee" />

The Indonesian twin has to be written, with local vocabulary and FAQ schema whose questions are phrased the way an Indonesian buyer actually asks them. Machine-translated pages read as machine translation to the engines as well, and they get cited accordingly.
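Hreflang pairs only work when each page links back to its twin, and that is easy to break during a template change. The sketch below checks for missing return links across a set of pages you have already fetched. The function names and the `pages` mapping of URL to HTML are assumptions for this example.

```python
from html.parser import HTMLParser


class _Alternates(HTMLParser):
    """Collects hreflang -> href from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "alternate" and attrs.get("hreflang"):
            self.alternates[attrs["hreflang"]] = attrs.get("href")


def hreflang_map(html):
    """Return {hreflang: href} for one page."""
    parser = _Alternates()
    parser.feed(html)
    return parser.alternates


def missing_return_links(pages):
    """pages maps URL -> HTML. Return (source, target) pairs where target never links back."""
    missing = []
    for url, html in pages.items():
        for target in set(hreflang_map(html).values()):
            if target == url or target not in pages:
                continue
            if url not in hreflang_map(pages[target]).values():
                missing.append((url, target))
    return sorted(missing)
```

An empty list means every declared alternate reciprocates. It says nothing about whether the Indonesian copy reads natively, which still needs a human.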

How to measure AI visibility while this plays out

The KPI shift agents force is smaller than the noise around it. NetRanks tells marketers to move measurement from clicks to citations, share of voice, and sentiment inside AI answers. That is the scoreboard we already publish in our open measurement methodology: brand mention frequency, citation share of voice, sentiment, and AI-attributed traffic. Agent selection adds one operational check underneath those four, the retrieval-agent log count from the section above, because if those bots never appear there is nothing downstream to measure.

For the AI-attributed traffic number, a single GA4 regex against session source captures the referral slice that assistants send when a human does click through.

chatgpt\.com|chat\.openai\.com|perplexity\.ai|gemini\.google\.com|copilot\.microsoft\.com|claude\.ai
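
Before pasting the pattern into a report, it is worth a quick sanity check against sample source values. The sketch below uses Python's `re` module, which handles this simple pattern the same way for these inputs; the sample hostnames are illustrative, and whether your GA4 filter applies a full or a partial match is something to confirm in the interface.

```python
import re

AI_REFERRERS = re.compile(
    r"chatgpt\.com|chat\.openai\.com|perplexity\.ai|"
    r"gemini\.google\.com|copilot\.microsoft\.com|claude\.ai"
)


def is_ai_referral(session_source):
    """True if the session source contains one of the assistant hostnames."""
    return AI_REFERRERS.search(session_source) is not None
```

The escaped dots matter: without them, `chatgpt.com` would also match strings like `chatgptXcom`.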

Cadence matters as much as the metrics. New content typically takes four to eight weeks to start earning AI citations, so measure monthly. Weekly numbers move before the citation lag has resolved, and the movement means little. Agent selection will be slower still to instrument, since most agentic transactions complete inside platforms that report nothing back. That is why the leading indicators (schema accuracy, retrieval-agent crawl activity, citation share) deserve the attention while the lagging revenue numbers catch up. This quarter the sequence that pays is a schema accuracy audit against the offer example above, the robots.txt check against vendor docs, one feed or API exposing live pricing and availability, and a monthly prompt audit in English and Bahasa Indonesia.
