Most agency conversations about AI search stop at ChatGPT and Perplexity. That's a gap. Claude — Anthropic's model — now has web search capability in Claude.ai and through the API, and ClaudeBot is crawling the web to back it up. Agencies that help clients show up in Claude answers have a differentiation window that won't stay open long.
Here's what you actually need to know — not theory, but the mechanics of how Claude retrieves and cites content and what you can do about it today.
Claude answers in one of two modes depending on context. When a user enables web search in Claude.ai (or a developer enables it via the API), Claude issues live queries to a search index and synthesizes the results. When web search is off, there is no retrieval at all: Claude answers from training data alone.
The crawling side: Anthropic documents several crawler user agents, and they are separate bots rather than versions of one bot. ClaudeBot collects pages that may contribute to model training, Claude-SearchBot indexes pages to improve search results, and Claude-User fetches pages in response to a user's request. You may see any of them in server logs.
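To see which Anthropic crawlers actually visit a client site, a short script can tally user agents from an access log. This is a minimal sketch, assuming logs in the common "combined" format where the user agent is the last quoted field. The function name count_anthropic_hits and the token list are illustrative; check the tokens against Anthropic's current crawler documentation.

```python
import re
from collections import Counter

# User-agent tokens assumed for Anthropic's crawlers; verify against
# Anthropic's published crawler documentation before relying on them.
ANTHROPIC_AGENTS = ("ClaudeBot", "Claude-SearchBot", "Claude-User")

# In the combined log format the user agent is the last double-quoted field.
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_anthropic_hits(log_lines):
    """Count requests per Anthropic crawler token in access-log lines."""
    hits = Counter()
    for line in log_lines:
        match = USER_AGENT_FIELD.search(line)
        if not match:
            continue  # skip lines that do not end in a quoted field
        user_agent = match.group(1).lower()
        for token in ANTHROPIC_AGENTS:
            if token.lower() in user_agent:
                hits[token] += 1
                break  # count each request once
    return hits
```

Pass it any iterable of log lines, for example an open file handle. User-agent strings can be spoofed, so treat the counts as an indication rather than proof of who is crawling.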
The key architectural difference from other AI search products: Claude is more likely to synthesize across sources and attribute claims to specific pages than to surface a direct excerpt. This means factual, extractable claims written in clear prose get cited more readily than marketing language that can't be lifted cleanly.
These three products have meaningfully different citation behaviors. Understanding the differences tells you what to optimize for each.
Perplexity fetches pages in real time for nearly every query, shows citations inline, and rewards pages with clearly sectioned factual content. It pulls from a relatively shallow crawl of the top results.
ChatGPT Search (via Bing) inherits Bing's index and ranking signals. Structured content, E-E-A-T, and schema markup that helps Bing also helps ChatGPT Search. Microsoft's content policies around Bingbot apply.
Claude is more conservative about citations than Perplexity. It synthesizes more and quotes less. This means shallow bullet-point content that Perplexity might grab and display often gets ignored by Claude in favor of dense, authoritative prose. Claude also tends to weight longer-form explanatory content over thin FAQ pages.
The opinionated observation here: Claude is closer to a researcher than a search engine. It's looking for the most complete and credible source on a claim, not the most SEO-optimized snippet. Writing for Claude means writing like a well-cited practitioner article, not a featured-snippet farm.
Four concrete signals determine whether a page gets used as a Claude source.
Clean information architecture. Claude extracts claims at the sentence and paragraph level. Content that buries a stat in a 400-word paragraph is harder to extract than a claim that's clearly stated in its own sentence. Write: "The average CPC for legal keywords in 2025 was $54.86 (WordStream)." Not: "Legal keywords tend to be among the most expensive in the industry, with some industry observers noting that costs can be quite high."
Named, attributed claims. Claude weights claims that name sources, authors, or institutions. "According to Google's 2024 Search Quality Evaluator Guidelines" or "a 2025 BrightLocal survey found that 87% of consumers read local reviews" is more citable than "research shows."
Factual density. Pages with a high ratio of specific, verifiable facts to filler prose are more likely to be referenced. This is why industry-specific data pages — pricing benchmarks, failure rates, time estimates — get cited frequently.
Correct schema markup. For medical, legal, and financial content, schema.org types such as MedicalWebPage, LegalService, or FinancialProduct tell crawlers and retrieval systems explicitly what kind of content a page holds. Markup declares the content type; it does not establish authority by itself. Use the AI Visibility Grader to check whether a client's pages carry the markup that positions them for AI retrieval.
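For a quick manual check of the schema signal, the declared types can be read straight out of a page's JSON-LD. The sketch below uses only the Python standard library. The names JsonLdCollector and schema_types are illustrative, and it only inspects top-level @type values, so types nested under @graph or written as microdata would be missed.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def schema_types(html):
    """Return the set of top-level schema.org @type values found in a page."""
    collector = JsonLdCollector()
    collector.feed(html)
    found = set()
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed markup rather than fail the audit
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            declared = item.get("@type")
            if isinstance(declared, str):
                found.add(declared)
            elif isinstance(declared, list):
                found.update(t for t in declared if isinstance(t, str))
    return found
```

Comparing the returned set against the type a page should carry (LegalService for a law firm's service page, say) gives a fast pass/fail before a fuller audit.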
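Anthropic's crawlers respect robots.txt, so blocking them is a deliberate choice. Per Anthropic's crawler documentation, disallowing ClaudeBot opts a site out of training collection, while disallowing Claude-SearchBot or Claude-User is what reduces the site's visibility in Claude's search answers. Check all three tokens when you audit.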
To allow ClaudeBot:
User-agent: ClaudeBot
Allow: /
To block it:
User-agent: ClaudeBot
Disallow: /
Most agency clients have a generic User-agent: * rule that allows all bots. That's fine for ClaudeBot access. The failure mode is sites that block * for security reasons and forget to explicitly allow legitimate AI crawlers. Audit the robots.txt and flag any Disallow: / rules applied to * with no exemptions.
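The audit described above can be sketched with Python's standard urllib.robotparser module. The function name blocked_crawlers and the default list of user-agent tokens are illustrative assumptions; confirm the tokens against Anthropic's current documentation. Python's parser may also resolve edge cases differently from a given crawler, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens for Anthropic's crawlers (illustrative default).
ANTHROPIC_AGENTS = ("ClaudeBot", "Claude-SearchBot", "Claude-User")


def blocked_crawlers(robots_txt, url="https://example.com/", agents=ANTHROPIC_AGENTS):
    """Return the agents that the given robots.txt text bars from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# A site that blocks everything and exempts only ClaudeBot still blocks
# the other two crawlers, which is the failure mode worth flagging.
example = """\
User-agent: *
Disallow: /

User-agent: ClaudeBot
Allow: /
"""
print(blocked_crawlers(example))
```

Running the same function over each client's robots.txt, with the homepage and a few key content URLs, turns the audit into a repeatable check.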
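llms.txt is a proposed standard (not yet an official protocol, though some AI tools already read it) that gives LLMs a curated, markdown-formatted index of the content a site most wants them to use. It is not an access-control file: allow and disallow rules still belong in robots.txt.
A minimal llms.txt at the root looks like:
# Example Company
> One-sentence summary of what the company does and who it serves.
## Docs
- [Documentation](https://example.com/docs/): product and setup guides
## Blog
- [Blog](https://example.com/blog/): articles and benchmark data
## Optional
- [About](https://example.com/about/): company background
- [Terms](https://example.com/terms): licensing and usage terms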
The content value is modest today. The signal value (that the client is thinking seriously about AI discoverability) is real. And if the standard matures, early adopters will already have a curated index in place while late arrivals are still building one.
Check whether the client has a llms.txt. Most don't. Adding one takes ten minutes and costs nothing.
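The check itself is easy to script. This is a minimal sketch, assuming the file lives at the site root; the names llms_txt_url and has_llms_txt are illustrative, and the opener parameter exists only so the fetch can be swapped out for testing.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def llms_txt_url(site_url):
    """Build the root-level llms.txt URL for any page on a site."""
    return urljoin(site_url, "/llms.txt")


def has_llms_txt(site_url, opener=urlopen, timeout=10):
    """Return True if the site serves a non-empty /llms.txt with status 200."""
    try:
        with opener(llms_txt_url(site_url), timeout=timeout) as response:
            return response.status == 200 and bool(response.read().strip())
    except OSError:
        # urllib's URLError and HTTPError are OSError subclasses, so this
        # covers 404s, DNS failures, and timeouts alike.
        return False
```

Some sites answer missing paths with a 200 status and an HTML error page, so a True result is worth confirming by opening the file.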
Based on observed citation patterns, these formats get lifted by Claude more often than others:
Comparison tables with factual columns (price, feature, limitation) — Claude uses these for "what's the difference between X and Y" queries.
Step-by-step processes with numbered lists where each step is a complete, actionable sentence — favored for "how do I" queries.
Definition sections that open with the term in bold followed by a precise definition — favored for "what is X" queries.
Data tables with sourced figures — favored for "what are the benchmarks for X" queries.
What Claude does not favor: wall-of-text introductions that delay the answer, keyword-stuffed paragraphs that repeat the same concept seven ways, and content that hedges every claim into meaninglessness.
The GEO pitch for a client unfamiliar with AI search:
"A growing number of your prospects are asking AI tools like Claude, ChatGPT, and Perplexity questions that used to go to Google. When those tools answer, they cite sources. Your goal is to be the source they cite. That requires content written differently than traditional SEO content — more specific, more authoritative, more data-backed. We can audit your current content against those standards and identify exactly where the gaps are."
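The agency's measurable outcome: track brand mentions in AI responses using tools that monitor AI citations (Profound, Otterly, AI Monitor). The KPI is the before/after citation rate: the share of a fixed set of tracked prompts whose answers mention or cite the client, measured before the work starts and again after it ships.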
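The before/after comparison is simple arithmetic. The sketch below assumes the AI responses have already been exported as plain strings from whichever monitoring tool is in use; it does not call any of those tools' APIs. The function names and the plain substring match are illustrative simplifications.

```python
def citation_rate(responses, brand):
    """Share of AI responses that mention the brand (case-insensitive)."""
    if not responses:
        return 0.0
    needle = brand.lower()
    cited = sum(1 for text in responses if needle in text.lower())
    return cited / len(responses)


def citation_lift(before, after, brand):
    """Percentage-point change in citation rate between two samples."""
    return (citation_rate(after, brand) - citation_rate(before, brand)) * 100
```

Keep the prompt set identical between the two samples. Otherwise a change in the rate may reflect different questions rather than better content.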
Over-indexing on schema markup alone. Schema helps, but Claude's retrieval isn't purely structured-data-driven. Dense factual prose matters more.
Chasing AI overviews as a proxy. Google AI Overviews and Claude answers are different products with different retrieval logic. A page that earns a Google AI Overview isn't automatically a Claude citation.
Blocking ClaudeBot while allowing others. This happens when robots.txt was written by someone who identified "ClaudeBot" as a scraper rather than a legitimate crawler. Check every client's robots.txt.
Writing for the LLM instead of the human. Content optimized for AI retrieval should still be high-quality for human readers. Thin pages stuffed with factual claims but no narrative fail both audiences.
The audit one-liner for Claude AI search: Allow ClaudeBot in robots.txt, add llms.txt, rewrite the top five client pages to lead with attributed, extractable factual claims — that's 80% of the structural optimization in a three-hour sprint.