AI bots from OpenAI, Anthropic, and other LLM providers now crawl websites more frequently than Google’s own search crawler, according to a 2026 analysis of 66.7 billion web crawler requests tracked by Vercel and reported by Position Digital. This shift signals a fundamental change in how online visibility works: optimizing for traditional search engines is no longer enough when the majority of automated traffic hitting your site comes from AI systems building knowledge bases, not search indexes.

The Numbers Behind the Crawling Shift

The scale of AI crawling has grown dramatically in just 18 months. Here are the key data points from the Position Digital analysis:

  • 66.7 billion crawler requests were analyzed across major web infrastructure providers in early 2026
  • LLM training bots (GPTBot, ClaudeBot, Bytespider) now make more total requests than Googlebot on many mid-sized websites
  • AI search bots (ChatGPT-User, PerplexityBot) are expanding their reach even as more sites block training crawlers

This creates an interesting paradox: websites are increasingly blocking AI training crawlers via robots.txt, but AI search crawlers are ramping up. The distinction matters enormously for your visibility strategy.

Training Bots vs. Search Bots: A Critical Distinction

| Bot Type | Examples | Purpose | Should You Block? |
|---|---|---|---|
| AI Training Bots | GPTBot, ClaudeBot, Bytespider, CCBot | Scrape content for model training | Your choice (no direct visibility benefit) |
| AI Search Bots | ChatGPT-User, PerplexityBot | Fetch content to answer real-time queries | No (blocking kills your AI visibility) |
| Traditional Search | Googlebot, Bingbot | Index pages for search results | No |
| AI Overview Bots | Google-Extended | Feed content into AI Overviews | Depends on your strategy |

The mistake many site owners make is blocking all AI bots with a blanket robots.txt rule. This is the equivalent of blocking Googlebot in 2010 because you were worried about content scraping. You’d be invisible to the fastest-growing discovery channel on the internet.
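For illustration, the kind of blanket rule that causes this problem looks like the following (a hypothetical example; the first entry is a defensible training opt-out, but the last two shut you out of real-time AI search entirely):

```
# Training opt-out: no visibility cost
User-agent: GPTBot
Disallow: /

# These two kill your AI search visibility
User-agent: ChatGPT-User
Disallow: /

User-agent: PerplexityBot
Disallow: /
```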

How Each Major AI Engine Crawls the Web

Understanding how each platform’s crawler works gives you a strategic advantage. Not all AI bots behave the same way, and each has different implications for your AI visibility score.

ChatGPT (OpenAI)

ChatGPT operates two distinct crawlers:

  1. GPTBot (user agent: GPTBot/1.0) - Training crawler that scrapes content for model training. Respects robots.txt. Many sites block this.
  2. ChatGPT-User (user agent: ChatGPT-User) - Real-time search crawler activated when users ask ChatGPT questions with Browse enabled. This is the one that matters for visibility.

ChatGPT still drives approximately 80% of all AI referral traffic according to Stacked Marketer’s March 2026 analysis. When someone asks “what’s the best project management tool for small teams?” and ChatGPT browses the web, ChatGPT-User is what hits your site.

Key insight: Blocking GPTBot does NOT block ChatGPT-User. They operate independently. You can prevent your content from being used for training while still appearing in ChatGPT’s real-time search results.

Google Gemini and AI Overviews

Google’s approach is more integrated with its existing infrastructure:

  • Googlebot handles primary crawling and indexing as always
  • Google-Extended is the specific user agent for Gemini training data
  • AI Overviews pull from Google’s existing search index, meaning standard Googlebot access is what determines your AI Overview eligibility

Gemini’s share of AI referral traffic is growing rapidly. The gap with ChatGPT was much wider six months ago; now it’s roughly 8x, down from over 15x. For businesses targeting Google’s ecosystem, this is significant because AI Overviews optimization directly impacts your visibility in the world’s largest search engine.

Perplexity

Perplexity’s crawler (PerplexityBot) is perhaps the most citation-friendly of all AI search engines. When Perplexity answers a question, it explicitly links to sources with numbered citations visible to users.

  • PerplexityBot crawls pages in real-time when users ask questions
  • It heavily favors well-structured, factual content with clear data points
  • Pages with FAQ sections and comparison tables get cited at higher rates
  • Perplexity recently added Claude Sonnet 4.6 and Gemini 3.1 Pro as agent models, expanding its capabilities

Claude (Anthropic)

Anthropic’s ClaudeBot crawls for training data, but Claude’s web search feature (available in Claude Pro) uses a separate browsing mechanism. As Claude’s market share grows, particularly in enterprise contexts, ensuring your content is accessible to Anthropic’s crawlers becomes increasingly important.

The Full Crawler Landscape

| AI Engine | Training Bot | Search Bot | Respects robots.txt | Citation Style |
|---|---|---|---|---|
| ChatGPT | GPTBot | ChatGPT-User | Yes | Inline mentions, sometimes linked |
| Gemini | Google-Extended | Googlebot | Yes | AI Overview cards with source links |
| Perplexity | PerplexityBot | PerplexityBot | Yes | Numbered citations with links |
| Claude | ClaudeBot | Browse feature | Yes | Inline references |
| Grok | None public | None public | N/A | Pulls primarily from X/Twitter data |

What This Means for Your Visibility Strategy

The crawling shift has three major implications for how businesses should think about online presence.

1. Your robots.txt Is Now a Visibility Decision

Before 2024, robots.txt was mostly a technical SEO consideration. Now it’s a strategic business decision. Every line you add to robots.txt that blocks an AI crawler is a channel you’re closing off.

Here’s a recommended robots.txt configuration that balances privacy with visibility:

# Allow AI search crawlers (visibility)
User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

# Optional: block training-only crawlers
User-agent: GPTBot
Disallow: /private/
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

# Standard search engines
User-agent: Googlebot
Allow: /
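Before deploying a configuration like the one above, you can sanity-check it with Python's standard-library `urllib.robotparser` (a quick sketch; the rules below mirror the GPTBot and ChatGPT-User entries from the example config):

```python
from urllib import robotparser

# Rules mirroring the GPTBot / ChatGPT-User entries in the example config
ROBOTS_TXT = """\
User-agent: ChatGPT-User
Allow: /

User-agent: GPTBot
Disallow: /private/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# ChatGPT-User (search) can reach everything; GPTBot (training) is kept
# out of /private/ but allowed everywhere else.
print(rp.can_fetch("ChatGPT-User", "/private/report"))  # True
print(rp.can_fetch("GPTBot", "/private/report"))        # False
print(rp.can_fetch("GPTBot", "/blog/post"))             # True
```

This catches the most common mistake, an overly broad `Disallow` that silently blocks a search crawler you meant to allow.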

2. llms.txt Is No Longer Optional

If AI bots are your most frequent visitors, you need to speak their language. The llms.txt protocol gives AI crawlers a structured summary of your site, your brand, and your expertise. Think of it as a cover letter for AI engines.

Sites with a well-structured llms.txt file see measurably better AI citation rates because the file helps LLMs understand:

  • What your brand does
  • What topics you’re authoritative on
  • How to categorize your content
  • Which pages matter most

If you haven’t set up llms.txt yet, that should be today’s priority. The setup takes less than 30 minutes and the visibility impact compounds over time.

3. Content Structure Matters More Than Ever

AI crawlers don’t just read your content. They parse it. They extract structured information and use it to build knowledge representations. This means the format of your content directly impacts whether it gets cited.

A March 2026 study by Wix analyzing AI citation patterns found that:

  • Listicles account for 21.9% of all AI citations
  • Standard articles account for 16.7%
  • Product pages account for 13.7%

The common thread? Structured, scannable content with clear data points wins. This aligns with what we found in our analysis of which content types get cited by AI engines.

The Zero-Click Reality

There’s an uncomfortable truth buried in the crawling data: more AI visits to your site don’t necessarily translate into more human traffic from AI.

The zero-click phenomenon is accelerating. Users ask AI engines questions, get answers synthesized from your content, and never visit your site. JumpFly’s March 2026 analysis found that AI summaries are increasingly satisfying user queries without requiring a click-through.

This changes the value equation:

| Metric | Old Model (SEO) | New Model (GEO) |
|---|---|---|
| Success = | Clicks to site | Brand mentions in AI answers |
| Value of a visit = | Page view + potential conversion | AI indexing your expertise |
| Content goal = | Rank on page 1 | Be cited as the answer |
| Traffic source = | Google SERP | AI-generated recommendations |

Your iScore (AI visibility score) becomes a better success metric than organic traffic for many businesses. A restaurant that ChatGPT recommends to thousands of users asking “best Italian restaurant in [city]” gets enormous value even if those users never visit the restaurant’s website directly.

Practical Steps: Optimizing for AI Crawlers

Here’s what to do this week to capitalize on the AI crawling shift:

Step 1: Audit Your Current AI Crawler Access

Check your server logs or analytics for these user agents:

  • GPTBot
  • ChatGPT-User
  • PerplexityBot
  • ClaudeBot
  • Google-Extended
  • Bytespider

If these bots are receiving 403 responses, your server, CDN, or firewall is blocking them and needs reconfiguring. If they never appear in your logs at all, check whether your robots.txt is turning them away.
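The log check above can be sketched in Python (a minimal example assuming Apache/Nginx combined log format; the sample lines and log contents are illustrative):

```python
import re
from collections import Counter

AI_BOTS = ["GPTBot", "ChatGPT-User", "PerplexityBot",
           "ClaudeBot", "Google-Extended", "Bytespider"]

def audit_ai_crawlers(log_lines):
    """Tally total requests and 403 responses per AI crawler.

    Assumes combined log format, where the HTTP status code
    immediately follows the quoted request field.
    """
    hits, blocked = Counter(), Counter()
    for line in log_lines:
        match = re.search(r'" (\d{3}) ', line)  # status after request field
        status = match.group(1) if match else None
        for bot in AI_BOTS:
            if bot in line:
                hits[bot] += 1
                if status == "403":
                    blocked[bot] += 1
    return hits, blocked

# Illustrative sample lines (real logs would come from your server)
sample = [
    '203.0.113.5 - - [01/Mar/2026:10:00:00 +0000] "GET /blog HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Mar/2026:10:01:00 +0000] "GET /pricing HTTP/1.1" '
    '403 0 "-" "PerplexityBot/1.0"',
]
hits, blocked = audit_ai_crawlers(sample)
print(hits)     # which AI bots are visiting
print(blocked)  # which are being turned away with 403s
```

A non-empty `blocked` counter for a search crawler like ChatGPT-User or PerplexityBot is the signal to investigate immediately.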

Step 2: Implement llms.txt

Create a /llms.txt file at your domain root with:

  • Brand name and one-line description
  • Core topics and expertise areas
  • Key pages and their purposes
  • Contact and authorship information

Full setup guide: How to Set Up llms.txt for Your Website
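As a sketch, a minimal llms.txt covering those four elements might look like this (brand name, URLs, and author are placeholders):

```markdown
# Example Brand
> We help small teams manage projects without process overhead.

## Expertise
- Project management for teams under 20 people
- Workflow automation and integrations

## Key Pages
- [Pricing](https://example.com/pricing): plans and feature comparison
- [Guides](https://example.com/guides): in-depth how-to content

## Contact
- Author: Jane Doe, founder (jane@example.com)
```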

Step 3: Structure Content for AI Extraction

For every important page:

  1. Answer-first opening - Put your main point in the first sentence
  2. Use comparison tables - AI engines love structured data they can reference
  3. Add FAQ sections - Question-answer pairs are the highest-cited content format
  4. Include specific data points - Numbers with sources get cited more than vague claims
  5. Use clear headings - H2/H3 structure helps AI parse topic segments
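One common way to make FAQ question-answer pairs explicitly machine-readable is schema.org FAQPage markup. Whether any given AI crawler consumes this markup isn't confirmed by the crawling data above, so treat this as a hedged sketch rather than a guaranteed citation booster:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "Does blocking GPTBot remove me from ChatGPT search results?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "No. ChatGPT's real-time browsing uses a separate crawler, ChatGPT-User, which operates independently of GPTBot."
    }
  }]
}
</script>
```

At minimum, the same question-answer structure in plain headings and paragraphs gives AI parsers the clean extraction targets the Wix citation study points to.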

Step 4: Monitor Your AI Visibility

Track how often AI engines cite your brand. Tools like those in our AI visibility monitoring comparison can automate this process. Your iScore gives you a single number that tracks your visibility across ChatGPT, Gemini, Perplexity, Claude, and Grok simultaneously.

Step 5: Publish Consistently

AI engines favor fresh, regularly updated content. A site that publishes weekly gets crawled more frequently than one that publishes monthly. The compounding effect is real: more content means more crawl visits, which means more opportunities for citation.

What’s Coming Next

The AI crawling landscape will continue shifting throughout 2026:

  • Agentic AI crawlers are emerging. These don’t just read pages; they interact with them, fill out forms, and complete transactions. Google, Perplexity, and OpenAI are all building agent-based browsing.
  • Crawl frequency will increase as AI search usage grows. Expect AI crawlers to make up over 50% of total bot traffic on most websites by Q4 2026.
  • New AI engines will launch with their own crawlers. The competitive landscape is expanding, and each new player means another bot visiting your site.
  • robots.txt standards for AI may evolve. The current system of individual user-agent rules doesn’t scale well. Industry groups are discussing standardized AI crawler categories.

The businesses that treat AI crawlers as their most important visitors today will own the AI visibility landscape tomorrow. The data is clear: AI bots are already your most frequent automated visitors. The question is whether you’re ready for them.

Check your AI visibility score free at searchless.ai/audit

Frequently Asked Questions

How can I tell if AI bots are crawling my website?

Check your server access logs for user agents containing “GPTBot,” “ChatGPT-User,” “PerplexityBot,” “ClaudeBot,” or “Google-Extended.” Most analytics platforms like Cloudflare, Vercel, and AWS CloudFront also provide bot traffic breakdowns in their dashboards. If you’re using a CDN, check the bot management section for AI crawler statistics.

Will blocking AI training bots hurt my visibility in ChatGPT or Perplexity?

Blocking training bots (like GPTBot) does not directly block your visibility in real-time AI search results. ChatGPT’s browsing feature uses a separate crawler (ChatGPT-User) that operates independently from GPTBot. However, blocking training crawlers means your content won’t be embedded in the model’s base knowledge, so it can only find you through real-time browsing, not from memory. For maximum AI visibility, allow both training and search crawlers.

How often do AI search bots crawl a typical website?

Crawl frequency varies based on your site’s authority, update frequency, and content volume. High-authority sites with daily updates may see AI crawler visits multiple times per day. Smaller sites with infrequent updates might see weekly visits. Publishing fresh content regularly is the most reliable way to increase your crawl frequency across all AI engines.

Is llms.txt the same as robots.txt?

No. robots.txt tells crawlers what they can’t access. llms.txt tells AI crawlers what your site is about and which content matters most. They serve complementary purposes. robots.txt is access control; llms.txt is a structured introduction to your brand and content for AI systems. Both should be present at your domain root for optimal AI visibility.

Should small businesses worry about AI crawlers?

Yes. AI search is growing fastest in local and service-based queries, which is exactly where small businesses compete. When someone asks ChatGPT “best plumber near me” or “recommend a good Italian restaurant in [city],” the AI draws from crawled web content to generate recommendations. If AI bots can’t access your site or can’t understand your content, you won’t appear in those recommendations regardless of how good your Google ranking is.