Hi 👋🏻
In the rapidly evolving landscape of AI-powered search—from Google's AI Overviews to ChatGPT Search, Perplexity, and Claude—one question keeps surfacing in SEO circles: Does structured data actually matter for AI search optimization (AISEO/GEO)?
The answer is more nuanced than most expect, and recent experiments from the SEO community reveal a paradox. Let me walk you through the patents, the tests, and what it all means for your content strategy in 2026.
The Patent Evidence: Google's Entity-First Architecture
To understand why this debate matters, we need to start with how Google's search infrastructure actually works. Several key patents reveal the foundation:
US10235423B2: Ranking Search Results Based on Entity Metrics
This patent describes how Google uses knowledge graphs to rank content based on entity-specific metrics including:
- Relatedness metrics (co-occurrence of entities and entity types)
- Notable entity type metrics (categorization characteristics)
- Contribution metrics (connections between entities)
- Fame metrics (aggregated contributions)
The patent explicitly states that information from external structured data sources (like Wikidata) and structured data within websites "could be used to determine search engine results page placement."
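To make the relatedness idea concrete, here is a toy Python sketch that scores how often two entities co-occur across a small set of documents. It is not the patent's method. The documents, entity names, and the normalization used here are hypothetical, and they only illustrate what a "co-occurrence" signal means.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(docs):
    """Count how many documents mention each unordered pair of entities."""
    pairs = Counter()
    for entities in docs:
        # Sorting gives each pair one canonical order, so (A, B) and (B, A) merge.
        for a, b in combinations(sorted(entities), 2):
            pairs[(a, b)] += 1
    return pairs

def relatedness(pairs, docs, a, b):
    """Toy score: the share of documents mentioning `a` that also mention `b`."""
    mentions_a = sum(1 for entities in docs if a in entities)
    if mentions_a == 0:
        return 0.0
    return pairs[tuple(sorted((a, b)))] / mentions_a

# Hypothetical documents, each reduced to the set of entities it mentions.
docs = [
    {"Acme Corp", "Jane Doe", "Widget X"},
    {"Acme Corp", "Jane Doe"},
    {"Acme Corp", "Globex"},
]
pairs = cooccurrence_counts(docs)
print(relatedness(pairs, docs, "Acme Corp", "Jane Doe"))  # 2 of the 3 Acme Corp docs
```

A real system would weight pairs by entity type, document quality, and much more. This sketch is only meant to show the basic counting.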
US11769017B1: Generative Summaries for Search Results
Filed by Google in 2023, this patent covers how large language models generate summaries using retrieval-augmented generation (RAG). The system augments LLM inputs with "additional information based on search results" and uses "content that reflects familiarity of the user with certain content."
Crucially, this patent describes LLMs as database structures where structured information facilitates "more focused retrieval of information from the database."
US20190294732A1: Constructing Enterprise-Specific Knowledge Graphs
This patent details how Google constructs knowledge graphs from both structured and unstructured data, identifying "relationships between entities that match known relationships" and using "entity canonicalization" to map entities to predefined taxonomies.
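Entity canonicalization is easier to picture with a small example. The sketch below uses a hand-written alias table to map different surface forms of a name onto one canonical ID, and then keeps only the relationships where both ends resolve. The alias table, the `ent:` IDs, and the helper names are all hypothetical. Production systems use far richer matching than exact lookups.

```python
import re
from typing import Optional

# Hypothetical alias table mapping surface forms to canonical entity IDs.
# The "ent:" IDs are made-up placeholders, not real Knowledge Graph IDs.
ALIASES = {
    "acme": "ent:acme_corp",
    "acme corp": "ent:acme_corp",
    "acme corporation": "ent:acme_corp",
    "jane doe": "ent:jane_doe",
    "j. doe": "ent:jane_doe",
}

def normalize(mention: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", mention.strip().lower())

def canonicalize(mention: str) -> Optional[str]:
    """Return the canonical ID for a mention, or None if it is unknown."""
    return ALIASES.get(normalize(mention))

def canonical_triples(triples):
    """Map (subject, relation, object) mentions to IDs, dropping unresolved ones."""
    resolved = []
    for subject, relation, obj in triples:
        s, o = canonicalize(subject), canonicalize(obj)
        if s and o:
            resolved.append((s, relation, o))
    return resolved

print(canonical_triples([
    ("J. Doe", "worksFor", "ACME  Corporation"),
    ("Jane Doe", "worksFor", "Globex"),  # Globex is not in the table, so it is dropped
]))
```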
The Pattern?
These patents consistently emphasize entity recognition, relationship mapping, and structured data as foundational to Google's semantic understanding—which powers everything from traditional search to AI-driven features.
Side note: This is fully aligned with the approach we teach in our latest course by Beatrice Gamba, AI Search & LLMs: Entity SEO and Knowledge Graph Strategies for Brands 💜. Anyone teaching something different is not fully in tune with how these systems work. More on the course later; let's continue.
Community Experiments: A Tale of Two Findings
Here's where things get interesting. Recent experiments from respected SEO practitioners have produced seemingly contradictory results.
The Tokenization Discovery
In September 2024, Mark Williams-Cook ran a controlled experiment that sent shockwaves through the SEO community, and Dan Petrovic later extended it with experiments of his own. Their findings:
The Problem: When LLMs process web pages during training, they tokenize the content, breaking the text into discrete tokens. Williams-Cook's experiment demonstrated that schema markup like "@type": "Organization" gets "destroyed" during this process. The tokens for "type" and "Organization" become separated and lose their structured context.
The Conclusion: During LLM training, schema markup is reduced to "meaningless tokens" that the AI doesn't treat as structured signals at all.
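Here is a rough way to see what this means. The sketch below runs a JSON-LD snippet through a deliberately simple regex tokenizer. Real LLMs use subword tokenizers such as BPE, so the actual token boundaries will differ, and the `naive_tokenize` helper is a hypothetical stand-in. The point it illustrates is that the key/value structure is flattened into a plain sequence of tokens.

```python
import re

def naive_tokenize(text):
    """Split text into word runs and single punctuation characters.

    A hypothetical stand-in for a real subword tokenizer: boundaries differ,
    but the output is likewise a flat sequence with no explicit key/value links.
    """
    return re.findall(r"\w+|[^\w\s]", text)

snippet = '{"@type": "Organization", "name": "Acme Corp"}'
tokens = naive_tokenize(snippet)
print(tokens)
```

In the output, "type" and "Organization" end up as ordinary tokens a few positions apart. Nothing in the sequence itself marks one as the key and the other as its value. Whether a model still learns that association is a separate question.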
Julio C. Guevara replicated this with a complementary test using two product pages—one with visible text plus schema, one with schema only. Result: LLMs couldn't extract information (price, colors, SKU numbers) from the schema-only page. They needed visible text.
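Guevara's result is consistent with a common preprocessing step. Many HTML-to-text pipelines discard `<script>` contents, and JSON-LD lives inside a `<script>` tag. The sketch below assumes such a pipeline and shows the price disappearing from the schema-only page. It does not describe how any specific LLM vendor processes pages, and the product data is hypothetical.

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def visible_text(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

SCHEMA = ('<script type="application/ld+json">{"@type": "Product", "name": "Widget X", '
          '"offers": {"@type": "Offer", "price": "19.99"}}</script>')
schema_only = f"<html><head>{SCHEMA}</head><body><h1>Widget X</h1></body></html>"
text_and_schema = f"<html><head>{SCHEMA}</head><body><h1>Widget X</h1><p>Price: $19.99</p></body></html>"

print("19.99" in visible_text(schema_only))      # False
print("19.99" in visible_text(text_and_schema))  # True
```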
The Google AI Overviews Test
But in September 2024, Molly Nogami and a colleague conducted a different experiment specifically targeting Google's AI Overviews:
The Setup: Three single-page sites were created:
- Well-implemented schema
- Poorly implemented schema
- No schema
The Results:
- The well-implemented schema page ranked #3 and appeared in AI Overviews for multiple queries
- The poorly implemented schema page ranked, but it peaked at position 8 and made no AI Overview appearances
- The no schema page was crawled but never indexed at all
The page with proper schema ultimately ranked for 26 relevant keywords within Google's AI Overviews and received citations.
The AISO Case Study
Another controlled test compared two identical websites—one with schema markup, one without. When queried by ChatGPT:
- The site with schema markup provided "more detailed and authentic information"
- Schema allowed better control over brand story and how information appeared in LLM responses
- The structured site received more accurate citations
Real-World Performance Data
Multiple case studies demonstrate measurable impact:
Xponent21 Agency: Implementing comprehensive schema markup as part of an AI SEO strategy contributed to 4,162% traffic growth and to top positions in both Google AI Overviews and Perplexity for target queries.
The Search Initiative: After implementing Article and FAQ schema, a client's page ranked for 26 relevant keywords within Google's AI Overviews
Case Study and Experiment Analysis: Why Is There a Contradiction?
So how do we make sense of these seemingly contradictory findings? The key lies in understanding the difference between LLM training and search system architecture.
The experiments arguing against schema for AISEO are technically correct. Pure LLMs processing raw web content during training don't interpret schema as structured data, because tokenization breaks the structured markup into fragments.
But AI search systems don't work that way in production. Here's the crucial distinction:
- Google's AI Overviews use a hybrid system. They don't just rely on LLM training data—they perform real-time retrieval from Google's index, which does understand structured data.
- Indexing happens before generation. Before any AI generates an answer, Google's crawlers must first index your content. Structured data significantly affects:
  - Whether you get indexed at all
  - How your content is categorized and understood
  - Which entity relationships are recognized
  - How your page is stored in the Knowledge Graph
- RAG systems are different from pure LLMs. Retrieval-Augmented Generation (used by Google AI Overviews, ChatGPT Search, and Perplexity) doesn't just rely on training data—it retrieves and processes content in real-time, where structured data can provide crucial context.
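The "index first, generate later" distinction can be sketched in a few lines. An indexer parses JSON-LD at crawl time into structured records, and a retrieval step can then filter on those records before anything is handed to an LLM. This is a hypothetical toy pipeline built on Python's standard library. It is not the architecture of Google, Perplexity, or OpenAI, and the generation step is left out.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buf))
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

def index_page(url, html):
    """Build a toy index record: structured entities parsed at index time."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    entities = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # this sketch simply ignores malformed markup
        # A JSON-LD block may hold one object or a list of objects.
        items = data if isinstance(data, list) else [data]
        entities.extend(item for item in items if isinstance(item, dict))
    return {"url": url, "entities": entities}

def retrieve(index, schema_type):
    """Return URLs whose structured data declares the requested @type."""
    hits = []
    for record in index:
        for entity in record["entities"]:
            declared = entity.get("@type")
            types = declared if isinstance(declared, list) else [declared]
            if schema_type in types:
                hits.append(record["url"])
                break
    return hits

pages = {
    "https://example.com/a": '<script type="application/ld+json">{"@type": "Article", "headline": "Entity SEO"}</script><p>Entity SEO</p>',
    "https://example.com/b": "<p>No structured data here</p>",
}
index = [index_page(url, html) for url, html in pages.items()]
print(retrieve(index, "Article"))  # ['https://example.com/a']
```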
The Entity Connection: Why Structured Data Still Wins
The real power of structured data for AI search lies in entity resolution and relationship mapping—concepts deeply embedded in Google's patents and the semantic web.
How Entity Recognition Works
When you implement schema markup, you're not just helping LLMs—you're helping search engines:
- Resolve entities: Clearly identify that "Apple" on your page means Apple Inc. (the company), not the fruit
- Map relationships: Establish connections between your brand, products, people, and concepts
- Populate knowledge graphs: Feed Google's 800 billion facts about 8 billion entities
- Enable semantic search: Allow AI systems to understand context, not just keywords
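Here is a toy illustration of the "Apple" example. The resolver below scores two hypothetical candidate entities against a page's JSON-LD. A matching `sameAs` link gets more weight than a matching `@type`. The candidate list and the weights are assumptions made for illustration. Real entity resolution draws on many more signals.

```python
# Hypothetical candidates a resolver might consider for the mention "Apple".
# The scoring weights below are illustrative, not any search engine's.
CANDIDATES = {
    "apple_company": {
        "name": "Apple",
        "type": "Organization",
        "sameAs": {"https://en.wikipedia.org/wiki/Apple_Inc."},
    },
    "apple_fruit": {
        "name": "Apple",
        "type": "Thing",
        "sameAs": {"https://en.wikipedia.org/wiki/Apple"},
    },
}

def resolve(mention, page_jsonld):
    """Pick the best-supported candidate, or None if none scores above zero."""
    same_as = page_jsonld.get("sameAs", [])
    if isinstance(same_as, str):
        same_as = [same_as]
    same_as = set(same_as)
    declared_type = page_jsonld.get("@type")

    best, best_score = None, 0
    for cand_id, cand in CANDIDATES.items():
        if cand["name"].lower() != mention.lower():
            continue
        # A matching sameAs link counts double; a matching @type counts once.
        score = 2 * len(cand["sameAs"] & same_as) + int(cand["type"] == declared_type)
        if score > best_score:
            best, best_score = cand_id, score
    return best

page = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Apple",
    "sameAs": ["https://en.wikipedia.org/wiki/Apple_Inc."],
}
print(resolve("Apple", page))  # apple_company
print(resolve("Apple", {}))    # None: without markup there is nothing to disambiguate with
```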
The Knowledge Graph Effect
Google's Knowledge Graph expanded from 570 million entities to 8 billion entities with 800 billion facts in just over a decade. This semantic understanding is the "life-blood of modern search."
When you use structured data:
- You claim your entity in Google's semantic network
- You define the relationships that matter
- You make your content "citation-ready" for AI systems
- You increase semantic clarity for both humans and machines
Entity clarity can be the trigger that determines whether you are selected and recognised as a source of a clear answer in both AI search systems and traditional search.
The Verdict: Yes, You Should Absolutely Use Structured Data
Here's my conclusion based on the evidence:
For Traditional Search Systems (Including AI Overviews), structured data is essential:
- Proper schema can mean the difference between indexing and invisibility
- Well-implemented structured data correlates with AI Overview appearances
- Schema markup directly impacts traditional SEO, which feeds into AI systems
For Pure LLM Training Data, structured data has limited direct impact, because tokenization breaks schema markup apart during training.
But Here's the Critical Point
You're not optimizing for LLM training—you're optimizing for hybrid AI search systems that:
- Crawl and index using traditional search infrastructure (which reads schema)
- Populate knowledge graphs (which depend on entity relationships)
- Perform real-time retrieval and augmentation (where structured context matters)
- Synthesize answers from indexed, structured content
Modern AI search isn't just ChatGPT or Claude reading web pages. It's:
- Google AI Overviews and AI Mode: Powered by real-time retrieval from Google's structured index
- Perplexity: Uses web search APIs that understand structured data
- ChatGPT Search: Retrieves from search systems, not just training data
- Bing Copilot: Integrates Bing's semantic search infrastructure
All of these systems benefit from structured data at the retrieval and indexing stage, even if their LLM components don't directly parse schema during generation.
Implementation Recommendations
Based on the research and experiments, here's what you should prioritize:
1. Core Schema Types for AI Search
- Article/BlogPosting: For editorial content (boosts AI Overview visibility)
- FAQPage: Critical for Q&A content and voice search
- HowTo: For instructional content (step-by-step formats)
- Organization: Establishes your brand entity
- Product: With nested Offer and Review schemas
- BreadcrumbList: Improves structural understanding
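As a starting point, the sketch below builds minimal Article, FAQPage, and Product (with a nested Offer) objects as Python dicts, using schema.org property names. It covers only a small subset of properties and leaves out nested Review. Check Google's structured data documentation for each type's required and recommended fields. The helper function names and the sample values are hypothetical.

```python
import json

SCHEMA_CONTEXT = "https://schema.org"

def article_jsonld(headline, author_name, date_published):
    """Minimal Article markup with a nested Person author."""
    return {
        "@context": SCHEMA_CONTEXT,
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,  # ISO 8601 date string
    }

def faq_jsonld(qa_pairs):
    """FAQPage markup: one Question/Answer pair per (question, answer) tuple."""
    return {
        "@context": SCHEMA_CONTEXT,
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }

def product_jsonld(name, price, currency):
    """Product markup with a nested Offer."""
    return {
        "@context": SCHEMA_CONTEXT,
        "@type": "Product",
        "name": name,
        "offers": {"@type": "Offer", "price": price, "priceCurrency": currency},
    }

print(json.dumps(faq_jsonld([
    ("Does structured data matter for AI search?",
     "It helps at the indexing and retrieval stages."),
]), indent=2))
```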
2. Entity Relationship Mapping
- Use sameAs properties to link your entities to authoritative sources (Wikipedia, Wikidata, LinkedIn)
- Implement Organization schema with clear identifiers
- Create internal linking patterns that reinforce entity relationships
- Build a "mini Knowledge Graph" through your site architecture
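In JSON-LD, a "mini Knowledge Graph" usually means giving each entity a stable `@id` and pointing other entities at it, instead of repeating the same details on every page. The sketch below assumes a hypothetical site at example.com with placeholder `sameAs` URLs. The `edges` helper simply lists which nodes reference which.

```python
import json

BASE = "https://example.com/"  # hypothetical site

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": BASE + "#organization",
            "name": "Example Co",
            "url": BASE,
            # Placeholder profile URLs; use your real authoritative profiles.
            "sameAs": [
                "https://en.wikipedia.org/wiki/Example_Co",
                "https://www.linkedin.com/company/example-co",
            ],
        },
        {
            "@type": "Person",
            "@id": BASE + "#jane-doe",
            "name": "Jane Doe",
            "worksFor": {"@id": BASE + "#organization"},
        },
        {
            "@type": "WebSite",
            "@id": BASE + "#website",
            "url": BASE,
            "publisher": {"@id": BASE + "#organization"},
        },
    ],
}

def edges(doc):
    """List (subject @id, property, object @id) links between nodes in @graph."""
    out = []
    for node in doc["@graph"]:
        for prop, value in node.items():
            if isinstance(value, dict) and "@id" in value:
                out.append((node["@id"], prop, value["@id"]))
    return out

for subject, prop, obj in edges(graph):
    print(subject, prop, obj)
print(json.dumps(graph)[:60])  # the same dict serializes straight to JSON-LD
```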
3. Don't Neglect Visible Text
Visible, well-structured text is non-negotiable. Your schema should complement (not replace) clear, accessible content.
Best practices:
- Write in short, factual blocks (2-4 sentences per paragraph)
- Use clear headings that establish entity context
- Include explicit answers to questions
- Make key information visible in HTML, not just in schema
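One quick way to act on the last point is to check that the facts in your schema also appear in the visible text. The sketch below flattens a JSON-LD object into its scalar values and reports any that never show up on the page. It is a crude, case-insensitive substring check run on hypothetical sample data. It skips JSON-LD keywords and also skips `sameAs` and `url`, because links are rarely shown as text.

```python
SKIP_KEYS = ("@context", "@type", "@id", "sameAs", "url")

def leaf_values(node, skip_keys=SKIP_KEYS):
    """Yield every scalar value in a JSON-LD structure as a string."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key not in skip_keys:
                yield from leaf_values(value, skip_keys)
    elif isinstance(node, list):
        for item in node:
            yield from leaf_values(item, skip_keys)
    elif node is not None:
        yield str(node)

def missing_from_visible_text(jsonld, visible_text):
    """Return schema values that never appear in the page's visible text."""
    text = visible_text.lower()
    return [value for value in leaf_values(jsonld) if value.lower() not in text]

product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Widget X",
    "sku": "WX-100",
    "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "USD"},
}
print(missing_from_visible_text(product, "Widget X. Price: 19.99 USD."))  # ['WX-100']
```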
4. Focus on JSON-LD
Google recommends the JSON-LD format because it is:
- Easier to implement and maintain
- Kept separate from your visible HTML content
- More scalable for large sites
- Less prone to errors
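In practice, JSON-LD is embedded in a `<script type="application/ld+json">` tag. The sketch below serializes a dict into that tag. It escapes `</` so that a value containing `</script>` cannot close the tag early. Because `\/` is a valid JSON escape for `/`, parsers still read the original value. The function name is hypothetical.

```python
import json

def jsonld_script_tag(data):
    """Serialize a JSON-LD dict into a <script> tag for the page."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so a value like "</script>" cannot close the tag early;
    # "\/" is a legal JSON escape for "/", so parsers read the same value.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'

print(jsonld_script_tag({
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
}))
```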
5. Test and Monitor
- Use Google's Rich Results Test for validation
- Monitor AI Overviews appearance with tools like BrightEdge or Conductor
- Track entity visibility in Knowledge Panels
- Measure citation rates across AI platforms
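To track entity visibility, one rough proxy is whether your brand resolves in Google's Knowledge Graph Search API. The sketch below only builds the request URL and summarizes a response that has already been fetched and parsed. It makes no network call. The endpoint and the parameter names (`query`, `key`, `limit`, `languages`) follow Google's API documentation as I understand it, so verify them against the current docs. `YOUR_API_KEY` is a placeholder. Appearing in API results does not guarantee that a Knowledge Panel is shown.

```python
from urllib.parse import urlencode

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"

def kg_search_url(query, api_key, limit=5, language="en"):
    """Build an entities:search request URL (no request is sent here)."""
    params = {"query": query, "key": api_key, "limit": limit, "languages": language}
    return f"{KG_ENDPOINT}?{urlencode(params)}"

def summarize(response):
    """Pull (name, @id, resultScore) from each item of a parsed JSON response."""
    rows = []
    for item in response.get("itemListElement", []):
        result = item.get("result", {})
        rows.append((result.get("name"), result.get("@id"), item.get("resultScore")))
    return rows

print(kg_search_url("Example Co", "YOUR_API_KEY"))
```

You would fetch that URL with any HTTP client, parse the JSON body, and pass it to `summarize` to log whether and how strongly your entity matches over time.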
The Bigger Picture: Entity-First SEO is the Future
The structured data debate reveals a broader shift in search optimization.
AI search is accelerating the move from keyword-matching to semantic understanding. With ChatGPT handling 2.5 billion prompts daily and AI Overviews appearing for nearly 19% of US searches, visibility now depends on:
- Semantic clarity: Being recognized as an authoritative entity
- Relationship mapping: Establishing your place in the semantic network
- Structured communication: Making your meaning machine-readable
- Knowledge graph presence: Existing in Google's entity database
Structured data is how you achieve all four.
Structured data doesn't primarily help AI by being read during model training. It helps AI by:
- Ensuring your content gets indexed in the first place
- Populating knowledge graphs that AI systems query
- Establishing entity relationships that enable semantic search
- Providing structured context for real-time retrieval systems
- Enhancing traditional search signals that AI Overviews depend on
The question isn't whether to use structured data for AI search optimization—it's how quickly you can implement it comprehensively.
That said...
Learn how to quickly implement entity relationship networks, knowledge graphs, and structured data for AISEO 🌟
Our latest course on the MLforSEO Academy is designed as a practical, systems-focused journey from understanding entities to building and optimizing knowledge graphs and finally engineering brand authority for AI search and LLMs.
You’ll move through a set of core modules, each broken down into short, focused lessons with concrete examples, mini case studies, and tool-driven exercises (e.g. interrogating Google’s Knowledge Graph, doing entity audits, and working with the Knowledge Graph API).
Two optional knowledge primers sit alongside the main pathway – one on entities/NER and one on knowledge graphs – so you can quickly catch up or deepen specific technical foundations without slowing down the strategic narrative of the core course.
100+ forward-thinking marketers are already taking our courses 💜
Community discussion 🌟
As search continues its evolution from keywords to entities, from text to meaning, structured data becomes your bridge to being understood, remembered, and cited by the machines that increasingly determine what people see.
The verdict is clear: Yes, you should absolutely use structured data and entity relationship mapping to enhance visibility in both traditional and AI search systems.
Not because LLMs magically read your schema during training, but because the actual infrastructure of AI search—the indexing, the knowledge graphs, the retrieval systems, the semantic networks—all depend on structured, machine-readable information.
- What's your experience with structured data and AI search?
- Have you seen measurable improvements in AI Overview visibility or LLM citations after implementing schema markup?
Reply and share your findings—the community learns best when we share real results.
Join 670+ AI/ML-interested marketers on our Slack community to stay up to date with discussions on AI/ML automation in SEO and marketing.
Happy learning! ✨
Lazarina