The Structured Data Debate

Should You Use Schema for AI Search Optimization?

Hi 👋🏻

In the rapidly evolving landscape of AI-powered search—from Google's AI Overviews to ChatGPT Search, Perplexity, and Claude—one question keeps surfacing in SEO circles: Does structured data actually matter for AI search optimization (AISEO/GEO)?

The answer is more nuanced than most expect, and recent experiments from the SEO community reveal a paradox. Let me walk you through the patents, the tests, and what it all means for your content strategy in 2026.

The Patent Evidence: Google's Entity-First Architecture

To understand why this debate matters, we need to start with how Google's search infrastructure actually works. Several key patents reveal the foundation:

US10235423B2: Ranking Search Results Based on Entity Metrics

This patent describes how Google uses knowledge graphs to rank content based on entity-specific metrics including:

Relatedness metrics (co-occurrence of entities and entity types)
Notable entity type metrics (categorization characteristics)
Contribution metrics (connections between entities)
Fame metrics (aggregated contributions)

The patent explicitly states that information from external structured data sources (like Wikidata) and structured data within websites "could be used to determine search engine results page placement."

US11769017B1: Generative Summaries for Search Results

Filed by Google in 2023, this patent covers how large language models generate summaries using retrieval-augmented generation (RAG). The system augments LLM inputs with "additional information based on search results" and uses "content that reflects familiarity of the user with certain content."

Crucially, this patent describes LLMs as database structures where structured information facilitates "more focused retrieval of information from the database."

US20190294732A1: Constructing Enterprise-Specific Knowledge Graphs

This patent details how Google constructs knowledge graphs from both structured and unstructured data, identifying "relationships between entities that match known relationships" and using "entity canonicalization" to map entities to predefined taxonomies.

The Pattern?

These patents consistently emphasize entity recognition, relationship mapping, and structured data as foundational to Google's semantic understanding—which powers everything from traditional search to AI-driven features.

Side note: This is fully aligned with the approach we teach in our latest course by Beatrice Gamba, AI Search & LLMs: Entity SEO and Knowledge Graph Strategies for Brands 💜, and anyone teaching anything different is not fully in tune with how these systems work. More on the course later; let's continue.

Community Experiments: A Tale of Two Findings

Here's where things get interesting. Recent experiments from respected SEO practitioners have produced seemingly contradictory results.

The Tokenization Discovery

In September 2024, Mark Williams-Cook ran a controlled experiment that sent shockwaves through the SEO community. Dan Petrovic later extended it with experiments of his own. Their findings:

The Problem: When LLMs process web pages during training, they tokenize content, breaking text into discrete tokens. The experiment demonstrated that schema markup like "@type": "Organization" gets "destroyed" during this process. The tokens for "type" and "Organization" become separated, stripped of their structured context.

The Conclusion: During LLM training, schema markup is reduced to "meaningless tokens" that the AI doesn't treat as structured signals at all.
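
To make the tokenization point concrete, here is a minimal sketch. It uses a toy regex tokenizer of my own (an assumption for illustration only). Real LLMs use subword tokenizers that split text differently, but the outcome is comparable: the key/value pair arrives as a flat sequence of fragments rather than as one structured unit.

```python
import re


def toy_tokenize(text: str) -> list[str]:
    """Toy stand-in for an LLM tokenizer: words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


snippet = '"@type": "Organization"'
tokens = toy_tokenize(snippet)

# The property name and its value end up as unrelated neighbours in a flat list.
print(tokens)
print(len(tokens), "tokens, with no marker that 'type' and 'Organization' belong together")
```

Nothing in the token list says that "Organization" is the value of "@type". That relationship only exists for a parser that reads the markup as JSON.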

Julio C. Guevara replicated this with a complementary test using two product pages—one with visible text plus schema, one with schema only. Result: LLMs couldn't extract information (price, colors, SKU numbers) from the schema-only page. They needed visible text.
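
A rough way to picture this result: if a pipeline only keeps the visible text of a page, a schema-only page has nothing left to read. The sketch below assumes a text-extraction step that drops script blocks. That is my assumption, not a documented detail of any specific LLM or crawler, and the two sample pages are made up.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text a reader would see, joined with single spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample pages: same schema, but only one has visible copy.
SCHEMA = '<script type="application/ld+json">{"@type": "Product", "sku": "TS-100"}</script>'
schema_only_page = f"<html><body>{SCHEMA}</body></html>"
text_plus_schema_page = f"<html><body><h1>Trail Shoe</h1><p>SKU: TS-100</p>{SCHEMA}</body></html>"

print(repr(visible_text(schema_only_page)))
print(repr(visible_text(text_plus_schema_page)))
```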

The Google AI Overviews Test

But in September 2024, Molly Nogami and a colleague conducted a different experiment specifically targeting Google's AI Overviews:

The Setup: Three single-page sites were created:

Well-implemented schema
Poorly implemented schema
No schema

The Results:

The well-implemented schema page ranked #3 and appeared in AI Overviews for multiple queries
The poorly implemented schema page ranked but peaked at Position 8, with no AI Overview appearances
The no schema page was crawled but never indexed at all

The page with proper schema ultimately ranked for 26 relevant keywords within Google's AI Overviews and received citations.

The AISO Case Study

Another controlled test compared two identical websites—one with schema markup, one without. When queried by ChatGPT:

The site with schema markup provided "more detailed and authentic information"
Schema allowed better control over brand story and how information appeared in LLM responses
The structured site received more accurate citations

Real-World Performance Data

Multiple case studies demonstrate measurable impact:

Xponent21 Agency: Implementing comprehensive schema markup as part of an AI SEO strategy contributed to 4,162% traffic growth and top positions in both Google AI Overviews and Perplexity for target queries

The Search Initiative: After implementing Article and FAQ schema, a client's page ranked for 26 relevant keywords within Google's AI Overviews

Analyzing the Case Studies and Experiments: Why Is There a Contradiction?

So how do we make sense of these seemingly contradictory findings? The key lies in understanding the difference between LLM training and search system architecture.

The shared experiments against schema for AISEO are technically correct: Pure LLMs processing raw web content during training don't interpret schema as structured data. Tokenization breaks the structured markup into fragments.

But AI search systems don't work that way in production. Here's the crucial distinction:

Google's AI Overviews use a hybrid system. They don't just rely on LLM training data—they perform real-time retrieval from Google's index, which does understand structured data.
Indexing happens before generation. Before any AI generates an answer, Google's crawlers must first index your content. Structured data significantly affects:
- Whether you get indexed at all
- How your content is categorized and understood
- Which entity relationships are recognized
- How your page is stored in the Knowledge Graph
RAG systems are different from pure LLMs. Retrieval-Augmented Generation (used by Google AI Overviews, ChatGPT Search, and Perplexity) doesn't just rely on training data—it retrieves and processes content in real-time, where structured data can provide crucial context.
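
The distinction above can be sketched in a few lines: an indexing step parses JSON-LD as JSON, so the structure survives before any model sees the page. This is an illustrative sketch only. The index_record function and its output shape are hypothetical, not how Google or any other engine stores pages.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON-LD objects from <script type="application/ld+json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self._buffer: list[str] = []
        self.items: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed markup instead of failing the whole page


def extract_json_ld(html: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


def index_record(url: str, html: str) -> dict:
    """Hypothetical index entry that keeps entity type and name next to the URL."""
    entities = [
        {"type": item.get("@type"), "name": item.get("name")}
        for item in extract_json_ld(html)
        if isinstance(item, dict)
    ]
    return {"url": url, "entities": entities}


page = '<script type="application/ld+json">{"@type": "Organization", "name": "Example Co"}</script>'
print(index_record("https://example.com/", page))
```

Unlike the tokenized view, the record still knows that "Organization" is the type of the entity named "Example Co".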

The Entity Connection: Why Structured Data Still Wins

The real power of structured data for AI search lies in entity resolution and relationship mapping—concepts deeply embedded in Google's patents and the semantic web.

How Entity Recognition Works

When you implement schema markup, you're not just helping LLMs—you're helping search engines:

Resolve entities: Clearly identify that "Apple" on your page means Apple Inc. (the company), not the fruit
Map relationships: Establish connections between your brand, products, people, and concepts
Populate knowledge graphs: Feed Google's Knowledge Graph, which holds 800 billion facts about 8 billion entities
Enable semantic search: Allow AI systems to understand context, not just keywords

The Knowledge Graph Effect

Google's Knowledge Graph expanded from 570 million entities to 8 billion entities with 800 billion facts in just over a decade. This semantic understanding is the "life-blood of modern search."

When you use structured data:

You claim your entity in Google's semantic network
You define the relationships that matter
You make your content "citation-ready" for AI systems
You increase semantic clarity for both humans and machines

Entity clarity can be the trigger that determines whether you are selected and recognised as a source of a clear answer in both AI search systems and traditional search.

The Verdict: Yes, You Should Absolutely Use Structured Data

Here's my conclusion based on the evidence:

For traditional search systems (including AI Overviews), structured data is essential:

Proper schema can mean the difference between indexing and invisibility
Well-implemented structured data correlates with AI Overview appearances
Schema markup directly impacts traditional SEO, which feeds into AI systems

For pure LLM training data, structured data has limited direct impact, as tokenization breaks schema markup during training.

But Here's the Critical Point

You're not optimizing for LLM training—you're optimizing for hybrid AI search systems that:

Crawl and index using traditional search infrastructure (which reads schema)
Populate knowledge graphs (which depend on entity relationships)
Perform real-time retrieval and augmentation (where structured context matters)
Synthesize answers from indexed, structured content

Modern AI search isn't just ChatGPT or Claude reading web pages. It's:

Google AI Overviews and AI Mode: Powered by real-time retrieval from Google's structured index
Perplexity: Uses web search APIs that understand structured data
ChatGPT Search: Retrieves from search systems, not just training data
Bing Copilot: Integrates Bing's semantic search infrastructure

All of these systems benefit from structured data at the retrieval and indexing stage, even if their LLM components don't directly parse schema during generation.

Implementation Recommendations

Based on the research and experiments, here's what you should prioritize:

1. Core Schema Types for AI Search

Article/BlogPosting: For editorial content (boosts AI Overview visibility)
FAQPage: Critical for Q&A content and voice search
HowTo: For instructional content (step-by-step formats)
Organization: Establishes your brand entity
Product: With nested Offer and Review schemas
BreadcrumbList: Improves structural understanding
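
As one example from the list above, here is a minimal sketch of generating FAQPage markup from question and answer pairs. The helper name build_faq_page and the sample question are mine. The property names (mainEntity, acceptedAnswer, text) follow the schema.org vocabulary.

```python
import json


def build_faq_page(pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


faq = build_faq_page([
    ("Does schema replace visible text?", "No. Schema should mirror what is visible on the page."),
])
print(json.dumps(faq, indent=2))
```

Keep the questions and answers identical to the copy shown on the page, so the markup describes content users can actually see.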

2. Entity Relationship Mapping

Use sameAs properties to link your entities to authoritative sources (Wikipedia, Wikidata, LinkedIn)
Implement Organization schema with clear identifiers
Create internal linking patterns that reinforce entity relationships
Build a "mini Knowledge Graph" through your site architecture
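
A minimal sketch of the first two points, assuming placeholder values throughout: "Example Co", the example.com URL, and the Wikidata and LinkedIn links are all made up, and using "#organization" as the @id suffix is a common convention rather than a requirement.

```python
import json


def build_organization(name: str, url: str, same_as: list[str]) -> dict:
    """Build an Organization object with a stable @id and sameAs links."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "@id": f"{url}#organization",  # one identifier to reuse across the site
        "name": name,
        "url": url,
        "sameAs": list(same_as),  # profiles that describe the same entity elsewhere
    }


org = build_organization(
    name="Example Co",
    url="https://example.com/",
    same_as=[
        "https://www.wikidata.org/wiki/Q0000000",  # placeholder ID, not a real entity
        "https://www.linkedin.com/company/example-co",  # placeholder profile
    ],
)
print(json.dumps(org, indent=2))
```

Other objects on the site (articles, products, authors) can then point to the same @id, which is one simple way to start the "mini Knowledge Graph" mentioned above.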

3. Don't Neglect Visible Text

Visible, well-structured text is non-negotiable. Your schema should complement (not replace) clear, accessible content.

Best practices:

Write in short, factual blocks (2-4 sentences per paragraph)
Use clear headings that establish entity context
Include explicit answers to questions
Make key information visible in HTML, not just in schema
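
One way to check the last point is a simple parity test between your markup and your copy. The sketch below is an assumption-heavy illustration: it takes a flat dict of schema values (real Product markup nests price inside an Offer) and does a plain substring match, so treat it as a starting point rather than a validator.

```python
def missing_from_visible_text(
    schema: dict,
    visible_text: str,
    keys: tuple[str, ...] = ("name", "sku", "price"),
) -> list[str]:
    """Return the schema keys whose values never appear in the visible text."""
    text = visible_text.lower()
    return [
        key
        for key in keys
        if key in schema and str(schema[key]).lower() not in text
    ]


# Hypothetical product values and page copy.
product = {"name": "Trail Shoe", "sku": "TS-100", "price": "89.00"}
page_copy = "Trail Shoe, now 89.00 EUR"

print(missing_from_visible_text(product, page_copy))
```

Anything this returns is information that exists only in your markup, which is exactly the situation the schema-only test page was in.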

4. Focus on JSON-LD

Google recommends JSON-LD format because it's:

Easier to implement and maintain
Separates markup from content
More scalable for large sites
Less prone to errors
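
Because JSON-LD lives in its own script tag, you can generate it from data and drop it into a template. A minimal sketch follows. The render_json_ld helper and the sample headline are hypothetical, and the escaping step is a defensive choice of mine.

```python
import json


def render_json_ld(data: dict) -> str:
    """Serialize a schema object into a JSON-LD script tag."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # "\/" is a valid JSON escape for "/", so this keeps a "</script>" inside
    # a value from closing the tag early without changing the parsed data.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Structured data in the AI search era",
}
tag = render_json_ld(article)
print(tag)
```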

5. Test and Monitor

Use Google's Rich Results Test for validation
Monitor AI Overviews appearance with tools like BrightEdge or Conductor
Track entity visibility in Knowledge Panels
Measure citation rates across AI platforms
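
For the last point, a citation rate can be as simple as the share of AI answers that cite your domain. The sketch below assumes you have already collected the cited URLs for each answer, whether by hand or with a tool. The sample data is made up, and how you gather it per platform is outside this sketch.

```python
from urllib.parse import urlparse


def citation_rate(responses: list[list[str]], domain: str) -> float:
    """Share of responses that cite at least one URL on the given domain."""
    if not responses:
        return 0.0

    def cites(urls: list[str]) -> bool:
        return any(urlparse(u).hostname in (domain, f"www.{domain}") for u in urls)

    return sum(cites(urls) for urls in responses) / len(responses)


# Hypothetical log: one list of cited URLs per AI answer to a tracked query.
sample = [
    ["https://example.com/guide"],
    ["https://other.org/post"],
    [],
    ["https://www.example.com/faq", "https://other.org/post"],
]
print(citation_rate(sample, "example.com"))
```

Tracking this number per platform before and after a schema rollout gives you a baseline to compare against, although it cannot prove on its own that the markup caused any change.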

The Bigger Picture: Entity-First SEO is the Future

The structured data debate reveals a broader shift in search optimization.

AI search is accelerating the move from keyword-matching to semantic understanding. With ChatGPT handling 2.5 billion prompts daily and AI Overviews appearing for nearly 19% of US searches, visibility now depends on:

Semantic clarity: Being recognized as an authoritative entity
Relationship mapping: Establishing your place in the semantic network
Structured communication: Making your meaning machine-readable
Knowledge graph presence: Existing in Google's entity database

Structured data is how you achieve all four.

Structured data doesn't primarily help AI by being read during model training. It helps AI by:

Ensuring your content gets indexed in the first place
Populating knowledge graphs that AI systems query
Establishing entity relationships that enable semantic search
Providing structured context for real-time retrieval systems
Enhancing traditional search signals that AI Overviews depend on

The question isn't whether to use structured data for AI search optimization—it's how quickly you can implement it comprehensively.

That said...

Learn how to quickly implement entity relationship networks, knowledge graphs, and structured data for AISEO 🌟

Our latest course on the MLforSEO Academy is designed as a practical, systems-focused journey from understanding entities to building and optimizing knowledge graphs and finally engineering brand authority for AI search and LLMs.

You’ll move through a set of core modules, each broken down into short, focused lessons with concrete examples, mini case studies, and tool-driven exercises (e.g. interrogating Google’s Knowledge Graph, doing entity audits, and working with the Knowledge Graph API).

Two optional knowledge primers sit alongside the main pathway – one on entities/NER and one on knowledge graphs – so you can quickly catch up or deepen specific technical foundations without slowing down the strategic narrative of the core course.

START TODAY ✨

100+ forward-thinking marketers are already taking our courses 💜

Semantic ML-enabled Keyword Research Course - MLforSEO Academy

Introduction to Machine Learning for SEO - MLforSEO Academy

AI Search & LLMs: Entity SEO and Knowledge Graph Strategies for Brands - MLforSEO Academy

Community discussion 🌟

As search continues its evolution from keywords to entities, from text to meaning, structured data becomes your bridge to being understood, remembered, and cited by the machines that increasingly determine what people see.

The verdict is clear: Yes, you should absolutely use structured data and entity relationship mapping to enhance visibility in both traditional and AI search systems.

Not because LLMs magically read your schema during training, but because the actual infrastructure of AI search—the indexing, the knowledge graphs, the retrieval systems, the semantic networks—all depend on structured, machine-readable information.

What's your experience with structured data and AI search?
Have you seen measurable improvements in AI Overview visibility or LLM citations after implementing schema markup?

Reply and share your findings—the community learns best when we share real results.

Join 670+ AI/ML-interested marketers on our Slack community to stay up to date with discussions on AI/ML automation in SEO and marketing.

Happy learning! ✨

Lazarina

MLforSEO Newsletter ✨

Structured data in the AI Search Era - yay or nay? ✨ MLforSEO Newsletter #010