AI SEO fundamentals: content, technical, popularity

SEO has always rested on three pillars: content, technical, popularity. Pick any decade since the late 90s and that’s the shape of the job. AI SEO hasn’t replaced any of them.

1. Content pillar: you still need the answer

Old job: keyword research for what humans type into a Google box. Map intents, write pages that answer them, get clicks.
New job: same, plus keyword research for what the language model types into its own retrieval step before it answers the human.

Google calls that second step query fan-out: the model decomposes one user question into multiple sub-queries and fetches facts for each before composing the answer.

Example: “best wireless earbuds for running in the rain” might fan out into sub-queries on:
- IP ratings
- Fit during movement
- Battery life
- Codec support
- Price ranges

Each sub-query hits a search index. Each one ranks pages. Each one decides what the model sees before it writes a single token.

You’re now writing for two query layers: the user’s question, and the model’s decomposition of it.

How do you find the sub-questions a model would generate? Two free ways:

- Ask the model. Give ChatGPT or Gemini your topic and ask which sub-questions it would look up; it will list them.
- Read the SERP. People Also Ask and AI Overview bullets already show the fan-out.
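Once you have a list of sub-questions, you can check a draft against it. Here is a minimal sketch, assuming you have already collected the fan-out yourself. The FAN_OUT cue phrases, the sample page text, and the covered_subtopics helper are all hypothetical, and plain phrase matching is a crude stand-in for a real relevance check.

```python
import re

def covered_subtopics(page_text: str, subtopics: dict[str, list[str]]) -> dict[str, bool]:
    """For each sub-topic, report whether any of its cue phrases appears in the page."""
    text = page_text.lower()
    return {
        topic: any(
            re.search(r"\b" + re.escape(cue.lower()) + r"\b", text) for cue in cues
        )
        for topic, cues in subtopics.items()
    }

# Hypothetical fan-out for "best wireless earbuds for running in the rain".
# The cue phrases are illustrative guesses, not output from any real model.
FAN_OUT = {
    "IP ratings": ["ipx4", "ip rating", "water resistance"],
    "Fit during movement": ["ear hooks", "secure fit", "wing tips"],
    "Battery life": ["battery life", "hours of playback"],
    "Codec support": ["aptx", "aac", "ldac"],
    "Price ranges": ["price", "budget"],
}

page = (
    "Our pick is rated IPX4, has a secure fit with ear hooks, "
    "and delivers 8 hours of playback. It supports AAC."
)

coverage = covered_subtopics(page, FAN_OUT)
missing = [topic for topic, ok in coverage.items() if not ok]
print("Sub-topics the page does not answer:", missing)
```

Anything in the missing list is a sub-query where the model will have to read someone else's page.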

This is not a new skill. It’s the same skill you used to win featured snippets and People Also Ask, applied with the volume turned up. We made that timeline argument in “GEO will die. SEO won’t”: every “AI-era” content technique is a thing SEOs have been doing since 2014. Fan-out just makes it less optional.

2. Technical pillar: bots run on a budget

Crawling the web costs energy. It always did. Every fetched page is a TCP connection, a TLS handshake, an HTML parse, sometimes a headless render, plus storage, plus indexing. Multiply by a few hundred billion URLs and you get a real operational constraint.

That’s why crawl budget exists.

That’s why robots.txt exists.

That’s why Google documents size limits on what it will fetch and parse.
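To see how your own robots.txt treats AI crawlers, Python's standard-library urllib.robotparser can evaluate the rules offline. This is a sketch under assumptions: the robots.txt content and the example.com URLs are made up, and GPTBot and ClaudeBot are used only as example user-agent tokens.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt, not taken from any real site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent: str, path: str) -> bool:
    """True if the parsed rules let `agent` fetch `path` on the example host."""
    return parser.can_fetch(agent, "https://example.com" + path)

for agent in ("GPTBot", "ClaudeBot"):
    for path in ("/blog/post", "/private/report", "/admin/"):
        print(f"{agent:10} {path:16} allowed={allowed(agent, path)}")
```

One thing the output makes visible: a crawler that has its own group follows only that group. Here GPTBot is blocked from /private/ but is not bound by the wildcard rule on /admin/, which is easy to miss when you add bot-specific sections.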

AI search makes this harder, not easier. Traditional search crawls a page on a schedule and reuses the indexed copy for millions of queries. Retrieval for AI answers is per-query: when a user asks something, an agent often fetches live pages to ground the answer. That’s more fetches per useful answer, not fewer.

So the technical rules tighten:

- Clean HTML. Content in the initial server response, not assembled later by JavaScript. Most AI crawlers don’t render JS at all, or render it poorly.
- Small payloads. Pages that come back fast and parse cheaply get crawled deeper and more often.
- Stable URLs and predictable structure. Agents follow links and canonical tags, so a URL should keep resolving to the content they expect.
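The first two rules are easy to spot-check. The sketch below takes the raw HTML a server returns, ignores anything inside script, style, and template tags, and reports whether your key phrases are present without JavaScript. The audit helper, the sample pages, and the 100,000-byte budget are assumptions for illustration; no crawler publishes that exact number.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text that exists in the HTML without executing any JavaScript."""
    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def audit(html: str, must_contain: list[str], max_bytes: int = 100_000) -> dict:
    """Report payload size and which key phrases are absent from the raw HTML."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    size = len(html.encode("utf-8"))
    return {
        "bytes": size,
        "within_budget": size <= max_bytes,  # max_bytes is an arbitrary example budget
        "missing_phrases": [p for p in must_contain if p.lower() not in text],
    }

server_rendered = "<html><body><h1>Trail earbuds</h1><p>Rated IPX4 for rain.</p></body></html>"
js_shell = '<html><body><div id="root"></div><script>render("Rated IPX4 for rain.")</script></body></html>'

print(audit(server_rendered, ["IPX4"]))
print(audit(js_shell, ["IPX4"]))
```

The second page contains the same sentence, but only as a string inside a script, so a client that does not run JavaScript never sees it.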

This isn’t a new rule.

It’s the old rule with the volume turned up.

If you render content client-side, you’re invisible to most new consumers, and the cheapest page wins more often than the cleverest one.

3. Popularity pillar: the one that looks like it changed

Two filters decide what an AI answer is allowed to draw from. Both tie back to traditional SEO.

(a) Off-page authority: agents call a good old-fashioned web search

The “LLM answer” you see in ChatGPT, Claude, or Gemini is rarely pure model knowledge. It’s an agent running a good old-fashioned web search and reading the top results:

Assistant	Search engine it calls
ChatGPT search	Bing
Perplexity	Multiple traditional indexes
Gemini (grounding)	Google
Claude (web tool)	Brave Search

In every case, a traditional ranker decides what the model is allowed to see before it writes a word. PageRank, link signals, freshness, domain authority: all of off-page SEO sits upstream of the answer.

The shorter way to say it: backlinks didn’t stop mattering. They moved one layer up the stack.
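A toy sketch of that layering, with everything in it hypothetical: the Page class, the rank_score values, and the build_context helper stand in for whatever the upstream search engine and the agent actually do. The only point is the cutoff: the model reads the top k results and nothing else.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    rank_score: float  # stand-in for the search engine's ranking, links included
    text: str

def build_context(pages: list[Page], k: int = 2) -> str:
    """Keep only the k best-ranked pages; this is all the model gets to read."""
    top = sorted(pages, key=lambda p: p.rank_score, reverse=True)[:k]
    return "\n".join(f"[{p.url}] {p.text}" for p in top)

# Made-up pages and scores for illustration.
pages = [
    Page("https://a.example/guide", 0.91, "Short but well-linked overview."),
    Page("https://b.example/review", 0.74, "Decent review from a known site."),
    Page("https://c.example/deep-dive", 0.18, "The most detailed answer, on a page nobody links to."),
]

context = build_context(pages)
print(context)
```

The third page may hold the best answer, but it never reaches the context, so the model cannot cite it.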

(b) On-page authority: same selection as Google’s algorithm

The deeper mechanism happens at training time, before the model ever answers a query. If (a) is PageRank by another name, (b) is EEAT in code. The training filter rewards what Google’s quality raters have rewarded for years: experience, expertise, authoritativeness, trustworthiness.

How LLMs actually learn explains both the risk and the fix:

- The model reads enormous amounts of text. Including all the spam, bullshit, and fiction of the web.
- It learns by frequency. Every time it sees “best CRM for” followed by “Pipedrive,” the association gets reinforced.
- There’s no PageRank here. The model has no concept of “which site linked to which.” Just co-occurrence and frequency.
- So, left unfiltered, volume alone would win. Low-quality content has to get filtered out somewhere.
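The frequency intuition fits in a few lines. This is a deliberately crude counter, not how training works: real models learn statistical associations through next-token prediction, not a lookup table. The corpus is invented, and “AcmeCRM” is a fictional brand used as the spammer.

```python
from collections import Counter

def cooccurrence_counts(corpus: list[str], phrase: str, candidates: list[str]) -> Counter:
    """Count how many documents mention each candidate alongside the phrase."""
    counts = Counter()
    for doc in corpus:
        text = doc.lower()
        if phrase.lower() in text:
            for name in candidates:
                if name.lower() in text:
                    counts[name] += 1
    return counts

edited = "In our hands-on review, the best CRM for small sales teams was Pipedrive."
spam = "Best CRM for everyone is AcmeCRM! Buy AcmeCRM now!"  # fictional brand

# One edited article against the same spam page scraped three times.
raw_corpus = [edited, spam, spam, spam]
brands = ["Pipedrive", "AcmeCRM"]

raw_counts = cooccurrence_counts(raw_corpus, "best CRM for", brands)

# Simplest possible filter: drop exact duplicates before counting.
deduped = list(dict.fromkeys(raw_corpus))
filtered_counts = cooccurrence_counts(deduped, "best CRM for", brands)

print("raw:     ", dict(raw_counts))
print("filtered:", dict(filtered_counts))
```

On the raw corpus the spammer wins three to one. After even the simplest filter, the repeated page counts once and the edited article is back on equal footing.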

Production LLMs aren’t trained on raw Common Crawl. They’re trained on it after aggressive quality filtering, which selects for the same signals Google has ranked for fifteen years: structured, sourced, edited, authoritative. Different mechanism, same outcome.

If you want to dig deeper: the receipts

Source	What it does	Why it matters
C4 / T5, Raffel et al., 2020	Blocklist, terminal-punctuation rule, boilerplate removal, dedup, language filter	Google’s own pretraining data is curated against quality heuristics, not consumed raw
CCNet, Wenzek et al., 2020	Scores pages by perplexity against a Wikipedia-trained LM, keeps low-perplexity ones	Open-source LLMs literally select for pages that read like Wikipedia
DSIR, Xie et al., 2023 (Stanford)	Resamples training corpora toward a high-quality target distribution	Tilts the model toward authoritative-looking sources by construction
Frontier pretraining pipelines	“Quality classifier” trained on Wikipedia + books as positives, raw web as negatives	Authority IS the filter
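To make the first row concrete, here is a sketch loosely modeled on the kind of line-level heuristics the C4 paper describes: keep lines that end in terminal punctuation, drop very short lines, drop placeholder pages. It is not the actual C4 pipeline. The clean_page function, its thresholds, and the use of kept lines as a proxy for sentence count are assumptions for illustration.

```python
TERMINAL = (".", "!", "?", '"')

def clean_page(text: str, min_words_per_line: int = 5, min_lines: int = 3):
    """Return the cleaned page text, or None if the whole page is dropped.

    Thresholds are illustrative defaults, not values quoted from the paper.
    """
    if "lorem ipsum" in text.lower():
        return None  # placeholder text marks the page as unfinished
    kept = []
    for line in text.splitlines():
        line = line.strip()
        if not line.endswith(TERMINAL):
            continue  # menus, buttons, and headings rarely end a sentence
        if len(line.split()) < min_words_per_line:
            continue  # too short to carry real content
        kept.append(line)
    if len(kept) < min_lines:
        return None  # not enough prose left to be worth training on
    return "\n".join(kept)

article = (
    "Home | About | Contact\n"
    "IPX4 means the earbuds resist splashes from any direction.\n"
    "We tested each pair on a ten kilometre run in steady rain.\n"
    "Battery life ranged from five to nine hours per charge.\n"
    "Click here"
)
thin = "Buy now!\nGreat deals today!"
placeholder = "BEST EARBUDS BUY NOW\nLorem ipsum dolor sit amet."

print(clean_page(article))
print(clean_page(thin))
print(clean_page(placeholder))
```

The edited article survives minus its navigation and button text. The thin page and the placeholder page are dropped entirely, which is the same verdict a content audit would reach.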

If you’ve ever audited a site against EEAT, you’ve already audited it against the LLM training filter. Same signals, different judge.

Both filters reward what traditional SEO has always rewarded. (a) leans on off-page authority, (b) on on-page quality.

For the strategic version, with what it means for KPIs and the CMO conversation, see “Google just admitted GEO is SEO.” For the original call, made three months before Google said it, see our February 13 piece.

The pillars didn’t move. The ground around them did. Build on the pillars.

Every SEO professional leans slightly toward one pillar over the others. Which one do you find the most challenging? Tell us on LinkedIn.