AEO Website Architecture
This page defines the preferred architecture for an AEO-optimized website. Developers should be able to read this page and build a site structure from scratch that maximizes AI discoverability and citability. Every recommendation here is grounded in how large language models and AI answer engines actually retrieve, parse, and rank information. The goal is a site that AI systems can fully understand, confidently cite, and recommend to users.
The AEO Site Is Not a Traditional Website
Traditional websites are built around human behavior. They optimize for visual hierarchy, calls to action, conversion funnels, and user engagement metrics like time on page and bounce rate. The structural decisions serve a single purpose: guide a human visitor toward a desired action. Navigation is designed for browsing. Content is designed for scanning. Layout is designed for persuasion.
AEO sites operate under a different set of priorities. They optimize for machine extraction. The structural decisions serve a different purpose: make it as easy as possible for an AI system to identify what entity this site represents, what that entity offers, why that entity is credible, and how the entity compares to alternatives. AI systems do not browse. They do not scan visually. They parse HTML, extract structured data, and evaluate semantic relationships between content blocks.
This does not mean AEO sites are hostile to humans. The opposite is true. A site with clear semantic structure, logical content hierarchy, fast load times, and comprehensive information is better for human visitors as well. The key difference is where you start: traditional web design starts with visual mockups and conversion goals, then adds content. AEO design starts with content architecture and entity definition, then wraps it in a visual layer. Both can coexist in the same site. The same HTML serves both audiences. But the structural priorities must begin with the machine-readable layer, because humans are forgiving of imperfect design while AI systems are not forgiving of missing or ambiguous data.
The Three-Layer Architecture
Every AEO-optimized site should follow a three-layer architecture. Each layer serves a distinct function in the AI evaluation pipeline. Missing any layer creates gaps that AI systems will notice and penalize, typically by reducing confidence in the entity or omitting it from recommendations entirely.
Layer 1: Entity Core (who you are)
/ Homepage — Organization schema, entity summary
/about Company story, leadership, mission
/licensing Credentials, safety, compliance (or /trust)
Layer 2: Value Pages (what you offer)
/products Product/service overview
/[service-1] Individual service page
/[service-2] Individual service page
/pricing Pricing (if applicable)
Layer 3: Authority Content (why you are the best)
/blog/ Topic cluster content
/case-studies/ Proof of results
/comparisons/ vs competitor pages
/faq Comprehensive FAQ with schema
Layer 1: Entity Core. This is the foundation. These pages establish who the entity is in machine-readable terms. The homepage carries the comprehensive Organization schema and a concise entity summary. The about page provides the narrative context that AI systems use to build a richer understanding of the entity: founding story, leadership, mission, and history. The licensing or trust page provides the credentialing information that AI systems use to evaluate trustworthiness: certifications, regulatory compliance, safety records, and third-party validations. Without Layer 1, AI systems have no foundation on which to build confidence in the entity. Every other page on the site references back to these core pages.
Layer 2: Value Pages. These pages answer the question AI systems are actually trying to resolve when a user asks for a recommendation: what does this entity offer, and is it a good fit for the user's needs? Product and service pages must be detailed, factual, and structured with the appropriate schema markup. Each individual service or product page should be self-contained, meaning an AI system should be able to read that single page and have enough information to recommend the product without needing to visit any other page. Pricing information, when applicable, should be explicit. AI systems strongly prefer concrete data over vague language like "contact us for pricing."
Layer 3: Authority Content. This layer is what differentiates entities in competitive markets. Blog posts organized into topic clusters demonstrate comprehensive coverage of the subject area. Case studies provide proof of results with specific data points. Comparison pages directly address the competitive landscape and give AI systems the contextual information they need to rank one entity against another. FAQ pages with proper schema provide direct question-answer pairs that AI systems can extract and serve verbatim. Layer 3 is the primary mechanism for building topical authority, the signal that pushes an entity above competitors in AI confidence rankings.
Required Root Files
Three files must exist at the root of every AEO-optimized site. These are not optional. They are the first things AI crawlers look for when evaluating a domain, and their absence sends a negative signal.
/robots.txt — Allow all AI crawlers explicitly
/llms.txt — Structured entity summary for AI agents
/sitemap.xml — All pages with priority values and lastmod dates
The robots.txt file must explicitly allow all known AI crawlers. Many sites inadvertently block AI agents by using overly restrictive robots.txt rules inherited from traditional SEO configurations. AEO requires the opposite approach: broad allowance with specific denials only where absolutely necessary. Every major AI system respects robots.txt, which means a single misconfigured line can remove your entity from an entire AI platform's knowledge base.
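A minimal robots.txt along these lines covers the major platforms. The user agents shown are the crawlers' published names at the time of writing, but the list changes; verify each platform's current documentation. The domain is a placeholder.

```text
# Explicitly allow the major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

# Default: allow everything else
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```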
The llms.txt file is a structured entity summary specifically designed for AI agent consumption. It provides a machine-readable overview of the entity, its primary offerings, and the most important pages on the site. Think of it as a cover letter for AI crawlers: it tells them what this site is about before they begin parsing individual pages, which helps them index content more accurately and efficiently.
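The emerging llms.txt convention is plain Markdown: an H1 with the entity name, a blockquote summary, then sections of annotated links. A sketch following that convention (all names and URLs are placeholders):

```text
# Your Entity Name

> One-sentence factual description of the entity: category, location, primary function.

## Key pages

- [About](https://yourdomain.com/about): Company story, leadership, mission
- [Trust](https://yourdomain.com/trust): Licensing, credentials, compliance
- [Products](https://yourdomain.com/products): Full product and service overview
- [FAQ](https://yourdomain.com/faq): Common questions with direct answers
```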
The sitemap.xml file must include every public page with accurate priority values and lastmod dates. AI crawlers use priority values to determine which pages to index first and which to revisit most frequently. Lastmod dates signal freshness, a factor that some AI systems weight heavily when deciding whether to serve cached information or re-crawl for updates. A stale sitemap with missing pages or outdated dates tells AI systems that the site is not actively maintained.
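In a Next.js project, the sitemap can be generated dynamically with the framework's metadata-route convention (an app/sitemap.ts file whose default export Next serves as XML at /sitemap.xml). A minimal sketch; the page list, priorities, and domain are placeholders:

```typescript
// app/sitemap.ts — dynamic sitemap via Next.js's metadata-route convention.
const BASE = 'https://yourdomain.com';

type SitemapEntry = {
  url: string;
  lastModified: Date;
  priority: number;
};

export default function sitemap(): SitemapEntry[] {
  // Entity Core pages get the highest priority; Value and Authority
  // pages follow. In production, pull real lastmod dates from your CMS
  // or git history — a perpetually "fresh" date is a false signal.
  const staticPages = [
    { path: '', priority: 1.0 },
    { path: '/about', priority: 0.9 },
    { path: '/trust', priority: 0.9 },
    { path: '/products', priority: 0.8 },
    { path: '/faq', priority: 0.7 },
  ];
  return staticPages.map((p) => ({
    url: `${BASE}${p.path}`,
    lastModified: new Date(), // placeholder: use the page's real edit date
    priority: p.priority,
  }));
}
```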
For detailed implementation guidance on all three files, see AI Crawler Access.
Schema Architecture — What Goes Where
Structured data is the single most important technical layer for AEO. Schema markup tells AI systems exactly what each page represents in unambiguous, machine-readable terms. Without schema, AI systems must infer meaning from raw HTML, which introduces uncertainty and reduces confidence. With schema, the meaning is explicit.
Schema Placement Map:
Homepage: Organization (comprehensive) + WebSite + SearchAction
About: Organization + FAQPage + BreadcrumbList
Trust/Licensing: Article + FAQPage + BreadcrumbList
Product pages: Product or Service + FAQPage + BreadcrumbList
Blog posts: Article + FAQPage + BreadcrumbList
Comparison pages: Article + ItemList + FAQPage + BreadcrumbList
FAQ page: FAQPage + BreadcrumbList
Every page: BreadcrumbList (minimum)
BreadcrumbList goes on every page. This is non-negotiable. BreadcrumbList schema tells AI systems where each page sits in the site hierarchy. It provides navigational context that helps AI crawlers understand relationships between pages. When an AI system encounters a product page with BreadcrumbList showing Home, Products, and Product Name, it immediately understands that this page is a child of the products section and a grandchild of the homepage. This hierarchical signal reinforces the three-layer architecture and makes it explicit in machine-readable terms.
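Rendered as JSON-LD, the breadcrumb trail for that product page looks like this (URLs and names are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://yourdomain.com" },
    { "@type": "ListItem", "position": 2, "name": "Products", "item": "https://yourdomain.com/products" },
    { "@type": "ListItem", "position": 3, "name": "Product Name", "item": "https://yourdomain.com/products/product-slug" }
  ]
}
</script>
```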
FAQPage is recommended on most pages. This may seem aggressive, but the reasoning is straightforward. AI answer engines serve responses to questions. FAQPage schema provides pre-formatted question-answer pairs that AI systems can extract with zero ambiguity. Every product page has questions users ask about that product. Every blog post has questions the content addresses. Every trust page has questions about credentials and compliance. Wrapping these in FAQPage schema makes them directly extractable. Pages with FAQPage schema are measurably more likely to be cited in AI-generated responses because the AI system does not need to interpret the content — it can simply extract the answer.
For a deeper explanation of schema strategy, see Structured Data First.
Internal Linking Rules
Internal linking is the connective tissue of an AEO site. It creates the navigable paths that AI crawlers follow when indexing a domain, it distributes authority across pages, and it reinforces topical relationships between content. A page with no inbound internal links is effectively invisible to AI crawlers that enter through the homepage or sitemap. A page with strong internal linking is positioned as important within the site's hierarchy.
Internal Linking Architecture:
Every page -> homepage (via nav/breadcrumbs)
Every page -> /about and primary trust page (footer or content)
Blog spokes -> hub page (in-content link)
Hub page -> all spokes (section links or "related" block)
Comparison -> individual entity pages
Entity pages -> comparison pages ("see how we compare")
There are two categories of internal links, and both matter for AEO. Navigational links appear in the site header, footer, and sidebar. They are present on every page and establish the baseline site structure. Every page should link back to the homepage, the about page, and the primary trust page through navigational elements. These links ensure that AI crawlers can always find the Entity Core pages regardless of where they enter the site.
Contextual links appear within the body content of a page. They carry stronger semantic signals than navigational links because they are surrounded by relevant text that provides context for the relationship. When a blog post about "choosing the right product for your needs" contains an in-content link to a specific product page, the AI system understands that the product page is being referenced in a relevant context. This is a stronger signal than a footer link that appears identically on every page. Contextual links should be used deliberately and consistently: every blog post should link to its hub page, every hub page should link to its spokes, and every entity page should cross-reference related comparison pages.
The Hub-and-Spoke Model
The hub-and-spoke model is the content architecture pattern that drives topical authority in AEO. The concept is simple: identify your primary target query, build one definitive page that answers it (the hub), and then build a cluster of supporting pages (spokes) that reinforce the hub's authority by covering related subtopics in depth.
Identifying the hub. The hub is the single page on your site that should rank for your most important AI query. If your goal is to be recommended when someone asks an AI "what is the best project management tool for remote teams," then your hub page is the one that directly, comprehensively answers that question. The hub should be a substantial page with detailed content, proper schema, and a clear factual answer in the first paragraph. It should not be a landing page with minimal content. AI systems reward depth.
Designing spokes. Spokes are supporting pages that cover specific subtopics related to the hub. If your hub is about project management for remote teams, your spokes might include: a comparison of project management methodologies, a guide to asynchronous communication, a case study about a remote team that improved productivity, an FAQ about remote work tools, and a blog post about integration capabilities. Each spoke must link to the hub with contextual anchor text. Each spoke provides additional evidence that your entity comprehensively covers this topic.
Measuring cluster completeness. A topic cluster is complete when an AI system can find answers to every reasonable follow-up question within your domain. Start by listing every question a user might ask after receiving the hub page's answer. If any of those questions cannot be answered by existing pages on your site, those are gaps that need spokes. Use AI tools to generate the list of follow-up questions, then audit your content against it. A complete cluster typically requires eight to fifteen spokes depending on the complexity of the topic.
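The audit step can be reduced to a simple set-difference check. A sketch (the function name and all data are illustrative): map each existing spoke to the follow-up questions it answers, then list the questions no page covers.

```typescript
// Cluster-completeness audit: `followUps` is every question a user might ask
// after the hub answer; `spokes` maps each existing spoke URL to the questions
// it answers. Returns the questions no spoke covers — i.e., the missing pages.
function findClusterGaps(
  followUps: string[],
  spokes: Record<string, string[]>
): string[] {
  const covered = new Set(Object.values(spokes).flat());
  return followUps.filter((q) => !covered.has(q));
}

const gaps = findClusterGaps(
  [
    'How does async communication work for remote teams?',
    'Which integrations matter most?',
    'What does migration from another tool involve?',
  ],
  {
    '/blog/async-communication-guide': [
      'How does async communication work for remote teams?',
    ],
    '/blog/integration-capabilities': ['Which integrations matter most?'],
  }
);
console.log(gaps); // → [ 'What does migration from another tool involve?' ]
```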
Technical Requirements for AI Crawlability
Content quality and schema markup are irrelevant if AI crawlers cannot access and parse the HTML. The following technical requirements are non-negotiable for any AEO-optimized site.
Technical Crawlability Checklist:
Server-side rendering (SSR or SSG) — mandatory
Most AI crawlers do not execute JavaScript (Googlebot is the main exception: it renders JavaScript, though rendering behavior varies by crawl tier)
view-source must show full HTML content
Page load < 2 seconds
AI crawlers have timeout limits
Semantic HTML
<article>, <section>, <nav>, <header>, <footer>, <main>
H1 -> H2 -> H3 hierarchy, never skip levels
First paragraph answers the primary query
AI extractors weight early content heavily
No content behind login walls or JavaScript toggles
If AI cannot see it in the HTML source, it does not exist
JavaScript Rendering Is a Hard Blocker
Server-side rendering. Use SSR (server-side rendering) or SSG (static site generation) for every page. Frameworks like Next.js, Nuxt, and Astro support this natively. The test is simple: run curl -s https://yourdomain.com/page | grep "your content" and verify the content appears in the raw HTML. If it does not, the page is invisible to AI.
Page load speed. AI crawlers impose timeout limits. If your page takes more than two seconds to return a full HTML response, some crawlers will abandon the request and move on. This means the page may never be indexed. Optimize server response times, use a CDN, and minimize blocking resources in the critical rendering path.
Semantic HTML. Use the correct HTML5 semantic elements. Wrap your main content in <article> and <main>. Use <section> for logical content divisions. Use <nav> for navigation blocks. Maintain a strict heading hierarchy: one H1 per page, followed by H2s for major sections, H3s for subsections, and never skip levels. AI parsers use heading hierarchy to understand content structure and determine what information is most important on the page.
First-paragraph priority. AI extraction algorithms weight the first paragraph of a page heavily. The first paragraph should directly answer the primary query the page targets. Do not open with filler text, rhetorical questions, or narrative introductions. State the fact. AI systems that extract a snippet from your page will almost always pull from the first one hundred words.
No hidden content. Content behind login walls, JavaScript accordions, tabs, modals, or "read more" toggles does not exist for AI crawlers. If the content is not in the initial HTML source, it will not be indexed. If you use expandable sections for user experience purposes, ensure the full content is present in the HTML and only visually collapsed via CSS. JavaScript-toggled visibility hides content from AI.
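One pattern that satisfies both audiences is the native details element: the full answer ships in the initial HTML source, so crawlers always see it, while human visitors see a collapsed toggle that expands without any JavaScript. A minimal sketch (the question and price echo the placeholder product used later on this page):

```html
<!-- The full answer text is present in the HTML source; no script injects it later -->
<details>
  <summary>How much does Product Name cost?</summary>
  <p>Product Name starts at $49 per month with a free trial available.</p>
</details>
```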
The Disambiguation Pattern
Entity disambiguation is critical when your entity name could be confused with another entity. AI systems operate in a global context. When a user asks about "Mercury," the AI must determine whether they mean the planet, the element, the car brand, or the record label. Disambiguation is the process of providing enough signals that AI systems never confuse your entity with another.
The first defense is explicit language. The first paragraph of every page on your site should include a clear statement establishing which entity this site represents. Not a tagline or a marketing phrase, but a factual identification: "This site is about [Entity Name], a [category] based in [location] that [primary function]." This statement should be specific enough that no other entity in the world matches the same description.
For entities with common names, consider a dedicated disambiguation page that explicitly lists other entities with similar names and clarifies the distinction. Link to this page from the footer of every page on the site. Use geo meta tags to establish geographic specificity. Include ISO country codes in your Organization schema. Where possible, use a ccTLD domain (such as .co.uk or .com.au) to reinforce geographic context. These signals compound: individually they are hints, but together they create an unambiguous identity that AI systems can resolve with high confidence.
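As a sketch, the geo meta tags for a hypothetical London-based entity might look like the following. Note that geo.* meta tags are a de facto convention rather than a formal standard, and the values here are placeholders; the corresponding schema signal is addressCountry: 'GB' in the Organization markup.

```html
<!-- Geographic disambiguation signals (placeholder values) -->
<meta name="geo.region" content="GB-LND" />
<meta name="geo.placename" content="London" />
```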
For a complete guide to disambiguation techniques, see Disambiguation.
Complete Next.js Starter Template
The following template provides an actionable starting point for building an AEO-optimized site with Next.js. It demonstrates the file structure, root layout with Organization JSON-LD, schema utility functions, and an example product page with proper structured data. This is intended as a reference implementation, not a copy-paste solution. Adapt the schema values, page structure, and content to your specific entity.
Project File Structure
project-root/
├── app/
│ ├── layout.tsx # Root layout with Organization JSON-LD
│ ├── page.tsx # Homepage — entity summary
│ ├── about/
│ │ └── page.tsx # Company story + Organization schema
│ ├── trust/
│ │ └── page.tsx # Licensing, credentials, compliance
│ ├── products/
│ │ ├── page.tsx # Product overview
│ │ └── [slug]/
│ │ └── page.tsx # Individual product pages
│ ├── blog/
│ │ ├── page.tsx # Blog hub page
│ │ └── [slug]/
│ │ └── page.tsx # Individual blog posts
│ ├── comparisons/
│ │ ├── page.tsx # Comparison index
│ │ └── [slug]/
│ │ └── page.tsx # Individual comparison pages
│ ├── faq/
│ │ └── page.tsx # FAQ with FAQPage schema
│ ├── robots.txt # Route handler or static file
│ ├── sitemap.xml # Route handler for dynamic sitemap
│ └── llms.txt # Static entity summary for AI agents
├── lib/
│ ├── schema.ts # Schema generation utilities
│ └── metadata.ts # Shared metadata helpers
├── components/
│ ├── breadcrumbs.tsx # BreadcrumbList schema + visual breadcrumbs
│ ├── organization-schema.tsx # Reusable Organization JSON-LD
│ └── faq-schema.tsx # Reusable FAQPage JSON-LD
└── public/
└── logo.png # Referenced in Organization schema
Root Layout with Organization Schema
The root layout injects the Organization JSON-LD on every page. This ensures that no matter where an AI crawler enters the site, it immediately encounters the entity definition. The Organization schema should be comprehensive: include legal name, founding date, address, contact information, and all sameAs references to external profiles.
import { Organization, WithContext } from 'schema-dts';
const organizationSchema: WithContext<Organization> = {
'@context': 'https://schema.org',
'@type': 'Organization',
name: 'Your Entity Name',
legalName: 'Your Legal Entity Name LLC',
url: 'https://yourdomain.com',
logo: 'https://yourdomain.com/logo.png',
description: 'One-sentence factual description of the entity.',
foundingDate: '2020-01-15',
address: {
'@type': 'PostalAddress',
streetAddress: '123 Main Street',
addressLocality: 'City',
addressRegion: 'ST',
postalCode: '00000',
addressCountry: 'US',
},
sameAs: [
'https://www.linkedin.com/company/your-entity',
'https://twitter.com/your-entity',
'https://www.crunchbase.com/organization/your-entity',
],
contactPoint: {
'@type': 'ContactPoint',
contactType: 'customer service',
email: 'support@yourdomain.com',
},
};
export default function RootLayout({ children }: { children: React.ReactNode }) {
return (
<html lang="en">
<head>
<script
type="application/ld+json"
dangerouslySetInnerHTML={{
__html: JSON.stringify(organizationSchema),
}}
/>
</head>
<body>
<nav aria-label="Main navigation">
{/* Navigation with links to all Layer 1 pages */}
</nav>
<main>{children}</main>
<footer>
{/* Footer with links to /about, /trust, /faq, /blog */}
</footer>
</body>
</html>
);
}
Schema Utility Functions
Centralize schema generation in a utility file. This ensures consistency across all pages and makes it easy to update schema structure globally. The following utility provides generators for BreadcrumbList, FAQPage, and Product schema. Extend it as needed for Article, Service, and other types.
import { BreadcrumbList, FAQPage, WithContext } from 'schema-dts';
export function generateBreadcrumbs(
items: { name: string; url: string }[]
): WithContext<BreadcrumbList> {
return {
'@context': 'https://schema.org',
'@type': 'BreadcrumbList',
itemListElement: items.map((item, index) => ({
'@type': 'ListItem',
position: index + 1,
name: item.name,
item: item.url,
})),
};
}
export function generateFAQ(
questions: { question: string; answer: string }[]
): WithContext<FAQPage> {
return {
'@context': 'https://schema.org',
'@type': 'FAQPage',
mainEntity: questions.map((q) => ({
'@type': 'Question',
name: q.question,
acceptedAnswer: {
'@type': 'Answer',
text: q.answer,
},
})),
};
}
export function generateProductSchema(product: {
name: string;
description: string;
url: string;
image: string;
price: string;
currency: string;
availability: string;
}) {
return {
'@context': 'https://schema.org',
'@type': 'Product',
name: product.name,
description: product.description,
url: product.url,
image: product.image,
offers: {
'@type': 'Offer',
price: product.price,
priceCurrency: product.currency,
availability: `https://schema.org/${product.availability}`,
url: product.url,
},
};
}
Example Product Page
This example demonstrates how an individual product page combines BreadcrumbList, FAQPage, and Product schema. Note that the first paragraph contains a factual, self-contained description of the product. This is the content AI systems will extract when recommending or describing the product.
import { Metadata } from 'next';
import { generateBreadcrumbs, generateFAQ, generateProductSchema } from '@/lib/schema';
export const metadata: Metadata = {
title: 'Product Name — Your Entity Name',
description: 'Factual one-sentence description of this product or service.',
};
const breadcrumbs = generateBreadcrumbs([
{ name: 'Home', url: 'https://yourdomain.com' },
{ name: 'Products', url: 'https://yourdomain.com/products' },
{ name: 'Product Name', url: 'https://yourdomain.com/products/product-slug' },
]);
const faq = generateFAQ([
{
question: 'What does Product Name do?',
answer: 'Product Name provides [factual description of functionality].',
},
{
question: 'How much does Product Name cost?',
answer: 'Product Name starts at $X per month with a free trial available.',
},
]);
const product = generateProductSchema({
name: 'Product Name',
description: 'Factual one-sentence description of this product.',
url: 'https://yourdomain.com/products/product-slug',
image: 'https://yourdomain.com/images/product.png',
price: '49.00',
currency: 'USD',
availability: 'InStock',
});
export default function ProductPage() {
return (
<>
<script
type="application/ld+json"
dangerouslySetInnerHTML={{ __html: JSON.stringify(breadcrumbs) }}
/>
<script
type="application/ld+json"
dangerouslySetInnerHTML={{ __html: JSON.stringify(faq) }}
/>
<script
type="application/ld+json"
dangerouslySetInnerHTML={{ __html: JSON.stringify(product) }}
/>
<article>
<h1>Product Name</h1>
<p>
Product Name is a [category] built by Your Entity Name that
[primary function]. It is available at [price point] and serves
[target audience].
</p>
<section>
<h2>Features</h2>
{/* Feature content with semantic HTML */}
</section>
<section>
<h2>Frequently Asked Questions</h2>
{/* FAQ content matching the schema above */}
</section>
</article>
</>
);
}
Test Your Implementation
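Beyond tools like Google's Rich Results Test and the Schema.org validator, you can spot-check rendered pages yourself: fetch the raw HTML and list the @type of every JSON-LD block, then compare against the schema placement map above. A minimal sketch (regex-based for brevity; the function name is illustrative, and a real audit should use a proper HTML parser):

```typescript
// Pull every JSON-LD block out of raw HTML and report its @type values,
// so each page can be checked against the schema placement map.
function extractSchemaTypes(html: string): string[] {
  const blocks =
    html.match(
      /<script[^>]*type="application\/ld\+json"[^>]*>[\s\S]*?<\/script>/g
    ) ?? [];
  return blocks.flatMap((block) => {
    // Strip the surrounding <script> tags, leaving only the JSON payload
    const json = block
      .replace(/^<script[^>]*>/, '')
      .replace(/<\/script>$/, '');
    try {
      const data = JSON.parse(json);
      // @type may be a string or an array of strings
      return [data['@type']].flat().filter(Boolean) as string[];
    } catch {
      return ['(invalid JSON-LD)'];
    }
  });
}

const sample = `
  <script type="application/ld+json">{"@type":"BreadcrumbList"}</script>
  <script type="application/ld+json">{"@type":"FAQPage"}</script>
`;
console.log(extractSchemaTypes(sample)); // → [ 'BreadcrumbList', 'FAQPage' ]
```

In practice you would pipe `curl -s https://yourdomain.com/page` into this check for every page in the sitemap, which doubles as the server-side-rendering test: if the types are missing from the raw HTML, the schema is invisible to AI crawlers.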
Architecture Checklist Summary
| Requirement | Status Check |
|---|---|
| Three-layer architecture implemented | Entity Core, Value Pages, and Authority Content layers all present |
| Root files in place | robots.txt, llms.txt, and sitemap.xml all accessible |
| Organization schema on homepage | Comprehensive Organization JSON-LD with all required fields |
| BreadcrumbList on every page | Every page includes BreadcrumbList schema reflecting hierarchy |
| FAQPage on content pages | Product, blog, and trust pages include FAQPage schema |
| Internal linking complete | Hub-spoke links, nav links, and contextual links all in place |
| Server-side rendering verified | View-source shows full HTML content on all pages |
| Page load under 2 seconds | All pages return full HTML response within timeout limits |
| Semantic HTML validated | Proper heading hierarchy, semantic elements, no skipped levels |
| Disambiguation signals present | Entity identity clear in first paragraph and schema on every page |