Pillar Guide

How to Audit a Page for AI Readiness

A page audit for AI readiness evaluates a single URL across seven dimensions to determine whether AI systems can access, understand, trust, and extract content from it. Unlike traditional site crawls, a page-level audit goes deep on one URL — checking 120+ signals and outputting a scored report with prioritized fixes.

AI Search Visibility Team · February 20, 2026 · 20 min read

Why Page-Level Audits Matter More Than Site Audits

AI systems cite pages, not domains. Your homepage might score 8.5 while your pricing page scores 3.2 — and the pricing page is the one people are searching for. Traditional site crawls find technical issues across thousands of pages but give you no prioritized action plan for any single URL.

82.5% of AI citations link to nested pages, not homepages (Onely 2025). 47% of Google AI Overview citations come from pages ranking below position #5 — content quality and structure, not ranking position, determines what gets cited.

Key Takeaway

The use case: You wrote a 3,000-word guide and it never appears in AI answers. A page audit shows you exactly why — whether it's a blocked crawler, JS-only rendering, missing author attribution, or no schema. Site audits can't do this.

The 7-Dimension Framework

Every AI Search Visibility page report evaluates these seven dimensions. Dimensions 1–6 are scored on a 0–10 scale and combined into an overall score. Dimension 7 (Risk) produces flags rather than a score — a single flag can disqualify an otherwise high-scoring page.

1. Crawlability & Access (weight: 15%)

Can AI crawlers reach, fetch, and parse your page? This is a binary gate — if AI crawlers are blocked, no other optimization matters.

What it checks

  • robots.txt directives for GPTBot, OAI-SearchBot, Claude-SearchBot, PerplexityBot
  • HTTP status code (200 vs 4xx/5xx)
  • Canonical tag (self-referencing, not pointing away)
  • JavaScript rendering — does content exist without JS execution?
  • Mixed content (HTTPS page loading HTTP resources)
  • noindex / nosnippet meta tags

69% of AI crawlers cannot execute JavaScript (SearchVIU 2025). If your content is JS-rendered, it simply doesn't exist for most AI systems.

Pass: All AI crawlers allowed, page loads in clean HTML, canonical is self-referencing, no nosnippet tag.
Fail: User-agent: GPTBot / Disallow: / in robots.txt — page is invisible to ChatGPT citations regardless of content quality.
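The robots.txt gate can be checked with Python's standard library. A minimal sketch, assuming the robots.txt body has already been fetched; the user-agent list mirrors the crawlers named above.

```python
from urllib import robotparser

AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]

def blocked_ai_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI user-agents that this robots.txt blocks for `url`."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in AI_CRAWLERS if not parser.can_fetch(ua, url)]

# The fail case from above: GPTBot disallowed site-wide, everyone else allowed.
rules = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /"
print(blocked_ai_crawlers(rules))  # ['GPTBot']
```

Run this against each AI user-agent separately: a robots.txt that allows `*` can still block a single named crawler, which is exactly the fail case shown above.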

Common issues found

  • AI crawlers blocked in robots.txt (31% of audited pages)
  • Canonical pointing to a different URL
  • JavaScript-only content rendering
  • nosnippet meta tag

2. Snippet & CTR Signals (weight: 15%)

Does your page produce a clear, compelling snippet in AI results? AI systems read title and description first to assess topic fit.

What it checks

  • Title tag: length, keyword presence, boilerplate detection
  • Meta description: unique, not auto-generated
  • H1: present, single, matches title intent
  • Open Graph tags: og:title, og:description, og:image
  • Breadcrumb presence (visible and/or schema)
  • Date visibility in SERP (datePublished in schema)

Boilerplate titles ('Home | Company Name') are the #1 Google title rewrite trigger. Rewritten titles are less likely to be selected by AI systems as representative of the page.

Pass: Title: 'Schema Markup for AI Search: Types That Get You Cited (2026 Guide)' — specific, keyword-rich, dated, under the ~600px display width.
Fail: Title: 'Services – Acme Corp' — no keyword, no value signal, 100% boilerplate. AI systems skip these for more specific results.
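The title, meta description, and H1 checks above can be approximated from raw HTML with Python's built-in parser. A minimal sketch: the 60-character cutoff is a rough stand-in for the ~600px pixel-width limit, which real audits measure in rendered pixels.

```python
from html.parser import HTMLParser

class SnippetSignals(HTMLParser):
    """Collect title, meta description, and H1 count from raw (pre-JS) HTML."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.h1_count = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "meta" and a.get("name") == "description":
            self.description = a.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def snippet_issues(html: str) -> list[str]:
    p = SnippetSignals()
    p.feed(html)
    issues = []
    if not p.title.strip():
        issues.append("title missing")
    elif len(p.title) > 60:  # character count as a rough proxy for pixel width
        issues.append("title likely exceeds the ~600px display limit")
    if not p.description:
        issues.append("meta description missing")
    if p.h1_count != 1:
        issues.append(f"expected exactly 1 H1, found {p.h1_count}")
    return issues
```

Because this parses raw HTML, it doubles as a crude JS-rendering check: a page whose title and H1 only exist after JavaScript runs will fail here too.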

Common issues found

  • Duplicate meta descriptions across pages
  • nosnippet tag (prevents AI from extracting any text)
  • Title pixel width over 600px
  • H1 absent or mismatched with title

3. Intent & Content Value (weight: 20%)

Does your content match what users are actually searching for, and does it add unique value? Content-intent mismatch is a hard blocker.

What it checks

  • Search intent match (informational / navigational / commercial / transactional)
  • Content depth — thin content < 200 words flagged automatically
  • Information gain: unique value beyond competitor pages
  • First-hand experience signals (specific details, named examples, original data)
  • Answer-first architecture: direct answer in first 60 words
  • Filler content detection: data point density per 500 words
  • AI writing pattern detection: formulaic structure, low burstiness

Google's OriginalContentScore (confirmed in API leak) evaluates uniqueness regardless of length. Princeton GEO paper: keyword stuffing has near-zero or negative effect on AI citation rates.

Pass: 4,200-word guide on email marketing with original survey data, step-by-step screenshots, and tool comparisons — fulfills the title promise with unique value.
Fail: 'Complete Guide to Email Marketing' with 400 words and 6 H2s with 2 sentences each — skeleton content that triggers thin-content classifiers.
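The data-point-density check above can be sketched with a simple regex; the pattern and normalization are illustrative assumptions, and a production checker would also catch spelled-out figures, dates, and units.

```python
import re

# Matches bare numbers, percentages, and figures like "2,400" or "3.2%".
NUMERIC = re.compile(r"\b\d[\d,.%]*")

def data_point_density(text: str) -> float:
    """Numeric data points per 500 words: the filler-content signal described above."""
    words = len(text.split())
    return 500 * len(NUMERIC.findall(text)) / max(words, 1)
```

Filler-heavy text scores near zero on this metric, while the kind of page the pass example describes (original survey data, comparisons) scores high.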

Common issues found

  • Content-intent type mismatch (wrong format for query)
  • Search-engine-first content (keyword stuffing, template-based)
  • Scaled content abuse (mass AI content without value-add)
  • YMYL topic without adequate E-E-A-T credentials

4. Trust & E-E-A-T (weight: 20%)

Does your page demonstrate genuine expertise and transparency? 96% of AI Overview citations come from verified authoritative sources.

What it checks

  • Author byline present and linking to bio with credentials
  • Person schema in JSON-LD (name, jobTitle, affiliation, sameAs)
  • About page existence and quality
  • External source citations on factual claims (3+ per page)
  • Publication date and last-updated date visible
  • YMYL classification + appropriate disclaimers
  • AI content disclosure (if applicable)

Pages with expert author attribution are cited at 2.4x the rate of anonymous pages (PresenceAI). 70.4% of sources cited by ChatGPT include Person schema in JSON-LD (EverTune).

Pass: Named author with 'Head of SEO, 8 years experience' byline, links to full bio, Person schema with LinkedIn sameAs, 5 external citations to named studies.
Fail: All pages attributed to 'Admin' or 'Staff', no bio page, no credentials, no citations — anonymous content is AI-invisible for factual queries.
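As an illustration of the pass criteria above, a Person block might look like the following; every name, title, and URL here is a placeholder, not a recommendation:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Doe",
  "jobTitle": "Head of SEO",
  "affiliation": {
    "@type": "Organization",
    "name": "Example Corp"
  },
  "sameAs": [
    "https://www.linkedin.com/in/jane-doe-example"
  ]
}
</script>
```

Link the visible byline to the bio page this schema describes, so the human-readable and machine-readable attribution match.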

Common issues found

  • No author attribution on YMYL content
  • Fake expert personas (January 2025 QRG violation)
  • Missing YMYL disclaimers (medical/financial/legal)
  • AI content not disclosed

5. Schema Markup (weight: 10%)

Is your content machine-readable via JSON-LD schema? Only 12.4% of websites implement structured data — a major competitive advantage for those who do.

What it checks

  • JSON-LD presence in <script type="application/ld+json">
  • @context validity (missing = BLOCKER — silently ignored)
  • Schema type appropriateness for page type
  • Required properties for each schema type
  • Content-schema match (schema must match visible content)
  • datePublished / dateModified in Article schema

FAQPage schema: 3.2x more likely to appear in Google AI Overviews (Frase). GPT-4 accuracy improves from 16% to 54% when content uses structured data.

Pass: Article + FAQPage + BreadcrumbList in JSON-LD. @context present, dateModified current, all schema content matches visible page content.
Fail: Schema with missing @context — silently invalid and ignored by all parsers. Or Product schema claiming 5-star reviews when page has no reviews.
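The missing-@context blocker is easy to test for mechanically. A minimal sketch assuming a single JSON-LD object (not an @graph array); the property checks are a small illustrative subset of what a full validator covers.

```python
import json

def jsonld_blockers(block: str) -> list[str]:
    """Flag blocker-level problems in a single JSON-LD block."""
    try:
        data = json.loads(block)
    except json.JSONDecodeError:
        return ["invalid JSON: the block is silently ignored by all parsers"]
    issues = []
    if "@context" not in data:
        issues.append("missing @context: schema is silently ignored")
    if "@type" not in data:
        issues.append("missing @type")
    if data.get("@type") == "Article" and "datePublished" not in data:
        issues.append("Article schema without datePublished")
    return issues
```

Content-schema match (the fake-reviews fail case above) cannot be caught this way; it requires comparing schema claims against the visible page.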

Common issues found

  • Missing @context: 'https://schema.org'
  • Schema content doesn't match visible page content
  • Self-referencing reviews (manual action trigger)

6. AI Extractability / Citeability (weight: 20%)

Can AI systems pull clean, standalone quotes from your content? 44.2% of all LLM citations come from the first 30% of text.

What it checks

  • Answer-first architecture: direct answer in first 60 words
  • Self-contained 'answer capsules' after each H2/H3 (40-60 words)
  • External citation count and quality (3+ credible sources per page)
  • Content structure: tables, ordered lists, numbered steps
  • Marketing language density (superlatives, vague qualifiers)
  • Entity density (target: 15-20 named entities per 1,000 words)
  • llms.txt implementation

Pages with external citations: 34.9% AI selection rate vs. 3.2% without (PresenceAI). Comparison tables increase citation rates 2.5x vs. unstructured text.

Pass: Opens with a 55-word direct answer, each H2 followed by a self-contained paragraph, 5 external citations, one comparison table, FAQPage schema.
Fail: Opens with 'In today's digital landscape...' followed by 300 words of background before answering. No citations, no tables. AI has nothing clean to extract.
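Marketing-language density, one of the checks above, can be approximated with a word list. Both the lexicon and the per-100-words normalization are illustrative assumptions, not the audit's actual implementation.

```python
import re

# A tiny sample lexicon of superlatives and vague qualifiers.
SUPERLATIVES = re.compile(
    r"\b(industry-leading|best-in-class|cutting-edge|world-class|"
    r"revolutionary|unparalleled|seamless)\b",
    re.IGNORECASE,
)

def marketing_density(text: str) -> float:
    """Superlatives per 100 words: high values leave AI little quotable text."""
    words = len(text.split())
    return 100 * len(SUPERLATIVES.findall(text)) / max(words, 1)
```

A sentence like the fail example above scores high on this metric; a factual, citation-backed answer capsule scores near zero.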

Common issues found

  • No answer in first 60 words (content not extractable)
  • Zero external citations
  • 100% marketing language ('industry-leading solution')
  • No structured content (no tables, no lists)

7. Risk Analysis (Red Team) (flags, not scored)

Does your page have signals that make AI systems avoid citing it? Risk flags are disqualifiers — a 9.2/10 page can still be penalized.

What it checks

  • Google 2024 spam policy violations (scaled content, site reputation abuse)
  • FTC violations: fake reviews, undisclosed affiliate content
  • EU AI Act disclosure requirements
  • Cookie consent / GDPR compliance
  • Hidden content signals (invisible to users, readable by crawlers)
  • Malware / cryptomining script signatures
  • Deceptive dark patterns (manipulative urgency, subscription traps)

Scaled AI content was the #1 reason for domain-level manual actions in 2025. FTC fake review violations: up to $50,000 per violation.

Pass: No hidden content, GDPR consent banner, affiliate disclosure above the fold, no spam policy violations, legitimate schema only.
Fail: Fake 5-star review schema with no visible reviews, cryptomining script in page footer, undisclosed affiliate links — any one of these triggers manual review.

Common issues found

  • Cryptomining or malware scripts
  • Fake reviews / fabricated testimonials
  • YMYL misinformation without disclaimer
  • Hidden text (same color as background)

Severity Classification

Every issue found in an audit is classified at one of four severity levels. The classification determines the order of fixes — Blockers must be resolved first because they prevent all other optimizations from mattering.

Severity | Definition | Expected Action
Blocker | Prevents AI from accessing or citing the page entirely | Fix immediately — other optimizations are moot until resolved
High | Significantly reduces citation probability | Fix within 1–2 weeks
Medium | Reduces citation quality or frequency | Fix within 1–2 months
Low | Minor improvement opportunity | Fix when convenient

Effort Estimation

Every fix also receives an effort estimate so you can prioritize quick wins over high-effort changes when both have similar impact.

Label | Time | Examples
XS | < 30 minutes | Adding FAQPage schema, fixing robots.txt, adding author byline, updating dateModified
S | 30 min – 2 hours | Rewriting opening paragraphs, adding external citations, fixing meta descriptions, adding Person schema
M | 2–8 hours | Adding comparison tables, improving content depth, implementing full schema suite, creating author bio page
L | > 8 hours | Full content rewrite, building topic cluster, establishing original research, YMYL compliance overhaul

Score Interpretation

8–10 (Pass): Strong AI visibility. Page is likely being cited or is close to it. Focus on maintaining freshness and expanding content depth.

5–7.9 (Needs Fix): Significant gaps. AI may occasionally cite this page but inconsistently. Address High-severity issues first.

0–4.9 (Critical): Multiple blockers. Page is unlikely to appear in any AI citations. Fix Blockers immediately before any other work.

Sample Audit Walkthrough

Here's what a typical audit result looks like for a mid-performing content page. This example reflects real patterns we see across pages in the 5–7 score range.

Sample page: example.com/blog/email-marketing-guide
Overall score: 6.1/10 (Needs Fix)

Dimension scores:
  • Crawlability: 9.5
  • Snippet & CTR: 6.2
  • Intent & Value: 7.1
  • Trust & E-E-A-T: 4.8
  • Schema: 7.0
  • AI Citeability: 5.3
  • Risk Flags: none

Top 3 fixes (priority order):

1. Add author byline + Person schema (Blocker, effort S). No author attribution on a commercial-intent content page. Add a named author, link to a bio, and add Person schema with jobTitle and sameAs.

2. Add external citations (3+) (High, effort S). Zero cited sources. Add 3–5 external links to credible studies or primary sources; pages with external citations see a 34.9% AI selection rate versus 3.2% without.

3. Rewrite the opening 60 words to answer-first (High, effort S). Content opens with background context. Restructure to place the direct answer in the first 55 words, then expand.

Run Your Own Audit

Two options: automated (60 seconds, full 120+ signal report) or manual (10 minutes, surface-level check).

Manual 10-Minute Checklist

1. Check /robots.txt — Ctrl+F for GPTBot, OAI-SearchBot, PerplexityBot. Any Disallow: / is a blocker.
2. Disable JavaScript in your browser — does the page content still appear? If not, you have a JS rendering problem.
3. View the page source — is the content in the HTML or just <div id="app">? Content must be in the source.
4. Check for a named author byline with credentials above or below the article.
5. Read the first 60 words — do they directly answer a query, or open with background?
6. Count external citations — aim for 3+ linked credible sources per page.
7. Check for FAQPage schema — search the page source (or browser DevTools → Elements) for application/ld+json.
8. Count tables and numbered lists — structures that increase citation rates 1.7–2.5x.
9. Check the last-modified date — is it visible on the page? Content more than 12 months old loses citation rate.
10. Check Google Search Console for any manual actions or coverage issues.

Skip the manual work

Paste any URL and get the full 120+ signal report in 60 seconds — scored across all 7 dimensions with prioritized fixes.


Frequently Asked Questions

How does a page audit differ from a site audit?

A site audit crawls hundreds or thousands of pages for technical issues but gives you no prioritized action plan for any single page. A page audit goes deep on one URL — checking 120+ signals across all 7 dimensions — and outputs a scored report with specific, prioritized fixes. AI systems cite pages, not domains, so page-level analysis is more actionable for AI visibility.

Which pages should I audit first?

Start with your highest-value pages: the pages you most want to appear in AI answers for your target queries. This typically means your pricing page, key product or service pages, and your most-trafficked content articles. 82.5% of AI citations link to nested pages, not homepages — so your homepage is rarely the highest priority.

How often should I re-audit a page?

Re-audit after every significant content change, after major algorithm updates, and on a quarterly schedule for your highest-priority pages. Time-sensitive content (news, pricing, product specs) should be re-audited monthly. Evergreen content audited quarterly is sufficient if no major changes occurred.

Do I need to fix every issue the audit finds?

No — focus on Blocker and High severity issues first. Blockers prevent AI from accessing or citing the page entirely; fixing them has immediate impact. High issues significantly reduce citation probability. Medium and Low issues improve quality incrementally. A page with all Blockers and High issues resolved is likely to perform better than a page with everything fixed except one Blocker.

What is the most common blocker?

AI crawlers blocked in robots.txt — found in 31% of audited pages. The second most common is JavaScript-only rendering: 69% of AI crawlers cannot execute JavaScript, so content inside React/Vue/Angular components that aren't server-rendered is invisible. The nosnippet meta tag is less common but completely prevents AI from extracting any text from the page.

Can I audit competitor pages?

Yes. Auditing competitor pages is one of the highest-value uses of the tool. Understanding why a competitor's page gets cited over yours — and which specific signals they have that you don't — gives you a concrete implementation checklist. Focus on their schema implementation, content structure, and author attribution.

Does fixing AI-readiness issues also help traditional SEO?

Significantly. Many AI visibility signals overlap with Google's core ranking factors: content depth, E-E-A-T, schema markup, page speed, and canonicalization. Fixing AI blockers typically improves traditional rankings simultaneously. The main divergence is that AI citation prioritizes extractability and author attribution more than traditional PageRank signals.

What score should I aim for?

Pages scoring 8.0+ on the 0–10 scale have a significantly higher probability of appearing in AI citations. The 8.0 threshold maps to: AI crawlers allowed, answer-first structure in place, 3+ external citations, author attribution with Person schema, and at least FAQPage or Article schema implemented. Scores below 5.0 indicate multiple blockers that need immediate attention.

Related Resources