aeo-markdown-render

Serve a markdown version of every page for LLM crawlers. The +300% AI citation trick (G2 experiment, AEO Conf 2026). Triggers — 'markdown for LLMs', 'AI crawler

Quick Install
npx skills add aeo-markdown-render

Skill: AEO Markdown Rendering

Serve a parallel markdown version of every public HTML page so LLM crawlers can ingest content without parsing HTML, JavaScript, or CSS noise.

Why this works (the +300% number)

G2 ran this experiment in 2025 and reported a +300% increase in AI citations and AI context-fetch requests after publishing markdown versions of their pages. Same content, basic markdown rendering, no rich text required — the gain came purely from making content easier for LLM crawlers to parse.

Mechanism: when an LLM engine (Claude, ChatGPT, Perplexity) needs to read a page in depth (rather than rely on a search snippet), it issues a context-fetch request. Markdown is ~5-10× smaller than the corresponding HTML and carries far less markup noise. Crawlers prefer it; some crawlers refuse to deeply ingest pages above a certain HTML/JS payload threshold.

Approach: 3 viable patterns

Pick whichever fits the stack. They're equivalent for AEO purposes.

Pattern A: .md route alongside .html

For every page at /foo/bar, also serve /foo/bar.md (or /foo/bar/index.md). LLM crawlers learn the convention and prefer the .md URL.

  • Next.js: dynamic route app/[locale]/[...slug]/route.ts returns text/markdown based on Accept header OR .md extension in path.
  • Static site / Hugo / Jekyll: emit .md files alongside .html at build time.
  • Custom CMS: middleware that intercepts .md requests, looks up the page, and renders.
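
A minimal sketch of the path handling behind Pattern A. The helper name `resolveMarkdownPath` and its return shape are assumptions for illustration, not part of Next.js or any other framework. It maps a requested .md URL back to the page path, so a route handler or middleware knows which page to render as markdown.

```typescript
// Hypothetical helper: maps a requested path to the page it represents.
// Returns null when the request is not for a markdown variant.
function resolveMarkdownPath(pathname: string): string | null {
  // /foo/bar/index.md -> /foo/bar (and /index.md -> /)
  const indexMatch = /^(.*)\/index\.md$/.exec(pathname);
  if (indexMatch) {
    return indexMatch[1] || "/";
  }
  // /foo/bar.md -> /foo/bar
  const fileMatch = /^(.+)\.md$/.exec(pathname);
  if (fileMatch) {
    return fileMatch[1] || null;
  }
  // Anything else is a normal HTML request.
  return null;
}
```

A handler would call this first and fall through to the regular HTML response whenever it returns null.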

Pattern B: Content negotiation on the same URL

Same URL /foo/bar returns either HTML or markdown depending on Accept: text/markdown request header. Cleaner URL space, but some crawlers don't send the header.
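
One way the negotiation check for Pattern B might look, as a sketch. `prefersMarkdown` is a hypothetical helper, and the tie-breaking rule (markdown wins when its quality value is at least as high as HTML's) is an assumption rather than something the pattern prescribes.

```typescript
// Hypothetical helper: true when the Accept header lists text/markdown
// with a quality value at least as high as text/html.
function prefersMarkdown(accept: string | null): boolean {
  if (!accept) return false;
  let markdownQ = 0;
  let htmlQ = 0;
  for (const part of accept.split(",")) {
    const fields = part.split(";");
    const mediaType = (fields[0] || "").trim().toLowerCase();
    let q = 1; // default quality when no q parameter is present
    for (const field of fields.slice(1)) {
      const param = field.trim().toLowerCase();
      if (param.indexOf("q=") === 0) {
        const parsed = parseFloat(param.slice(2));
        if (!isNaN(parsed)) q = parsed;
      }
    }
    if (mediaType === "text/markdown") markdownQ = Math.max(markdownQ, q);
    if (mediaType === "text/html") htmlQ = Math.max(htmlQ, q);
  }
  // Wildcards such as */* are ignored on purpose, so browsers keep getting HTML.
  return markdownQ > 0 && markdownQ >= htmlQ;
}
```

If the same URL can return two representations, the response should normally also carry `Vary: Accept` so shared caches keep them apart.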

Pattern C: Single dump in llms-full.txt

If the site is small (< 500 pages), skip per-page markdown and instead generate one consolidated llms-full.txt (see aeo-assets skill). Less granular but lower engineering cost.
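
A rough sketch of the consolidation step for Pattern C. The `PageDoc` shape, the `buildLlmsFull` name, and the per-page layout (title heading, source line, horizontal-rule separator) are all assumptions; the aeo-assets skill may define a different layout.

```typescript
// Hypothetical page record; field names are illustrative.
interface PageDoc {
  url: string;
  title: string;
  markdown: string;
}

// Concatenates every page into one document, separated by horizontal rules.
function buildLlmsFull(pages: PageDoc[]): string {
  const sections = pages.map(
    (page) => "# " + page.title + "\n\nSource: " + page.url + "\n\n" + page.markdown.trim()
  );
  return sections.join("\n\n---\n\n") + "\n";
}
```

The result would be written to llms-full.txt at build time.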

Recommendation: Pattern A for sites > 100 pages. Pattern C for small sites. Skip Pattern B unless content negotiation is already in use.

Markdown generation rules

Markdown should be:

  1. Answer-first: the first 40-60 words must directly answer the page's primary question. If the HTML lede is fluff, rewrite it for the markdown.
  2. Schema preserved as YAML frontmatter:

       ---
       title: "Best AI Design Tools 2026"
       description: "Six tools compared by price, output quality, and use case."
       url: "https://acme.com/best-ai-design-tools"
       updated: "2026-04-26"
       schema_type: "Article"
       author: "Jane Doe"
       ---

  3. Headings preserved exactly: H1 / H2 / H3 must match the rendered HTML. AI engines use heading hierarchy to index sub-questions.
  4. Tables preserved as markdown tables: don't convert them to text. AI parses markdown tables as structured data.
  5. Code blocks preserved with language tags: this improves citation in technical queries.
  6. No JS/CSS/navigation chrome: strip nav, footer, sidebars, cookie banners, modals. Keep just the main content.
  7. Internal links preserved as [text](url): AI follows links to build site context.
  8. Images: alt text only, written as ![alt text]() (omit src). LLMs only need the alt for context.
  9. No duplicate content: if the HTML page is mostly programmatically generated boilerplate, the markdown will be too. Make sure each page has unique signal (≥ 300 unique words).
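
Two of the rules above are mechanical enough to check in code. The sketch below assumes hypothetical helpers (`isAnswerFirst`, `stripImageSrc`) and treats the first non-heading block of the markdown body as the answer paragraph, which is an assumption about how pages are laid out.

```typescript
// Counts whitespace-separated words.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Returns the first block that is not a heading (assumed to be the answer paragraph).
function firstParagraph(markdownBody: string): string {
  for (const block of markdownBody.split(/\n\s*\n/)) {
    const text = block.trim();
    if (text !== "" && text.charAt(0) !== "#") return text;
  }
  return "";
}

// Answer-first rule: the opening paragraph should be 40-60 words.
function isAnswerFirst(markdownBody: string): boolean {
  const words = countWords(firstParagraph(markdownBody));
  return words >= 40 && words <= 60;
}

// Image rule: keep the alt text, drop the src. ![alt](https://x/y.png) -> ![alt]()
function stripImageSrc(markdown: string): string {
  return markdown.replace(/!\[([^\]]*)\]\([^)]*\)/g, "![$1]()");
}
```

The word-count check only enforces length; whether the paragraph actually answers the page's question still needs a human or model review.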

Quality template

---
title: "{H1 from HTML}"
description: "{meta description}"
url: "{canonical URL}"
updated: "{YYYY-MM-DD from page Last-Modified or schema}"
schema_type: "{Article|Product|FAQPage|HowTo|...}"
---

# {H1}

{Answer-first paragraph, 40-60 words, fact-anchored, citation-shaped. First sentence directly answers the title's implied question.}

## {H2: first major section}

{Content. Preserve tables, lists, bullet structure.}

| Header | Header |
| ------ | ------ |
| Cell   | Cell   |

## {H2: second section}

...

## FAQ

### {Question 1}

{30-80 word answer.}

### {Question 2}

{...}

Last updated: {YYYY-MM-DD} • Read this on the web: {URL}
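
The template above could be filled in by a small renderer along these lines. This is a sketch under stated assumptions: `PageMeta` and `renderMarkdownPage` are hypothetical names, and the body is assumed to be already-converted markdown without the H1.

```typescript
// Hypothetical metadata record mirroring the frontmatter fields in the template.
interface PageMeta {
  title: string;
  description: string;
  url: string;
  updated: string; // YYYY-MM-DD
  schemaType: string;
}

// JSON string literals are valid double-quoted YAML scalars, so JSON.stringify
// handles the escaping of quotes, backslashes, and newlines.
function yamlString(value: string): string {
  return JSON.stringify(value);
}

// Assembles frontmatter, H1, body, and footer in the template's order.
function renderMarkdownPage(meta: PageMeta, body: string): string {
  const frontmatter = [
    "---",
    "title: " + yamlString(meta.title),
    "description: " + yamlString(meta.description),
    "url: " + yamlString(meta.url),
    "updated: " + yamlString(meta.updated),
    "schema_type: " + yamlString(meta.schemaType),
    "---",
  ].join("\n");
  const footer = "Last updated: " + meta.updated + " • Read this on the web: " + meta.url;
  return frontmatter + "\n\n# " + meta.title + "\n\n" + body.trim() + "\n\n" + footer + "\n";
}
```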

Robots.txt + sitemap glue

Wire the markdown into the AEO discovery surface:

  1. Add to robots.txt:

       Allow: /*.md$

  2. Add to sitemap-llm.xml: list both /foo AND /foo.md for high-value pages, OR list only .md to nudge crawlers.
  3. Cross-reference in llms.txt:

       ## Markdown versions for LLMs
       Every page on this site is also available in markdown by appending .md:
       - {CANONICAL_URL}/about → {CANONICAL_URL}/about.md
       - {CANONICAL_URL}/pricing → {CANONICAL_URL}/pricing.md
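
The llms.txt cross-reference can be generated rather than hand-maintained. A minimal sketch, assuming a hypothetical `llmsTxtSection` helper that receives the canonical origin and a list of page paths:

```typescript
// Hypothetical helper: builds the "Markdown versions" section for llms.txt.
function llmsTxtSection(canonicalUrl: string, paths: string[]): string {
  const base = canonicalUrl.replace(/\/+$/, ""); // drop trailing slashes
  const lines = [
    "## Markdown versions for LLMs",
    "Every page on this site is also available in markdown by appending .md:",
  ];
  for (const path of paths) {
    const page = base + path;
    lines.push("- " + page + " → " + page + ".md");
  }
  return lines.join("\n") + "\n";
}
```

Listing only a handful of representative pages is usually enough here, since the sentence above the list already states the convention.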

Cache & headers

For each .md response:

Content-Type: text/markdown; charset=utf-8
Cache-Control: public, max-age=3600, stale-while-revalidate=86400
Last-Modified: {date}
X-Robots-Tag: index, follow
Link: <{html-canonical-url}>; rel="canonical"

The Link: rel=canonical header tells LLM engines that the markdown is a representation of the HTML page, not a duplicate. Critical: without it, you risk Google's canonical chooser treating the two versions as duplicates and de-duplicating them.
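
The header set above can be produced by one small function. This is a sketch: `markdownHeaders` is a hypothetical name, and the plain object return type is an assumption so the result can be passed to whichever framework builds the response.

```typescript
// Hypothetical helper: response headers for a .md variant of an HTML page.
function markdownHeaders(canonicalHtmlUrl: string, lastModified: Date): { [name: string]: string } {
  return {
    "Content-Type": "text/markdown; charset=utf-8",
    "Cache-Control": "public, max-age=3600, stale-while-revalidate=86400",
    // toUTCString() yields the HTTP date form, e.g. "Sun, 26 Apr 2026 00:00:00 GMT".
    "Last-Modified": lastModified.toUTCString(),
    "X-Robots-Tag": "index, follow",
    // Points back at the HTML page so the markdown is not treated as a separate document.
    "Link": "<" + canonicalHtmlUrl + '>; rel="canonical"',
  };
}
```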

Implementation checklist

  • [ ] Build a markdown renderer (page → .md string)
  • [ ] Wire route handler (Pattern A) OR content negotiation (Pattern B) OR static dump (Pattern C)
  • [ ] Add Allow: /*.md$ to robots.txt (via aeo-assets)
  • [ ] List markdown URLs in sitemap-llm.xml
  • [ ] Reference markdown convention in llms.txt
  • [ ] Verify: curl -H "Accept: text/markdown" {URL} returns text/markdown
  • [ ] Verify: curl {URL}.md returns clean markdown (no HTML chrome)
  • [ ] Track impact via aeo-citation-track — measure mention rate before / 30 days after

Measurement (proves this is worth doing)

Before launching the markdown layer, capture baseline:

  • 4-engine citation rate via aeo-citation-track
  • AI bot user-agent traffic from server logs (look for GPTBot, ClaudeBot, PerplexityBot, Perplexity-User, OAI-SearchBot, ChatGPT-User)
  • AI-referred sessions in analytics (referer header https://chat.openai.com, https://perplexity.ai, https://claude.ai, https://gemini.google.com)

30 days after launch, re-measure. G2's reported gain was +300% citations and +300% context-fetch requests. Real-world results vary 50-400% depending on content quality and category competitiveness.
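
For the server-log part of the baseline, a sketch of the counting and the before/after comparison. The helper names are hypothetical, matching is a simple case-insensitive substring test on the user-agent string, and log parsing itself is left out because log formats differ.

```typescript
// Bot tokens named in the measurement list above.
const AI_BOT_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Perplexity-User", "OAI-SearchBot", "ChatGPT-User"];

// Counts requests per AI bot from a list of user-agent strings.
function countAiBotHits(userAgents: string[]): { [bot: string]: number } {
  const counts: { [bot: string]: number } = {};
  for (const bot of AI_BOT_TOKENS) counts[bot] = 0;
  for (const ua of userAgents) {
    const lower = ua.toLowerCase();
    for (const bot of AI_BOT_TOKENS) {
      if (lower.indexOf(bot.toLowerCase()) !== -1) {
        counts[bot] = (counts[bot] || 0) + 1;
        break; // attribute each request to one bot only
      }
    }
  }
  return counts;
}

// Percent change between baseline and follow-up; +300% means 4x the baseline.
function percentChange(before: number, after: number): number {
  if (before === 0) throw new Error("baseline must be non-zero");
  return ((after - before) / before) * 100;
}
```

Running the same count over an equal-length window before and after launch keeps the comparison fair.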

Common gotchas

  1. Don't double-publish thin content — if a page is < 300 unique words in HTML, the markdown is also thin. Fix the page first.
  2. Don't use markdown to publish content not on the HTML page — Google may treat the divergence as cloaking. Markdown should be a clean representation of HTML, not a longer/different version.
  3. Don't forget the locale prefix: serve /zh-CN/foo.md for international sites, and pair it with hreflang.
  4. Don't lose schema — frontmatter must include the schema fields the original HTML carries.