aeo-markdown-render
Serve a markdown version of every page for LLM crawlers. The +300% AI citation trick (G2 experiment, AEO Conf 2026). Triggers — 'markdown for LLMs', 'AI crawler
```shell
npx skills add aeo-markdown-render
```
Skill: AEO Markdown Rendering
Serve a parallel markdown version of every public HTML page so LLM crawlers can ingest content without parsing HTML, JavaScript, or CSS noise.
Why this works (the +300% number)
G2 ran this experiment in 2025 and reported a +300% increase in AI citations and AI context-fetch requests after publishing markdown versions of their pages. Same content, basic markdown rendering, no rich text required — the gain came purely from making content easier for LLM crawlers to parse.
Mechanism: when an LLM engine (Claude, ChatGPT, Perplexity) needs to read a page in depth (versus quoting a search snippet), it issues a context-fetch request. Markdown is typically 5-10× smaller than the corresponding HTML and far cleaner to parse. Crawlers prefer it, and some refuse to deeply ingest pages above a certain HTML/JS payload threshold.
Approach: 3 viable patterns
Pick whichever fits the stack. They're equivalent for AEO purposes.
Pattern A: .md route alongside .html
For every page at /foo/bar, also serve /foo/bar.md (or /foo/bar/index.md). LLM crawlers learn the convention and prefer the .md URL.
- Next.js: a dynamic route (`app/[locale]/[...slug]/route.ts`) returns `text/markdown` based on the `Accept` header OR a `.md` extension in the path.
- Static site / Hugo / Jekyll: emit `.md` files alongside `.html` at build time.
- Custom CMS: middleware that intercepts `.md` requests, looks up the page, and renders.
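A minimal sketch of Pattern A as a Next.js App Router route handler. `renderMarkdown()` is a hypothetical stand-in for your real page→markdown lookup; wire it to your actual content source.

```typescript
// Sketch only: app/[...slug]/route.ts (Pattern A).
// `renderMarkdown` is a placeholder, not part of this skill.
async function renderMarkdown(slug: string): Promise<string | null> {
  const pages: Record<string, string> = {
    pricing: "# Pricing\n\nAcme costs $10/month.", // placeholder content
  };
  return pages[slug] ?? null;
}

export async function GET(
  req: Request,
  { params }: { params: { slug: string[] } },
): Promise<Response> {
  const path = params.slug.join("/");
  // Serve markdown for .md paths or an explicit Accept: text/markdown.
  const wantsMd =
    path.endsWith(".md") ||
    (req.headers.get("accept") ?? "").includes("text/markdown");
  if (!wantsMd) return new Response("Not found", { status: 404 });

  const md = await renderMarkdown(path.replace(/\.md$/, ""));
  if (md === null) return new Response("Not found", { status: 404 });

  return new Response(md, {
    headers: {
      "Content-Type": "text/markdown; charset=utf-8",
      "Cache-Control": "public, max-age=3600, stale-while-revalidate=86400",
    },
  });
}
```

In a real app the non-`.md` branch would fall through to the normal HTML renderer instead of returning 404.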
Pattern B: Content negotiation on the same URL
Same URL `/foo/bar` returns either HTML or markdown depending on the `Accept: text/markdown` request header. Cleaner URL space, but some crawlers don't send the header.
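The negotiation check can be isolated in a small helper. This framework-agnostic sketch only tests for the media type's presence; a full implementation would honor `q` values and wildcard precedence.

```typescript
// Returns true when the Accept header asks for markdown (Pattern B).
// Simplified: ignores q-values and wildcards.
export function prefersMarkdown(acceptHeader: string | null): boolean {
  if (!acceptHeader) return false;
  return acceptHeader
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase())
    .includes("text/markdown");
}
```

If you negotiate on the same URL, also send `Vary: Accept` so shared caches store the HTML and markdown variants separately.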
Pattern C: Single dump in llms-full.txt
If the site is small (< 500 pages), skip per-page markdown and instead generate one consolidated llms-full.txt (see aeo-assets skill). Less granular but lower engineering cost.
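Pattern C can be a single build step. The `Page` shape below is a hypothetical stand-in for whatever your build pipeline emits per page.

```typescript
// Concatenate per-page markdown into one llms-full.txt (Pattern C sketch).
interface Page {
  url: string;
  title: string;
  markdown: string;
}

export function buildLlmsFullTxt(pages: Page[]): string {
  // One section per page, separated by a horizontal rule,
  // each carrying its source URL so engines can attribute citations.
  return pages
    .map((p) => `# ${p.title}\nSource: ${p.url}\n\n${p.markdown}`)
    .join("\n\n---\n\n");
}
```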
Markdown generation rules
Markdown should be:
- Answer-first: First 40-60 words must directly answer the page's primary question. If the HTML lede is fluff, rewrite for the markdown.
- Schema preserved as YAML frontmatter:
- Headings preserved exactly — H1 / H2 / H3 must match the rendered HTML. AI engines use heading hierarchy to index sub-questions.
- Tables preserved as markdown tables — don't convert to text. AI parses markdown tables as structured data.
- Code blocks preserved with language tags — improves citation in technical queries.
- No JS/CSS/navigation chrome — strip nav, footer, sidebars, cookie banners, modals. Just main content.
- Internal links preserved — `[text](url)` → AI follows links to build site context.
- Images: alt text only — `![alt text]()` (omit the src). LLMs only need the alt text for context.
- No duplicate content — if the HTML page is mostly programmatically generated boilerplate, the markdown will be too. Make sure each page has unique signal (≥ 300 unique words).
Example frontmatter:

```yaml
---
title: "Best AI Design Tools 2026"
description: "Six tools compared by price, output quality, and use case."
url: "https://acme.com/best-ai-design-tools"
updated: "2026-04-26"
schema_type: "Article"
author: "Jane Doe"
---
```
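The stripping rules above can be sketched with a few regex passes. This is illustrative only — a production converter should use a DOM-aware tool, since regexes on arbitrary HTML are fragile.

```typescript
// Illustrative chrome-stripping pass (NOT production-grade HTML parsing).
export function stripChrome(html: string): string {
  return html
    // Drop nav/footer/aside/script/style blocks entirely (no chrome, no JS/CSS).
    .replace(/<(nav|footer|aside|script|style)[\s\S]*?<\/\1>/gi, "")
    // Images: keep alt text only, per the rules above.
    .replace(/<img[^>]*\balt="([^"]*)"[^>]*>/gi, "![$1]()");
}
```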
Quality template
```markdown
---
title: "{H1 from HTML}"
description: "{meta description}"
url: "{canonical URL}"
updated: "{YYYY-MM-DD from page Last-Modified or schema}"
schema_type: "{Article|Product|FAQPage|HowTo|...}"
---

# {H1}

{Answer-first paragraph, 40-60 words, fact-anchored, citation-shaped.
First sentence directly answers the title's implied question.}

## {H2 — first major section}

{Content. Preserve tables, lists, bullet structure.}

| Header | Header |
| --- | --- |
| Cell | Cell |

## {H2 — second section}

...

## FAQ

### {Question 1}

{30-80 word answer.}

### {Question 2}

{...}

Last updated: {YYYY-MM-DD} • Read this on the web: {URL}
```
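Generating the frontmatter for that template can be mechanical. `PageMeta` is a hypothetical shape, and `JSON.stringify` stands in for proper YAML escaping (JSON string literals are valid YAML double-quoted scalars).

```typescript
// Render the YAML frontmatter block from page metadata (sketch).
interface PageMeta {
  title: string;
  description: string;
  url: string;
  updated: string;    // YYYY-MM-DD
  schemaType: string; // Article | Product | FAQPage | HowTo | ...
}

export function frontmatter(m: PageMeta): string {
  return [
    "---",
    `title: ${JSON.stringify(m.title)}`,
    `description: ${JSON.stringify(m.description)}`,
    `url: ${JSON.stringify(m.url)}`,
    `updated: ${JSON.stringify(m.updated)}`,
    `schema_type: ${JSON.stringify(m.schemaType)}`,
    "---",
  ].join("\n");
}
```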
Robots.txt + sitemap glue
Wire the markdown into the AEO discovery surface:
- Add to `robots.txt`:

  ```
  Allow: /*.md$
  ```

- Add to `sitemap-llm.xml`: list both `/foo` AND `/foo.md` for high-value pages, OR list only `.md` to nudge crawlers.
- Cross-reference in `llms.txt`:
```markdown
## Markdown versions for LLMs
Every page on this site is also available in markdown by appending .md:
- {CANONICAL_URL}/about → {CANONICAL_URL}/about.md
- {CANONICAL_URL}/pricing → {CANONICAL_URL}/pricing.md
```
Cache & headers
For each .md response:
```http
Content-Type: text/markdown; charset=utf-8
Cache-Control: public, max-age=3600, stale-while-revalidate=86400
Last-Modified: {date}
X-Robots-Tag: index, follow
Link: <{html-canonical-url}>; rel="canonical"
```
The `Link: rel="canonical"` header tells LLM engines that the markdown is a representation of the HTML page, not a duplicate. Critical: without it, Google's canonical chooser may de-duplicate the two versions and pick the wrong one.
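Those headers can be centralized in one helper. This sketch uses the standard `Headers` class (Node 18+ / browsers); the function name is my own, not part of any framework.

```typescript
// Build the response headers for a .md representation (sketch).
export function markdownHeaders(
  canonicalHtmlUrl: string,
  lastModified: Date,
): Headers {
  return new Headers({
    "Content-Type": "text/markdown; charset=utf-8",
    "Cache-Control": "public, max-age=3600, stale-while-revalidate=86400",
    "Last-Modified": lastModified.toUTCString(),
    "X-Robots-Tag": "index, follow",
    // Point back at the HTML canonical so the .md isn't treated as a duplicate.
    "Link": `<${canonicalHtmlUrl}>; rel="canonical"`,
  });
}
```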
Implementation checklist
- [ ] Build a markdown renderer (page → `.md` string)
- [ ] Wire route handler (Pattern A) OR content negotiation (Pattern B) OR static dump (Pattern C)
- [ ] Add `Allow: /*.md$` to robots.txt (via `aeo-assets`)
- [ ] List markdown URLs in sitemap-llm.xml
- [ ] Reference markdown convention in llms.txt
- [ ] Verify: `curl -H "Accept: text/markdown" {URL}` returns `text/markdown`
- [ ] Verify: `curl {URL}.md` returns clean markdown (no HTML chrome)
- [ ] Track impact via `aeo-citation-track` — measure mention rate before / 30 days after
Measurement (proves this is worth doing)
Before launching the markdown layer, capture baseline:
- 4-engine citation rate via `aeo-citation-track`
- AI bot user-agent traffic from server logs (look for `GPTBot`, `ClaudeBot`, `PerplexityBot`, `Perplexity-User`, `OAI-SearchBot`, `ChatGPT-User`)
- AI-referred sessions in analytics (Referer header `https://chat.openai.com`, `https://perplexity.ai`, `https://claude.ai`, `https://gemini.google.com`)
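The bot-traffic baseline can be pulled from access logs with a simple user-agent scan; the substrings below match the bot list above, and the log format is whatever your server emits.

```typescript
// Count AI-crawler hits per bot from raw access-log lines (sketch).
const AI_BOTS = [
  "GPTBot", "ClaudeBot", "PerplexityBot",
  "Perplexity-User", "OAI-SearchBot", "ChatGPT-User",
];

export function countAiBotHits(logLines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of logLines) {
    for (const bot of AI_BOTS) {
      if (line.includes(bot)) {
        counts.set(bot, (counts.get(bot) ?? 0) + 1);
        break; // attribute each line to at most one bot
      }
    }
  }
  return counts;
}
```

Run it before launch and again 30 days after to see whether `.md` URLs changed crawler behavior.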
Common gotchas
- Don't double-publish thin content — if a page is < 300 unique words in HTML, the markdown is also thin. Fix the page first.
- Don't use markdown to publish content not on the HTML page — Google may treat the divergence as cloaking. Markdown should be a clean representation of HTML, not a longer/different version.
- Don't forget locale prefix —
/zh-CN/foo.mdfor international sites; pair with hreflang. - Don't lose schema — frontmatter must include the schema fields the original HTML carries.