aeo-citation-track
Track brand citations across 4 AI engines (ChatGPT, Claude, Perplexity, Gemini). Output share-of-voice JSON for any brand vs competitors. Triggers — 'AEO citati
npx skills add aeo-citation-track
Skill: AEO Citation Tracker — 4-Engine Brand Visibility
Universal brand-citation tracker. Given a brand and its competitors, query 4 major AI engines on a curated query set, record who got cited, and output share-of-voice over time.
When to invoke
- Weekly cadence (Wednesday) for any brand doing AEO seriously
- Before launching: baseline measurement
- After major content/PR push: did the campaign move citations?
- When user says "is ChatGPT recommending us?", "Perplexity never mentions us", "why is competitor X always cited first?"
Inputs
| Input | Example | Required |
|---|---|---|
| BRAND | "Acme Corp" | yes |
| COMPETITORS | ["BrandX", "BrandY", "BrandZ"] | yes (3-7 ideal) |
| CATEGORY | "AI design tool" | yes (used to write queries) |
| QUERIES | custom queries to test | optional (auto-generated if absent) |
| OUTPUT_DIR | "reports/aeo/" | yes |
Query design (if not user-supplied)
Generate 12-20 queries across 4 buckets. Each query is the kind of long-tail prompt (50-80 words) AI engines actually receive — NOT short Google keywords.
| Bucket | Query shape | Example |
|---|---|---|
| Direct (3-4) | "What is {BRAND}?", "{BRAND} pricing", "How does {BRAND} work?" | Tests if engines know the brand at all. |
| Category (4-6) | "Best {CATEGORY} 2026", "Top {CATEGORY} for {AUDIENCE}", "Cheapest {CATEGORY}" | Tests if brand surfaces unprompted. |
| Comparison (3-5) | "{BRAND} vs {COMP}", "{COMP} alternatives", "Should I use {BRAND} or {COMP}?" | Tests positioning. |
| Use case (3-5) | "How do I {VERB} {OBJECT} using AI?", "{JOB_TO_BE_DONE} with AI" | Tests fan-out: AI decomposes user intent into sub-queries. |
If the geo-query-finder skill is available and the user has more than 5 minutes, run it first to source higher-quality long-tail queries grounded in real search intent.
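A minimal auto-generation sketch in Python, assuming only the BRAND / COMPETITORS / CATEGORY inputs above. Every template string here is illustrative, not a canonical query list, and in practice each query should be expanded toward the 50-80 word long-tail shape described above:

```python
def generate_queries(brand: str, competitors: list[str], category: str) -> list[dict]:
    queries = []

    def add(bucket, text):
        queries.append({"bucket": bucket, "query": text})

    # Direct (3-4): does the engine know the brand at all?
    add("direct", f"What is {brand} and what does it actually do?")
    add("direct", f"How much does {brand} cost and is it worth it?")
    add("direct", f"How does {brand} work in practice?")

    # Category (4-6): does the brand surface unprompted?
    add("category", f"What is the best {category} in 2026 and why?")
    add("category", f"Which {category} would you recommend for a small team on a budget?")
    add("category", f"What are the top {category} options for beginners?")
    add("category", f"What is the cheapest {category} that is still reliable?")

    # Comparison (3-5): positioning against named competitors.
    for comp in competitors[:3]:
        add("comparison", f"Should I use {brand} or {comp}, and what are the trade-offs?")
    add("comparison", f"What are good alternatives to {competitors[0]}?")

    # Use case (3-5): job-to-be-done phrasing that triggers query fan-out.
    add("use_case", f"How do I get started with a {category} for a real project, step by step?")
    add("use_case", f"I need to speed up my workflow with AI, which {category} should I pick?")
    add("use_case", f"What tools do professionals actually use as a {category}?")

    return queries
```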
Execution: query 4 engines
For each query, ask each engine and record:
| Engine | How to query | Auth |
|---|---|---|
| ChatGPT (web) | mcp__skillboss__chat with model openai/gpt-4o-search-preview OR openai/gpt-4-turbo with web tools | gateway key |
| Claude (web) | mcp__skillboss__chat with model anthropic/claude-3-7-sonnet (web tools enabled) | gateway key |
| Perplexity | mcp__skillboss__chat with model perplexity/sonar or perplexity/sonar-pro | gateway key |
| Gemini | mcp__skillboss__chat with model google/gemini-2.5-pro-search | gateway key |
Fallback if the gateway is unavailable: WebFetch against the engine's public web UI (slower, lower fidelity), or have the user paste responses into a structured form.
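A sketch of the run loop, assuming a hypothetical ask_engine(model, query) helper that wraps whichever gateway call is available (for example the mcp__skillboss__chat tool); the model IDs mirror the table above and may need updating:

```python
from datetime import datetime, timezone

# Model IDs mirror the table above; "claude" assumes web tools are enabled on the call.
ENGINES = {
    "chatgpt":    "openai/gpt-4o-search-preview",
    "claude":     "anthropic/claude-3-7-sonnet",
    "perplexity": "perplexity/sonar-pro",
    "gemini":     "google/gemini-2.5-pro-search",
}

def run_all(queries: list[dict], ask_engine) -> list[dict]:
    # ask_engine(model, query) is a hypothetical wrapper around the gateway call.
    results = []
    for q in queries:
        for engine, model in ENGINES.items():
            response_text = ask_engine(model=model, query=q["query"])
            results.append({
                "engine": engine,
                "query": q["query"],
                "bucket": q["bucket"],
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "response_full": response_text,
            })
    return results
```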
Per-result extraction
For each (engine, query) response, parse and record:
{
"engine": "chatgpt|claude|perplexity|gemini",
"query": "<verbatim query>",
"timestamp": "<ISO 8601>",
"brand_mentioned": true|false,
"brand_rank": 1|2|3|null, // null if not mentioned; 1 = first recommendation
"brand_context": "<30-100 word snippet showing how it was mentioned, verbatim>",
"competitors_mentioned": [
{ "name": "BrandX", "rank": 1 },
{ "name": "BrandY", "rank": 2 }
],
"citations": [ // if engine returns source URLs
{ "url": "...", "title": "...", "is_brand_owned": true|false }
],
"response_full": "<full response text, for audit>"
}
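One way to fill these fields, sketched in Python: naive case-insensitive matching with first-mention order as a rank proxy (the "1 = first recommendation" rule really needs smarter parsing of the response structure). The extract helper and its heuristics are illustrative assumptions, not part of the skill contract:

```python
import re

def extract(record: dict, brand: str, competitors: list[str]) -> dict:
    text = record["response_full"]

    def first_pos(name):
        m = re.search(re.escape(name), text, re.IGNORECASE)
        return m.start() if m else None

    # Rank proxy: order of first mention among all tracked brands that appear.
    positions = {name: first_pos(name) for name in [brand, *competitors]}
    mentioned = sorted((pos, name) for name, pos in positions.items() if pos is not None)
    ranks = {name: i + 1 for i, (_, name) in enumerate(mentioned)}

    brand_pos = positions[brand]
    snippet = text[max(0, brand_pos - 60): brand_pos + 140] if brand_pos is not None else ""

    record.update({
        "brand_mentioned": brand_pos is not None,
        "brand_rank": ranks.get(brand),
        "brand_context": snippet[:200],  # 200-char cap per the output contract below
        "competitors_mentioned": [
            {"name": name, "rank": ranks[name]} for name in competitors if name in ranks
        ],
    })
    return record
```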
Aggregation: share-of-voice
After all (engine, query) runs, compute:
{
"date": "YYYY-MM-DD",
"brand": "Acme Corp",
"total_queries": 16,
"total_runs": 64, // 16 queries × 4 engines
"brand_mention_rate": 0.42, // 27/64
"brand_avg_rank_when_mentioned": 2.3,
"by_engine": {
"chatgpt": { "mention_rate": 0.56, "avg_rank": 1.8 },
"claude": { "mention_rate": 0.50, "avg_rank": 2.1 },
"perplexity": { "mention_rate": 0.31, "avg_rank": 2.9 },
"gemini": { "mention_rate": 0.31, "avg_rank": 2.5 }
},
"by_bucket": {
"direct": { "mention_rate": 1.00, "avg_rank": 1.0 },
"category": { "mention_rate": 0.40, "avg_rank": 2.4 },
"comparison": { "mention_rate": 0.45, "avg_rank": 2.2 },
"use_case": { "mention_rate": 0.20, "avg_rank": 3.1 }
},
"share_of_voice_vs_competitors": [
{ "name": "Acme Corp", "mention_rate": 0.42 },
{ "name": "BrandX", "mention_rate": 0.78 },
{ "name": "BrandY", "mention_rate": 0.55 },
{ "name": "BrandZ", "mention_rate": 0.31 }
],
"top_citing_sources": [
{ "url": "wikipedia.org/...", "count": 8 },
{ "url": "reddit.com/r/...", "count": 5 },
{ "url": "youtube.com/...", "count": 4 }
]
}
Save to {OUTPUT_DIR}/citations-{YYYY-MM-DD}.json AND append a 1-row summary to {OUTPUT_DIR}/citation-history.json.
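A sketch of the aggregation and save step, assuming the per-result records produced by the extraction pass above; some fields (e.g. top_citing_sources, the per-bucket breakdown by name) are trimmed for brevity, and the file layout follows the {OUTPUT_DIR} convention:

```python
import json
from collections import Counter, defaultdict
from datetime import date
from pathlib import Path
from statistics import mean

def aggregate_and_save(results: list[dict], brand: str, competitors: list[str], output_dir: str) -> dict:
    def summarize(rows):
        ranks = [r["brand_rank"] for r in rows if r.get("brand_rank")]
        return {
            "mention_rate": round(sum(r["brand_mentioned"] for r in rows) / len(rows), 2),
            "avg_rank": round(mean(ranks), 1) if ranks else None,
        }

    by_engine, by_bucket = defaultdict(list), defaultdict(list)
    for r in results:
        by_engine[r["engine"]].append(r)
        by_bucket[r["bucket"]].append(r)

    comp_counts = Counter(c["name"] for r in results for c in r.get("competitors_mentioned", []))

    report = {
        "date": date.today().isoformat(),
        "brand": brand,
        "total_runs": len(results),
        "brand_mention_rate": summarize(results)["mention_rate"],
        "by_engine": {e: summarize(rows) for e, rows in by_engine.items()},
        "by_bucket": {b: summarize(rows) for b, rows in by_bucket.items()},
        "share_of_voice_vs_competitors": [
            {"name": brand, "mention_rate": summarize(results)["mention_rate"]},
            *[{"name": c, "mention_rate": round(comp_counts[c] / len(results), 2)}
              for c in competitors],
        ],
    }

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"citations-{report['date']}.json").write_text(json.dumps(report, indent=2))

    # Append a one-row summary to the running history file.
    history_path = out / "citation-history.json"
    history = json.loads(history_path.read_text()) if history_path.exists() else []
    history.append({"date": report["date"], "brand_mention_rate": report["brand_mention_rate"]})
    history_path.write_text(json.dumps(history, indent=2))
    return report
```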
Output report (markdown, for humans)
# AEO Citation Report — {YYYY-MM-DD}
## TL;DR
- {Brand} mention rate: {X}% (was {Y}% last week, Δ {±Z}%)
- Strongest engine: {engine} ({rate}%)
- Weakest engine: {engine} ({rate}%)
## vs Competitors (share of voice)
| Brand | Mention rate | Avg rank when cited |
| --- | --- | --- |
## What changed since last run
{Top 3 deltas: queries that gained/lost the brand}
## Action items
- {What to fix — e.g. "Use case bucket weakest. Need 2 how-to articles answering '{query}' and '{query}'."}
- {What to publicize — e.g. "Brand cited #1 in Perplexity for '{query}'. Reuse that snippet as social copy."}
- {Off-page move — e.g. "BrandX cited 5x via Reddit. We have 0 Reddit threads. Casey/Jordan: post answer in r/{X}."}
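A small sketch of the week-over-week numbers behind the TL;DR, assuming the current and previous citations-*.json aggregates are both loadable; field names follow the aggregation JSON above:

```python
def week_over_week(current: dict, previous: dict) -> dict:
    # Δ in overall mention rate plus strongest/weakest engine for the TL;DR lines.
    delta = round(current["brand_mention_rate"] - previous["brand_mention_rate"], 2)
    ranked = sorted(current["by_engine"].items(), key=lambda kv: kv[1]["mention_rate"])
    return {
        "mention_rate_delta": delta,
        "weakest_engine": ranked[0][0],
        "strongest_engine": ranked[-1][0],
    }
```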
Citation source intelligence (key insight)
Don't just track WHO is cited — track WHAT URLs the engines pull from. The hidden lever in AEO is being cited BY the URLs that AI engines already trust. If the report shows "Wikipedia + Reddit + YouTube + 3 industry blogs" as top citation sources for the category, the off-page strategy writes itself: get mentioned on those exact URLs.
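A sketch of that roll-up, assuming the per-result citations field shown earlier; domains rather than full URLs are usually the actionable unit for the off-page target list:

```python
from collections import Counter
from urllib.parse import urlparse

def top_citing_domains(results: list[dict], n: int = 10) -> list[dict]:
    # Count every cited URL by domain across all (engine, query) runs.
    counts = Counter(
        urlparse(c["url"]).netloc.removeprefix("www.")
        for r in results
        for c in r.get("citations", [])
    )
    return [{"domain": d, "count": k} for d, k in counts.most_common(n)]
```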
OUTPUT CONTRACT (NO INTERNALS)
The output JSON and markdown are operational, not public, but treat them as if they could leak:
- Don't paste internal API keys or auth headers into response_full
- Don't include the verbatim system prompt sent to the engine, only the user query
- Cap snippet length at 200 chars per brand_context (avoids quoting paywalled content)
Common gotchas
- Engines are stochastic: running the same query twice gets different responses. Run each query 1× per session, but compare week-over-week trends, not point-in-time numbers.
- "Web search" mode matters: Claude/ChatGPT without web tools answer from training data (stale). Always enable web/search modes.
- Engine refusal ("I can't recommend specific brands"): count it as brand_mentioned: false, but flag it in the report; a high refusal rate on competitive queries means the prompt is too commercial.
- Brand name collisions: if BRAND = "Acme" and there is a famous "Acme Inc" in another industry, false positives explode. Either qualify ("Acme Corp" everywhere) or use a disambiguating context match: brand_context must contain a {CATEGORY} keyword (see the sketch after this list).
- First-run baseline: week 1 is data, not signal. Wait until week 4 before drawing conclusions about trends.