seo-experiment-log

Last updated: 2026-05-17

Log a SEO/AEO experiment BEFORE it ships — hypothesis, change, expected outcome, and measurement date. TRIGGER this before every meaningful SEO change (title re

Quick Install
npx skills add seo-experiment-log

seo-experiment-log — Log Before You Ship

One rule: every meaningful SEO/AEO change creates an experiment entry BEFORE deployment.

An "experiment" is not some heavy A/B-test framework. It's a YAML file with four fields: hypothesis, change, expected outcome, measurement date. That's it. But the discipline of writing those four fields is what separates an agent that learns from an agent that fiddles.

Inputs

Either invoked programmatically by seo-optimize / seo-create / seo-rhythm-daily, or directly by me when I decide to ship a change.

Required:

  • page — URL path being changed
  • change_type — one of: title | desc | content | schema | internal_links | new_page | canonical | redirect | other
  • hypothesis — 1–3 sentences, the "why I think this will work"
  • expected.metric — what I'll measure (ctr_14d | clicks_30d | position_30d | citation_rate_60d | traffic_90d | conversions_90d)
  • expected.baseline — current value of that metric
  • expected.target — what I expect it to become
  • measure_at — ISO date when retro will measure (defaults per change_type — see below)

Optional:

  • playbook_consulted — filename of playbook that informed this decision, or "none" with reason
  • notes — context the retro-reader will need
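The required fields above can be checked before anything is written. The following is a minimal sketch, assuming the inputs arrive as a nested dict; the function name `validate_inputs` and the dict shape are illustrative assumptions, not part of the skill itself. `measure_at` is not checked here because it falls back to a default per change type.

```python
# Allowed values copied from the Inputs list; everything else is an assumption.
CHANGE_TYPES = {
    "title", "desc", "content", "schema", "internal_links",
    "new_page", "canonical", "redirect", "other",
}
METRICS = {
    "ctr_14d", "clicks_30d", "position_30d",
    "citation_rate_60d", "traffic_90d", "conversions_90d",
}


def validate_inputs(inputs: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    # Top-level required fields must be present and non-empty.
    for key in ("page", "change_type", "hypothesis"):
        if not inputs.get(key):
            problems.append(f"missing required field: {key}")

    # expected.* fields: compare against None so a baseline of 0 is accepted.
    expected = inputs.get("expected") or {}
    for key in ("metric", "baseline", "target"):
        if expected.get(key) is None:
            problems.append(f"missing required field: expected.{key}")

    change_type = inputs.get("change_type")
    if change_type and change_type not in CHANGE_TYPES:
        problems.append(f"unknown change_type: {change_type}")

    metric = expected.get("metric")
    if metric and metric not in METRICS:
        problems.append(f"unknown expected.metric: {metric}")

    return problems
```

A caller would refuse to log (and therefore refuse to ship) whenever the returned list is non-empty.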

Default measure_at by change type

These defaults are based on typical signal-detection windows — use them unless you have a reason to deviate:

change_type               | default window | rationale
--------------------------|----------------|--------------------------------------
title                     | +14 days       | CTR signal stabilizes in 1–2 weeks
desc                      | +14 days       | same as title
content                   | +60 days       | Google needs re-crawl + re-evaluation
schema                    | +30 days       | rich result rollouts
internal_links            | +45 days       | PageRank redistribution
new_page                  | +90 days       | indexation + ranking maturation
canonical / redirect      | +30 days       | de-indexing + re-indexing cycle
aeo (llms.txt / markdown) | +60 days       | AI training/retrieval cycle
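The default windows can be expressed as a lookup plus a date offset. This is a sketch under two assumptions: the window is counted from a caller-supplied start date (the document does not say whether that is the creation date or the deploy date), and a change type with no listed default, such as `other`, must set `measure_at` explicitly. The names `DEFAULT_WINDOW_DAYS` and `default_measure_at` are hypothetical.

```python
from datetime import date, timedelta

# Windows in days, taken from the defaults listed above.
# "aeo" appears in the defaults but not in the change_type enum; kept as listed.
DEFAULT_WINDOW_DAYS = {
    "title": 14,
    "desc": 14,
    "content": 60,
    "schema": 30,
    "internal_links": 45,
    "new_page": 90,
    "canonical": 30,
    "redirect": 30,
    "aeo": 60,
}


def default_measure_at(change_type: str, start: date) -> date:
    """Return start + the default window for this change type."""
    try:
        days = DEFAULT_WINDOW_DAYS[change_type]
    except KeyError:
        # Assumption: no silent fallback; the caller must pick a date.
        raise ValueError(
            f"no default window for change_type {change_type!r}; "
            "set measure_at explicitly"
        ) from None
    return start + timedelta(days=days)
```

For example, a title change logged on 2026-04-16 would get a `measure_at` of 2026-04-30.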

Output

Write to {workspace}/reports/seo/memory/experiments/exp-{YYYY-MM-DD}-{short_id}.yaml.

short_id = 3 random lowercase alphanumeric chars (e.g. a7k) for uniqueness within a day.
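Building the ID and the output path might look like the sketch below. It assumes the short ID is drawn from lowercase letters and digits, since the example IDs in this document (a7k, t4m, b9q) contain digits; the function names and the `taken` parameter for collision checks are hypothetical.

```python
import secrets
import string
from datetime import date
from pathlib import PurePosixPath

# Assumption: lowercase letters plus digits, matching IDs such as "a7k".
ALPHABET = string.ascii_lowercase + string.digits


def new_experiment_id(day: date, taken: frozenset = frozenset()) -> str:
    """Generate exp-{YYYY-MM-DD}-{short_id}, retrying on a same-day collision."""
    while True:
        short_id = "".join(secrets.choice(ALPHABET) for _ in range(3))
        candidate = f"exp-{day.isoformat()}-{short_id}"
        if candidate not in taken:
            return candidate


def experiment_path(workspace: str, experiment_id: str) -> PurePosixPath:
    """Return {workspace}/reports/seo/memory/experiments/{id}.yaml."""
    return (
        PurePosixPath(workspace)
        / "reports" / "seo" / "memory" / "experiments"
        / f"{experiment_id}.yaml"
    )
```

Passing the set of IDs already present in the experiments directory as `taken` keeps the three-character suffix unique within a day.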

Template:

id: exp-{YYYY-MM-DD}-{short_id}
created: {ISO8601_timestamp}
status: open

hypothesis: >
  {1-3 sentence "why"}

change:
  type: {change_type}
  page: {path}
  before: {current_state}          # title string, before-text, schema type, etc.
  after: {new_state}
  deployed_at: {ISO8601 | null}    # null if still pre-deploy
  deployed_sha: {git_sha | null}

expected:
  metric: {metric_name}
  baseline: {current_value}
  target: {expected_value}
  direction: {increase | decrease}

measure_at: {YYYY-MM-DD}
source: {gsc | ga4 | lighthouse | aeo_probe | manual}

playbook_consulted: {filename | "none: {reason}"}

notes: {free_text | null}

result:
  measured_at: null
  actual:
    metric: {same as expected.metric}
    value: null
  verdict: null                    # filled by seo-retro: win | loss | inconclusive
  delta_vs_expected: null
  notes: null

Hypothesis quality bar

A weak hypothesis breaks the entire learning loop. Reject and rewrite if the hypothesis:

  • is just a restatement of the change ("I will add a number to the title because I want to add a number to the title") — this is useless
  • uses vague verbs ("should improve", "might help") — say what metric, by how much, why
  • has no mechanism ("Google likes this") — cite the specific mechanism (CTR pattern, relevance signal, citation likelihood)
  • can't fail ("this might work or might not") — a hypothesis that can't fail can't teach

Good hypothesis example:

Adding "8 Tools" to the /alternatives/cursor title will raise CTR by ≥20% within 14 days. The page is currently at position 3.2 with CTR 2.1% (benchmark for P3 is ~10%, so the page's CTR is roughly 80% below benchmark). Numbers in titles signal specificity, and three recent experiments (exp-2026-03-29-t4m, exp-2026-03-12-b9q, exp-2026-02-18-p2n) saw +18–28% CTR on similar pages.

This is good because it has a concrete metric, a specific target, the current value, a benchmark, a named mechanism, and prior evidence. Retro can cleanly measure win/loss.
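The numbers in that example can be turned into `expected.baseline` and `expected.target` values with two small calculations. This sketch assumes "≥20%" means a relative lift over the current CTR (not 20 percentage points); the helper names are hypothetical.

```python
def relative_gap(current: float, benchmark: float) -> float:
    """Fraction by which `current` falls short of `benchmark`."""
    return (benchmark - current) / benchmark


def target_after_lift(baseline: float, lift: float) -> float:
    """Value reached when `baseline` grows by the relative `lift`."""
    return baseline * (1 + lift)


# Figures from the example hypothesis: CTR 2.1%, P3 benchmark ~10%.
gap = relative_gap(2.1, 10.0)          # shortfall against the benchmark
target = target_after_lift(2.1, 0.20)  # CTR needed for a 20% relative lift
```

With these inputs the gap to the benchmark is about 79%, and the experiment would record a baseline of 2.1 and a target of about 2.52 (both in percent CTR).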

Anti-patterns (refuse these)

  1. Bundling multiple changes in one experiment. If I change title AND desc AND schema at once, I can't attribute the outcome. Split into separate experiments OR explicitly accept that the experiment measures the bundle (and name it "bundle-test-2026-04-16").
  2. Measuring "whenever I remember". measure_at is a commitment. Retro measures on that day.
  3. Cancelling after seeing early data. Don't peek at day 3 and decide to kill the experiment. Wait for measure_at.
  4. Rewriting the hypothesis after the fact. If the hypothesis needs to change, log a new experiment with a new ID that cites the old one.
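Anti-patterns 2 and 3 both come down to treating `measure_at` as a hard gate. A minimal guard, assuming the retro step has the parsed `measure_at` date available (the function names are hypothetical), could look like this:

```python
from datetime import date


def may_measure(measure_at: date, today: date) -> bool:
    """True only once the committed measurement date has arrived."""
    return today >= measure_at


def days_until_due(measure_at: date, today: date) -> int:
    """Days left before measurement is allowed; 0 when already due."""
    return max((measure_at - today).days, 0)
```

A retro run that calls `may_measure` first has no way to record a verdict from day-3 data.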

Integration

Called by:

  • seo-optimize before any CTR/desc/content/schema change
  • seo-create after a new page is deployed (with 90d window)
  • seo-rhythm-daily during the 10:07 CTR sprint
  • Directly by the agent for ad-hoc changes

Reads:

  • {workspace}/reports/seo/memory/playbooks/ — to find the relevant playbook
  • {workspace}/reports/seo/memory/lessons/ — to verify this isn't re-running a known bad pattern

Writes:

  • {workspace}/reports/seo/memory/experiments/exp-*.yaml

Handoff

After writing the experiment file, return to caller:

✅ exp-2026-04-16-a7k logged
   Hypothesis: +20% CTR via title numbers
   Measure on: 2026-04-30 (14d window)
   Playbook: add-numbers-to-title.md

The change can now ship. Retro will close the loop automatically.
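The handoff message shown above can be produced by a small formatter. This is a sketch only: the function name, its parameters, and the choice to derive the window length from the creation date are assumptions.

```python
from datetime import date


def handoff_message(
    experiment_id: str,
    summary: str,
    created: date,
    measure_at: date,
    playbook: str,
) -> str:
    """Format the confirmation returned to the caller after logging."""
    # Assumption: the window shown is measure_at minus the creation date.
    window_days = (measure_at - created).days
    return "\n".join([
        f"✅ {experiment_id} logged",
        f"   Hypothesis: {summary}",
        f"   Measure on: {measure_at.isoformat()} ({window_days}d window)",
        f"   Playbook: {playbook}",
    ])
```

Called with the values from the example (`exp-2026-04-16-a7k`, created 2026-04-16, measured 2026-04-30), it reproduces the four lines of the handoff message.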