AI Skill

seo-experiment-log

Last updated: 2026-05-17

Log an SEO/AEO experiment BEFORE it ships — hypothesis, change, expected outcome, and measurement date. TRIGGER this before every meaningful SEO change (title re

Quick Install

npx skills add seo-experiment-log

seo-experiment-log — Log Before You Ship

One rule: every meaningful SEO/AEO change creates an experiment entry BEFORE deployment.

An "experiment" is not a heavyweight A/B-testing framework. It is a YAML file built around four core fields: hypothesis, change, expected outcome, and measurement date. The discipline of writing those four fields down before shipping is what separates an agent that learns from an agent that fiddles.

Inputs

Either invoked programmatically by seo-optimize / seo-create / seo-rhythm-daily, or directly by me when I decide to ship a change.

Required:

page — URL path being changed
change_type — one of: title | desc | content | schema | internal_links | new_page | canonical | redirect | other
hypothesis — 1–3 sentences, the "why I think this will work"
expected.metric — what I'll measure (ctr_14d | clicks_30d | position_30d | citation_rate_60d | traffic_90d | conversions_90d)
expected.baseline — current value of that metric
expected.target — what I expect it to become
measure_at — ISO date when retro will measure (defaults per change_type — see below)

Optional:

playbook_consulted — filename of playbook that informed this decision, or "none" with reason
notes — context the retro-reader will need
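A minimal sketch of checking the required inputs before anything is written, assuming Python 3.9+. The allowed values are copied from the Required list; `ExperimentInput` and `validate` are hypothetical names, and two checks are assumptions beyond the list itself: that `page` starts with "/", and that a target equal to the baseline is rejected because such an experiment cannot fail.

```python
from dataclasses import dataclass
from datetime import date

# Allowed values copied from the Required inputs list.
CHANGE_TYPES = {"title", "desc", "content", "schema", "internal_links",
                "new_page", "canonical", "redirect", "other"}
METRICS = {"ctr_14d", "clicks_30d", "position_30d", "citation_rate_60d",
           "traffic_90d", "conversions_90d"}


@dataclass
class ExperimentInput:  # hypothetical container for the required fields
    page: str
    change_type: str
    hypothesis: str
    metric: str        # expected.metric
    baseline: float    # expected.baseline
    target: float      # expected.target
    measure_at: date


def validate(inp: ExperimentInput) -> list[str]:
    """Return a list of problems; an empty list means the input can be logged."""
    problems = []
    if not inp.page.startswith("/"):  # assumption: pages are site-relative paths
        problems.append("page must be a URL path starting with '/'")
    if inp.change_type not in CHANGE_TYPES:
        problems.append(f"unknown change_type: {inp.change_type!r}")
    if inp.metric not in METRICS:
        problems.append(f"unknown metric: {inp.metric!r}")
    if not inp.hypothesis.strip():
        problems.append("hypothesis is empty")
    if inp.baseline == inp.target:  # assumption: a no-change target cannot fail
        problems.append("target equals baseline; the experiment cannot fail")
    return problems
```

Returning a list of problems rather than raising on the first one lets the caller show every issue at once and fix them in a single pass.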

Default `measure_at` by change type

These defaults reflect how long each kind of change typically takes to produce a measurable signal. Use them unless you have a specific reason to deviate:

change_type	default window	rationale
title	+14 days	CTR signal stabilizes in 1–2 weeks
desc	+14 days	same as title
content	+60 days	Google needs re-crawl + re-evaluation
schema	+30 days	rich result rollouts
internal_links	+45 days	PageRank redistribution
new_page	+90 days	indexation + ranking maturation
canonical / redirect	+30 days	de-indexing + re-indexing cycle
aeo (llms.txt / markdown)	+60 days	AI training/retrieval cycle
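The table can be applied as a lookup plus a date offset. This is a sketch under stated assumptions: the window is counted from the day the experiment is logged (which matches the handoff example, logged 2026-04-16 and measured 2026-04-30), `"aeo"` is a hypothetical key for the llms.txt / markdown row because the change_type list has no such value, and `other` deliberately has no default.

```python
from datetime import date, timedelta

# Windows in days, from the table. "aeo" is a hypothetical key for the
# llms.txt / markdown row; "other" is absent, so callers must set measure_at.
DEFAULT_WINDOW_DAYS = {
    "title": 14, "desc": 14,
    "content": 60, "schema": 30, "internal_links": 45, "new_page": 90,
    "canonical": 30, "redirect": 30, "aeo": 60,
}


def default_measure_at(change_type: str, start: date) -> date:
    """Return start plus the default window for change_type."""
    try:
        days = DEFAULT_WINDOW_DAYS[change_type]
    except KeyError:
        raise ValueError(
            f"no default window for {change_type!r}; set measure_at explicitly"
        ) from None
    return start + timedelta(days=days)


print(default_measure_at("title", date(2026, 4, 16)))  # 2026-04-30
```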

Output

Write to {workspace}/reports/seo/memory/experiments/exp-{YYYY-MM-DD}-{short_id}.yaml.

short_id = 3 random lowercase alphanumeric characters (e.g. a7k), for uniqueness within a day.
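A minimal sketch of picking the ID and output path. The alphabet includes digits because the example IDs in this document (a7k, t4m, b9q) do; `new_experiment_path` is a hypothetical helper, and retrying on an existing file is one simple way to honor "unique within a day".

```python
import secrets
import string
from datetime import date
from pathlib import Path

# Lowercase letters plus digits, matching example IDs such as a7k and t4m.
ALPHABET = string.ascii_lowercase + string.digits


def new_experiment_path(workspace: Path, day: date) -> tuple[str, Path]:
    """Pick an exp-{YYYY-MM-DD}-{short_id} ID that is not yet used on that day."""
    exp_dir = workspace / "reports" / "seo" / "memory" / "experiments"
    while True:
        short_id = "".join(secrets.choice(ALPHABET) for _ in range(3))
        exp_id = f"exp-{day.isoformat()}-{short_id}"
        path = exp_dir / f"{exp_id}.yaml"
        if not path.exists():  # retry on the rare same-day collision
            return exp_id, path
```

With 36 characters there are 46,656 possible short IDs per day, so the retry loop almost never runs more than once.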

Template:

id: exp-{YYYY-MM-DD}-{short_id}
created: {ISO8601_timestamp}
status: open

hypothesis: >
  {1-3 sentence "why"}

change:
  type: {change_type}
  page: {path}
  before: {current_state}    # title string, before-text, schema type, etc.
  after: {new_state}
  deployed_at: {ISO8601 | null}   # null if still pre-deploy
  deployed_sha: {git_sha | null}

expected:
  metric: {metric_name}
  baseline: {current_value}
  target: {expected_value}
  direction: {increase | decrease}

measure_at: {YYYY-MM-DD}
source: {gsc | ga4 | lighthouse | aeo_probe | manual}

playbook_consulted: {filename | "none: {reason}"}

notes: {free_text | null}

result:
  measured_at: null
  actual:
    metric: {same as expected.metric}
    value: null
  verdict: null           # filled by seo-retro: win | loss | inconclusive
  delta_vs_expected: null
  notes: null
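A sketch of filling this template from a dict using only the standard library (no PyYAML). Assumptions, labeled: `q` and `render_experiment` are hypothetical names, strings are emitted as JSON string literals (which are also valid YAML double-quoted scalars), and `direction` is inferred from target versus baseline so that `position_30d`, where lower is better, comes out as `decrease`.

```python
import json


def q(value) -> str:
    """Render a scalar: None -> null, str -> double-quoted, numbers as-is."""
    if value is None:
        return "null"
    if isinstance(value, str):
        # A JSON string literal is also a valid YAML double-quoted scalar.
        return json.dumps(value, ensure_ascii=False)
    return str(value)


def render_experiment(e: dict) -> str:
    """Fill the experiment template from a dict; the result block starts empty."""
    c, x = e["change"], e["expected"]
    # Assumption: direction follows from target vs baseline.
    direction = "increase" if x["target"] > x["baseline"] else "decrease"
    hyp = [f"  {line}" for line in e["hypothesis"].splitlines()]  # folded block body
    lines = [
        f"id: {e['id']}",
        f"created: {e['created']}",
        "status: open",
        "",
        "hypothesis: >",
        *hyp,
        "",
        "change:",
        f"  type: {c['type']}",
        f"  page: {q(c['page'])}",
        f"  before: {q(c['before'])}",
        f"  after: {q(c['after'])}",
        f"  deployed_at: {q(c.get('deployed_at'))}",
        f"  deployed_sha: {q(c.get('deployed_sha'))}",
        "",
        "expected:",
        f"  metric: {x['metric']}",
        f"  baseline: {q(x['baseline'])}",
        f"  target: {q(x['target'])}",
        f"  direction: {direction}",
        "",
        f"measure_at: {e['measure_at']}",
        f"source: {e['source']}",
        "",
        f"playbook_consulted: {q(e['playbook_consulted'])}",
        "",
        f"notes: {q(e.get('notes'))}",
        "",
        "result:",
        "  measured_at: null",
        "  actual:",
        f"    metric: {x['metric']}",
        "    value: null",
        "  verdict: null",
        "  delta_vs_expected: null",
        "  notes: null",
    ]
    return "\n".join(lines) + "\n"
```

Quoting every free-text value matters most for `playbook_consulted`, because an unquoted `none: {reason}` would otherwise be parsed as a nested mapping.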

Hypothesis quality bar

A weak hypothesis breaks the entire learning loop. Reject and rewrite if the hypothesis:

is just a restatement of the change ("I will add a number to the title because I want to add a number to the title") — this is useless
uses vague verbs ("should improve", "might help") — say what metric, by how much, why
has no mechanism ("Google likes this") — cite the specific mechanism (CTR pattern, relevance signal, citation likelihood)
can't fail ("this might work or might not") — a hypothesis that can't fail can't teach
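Some of these checks can be approximated mechanically before a human or model reviews the hypothesis. This is a heuristic sketch only: the phrase and mechanism word lists are hypothetical and should be tuned, and detecting a restatement of the change is not attempted because it needs real judgment.

```python
import re

# Hypothetical word lists; tune them to the hypotheses you actually write.
VAGUE_PHRASES = ("should improve", "might help", "might work", "may help", "could help")
MECHANISM_WORDS = ("ctr", "relevance", "citation", "specificity", "snippet", "intent", "pagerank")


def lint_hypothesis(text: str) -> list[str]:
    """Return warnings for weak-hypothesis patterns (a heuristic, not a judge)."""
    t = text.lower()
    warnings = []
    if any(p in t for p in VAGUE_PHRASES):
        warnings.append("vague verb: say which metric moves, by how much, and why")
    if not re.search(r"\d", t):
        warnings.append("no number: state a baseline and a target")
    if not any(w in t for w in MECHANISM_WORDS):
        warnings.append("no named mechanism")
    if re.search(r"\bor (?:might |may )?not\b", t):
        warnings.append("cannot fail: commit to an outcome that could be wrong")
    return warnings
```

An empty result does not mean the hypothesis is good; it only means none of these surface patterns fired.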

Good hypothesis example:

Adding "8 Tools" to the /alternatives/cursor title will raise CTR by ≥20% (to ≥2.5%) within 14 days. The page is currently at position 3.2 with CTR 2.1% (the benchmark for P3 is ~10%, so the page is underperforming on CTR by ~80%). Numbers in titles signal specificity, and three recent experiments (exp-2026-03-29-t4m, exp-2026-03-12-b9q, exp-2026-02-18-p2n) saw +18–28% CTR on similar pages.

This is good because it names a concrete metric, a specific target, the current value, a benchmark, and a mechanism, and it cites prior evidence. Retro can cleanly measure win or loss.
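The numbers in the example can be checked directly. This sketch reads "≥20%" as a relative lift (an absolute +20 points from 2.1% would be implausible); `relative_gap` and `lifted` are illustrative helper names.

```python
def relative_gap(actual: float, benchmark: float) -> float:
    """Fraction by which actual falls short of benchmark."""
    return 1 - actual / benchmark


def lifted(value: float, lift: float) -> float:
    """Apply a relative lift, e.g. 0.20 for +20%."""
    return value * (1 + lift)


ctr, benchmark = 2.1, 10.0  # percent, from the example hypothesis
print(f"gap vs benchmark: {relative_gap(ctr, benchmark):.0%}")  # gap vs benchmark: 79%
print(f"CTR needed for a win: {lifted(ctr, 0.20):.2f}%")        # CTR needed for a win: 2.52%
```

Writing the threshold down as an absolute value (2.52%) is what lets retro call win or loss without reinterpreting the hypothesis later.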

Anti-patterns (refuse these)

Bundling multiple changes in one experiment. If I change title AND desc AND schema at once, I can't attribute the outcome. Split into separate experiments OR explicitly accept that the experiment measures the bundle (and name it "bundle-test-2026-04-16").
Measuring "whenever I remember". measure_at is a commitment. Retro measures on that day.
Cancelling after seeing early data. Don't peek at day 3 and decide to kill the experiment. Wait for measure_at.
Rewriting the hypothesis after the fact. If the hypothesis needs to change, log a new experiment with a new ID that cites the old one.
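Two of these rules can be enforced in code rather than by willpower. A minimal sketch with hypothetical helper names: `assert_measurable` refuses to record a result before `measure_at`, and `supersede` logs a revised hypothesis as a new entry that cites the old ID in `notes`, leaving the original untouched.

```python
import copy
from datetime import date


def assert_measurable(entry: dict, today: date) -> None:
    """Refuse to record a result before the committed measure_at date (no peeking)."""
    measure_at = date.fromisoformat(str(entry["measure_at"]))
    if today < measure_at:
        raise RuntimeError(f"{entry['id']}: too early, measure on {measure_at.isoformat()}")


def supersede(old: dict, new_id: str, new_hypothesis: str, new_measure_at: date) -> dict:
    """Log a revised hypothesis as a NEW experiment that cites the old one.

    The old entry is deep-copied, so its hypothesis is never edited in place."""
    new = copy.deepcopy(old)
    new.update(id=new_id, hypothesis=new_hypothesis, status="open",
               measure_at=new_measure_at.isoformat(),
               notes=f"supersedes {old['id']}")
    new.pop("result", None)  # the new experiment starts without a result
    return new
```

`str(entry["measure_at"])` accepts either a string or a `date`, since a YAML loader may parse an unquoted ISO date into a date object.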

Integration

Called by:

seo-optimize before any CTR/desc/content/schema change
seo-create after a new page is deployed (with 90d window)
seo-rhythm-daily during the 10:07 CTR sprint
Directly by the agent for ad-hoc changes

Reads:

{workspace}/reports/seo/memory/playbooks/ — to find the relevant playbook
{workspace}/reports/seo/memory/lessons/ — to verify this isn't re-running a known bad pattern
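A sketch of the lookup step, assuming playbooks and lessons are markdown files named by topic (as `add-numbers-to-title.md` suggests). The file format of lessons is not specified here, so this only matches filenames; `find_memory_files` is a hypothetical helper.

```python
from pathlib import Path


def find_memory_files(workspace: Path, keyword: str) -> dict[str, list[Path]]:
    """List playbook and lesson files whose names mention keyword (e.g. 'title')."""
    memory = workspace / "reports" / "seo" / "memory"
    found = {}
    for kind in ("playbooks", "lessons"):
        folder = memory / kind
        files = sorted(folder.glob("*.md")) if folder.is_dir() else []
        found[kind] = [p for p in files if keyword.lower() in p.name.lower()]
    return found
```

A hit under `lessons` is a prompt to read that file before logging, not an automatic rejection.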

Writes:

{workspace}/reports/seo/memory/experiments/exp-*.yaml

Handoff

After writing the experiment file, return to caller:

✅ exp-2026-04-16-a7k logged
   Hypothesis: +20% CTR via title numbers
   Measure on: 2026-04-30 (14d window)
   Playbook: add-numbers-to-title.md

The change can now ship. Retro will close the loop automatically.
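The handoff message can be produced from the entry itself, so the window shown always matches `measure_at`. A minimal sketch; `handoff_message` is a hypothetical name and the one-line hypothesis summary is passed in by the caller.

```python
from datetime import date


def handoff_message(exp_id: str, summary: str, created: date,
                    measure_at: date, playbook: str) -> str:
    """Format the confirmation returned to the caller after the file is written."""
    window = (measure_at - created).days
    return (
        f"✅ {exp_id} logged\n"
        f"   Hypothesis: {summary}\n"
        f"   Measure on: {measure_at.isoformat()} ({window}d window)\n"
        f"   Playbook: {playbook}"
    )


print(handoff_message("exp-2026-04-16-a7k", "+20% CTR via title numbers",
                      date(2026, 4, 16), date(2026, 4, 30), "add-numbers-to-title.md"))
```

Run with the example values, this prints the same four lines shown in the handoff example.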