seo-experiment-log
Log an SEO/AEO experiment BEFORE it ships: hypothesis, change, expected outcome, and measurement date. TRIGGER this before every meaningful SEO change (title re
npx skills add seo-experiment-log
seo-experiment-log — Log Before You Ship
One rule: every meaningful SEO/AEO change creates an experiment entry BEFORE deployment.
An "experiment" is not some heavy A/B-test framework. It's a YAML file with four fields: hypothesis, change, expected outcome, measurement date. That's it. But the discipline of writing those four fields is what separates an agent that learns from an agent that fiddles.
Inputs
Either invoked programmatically by seo-optimize / seo-create / seo-rhythm-daily, or directly by me when I decide to ship a change.
Required:
- page: URL path being changed
- change_type: one of title | desc | content | schema | internal_links | new_page | canonical | redirect | other
- hypothesis: 1–3 sentences, the "why I think this will work"
- expected.metric: what I'll measure (ctr_14d | clicks_30d | position_30d | citation_rate_60d | traffic_90d | conversions_90d)
- expected.baseline: current value of that metric
- expected.target: what I expect it to become
- measure_at: ISO date when retro will measure (defaults per change_type, see below)
- playbook_consulted: filename of playbook that informed this decision, or "none" with reason
- notes: context the retro-reader will need
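The required inputs can be checked before anything is written. The following is a minimal sketch, not part of the skill itself: the `ExperimentInput` class and `validate` function are hypothetical names, and the rules that `page` starts with `/` and that the target differs from the baseline are assumptions added for illustration.

```python
from dataclasses import dataclass

# Allowed values copied from the input list above.
CHANGE_TYPES = {"title", "desc", "content", "schema", "internal_links",
                "new_page", "canonical", "redirect", "other"}
METRICS = {"ctr_14d", "clicks_30d", "position_30d",
           "citation_rate_60d", "traffic_90d", "conversions_90d"}


@dataclass
class ExperimentInput:  # hypothetical container, not defined by the skill
    page: str
    change_type: str
    hypothesis: str
    metric: str
    baseline: float
    target: float


def validate(inp: ExperimentInput) -> list[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    problems = []
    if not inp.page.startswith("/"):  # assumption: page is a site-relative path
        problems.append("page must be a URL path starting with '/'")
    if inp.change_type not in CHANGE_TYPES:
        problems.append(f"unknown change_type: {inp.change_type}")
    if inp.metric not in METRICS:
        problems.append(f"unknown expected.metric: {inp.metric}")
    if inp.baseline == inp.target:  # assumption: an experiment must predict a change
        problems.append("expected.target must differ from expected.baseline")
    return problems
```

Returning a list of problems rather than raising on the first one lets the caller report everything that needs fixing in a single pass.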
Default measure_at by change type
These defaults are based on typical signal-detection windows — use them unless you have a reason to deviate:
| change_type | default window | rationale |
|---|---|---|
| title | +14 days | CTR signal stabilizes in 1–2 weeks |
| desc | +14 days | same as title |
| content | +60 days | Google needs re-crawl + re-evaluation |
| schema | +30 days | rich result rollouts |
| internal_links | +45 days | PageRank redistribution |
| new_page | +90 days | indexation + ranking maturation |
| canonical / redirect | +30 days | de-indexing + re-indexing cycle |
| aeo (llms.txt / markdown) | +60 days | AI training/retrieval cycle |
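The table above maps directly onto a lookup. This is a sketch under stated assumptions: the function name `default_measure_at` is hypothetical, the window is counted from whatever start date the caller passes in, and `other` (which has no row in the table) is assumed to require an explicit date.

```python
from datetime import date, timedelta

# Window lengths in days, taken from the table above.
DEFAULT_WINDOW_DAYS = {
    "title": 14,
    "desc": 14,
    "content": 60,
    "schema": 30,
    "internal_links": 45,
    "new_page": 90,
    "canonical": 30,
    "redirect": 30,
    "aeo": 60,
}


def default_measure_at(change_type: str, start: date) -> date:
    """Return start + the default window for this change type."""
    if change_type not in DEFAULT_WINDOW_DAYS:
        # Assumption: types without a default (e.g. "other") need an explicit date.
        raise ValueError(f"no default window for {change_type!r}; set measure_at explicitly")
    return start + timedelta(days=DEFAULT_WINDOW_DAYS[change_type])


print(default_measure_at("title", date(2026, 4, 16)))  # 2026-04-30
```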
Output
Write to {workspace}/reports/seo/memory/experiments/exp-{YYYY-MM-DD}-{short_id}.yaml.
short_id = 3 random lowercase alphanumeric chars (e.g. a7k) for uniqueness within a day.
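One way to build the ID and file path is sketched below. The function names are hypothetical, and the alphabet is assumed to include digits because the example IDs elsewhere in this document (such as a7k) contain them. With only three characters, collisions within a day are possible, so the sketch retries until it finds an unused filename.

```python
import secrets
import string
from datetime import date
from pathlib import Path

# Assumption: lowercase letters plus digits, matching example IDs like "a7k".
ALPHABET = string.ascii_lowercase + string.digits


def new_experiment_id(day: date) -> str:
    """Build exp-{YYYY-MM-DD}-{short_id} with a random 3-character short_id."""
    short_id = "".join(secrets.choice(ALPHABET) for _ in range(3))
    return f"exp-{day.isoformat()}-{short_id}"


def experiment_path(workspace: Path, exp_id: str) -> Path:
    """Return the YAML path for an experiment ID under the workspace."""
    return workspace / "reports" / "seo" / "memory" / "experiments" / f"{exp_id}.yaml"


def unique_experiment_path(workspace: Path, day: date, max_tries: int = 20) -> Path:
    """Retry with fresh short_ids until the path does not exist yet."""
    for _ in range(max_tries):
        path = experiment_path(workspace, new_experiment_id(day))
        if not path.exists():
            return path
    raise RuntimeError("could not find an unused experiment ID")
```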
Template:
id: exp-{YYYY-MM-DD}-{short_id}
created: {ISO8601_timestamp}
status: open
hypothesis: >
  {1-3 sentence "why"}
change:
  type: {change_type}
  page: {path}
  before: {current_state} # title string, before-text, schema type, etc
  after: {new_state}
  deployed_at: {ISO8601 | null} # null if still pre-deploy
  deployed_sha: {git_sha | null}
expected:
  metric: {metric_name}
  baseline: {current_value}
  target: {expected_value}
  direction: {increase | decrease}
  measure_at: {YYYY-MM-DD}
  source: {gsc | ga4 | lighthouse | aeo_probe | manual}
playbook_consulted: {filename | "none: {reason}"}
notes: {free_text | null}
result:
  measured_at: null
  actual:
    metric: {same as expected.metric}
    value: null
  verdict: null # filled by seo-retro: win | loss | inconclusive
  delta_vs_expected: null
  notes: null
Hypothesis quality bar
A weak hypothesis breaks the entire learning loop. Reject and rewrite if the hypothesis:
- is just a restatement of the change ("I will add a number to the title because I want to add a number to the title") — this is useless
- uses vague verbs ("should improve", "might help") — say what metric, by how much, why
- has no mechanism ("Google likes this") — cite the specific mechanism (CTR pattern, relevance signal, citation likelihood)
- can't fail ("this might work or might not") — a hypothesis that can't fail can't teach
Adding "8 Tools" to the /alternatives/cursor title will raise CTR by ≥20% within 14 days. The page is currently at position 3.2 with CTR 2.1% (benchmark for P3 is ~10%, so the page is under-performing the benchmark CTR by ~80%). Numbers in titles signal specificity, and three recent experiments (exp-2026-03-29-t4m, exp-2026-03-12-b9q, exp-2026-02-18-p2n) saw +18–28% CTR on similar pages.
This is good because it has: concrete metric, specific target, current value, benchmark, named mechanism, and cites prior evidence. Retro can cleanly measure win/loss.
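Parts of the quality bar can be checked mechanically before a human or agent reviews the rest. The sketch below is a rough heuristic, not the skill's actual gate: `hypothesis_problems` and the phrase list are hypothetical, and a simple text check cannot tell whether a real mechanism is named or whether the hypothesis merely restates the change.

```python
import re

# Assumption: a small, non-exhaustive list drawn from the rejection criteria above.
VAGUE_PHRASES = ("should improve", "might help", "might work", "google likes")


def hypothesis_problems(text: str) -> list[str]:
    """Flag obvious weaknesses; an empty list does NOT prove the hypothesis is good."""
    problems = []
    lowered = text.lower()
    for phrase in VAGUE_PHRASES:
        if phrase in lowered:
            problems.append(f"vague phrase: '{phrase}'")
    # A falsifiable hypothesis needs at least one number (target, baseline, window).
    if not re.search(r"\d", text):
        problems.append("no number: state the metric, target, and baseline")
    return problems


weak = "This should improve things because Google likes this."
print(hypothesis_problems(weak))
```

An empty result only means the text passed the cheap checks; the mechanism and prior-evidence criteria still need a reader.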
Anti-patterns (refuse these)
- Bundling multiple changes in one experiment. If I change title AND desc AND schema at once, I can't attribute the outcome. Split into separate experiments OR explicitly accept that the experiment measures the bundle (and name it "bundle-test-2026-04-16").
- Measuring "whenever I remember". measure_at is a commitment. Retro measures on that day.
- Cancelling after seeing early data. Don't peek at day 3 and decide to kill the experiment. Wait for measure_at.
- Rewriting the hypothesis after the fact. If the hypothesis needs to change, log a new experiment with a new ID that cites the old one.
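Two of these refusals are easy to enforce in code. The following is a sketch only: `ready_to_measure` and `check_bundle` are hypothetical helpers, and the `accepted_bundle` flag is an assumed way to record that a bundle was chosen deliberately.

```python
from datetime import date


def ready_to_measure(measure_at: date, today: date) -> bool:
    """True only once the committed measurement date has arrived."""
    return today >= measure_at


def check_bundle(change_types: list[str], accepted_bundle: bool = False) -> None:
    """Refuse multi-change experiments unless the bundle was explicitly accepted."""
    if len(set(change_types)) > 1 and not accepted_bundle:
        raise ValueError(
            "multiple change types in one experiment: split them, "
            "or mark the experiment as an explicit bundle test"
        )


# Day 3 of a 14-day title experiment: too early to measure.
print(ready_to_measure(date(2026, 4, 30), date(2026, 4, 19)))  # False
```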
Integration
Called by:
- seo-optimize before any CTR/desc/content/schema change
- seo-create after a new page is deployed (with 90d window)
- seo-rhythm-daily during the 10:07 CTR sprint
- Directly by the agent for ad-hoc changes
Reads:
- {workspace}/reports/seo/memory/playbooks/ to find the relevant playbook
- {workspace}/reports/seo/memory/lessons/ to verify this isn't re-running a known bad pattern
Writes:
- {workspace}/reports/seo/memory/experiments/exp-*.yaml
Handoff
After writing the experiment file, return to caller:
✅ exp-2026-04-16-a7k logged
Hypothesis: +20% CTR via title numbers
Measure on: 2026-04-30 (14d window)
Playbook: add-numbers-to-title.md
The change can now ship. Retro will close the loop automatically.
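The handoff message can be assembled from fields already present in the experiment entry. This is a minimal sketch: `format_handoff` is a hypothetical name, the one-line hypothesis summary is assumed to be supplied by the caller, and only the four lines of the confirmation shown above are produced.

```python
from datetime import date


def format_handoff(exp_id: str, summary: str, measure_at: date,
                   window_days: int, playbook: str) -> str:
    """Render the four-line confirmation returned to the caller."""
    return "\n".join([
        f"✅ {exp_id} logged",
        f"Hypothesis: {summary}",
        f"Measure on: {measure_at.isoformat()} ({window_days}d window)",
        f"Playbook: {playbook}",
    ])


print(format_handoff("exp-2026-04-16-a7k", "+20% CTR via title numbers",
                     date(2026, 4, 30), 14, "add-numbers-to-title.md"))
```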