AI Skill

seo-retro

Last updated: 2026-05-17

Weekly SEO retrospective: measure experiments that hit their measure_at date, score each as win/loss/inconclusive, and write a 1-screen retro summary. TRIGGER every Friday…

Quick Install

npx skills add seo-retro

seo-retro — Measure What You Predicted

The retro is the load-bearing phase of the learning loop. Without it, experiments are just a graveyard of untested hypotheses. With it, every week produces signal.

Rule

Run every Friday at 16:00 local time. Never skip. Skipping a retro breaks the compounding loop, because experiments past their measure_at go stale and become unmeasurable.

Inputs

None required. The skill reads these workspace files:

{workspace}/seo-config.yaml
{workspace}/reports/seo/memory/experiments/*.yaml

{workspace}/reports/seo/raw/gsc-*.json (plus the GA4 and aeo-probe raw files)

Workflow

Phase 1: Find experiments due

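# Open experiments with measure_at <= today, oldest first (bash; assumes top-level "status:"/"measure_at: YYYY-MM-DD" keys)
today=$(date +%F)
for f in reports/seo/memory/experiments/exp-*.yaml; do
  grep -qs '^status: open' "$f" || continue
  due=$(awk '/^measure_at:/ {gsub(/"/, "", $2); print substr($2, 1, 10); exit}' "$f")
  [[ -n "$due" && ! "$due" > "$today" ]] && echo "$due $f"
done | sort | cut -d' ' -f2-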

Sort by measure_at ascending — oldest first, so the retro always catches stale ones.

If an experiment is > 7 days past its measure_at with no measurement, flag it as a discipline failure in the retro summary. This creates back-pressure against letting experiments rot.
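The same selection and the > 7-day discipline flag can be sketched in Python. This minimal version assumes the experiment files have already been parsed into a hypothetical `Experiment` record, which keeps the sketch stdlib-only. Real files would need a YAML parser.

```python
from dataclasses import dataclass
from datetime import date

OVERDUE_DAYS = 7  # > 7 days past measure_at with no measurement = discipline failure


@dataclass
class Experiment:
    """Hypothetical in-memory view of one exp-*.yaml file."""
    exp_id: str
    status: str        # "open" until a result is written back
    measure_at: date


def due_experiments(exps: list[Experiment], today: date) -> list[Experiment]:
    """Open experiments with measure_at <= today, oldest first."""
    due = [e for e in exps if e.status == "open" and e.measure_at <= today]
    return sorted(due, key=lambda e: e.measure_at)


def overdue_ids(exps: list[Experiment], today: date) -> list[str]:
    """IDs of due (still open, so unmeasured) experiments > OVERDUE_DAYS late."""
    return [e.exp_id for e in due_experiments(exps, today)
            if (today - e.measure_at).days > OVERDUE_DAYS]


today = date(2026, 5, 15)
exps = [
    Experiment("exp-a", "open", date(2026, 5, 1)),
    Experiment("exp-b", "open", date(2026, 5, 14)),
    Experiment("exp-c", "measured", date(2026, 4, 1)),
    Experiment("exp-d", "open", date(2026, 6, 1)),
]
print([e.exp_id for e in due_experiments(exps, today)])  # ['exp-a', 'exp-b']
print(overdue_ids(exps, today))                           # ['exp-a']
```

Because `status == "open"` already means "no result yet", an open experiment more than 7 days past its date is exactly the case the retro should flag.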

Phase 2: Measure each experiment

For each due experiment:

Re-read its expected.metric, expected.baseline, and expected.target.

Pull the current value from the appropriate data source:
- ctr_14d / clicks_30d / position_30d → GSC
- traffic_90d / conversions_90d → GA4
- citation_rate_60d → latest aeo-probe result for the page
- lcp_ms / inp_ms → PageSpeed / CrUX
Compute actual vs expected:
- delta_vs_expected = (actual - expected.baseline) / (expected.target - expected.baseline), i.e. the fraction of the targeted move that actually happened (1.0 = exactly on target, 0 = no change from baseline)
- >= 1.0 → hit target or better
- 0.3 – 1.0 → partial win (directionally right)
- -0.3 – 0.3 → inconclusive (noise range)
- < -0.3 → loss (moved in the wrong direction or got worse)
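The formula and bands can be sketched in Python as follows. At the points where the listed ranges touch, this sketch treats 0.3 as a partial win and -0.3 as inconclusive. That is one choice made for illustration; the skill itself does not specify it.

```python
def delta_vs_expected(actual: float, baseline: float, target: float) -> float:
    """Fraction of the targeted move actually achieved.

    1.0 = exactly on target, 0.0 = no change from baseline, negative = wrong way.
    Lower-is-better metrics (position_30d, lcp_ms) need no special case:
    target - baseline is negative, so an improvement still gives a positive value.
    """
    if target == baseline:
        raise ValueError("expected.target must differ from expected.baseline")
    return (actual - baseline) / (target - baseline)


def band(delta: float) -> str:
    """Map delta_vs_expected onto the bands (half-open at each boundary)."""
    if delta >= 1.0:
        return "hit target or better"
    if delta >= 0.3:
        return "partial win"
    if delta >= -0.3:
        return "inconclusive"
    return "loss"


# CTR: baseline 0.020, target 0.030, actual 0.034 -> 1.4x the targeted lift
print(round(delta_vs_expected(0.034, 0.020, 0.030), 2))  # 1.4
# Position (lower is better): baseline 8.0, target 5.0, actual 6.5 -> 0.5
print(band(delta_vs_expected(6.5, 8.0, 5.0)))             # partial win
```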

Phase 3: Score the verdict

Rules for assigning a verdict:

Condition | Verdict
--- | ---
actual hit or exceeded expected.target | win
actual moved in expected direction by ≥ 50% of target delta | win (partial)
actual within ±10% of baseline (noise) | inconclusive
actual moved in wrong direction AND delta > 20% | loss
Data source unavailable / page not yet indexed | inconclusive (note why)
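A sketch of the verdict table as a Python function, applied top to bottom. Two readings here are assumptions, not stated by the skill:

- "±10% of baseline" and "delta > 20%" are both measured relative to expected.baseline.
- Any case the table does not cover, such as a wrong-way move between 10% and 20%, falls back to inconclusive. This follows the skill's own "If I'm not sure, it's inconclusive" rule.

```python
from typing import Optional


def verdict(baseline: float, target: float, actual: Optional[float]) -> str:
    """Apply the verdict table top to bottom; uncovered cases -> inconclusive."""
    if actual is None:                       # source unavailable / not indexed
        return "inconclusive (note why)"
    if target == baseline:
        raise ValueError("expected.target must differ from expected.baseline")
    up = target > baseline                   # intended direction of the move
    moved = actual - baseline
    hit = actual >= target if up else actual <= target
    if hit:
        return "win"
    right_way = moved > 0 if up else moved < 0
    if right_way and abs(moved) >= 0.5 * abs(target - baseline):
        return "win (partial)"
    if abs(moved) <= 0.10 * abs(baseline):   # within ±10% of baseline: noise
        return "inconclusive"
    # Assumption: "delta > 20%" means more than 20% of baseline.
    if not right_way and abs(moved) > 0.20 * abs(baseline):
        return "loss"
    return "inconclusive"                    # not covered by the table


# clicks_30d: baseline 100, target 120
for actual in (125, 111, 105, 75, 85, None):
    print(actual, verdict(100, 120, actual))
# -> win, win (partial), inconclusive, loss, inconclusive, inconclusive (note why)
```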
Edge cases:

Confounding events: If a Google algo update happened in the window, note it in result.notes and still record the verdict. Don't use confounds as an excuse to avoid measurement.

Seasonality: For pages with known seasonality (e.g., tax software in April), compare year-over-year where possible.

Phase 4: Write result back to experiment YAML

Update the experiment file in place:

status: measured   # was: open
result:
  measured_at: 2026-04-30T16:12:00Z
  actual:
    metric: ctr_14d
    value: 0.034
  verdict: win
  delta_vs_expected: 1.32   # exceeded target by 32%
  notes: >
    Clean lift. No Google algo events in window. Position held steady at
    3.1 ± 0.2, so the CTR gain is attributable to the title change, not
    to a ranking shift.

Do NOT delete or move the experiment file. Flip status to measured and append the result fields; never rewrite existing fields. Append only.
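Here is a minimal append-only write-back in Python. It works on plain text to stay stdlib-only, and it assumes two things: an unindented `status: open` line, and no existing top-level `result:` key. The function name and parameters are illustrative, not part of the skill.

```python
from datetime import datetime, timezone


def write_result(yaml_text: str, metric: str, value: float, verdict: str,
                 delta: float, notes: str, now: datetime) -> str:
    """Flip the top-level status and append a result block; touch nothing else."""
    if "\nresult:" in "\n" + yaml_text:
        raise ValueError("result already recorded; append only")
    lines = yaml_text.splitlines()
    if "status: open" not in lines:
        raise ValueError("experiment is not open")
    lines = ["status: measured  # was: open" if line == "status: open" else line
             for line in lines]
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines += [
        "result:",
        f"  measured_at: {stamp}",
        "  actual:",
        f"    metric: {metric}",
        f"    value: {value}",
        f"  verdict: {verdict}",
        f"  delta_vs_expected: {delta:.2f}",
        "  notes: >",
        *(f"    {note_line}" for note_line in notes.splitlines()),
    ]
    return "\n".join(lines) + "\n"


before = "id: exp-demo\nstatus: open\nmeasure_at: 2026-04-30\n"
after = write_result(before, "ctr_14d", 0.034, "win", 1.32, "Clean lift.",
                     datetime(2026, 4, 30, 16, 12, tzinfo=timezone.utc))
print(after)
```

Refusing to run when a `result:` key already exists is what keeps a second retro from silently overwriting an earlier verdict.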

Phase 5: Write retro summary

Write {workspace}/reports/seo/retros/retro-{YYYY-MM-DD}.md:

# SEO Retro — Week of {start_date} – {end_date}

Experiments measured this week: {N}

## Wins ({K})
- [exp-...-a7k] /alternatives/cursor — title +8 Tools → CTR +62% (target +20%)
- [exp-...-b9q] /compare/x-vs-y — schema FAQPage → impressions +14%

## Losses ({M})
- [exp-...-p2n] /blog/X — content refresh → traffic -8% (expected +15%)
  - Note: coincided with Google March core update

## Inconclusive ({L})
- [exp-...-t4m] /use/gpt-5 — new page → 30d too early for ranking signal
  - Keeping open, remeasure at 60d mark

## Patterns emerging
_{from seo-learn scan — only noted, not promoted yet unless ≥3 wins}_
- Title numbers continue to win (4/5 recent experiments)
- Schema markup on /compare/ pages consistently lifts impressions

## Discipline check
- Experiments overdue > 7d: 0 ✅
- Changes shipped without exp log this week: 0 ✅
  (check: git log --since="1 week ago" -- 'content/' 'app/**/page.tsx', then diff against count of experiments with deployed_at in range)

## Recommended actions for next week (≤ 3)
1. Apply add-numbers-to-title.md playbook to top 5 remaining P4-10 pages without numbers
2. Investigate /blog/X regression — was the refresh too aggressive?
3. Run AEO probe refresh — citation rate measured 4 weeks ago, need new baseline
Next retro: {YYYY-MM-DD}
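A rough Python sketch of rendering the summary from measured experiments follows. Several parts are assumptions:

- The dict keys `id`, `page`, `verdict` and `summary` are hypothetical.
- The "Patterns emerging" section is left out, since it comes from seo-learn.
- ❌ for a non-zero discipline count is an assumption; the template only shows ✅.
- Entries use ":" as a separator and the title uses "to" for the date range.

```python
from collections import defaultdict


def _mark(n: int) -> str:
    return "✅" if n == 0 else "❌"  # failing mark is an assumption


def render_retro(start: str, end: str, measured: list[dict], overdue: int,
                 unlogged: int, actions: list[str], next_retro: str) -> str:
    """Render retro-{date}.md text from measured experiments."""
    if len(actions) > 3:
        raise ValueError("recommended actions must be <= 3")
    groups: dict[str, list[str]] = defaultdict(list)
    for e in measured:
        if e["verdict"].startswith("win"):       # win and win (partial)
            key = "Wins"
        elif e["verdict"] == "loss":
            key = "Losses"
        else:
            key = "Inconclusive"
        groups[key].append(f"- [{e['id']}] {e['page']}: {e['summary']}")
    out = [f"# SEO Retro: Week of {start} to {end}", "",
           f"Experiments measured this week: {len(measured)}", ""]
    for key in ("Wins", "Losses", "Inconclusive"):
        out.append(f"## {key} ({len(groups[key])})")
        out.extend(groups[key] or ["- none"])
        out.append("")
    out += ["## Discipline check",
            f"- Experiments overdue > 7d: {overdue} {_mark(overdue)}",
            f"- Changes shipped without exp log this week: {unlogged} {_mark(unlogged)}",
            "", "## Recommended actions for next week (≤ 3)"]
    out += [f"{i}. {a}" for i, a in enumerate(actions, start=1)]
    out += ["", f"Next retro: {next_retro}"]
    return "\n".join(out) + "\n"


md = render_retro(
    "2026-05-11", "2026-05-15",
    [{"id": "exp-1", "page": "/a", "verdict": "win", "summary": "CTR +62%"},
     {"id": "exp-2", "page": "/b", "verdict": "loss", "summary": "traffic -8%"}],
    overdue=0, unlogged=1, actions=["Investigate /b"], next_retro="2026-05-22")
print(md)
```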

Phase 6: Trigger follow-ups

If any loss has delta_vs_expected < -0.5 → trigger seo-postmortem on that experiment

If any pattern shows ≥ 3 wins in a category → flag for seo-learn (runs monthly, but note the candidate)

If the discipline check fails (any experiments overdue OR changes shipped without logs) → note it in the owner's weekly report
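The three follow-up rules can be sketched as one Python function. The `category` field is a hypothetical tag, because the skill does not define how experiments are grouped into patterns. Pass a longer history if a pattern should span several weeks. The sketch also counts only full `win` verdicts toward the ≥ 3 threshold, which is an assumption.

```python
from collections import Counter


def follow_ups(results: list[tuple[str, str, str, float]],
               overdue: int, unlogged: int) -> list[str]:
    """results: (exp_id, category, verdict, delta_vs_expected) tuples."""
    todo = [f"trigger seo-postmortem: {exp_id}"
            for exp_id, _cat, verdict, delta in results
            if verdict == "loss" and delta < -0.5]
    wins = Counter(cat for _id, cat, verdict, _delta in results
                   if verdict == "win")
    todo += [f"flag for seo-learn: {cat} ({n} wins)"
             for cat, n in sorted(wins.items()) if n >= 3]
    if overdue or unlogged:
        todo.append("note discipline failure in the owner's weekly report")
    return todo


history = [("e1", "title-numbers", "win", 1.2), ("e2", "title-numbers", "win", 1.0),
           ("e3", "title-numbers", "win", 1.5), ("e4", "schema", "loss", -0.7),
           ("e5", "schema", "loss", -0.4)]
print(follow_ups(history, overdue=0, unlogged=0))
# ['trigger seo-postmortem: e4', 'flag for seo-learn: title-numbers (3 wins)']
```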

Quality bar

Retro is done only when:

[ ] Every open experiment with measure_at <= today has a verdict assigned

[ ] Retro markdown exists at retros/retro-{date}.md

[ ] Each win/loss/inconclusive entry cites its experiment ID (linkable)

[ ] Discipline check ran

[ ] Recommended actions ≤ 3 (more = unfocused)

[ ] If a loss had delta_vs_expected < -0.5, a postmortem is triggered (not just noted)
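The checklist can also serve as a "done" gate. The sketch below assumes a hypothetical `RetroState` snapshot of what one run produced; how those fields get collected is left open. The retro counts as done when `unmet_checks` returns an empty list.

```python
from dataclasses import dataclass


@dataclass
class RetroState:
    """Hypothetical snapshot of what one retro run produced."""
    due_ids: set[str]              # open experiments with measure_at <= today
    verdicts: dict[str, str]       # exp_id -> verdict written this run
    retro_exists: bool             # retros/retro-{date}.md was written
    cited_ids: set[str]            # experiment IDs cited in the retro markdown
    discipline_ran: bool
    actions: list[str]             # recommended actions for next week
    severe_losses: set[str]        # losses with delta_vs_expected < -0.5
    postmortems: set[str]          # experiments seo-postmortem was triggered for


def unmet_checks(s: RetroState) -> list[str]:
    """Quality-bar items still failing; empty list means the retro is done."""
    gaps = []
    if not s.due_ids <= set(s.verdicts):
        gaps.append("verdict for every due experiment")
    if not s.retro_exists:
        gaps.append("retro markdown exists")
    if not set(s.verdicts) <= s.cited_ids:
        gaps.append("every entry cites its experiment ID")
    if not s.discipline_ran:
        gaps.append("discipline check ran")
    if len(s.actions) > 3:
        gaps.append("at most 3 recommended actions")
    if not s.severe_losses <= s.postmortems:
        gaps.append("postmortem triggered for each severe loss")
    return gaps


done = RetroState({"e1"}, {"e1": "win"}, True, {"e1"}, True, ["a"], set(), set())
print(unmet_checks(done))  # []
```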

What I refuse

To mark an experiment "win" because the owner wants a win. The measurement is the measurement.

To retroactively change expected.target to make a loss look like a win.

To delete experiments that came up short. Losses are the training data for lessons.

To skip a week because "nothing interesting happened". The discipline check IS the interesting thing.

To bundle "probably a win" verdicts. If I'm not sure, it's inconclusive.

Integration

Called by:

seo-rhythm-weekly (every Friday)

Ad-hoc when owner says "run the retro" or "measure last week's experiments"

Reads:

experiments/*.yaml where status=open
Latest data files in reports/seo/raw/
seo-config.yaml for cadence overrides

Writes:

Updates each measured experiment file (appends result block)
Creates retros/retro-{date}.md

Triggers:

seo-postmortem on severe losses
seo-learn eligibility flag on patterns with ≥ 3 wins
