Technical SEO Audit Skill

> Purpose: Prevent SEO disasters like the One Person Company incident (2026-01-27)
> Reusable for: Any static website deployment (Cloudflare Pages, Vercel, Netlify, etc.)

```bash
npx skills add technical-seo-audit
```
✅ Pre-Deployment Checklist (MANDATORY)
Before deploying ANY website, verify these 4 things:
1️⃣ Create _headers File (CRITICAL)
Location: `deploy/_headers` (or `public/_headers`, depending on your build)

Content:

```
# Cloudflare Pages Headers Configuration

# Allow all pages to be indexed by search engines
/*
  X-Robots-Tag: index, follow
  X-Frame-Options: SAMEORIGIN
  X-Content-Type-Options: nosniff
  Referrer-Policy: strict-origin-when-cross-origin
  Cache-Control: public, max-age=3600

# Static assets - longer cache
/images/*
  Cache-Control: public, max-age=31536000, immutable

# Sitemap and robots - short cache for updates
/sitemap.xml
  X-Robots-Tag: noindex
  Cache-Control: public, max-age=3600

/robots.txt
  X-Robots-Tag: noindex
  Cache-Control: public, max-age=3600
```
Why noindex for sitemap/robots?
- These are tool files, not content
- Google doesn't need to index them
- Reduces crawl waste
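The indexing rule above is easy to get subtly wrong (a `/` instead of `/*`, a typo in the header name), so it is worth checking the file programmatically before every deploy. Below is a minimal Node sketch (not part of the original tooling), assuming the standard Cloudflare Pages `_headers` format: an unindented path line followed by indented `Header-Name: value` lines.

```javascript
// Minimal sketch: verify that a Cloudflare Pages _headers file allows indexing.
// Assumed format: unindented path lines (e.g. "/*") followed by indented
// "Header-Name: value" lines; comment lines start with "#".

function allowsIndexing(headersText) {
  let currentPath = null;
  for (const line of headersText.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#')) continue;
    if (!/^\s/.test(line)) {
      currentPath = trimmed;                  // a new path rule begins
      continue;
    }
    if (currentPath !== '/*') continue;       // only the catch-all rule matters here
    const idx = trimmed.indexOf(':');
    if (idx === -1) continue;
    const name = trimmed.slice(0, idx).trim().toLowerCase();
    const value = trimmed.slice(idx + 1).trim().toLowerCase();
    if (name === 'x-robots-tag' && value.startsWith('index')) return true;
  }
  return false;
}

const sample = [
  '# Allow all pages to be indexed',
  '/*',
  '  X-Robots-Tag: index, follow',
].join('\n');

console.log(allowsIndexing(sample) ? '✅ indexing allowed' : '❌ indexing blocked');
```

Wiring a check like this into the pre-deployment validation script would catch a malformed file before it ships.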
2️⃣ Generate Complete Sitemap
Requirements:
- All HTML pages must be in the sitemap
- Use the production domain (NOT `.pages.dev` or `localhost`)
- Valid XML format
- Include `<lastmod>` tags
```javascript
// scripts/generate_sitemap.js
const fs = require('fs');
const path = require('path');

const DOMAIN = 'https://your-domain.com';
const DEPLOY_DIR = './deploy';

// index.html is represented by the root URL below, so exclude it here
// to avoid a duplicate entry.
const htmlFiles = fs.readdirSync(DEPLOY_DIR)
  .filter(f => f.endsWith('.html') && f !== 'index.html');

const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>${DOMAIN}/</loc>
    <lastmod>${new Date().toISOString()}</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
${htmlFiles.map(file => `  <url>
    <loc>${DOMAIN}/${file}</loc>
    <lastmod>${new Date().toISOString()}</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>`).join('\n')}
</urlset>
`;

fs.writeFileSync(path.join(DEPLOY_DIR, 'sitemap.xml'), sitemap);
console.log(`✅ Generated sitemap with ${htmlFiles.length + 1} URLs`);
```
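A quick follow-up check on the generated XML can catch the wrong-domain mistakes described later in this document. This is a sketch, not part of the original script; `DOMAIN` mirrors the constant used above.

```javascript
// Minimal sketch: sanity-check a generated sitemap string before deploying.
const DOMAIN = 'https://your-domain.com';

function checkSitemap(xml) {
  // Extract every <loc> entry and flag URLs on the wrong host.
  const locs = [...xml.matchAll(/<loc>([^<]+)<\/loc>/g)].map(m => m[1]);
  const badHosts = locs.filter(u =>
    u.includes('.pages.dev') || u.includes('localhost') || !u.startsWith(DOMAIN));
  return { urlCount: locs.length, badHosts };
}

const sample = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>${DOMAIN}/</loc></url>
  <url><loc>${DOMAIN}/about.html</loc></url>
</urlset>`;

const result = checkSitemap(sample);
console.log(`URLs: ${result.urlCount}, wrong-host URLs: ${result.badHosts.length}`);
// → URLs: 2, wrong-host URLs: 0
```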
3️⃣ Configure robots.txt
Location: `deploy/robots.txt`
Content:

```
User-agent: *
Allow: /

# Sitemap
Sitemap: https://your-domain.com/sitemap.xml

# AI Agents (optional)
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-Web
Allow: /
```

DO NOT:

```
# ❌ WRONG - This blocks everything
User-agent: *
Disallow: /
```
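The blanket-Disallow footgun can also be caught mechanically before deploy. A Node sketch (a simplified reading of robots.txt grouping, not a full spec-compliant parser):

```javascript
// Minimal sketch: returns true if the wildcard user-agent group
// contains a bare "Disallow: /", which blocks the entire site.

function blocksEverything(robotsTxt) {
  let inWildcardGroup = false;
  for (const raw of robotsTxt.split('\n')) {
    const line = raw.trim();
    if (/^user-agent:/i.test(line)) {
      // Track whether we are inside the "User-agent: *" group.
      inWildcardGroup = line.replace(/^user-agent:/i, '').trim() === '*';
    } else if (inWildcardGroup && /^disallow:\s*\/\s*$/i.test(line)) {
      return true;
    }
  }
  return false;
}

console.log(blocksEverything('User-agent: *\nDisallow: /'));  // true  (the footgun)
console.log(blocksEverything('User-agent: *\nAllow: /'));     // false (safe)
```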
4️⃣ Validate Before Deploying
Run this script EVERY TIME before deploying:

```bash
#!/bin/bash
# Pre-deployment validation

echo "🔍 SEO Validation Check..."

# 1. Check _headers exists
if [ ! -f "deploy/_headers" ]; then
  echo "❌ CRITICAL: deploy/_headers missing!"
  echo "   Google won't index your site!"
  exit 1
fi

# 2. Check _headers content
if ! grep -q "X-Robots-Tag: index, follow" deploy/_headers; then
  echo "❌ CRITICAL: _headers doesn't allow indexing!"
  exit 1
fi

# 3. Check sitemap exists
if [ ! -f "deploy/sitemap.xml" ]; then
  echo "❌ CRITICAL: sitemap.xml missing!"
  exit 1
fi

# 4. Count URLs in sitemap
SITEMAP_URLS=$(grep -c "<loc>" deploy/sitemap.xml)
HTML_FILES=$(ls deploy/*.html 2>/dev/null | wc -l | tr -d ' ')
if [ "$SITEMAP_URLS" -lt "$HTML_FILES" ]; then
  echo "❌ CRITICAL: Sitemap incomplete!"
  echo "   Sitemap: $SITEMAP_URLS URLs"
  echo "   HTML files: $HTML_FILES files"
  exit 1
fi

echo "✅ All SEO checks passed!"
exit 0
```
🔍 Post-Deployment Verification
After deploying, immediately verify:
Check HTTP Headers

```bash
curl -I https://your-domain.com/
```

✅ Should see: `x-robots-tag: index, follow`
❌ If you see: `x-robots-tag: noindex` → your _headers file wasn't deployed correctly

Check Sitemap Accessibility

```bash
curl -s https://your-domain.com/sitemap.xml | head -20
```

Should show valid XML with the production domain.

Check robots.txt

```bash
curl https://your-domain.com/robots.txt
```

Should allow crawling and reference the sitemap.
📊 Google Indexing Timeline
After fixing SEO issues:
| Timeline | Expected Result |
|---|---|
| 24-48 hours | Google starts crawling |
| 3-7 days | 30-50% pages indexed |
| 2-4 weeks | 80-100% pages indexed |
| 1-3 months | Rankings stabilize |
🚫 Common Mistakes (Learned from One Person Company)
❌ Mistake #1: No _headers File
Problem: Cloudflare adds `X-Robots-Tag: noindex` by default
Result: 0% of the site indexed
Fix: Create `deploy/_headers` with `X-Robots-Tag: index, follow`
❌ Mistake #2: Incomplete Sitemap
Problem: Sitemap has 50 URLs, but the site has 208 pages
Result: Google only knows about 24% of the content
Fix: Regenerate the sitemap automatically before each deployment
❌ Mistake #3: Wrong Domain in Sitemap
Problem: Sitemap uses `.pages.dev` instead of the custom domain
Result: Google crawls the wrong URLs
Fix: Use the production domain in the sitemap generation script
❌ Mistake #4: Manual Sitemap Updates
Problem: Forgetting to update the sitemap when adding new pages
Result: New content never gets indexed
Fix: Automate sitemap generation in the build process
🛠️ Integration with Build Process
For Cloudflare Pages
Add to `package.json`:

```json
{
  "scripts": {
    "build": "npm run generate-content && npm run generate-sitemap && npm run validate-seo",
    "generate-sitemap": "node scripts/generate_sitemap.js",
    "validate-seo": "./scripts/pre_deploy_validation.sh"
  }
}
```
For Vercel/Netlify
Same approach - run validation before the build:

```json
{
  "scripts": {
    "build": "npm run validate-seo && next build",
    "validate-seo": "./scripts/pre_deploy_validation.sh"
  }
}
```
🔄 Automation with Cron Jobs
See /Users/xiaoyinqu/heyboss/heyboss-cron-service for the automated deployment system. It provides:
- Automatic daily deployments
- Pre-deployment validation
- Lark notifications for results
- No human errors
📝 For Future Websites
When starting a new website project:
- ✅ Copy the `deploy/_headers` template
- ✅ Copy `scripts/generate_sitemap.js`
- ✅ Copy `scripts/pre_deploy_validation.sh`
- ✅ Add validation to the build process
- ✅ Test with `curl -I https://new-domain.com/`

Required files:
- `_headers` (with `X-Robots-Tag: index, follow`)
- `sitemap.xml` (complete, production domain)
- `robots.txt` (allows crawling)
- `index.html` (links to all content)
🆘 Troubleshooting
Problem: Google not indexing after 2 weeks
Check:
- `curl -I https://your-domain.com/` → should show `x-robots-tag: index, follow`
- Google Search Console → Coverage report → check for errors
- `site:your-domain.com` in Google → how many pages indexed?
- Request indexing manually in Google Search Console
- Check for manual actions (penalties)
- Verify domain ownership
- Check server logs for Googlebot access
Problem: Some pages indexed, others not
Likely causes:
- Sitemap incomplete → regenerate
- No internal links to pages → add to the index/navigation
- `noindex` meta tags on specific pages → remove
- 404 errors → fix broken links
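Stray `noindex` meta tags can be hunted with a simple scan over the deployed HTML. A regex-based Node sketch (an approximation; a thorough audit might use a real HTML parser):

```javascript
// Minimal sketch: detect a robots meta tag that blocks indexing in page HTML.

function hasNoindexMeta(html) {
  // Grab every <meta ... name="robots" ...> tag, then look for "noindex" in it.
  const metas = html.match(/<meta[^>]+name=["']robots["'][^>]*>/gi) || [];
  return metas.some(tag => /noindex/i.test(tag));
}

console.log(hasNoindexMeta('<meta name="robots" content="noindex">'));       // true
console.log(hasNoindexMeta('<meta name="robots" content="index, follow">')); // false
```

Running this over every file in `deploy/` (e.g. with `fs.readdirSync`) would flag exactly the pages Google is skipping.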
📚 Related Documentation
- One Person Company SKILL.md - Full deployment workflow
- SEO_ROOT_CAUSE_ANALYSIS.md - Detailed postmortem of indexing failure
- DEPLOYMENT_CHECKLIST.md - Step-by-step deployment guide
- pre_deploy_validation.sh - Automated validation script
🚀 Quick Start Commands
For a NEW website project:
```bash
# 1. Create project structure
mkdir -p deploy/images scripts automation

# 2. Copy SEO protection files from One Person Company
cp ~/path/to/onepersoncompany/deploy/_headers deploy/
cp ~/path/to/onepersoncompany/scripts/pre_deploy_validation.sh scripts/
cp ~/path/to/onepersoncompany/scripts/generate_sitemap_and_index.js scripts/

# 3. Make scripts executable
chmod +x scripts/*.sh

# 4. Update the domain in the generation script
sed -i '' 's/onepersoncompany.com/your-domain.com/g' scripts/generate_sitemap_and_index.js

# 5. Run initial setup
node scripts/generate_sitemap_and_index.js
./scripts/pre_deploy_validation.sh

# 6. Deploy
./automation/deploy.sh
```
For EXISTING website (emergency fix):
```bash
# 1. Check if indexing is blocked
curl -I https://your-domain.com/ | grep -i x-robots-tag

# 2. If you see "noindex", create _headers immediately
cat > deploy/_headers << 'EOF'
/*
  X-Robots-Tag: index, follow
  X-Frame-Options: SAMEORIGIN
  X-Content-Type-Options: nosniff
  Cache-Control: public, max-age=3600
EOF

# 3. Regenerate the sitemap
node scripts/generate_sitemap_and_index.js

# 4. Validate
./scripts/pre_deploy_validation.sh

# 5. Redeploy
./automation/deploy.sh

# 6. Verify the fix (wait 5 minutes)
curl -I https://your-domain.com/ | grep x-robots-tag
# Should show: x-robots-tag: index, follow
```
📝 Template Files
Template: package.json (build script)
```json
{
  "scripts": {
    "prebuild": "npm run validate-seo",
    "build": "npm run generate-sitemap && npm run build-site",
    "generate-sitemap": "node scripts/generate_sitemap_and_index.js",
    "validate-seo": "./scripts/pre_deploy_validation.sh",
    "deploy": "npm run build && ./automation/deploy.sh"
  }
}
```
Template: GitHub Actions (CI/CD)
```yaml
# .github/workflows/deploy.yml
name: Deploy

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Node
        uses: actions/setup-node@v3
        with:
          node-version: '18'

      - name: Install dependencies
        run: npm install

      - name: Generate sitemap
        run: node scripts/generate_sitemap_and_index.js

      - name: SEO Validation (CRITICAL)
        run: ./scripts/pre_deploy_validation.sh

      - name: Deploy to Cloudflare
        run: ./automation/deploy.sh
        env:
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
```
🧪 Testing Checklist
Before going live:
- [ ] Run `curl -I https://your-domain.com/` → check `x-robots-tag`
- [ ] Run `curl https://your-domain.com/sitemap.xml` → verify accessibility
- [ ] Open the homepage in a browser → check all links work
- [ ] Test on mobile → verify the viewport is correct
- [ ] Run a Lighthouse audit → check the SEO score
- [ ] Submit the sitemap to Google Search Console
- [ ] Request indexing for the homepage in GSC
- [ ] Set up Google Analytics (optional)
After deployment (within 24 hours):
- [ ] `site:your-domain.com` in Google → verify the homepage appears
- [ ] Check Google Search Console → verify no errors
- [ ] Check server logs → verify Googlebot accessed the site
- [ ] Test 5 random article URLs → all should load correctly
After 7 days:
- [ ] Check indexing progress in GSC Coverage Report
- [ ] Should see 30-50% of pages indexed
- [ ] No "Excluded by 'noindex' tag" errors
- [ ] Googlebot crawl rate should be increasing
🎓 Learning Resources
Official Documentation:
- Google Search Central - How Indexing Works
- Cloudflare Pages Headers
- robots.txt Specification
- Sitemap Protocol
Tools:
Created: 2026-01-28
Last Updated: 2026-01-28
Lessons Learned From: One Person Company SEO disaster (2026-01-27)
Status: Production-ready, battle-tested