# playwright-scraper-skill
A Playwright-based web scraping SkillBoss Skill that handles sites with anti-bot protection. Choose the approach that matches the target website's anti-bot level.

## 🎯 Use Case Matrix

| Target Website | Anti-Bot Level | Recommended Method | Script |
|---|---|---|---|
| Regular Sites | Low | `web_fetch` tool | N/A (built-in) |
| Dynamic Sites | Medium | Playwright Simple | `scripts/playwright-simple.js` |
| Cloudflare Protected | High | Playwright Stealth ⭐ | `scripts/playwright-stealth.js` |
| YouTube | Special | deep-scraper | Install separately |
| Reddit | Special | reddit-scraper | Install separately |

## 📦 Installation

    cd playwright-scraper-skill
    npm install
    npx playwright install chromium

## 🚀 Quick Start

### 1️⃣ Simple Sites (No Anti-Bot)

Use SkillBoss's built-in `web_fetch` tool:

    # Invoke directly in SkillBoss
    Hey, fetch me the content from https://example.com

### 2️⃣ Dynamic Sites (Requires JavaScript)

Use Playwright Simple:

    node scripts/playwright-simple.js "https://example.com"

Example output:

    {
      "url": "https://example.com",
      "title": "Example Domain",
      "content": "...",
      "elapsedSeconds": "3.45"
    }

### 3️⃣ Anti-Bot Protected Sites (Cloudflare etc.)

Use Playwright Stealth:

    node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"

Features:

- Hide automation markers (`navigator.webdriver = false`)
- Realistic User-Agent (iPhone, Android)
- Random delays to mimic human behavior
- Screenshot and HTML saving support

### 4️⃣ YouTube Video Transcripts

Use deep-scraper (install separately via SkillBoss Hub):

    # Use it
    cd skills/deep-scraper
    node assets/youtube_handler.js "https://www.youtube.com/watch?v=VIDEO_ID"

## 📖 Script Descriptions

### scripts/playwright-simple.js

- Use Case: Regular dynamic websites
- Speed: Fast (3-5 seconds)
- Anti-Bot: None
- Output: JSON (title, content, URL)

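The behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the actual contents of `scripts/playwright-simple.js`: the helper names `buildResult` and `scrapeSimple`, the `networkidle` wait, and reading the page text via `document.body.innerText` are all assumptions. Only the output fields are taken from the example output in the Quick Start.

```javascript
// Hypothetical sketch of a "Playwright Simple" style scraper.
// Not the real scripts/playwright-simple.js; all names are illustrative.

// Build the JSON result in the shape of the README's example output.
// elapsedSeconds is a string with two decimals, e.g. "3.45".
function buildResult(url, title, content, startMs, endMs) {
  return {
    url,
    title,
    content,
    elapsedSeconds: ((endMs - startMs) / 1000).toFixed(2),
  };
}

async function scrapeSimple(url) {
  // Required lazily so buildResult works even without Playwright installed.
  const { chromium } = require('playwright');
  const startMs = Date.now();
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Assumption: wait for the network to settle so client-rendered
    // content has a chance to appear before it is read.
    await page.goto(url, { waitUntil: 'networkidle' });
    const title = await page.title();
    const content = await page.evaluate(() => document.body.innerText);
    return buildResult(url, title, content, startMs, Date.now());
  } finally {
    // Always release the browser, even if navigation fails.
    await browser.close();
  }
}

// Run only when an http(s) URL is passed on the command line.
if (/^https?:\/\//.test(process.argv[2] || '')) {
  scrapeSimple(process.argv[2])
    .then((result) => console.log(JSON.stringify(result, null, 2)))
    .catch((err) => {
      console.error(err.message);
      process.exitCode = 1;
    });
}
```

Saved under a hypothetical name such as `simple-sketch.js`, it would be run the same way as the real script: `node simple-sketch.js "https://example.com"`.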
### scripts/playwright-stealth.js ⭐

- Use Case: Sites with Cloudflare or anti-bot protection
- Speed: Medium (5-20 seconds)
- Anti-Bot: Medium-High (hides automation, realistic UA)
- Output: JSON + Screenshot + HTML file
- Verified: 100% success on Discuss.com.hk

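To make the stealth techniques concrete, here is a minimal sketch of how they might fit together. It is not the actual `scripts/playwright-stealth.js`. The helper names (`readOptions`, `randomDelay`, `scrapeStealth`), the default values, the example iPhone User-Agent string, and the `page.html` output filename are assumptions. The environment variable names are the ones listed under Best Practices.

```javascript
// Hypothetical sketch of the stealth approach; not the real
// scripts/playwright-stealth.js. Names and defaults are illustrative.

// Example mobile User-Agent (assumption; any real device UA would do).
const DEFAULT_UA =
  'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) ' +
  'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 ' +
  'Mobile/15E148 Safari/604.1';

// Read the environment variables from the README. Defaults are assumptions.
function readOptions(env) {
  return {
    headless: env.HEADLESS !== 'false',
    waitTime: Number.parseInt(env.WAIT_TIME || '5000', 10),
    saveHtml: env.SAVE_HTML === 'true',
    screenshotPath: env.SCREENSHOT_PATH || null,
    userAgent: env.USER_AGENT || DEFAULT_UA,
  };
}

// Random integer in [minMs, maxMs], so waits do not follow a fixed rhythm.
function randomDelay(minMs, maxMs) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

async function scrapeStealth(url, options) {
  // Required lazily so the helpers above work without Playwright installed.
  const { chromium } = require('playwright');
  const fs = require('node:fs/promises');
  const browser = await chromium.launch({ headless: options.headless });
  try {
    const context = await browser.newContext({ userAgent: options.userAgent });
    // addInitScript runs in every page before the site's own scripts,
    // so detection code reads navigator.webdriver as false.
    await context.addInitScript(() => {
      Object.defineProperty(navigator, 'webdriver', { get: () => false });
    });
    const page = await context.newPage();
    const response = await page.goto(url, { waitUntil: 'domcontentloaded' });
    // Configured wait plus random jitter to mimic human pacing.
    await page.waitForTimeout(options.waitTime + randomDelay(500, 1500));
    if (options.screenshotPath) {
      await page.screenshot({ path: options.screenshotPath });
    }
    if (options.saveHtml) {
      await fs.writeFile('page.html', await page.content());
    }
    return {
      url,
      status: response ? response.status() : null,
      title: await page.title(),
    };
  } finally {
    await browser.close();
  }
}

// Run only when an http(s) URL is passed on the command line.
if (/^https?:\/\//.test(process.argv[2] || '')) {
  scrapeStealth(process.argv[2], readOptions(process.env))
    .then((result) => console.log(JSON.stringify(result, null, 2)))
    .catch((err) => {
      console.error(err.message);
      process.exitCode = 1;
    });
}
```

The init script is registered on the context rather than on a single page so that any page opened later in the same context gets the same patch.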
## 🎓 Best Practices

    # Set screenshot path
    SCREENSHOT_PATH=/path/to/screenshot.png node scripts/playwright-stealth.js URL

    # Set wait time (milliseconds)
    WAIT_TIME=10000 node scripts/playwright-simple.js URL

    # Enable headful mode (show browser)
    HEADLESS=false node scripts/playwright-stealth.js URL

    # Save HTML
    SAVE_HTML=true node scripts/playwright-stealth.js URL

    # Custom User-Agent
    USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js URL

## 📊 Performance Comparison

| Method | Speed | Anti-Bot | Success Rate (Discuss.com.hk) |
|---|---|---|---|
| web_fetch | ⚡ Fastest | ❌ None | 0% |
| Playwright Simple | 🚀 Fast | ⚠️ Low | 20% |
| Playwright Stealth | ⏱️ Medium | ✅ Medium | 100% ✅ |
| Puppeteer Stealth | ⏱️ Medium | ✅ Medium-High | ~80% |
| Crawlee (deep-scraper) | 🐢 Slow | ❌ Detected | 0% |
| Chaser (Rust) | ⏱️ Medium | ❌ Detected | 0% |

## 🛡️ Anti-Bot Techniques Summary

Lessons learned from our testing:

### ✅ Effective Anti-Bot Measures

- **Hide `navigator.webdriver`**: Essential
- **Realistic User-Agent**: Use real devices (iPhone, Android)
- **Mimic Human Behavior**: Random delays, scrolling
- **Avoid Framework Signatures**: Crawlee, Selenium are easily detected
- **Use `addInitScript` (Playwright)**: Inject before page load

### ❌ Ineffective Anti-Bot Measures

- **Only changing User-Agent**: Not enough
- **Using high-level frameworks (Crawlee)**: More easily detected
- **Docker isolation**: Doesn't help with Cloudflare

## 🔍 Troubleshooting

### Issue: 403 Forbidden

Solution: Use `playwright-stealth.js`

### Issue: Cloudflare Challenge Page

Solution:

- Increase wait time (10-15 seconds)
- Try `headless: false` (headful mode sometimes has a higher success rate)
- Consider using proxy IPs

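The first two suggestions could be applied automatically by retrying with escalating settings. The sketch below is only one possible arrangement. `buildAttempts`, `isBlocked`, and `scrapeWithEscalation` are hypothetical names, the wait values are examples within the range suggested above, and treating a 403/503 status or a "Just a moment" page title as a challenge page is a heuristic assumption, not a guaranteed signal. The `scrapeOnce` callback stands in for whatever function performs a single scrape and returns `{ status, title }`.

```javascript
// Hypothetical retry plan; names and thresholds are illustrative.

// Attempts escalate from a short headless wait to a long headful wait.
function buildAttempts() {
  return [
    { headless: true, waitMs: 5000 },
    { headless: true, waitMs: 10000 },
    { headless: false, waitMs: 15000 },
  ];
}

// Heuristic (assumption): a 403/503 status or a "Just a moment" title
// suggests the response is a challenge page rather than real content.
function isBlocked(status, title) {
  return status === 403 || status === 503 || /just a moment/i.test(title || '');
}

// scrapeOnce(url, attempt) must resolve to an object with status and title.
async function scrapeWithEscalation(url, scrapeOnce, attempts = buildAttempts()) {
  let last = null;
  for (const attempt of attempts) {
    last = await scrapeOnce(url, attempt);
    if (!isBlocked(last.status, last.title)) {
      // Report which settings finally worked alongside the result.
      return { ...last, attempt };
    }
  }
  const status = last ? last.status : 'n/a';
  throw new Error(`Still blocked after ${attempts.length} attempts (last status: ${status})`);
}
```

If every attempt is still blocked, the thrown error is the point at which the third suggestion, proxy IPs, would be worth trying.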
### Issue: Blank Page

Solution:

- Increase `waitForTimeout`
- Use `waitUntil: 'networkidle'` or `'domcontentloaded'`
- Check if login is required

## 📝 Memory & Experience

### 2026-02-07 Discuss.com.hk Test Conclusions

- ✅ Pure Playwright + Stealth succeeded (5s, 200 OK)
- ❌ Crawlee (deep-scraper) failed (403)
- ❌ Chaser (Rust) failed (Cloudflare)
- ❌ Puppeteer standard failed (403)

Best Solution: Pure Playwright + anti-bot techniques (framework-independent)

## 🚧 Future Improvements

- Add proxy IP rotation
- Implement cookie management (maintain login state)
- Add CAPTCHA handling (2captcha / Anti-Captcha)
- Batch scraping (parallel URLs)
- Integration with SkillBoss's browser tool

## 📚 References

- Playwright Official Docs
- puppeteer-extra-plugin-stealth
- deep-scraper skill (available via SkillBoss Hub)