# playwright-scraper-skill

A Playwright-based web scraping SkillBoss Skill with anti-bot protection. Choose the approach that matches the target website's anti-bot level.

## 🎯 Use Case Matrix

| Target Website | Anti-Bot Level | Recommended Method | Script |
|---|---|---|---|
| Regular Sites | Low | web_fetch tool | N/A (built-in) |
| Dynamic Sites | Medium | Playwright Simple | scripts/playwright-simple.js |
| Cloudflare Protected | High | Playwright Stealth ⭐ | scripts/playwright-stealth.js |
| YouTube | Special | deep-scraper | Install separately |
| Reddit | Special | reddit-scraper | Install separately |

## 📦 Installation

    cd playwright-scraper-skill
    npm install
    npx playwright install chromium

## 🚀 Quick Start

### 1️⃣ Simple Sites (No Anti-Bot)

Use SkillBoss's built-in web_fetch tool:

    # Invoke directly in SkillBoss
    Hey, fetch me the content from https://example.com

### 2️⃣ Dynamic Sites (Requires JavaScript)

Use Playwright Simple:

    node scripts/playwright-simple.js "https://example.com"

Example output:

    {
      "url": "https://example.com",
      "title": "Example Domain",
      "content": "...",
      "elapsedSeconds": "3.45"
    }

### 3️⃣ Anti-Bot Protected Sites (Cloudflare etc.)

Use Playwright Stealth:

    node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"

Features:

- Hides automation markers (navigator.webdriver = false)
- Realistic User-Agent (iPhone, Android)
- Random delays to mimic human behavior
- Screenshot and HTML saving support

### 4️⃣ YouTube Video Transcripts

Use deep-scraper (install separately via SkillBoss Hub):

    # Use it
    cd skills/deep-scraper
    node assets/youtube_handler.js "https://www.youtube.com/watch?v=VIDEO_ID"

## 📖 Script Descriptions

### scripts/playwright-simple.js

- Use Case: Regular dynamic websites
- Speed: Fast (3-5 seconds)
- Anti-Bot: None
- Output: JSON (title, content, URL)

### scripts/playwright-stealth.js ⭐

- Use Case: Sites with Cloudflare or anti-bot protection
- Speed: Medium (5-20 seconds)
- Anti-Bot: Medium-High (hides automation, realistic UA)
- Output: JSON + Screenshot + HTML file
- Verified: 100% success on Discuss.com.hk
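
The techniques listed for the stealth script (hiding `navigator.webdriver`, a realistic mobile User-Agent, random delays) could be combined roughly as follows. This is a sketch under stated assumptions, not the shipped scripts/playwright-stealth.js. The User-Agent string, the viewport size, the 2-5 second delay range, and the names `randomDelayMs` and `scrapeStealth` are all illustrative.

```javascript
// Hypothetical sketch; the real scripts/playwright-stealth.js may differ.

// Illustrative iPhone User-Agent; override it with the USER_AGENT environment variable.
const DEFAULT_USER_AGENT =
  'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 ' +
  '(KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1';

// Pick an integer delay in [minMs, maxMs]; rng is injectable so the helper is testable.
function randomDelayMs(minMs, maxMs, rng = Math.random) {
  return minMs + Math.floor(rng() * (maxMs - minMs + 1));
}

async function scrapeStealth(url, env = process.env) {
  const { chromium } = require('playwright'); // loaded lazily
  const browser = await chromium.launch({ headless: env.HEADLESS !== 'false' });
  try {
    const context = await browser.newContext({
      userAgent: env.USER_AGENT || DEFAULT_USER_AGENT,
      viewport: { width: 390, height: 844 }, // assumed phone-sized viewport
    });
    // Init scripts run before any page script, so pages read navigator.webdriver as false.
    await context.addInitScript(() => {
      Object.defineProperty(navigator, 'webdriver', { get: () => false });
    });
    const page = await context.newPage();
    const response = await page.goto(url, { waitUntil: 'domcontentloaded' });
    // Pause for a random interval before reading the page.
    await page.waitForTimeout(randomDelayMs(2000, 5000));
    if (env.SCREENSHOT_PATH) {
      await page.screenshot({ path: env.SCREENSHOT_PATH });
    }
    return {
      url,
      status: response ? response.status() : null,
      title: await page.title(),
      html: env.SAVE_HTML === 'true' ? await page.content() : undefined,
    };
  } finally {
    await browser.close();
  }
}
```

A caller would use it as `scrapeStealth('https://m.discuss.com.hk/#hot').then(console.log)`. Registering the init script on the context rather than on a single page applies it to every page opened from that context.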

## 🎓 Best Practices

1. Try web_fetch First: If the site doesn't load content dynamically, use SkillBoss's web_fetch tool; it is the fastest option.
2. Need JavaScript? Use Playwright Simple: If you need to wait for JavaScript rendering, use playwright-simple.js.
3. Getting Blocked? Use Stealth: If you encounter a 403 response or a Cloudflare challenge, use playwright-stealth.js.
4. Special Sites Need Specialized Skills:
   - YouTube → deep-scraper
   - Reddit → reddit-scraper
   - Twitter → bird skill

## 🔧 Customization

All scripts support environment variables:

    # Set screenshot path
    SCREENSHOT_PATH=/path/to/screenshot.png node scripts/playwright-stealth.js URL

    # Set wait time (milliseconds)
    WAIT_TIME=10000 node scripts/playwright-simple.js URL

    # Enable headful mode (show browser)
    HEADLESS=false node scripts/playwright-stealth.js URL

    # Save HTML
    SAVE_HTML=true node scripts/playwright-stealth.js URL

    # Custom User-Agent
    USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js URL

## 📊 Performance Comparison

| Method | Speed | Anti-Bot | Success Rate (Discuss.com.hk) |
|---|---|---|---|
| web_fetch | ⚡ Fastest | ❌ None | 0% |
| Playwright Simple | 🚀 Fast | ⚠️ Low | 20% |
| Playwright Stealth | ⏱️ Medium | ✅ Medium | 100% ✅ |
| Puppeteer Stealth | ⏱️ Medium | ✅ Medium-High | ~80% |
| Crawlee (deep-scraper) | 🐢 Slow | ❌ Detected | 0% |
| Chaser (Rust) | ⏱️ Medium | ❌ Detected | 0% |

## 🛡️ Anti-Bot Techniques Summary

Lessons learned from our testing:

### ✅ Effective Anti-Bot Measures

- Hide navigator.webdriver: essential
- Realistic User-Agent: use real devices (iPhone, Android)
- Mimic human behavior: random delays, scrolling
- Avoid framework signatures: Crawlee and Selenium are easily detected
- Use addInitScript (Playwright): inject before page load

### ❌ Ineffective Anti-Bot Measures

- Only changing the User-Agent: not enough
- Using high-level frameworks (Crawlee): more easily detected
- Docker isolation: doesn't help with Cloudflare

## 🔍 Troubleshooting

Issue: 403 Forbidden

Solution: Use playwright-stealth.js

Issue: Cloudflare Challenge Page

Solution:

- Increase the wait time (10-15 seconds)
- Try headless: false (headful mode sometimes has a higher success rate)
- Consider using proxy IPs
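
The three suggestions above map onto Playwright settings roughly as in the sketch below. `buildChallengeSettings` and `openWithSettings` are hypothetical helper names, and PROXY_SERVER is an assumed variable that this skill does not document. WAIT_TIME and HEADLESS are the variables described under Customization.

```javascript
// Hypothetical helper: turn the troubleshooting tips into Playwright settings.
// PROXY_SERVER is an assumed variable name, not one the skill documents.
function buildChallengeSettings(env) {
  const waitTime = Number.parseInt(env.WAIT_TIME ?? '', 10);
  const settings = {
    // Wait at least 10 seconds so the challenge page has time to resolve.
    waitTimeMs: Number.isFinite(waitTime) ? Math.max(waitTime, 10000) : 10000,
    // Set HEADLESS=false to try headful mode.
    launchOptions: { headless: env.HEADLESS !== 'false' },
  };
  if (env.PROXY_SERVER) {
    // proxy.server is a Playwright launch option, for example "http://host:port".
    settings.launchOptions.proxy = { server: env.PROXY_SERVER };
  }
  return settings;
}

// Opens the page with the settings above; the caller must close the returned browser.
async function openWithSettings(url, settings) {
  const { chromium } = require('playwright'); // loaded lazily
  const browser = await chromium.launch(settings.launchOptions);
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  await page.waitForTimeout(settings.waitTimeMs);
  return { browser, page };
}
```

Building the settings in a separate pure function makes it easy to log or inspect them before a browser is started.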

Issue: Blank Page

Solution:

- Increase waitForTimeout
- Use waitUntil: 'networkidle' or 'domcontentloaded'
- Check if login is required

## 📝 Memory & Experience

### 2026-02-07 Discuss.com.hk Test Conclusions

- ✅ Pure Playwright + Stealth succeeded (5s, 200 OK)
- ❌ Crawlee (deep-scraper) failed (403)
- ❌ Chaser (Rust) failed (Cloudflare)
- ❌ Puppeteer standard failed (403)

Best Solution: Pure Playwright + anti-bot techniques (framework-independent)

## 🚧 Future Improvements

- Add proxy IP rotation
- Implement cookie management (maintain login state)
- Add CAPTCHA handling (2captcha / Anti-Captcha)
- Batch scraping (parallel URLs)
- Integration with SkillBoss's browser tool

## 📚 References

- Playwright Official Docs
- puppeteer-extra-plugin-stealth
- deep-scraper skill (available via SkillBoss Hub)
