

# deep-scraper

Skill: deep-scraper

## Overview

A high-performance engineering tool for deep web scraping. It uses a containerized Docker + Crawlee (Playwright) environment to penetrate protections on complex websites like YouTube and X/Twitter, providing "interception-level" raw data.

## Requirements

Docker: Must be installed and running on the host machine.
Image: Build the environment with the tag skillboss-crawlee.

Build command: `docker build -t skillboss-crawlee skills/deep-scraper/`

## Integration Guide

Copy the skills/deep-scraper directory into your skills/ folder. Keep the Dockerfile inside the skill directory so the deployment stays self-contained.

## Standard Interface (CLI)

`docker run -t --rm -v $(pwd)/skills/deep-scraper/assets:/usr/src/app/assets skillboss-crawlee node assets/main_handler.js [TARGET_URL]`

## Output Specification (JSON)

The scraping results are printed to stdout as a JSON string:

status: SUCCESS | PARTIAL | ERROR
type: TRANSCRIPT | DESCRIPTION | GENERIC
videoId: (For YouTube) The validated Video ID.
data: The core text content or transcript.
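
For callers that drive the skill from a script, the CLI call and the output fields above could be wrapped roughly as follows. This is a minimal sketch, not part of the skill: the helper names `build_command`, `parse_result`, and `scrape` are hypothetical, and it assumes that stdout contains only the JSON string and that `type` may be absent on some results (the documentation does not say either way).

```python
import json
import os
import subprocess

IMAGE = "skillboss-crawlee"  # image tag from the Requirements section
VALID_STATUS = {"SUCCESS", "PARTIAL", "ERROR"}
VALID_TYPES = {"TRANSCRIPT", "DESCRIPTION", "GENERIC"}


def build_command(target_url, assets_dir=None):
    """Return the argv list equivalent to the documented `docker run` call."""
    if assets_dir is None:
        # Mirrors $(pwd)/skills/deep-scraper/assets from the CLI line.
        assets_dir = os.path.join(os.getcwd(), "skills", "deep-scraper", "assets")
    return [
        "docker", "run", "-t", "--rm",
        "-v", f"{assets_dir}:/usr/src/app/assets",
        IMAGE, "node", "assets/main_handler.js", target_url,
    ]


def parse_result(stdout_text):
    """Parse the JSON string and check it against the documented value sets."""
    result = json.loads(stdout_text.strip())
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object")
    if result.get("status") not in VALID_STATUS:
        raise ValueError(f"unexpected status: {result.get('status')!r}")
    # Assumption: `type` is only validated when present.
    if "type" in result and result["type"] not in VALID_TYPES:
        raise ValueError(f"unexpected type: {result['type']!r}")
    return result


def scrape(target_url):
    """Run the container (requires Docker and the built image) and parse stdout."""
    proc = subprocess.run(
        build_command(target_url), capture_output=True, text=True, check=True
    )
    return parse_result(proc.stdout)
```

Passing the command as a list rather than a shell string avoids quoting problems when the target URL contains characters such as `&` or `?`.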

## Core Rules

ID Validation: All YouTube tasks MUST verify the Video ID to prevent cache contamination.

Privacy: Strictly forbidden from scraping password-protected or non-public personal information.
Alpha-Focused: Automatically strips ads and noise, delivering pure data optimized for LLM processing.
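
The ID Validation rule can also be enforced on the caller's side by comparing the `videoId` in the result with the ID taken from the requested URL. The sketch below is one possible check, not the logic inside `main_handler.js` (which is not shown here). The function names are hypothetical, and the 11-character `[A-Za-z0-9_-]` pattern reflects how YouTube IDs look in practice rather than a documented guarantee.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from this alphabet.
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url):
    """Return the video ID found in a YouTube URL, or None if there is none."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    candidate = ""
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        else:
            # Path-style links: /embed/<id>, /shorts/<id>, /live/<id>
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in {"embed", "shorts", "live"}:
                candidate = parts[1]
    return candidate if VIDEO_ID_RE.fullmatch(candidate) else None


def id_matches(requested_url, result):
    """True only if the result's videoId equals the ID in the requested URL."""
    expected = extract_video_id(requested_url)
    return expected is not None and result.get("videoId") == expected
```

A caller would typically discard (and not cache) any result for which `id_matches` returns `False`, since a mismatch suggests the data belongs to a different video.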
