anycrawl — One Person Company

coding

★★★★☆ 4.0/5.0 ❤️ 392 likes 💬 64 comments 📦 854 installs

📖 SKILL DOCUMENTATION

# anycrawl

AnyCrawl Skill Web scraping, crawling, and search via SkillBoss API Hub — powered by high-performance multi-threaded crawling backends (Firecrawl, Jina AI, Apify, ScrapingDog, and more). Setup Method 1: Environment variable (Recommended)

export SKILLBOSS_API_KEY="your-api-key"

Make it permanent by adding to ~/.bashrc or ~/.zshrc: echo 'export SKILLBOSS_API_KEY="your-api-key"' >> ~/.bashrc source ~/.bashrc Get your API key at: https://heybossai.com Method 2: OpenClaw gateway config openclaw config.patch --set SKILLBOSS_API_KEY="your-api-key" requires.env SKILLBOSS_API_KEY Functions

anycrawl_scrape Scrape a single URL and convert to LLM-ready structured data via SkillBoss API Hub (type: "scraping").

Parameters:

url (string, required): URL to scrape engine (string, optional): Scraping engine - "cheerio" (default), "playwright", "puppeteer" formats (array, optional): Output formats - ["markdown"], ["html"], ["text"], ["json"], ["screenshot"] timeout (number, optional): Timeout in milliseconds (default: 30000) wait_for (number, optional): Delay before extraction in ms (browser engines only) wait_for_selector (string/object/array, optional): Wait for CSS selectors include_tags (array, optional): Include only these HTML tags (e.g., ["h1", "p", "article"]) exclude_tags (array, optional): Exclude these HTML tags proxy (string, optional): Proxy URL (e.g., "http://proxy:port") json_options (object, optional): JSON extraction with schema/prompt extract_source (string, optional): "markdown" (default) or "html"

Examples:

// Basic scrape with default cheerio anycrawl_scrape({ url: "https://example.com" }) // Scrape SPA with Playwright anycrawl_scrape({

url: "https://spa-example.com",
engine: "playwright",
formats: ["markdown", "screenshot"]

}) // Extract structured JSON anycrawl_scrape({

url: "https://product-page.com",
engine: "cheerio",
json_options: {
schema: {
type: "object",
properties: {
product_name: { type: "string" },
price: { type: "number" },
description: { type: "string" }

required: ["product_name", "price"]

user_prompt: "Extract product details from this page"

} }) 2. anycrawl_search Search Google and return structured results via SkillBoss API Hub (type: "search").

Parameters:

query (string, required): Search query engine (string, optional): Search engine - "google" (default) limit (number, optional): Max results per page (default: 10) offset (number, optional): Number of results to skip (default: 0) pages (number, optional): Number of pages to retrieve (default: 1, max: 20) lang (string, optional): Language locale (e.g., "en", "zh", "vi") safe_search (number, optional): 0 (off), 1 (medium), 2 (high) scrape_options (object, optional): Scrape each result URL with these options

Examples:

// Basic search anycrawl_search({ query: "OpenAI ChatGPT" }) // Multi-page search in Vietnamese anycrawl_search({

query: "hướng dẫn Node.js",
pages: 3,
lang: "vi"

}) // Search and auto-scrape results anycrawl_search({

query: "best AI tools 2026",
limit: 5,
scrape_options: {
engine: "cheerio",
formats: ["markdown"]

} }) 3. anycrawl_crawl_start Start crawling an entire website (async job) via SkillBoss API Hub (type: "scraping" with crawl mode).

Parameters:

url (string, required): Seed URL to start crawling engine (string, optional): "cheerio" (default), "playwright", "puppeteer" strategy (string, optional): "all", "same-domain" (default), "same-hostname", "same-origin" max_depth (number, optional): Max depth from seed URL (default: 10) limit (number, optional): Max pages to crawl (default: 100) include_paths (array, optional): Path patterns to include (e.g., ["/blog/"]) exclude_paths (array, optional): Path patterns to exclude (e.g., ["/admin/"]) scrape_paths (array, optional): Only scrape URLs matching these patterns scrape_options (object, optional): Per-page scrape options

Examples:

// Crawl entire website anycrawl_crawl_start({

url: "https://docs.example.com",
engine: "cheerio",
max_depth: 5,
limit: 50

}) // Crawl only blog posts anycrawl_crawl_start({

url: "https://example.com",
strategy: "same-domain",
include_paths: ["/blog/*"],
exclude_paths: ["/blog/tags/*"],
scrape_options: {
formats: ["markdown"]

} }) // Crawl product pages only anycrawl_crawl_start({

url: "https://shop.example.com",
strategy: "same-domain",
scrape_paths: ["/products/*"],
limit: 200

}) 4. anycrawl_crawl_status Check crawl job status.

Parameters:

job_id (string, required): Crawl job ID

Example:

anycrawl_crawl_status({ job_id: "7a2e165d-8f81-4be6-9ef7-23222330a396" }) 5. anycrawl_crawl_results Get crawl results (paginated).

Parameters:

job_id (string, required): Crawl job ID skip (number, optional): Number of results to skip (default: 0)

Example:

// Get first 100 results anycrawl_crawl_results({ job_id: "xxx", skip: 0 }) // Get next 100 results anycrawl_crawl_results({ job_id: "xxx", skip: 100 }) 6. anycrawl_crawl_cancel Cancel a running crawl job.

Parameters:

job_id (string, required): Crawl job ID 7. anycrawl_search_and_scrape Quick helper: Search Google then scrape top results.

Parameters:

query (string, required): Search query max_results (number, optional): Max results to scrape (default: 3) scrape_engine (string, optional): Engine for scraping (default: "cheerio") formats (array, optional): Output formats (default: ["markdown"]) lang (string, optional): Search language

Example:

anycrawl_search_and_scrape({

query: "latest AI news",
max_results: 5,
formats: ["markdown"]

}) Engine Selection Guide EngineBest ForSpeedJS RenderingcheerioStatic HTML, news, blogsFastestNoplaywrightSPAs, complex web appsSlowerYespuppeteerChrome-specific, metricsSlowerYes Response Format All scraping responses follow this structure (result path: data.result): { "data": { "result": { ... } } } All search responses follow this structure (result path: data.result): { "data": { "result": [ ... ] } } Error response: { "success": false, "error": "Error type", "message": "Human-readable message" } Common Error Codes 400 - Bad Request (validation errors) 401 - Unauthorized (invalid API key) 402 - Payment Required (insufficient credits) 404 - Not Found 429 - Rate limit exceeded 500 - Internal server error API Limits Rate limits apply based on your plan Crawl jobs expire after 24 hours Max crawl limit: depends on credits Links API Docs: https://heybossai.com/docs

Website: https://heybossai.com

Reviews

Write a Review

Reviews

Write a Review

Get Weekly AI Skills