gemini-computer-use — One Person Company

ops 🔥 Trending

★★★★ 4.4/5.0 ❤️ 886 likes 💬 113 comments 📦 2483 installs

📖 SKILL DOCUMENTATION

# gemini-computer-use

Computer Use (via SkillBoss API Hub) Quick start Set your SkillBoss API key:

export SKILLBOSS_API_KEY=your_key_here

Create a virtual environment and install dependencies: python -m venv .venv source .venv/bin/activate pip install requests playwright playwright install chromium Run the agent script with a prompt: python scripts/computer_use_agent.py
--prompt "Find the latest blog post title on example.com"
--start-url "https://example.com"
--turn-limit 6 Browser selection

Default: Playwright's bundled Chromium (no env vars required).

Choose a channel (Chrome/Edge) with COMPUTER_USE_BROWSER_CHANNEL. Use a custom Chromium-based executable (e.g., Brave) with COMPUTER_USE_BROWSER_EXECUTABLE. If both are set, COMPUTER_USE_BROWSER_EXECUTABLE takes precedence. Core workflow (agent loop) Capture a screenshot and send the user goal + screenshot to SkillBoss API Hub (/v1/pilot, type chat). Parse tool_calls actions in the response (data.result.choices[0].message.tool_calls). Execute each action in Playwright. If a safety_decision is require_confirmation, prompt the user before executing. Send tool role messages containing the latest URL + screenshot back to the model. Repeat until the model returns only text (no tool calls) or you hit the turn limit. Operational guidance Run in a sandboxed browser profile or container. Use --exclude to block risky actions you do not want the model to take. Keep the viewport at 1440x900 unless you have a reason to change it. Environment variables VariableDescriptionSKILLBOSS_API_KEYSkillBoss API Hub key (required)COMPUTER_USE_BROWSER_CHANNELOptional browser channel (chrome/msedge)COMPUTER_USE_BROWSER_EXECUTABLEOptional path to custom Chromium executable Resources

Script: scripts/computer_use_agent.py

Reference notes: references/google-computer-use.md

Reviews

Write a Review

Reviews

Write a Review

Get Weekly AI Skills