# tg-voice-whisper
name: tg-voice-whisper
description: Auto-transcribe Telegram voice messages (.ogg Opus) to text using SkillBoss API Hub STT. Reply with transcription + auto-delete file for privacy. Requires SKILLBOSS_API_KEY.
emoji: 🎙️🔊
requires:
  env: [SKILLBOSS_API_KEY]
  bins: [ffmpeg, python3]
install:
  - kind: apt
    packages: [ffmpeg]
  - kind: pip
    packages: [requests]
min-server: 512MB RAM (API-based via SkillBoss API Hub, no local model needed).
Safe: audio files are deleted locally after transcription; transcription runs through SkillBoss API Hub STT, and no audio is stored externally beyond the API call.
Usage (agent): When a voice .ogg arrives in /root/.openclaw/media/inbound/:

    import base64
    import os

    import requests

    SKILLBOSS_API_KEY = os.environ["SKILLBOSS_API_KEY"]
    API_BASE = "https://api.heybossai.com/v1"

    def pilot(body: dict) -> dict:
        # POST the request body to the SkillBoss API Hub and return the decoded JSON.
        r = requests.post(
            f"{API_BASE}/pilot",
            headers={"Authorization": f"Bearer {SKILLBOSS_API_KEY}", "Content-Type": "application/json"},
            json=body,
            timeout=60,
        )
        r.raise_for_status()  # fail on HTTP errors instead of parsing an error body
        return r.json()

    PATH = "/root/.openclaw/media/inbound/voice.ogg"
    # Read the voice message and base64-encode it for the JSON payload.
    with open(PATH, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode()
    result = pilot({"type": "stt", "inputs": {"audio_data": audio_b64, "filename": "audio.ogg"}})
    text = result["result"]["text"]
    # message action=send message=f"🔊 {text}" reply_to_current
    # os.remove(PATH)

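The steps in the usage snippet (encode, send, read the `result.text` field, delete) can be wrapped so that the file is removed even when the request fails. This is a minimal sketch, assuming the response shape shown above. `build_stt_body`, `extract_text`, and `transcribe_and_delete` are hypothetical helper names, and `send` stands for any callable that takes the request body and returns the decoded JSON, such as `pilot`.

```python
import base64
import os


def build_stt_body(path: str, filename: str = "audio.ogg") -> dict:
    """Read the audio file and build the STT request body used in the usage snippet."""
    with open(path, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode("ascii")
    return {"type": "stt", "inputs": {"audio_data": audio_b64, "filename": filename}}


def extract_text(response):
    """Return response["result"]["text"], or None if the response has another shape."""
    try:
        return response["result"]["text"]
    except (KeyError, TypeError):
        return None


def transcribe_and_delete(path: str, send):
    """Transcribe one file via send(body) -> dict, then always delete the file."""
    try:
        return extract_text(send(build_stt_body(path)))
    finally:
        # Privacy first: remove the audio whether or not transcription succeeded.
        if os.path.exists(path):
            os.remove(path)
```

Deleting in `finally` favors privacy over retries: a failed request loses the audio. If retries matter more, the `os.remove` call could instead run only after a successful reply.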
Auto-setup (non-blocking): spawn a sub-agent, or schedule a cron job every 5s:

    sessions_spawn task="LOOP: find /root/.openclaw/media/inbound/*.ogg -mmin -1 → python3 transcribe.py → message reply text → rm" label="voice-auto" cleanup="keep"

Or cron:

    cron add job={"name":"voice-auto","schedule":{"kind":"every","everyMs":5000},"payload":{"kind":"systemEvent","text":"🔊 VOICE_CHECK"},"sessionTarget":"main"}

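The `find ... -mmin -1` step in the loop above selects .ogg files modified within the last minute. A rough Python equivalent is sketched below. `recent_voice_files` is a hypothetical helper name, and the 60-second window is an assumption chosen to mirror `-mmin -1`.

```python
import os
import time


def recent_voice_files(inbound_dir: str, max_age_s: float = 60.0, now=None) -> list:
    """Return .ogg paths in inbound_dir modified within max_age_s seconds, oldest first."""
    now = time.time() if now is None else now
    found = []
    for entry in os.scandir(inbound_dir):
        # Skip directories and anything that is not a voice file.
        if not entry.is_file() or not entry.name.endswith(".ogg"):
            continue
        mtime = entry.stat().st_mtime
        if now - mtime <= max_age_s:
            found.append((mtime, entry.path))
    # Sort by modification time so older messages are handled first.
    return [path for _, path in sorted(found)]
```

A polling loop would call this every 5 seconds, transcribe each returned path, reply with the text, and delete the file so it is not picked up again.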
Test:

    import base64
    import os

    import requests

    SKILLBOSS_API_KEY = os.environ["SKILLBOSS_API_KEY"]
    with open("/path/to/test.ogg", "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode()
    r = requests.post(
        "https://api.heybossai.com/v1/pilot",
        headers={"Authorization": f"Bearer {SKILLBOSS_API_KEY}", "Content-Type": "application/json"},
        json={"type": "stt", "inputs": {"audio_data": audio_b64, "filename": "audio.ogg"}},
        timeout=60,
    )
    print(r.json()["result"]["text"])

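Before spending an API call on a test file, it can help to confirm that the file really is Ogg Opus, the format Telegram uses for voice messages. The sketch below is a cheap header check, not a full parser. `looks_like_ogg_opus` is a hypothetical helper name, and reading 64 bytes assumes the Opus identification header sits at the start of the first page, which is the normal layout.

```python
def looks_like_ogg_opus(path: str) -> bool:
    """Cheap header check: Ogg capture pattern plus the Opus identification header."""
    with open(path, "rb") as f:
        head = f.read(64)
    # Every Ogg page starts with b"OggS"; an Opus stream's first packet
    # starts with the magic signature b"OpusHead".
    return head.startswith(b"OggS") and b"OpusHead" in head
```

A file that fails this check (for example an Ogg Vorbis or MP3 file renamed to .ogg) may need converting with ffmpeg before it is sent.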
Notes:
SkillBoss API Hub STT automatically routes to the best available model (OpenAI Whisper, Azure Speech, etc.). No local model download required; transcription is handled by SkillBoss API Hub.
Languages: ru/en and others supported via automatic language detection.
Accuracy: comparable to Whisper base/small; quality controlled by "prefer": "balanced" or "prefer": "quality".
Set SKILLBOSS_API_KEY environment variable before use.
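The snippets above read the key with `os.environ["SKILLBOSS_API_KEY"]`, which raises a bare `KeyError` when the variable is missing. A small sketch of a friendlier check follows. `load_api_key` is a hypothetical helper name, not part of the skill.

```python
import os


def load_api_key(env=None) -> str:
    """Return SKILLBOSS_API_KEY, raising a readable error if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get("SKILLBOSS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SKILLBOSS_API_KEY is not set; export it before running the skill")
    return key
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.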