mlx-whisper — One Person Company


# mlx-whisper

MLX Whisper (via SkillBoss API Hub): speech-to-text powered by SkillBoss API Hub, which automatically routes to the best available STT model. No Apple Silicon or local model required.

## Quick Start

```python
import base64
import os

import requests

SKILLBOSS_API_KEY = os.environ["SKILLBOSS_API_KEY"]
API_BASE = "https://api.heybossai.com/v1"


def pilot(body: dict) -> dict:
    # POST the request body to the /pilot endpoint and return the parsed JSON.
    r = requests.post(
        f"{API_BASE}/pilot",
        headers={"Authorization": f"Bearer {SKILLBOSS_API_KEY}", "Content-Type": "application/json"},
        json=body,
        timeout=60,
    )
    return r.json()


# Transcribe an audio file
audio_b64 = base64.b64encode(open("audio.mp3", "rb").read()).decode()
result = pilot({"type": "stt", "inputs": {"audio_data": audio_b64, "filename": "audio.mp3"}})
text = result["result"]["text"]
print(text)
```

## Common Usage

```python
# Transcribe with language hint
audio_b64 = base64.b64encode(open("audio.mp3", "rb").read()).decode()
result = pilot({
    "type": "stt",
    "inputs": {"audio_data": audio_b64, "filename": "audio.mp3", "language": "en"}
})
text = result["result"]["text"]
```

```python
# Translate audio to English
audio_b64 = base64.b64encode(open("foreign.mp3", "rb").read()).decode()
result = pilot({
    "type": "stt",
    "inputs": {"audio_data": audio_b64, "filename": "foreign.mp3", "task": "translate"}
})
text = result["result"]["text"]
```
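The transcription and translation requests differ only in the optional `language` and `task` keys of `inputs`. The following is a minimal sketch of a helper that builds the request body from raw audio bytes. `build_stt_body` is a hypothetical name, not part of the SkillBoss API, and it assumes only the field names that appear in the examples in this document.

```python
import base64
from typing import Optional


def build_stt_body(
    audio: bytes,
    filename: str,
    language: Optional[str] = None,
    task: Optional[str] = None,
) -> dict:
    """Build an STT request body (hypothetical helper mirroring the documented examples)."""
    inputs = {
        "audio_data": base64.b64encode(audio).decode("ascii"),
        "filename": filename,
    }
    # Include the optional keys only when the caller sets them.
    if language is not None:
        inputs["language"] = language
    if task is not None:
        inputs["task"] = task
    return {"type": "stt", "inputs": inputs}
```

With the `pilot` function from Quick Start, a translation request could then be written as `pilot(build_stt_body(data, "foreign.mp3", task="translate"))`, where `data` holds the file's bytes.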

```python
# Transcribe and summarize (Chain mode: STT -> Chat)
audio_b64 = base64.b64encode(open("audio.mp3", "rb").read()).decode()
result = pilot({
    "chain": [
        {"type": "stt", "inputs": {"audio_data": audio_b64, "filename": "audio.mp3"}},
        {"type": "chat", "capability": "summarize"},
    ]
})
```

## Response Format

| Field | Path |
|---|---|
| Transcription text | `result.text` |

## Models

SkillBoss API Hub automatically routes to the best available STT model (e.g., OpenAI Whisper, Azure Speech, Google STT). No manual model selection or local download required.

## Notes

- Requires the `SKILLBOSS_API_KEY` environment variable
- Audio file is sent as base64-encoded data
- Response text is at `result["result"]["text"]`
- Supports common audio formats: mp3, m4a, wav, mp4, etc.
- Works on any platform; no Apple Silicon required
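Because the transcription is nested at `result["result"]["text"]`, an unexpected response surfaces as a bare `KeyError` or `TypeError`. The following is a hedged sketch of a defensive accessor. `extract_text` is a hypothetical helper, and since the shape of error responses is not documented here, it checks only for the documented success path.

```python
def extract_text(response: dict) -> str:
    """Return the transcription from a parsed response, or raise ValueError."""
    result = response.get("result") if isinstance(response, dict) else None
    # The documented success shape is {"result": {"text": "..."}}.
    if not isinstance(result, dict) or not isinstance(result.get("text"), str):
        raise ValueError(f"Unexpected STT response: {response!r}")
    return result["text"]
```

Raising a `ValueError` that carries the full payload makes a failed call easier to diagnose than a missing-key traceback.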
