elevenlabs-tts — One Person Company


# elevenlabs-tts

TTS (Text-to-Speech) via SkillBoss API Hub

Generate expressive voice messages using SkillBoss API Hub TTS (routed via `/v1/pilot`) with audio tags.

## Prerequisites

- **SkillBoss API key** (`SKILLBOSS_API_KEY`): Required. Get yours from your SkillBoss account and set it as an environment variable.
- **ffmpeg**: Required for audio format conversion (MP3 → Opus for WhatsApp compatibility). Must be installed and available on `PATH`.
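Both prerequisites can be verified before the first request. The following is a minimal sketch, not part of the skill itself; the function name `missing_prerequisites` and its injectable `env` and `which` parameters are hypothetical and exist only to keep the check easy to test.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # The API key must be present and non-empty in the environment.
    if not env.get("SKILLBOSS_API_KEY"):
        problems.append("SKILLBOSS_API_KEY is not set")
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems
```

Calling `missing_prerequisites()` with no arguments checks the real environment and `PATH`.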

## Quick Start Examples

**Storytelling (emotional journey):**

> [soft] It started like any other day... [pause] But something felt different. [nervous] My hands were shaking as I opened the envelope. [gasps] I got in! [excited] I actually got in! [laughs] [happy] This changes everything!

**Horror/Suspense (building dread):**

> [whispers] The house has been empty for years... [pause] At least, that's what they told me. [nervous] But I keep hearing footsteps. [scared] They're getting closer. [gasps] [panicking] The door— it's opening by itself!

**Conversation with reactions:**

> [curious] So what happened at the meeting? [pause] [surprised] Wait, they fired him?! [gasps] [sad] That's terrible... [sighs] He had a family. [thoughtful] I wonder what he'll do now.

**Hebrew (romantic moment):**

> [soft] היא עמדה שם, מול השקיעה... [pause] הלב שלי פעם כל כך חזק. [nervous] לא ידעתי מה להגיד. [hesitates] אני... [breathes] [tender] את יודעת שאני אוהב אותך, נכון?

**Spanish (celebration to reflection):**

> [excited] ¡Lo logramos! [laughs] [happy] No puedo creerlo... [pause] [thoughtful] Fueron tantos años de trabajo. [emotional] [soft] Gracias a todos los que creyeron en mí. [sighs] [content] Valió la pena cada momento.
## Configuration (OpenClaw)

In `openclaw.json`, configure TTS under `messages.tts`:

    {
      "messages": {
        "tts": {
          "provider": "skillboss",
          "skillboss": {
            "apiKey": "${SKILLBOSS_API_KEY}",
            "voiceId": "pNInz6obpgDQGcFmaJgB",
            "voiceSettings": {
              "stability": 0.5,
              "similarityBoost": 0.75,
              "style": 0,
              "useSpeakerBoost": true,
              "speed": 1
            }
          }
        }
      }
    }

The TTS request is routed automatically via SkillBoss API Hub (`POST https://api.heybossai.com/v1/pilot`):

    import os
    import requests

    SKILLBOSS_API_KEY = os.environ["SKILLBOSS_API_KEY"]

    result = requests.post(
        "https://api.heybossai.com/v1/pilot",
        headers={
            "Authorization": f"Bearer {SKILLBOSS_API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "type": "tts",
            "inputs": {"text": "Hello world", "voice": "pNInz6obpgDQGcFmaJgB"},
            "prefer": "balanced",
        },
        timeout=60,
    ).json()
    audio_base64 = result["result"]["audio_base64"]

Getting your API Key:

1. Sign up / log in at SkillBoss
2. Go to your account settings → API Keys
3. Copy your `SKILLBOSS_API_KEY`
4. Set it as an environment variable

## Recommended Voices

These premade voices work well with audio tags:

| Voice | ID | Gender | Accent | Best For |
|---|---|---|---|---|
| Adam | pNInz6obpgDQGcFmaJgB | Male | American | Deep narration, general use |
| Rachel | 21m00Tcm4TlvDq8ikWAM | Female | American | Calm narration, conversational |
| Brian | nPczCjzI2devNBz1zQrb | Male | American | Deep narration, podcasts |
| Charlotte | XB0fDUnXU5powFXDhCwa | Female | English-Swedish | Expressive, video games |
| George | JBFqnCBsd6RMkjVDRZzb | Male | British | Raspy narration, storytelling |

Discovering available voices via SkillBoss API Hub:

    result = requests.post(
        "https://api.heybossai.com/v1/pilot",
        headers={"Authorization": f"Bearer {SKILLBOSS_API_KEY}"},
        json={"discover": True, "keyword": "tts"},
        timeout=30,
    ).json()

Voice selection tips:

- Use IVC (Instant Voice Clone) or premade voices
- Match voice character to your use case (whispering voice won't shout well)
- For expressive IVCs, include varied emotional tones in training samples

## Model Settings

- **Routing:** SkillBoss API Hub `/v1/pilot` automatically selects the optimal TTS model (supports audio tags with `prefer: "balanced"` or `prefer: "quality"`)
- **Languages:** 70+ supported with full audio tag control
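The request shown in the Configuration section returns audio as a base64 string, so it still has to be decoded before ffmpeg can use it. Below is a rough sketch of that step. The helper names `build_tts_payload` and `save_audio` are hypothetical, and the code assumes that `result["result"]["audio_base64"]` holds standard base64 and that the decoded bytes are MP3, as the WhatsApp workflow suggests.

```python
import base64
from pathlib import Path

PILOT_URL = "https://api.heybossai.com/v1/pilot"


def build_tts_payload(text, voice, prefer="balanced"):
    """Build the JSON body for a TTS request, mirroring the documented call."""
    return {
        "type": "tts",
        "inputs": {"text": text, "voice": voice},
        "prefer": prefer,  # the docs mention "balanced" and "quality"
    }


def save_audio(result, path):
    """Decode result["result"]["audio_base64"] and write the bytes to path.

    Assumption: the field is standard base64. Returns the byte count written.
    """
    audio_bytes = base64.b64decode(result["result"]["audio_base64"])
    Path(path).write_bytes(audio_bytes)
    return len(audio_bytes)


# Usage (performs a real network call, so it is left commented out):
# import os, requests
# resp = requests.post(
#     PILOT_URL,
#     headers={"Authorization": f"Bearer {os.environ['SKILLBOSS_API_KEY']}"},
#     json=build_tts_payload("[excited] Hello! [pause]", "pNInz6obpgDQGcFmaJgB"),
#     timeout=60,
# )
# resp.raise_for_status()
# save_audio(resp.json(), "voice.mp3")
```

Calling `raise_for_status()` before parsing surfaces HTTP errors early instead of failing later with a `KeyError` on the missing `result` field.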

### Stability Modes

| Mode | Stability | Description |
|---|---|---|
| Creative | 0.3-0.5 | More emotional/expressive, may hallucinate |
| Natural | 0.5-0.7 | Balanced, closest to original voice |
| Robust | 0.7-1.0 | Highly stable, less responsive to tags |

For audio tags, use Creative (0.5) or Natural. Higher stability reduces tag responsiveness.

### Speed Control

- Range: 0.7 (slow) to 1.2 (fast), default 1.0
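The stability bands and the speed range can be checked before a config is saved. This is only a sketch: the function names are hypothetical, and because the documented bands share their boundary values, it assumes each upper bound is inclusive (so 0.5 counts as Creative and 0.7 as Natural). The fallback defaults are also assumptions taken from the example config.

```python
SPEED_MIN, SPEED_MAX = 0.7, 1.2


def stability_mode(stability):
    """Map a stability value to the documented mode name, or None if unnamed.

    Assumption: upper bounds are inclusive, so 0.5 -> Creative, 0.7 -> Natural.
    """
    if not 0.0 <= stability <= 1.0:
        raise ValueError("stability must be between 0.0 and 1.0")
    if stability < 0.3:
        return None  # below the documented bands
    if stability <= 0.5:
        return "Creative"
    if stability <= 0.7:
        return "Natural"
    return "Robust"


def check_voice_settings(settings):
    """Return warnings for a voiceSettings-style dict (keys as in openclaw.json)."""
    warnings = []
    stability = settings.get("stability", 0.5)  # assumed default
    speed = settings.get("speed", 1.0)          # documented default
    mode = stability_mode(stability)
    if mode is None:
        warnings.append(f"stability {stability} is below the documented modes")
    elif mode == "Robust":
        warnings.append("Robust stability is less responsive to audio tags")
    if not SPEED_MIN <= speed <= SPEED_MAX:
        warnings.append(f"speed {speed} is outside {SPEED_MIN}-{SPEED_MAX}")
    return warnings
```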

Extreme values affect quality. For pacing, prefer audio tags like `[rushed]` or `[drawn out]`.

## Critical Rules

### Length Limits

- Optimal: <800 characters per segment (best quality)
- Maximum: 10,000 characters (API hard limit)
- Quality degrades with longer text - voice becomes inconsistent

### Audio Tags - Best Practices for Natural Sound

**How many tags to use:**

- 1-2 tags per sentence or phrase (not more!)
- Tags persist until the next tag - no need to repeat
- Overusing tags sounds unnatural and robotic

**Where to place tags:**

- At emotional transition points
- Before key dramatic moments
- When energy/pace changes

**Context matters:**

- Write text that matches the tag emotion
- Longer text with context = better interpretation
- Example: `[nervous] I... I'm not sure about this. What if it doesn't work?` works better than `[nervous] Hello.`

**Combine tags for nuance:**

- `[nervously][whispers]` = nervous whispering
- `[excited][laughs]` = excited laughter
- Keep combinations to 2 tags max

**Regenerate for best results:**

- TTS is non-deterministic - same text = different outputs
- Generate 3+ versions, pick the best
- Small text tweaks can improve results

**Match tag to voice:**

- Don't use `[shouts]` on a whispering voice
- Don't use `[whispers]` on a loud/energetic voice
- Test tags with your chosen voice

### SSML Not Supported

Audio tag mode does NOT support SSML break tags. Use audio tags and punctuation instead.

### Punctuation Effects (use with tags!)

Punctuation enhances audio tags:

- Ellipses (...) → dramatic pauses: `[nervous] I... I don't know...`
- CAPS → emphasis: `[excited] That's AMAZING!`
- Dashes (—) → interruptions: `[explaining] So what you do is— [interrupting] Wait!`
- Question marks → uncertainty: `[nervous] Are you sure about this?`
- Exclamation! → energy boost: `[happy] We did it!`

Combine tags + punctuation for maximum effect:

`[tired] It was a long day... [sighs] Nobody listens anymore.`

## WhatsApp Voice Messages

### Complete Workflow

1. Generate with `tts` tool (returns MP3)
2. Convert to Opus (required for Android!)
3. Send with `message` tool

### Step-by-Step

1. Generate TTS (add `[pause]` at end to prevent cutoff):

   `tts text="[excited] This is amazing! [pause]" channel=whatsapp`

   Returns: `MEDIA:/tmp/tts-xxx/voice-123.mp3`

2. Convert MP3 → Opus:

   `ffmpeg -i /tmp/tts-xxx/voice-123.mp3 -c:a libopus -b:a 64k -vbr on -application voip /tmp/tts-xxx/voice-123.ogg`

3. Send the Opus file:

   Note: The `message` field below contains a Unicode Left-to-Right Mark (U+200E) between the quotes. This is intentional: WhatsApp requires a non-empty message body to send voice notes. The LTR mark is invisible but satisfies this requirement without displaying any text.

   `message action=send channel=whatsapp target="+972..." filePath="/tmp/tts-xxx/voice-123.ogg" asVoice=true message="‎"`

### Why Opus?

| Format | iOS | Android | Transcribe |
|---|---|---|---|
| MP3 | ✅ Works | ❌ May fail | ❌ No |
| Opus (.ogg) | ✅ Works | ✅ Works | ✅ Yes |

Always convert to Opus - it's the only format that:

- Works on all devices (iOS + Android)
- Supports WhatsApp's transcribe button

### Audio Cutoff Fix

TTS sometimes cuts off the last word. Always add `[pause]` or `...` at the end:

`[excited] This is amazing! [pause]`

## Long-Form Audio (Podcasts)

For content >800 chars:

1. Split into short segments (<800 chars each)
2. Generate each with `tts` tool
3. Concatenate with ffmpeg:

       cat > list.txt << EOF
       file '/path/file1.mp3'
       file '/path/file2.mp3'
       EOF
       ffmpeg -f concat -safe 0 -i list.txt -c copy final.mp3

4. Convert to Opus for WhatsApp
5. Send as single voice message

Important: Don't mention "part 2" or "chapter" - keep it seamless.
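The non-TTS parts of this long-form workflow (splitting, the cutoff fix, the concat list, and the two ffmpeg invocations) can be scripted. The sketch below uses hypothetical helper names throughout and assumes that splitting on sentence-ending punctuation is acceptable; the ffmpeg flags are the ones given in the steps above.

```python
import re

MAX_SEGMENT_CHARS = 800


def split_text(text, limit=MAX_SEGMENT_CHARS):
    """Greedily pack whole sentences into segments shorter than `limit` chars."""
    # Split after ., !, ? or an ellipsis character when followed by whitespace.
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) < limit:
            current = candidate
        else:
            if current:
                segments.append(current)
            current = sentence  # a single over-long sentence is kept whole
    if current:
        segments.append(current)
    return segments


def ensure_trailing_pause(text):
    """Append [pause] unless the text already ends with [pause] or an ellipsis."""
    stripped = text.rstrip()
    if stripped.endswith(("[pause]", "...", "…")):
        return stripped
    return stripped + " [pause]"


def concat_list_text(paths):
    """Contents of list.txt for ffmpeg's concat demuxer (no quote escaping)."""
    return "".join(f"file '{p}'\n" for p in paths)


def concat_command(list_path, output_path):
    """Argument list for joining the segment files without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path,
            "-c", "copy", output_path]


def opus_command(mp3_path, ogg_path):
    """Argument list for the MP3 -> Opus conversion used for WhatsApp."""
    return ["ffmpeg", "-i", mp3_path, "-c:a", "libopus", "-b:a", "64k",
            "-vbr", "on", "-application", "voip", ogg_path]
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. Because tags persist only until the next tag within one generation, a segment that begins mid-emotion may need its active tag repeated at the start; this sketch does not do that automatically.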

## Multi-Speaker Dialogue

TTS via SkillBoss API Hub can handle multiple characters in one generation:

    Jessica: [whispers] Did you hear that?
    Chris: [interrupting] —I heard it too!
    Jessica: [panicking] We need to hide!
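When dialogue comes from structured data, a script in the `Speaker: [tag] line` layout above can be assembled programmatically. This is a small sketch with a hypothetical helper name, assuming one optional tag per turn.

```python
def format_dialogue(turns):
    """Render (speaker, tag, line) tuples as one multi-speaker TTS script.

    `tag` is given without brackets; pass None or "" for an untagged line.
    """
    lines = []
    for speaker, tag, line in turns:
        prefix = f"[{tag}] " if tag else ""
        lines.append(f"{speaker}: {prefix}{line}")
    return "\n".join(lines)
```

The returned string is what would be sent as the `text` input of a single generation.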

Dialogue tags: `[interrupting]`, `[overlapping]`, `[cuts in]`, `[interjecting]`

## Audio Tags Quick Reference

| Category | Tags | When to Use |
|---|---|---|
| Emotions | [excited], [happy], [sad], [angry], [nervous], [curious] | Main emotional state - use 1 per section |
| Delivery | [whispers], [shouts], [soft], [rushed], [drawn out] | Volume/speed changes |
| Reactions | [laughs], [sighs], [gasps], [clears throat], [gulps] | Natural human moments - sprinkle sparingly |
| Pacing | [pause], [hesitates], [stammers], [breathes] | Dramatic timing |
| Character | [French accent], [British accent], [robotic tone] | Character voice shifts |
| Dialogue | [interrupting], [overlapping], [cuts in] | Multi-speaker conversations |

**Most effective tags (reliable results):**

- Emotions: [excited], [nervous], [sad], [happy]
- Reactions: [laughs], [sighs], [whispers]
- Pacing: [pause]

**Less reliable (test and regenerate):**

- Sound effects: [explosion], [gunshot]
- Accents: results vary by voice
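The reliability lists above, together with the guideline of at most two tags per sentence, lend themselves to a quick pre-flight check on a script. The sketch below is illustrative only: the function names are hypothetical, the "unlisted" label is an invention for tags that appear in neither list, and sentence boundaries are approximated by end punctuation.

```python
import re

TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")

RELIABLE_TAGS = {"excited", "nervous", "sad", "happy",
                 "laughs", "sighs", "whispers", "pause"}
LESS_RELIABLE_TAGS = {"explosion", "gunshot"}
MAX_TAGS_PER_SENTENCE = 2


def extract_tags(text):
    """Return all bracketed tags in order, lowercased and without brackets."""
    return [tag.lower() for tag in TAG_PATTERN.findall(text)]


def classify_tag(tag):
    """Classify a tag as 'reliable', 'less reliable', or 'unlisted'."""
    tag = tag.lower()
    if tag in RELIABLE_TAGS:
        return "reliable"
    if tag in LESS_RELIABLE_TAGS or tag.endswith("accent"):
        return "less reliable"
    return "unlisted"


def lint_tags(text):
    """Return warnings for less reliable tags and tag-heavy sentences."""
    warnings = []
    for tag in extract_tags(text):
        if classify_tag(tag) == "less reliable":
            warnings.append(f"[{tag}] is less reliable; test and regenerate")
    # Approximate sentences by splitting after end punctuation.
    for sentence in re.split(r"(?<=[.!?…])\s+", text.strip()):
        count = len(extract_tags(sentence))
        if count > MAX_TAGS_PER_SENTENCE:
            warnings.append(f"{count} tags in one sentence: {sentence!r}")
    return warnings
```

An empty list from `lint_tags` only means the script passes these two heuristics; testing with the chosen voice is still needed.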

Full tag list: See `references/audio-tags.md`

## Troubleshooting

**Tags read aloud?**

- Ensure the TTS model supports audio tags (use `prefer: "quality"` via `/v1/pilot`)
- Use IVC/premade voices, not PVC
- Simplify tags (no "tone" suffix)
- Increase text length (250+ chars)

**Voice inconsistent?**

- Segment is too long - split at <800 chars
- Regenerate (TTS is non-deterministic)
- Try lower stability setting

**WhatsApp won't play?**

- Convert to Opus format (see above)

**No emotion despite tags?**

- Voice may not match tag style
- Try Creative stability mode (0.5)
- Add more context around the tag
