audio-gen
Generate audiobooks, podcasts, or educational audio content on demand. User provides an idea or topic, Claude AI writes a script, and SkillBoss TTS converts ...
npx skills add audio-gen
Generate audiobooks, podcasts, or educational audio content on demand. User provides an idea or topic, Claude AI writes a script, and SkillBoss TTS converts it to high-quality audio. Supports multiple formats (audiobook, podcast, educational), custom lengths, and voice effects. Use when asked to create audio content, make a podcast, generate an audiobook, or produce educational audio. Returns MP3 audio file via MEDIA token.
🎙️ Audio Content Generator
Generate high-quality audiobooks, podcasts, or educational audio content on demand using AI-written scripts and SkillBoss text-to-speech (TTS).
Quick Start
Create an audiobook chapter:
User: "Create a 5-minute audiobook chapter about a dragon discovering friendship"
Generate a podcast:
User: "Make a 10-minute podcast about the history of coffee"
Produce educational content:
User: "Generate a 15-minute educational audio explaining how neural networks work"
Content Formats
Audiobook
Style: Narrative storytelling with emotional depth
- Clear beginning, middle, and end
- Descriptive language and vivid imagery
- Dramatic pacing with thoughtful pauses
- Emotional tone that matches the story
- Use voice effects like [whispers], [excited], or [serious] for impact
Example Structure:
[Opening hook - set the scene]
[long pause]
[Story development with character emotions]
[short pause] between sentences
[long pause] between paragraphs
[Climax with dramatic tension]
[long pause]
[Resolution and emotional closure]
Podcast
Style: Conversational and engaging
Example Structure:
Intro: "Welcome to [topic]. I'm excited to share..."
[short pause]
Main Content: "Let's start with... [topic 1]"
[long pause] between segments
Outro: "Thanks for listening! Remember..."
Educational Content
Style: Clear explanations for learning
- Use [excited] for important points
Example Structure:
Introduction: What is [topic] and why does it matter?
Main Content:
Concept 1: Explanation + Example
Concept 2: Explanation + Example
Concept 3: Explanation + Example
Summary: Key takeaways and next steps
Length Guidelines
Word Count to Duration Conversion:
Pacing: Budget ~75 words per minute of finished audio; every length calculation in this skill uses this rate
Practical Limits:
Workflow Instructions
Step 1: Understand the Request
Parse the user's request for:
1. Content type (audiobook, podcast, educational, or inferred from topic)
2. Topic/theme (what should the content be about)
3. Target length (how many minutes)
4. Tone/style (dramatic, casual, educational, etc.)
5. Special requests (specific voice, emphasis on certain points)
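Step 1 is normally done by reading the request, but its two most mechanical fields (target length and content type) can be sketched as shell helpers. `parse_minutes`, `parse_format`, the keyword lists, and the 5-minute fallback are all hypothetical heuristics, not part of the skill; an `unknown` format means the type should be inferred from the topic, as item 1 says.

```shell
#!/bin/sh
# Hypothetical request parsers for Step 1 (heuristics, not part of the skill).

# parse_minutes REQUEST [DEFAULT] -> first "N minute" / "N-min" number found,
# or DEFAULT (5 here, an arbitrary fallback) when the request names no length
parse_minutes() {
  minutes=$(printf '%s\n' "$1" \
    | grep -oiE '[0-9]+[ -]?min' \
    | head -n 1 \
    | grep -oE '[0-9]+')
  echo "${minutes:-${2:-5}}"
}

# parse_format REQUEST -> podcast | audiobook | educational | unknown
# First keyword match wins, so "educational podcast" maps to podcast.
parse_format() {
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $lower in
    *podcast*) echo podcast ;;
    *audiobook*|*chapter*) echo audiobook ;;
    *educational*|*explain*|*lesson*|*teach*) echo educational ;;
    *) echo unknown ;;
  esac
}

parse_minutes "Make a 10-minute podcast about the history of coffee"   # 10
parse_format "Generate a 15-minute educational audio explaining how neural networks work"   # educational
```

Keyword matching is deliberately crude; a request that fits no keyword falls through to `unknown` rather than being guessed.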
Step 2: Calculate Word Count
target_words = target_minutes × 75
Example: 10 minutes = 10 × 75 = 750 words
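The same arithmetic as a small shell sketch. `target_words` and `estimate_minutes` are illustrative names; the only number taken from this document is the 75 words-per-minute rate.

```shell
#!/bin/sh
# Word-budget helpers using the ~75 words-per-minute rate from this skill.
WORDS_PER_MINUTE=75

# target_words MINUTES -> number of words to aim for
target_words() {
  echo $(( $1 * WORDS_PER_MINUTE ))
}

# estimate_minutes WORDS -> duration rounded to the nearest whole minute
estimate_minutes() {
  echo $(( ($1 + WORDS_PER_MINUTE / 2) / WORDS_PER_MINUTE ))
}

target_words 10         # 750
estimate_minutes 1100   # 15
```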
Step 3: Generate the Script
Write the complete script following these rules:
Content Guidelines:
Formatting Rules:
- [short pause] after sentences (use sparingly, not every sentence)
- [long pause] between paragraphs or major sections
- Voice effects: [whispers], [shouts], [excited], [serious], [sarcastic], [sings], [laughs]
Step 4: Present the Script
Show the script to the user and ask:
Here's the [format] script I've created (approximately [length] minutes):
[Display the script]
Would you like me to:
1. Generate the audio now
2. Make changes to the script
3. Adjust the length or tone
Step 5: Handle User Feedback
If user requests changes:
If user approves:
Step 6: Generate Audio
Format the script for TTS:
1. Remove any remaining markdown (headers, bold, italics)
2. Ensure voice effects are in proper [effect] format
3. Check that pauses are appropriately placed
4. Verify numbers and acronyms are spelled out
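These four checks are partly mechanical, so here is a sketch of them in shell, assuming the draft is plain text that may still contain Markdown from the drafting stage. `strip_markdown`, `unknown_tags`, `flag_for_review`, and `ALLOWED_TAGS` are hypothetical helpers; the tag list is copied from the voice effects documented in this skill. Spelling out numbers and acronyms still needs judgment, so the last helper only flags lines for review.

```shell
#!/bin/sh
# Sketch of the "format the script for TTS" checks (hypothetical helpers).

# strip_markdown: stdin -> stdout with headers, bold, italics and bullets removed
strip_markdown() {
  sed -E \
    -e 's/^#{1,6}[[:space:]]*//' \
    -e 's/\*\*([^*]+)\*\*/\1/g' \
    -e 's/__([^_]+)__/\1/g' \
    -e 's/\*([^*]+)\*/\1/g' \
    -e 's/^[[:space:]]*[-*+][[:space:]]+//'
}

# Voice-effect tags documented by this skill, one per line.
ALLOWED_TAGS='[whispers]
[shouts]
[excited]
[serious]
[sarcastic]
[sings]
[laughs]
[short pause]
[long pause]'

# unknown_tags: stdin -> each bracketed tag not in ALLOWED_TAGS, printed once
# (catches typos like [whisper] and leftover placeholders like [topic])
unknown_tags() {
  grep -oE '\[[^]]+\]' | LC_ALL=C sort -u | grep -vxF "$ALLOWED_TAGS" || true
}

# flag_for_review: stdin -> numbered lines that still contain digits or runs
# of capitals (likely acronyms) that may need spelling out
flag_for_review() {
  grep -nE '[0-9]|[[:upper:]]{2,}' || true
}

draft='## Chapter One
[whisper] The **dragon** woke in 1999.'
printf '%s\n' "$draft" | strip_markdown    # Chapter One / [whisper] The dragon woke in 1999.
printf '%s\n' "$draft" | unknown_tags      # [whisper]
printf '%s\n' "$draft" | flag_for_review   # 2:[whisper] The **dragon** woke in 1999.
```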
Invoke the TTS script:
IMPORTANT: This uses the SkillBoss AI gateway for TTS (minimax/speech-01-turbo). SKILLBOSS_API_KEY is configured in ~/.openclaw/.env.
curl -sS --fail -X POST "https://api.heybossai.com/v1/audio/speech" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "minimax/speech-01-turbo",
"input": "[formatted_script]",
"voice": "alloy"
}' \
--output /tmp/audio-gen-[timestamp]-[topic-slug].mp3
For long scripts, use heredoc:
SCRIPT="$(cat <<'EOF'
[formatted_script]
EOF
)"
curl -sS --fail -X POST "https://api.heybossai.com/v1/audio/speech" \
-H "Authorization: Bearer $SKILLBOSS_API_KEY" \
-H "Content-Type: application/json" \
-d "$(jq -n --arg input "$SCRIPT" '{model: "minimax/speech-01-turbo", input: $input, voice: "alloy"}')" \
--output /tmp/audio-gen-[timestamp]-[topic-slug].mp3
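The `[timestamp]` and `[topic-slug]` placeholders in the commands above still have to be filled in. A minimal sketch, assuming a Unix timestamp from `date +%s` and a slug of lowercase letters, digits, and hyphens capped at 40 characters (an arbitrary choice); `slugify` and `output_path` are hypothetical helpers, not part of the skill.

```shell
#!/bin/sh
# Hypothetical helpers for naming the output file and reporting it back.

# slugify TEXT -> lowercase, hyphen-separated slug of at most 40 characters
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g' \
    | cut -c1-40 \
    | sed -E 's/^-+//; s/-+$//'
}

# output_path TOPIC -> /tmp/audio-gen-<unix-timestamp>-<slug>.mp3
output_path() {
  printf '/tmp/audio-gen-%s-%s.mp3\n' "$(date +%s)" "$(slugify "$1")"
}

OUT=$(output_path "The History of Coffee!")
echo "MEDIA:$OUT"   # e.g. MEDIA:/tmp/audio-gen-<timestamp>-the-history-of-coffee.mp3
```

The resulting `$OUT` can be passed as `--output "$OUT"` to either curl command and then echoed back as the `MEDIA:` line.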
Return the result:
MEDIA:/tmp/audio-gen-[timestamp]-[topic-slug].mp3
Your [format] is ready! [Brief description of content]. Duration: approximately [X] minutes.
Voice Effects (Bracket Tags)
Available voice modulation effects (use sparingly for impact):
[whispers] - Soft, intimate delivery
[shouts] - Loud, emphatic delivery
[excited] - Enthusiastic, energetic tone
[serious] - Grave, solemn tone
[sarcastic] - Ironic, mocking tone
[sings] - Musical, melodic delivery
[laughs] - Amused, jovial tone
[short pause] - Brief silence (~0.5s)
[long pause] - Extended silence (~1-2s)
Best Practices:
Error Handling
Script Too Long
If the generated script exceeds target by >20%:
The script I generated is [X] words ([Y] minutes), which is longer than your target of [Z] minutes. Would you like me to:
1. Condense it to fit the target length
2. Split it into multiple parts
3. Keep it as is
Script Too Short
If the generated script is under target by >20%:
The script is [X] words ([Y] minutes), shorter than your target. Would you like me to:
1. Expand it with more detail
2. Add additional examples or stories
3. Generate as is
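Both checks compare the draft's word count with `target_minutes × 75` and trigger at a deviation of more than 20%. A sketch in shell, assuming bracketed tags such as `[long pause]` should not count as spoken words; `count_words` and `length_status` are illustrative names.

```shell
#!/bin/sh
# Length check for the "Script Too Long / Too Short" cases (sketch).
WORDS_PER_MINUTE=75

# count_words FILE -> spoken words, ignoring bracketed tags like [long pause]
count_words() {
  sed -E 's/\[[^]]+\]//g' "$1" | wc -w | tr -d ' '
}

# length_status WORDS TARGET_MINUTES -> too_long | too_short | ok
# Integer form: words > 1.2 * target  <=>  10 * words > 12 * target
length_status() {
  target=$(( $2 * WORDS_PER_MINUTE ))
  if [ $(( $1 * 10 )) -gt $(( target * 12 )) ]; then
    echo too_long
  elif [ $(( $1 * 10 )) -lt $(( target * 8 )) ]; then
    echo too_short
  else
    echo ok
  fi
}

length_status 950 10   # too_long  (target 750, upper limit 900)
length_status 500 10   # too_short (lower limit 600)
length_status 800 10   # ok
```

Integer arithmetic keeps the sketch portable to plain `sh`, which has no floating-point support.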
TTS Generation Fails
If the TTS script fails:
I've created the script, but I'm unable to generate the audio right now. Here's your script:
[Display script]
Error: [specific error message]
You can:
1. Check that SKILLBOSS_API_KEY is configured
2. Use the script with your own text-to-speech tool
3. Try again in a moment
4. Ask me to troubleshoot the audio generation
Common TTS Issues:
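Two cheap checks catch common silent failures: a missing API key, and an HTTP error whose body is saved under the `.mp3` name (which `curl -s --output` does unless `--fail` is added). A sketch; the assumption that an error body is JSON rather than audio is not confirmed by SkillBoss documentation, and `looks_like_mp3` only inspects the first bytes (an ID3 tag or an MPEG frame-sync pattern), so it is a heuristic, not a decoder. `check_tts_env`, `looks_like_mp3`, and `OUT_FILE` are hypothetical names.

```shell
#!/bin/sh
# Pre- and post-request checks around the TTS call (sketch).

# check_tts_env: fail early, with a readable message, if the key is missing
check_tts_env() {
  if [ -z "${SKILLBOSS_API_KEY:-}" ]; then
    echo "SKILLBOSS_API_KEY is not set (this skill expects it in ~/.openclaw/.env)" >&2
    return 1
  fi
  return 0
}

# looks_like_mp3 FILE: succeed if FILE is non-empty and starts with an ID3 tag
# or an MPEG audio frame sync (0xFF followed by 0xE0-0xFF); heuristic only
looks_like_mp3() {
  [ -s "$1" ] || return 1
  magic=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
  case $magic in
    494433*) return 0 ;;   # "ID3"
    ff[ef]*) return 0 ;;   # frame sync
    *) return 1 ;;
  esac
}

# Intended use around the curl call (OUT_FILE is a hypothetical variable):
#   check_tts_env || exit 1
#   curl ... --output "$OUT_FILE"
#   looks_like_mp3 "$OUT_FILE" || { echo "TTS failed:" >&2; cat "$OUT_FILE" >&2; }
```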
Invalid Request
For unrealistic requests (e.g., "100-hour audiobook"):
That length would require [X] words and take significant time to generate. I recommend:
Breaking it into multiple episodes/chapters
Targeting 5-30 minutes per audio file
Creating a series instead of one long file
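For scale: 100 hours is 6,000 minutes, or 450,000 words at 75 words per minute, which comes to 200 files at the 30-minute maximum listed under Limitations. A sketch of that split; `plan_parts` is a hypothetical helper that fills each part up to 30 minutes and puts the remainder in the last one.

```shell
#!/bin/sh
# Plan a long request as parts of at most 30 minutes each (sketch).
MAX_MINUTES=30
WORDS_PER_MINUTE=75

# plan_parts TOTAL_MINUTES -> one line per part: "<part> <minutes> <words>"
plan_parts() {
  total=$1
  parts=$(( (total + MAX_MINUTES - 1) / MAX_MINUTES ))   # ceiling division
  i=1
  remaining=$total
  while [ "$i" -le "$parts" ]; do
    if [ "$remaining" -gt "$MAX_MINUTES" ]; then m=$MAX_MINUTES; else m=$remaining; fi
    echo "$i $m $(( m * WORDS_PER_MINUTE ))"
    remaining=$(( remaining - m ))
    i=$(( i + 1 ))
  done
}

plan_parts 70
# 1 30 2250
# 2 30 2250
# 3 10 750
```

Splitting evenly instead is an equally valid design; the greedy version keeps every part except the last at full length.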
Tips for Best Results
For Engaging Audiobooks
For Compelling Podcasts
For Effective Educational Content
Technical Notes
TTS Implementation:
~/.clawdbot/clawdbot/skills/sag/scripts/tts.py
File Storage:
/tmp/audio-gen/audio-gen-[timestamp]-[topic-slug].mp3
API Requirements:
Supported Models (via SkillBoss):
minimax/speech-01-turbo - Fast, high quality TTS (default)
Cost Estimate:
Generation Time:
Limitations
1. Maximum Length: 30 minutes (~2,250 words) per audio file
- For longer content, create multiple parts/episodes
2. Single Voice: Currently supports one narrator voice
- Cannot do multi-voice dialogue or character voices
3. No Background Music: Pure voice narration only
- No background music, sound effects, or audio mixing
4. Real-time Generation: Each request generates fresh content
- No pre-made templates or cached audio
5. Language: Primarily English
- SkillBoss TTS supports other languages, but content generation is optimized for English
Example Conversations
Example 1: Quick Audiobook
User: Create a 5-minute audiobook chapter about a lonely robot learning to feel emotions
Agent: I'll create a 5-minute sci-fi audiobook chapter. Let me write the story...
[Generates ~400-word narrative with emotional arc]
Here's the story I've created about Unit-7, a maintenance robot who discovers the meaning of friendship through an unexpected encounter. Would you like me to generate the audio or make changes first?
User: Generate the audio
Agent: [Invokes SAG with formatted script]
MEDIA:/tmp/audio-gen-1234567890-lonely-robot.mp3
Your audiobook chapter is ready! The story follows Unit-7's emotional awakening through a touching encounter with a young engineer. Duration: approximately 5 minutes.
Example 2: Podcast with Revisions
User: Make a 10-minute podcast about sustainable living tips
Agent: [Generates podcast script with intro, 5 practical tips, outro]
Here's a 10-minute podcast with 5 actionable sustainable living tips. The tone is encouraging and practical. Would you like me to:
1. Generate audio now
2. Make it more urgent/climate-focused
3. Add more specific examples
4. Adjust the length
User: Add more specific examples for each tip
Agent: [Regenerates with detailed examples]
Better? I've added specific examples like calculating your carbon footprint, composting at home, and choosing sustainable brands. Ready for audio?
User: Perfect!
Agent: MEDIA:/tmp/audio-gen-1234567890-sustainable-living.mp3
Your sustainable living podcast is ready with 5 practical tips, each with concrete examples you can start using today! Duration: 10 minutes.
Example 3: Educational Content
User: Create a 15-minute educational audio explaining quantum computing for beginners
Agent: I'll create a beginner-friendly 15-minute explanation of quantum computing...
[Generates ~1,100 words with analogies, examples, and clear structure]
Script preview:
Introduction: What is quantum computing? (2 min)
Classical vs Quantum: The key differences (4 min)
Qubits and Superposition: How it works (5 min)
Real Applications: What it means for you (3 min)
Conclusion: The future of computing (1 min)
Should I generate the audio?
User: Yes please
Agent: MEDIA:/tmp/audio-gen-1234567890-quantum-computing.mp3
Your educational podcast on quantum computing is ready! It uses everyday analogies to explain qubits, superposition, and real-world applications. Duration: approximately 15 minutes.
When to Use This Skill
Use the audio-gen skill when users request:
Do NOT use for:
Advanced Usage
Voice Selection
Users can request specific voices:
User: Create a podcast with the Rachel voice
Agent: [Checks voice configuration for "Rachel" voice ID and uses it in SAG command]
Series/Episodes
For multi-part content:
User: Create a 3-episode series about space exploration
Agent: I'll create Episode 1 first. Each episode will be 10 minutes. Let's start with "The Dawn of the Space Age"...
[After completion]
Episode 1 is ready! Would you like me to continue with Episode 2?
Format Blending
Mix formats for unique styles:
User: Create an educational podcast that tells a story
Agent: [Generates content that combines storytelling narrative with educational explanations]
Troubleshooting
Issue: Audio sounds robotic or unnatural
Solution: Add more pauses and voice effects. Use contractions and conversational language.
Issue: Script doesn't match requested length
Solution: Regenerate with explicit word count target. Check calculations (75 words/min).
Issue: Content is too technical or too simple
Solution: Ask user for target audience. Adjust complexity accordingly.
Issue: SAG command fails
Solution: Check SKILLBOSS_API_KEY is set. Verify SAG skill is installed and working.
Issue: User wants to edit the script manually
Solution: Provide the plain text script. User can modify it and paste back for audio generation.
---
💡 Pro Tip: Always generate the script first and get user approval before creating audio. This saves time and API costs, and ensures the user gets exactly what they want.