← All Skills
AI Skill

audio-gen

Generate audiobooks, podcasts, or educational audio content on demand. User provides an idea or topic, Claude AI writes a script, and SkillBoss TTS converts ...

0.0 (0 reviews) 0 stars 📦 0 installs 👁 0 views
Quick Install
npx skills add audio-gen
AI Skill

audio-gen

Generate audiobooks, podcasts, or educational audio content on demand. User provides an idea or topic, Claude AI writes a script, and SkillBoss TTS converts ...

0.0 (0 reviews) 0 stars 📦 0 installs 👁 0 views
Quick Install
npx skills add audio-gen
AI Skill

audio-gen

Generate audiobooks, podcasts, or educational audio content on demand. User provides an idea or topic, Claude AI writes a script, and SkillBoss TTS converts it to high-quality audio. Supports multiple formats (audiobook, podcast, educational), custom lengths, and voice effects. Use when asked to create audio content, make a podcast, generate an audiobook, or produce educational audio. Returns MP3 audio file via MEDIA token.

0 stars
📦 0 installs
👁️ 0 views

🎙️ Audio Content Generator

Generate high-quality audiobooks, podcasts, or educational audio content on demand using AI-written scripts and SkillBoss TTS text-to-speech.

Quick Start

Create an audiobook chapter:

User: "Create a 5-minute audiobook chapter about a dragon discovering friendship"

Generate a podcast:

User: "Make a 10-minute podcast about the history of coffee"

Produce educational content:

User: "Generate a 15-minute educational audio explaining how neural networks work"

Content Formats

Audiobook

Style: Narrative storytelling with emotional depth

  • Clear beginning, middle, and end
  • Descriptive language and vivid imagery
  • Dramatic pacing with thoughtful pauses
  • Emotional tone that matches the story
  • Use voice effects like [whispers], [excited], [serious] for impact
  • Example Structure:

    [Opening hook - set the scene]
    

    [long pause]

    [Story development with character emotions]

    [short pause] between sentences

    [long pause] between paragraphs

    [Climax with dramatic tension]

    [long pause]

    [Resolution and emotional closure]

    Podcast

    Style: Conversational and engaging

  • Warm, welcoming intro (15-30 seconds)
  • Main content with natural flow
  • Transitions between topics
  • Memorable outro with key takeaways
  • Conversational tone throughout
  • Example Structure:

    Intro: "Welcome to [topic]. I'm excited to share..."
    

    [short pause]

    Main Content: "Let's start with... [topic 1]"

    [long pause] between segments

    Outro: "Thanks for listening! Remember..."

    Educational Content

    Style: Clear explanations for learning

  • Simple introductions to complex topics
  • Step-by-step breakdowns
  • Real-world examples and analogies
  • Recap of key concepts at the end
  • Enthusiastic delivery with [excited] for important points
  • Example Structure:

    Introduction: What is [topic] and why it matters?

    Main Content:

  • Concept 1: Explanation + Example
  • Concept 2: Explanation + Example
  • Concept 3: Explanation + Example
  • Summary: Key takeaways and next steps

    Length Guidelines

    Word Count to Duration Conversion:

  • 5 minutes = ~375 words
  • 10 minutes = ~750 words
  • 15 minutes = ~1,125 words
  • 20 minutes = ~1,500 words
  • 30 minutes = ~2,250 words
  • Pacing: Average conversational speed is ~75 words per minute

    Practical Limits:

  • Minimum: 2 minutes (~150 words)
  • Maximum: 30 minutes (~2,250 words)
  • Sweet spot: 5-15 minutes for best engagement
  • Workflow Instructions

    Step 1: Understand the Request

    Parse the user's request for:

    1. Content type (audiobook, podcast, educational, or inferred from topic)

    2. Topic/theme (what should the content be about)

    3. Target length (how many minutes)

    4. Tone/style (dramatic, casual, educational, etc.)

    5. Special requests (specific voice, emphasis on certain points)

    Step 2: Calculate Word Count

    target_words = target_minutes × 75
    

    Example: 10 minutes = 10 × 75 = 750 words

    Step 3: Generate the Script

    Write the complete script following these rules:

    Content Guidelines:

  • Start strong with an engaging hook
  • Maintain natural, conversational flow
  • Use active voice and simple sentence structure
  • Include relevant examples and stories
  • End with a satisfying conclusion
  • Formatting Rules:

  • Add [short pause] after sentences (use sparingly, not every sentence)
  • Add [long pause] between paragraphs or major sections
  • Use voice effects strategically: [whispers], [shouts], [excited], [serious], [sarcastic], [sings], [laughs]
  • Write numbers as words: "twenty-three" not "23"
  • Spell out acronyms first time: "AI, or artificial intelligence"
  • Avoid complex punctuation (em-dashes work, but semicolons don't read well)
  • Remove markdown formatting before TTS conversion
  • Step 4: Present the Script

    Show the script to the user and ask:

    Here's the [format] script I've created (approximately [length] minutes):

    [Display the script]

    Would you like me to:

    1. Generate the audio now

    2. Make changes to the script

    3. Adjust the length or tone

    Step 5: Handle User Feedback

    If user requests changes:

  • Regenerate the script with adjustments
  • Maintain the target word count
  • Present the revised version
  • If user approves:

  • Proceed to audio generation
  • Step 6: Generate Audio

    Format the script for TTS:

    1. Remove any remaining markdown (headers, bold, italics)

    2. Ensure voice effects are in proper [effect] format

    3. Check that pauses are appropriately placed

    4. Verify numbers and acronyms are spelled out

    Invoke the TTS script:

    IMPORTANT: Uses SkillBoss AI gateway for TTS (minimax/speech-01-turbo). The SKILLBOSS_API_KEY is configured in ~/.openclaw/.env.

    curl -s -X POST "https://api.heybossai.com/v1/audio/speech" \
    

    -H "Authorization: Bearer $SKILLBOSS_API_KEY" \

    -H "Content-Type: application/json" \

    -d '{

    "model": "minimax/speech-01-turbo",

    "input": "[formatted_script]",

    "voice": "alloy"

    }' \

    --output /tmp/audio-gen-[timestamp]-[topic-slug].mp3

    For long scripts, use heredoc:

    SCRIPT="$(cat <<'EOF'
    

    [formatted_script]

    EOF

    )"

    curl -s -X POST "https://api.heybossai.com/v1/audio/speech" \

    -H "Authorization: Bearer $SKILLBOSS_API_KEY" \

    -H "Content-Type: application/json" \

    -d "$(jq -n --arg input "$SCRIPT" '{model: "minimax/speech-01-turbo", input: $input, voice: "alloy"}')" \

    --output /tmp/audio-gen-[timestamp]-[topic-slug].mp3

    Return the result:

    MEDIA:/tmp/audio-gen-[timestamp]-[topic-slug].mp3

    Your [format] is ready! [Brief description of content]. Duration: approximately [X] minutes.

    Voice Effects (SSML Tags)

    Available voice modulation effects (use sparingly for impact):

  • [whispers] - Soft, intimate delivery
  • [shouts] - Loud, emphatic delivery
  • [excited] - Enthusiastic, energetic tone
  • [serious] - Grave, solemn tone
  • [sarcastic] - Ironic, mocking tone
  • [sings] - Musical, melodic delivery
  • [laughs] - Amused, jovial tone
  • [short pause] - Brief silence (~0.5s)
  • [long pause] - Extended silence (~1-2s)
  • Best Practices:

  • Use effects for emotional moments, not every sentence
  • Pauses are your most powerful tool for pacing
  • Voice effects work best in audiobooks and dramatic content
  • Keep podcasts and educational content mostly natural
  • Error Handling

    Script Too Long

    If the generated script exceeds target by >20%:

    The script I generated is [X] words ([Y] minutes), which is longer than your target of [Z] minutes. Would you like me to:
    

    1. Condense it to fit the target length

    2. Split it into multiple parts

    3. Keep it as is

    Script Too Short

    If the generated script is under target by >20%:

    The script is [X] words ([Y] minutes), shorter than your target. Would you like me to:
    

    1. Expand it with more detail

    2. Add additional examples or stories

    3. Generate as is

    TTS Generation Fails

    If the TTS script fails:

    I've created the script, but I'm unable to generate the audio right now. Here's your script:

    [Display script]

    Error: [specific error message]

    You can:

    1. Check that SKILLBOSS_API_KEY is configured

    2. Use the script with your own text-to-speech tool

    3. Try again in a moment

    4. Ask me to troubleshoot the audio generation

    Common TTS Issues:

  • API key not set: Verify SKILLBOSS_API_KEY in config
  • Rate limit: Wait a moment and try again
  • Text too long: Break into smaller chunks (max ~5000 characters)
  • Invalid Request

    For unrealistic requests (e.g., "100-hour audiobook"):

    That length would require [X] words and take significant time to generate. I recommend:
    
  • Breaking it into multiple episodes/chapters
  • Targeting 5-30 minutes per audio file
  • Creating a series instead of one long file
  • Tips for Best Results

    For Engaging Audiobooks

  • Focus on character emotions and sensory details
  • Use pauses to build dramatic tension
  • Vary sentence length for rhythm
  • Include internal monologue and reflection
  • For Compelling Podcasts

  • Start with a question or surprising fact
  • Use conversational phrases: "You know what's interesting..."
  • Include relatable examples from everyday life
  • End with actionable takeaways
  • For Effective Educational Content

  • Use the "explain like I'm five" approach
  • Build from simple to complex concepts
  • Repeat key terms and definitions
  • Provide multiple examples for clarity
  • Technical Notes

    TTS Implementation:

  • Uses Python script: ~/.clawdbot/clawdbot/skills/sag/scripts/tts.py
  • No binary installation required (pure Python + requests)
  • Directly calls SkillBoss TTS API
  • Compatible with Linux and macOS
  • File Storage:

  • Audio files are saved to /tmp/audio-gen/
  • Filename format: audio-gen-[timestamp]-[topic-slug].mp3
  • Files are automatically cleaned up after 24 hours
  • API Requirements:

  • SkillBoss AI gateway for both script generation and TTS
  • Single SKILLBOSS_API_KEY handles everything (configured in ~/.openclaw/.env)
  • No individual vendor API keys needed
  • Supported Models (via SkillBoss):

  • minimax/speech-01-turbo - Fast, high quality TTS (default)
  • Cost Estimate:

  • 10-minute audio (~750 words): uses SkillBoss credits
  • All costs billed through single SkillBoss credit system
  • Generation Time:

  • Script generation: 5-30 seconds (depending on length)
  • Audio generation: 5-15 seconds (SkillBoss TTS processing)
  • Total: Usually under 1 minute for 10-minute audio
  • Limitations

    1. Maximum Length: 30 minutes (~2,250 words) per audio file

    - For longer content, create multiple parts/episodes

    2. Single Voice: Currently supports one narrator voice

    - Cannot do multi-voice dialogue or character voices

    3. No Background Music: Pure voice narration only

    - No background music, sound effects, or audio mixing

    4. Real-time Generation: Each request generates fresh content

    - No pre-made templates or cached audio

    5. Language: Primarily English

    - SkillBoss TTS supports other languages, but content generation optimized for English

    Example Conversations

    Example 1: Quick Audiobook

    User: Create a 5-minute audiobook chapter about a lonely robot learning to feel emotions

    Agent: I'll create a 5-minute sci-fi audiobook chapter. Let me write the story...

    [Generates ~400-word narrative with emotional arc]

    Here's the story I've created about Unit-7, a maintenance robot who discovers the meaning of friendship through an unexpected encounter. Would you like me to generate the audio or make changes first?

    User: Generate the audio

    Agent: [Invokes SAG with formatted script]

    MEDIA:/tmp/audio-gen-1234567890-lonely-robot.mp3

    Your audiobook chapter is ready! The story follows Unit-7's emotional awakening through a touching encounter with a young engineer. Duration: approximately 5 minutes.

    Example 2: Podcast with Revisions

    User: Make a 10-minute podcast about sustainable living tips

    Agent: [Generates podcast script with intro, 5 practical tips, outro]

    Here's a 10-minute podcast with 5 actionable sustainable living tips. The tone is encouraging and practical. Would you like me to:

    1. Generate audio now

    2. Make it more urgent/climate-focused

    3. Add more specific examples

    4. Adjust the length

    User: Add more specific examples for each tip

    Agent: [Regenerates with detailed examples]

    Better? I've added specific examples like calculating your carbon footprint, composting at home, and choosing sustainable brands. Ready for audio?

    User: Perfect!

    Agent: MEDIA:/tmp/audio-gen-1234567890-sustainable-living.mp3

    Your sustainable living podcast is ready with 5 practical tips, each with concrete examples you can start using today! Duration: 10 minutes.

    Example 3: Educational Content

    User: Create a 15-minute educational audio explaining quantum computing for beginners

    Agent: I'll create a beginner-friendly 15-minute explanation of quantum computing...

    [Generates ~1,100 words with analogies, examples, and clear structure]

    Script preview:

  • Introduction: What is quantum computing? (2 min)
  • Classical vs Quantum: The key differences (4 min)
  • Qubits and Superposition: How it works (5 min)
  • Real Applications: What it means for you (3 min)
  • Conclusion: The future of computing (1 min)
  • Should I generate the audio?

    User: Yes please

    Agent: MEDIA:/tmp/audio-gen-1234567890-quantum-computing.mp3

    Your educational podcast on quantum computing is ready! It uses everyday analogies to explain qubits, superposition, and real-world applications. Duration: approximately 15 minutes.

    When to Use This Skill

    Use the audio-gen skill when users request:

  • "Create an audiobook about..."
  • "Make a podcast episode about..."
  • "Generate audio content explaining..."
  • "Record a story about..."
  • "Produce an audio guide for..."
  • "Create a narrated version of..."
  • "Make an audio lesson about..."
  • Do NOT use for:

  • Transcribing existing audio (use whisper skills instead)
  • Converting user-written text to audio (use SAG skill directly)
  • Real-time voice interaction (use voice-call plugin)
  • Multi-speaker dialogues or interviews (current limitation)

Advanced Usage

Voice Selection

Users can request specific voices:

User: Create a podcast with the Rachel voice

Agent: [Checks voice configuration for "Rachel" voice ID and uses it in SAG command]

Series/Episodes

For multi-part content:

User: Create a 3-episode series about space exploration

Agent: I'll create Episode 1 first. Each episode will be 10 minutes. Let's start with "The Dawn of the Space Age"...

[After completion]

Episode 1 is ready! Would you like me to continue with Episode 2?

Format Blending

Mix formats for unique styles:

User: Create an educational podcast that tells a story

Agent: [Generates content that combines storytelling narrative with educational explanations]

Troubleshooting

Issue: Audio sounds robotic or unnatural

Solution: Add more pauses and voice effects. Use contractions and conversational language.

Issue: Script doesn't match requested length

Solution: Regenerate with explicit word count target. Check calculations (75 words/min).

Issue: Content is too technical or too simple

Solution: Ask user for target audience. Adjust complexity accordingly.

Issue: SAG command fails

Solution: Check SKILLBOSS_API_KEY is set. Verify SAG skill is installed and working.

Issue: User wants to edit the script manually

Solution: Provide the plain text script. User can modify it and paste back for audio generation.

---

💡 Pro Tip: Always generate the script first and get user approval before creating audio. This saves time and API costs, and ensures the user gets exactly what they want.

Comments & Discussion

Add a comment

Reviews

0.0
0 reviews

Write a Review

  • No reviews yet. Be the first to review!

Get Weekly AI Skills

Join 500+ one-person companies receiving curated AI tools every week.

Reviews

0.0
0 reviews

Write a Review

  • No reviews yet. Be the first to review!

Get Weekly AI Skills

Join 500+ one-person companies receiving curated AI tools every week.