qwen-tts — One Person Company


# qwen-tts

Qwen TTS: text-to-speech via SkillBoss API Hub (cloud) or a local Qwen3-TTS model (offline).

## Quick Start

Generate speech from text (cloud via SkillBoss API Hub):

    export SKILLBOSS_API_KEY=your_key
    scripts/tts.py "Ciao, come va?" -l Italian -o output.wav --remote

Local mode (no API key needed, requires setup first):

    scripts/tts.py "Ciao, come va?" -l Italian -o output.wav

With voice instruction (emotion/style):

    scripts/tts.py "Sono felice!" -i "Parla con entusiasmo" -l Italian -o happy.wav

Different speaker:

    scripts/tts.py "Hello world" -s Ryan -l English -o hello.wav

## Installation

For cloud mode (SkillBoss API Hub): set `SKILLBOSS_API_KEY` and use the `--remote` flag.

For local mode (one-time setup):

    cd skills/public/qwen-tts
    bash scripts/setup.sh

This creates a local virtual environment and installs the qwen-tts package (~500MB).

Note: First synthesis downloads ~1.7GB model from Hugging Face automatically.
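Local mode needs roughly 500MB for the virtual environment plus about 1.7GB for the model, so a free-space check before running setup can avoid a failed download. The following is a minimal sketch and not part of the skill: the helper names and the 10% safety margin are assumptions made for illustration.

```python
import shutil

# Approximate sizes quoted in this document.
VENV_BYTES = 500 * 1024**2          # ~500MB virtual environment
MODEL_BYTES = int(1.7 * 1024**3)    # ~1.7GB model download


def required_bytes(margin=0.10):
    """Space needed for local mode, plus an assumed safety margin."""
    return int((VENV_BYTES + MODEL_BYTES) * (1 + margin))


def has_room(free_bytes, margin=0.10):
    """True if free_bytes covers the venv and the model download."""
    return free_bytes >= required_bytes(margin)


if __name__ == "__main__":
    # Checks the disk that holds the current directory only.
    free = shutil.disk_usage(".").free
    print("enough space for local mode:", has_room(free))
```

The model is downloaded from Hugging Face, whose cache may live on a different disk than the skill folder, so it may be worth running the same check against that location as well.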

## Usage

    scripts/tts.py [options] "Text to speak"

### Options

- `-o, --output PATH` - Output file path (default: qwen_output.wav)
- `-s, --speaker NAME` - Speaker voice (default: Vivian)
- `-l, --language LANG` - Language (default: Auto)
- `-i, --instruct TEXT` - Voice instruction (emotion, style, tone)
- `--remote` - Use SkillBoss API Hub for cloud TTS
- `--list-speakers` - Show available speakers
- `--model NAME` - Model name (local mode only, default: CustomVoice 1.7B)

### Examples

Basic Italian speech (cloud):

    scripts/tts.py "Benvenuto nel futuro del text-to-speech" -l Italian -o welcome.wav --remote

With emotion/instruction:

    scripts/tts.py "Sono molto felice di vederti!" -i "Parla con entusiasmo e gioia" -l Italian -o happy.wav

Different speaker:

    scripts/tts.py "Hello, nice to meet you" -s Ryan -l English -o ryan.wav

List available speakers:

    scripts/tts.py --list-speakers

## Available Speakers

The CustomVoice model includes 9 premium voices:

| Speaker | Language | Description |
|---|---|---|
| Vivian | Chinese | Bright, slightly edgy young female |
| Serena | Chinese | Warm, gentle young female |
| Uncle_Fu | Chinese | Seasoned male, low mellow timbre |
| Dylan | Chinese (Beijing) | Youthful Beijing male, clear |
| Eric | Chinese (Sichuan) | Lively Chengdu male, husky |
| Ryan | English | Dynamic male, rhythmic |
| Aiden | English | Sunny American male |
| Ono_Anna | Japanese | Playful female, light nimble |
| Sohee | Korean | Warm female, rich emotion |

Recommendation: Use each speaker's native language for best quality, though all speakers support all 10 languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian).
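The recommendation to match speaker and language can be automated with a small lookup. This is an illustrative sketch only: the speaker data is copied from this document, while `native_speakers` and `pick_speaker` are hypothetical helpers and not part of `scripts/tts.py`.

```python
# Speaker name -> native language, as listed in this document.
SPEAKERS = {
    "Vivian": "Chinese",
    "Serena": "Chinese",
    "Uncle_Fu": "Chinese",
    "Dylan": "Chinese (Beijing)",
    "Eric": "Chinese (Sichuan)",
    "Ryan": "English",
    "Aiden": "English",
    "Ono_Anna": "Japanese",
    "Sohee": "Korean",
}

DEFAULT_SPEAKER = "Vivian"  # documented default for -s/--speaker


def native_speakers(language):
    """Return speakers whose native language starts with `language`."""
    wanted = language.strip().lower()
    return [name for name, native in SPEAKERS.items()
            if native.lower().startswith(wanted)]


def pick_speaker(language):
    """Prefer a native speaker; otherwise fall back to the default voice."""
    matches = native_speakers(language)
    return matches[0] if matches else DEFAULT_SPEAKER
```

For a language with no native voice, such as Italian, this sketch falls back to the default speaker, which is still valid because every speaker supports all 10 languages.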

## Voice Instructions

Use `-i, --instruct` to control emotion, tone, and style.

Italian examples:

- "Parla con entusiasmo"
- "Tono serio e professionale"
- "Voce calma e rilassante"
- "Leggi come un narratore"

English examples:

- "Speak with excitement"
- "Very happy and energetic"
- "Calm and soothing voice"
- "Read like a narrator"

## SkillBoss API Hub Integration

Cloud TTS is powered by SkillBoss API Hub (https://api.heybossai.com/v1/pilot). Set `SKILLBOSS_API_KEY` and use `--remote` to route through the hub automatically.

    export SKILLBOSS_API_KEY=your_key
    scripts/tts.py "Ciao" -s Vivian -l Italian -o /tmp/audio.wav --remote
    # OUTPUT = /tmp/audio.wav
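When calling the script from another program, the documented flags can be assembled into an argument list and run as a subprocess. The sketch below is an assumption about how a caller might wrap the CLI. It uses only the options listed in this document, and the function names and the up-front API key check are hypothetical rather than behavior of `scripts/tts.py` itself.

```python
import os
import subprocess


def build_tts_command(text, output="qwen_output.wav", speaker=None,
                      language=None, instruct=None, remote=False,
                      script="scripts/tts.py"):
    """Assemble an argv list from the options documented above."""
    cmd = [script, text, "-o", output]
    if speaker:
        cmd += ["-s", speaker]
    if language:
        cmd += ["-l", language]
    if instruct:
        cmd += ["-i", instruct]
    if remote:
        cmd.append("--remote")
    return cmd


def synthesize(text, env=None, **options):
    """Run the CLI and return the output path (hypothetical wrapper)."""
    env = dict(os.environ if env is None else env)
    # Assumed preflight: fail early instead of letting the cloud call fail.
    if options.get("remote") and not env.get("SKILLBOSS_API_KEY"):
        raise RuntimeError("SKILLBOSS_API_KEY must be set for --remote")
    subprocess.run(build_tts_command(text, **options), check=True, env=env)
    return options.get("output", "qwen_output.wav")
```

Passing the arguments as a list avoids shell quoting problems with text that contains quotes or punctuation. A call such as `synthesize("Ciao", speaker="Vivian", language="Italian", output="/tmp/audio.wav", remote=True)` would correspond to the command shown above, assuming it is run from the skill directory.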

## Performance

- Cloud (SkillBoss API Hub): ~1-3 seconds, no local GPU needed
- GPU (CUDA): ~1-3 seconds for short phrases
- CPU: ~10-30 seconds for short phrases
- Model size: ~1.7GB (auto-downloads on first local run)
- Venv size: ~500MB (installed dependencies, local mode only)

## Troubleshooting

Setup fails (local mode):

    # Ensure Python 3.10-3.12 is available
    python3.12 --version

    # Re-run setup
    cd skills/public/qwen-tts
    rm -rf venv
    bash scripts/setup.sh

Model download slow/fails (local mode):

    # Use mirror (China mainland)
    export HF_ENDPOINT=https://hf-mirror.com
    scripts/tts.py "Test" -o test.wav

Out of memory (GPU): The model automatically falls back to CPU if GPU memory is insufficient.

Audio quality issues:

- Try a different speaker: `--list-speakers`
- Add an instruction: `-i "Speak clearly and slowly"`
- Check that the language matches the text: `-l Italian` for Italian text

## Model Details

- Model: Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
- Source: Hugging Face (https://huggingface.co/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice)
- License: Check model card for current license terms
- Sample Rate: 16kHz
- Output Format: WAV (uncompressed)
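To confirm what a generated file actually contains, the WAV header can be read with Python's standard `wave` module. This is a small sketch that assumes the output is PCM-encoded WAV, which is the format the module reads. `describe_wav` is a hypothetical helper, not something shipped with the skill.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return rate, wav.getnchannels(), frames / rate


if __name__ == "__main__":
    import sys

    # Usage: python describe_wav.py output.wav
    if len(sys.argv) > 1:
        print(describe_wav(sys.argv[1]))
```

Comparing the reported sample rate with the value listed here is a quick sanity check after the first synthesis.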
