
Voicebox
Open-source voice cloning powered by Qwen3-TTS. Generate natural-sounding speech from text with near-perfect voice replication.

What is Voicebox?
Voicebox is a local-first voice cloning studio with DAW-like features for professional voice synthesis. Think of it as a free, open-source alternative to ElevenLabs that runs on your own hardware: download models, clone voices, and generate speech entirely on your machine.
Unlike cloud services that lock your voice data behind subscriptions, Voicebox gives you complete privacy, professional tools, and native performance. Download a voice model, clone any voice from a few seconds of audio, and compose multi-voice projects with studio-grade editing tools.
Inference runs locally with hardware acceleration: Metal on macOS and CUDA on Windows and Linux.
No Python installation required.
Near-Perfect Voice Cloning
Powered by Alibaba's Qwen3-TTS model for exceptional voice quality and accuracy.
Stories Editor
Create multi-voice narratives with a timeline-based editor. Arrange tracks, trim clips, and mix conversations.
Multi-Sample Support
Combine multiple voice samples for higher quality and more natural-sounding results.
Local or Remote
Run GPU inference locally or on a remote machine, with one-click server setup.
Audio Transcription
Powered by Whisper for accurate speech-to-text. Automatically extracts reference text from voice samples.
Cross-Platform
Available for macOS, Windows, and Linux. No Python installation required.