Voicebox

Open-source voice cloning powered by Qwen3-TTS. Create natural-sounding speech from text with near-perfect voice replication.

Available for macOS (ARM), macOS (Intel), Windows, and Linux. Source on GitHub.

What is Voicebox?

Voicebox is a local-first voice cloning studio with DAW-like (digital audio workstation) features for professional voice synthesis. Think of it as a free, open-source, local alternative to ElevenLabs: download models, clone voices, and generate speech entirely on your machine.

Unlike cloud services that lock your voice data behind subscriptions, Voicebox gives you complete privacy, professional tools, and native performance. Download a voice model, clone any voice from a few seconds of audio, and compose multi-voice projects with studio-grade editing tools.

Inference runs locally and is accelerated with Metal on macOS and CUDA on Windows and Linux.
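
To make the platform split above concrete, here is a minimal sketch of how an app might choose an acceleration backend. It is an illustration only, not Voicebox's code: the function name `pick_backend`, the `has_cuda` flag, and the CPU fallback are assumptions that do not come from the project.

```python
import platform


def pick_backend(system=None, has_cuda=False):
    """Return "metal", "cuda", or "cpu" for the given OS name.

    Hypothetical helper: `system` defaults to platform.system(), and
    `has_cuda` must be supplied by the caller (this sketch does not
    probe the GPU itself).
    """
    system = system or platform.system()
    if system == "Darwin":
        return "metal"  # macOS: Metal acceleration, as stated above
    if system in ("Windows", "Linux") and has_cuda:
        return "cuda"   # Windows/Linux: CUDA acceleration, as stated above
    return "cpu"        # assumed fallback, not stated in the source


print(pick_backend("Darwin"))
print(pick_backend("Linux", has_cuda=True))
print(pick_backend("Windows"))
```

End users never need to run anything like this; the packaged app handles backend selection itself.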

No Python install required.


Near-Perfect Voice Cloning

Stories Editor

Create multi-voice narratives with a timeline-based editor. Arrange tracks, trim clips, and mix conversations.

Multi-Sample Support

Combine multiple voice samples for higher quality and more natural-sounding results.

Local or Remote

Run GPU inference locally or connect to a remote machine. One-click server setup.
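
The local-or-remote choice can be pictured as deciding which server address the app talks to. The sketch below rests on stated assumptions and is not Voicebox's implementation: the helper name `resolve_server`, the default port `8000`, and the use of an HTTP URL are all hypothetical.

```python
from urllib.parse import urlparse


def resolve_server(remote_url=None, local_port=8000):
    """Return the base URL of the inference server to use.

    Hypothetical helper: with no remote URL, fall back to a server
    running on this machine (the port number is an assumed default).
    """
    if remote_url is None:
        return f"http://127.0.0.1:{local_port}"
    parsed = urlparse(remote_url)
    # Reject values that are not full http(s) URLs with a host.
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"not a usable server URL: {remote_url!r}")
    return remote_url.rstrip("/")


print(resolve_server())                            # local machine
print(resolve_server("http://192.168.1.20:8000/")) # remote GPU box
```

Validating the address up front means a mistyped remote host fails immediately rather than at the first generation request.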

Audio Transcription

Cross-Platform

Available for macOS, Windows, and Linux. No Python installation required.