Chatterbox
Resemble AI's open-source (MIT) text-to-speech and zero-shot voice cloning model with emotion control, 23+ languages, and a watermark on every output.
Updated 2026-06-27
Overview
Chatterbox is Resemble AI's open-source text-to-speech and zero-shot voice cloning model, released under the permissive MIT license. Feed it a short reference clip and a script, and it generates speech in the target voice — no per-speaker training run required. It sits in the same neighborhood as ElevenLabs and Zonos2 but takes the opposite commercial posture: the weights are free to download, self-host, and use commercially with no royalties or usage caps.
What sets it apart from most open TTS projects is polish and traction. The model supports emotion and intensity control rather than flat monotone read-outs, spans 23+ languages via its Multilingual line, and ships recent Turbo and Multilingual V3 releases for faster or broader-coverage inference. Its GitHub repo has drawn roughly 25k stars and the Hugging Face model has passed a million downloads — numbers that make it one of the most-adopted open TTS systems available, which matters because community momentum drives fine-tunes, integrations, and bug fixes.
The detail worth flagging: every output carries Resemble's imperceptible neural watermark (their PerTh-style approach), so synthetic audio stays detectable downstream. Baking that safety measure into an open-weights release is a deliberate and unusual choice, and it makes Chatterbox easier to justify as a default pick than a stripped-down clone that carries no provenance signal. Resemble AI also runs a paid hosted platform at app.resemble.ai for teams that want managed inference and enterprise support instead of standing up their own GPUs.
Key features
Zero-shot voice cloning
Clones a voice from a short reference sample without a dedicated training run, so you can spin up new speakers on demand instead of waiting on per-voice model training.
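A minimal sketch of that workflow, assuming the Python API shown in the Chatterbox README at the time of writing (`ChatterboxTTS.from_pretrained` and `generate(..., audio_prompt_path=...)`); verify the names against the version you install. The `synthesize_cast` and `load_chatterbox` helpers are hypothetical. What it illustrates: one loaded model serves any number of voices, because each new speaker is just a different reference clip.

```python
from typing import Any, Iterable, List, Tuple


def synthesize_cast(model: Any, lines: Iterable[Tuple[str, str]]) -> List[Any]:
    """Render (reference_clip_path, text) pairs with a single loaded model.

    Each speaker is just a different reference clip passed as
    audio_prompt_path; nothing is trained or fine-tuned per voice.
    """
    return [model.generate(text, audio_prompt_path=ref) for ref, text in lines]


def load_chatterbox(device: str = "cuda") -> Any:
    """Load the English Chatterbox model (requires `pip install chatterbox-tts`)."""
    # Imported lazily so this file can be imported without the package or a GPU.
    from chatterbox.tts import ChatterboxTTS

    return ChatterboxTTS.from_pretrained(device=device)
```

With a real model, `synthesize_cast(load_chatterbox(), [("narrator.wav", "Chapter one."), ("villain.wav", "You're too late.")])` returns one waveform per line, which the README saves with `torchaudio.save(path, wav, model.sr)`. Loading once and reusing the model avoids paying the weight-loading cost for every new voice.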
Emotion & intensity control
Exposes controls for emotional tone and intensity rather than producing flat, neutral narration — useful for character work, audiobooks, and expressive product voices.
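In the Chatterbox README, intensity is exposed as an `exaggeration` argument to `generate`, alongside `cfg_weight`; both default to 0.5, and the README suggests raising exaggeration to about 0.7 and lowering cfg_weight to about 0.3 for more dramatic delivery. Treat those numbers as starting points taken from the docs, not tuned settings. The sketch below wraps the two knobs in named presets; the preset names, the "calm" values, and the `intensity_scale` multiplier are hypothetical.

```python
from typing import Dict

# Hypothetical presets for the `exaggeration` and `cfg_weight` arguments.
# "neutral" mirrors the README defaults; "dramatic" follows the README tip
# (more exaggeration, lower cfg_weight); "calm" is an illustrative guess.
STYLE_PRESETS: Dict[str, Dict[str, float]] = {
    "neutral": {"exaggeration": 0.5, "cfg_weight": 0.5},
    "dramatic": {"exaggeration": 0.7, "cfg_weight": 0.3},
    "calm": {"exaggeration": 0.3, "cfg_weight": 0.5},
}


def style_kwargs(style: str, intensity_scale: float = 1.0) -> Dict[str, float]:
    """Return keyword arguments for model.generate() for a named style.

    intensity_scale multiplies only the exaggeration value, so one preset can
    be nudged up or down without touching cfg_weight.
    """
    if style not in STYLE_PRESETS:
        raise KeyError(f"unknown style {style!r}; choose from {sorted(STYLE_PRESETS)}")
    if intensity_scale <= 0:
        raise ValueError("intensity_scale must be positive")
    kwargs = dict(STYLE_PRESETS[style])  # copy so the shared preset is never mutated
    kwargs["exaggeration"] = round(kwargs["exaggeration"] * intensity_scale, 3)
    return kwargs
```

Usage would look like `model.generate(text, audio_prompt_path="narrator.wav", **style_kwargs("dramatic"))`. Keeping presets in one table makes it easy to give each character in an audiobook a consistent delivery.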
Multilingual coverage
The Multilingual line supports 23+ languages, with recent V3 and Turbo releases improving coverage and inference speed for production use.
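The README shows a separate multilingual class, `ChatterboxMultilingualTTS` in `chatterbox.mtl_tts`, whose `generate` takes a `language_id` such as `"fr"`. A minimal sketch, assuming that API: validate language codes up front so a typo fails before any GPU work, then load the model lazily. The allowed-code set is an illustrative subset, not the official list of supported languages, and the helper names are hypothetical.

```python
from typing import AbstractSet, Dict, Iterable, List, Tuple

# Illustrative subset of ISO 639-1 codes; replace with the list published
# for the Multilingual release you actually deploy.
KNOWN_LANGUAGE_IDS: AbstractSet[str] = frozenset({"en", "fr", "de", "es", "it", "ja", "zh"})


def plan_jobs(
    items: Iterable[Tuple[str, str]],
    allowed: AbstractSet[str] = KNOWN_LANGUAGE_IDS,
) -> List[Dict[str, str]]:
    """Turn (language_id, text) pairs into job dicts, rejecting unknown codes early."""
    jobs = []
    for lang, text in items:
        code = lang.strip().lower()  # normalize e.g. "FR " -> "fr"
        if code not in allowed:
            raise ValueError(f"unsupported language_id {lang!r}")
        jobs.append({"text": text, "language_id": code})
    return jobs


def load_multilingual(device: str = "cuda"):
    """Load the multilingual model; class path as shown in the Chatterbox README."""
    from chatterbox.mtl_tts import ChatterboxMultilingualTTS  # lazy: needs chatterbox-tts

    return ChatterboxMultilingualTTS.from_pretrained(device=device)
```

Each job would then be rendered with `model.generate(job["text"], language_id=job["language_id"])`. Grouping output by language also makes it easier to spot-check the non-English languages, where quality tends to vary most.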
Built-in neural watermark
Every generated clip carries an imperceptible watermark so AI-synthesized speech remains detectable — a safety measure shipped by default in an open-weights model, which is rare.
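Detection uses Resemble's separate `perth` package, which Chatterbox pulls in as a dependency as far as we can tell. The README example loads a clip with librosa, creates `perth.PerthImplicitWatermarker()`, and calls `get_watermark(audio, sample_rate=sr)`, documented as returning 0.0 for no watermark and 1.0 for watermarked audio. The sketch below assumes that API; the 0.5 threshold and the `triage` helper are hypothetical additions for batch screening.

```python
from typing import Dict, List


def is_watermarked(score: float, threshold: float = 0.5) -> bool:
    """Interpret a detector score (README shows 0.0 = clean, 1.0 = watermarked)."""
    return score >= threshold


def triage(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the sorted paths whose scores indicate a PerTh watermark."""
    return sorted(path for path, score in scores.items() if is_watermarked(score, threshold))


def detect_watermark(audio_path: str) -> float:
    """Score one file with the PerTh implicit watermarker."""
    import librosa  # lazy imports: only needed when actually scanning audio
    import perth

    audio, sr = librosa.load(audio_path, sr=None)  # sr=None keeps the native sample rate
    watermarker = perth.PerthImplicitWatermarker()
    return float(watermarker.get_watermark(audio, sample_rate=sr))
```

A batch scan would be `triage({p: detect_watermark(p) for p in paths})`. A clean score only means no PerTh watermark was found; it does not prove a clip is human speech.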
Pricing
Free tier: The complete Chatterbox model is free under the MIT license — self-host it, use it commercially, no usage caps. You only pay if you opt into Resemble AI's hosted platform.
| Plan | Price | What's included |
|---|---|---|
| Open-source model (MIT) | Free | Full model weights free to download and self-host. Commercial use permitted with no royalties or usage caps. Requires your own GPU/infrastructure. |
| Hosted platform | Check website for current pricing | Managed inference via app.resemble.ai for teams that prefer not to self-host. Plans and enterprise options listed on the Resemble AI site. |
Pros & cons
Pros
- ✓ Fully open-source under MIT: commercial use, self-hosting, no royalties or per-character caps
- ✓ Zero-shot voice cloning from a short sample, no per-voice training step
- ✓ Emotion and intensity controls go beyond flat, monotone TTS
- ✓ Imperceptible watermark on every output keeps synthetic audio detectable
- ✓ Massive adoption (~25k GitHub stars, 1M+ HF downloads) means active maintenance and integrations
Cons
- × Self-hosting needs your own GPU and technical setup; there's no polished consumer app for the free model
- × Quality and naturalness vary across the 23+ languages; English is the strongest
- × Hosted-platform pricing is separate and not transparently listed alongside the open model
- × Open voice cloning raises real misuse risk; the watermark mitigates but doesn't prevent it
How it compares
| Tool | Best for | Pricing | Score |
|---|---|---|---|
| Chatterbox | — | Free MIT open-source model + paid Resemble AI hosted platform | 8.8/10 |
| Suno AI | — | Freemium | 9.2/10 |
| ElevenLabs | — | Free tier + Starter $5/mo + Creator $22/mo + Pro $99/mo + Scale $330/mo + Enterprise custom | 9.2/10 |
| Udio | — | Freemium | 8.8/10 |
Ready to try Chatterbox?
Head to the official site to start with Chatterbox — pricing and plans are listed above.

