
The open-source TTS directory.

Every model, every voice, reading the same three scripts. Compare open-source TTS side-by-side without downloading anything.

22 models · 42 voices · 3 scripts each
The three scripts
Neutral
The quick brown fox jumps over the lazy dog near the riverbank.
Baseline naturalness — no emotion, no tricky words
Emotional
I can't believe you actually did it. This is incredible!
Prosody and excitement — tests expressiveness
Numbers & Dates
On March 14th, 2025, the team raised $4.2 million at a 38% margin.
Number, date, and symbol pronunciation — the common failure mode
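
The third script is hard because ordinals, years, currency amounts, and percentages all have to be expanded into spoken words before synthesis. As a rough illustration only (this is not the front end of any listed model), the sketch below uses hypothetical regular expressions, written to cover just this one sentence, to flag the tokens that need expansion.

```python
import re

SCRIPT = "On March 14th, 2025, the team raised $4.2 million at a 38% margin."

# Hypothetical patterns for illustration; a real text normalizer covers far more cases.
PATTERNS = {
    "ordinal": r"\b\d+(?:st|nd|rd|th)\b",
    "year": r"\b(?:19|20)\d{2}\b",
    "currency": r"\$\d+(?:\.\d+)?",
    "percent": r"\b\d+(?:\.\d+)?%",
}


def find_hard_tokens(text):
    """Return (kind, token) pairs, in reading order, for spans that must be verbalized."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in re.finditer(pattern, text):
            hits.append((match.start(), kind, match.group()))
    hits.sort()  # order by position in the sentence
    return [(kind, token) for _, kind, token in hits]


hard_tokens = find_hard_tokens(SCRIPT)
for kind, token in hard_tokens:
    print(f"{kind:>8}: {token}")
```

Each flagged token is a place where a model can misread the text, which is why this script is worth listening to closely.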

22 models · 9 with samples

Bark

MIT

Generates laughter, sighs, and music — the chaotic one

1.5B·12GB·13 langs·cloning

ElevenLabs-tier quality, MIT licensed, voice cloning from a clip

500M·8GB·23 langs·cloning·streaming
Voice: Default

CosyVoice 2

Apache-2.0

Alibaba's real-time TTS, 150ms streaming latency

500M·6GB·7 langs·cloning·streaming
Voice: Default
Neutral · Emotional · Numbers

Dia

Apache-2.0

Ultra-realistic two-speaker dialogue, sounds like a podcast

1.6B·10GB·1 lang·cloning·streaming

Flow-matching TTS, fast zero-shot voice cloning

335M·8GB·2 langs·cloning
Voice: Reference clone

Fish Speech

Apache-2.0

Open-source TTS that benchmarks above closed-source

0.5B / 4B·12GB·8 langs·cloning·streaming
Voice: Reference clone

Higgs Audio v2

Apache-2.0

Built on Llama 3.2 3B, 10M+ hours of training audio

3B·12GB·7 langs·cloning·streaming
Voice: Default
Neutral · Emotional · Numbers

IndexTTS 2

Apache-2.0

Bilibili's TTS with multi-speaker and precise emotion control

1B·8GB·2 langs·cloning·streaming
Voice: Default
Neutral · Emotional · Numbers

KittenTTS

Apache-2.0

15M params, under 25MB, runs anywhere — even your phone

15M·CPU OK·1 lang·streaming
Voice: Default
Neutral · Emotional · Numbers

Kokoro-82M

Apache-2.0

82M params, Apache 2.0, runs on a potato

82M·1GB·8 langs·streaming

MegaTTS 3

Apache-2.0

ByteDance's TTS with sparse alignment — robust prosody

450M·8GB·2 langs·cloning·streaming
Voice: Default
Neutral · Emotional · Numbers

Multilingual, real-time on CPU

52M·CPU OK·6 langs·streaming
Voice: Default
Neutral · Emotional · Numbers

Tone-color cloning + cross-lingual transfer

300M·4GB·6 langs·cloning
Voice: Default
Neutral · Emotional · Numbers

Orpheus TTS

Apache-2.0

Llama-3 based, empathetic, low-latency for interactive apps

3B·12GB·10 langs·cloning·streaming

Parler-TTS

Apache-2.0

Describe the voice in natural language ('soft female, fast, clear')

880M·6GB·1 lang·streaming

Tiny, fast, runs offline on a Raspberry Pi

20M·CPU OK·26 langs·streaming
Voice: Default
Neutral · Emotional · Numbers

Qwen3-TTS

Apache-2.0

Alibaba's flagship, 97ms latency, 10 languages

1.7B·8GB·10 langs·cloning·streaming
Voice: Default
Neutral · Emotional · Numbers

Spark-TTS

Apache-2.0

Built on Qwen2.5, zero-shot voice cloning + style control

500M·8GB·2 langs·cloning·streaming
Voice: Default
Neutral · Emotional · Numbers

Diffusion-based, human-level naturalness on LibriTTS

148M·4GB·1 lang·cloning
Voice: Default
Neutral · Emotional · Numbers

Microsoft's long-form TTS — 90 minutes, 4 speakers

1.5B·16GB·7 langs·cloning·streaming
Voice: Default
Neutral · Emotional · Numbers

Whisper, but inverted — TTS by 'unwrapping' OpenAI's ASR

300M·6GB·2 langs·cloning
Voice: Default
Neutral · Emotional · Numbers

XTTS v2

CPML (non-commercial)

Most-downloaded TTS on HF, 6-second voice cloning

750M·6GB·17 langs·cloning·streaming
Voice: Reference clone

Missing a model? Add it.

OpenSpeech is community-maintained. Adding a model is a single PR: edit one JSON file, drop in three audio samples, open the pull request. Models must be genuinely open-source.
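
Before opening the PR, a contributor might sanity-check the entry locally. The sketch below is one way to do that under loud assumptions: the field names in `REQUIRED_FIELDS`, the script identifiers, and the `.mp3` file naming are all hypothetical, since the real schema and layout are defined by the repository and not stated here.

```python
from pathlib import Path

# Hypothetical schema: these field names and file names are assumptions for illustration.
REQUIRED_FIELDS = ("name", "license", "params", "languages")
SCRIPT_IDS = ("neutral", "emotional", "numbers")


def check_entry(entry, samples_dir):
    """Return a list of problems with a model entry; an empty list means it looks complete."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in entry]
    for script_id in SCRIPT_IDS:
        # One audio sample is expected per script (assumed naming: <script>.mp3).
        sample = Path(samples_dir) / f"{script_id}.mp3"
        if not sample.is_file():
            problems.append(f"missing sample: {sample.name}")
    return problems
```

Calling `check_entry(entry, "path/to/samples")` with the parsed JSON entry returns an empty list when all assumed fields and all three samples are present.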