Which open-source TTS should I try?

Start with the three picks below. Hear them. Pick one. We’ll get into the rest later.

Start here

Picks updated by community votes

🥇

Best overall

Kokoro-82M

The current default recommendation.

Excellent quality. Tiny model. Runs almost anywhere.

🥈

Runner-up

Orpheus TTS

Most natural-sounding conversations.

Llama-3 based. Empathetic. Built for interactive apps.

🥉

Third place

Chatterbox

Best voice cloning, MIT-licensed.

ElevenLabs-tier quality from a few seconds of reference audio.

Side-by-side

Hear the leaders back-to-back

Same three scripts, three top models, one screen. The fastest way to decide what you actually want.

Kokoro-82M · Orpheus TTS · Chatterbox

The three scripts (every voice reads these)

Neutral

“The quick brown fox jumps over the lazy dog near the riverbank.”

Baseline naturalness — no emotion, no tricky words

Emotional

“I can't believe you actually did it. This is incredible!”

Prosody and excitement — tests expressiveness

Numbers & Dates

“On March 14th, 2025, the team raised $4.2 million at a 38% margin.”

Number, date, and symbol pronunciation — the common failure mode
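
To rerun this comparison on your own machine, the three scripts can be kept in a small harness. The sketch below is only an illustration: `synthesize` is a hypothetical callable that you would replace with a wrapper around a real model's API, and the output file naming is an assumption, not something any of these projects prescribes.

```python
from typing import Callable, Dict, List, Tuple

# The three test scripts from this page, keyed by what each one probes.
SCRIPTS: Dict[str, str] = {
    "neutral": "The quick brown fox jumps over the lazy dog near the riverbank.",
    "emotional": "I can't believe you actually did it. This is incredible!",
    "numbers_dates": "On March 14th, 2025, the team raised $4.2 million at a 38% margin.",
}

MODELS: List[str] = ["Kokoro-82M", "Orpheus TTS", "Chatterbox"]


def build_jobs(models=MODELS, scripts=SCRIPTS) -> List[Tuple[str, str, str, str]]:
    """Return one (model, script_name, text, filename) job per model/script pair."""
    jobs = []
    for model in models:
        # Hypothetical naming scheme: lowercase model name with dashes.
        slug = model.lower().replace(" ", "-")
        for name, text in scripts.items():
            jobs.append((model, name, text, f"{slug}_{name}.wav"))
    return jobs


def run(synthesize: Callable[[str, str], bytes]) -> Dict[str, bytes]:
    """Call a user-supplied synthesize(model, text) once for every job.

    `synthesize` is hypothetical: wrap whichever model API you actually use.
    """
    return {
        filename: synthesize(model, text)
        for model, _name, text, filename in build_jobs()
    }
```

Keeping the text identical across models means any difference you hear comes from the model, not the prompt.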

Browse by need

Not sure what you need yet?

Each shelf below is hand-curated.

Best for voice cloning

Clone a voice from a short reference clip.

Chatterbox

Default

ElevenLabs-tier quality, MIT-licensed, voice cloning from a short clip

Commercial · Cloning · Streaming · 500M · 8GB

XTTS v2

Reference clone

Most-downloaded TTS on Hugging Face, voice cloning from a 6-second clip

Cloning · Streaming · 750M · 6GB

Fish Speech

Reference clone

Open-source TTS that benchmarks above closed-source systems

Commercial · Cloning · Streaming · 0.5B / 4B · 12GB

Bark

Generates laughter, sighs, and music — the chaotic one

Commercial · Cloning · 1.5B · 12GB

F5-TTS

Reference clone

Flow-matching TTS, fast zero-shot voice cloning

Commercial · Cloning · 335M · 8GB

Dia

Ultra-realistic two-speaker dialogue, sounds like a podcast

Commercial · Cloning · Streaming · 1.6B · 10GB

Orpheus TTS

Llama-3 based, empathetic, low-latency for interactive apps

Commercial · Cloning · Streaming · 3B · 12GB
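
The tags on each card can also be used to narrow this shelf programmatically. The sketch below copies the tags and figures from the cards above into a small list and filters it. Treating the "GB" figure as a memory requirement is an assumption, and `Pick` and `shortlist` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Pick:
    name: str
    tags: FrozenSet[str]
    memory_gb: int  # the "GB" figure on each card; assumed to be a memory requirement


# Voice-cloning shelf, transcribed from the cards above.
SHELF = [
    Pick("Chatterbox", frozenset({"Commercial", "Cloning", "Streaming"}), 8),
    Pick("XTTS v2", frozenset({"Cloning", "Streaming"}), 6),
    Pick("Fish Speech", frozenset({"Commercial", "Cloning", "Streaming"}), 12),
    Pick("Bark", frozenset({"Commercial", "Cloning"}), 12),
    Pick("F5-TTS", frozenset({"Commercial", "Cloning"}), 8),
    Pick("Dia", frozenset({"Commercial", "Cloning", "Streaming"}), 10),
    Pick("Orpheus TTS", frozenset({"Commercial", "Cloning", "Streaming"}), 12),
]


def shortlist(picks: Iterable[Pick], required: Set[str], max_gb: int) -> List[str]:
    """Names of picks that carry every required tag and fit within max_gb."""
    return [
        p.name
        for p in picks
        if required <= p.tags and p.memory_gb <= max_gb
    ]


# Example: commercially usable, streaming-capable picks within a 10 GB budget.
print(shortlist(SHELF, {"Commercial", "Streaming"}, 10))
```

A shortlist like this only narrows the field. Listening to the candidates on the three scripts is still the deciding step.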

Best for real-time

Stream tokens fast enough for voice agents.

Kokoro-82M

82M params, Apache 2.0, runs on a potato

Commercial · Streaming · 82M · 1GB

Parler-TTS

Describe the voice in natural language ('soft female, fast, clear')

Commercial · Streaming · 880M · 6GB

Fish Speech

Reference clone

Open-source TTS that benchmarks above closed-source systems

Commercial · Cloning · Streaming · 0.5B / 4B · 12GB