TTS Installation
2026-03-21
System dependencies, library installs, model downloads, and platform-specific notes for every major TTS library.
System Dependencies
Ubuntu / Debian
sudo apt update && sudo apt install -y \
ffmpeg \
libsndfile1 \
portaudio19-dev \
espeak-ng \
libespeak-ng-dev \
python3-dev \
build-essential
Fedora / RHEL
sudo dnf install -y ffmpeg libsndfile portaudio-devel espeak-ng espeak-ng-devel gcc
macOS
brew install ffmpeg libsndfile portaudio espeak-ng
Windows
- Install ffmpeg and add to PATH
- Install eSpeak-NG and add to PATH
- PortAudio: bundled with the sounddevice wheel, so no manual install is needed
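After installing the system packages, a quick way to confirm that the command-line tools are actually reachable is to look them up on PATH. The following is a minimal sketch using only the standard library; the helper name `find_tools` and the choice of which executables to check are assumptions for illustration, not part of any TTS library.

```python
import shutil

# Executables named in the package lists above; adjust for your platform.
REQUIRED_TOOLS = ("ffmpeg", "espeak-ng")


def find_tools(names=REQUIRED_TOOLS):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

A missing entry usually means the package was not installed or, on Windows, that its install directory was not added to PATH.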
Kokoro
Fastest CPU TTS. Recommended for production.
# PyTorch backend
pip install kokoro soundfile numpy
# ONNX backend (even faster CPU, no PyTorch)
pip install "kokoro[onnx]" soundfile numpy
# misaki phonemization extras (eSpeak-NG backend, optional but recommended)
# Quote the brackets so shells such as zsh do not treat them as glob patterns
pip install "misaki[en]" # English
pip install "misaki[ja]" # Japanese
pip install "misaki[zh]" # Chinese
pip install "misaki[ko]" # Korean
pip install "misaki[fr]" # French
Models are downloaded automatically on first use to ~/.cache/huggingface/hub/.
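To see whether a model is already in the Hugging Face cache (for example before building an offline image), you can compute the folder the hub would use for it. This is a rough sketch under stated assumptions: it relies on the hub's `models--<org>--<name>` folder layout and the `HF_HUB_CACHE` / `HF_HOME` environment variables, ignores other overrides such as `XDG_CACHE_HOME`, and the helper names are hypothetical.

```python
import os
from pathlib import Path


def hf_hub_cache() -> Path:
    """Resolve the Hugging Face hub cache directory from the environment."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME") or Path.home() / ".cache" / "huggingface"
    return Path(hf_home) / "hub"


def cached_repo_dir(repo_id: str) -> Path:
    """Folder the hub uses for a model repo, e.g. models--hexgrad--Kokoro-82M."""
    return hf_hub_cache() / ("models--" + repo_id.replace("/", "--"))


def is_cached(repo_id: str) -> bool:
    """True if at least one snapshot of the repo exists in the cache."""
    snapshots = cached_repo_dir(repo_id) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


if __name__ == "__main__":
    print(cached_repo_dir("hexgrad/Kokoro-82M"), is_cached("hexgrad/Kokoro-82M"))
```

This only inspects the directory layout; it does not verify that every file in the snapshot downloaded completely.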
Manual pre-download:
from huggingface_hub import snapshot_download
snapshot_download("hexgrad/Kokoro-82M", local_dir="./models/kokoro")
Coqui XTTS-v2
# Basic install
pip install TTS
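# XTTS-v2 with GPU: install the CUDA 12.1 PyTorch wheels first, then TTS from PyPI
# (--index-url replaces PyPI, so TTS cannot be resolved in the same command)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install TTS
Pre-download model (~2.8GB):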
from TTS.api import TTS
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
# Downloads to ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/
Manual download (for offline/Docker):
# Using huggingface-cli
pip install huggingface_hub
huggingface-cli download coqui/XTTS-v2 --local-dir ./models/xtts_v2
F5-TTS
pip install f5-tts
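# With GPU support: install the CUDA 12.1 PyTorch wheels first, then f5-tts from PyPI
# (--index-url replaces PyPI, so f5-tts cannot be resolved in the same command)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install f5-tts
Pre-download model: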
huggingface-cli download SWivid/F5-TTS --local-dir ./models/f5-tts
Or via Python:
from f5_tts.api import F5TTS
tts = F5TTS() # downloads ~1.2GB on first run
Bark
# Install from the official Suno repository.
# Do not use "pip install bark": that name on PyPI is an unrelated package.
pip install git+https://github.com/suno-ai/bark.git
# Dependencies
pip install transformers accelerate torch torchaudio soundfile
Pre-download all models (~5GB total):
from bark import preload_models
preload_models()
# Downloads to ~/.cache/suno/bark_v0/
Models are stored separately:
~/.cache/suno/bark_v0/
├── text_2.pt # semantic model (~1.2GB)
├── coarse_2.pt # coarse acoustic (~1.2GB)
├── fine_2.pt # fine acoustic (~1.2GB)
└── hubert_base_ls960.pt # semantic encoder
Small models (lower quality, much faster):
import os
# Set this before importing bark: the flag is read when the module is imported
os.environ["SUNO_USE_SMALL_MODELS"] = "True"
from bark import preload_models
preload_models()
edge-tts
pip install edge-tts
# No model download; uses Microsoft's cloud service
OpenVoice V2
git clone https://github.com/myshell-ai/OpenVoice
cd OpenVoice
pip install -e .
pip install melo-tts
# Download checkpoints (~500MB)
python -c "
from huggingface_hub import snapshot_download
snapshot_download('myshell-ai/OpenVoiceV2', local_dir='./checkpoints_v2')
"
StyleTTS2
pip install git+https://github.com/yl4579/StyleTTS2.git
pip install torch torchaudio phonemizer einops transformers
# Download model
huggingface-cli download yl4579/StyleTTS2-LibriTTS --local-dir ./models/styletts2
pyttsx3
pip install pyttsx3
# Linux: requires espeak or espeak-ng
sudo apt install espeak-ng
# macOS: uses built-in NSSpeechSynthesizer (no extra deps)
# Windows: uses SAPI5 (no extra deps)
sounddevice (for real-time audio output)
pip install sounddevice
# Requires portaudio (installed above)
Minimal requirements.txt
# Core TTS stack
kokoro>=0.9.0
soundfile>=0.12.1
numpy>=1.24.0
sounddevice>=0.4.6
# For voice cloning (pick one)
# f5-tts>=0.3.0
# TTS>=0.22.0 # Coqui XTTS-v2
# For cloud TTS (no offline needed)
# edge-tts>=6.1.9
# For Bark (install from Git; the "bark" name on PyPI is an unrelated package)
# git+https://github.com/suno-ai/bark.git
# transformers>=4.35.0
# accelerate>=0.24.0
Docker (CPU)
FROM python:3.11-slim
RUN apt-get update && apt-get install -y \
ffmpeg libsndfile1 espeak-ng portaudio19-dev \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
RUN pip install --no-cache-dir kokoro soundfile numpy sounddevice
# Pre-download models at build time
RUN python -c "from kokoro import KPipeline; KPipeline(lang_code='a')"
COPY . .
CMD ["python", "tts_service.py"]
Docker (GPU: XTTS-v2 / F5-TTS)
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y python3 python3-pip ffmpeg libsndfile1 espeak-ng \
&& rm -rf /var/lib/apt/lists/*
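RUN pip3 install --no-cache-dir \
torch torchaudio --index-url https://download.pytorch.org/whl/cu121
# Separate step: --index-url replaces PyPI, so TTS cannot be installed in the same command
RUN pip3 install --no-cache-dir TTS soundfile numpy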
WORKDIR /app
COPY . .
CMD ["python3", "tts_api.py"]
Platform Notes
| Platform | Notes |
|---|---|
| Raspberry Pi 4 | Kokoro ONNX only — too slow otherwise. Use int8 ONNX. Consider tiny voices. |
| Apple Silicon (M1/M2/M3) | Use device="mps" for PyTorch acceleration. Kokoro ONNX also works well. |
| Windows | pyttsx3 and edge-tts work natively. Kokoro needs WSL or a native Python environment with eSpeak-NG on PATH. |
| WSL2 | PortAudio may not access mic/speakers — route audio via PulseAudio or use file output only. |
| Headless server | Use file output only — no sounddevice streaming. Serve audio via API. |
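The device choices in the table can be made at runtime instead of being hard-coded. The sketch below picks "cuda", "mps", or "cpu" depending on what PyTorch reports; the function name `pick_device` is hypothetical, and falling back to "cpu" when PyTorch is not installed is an assumption that suits ONNX-only setups.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can use."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed (e.g. an ONNX-only setup)
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is absent on older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

The returned string can then be passed wherever a library accepts a device argument; whether a given TTS library supports "mps" still needs to be checked per library.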
Verify Installation
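Before running a full synthesis, it can help to confirm which libraries are importable at all. This is a minimal sketch using only the standard library: the label-to-module mapping is an assumption based on the import names used on this page plus the usual import names for edge-tts, pyttsx3, sounddevice, and soundfile, and the helper name `installed` is hypothetical.

```python
from importlib.util import find_spec

# Import names, which can differ from the pip package names.
CANDIDATES = {
    "Kokoro": "kokoro",
    "Coqui TTS": "TTS",
    "F5-TTS": "f5_tts",
    "Bark": "bark",
    "edge-tts": "edge_tts",
    "pyttsx3": "pyttsx3",
    "sounddevice": "sounddevice",
    "soundfile": "soundfile",
}


def installed(modules=CANDIDATES):
    """Report which modules can be located, without importing them."""
    return {label: find_spec(mod) is not None for label, mod in modules.items()}


if __name__ == "__main__":
    for label, ok in installed().items():
        print(f"{label}: {'installed' if ok else 'missing'}")
```

A "missing" entry only means the module was not found in the current environment; this check does not validate models or audio devices.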
# Quick sanity check
from kokoro import KPipeline
import numpy as np
import soundfile as sf
pipe = KPipeline(lang_code="a")
chunks = [audio for _, _, audio in pipe("Installation successful.", voice="af_heart")]
sf.write("/tmp/test_tts.wav", np.concatenate(chunks), 24000)
print("TTS OK: check /tmp/test_tts.wav")
See Also
- TTS Libraries Comparison
- TTS Implementation
- ASR Installation — Install ASR alongside TTS for a complete voice pipeline
- VAD Installation — Install VAD to capture the speech that drives the pipeline