VAD Installation & Setup
About 826 wordsAbout 3 min
2026-03-21
Complete installation instructions for every major VAD library, including dependencies, platform notes, and environment setup.
Prerequisites
System Dependencies
# Ubuntu / Debian
sudo apt update
sudo apt install -y python3-dev python3-pip ffmpeg portaudio19-dev
# Fedora / RHEL
sudo dnf install -y python3-devel ffmpeg portaudio-devel
# macOS (Homebrew)
brew install ffmpeg portaudio
# Windows (Chocolatey)
choco install ffmpegPython Environment (Recommended)
Always use a virtual environment to avoid dependency conflicts:
# Create and activate a virtual environment
python3 -m venv vad-env
source vad-env/bin/activate # Linux/macOS
# vad-env\Scripts\activate # Windows
# Upgrade pip
pip install --upgrade pip1. webrtcvad
pip install webrtcvadVerify Install
import webrtcvad
vad = webrtcvad.Vad()
print("webrtcvad OK, version info:", webrtcvad.__version__ if hasattr(webrtcvad, '__version__') else "installed")Troubleshooting
# If build fails on Linux: missing C compiler
sudo apt install -y build-essential python3-dev
# If build fails on Windows: install Visual C++ Build Tools
# https://visualstudio.microsoft.com/visual-cpp-build-tools/
# Alternative pre-built wheel
pip install webrtcvad-wheels # community pre-built wheels2. silero-vad
Option A: pip install (Recommended)
pip install silero-vadThis installs the model package directly. No internet needed at inference time.
from silero_vad import load_silero_vad
model = load_silero_vad()
print("silero-vad ready")Option B: PyTorch Hub (requires PyTorch)
pip install torch torchaudioimport torch
model, utils = torch.hub.load(
repo_or_dir="snakers4/silero-vad",
model="silero_vad",
force_reload=False,
onnx=False
)Option C: ONNX Runtime (no PyTorch)
For environments where PyTorch is too heavy:
pip install onnxruntime silero-vadfrom silero_vad import load_silero_vad
model = load_silero_vad(onnx=True) # uses onnxruntime instead of torchVerify silero-vad
from silero_vad import load_silero_vad, read_audio
import torch
model = load_silero_vad()
# Create a 1-second silent audio tensor
wav = torch.zeros(16000)
model.reset_states()
prob = model(wav[:512], 16000).item()
print(f"Speech probability on silence: {prob:.4f}") # should be near 03. pyannote.audio
Install
pip install pyannote.audioHuggingFace Token Setup
pyannote models require accepting user agreements:
- Create account at huggingface.co
- Accept the license for the models you need:
- Generate a token at hf.co/settings/tokens
Using Your Token
# Option A: Environment variable (recommended)
export HUGGINGFACE_TOKEN="hf_xxxYourTokenHerexxx"
# Option B: huggingface-cli login
pip install huggingface_hub
huggingface-cli loginimport os
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
"pyannote/voice-activity-detection",
use_auth_token=os.environ["HUGGINGFACE_TOKEN"]
)GPU Setup (Optional but Recommended)
# Install PyTorch with CUDA support (CUDA 12.1 example)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install pyannote.audioimport torch
pipeline.to(torch.device("cuda"))4. auditok
pip install auditok
# For MP3/OGG support (requires ffmpeg system package):
pip install "auditok[extras]"Verify
python3 -c "import auditok; print('auditok', auditok.__version__)"5. SpeechBrain
pip install speechbrain
# Download VAD model (first use only)
python3 -c "
from speechbrain.pretrained import VAD
vad = VAD.from_hparams(
source='speechbrain/vad-crdnn-libriparty',
savedir='pretrained_models/vad'
)
print('SpeechBrain VAD ready')
"6. Audio I/O Libraries
PyAudio (for microphone input)
# Linux
sudo apt install portaudio19-dev
pip install pyaudio
# macOS
brew install portaudio
pip install pyaudio
# Windows
pip install pyaudio # pre-built wheels available on Windowssounddevice (alternative to PyAudio)
pip install sounddevice soundfilesoundfile (for reading/writing audio files)
pip install soundfilenumpy and scipy (signal processing)
pip install numpy scipyComplete Environment: All-in-One
For a full VAD development environment:
# Create environment
python3 -m venv vad-env
source vad-env/bin/activate
# Core audio
pip install numpy scipy soundfile sounddevice
# VAD libraries
pip install webrtcvad silero-vad auditok
# Heavy (optional — choose what you need)
pip install torch torchaudio # required for silero-vad torch mode
pip install pyannote.audio # requires HF token
pip install speechbrain # full toolkit
# Utilities
pip install librosa matplotlib jupyterrequirements.txt (minimal)
numpy>=1.21.0
scipy>=1.7.0
soundfile>=0.10.3
sounddevice>=0.4.4
webrtcvad>=2.0.10
silero-vad>=4.0.0requirements.txt (full)
numpy>=1.21.0
scipy>=1.7.0
soundfile>=0.10.3
sounddevice>=0.4.4
torch>=2.0.0
torchaudio>=2.0.0
webrtcvad>=2.0.10
silero-vad>=4.0.0
pyannote.audio>=3.1.0
speechbrain>=0.5.15
librosa>=0.10.0
auditok>=0.2.0Docker Setup
FROM python:3.11-slim
# System deps
RUN apt-get update && apt-get install -y \
ffmpeg \
portaudio19-dev \
build-essential \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python3", "main.py"]Platform-Specific Notes
| Platform | Notes |
|---|---|
| Raspberry Pi (arm64) | webrtcvad builds from source, silero-vad ONNX mode recommended |
| macOS Apple Silicon | Use pip install torch (MPS backend available for silero) |
| Windows | Use webrtcvad-wheels to avoid build issues |
| WSL2 | Microphone access requires Windows audio passthrough — use sounddevice |
| Docker (no GPU) | Use silero-vad ONNX or webrtcvad |
| Docker (NVIDIA GPU) | Use nvidia/cuda base image for pyannote/speechbrain |
See Also
- VAD Libraries Comparison
- VAD Implementation
- Real-Time Streaming VAD
- VAD Cheatsheet
- ASR Installation — Install ASR alongside VAD for a complete pipeline
- TTS Installation — Install TTS for the response side of the voice assistant