# voirs-cli
[crates.io](https://crates.io/crates/voirs-cli)
[docs.rs](https://docs.rs/voirs-cli)
**Command-line interface for the VoiRS speech synthesis framework.**
A CLI tool for converting text to speech with the VoiRS framework. It supports batch processing, real-time synthesis, voice management, and output to WAV, FLAC, MP3, and Opus.
## Features
- **Text-to-Speech Synthesis**: Convert text files or direct input to high-quality audio
- **SSML Support**: Full Speech Synthesis Markup Language processing
- **Voice Management**: Download, list, and manage voices and models
- **Batch Processing**: Process multiple files efficiently with progress tracking
- **Real-time Synthesis**: Interactive mode with live audio playback
- **Multiple Formats**: Output to WAV, FLAC, MP3, Opus, and streaming audio
- **Quality Control**: Configurable quality settings and audio enhancement
- **Cross-platform**: Windows, macOS, and Linux support
## Installation
### Pre-built Binaries
Download the latest release for your platform from [GitHub Releases](https://github.com/cool-japan/voirs/releases).
### From Source
```bash
cargo install voirs-cli
```
### Package Managers
```bash
# Homebrew (macOS/Linux)
brew install voirs
# Scoop (Windows)
scoop install voirs
# Chocolatey (Windows)
choco install voirs
```
## Quick Start
```bash
# Basic text synthesis
voirs synth "Hello, world!" output.wav
# Use specific voice
voirs synth "Hello, world!" output.wav --voice en-US-female-calm
# SSML synthesis
voirs synth '<speak><emphasis level="strong">Hello</emphasis> world!</speak>' output.wav --ssml
# Interactive mode
voirs interactive
# List available voices
voirs voices list
```
## Commands
### `synth` - Text to Speech Synthesis
Convert text to speech audio.
```bash
voirs synth [OPTIONS] <TEXT> <OUTPUT>
# Examples
voirs synth "Hello world" hello.wav
voirs synth "Hello world" hello.wav --voice en-US-male-news
voirs synth "Bonjour le monde" bonjour.wav --voice fr-FR-female-casual
voirs synth "Hello world" hello.flac --quality high
voirs synth "Hello world" hello.mp3 --bitrate 320
```
#### Options
```bash
-v, --voice <VOICE>        Voice to use for synthesis [default: auto]
-q, --quality <QUALITY>    Synthesis quality [low|medium|high|ultra] [default: high]
-r, --sample-rate <RATE>   Output sample rate [default: 22050]
-f, --format <FORMAT>      Output format [wav|flac|mp3|opus] [default: auto]
-s, --ssml                 Input is SSML markup
    --speed <SPEED>        Speaking rate multiplier [default: 1.0]
    --pitch <PITCH>        Pitch shift in semitones [default: 0.0]
    --volume <VOLUME>      Volume adjustment in dB [default: 0.0]
    --enhance              Enable audio enhancement
    --no-normalize         Skip audio normalization
    --gpu                  Use GPU acceleration if available
    --streaming            Enable streaming synthesis for large texts
    --chunk-size <SIZE>    Chunk size for streaming [default: 256]
```
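
The `--pitch` and `--volume` options are given in semitones and decibels. As a rough guide to what those numbers mean, the sketch below converts them to a frequency ratio and a linear gain using the standard equal-temperament and amplitude-decibel formulas. It assumes VoiRS follows those conventions, which this README does not state; the function names are illustrative and not part of VoiRS.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def db_to_gain(db: float) -> float:
    """Linear amplitude gain for a level change in decibels."""
    return 10.0 ** (db / 20.0)


# Twelve semitones is one octave, which doubles the frequency.
print(semitones_to_ratio(12.0))        # 2.0
# -6 dB roughly halves the amplitude.
print(round(db_to_gain(-6.0), 3))      # 0.501
```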
### `batch` - Batch Processing
Process multiple texts or files efficiently.
```bash
voirs batch [OPTIONS] <INPUT> <OUTPUT_DIR>
# Examples
voirs batch texts.txt ./audio/
voirs batch sentences.csv ./output/ --format flac
voirs batch book.txt ./chapters/ --split-sentences
```
#### Input Formats
```bash
# Text file (one sentence per line)
sentences.txt
# CSV file with columns: text,output_name,voice,speed
metadata.csv
# JSON file with array of synthesis requests
requests.json
```
#### Options
```bash
-f, --format <FORMAT>      Output format for all files
-v, --voice <VOICE>        Default voice for all texts
    --split-sentences      Split long texts into sentences
    --split-paragraphs     Split texts into paragraphs
    --max-length <LENGTH>  Maximum text length per file [default: 1000]
    --parallel <N>         Number of parallel synthesis jobs [default: 4]
    --resume               Resume interrupted batch processing
    --progress             Show detailed progress information
```
### `interactive` - Interactive Mode
Start an interactive synthesis session.
```bash
voirs interactive [OPTIONS]
# Examples
voirs interactive
voirs interactive --voice en-US-female-calm --auto-play
```
#### Interactive Commands
```
> Hello, this is a test. # Synthesize text
> :voice en-GB-male-formal # Change voice
> :speed 1.2 # Adjust speaking rate
> :pitch +0.5 # Adjust pitch
> :quality ultra # Change quality
> :save last_synthesis.wav # Save last synthesis
> :play # Replay last synthesis
> :ssml <speak><emphasis>Hello</emphasis></speak> # SSML mode
> :help # Show help
> :quit # Exit
```
### `voices` - Voice Management
Manage available voices and models.
```bash
voirs voices <SUBCOMMAND>
# Subcommands
voirs voices list # List available voices
voirs voices search <QUERY> # Search for voices
voirs voices info <VOICE> # Show voice details
voirs voices download <VOICE> # Download voice model
voirs voices remove <VOICE> # Remove voice model
voirs voices update # Update voice database
```
#### Examples
```bash
# List all voices
voirs voices list
# List voices by language
voirs voices list --language en-US
# Search for female voices
voirs voices search female
# Get voice information
voirs voices info en-US-female-calm
# Download a voice
voirs voices download en-GB-male-formal
# Remove unused voices
voirs voices remove --unused
```
### `models` - Model Management
Manage synthesis models and backends.
```bash
voirs models <SUBCOMMAND>
# Subcommands
voirs models list # List available models
voirs models info <MODEL> # Show model details
voirs models download <MODEL> # Download model
voirs models remove <MODEL> # Remove model
voirs models benchmark # Benchmark models
voirs models optimize # Optimize models for current hardware
```
#### Examples
```bash
# List installed models
voirs models list
# Download VITS model
voirs models download vits-en-us-female
# Benchmark all models
voirs models benchmark --output benchmark.json
# Optimize for current GPU
voirs models optimize --device cuda:0
```
### `config` - Configuration Management
Manage VoiRS configuration and preferences.
```bash
voirs config <SUBCOMMAND>
# Subcommands
voirs config show # Show current configuration
voirs config set <KEY> <VALUE> # Set configuration value
voirs config reset # Reset to defaults
voirs config export <FILE> # Export configuration
voirs config import <FILE> # Import configuration
```
#### Examples
```bash
# Show configuration
voirs config show
# Set default voice
voirs config set default.voice en-US-female-calm
# Set output directory
voirs config set paths.output ~/Downloads/voirs/
# Reset configuration
voirs config reset --confirm
# Export settings
voirs config export my-settings.toml
```
### `server` - HTTP Server Mode
Start VoiRS as an HTTP API server.
```bash
voirs server [OPTIONS]
# Examples
voirs server --port 8080
voirs server --host 0.0.0.0 --port 3000 --workers 4
```
#### Options
```bash
-p, --port <PORT>          Port to listen on [default: 8080]
-h, --host <HOST>          Host to bind to [default: 127.0.0.1]
-w, --workers <N>          Number of worker threads [default: 4]
    --max-text-length <N>  Maximum text length [default: 5000]
    --rate-limit <N>       Requests per minute per IP [default: 60]
    --cors                 Enable CORS headers
    --api-key <KEY>        Require API key authentication
```
#### API Endpoints
```bash
POST /synthesize # Synthesize text to audio
GET /voices # List available voices
GET /voices/{id} # Get voice information
GET /health # Health check
```
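
The endpoint list gives the routes but not the request or response schema. The sketch below shows how a client might call `POST /synthesize` using only the Python standard library. The JSON field names (`text`, `voice`) and the assumption that the response body is raw audio are guesses made for illustration; check the server's actual API before relying on them.

```python
import json
import urllib.request


def build_synthesize_request(text, voice, base_url="http://127.0.0.1:8080"):
    """Build (but do not send) a POST /synthesize request.

    The JSON field names "text" and "voice" are assumptions, not a
    documented schema.
    """
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/synthesize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, voice, output_path):
    """Send the request and save the response body (assumed to be audio)."""
    request = build_synthesize_request(text, voice)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(output_path, "wb") as fh:
        fh.write(audio)
    return output_path
```

With a server started via `voirs server --port 8080`, calling `synthesize_to_file("Hello", "en-US-female-calm", "hello.wav")` would send the request to the default host and port listed above.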
### `benchmark` - Performance Testing
Run performance benchmarks and quality tests.
```bash
voirs benchmark [OPTIONS]
# Examples
voirs benchmark --voices en-US-female-calm,en-GB-male-formal
voirs benchmark --output benchmark.json --detailed
```
#### Options
```bash
-v, --voices <VOICES>      Comma-separated list of voices to test
-o, --output <FILE>        Output results to file
    --detailed             Include detailed metrics
    --quality              Run quality tests (requires reference audio)
    --rtf                  Measure real-time factor
    --memory               Monitor memory usage
    --gpu-usage            Monitor GPU utilization
```
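
The `--rtf` flag measures the real-time factor. By the usual definition, RTF is the synthesis time divided by the duration of the audio produced, so values below 1.0 mean faster than real time. A minimal sketch of that arithmetic, assuming VoiRS uses the conventional definition (the README does not spell it out):

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Conventional RTF: processing time divided by generated audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# 0.5 s of compute for 2.0 s of audio is four times faster than real time.
print(real_time_factor(0.5, 2.0))  # 0.25
```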
## Configuration
VoiRS uses a hierarchical configuration system. Each setting is taken from the first source that defines it, in this order (highest precedence first):
1. Command-line arguments
2. Environment variables
3. User configuration file (`~/.voirs/config.toml`)
4. System configuration file (`/etc/voirs/config.toml`)
5. Default values
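
The lookup order above can be modeled with a chain of dictionaries. This is only an illustration of the precedence rule, not VoiRS's actual implementation; the dotted key names and the values in each layer are hypothetical.

```python
from collections import ChainMap

# Hypothetical values for each layer, highest precedence first.
cli_args = {"default.voice": "en-US-male-news"}
env_vars = {"default.quality": "ultra"}
user_file = {"default.voice": "en-US-female-calm", "default.quality": "high"}
system_file = {"default.sample_rate": 44100}
defaults = {
    "default.voice": "auto",
    "default.quality": "high",
    "default.sample_rate": 22050,
    "default.format": "wav",
}

# ChainMap searches its maps left to right and returns the first match.
settings = ChainMap(cli_args, env_vars, user_file, system_file, defaults)

print(settings["default.voice"])        # en-US-male-news (command line wins)
print(settings["default.quality"])      # ultra (environment beats user file)
print(settings["default.sample_rate"])  # 44100 (system file beats defaults)
print(settings["default.format"])       # wav (only the defaults define it)
```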
### Configuration File
```toml
# ~/.voirs/config.toml
[default]
voice = "en-US-female-calm"
quality = "high"
sample_rate = 22050
format = "wav"
[paths]
models = "~/.voirs/models/"
cache = "~/.voirs/cache/"
output = "~/Downloads/"
[synthesis]
gpu_acceleration = true
streaming = false
chunk_size = 256
enhance_audio = true
normalize_output = true
[voices]
auto_download = true
preferred_languages = ["en-US", "en-GB"]
fallback_voice = "en-US-female-neutral"
[server]
host = "127.0.0.1"
port = 8080
workers = 4
max_text_length = 5000
rate_limit = 60
[batch]
parallel_jobs = 4
progress_reporting = true
resume_enabled = true
auto_split = true
[advanced]
backend = "candle" # candle, onnx
device = "auto" # auto, cpu, cuda:0, metal
precision = "fp32" # fp16, fp32
memory_limit = "4GB"
log_level = "info" # error, warn, info, debug, trace
```
### Environment Variables
```bash
# Override configuration with environment variables
export VOIRS_DEFAULT_VOICE="en-US-male-news"
export VOIRS_SYNTHESIS_GPU_ACCELERATION="true"
export VOIRS_PATHS_MODELS="/custom/models/path"
export VOIRS_LOG_LEVEL="debug"
```
## Output Formats
### WAV (Uncompressed)
```bash
voirs synth "Hello" output.wav --sample-rate 44100 --bit-depth 24
```
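
Because WAV output is uncompressed, file size grows linearly with sample rate and bit depth. The sketch below estimates the size of the PCM audio data (excluding the small file header). It assumes mono output and a 16-bit default depth, neither of which is stated in this README; the function name is illustrative.

```python
def pcm_data_bytes(seconds, sample_rate=22050, bit_depth=16, channels=1):
    """Size in bytes of uncompressed PCM audio data (header not included)."""
    bytes_per_sample = bit_depth // 8
    return int(seconds * sample_rate * channels * bytes_per_sample)


# Ten seconds at the settings used in the command above (44.1 kHz, 24-bit).
print(pcm_data_bytes(10, sample_rate=44100, bit_depth=24))  # 1323000
# Ten seconds at the default 22050 Hz sample rate, assuming 16-bit mono.
print(pcm_data_bytes(10))                                   # 441000
```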
### FLAC (Lossless Compression)
```bash
voirs synth "Hello" output.flac --compression-level 8
```
### MP3 (Lossy Compression)
```bash
voirs synth "Hello" output.mp3 --bitrate 320 --quality high
```
### Opus (Modern Codec)
```bash
voirs synth "Hello" output.opus --bitrate 128 --application audio
```
### Streaming Audio
```bash
# Stream to system audio output
voirs synth "Hello world" --play
# Stream to file while playing
voirs synth "Hello world" output.wav --play --streaming
```
## SSML Support
VoiRS supports Speech Synthesis Markup Language (SSML) for advanced speech control.
### Basic SSML
```bash
voirs synth '<speak>Hello <emphasis level="strong">world</emphasis>!</speak>' output.wav --ssml
```
### Advanced SSML Examples
```xml
<speak>
  <prosody rate="slow" pitch="low" volume="soft">
    This is spoken slowly, in a low pitch, and softly.
  </prosody>
</speak>

<speak>
  Step 1. <break time="1s"/> Step 2. <break time="500ms"/> Step 3.
</speak>

<speak>
  You say <phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>,
  I say <phoneme alphabet="ipa" ph="təˈmɑːtoʊ">tomato</phoneme>.
</speak>

<speak>
  <voice name="en-US-female-calm">This is a calm female voice.</voice>
  <voice name="en-US-male-energetic">This is an energetic male voice!</voice>
</speak>

<speak xml:lang="en-US">
  Hello! <span xml:lang="es-ES">¡Hola!</span>
  <span xml:lang="fr-FR">Bonjour!</span>
</speak>
```
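
When SSML is generated from arbitrary text, characters such as `&` and `<` must be escaped or the markup is no longer well-formed XML. A minimal sketch using the Python standard library; the helper name `emphasize` is illustrative and not part of VoiRS.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def emphasize(text, level="strong"):
    """Wrap plain text in <speak><emphasis>, escaping XML metacharacters."""
    return (
        f"<speak><emphasis level={quoteattr(level)}>"
        f"{escape(text)}</emphasis></speak>"
    )


ssml = emphasize("Terms & conditions: price < $5")
print(ssml)
# Parsing raises xml.etree.ElementTree.ParseError if the markup is malformed.
ET.fromstring(ssml)
```

The resulting string can then be passed to `voirs synth` together with `--ssml`.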
## Batch Processing
### Text File Input
```
# sentences.txt
Hello, this is the first sentence.
This is the second sentence.
And this is the third sentence.
```
```bash
voirs batch sentences.txt ./output/ --voice en-US-female-calm
```
### CSV Input with Metadata
```csv
text,output_name,voice,speed,pitch
"Hello world",hello,en-US-female-calm,1.0,0.0
"Bonjour le monde",bonjour,fr-FR-female-casual,1.1,0.5
"Hola mundo",hola,es-ES-male-news,0.9,-0.2
```
```bash
voirs batch metadata.csv ./output/ --format flac
```
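
A metadata file like the one above can be generated with Python's `csv` module, which quotes fields containing commas or quotation marks automatically. The column names follow the example; whether VoiRS accepts every standard CSV quoting form is an assumption here.

```python
import csv
import io

FIELDS = ["text", "output_name", "voice", "speed", "pitch"]


def write_metadata(rows, stream):
    """Write synthesis rows as CSV with the header used in the example."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"text": "Hello, world", "output_name": "hello",
     "voice": "en-US-female-calm", "speed": 1.0, "pitch": 0.0},
    {"text": 'She said "hi"', "output_name": "quote",
     "voice": "en-US-female-calm", "speed": 0.9, "pitch": -0.2},
]

buffer = io.StringIO()
write_metadata(rows, buffer)
print(buffer.getvalue())
```

To write a real file, pass `open("metadata.csv", "w", newline="", encoding="utf-8")` instead of the in-memory buffer.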
### JSON Input with Full Control
```json
[
  {
    "text": "Hello, world!",
    "output": "hello.wav",
    "voice": "en-US-female-calm",
    "quality": "high",
    "ssml": false,
    "effects": {
      "speed": 1.0,
      "pitch": 0.0,
      "volume": 0.0
    }
  },
  {
    "text": "<speak><emphasis>Important</emphasis> announcement!</speak>",
    "output": "announcement.wav",
    "voice": "en-US-male-formal",
    "quality": "ultra",
    "ssml": true
  }
]
```
```bash
voirs batch requests.json ./output/
```
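
Request files in this shape are easy to produce from a script. The sketch below builds the list with the field names from the example and serializes it with the `json` module. The helper name `make_request` and the rule that marks text starting with `<speak` as SSML are illustrative assumptions, not VoiRS behavior.

```python
import json


def make_request(text, output, voice, quality="high"):
    """Build one request dict using the field names from the example."""
    return {
        "text": text,
        "output": output,
        "voice": voice,
        "quality": quality,
        # Heuristic (an assumption): input starting with <speak is SSML.
        "ssml": text.lstrip().startswith("<speak"),
    }


requests = [
    make_request("Hello, world!", "hello.wav", "en-US-female-calm"),
    make_request(
        "<speak><emphasis>Important</emphasis> announcement!</speak>",
        "announcement.wav",
        "en-US-male-formal",
        quality="ultra",
    ),
]

payload = json.dumps(requests, indent=2, ensure_ascii=False)
print(payload)
```

Saving `payload` to `requests.json` gives a file that can be passed to `voirs batch` as shown above.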
## Performance Optimization
### GPU Acceleration
```bash
# Use GPU if available
voirs synth "Hello world" output.wav --gpu
# Specify GPU device
CUDA_VISIBLE_DEVICES=0 voirs synth "Hello world" output.wav --gpu
# Benchmark GPU performance
voirs benchmark --gpu-usage --voices en-US-female-calm
```
### Streaming for Long Texts
```bash
# Enable streaming for reduced latency
voirs synth "Very long text..." output.wav --streaming --chunk-size 512
# Interactive streaming
```
### Parallel Batch Processing
```bash
# Process with 8 parallel jobs
voirs batch large_dataset.txt ./output/ --parallel 8
# Monitor resource usage
voirs batch large_dataset.txt ./output/ --parallel 4 --progress
```
## Audio Quality Enhancement
### Basic Enhancement
```bash
voirs synth "Hello world" output.wav --enhance
```
### Advanced Audio Processing
```bash
# Custom quality settings
voirs synth "Hello world" output.wav \
--quality ultra \
--enhance \
--volume +3.0 \
--sample-rate 48000
# Professional audio settings
voirs synth "Hello world" broadcast.wav \
--quality ultra \
--enhance \
--format wav \
--sample-rate 48000 \
--bit-depth 24 \
--no-normalize # Skip normalization for professional workflow
```
## Troubleshooting
### Common Issues
**Voice not found:**
```bash
# List available voices
voirs voices list
# Download missing voice
voirs voices download en-US-female-calm
```
**GPU not working:**
```bash
# Check GPU support
# Force CPU mode
voirs synth "Hello" output.wav --device cpu
```
**Poor audio quality:**
```bash
# Try higher quality settings
voirs synth "Hello" output.wav --quality ultra --enhance
# Check sample rate
voirs synth "Hello" output.wav --sample-rate 48000
```
**Memory issues:**
```bash
# Enable streaming for large texts
voirs synth "$(cat large_text.txt)" output.wav --streaming
# Reduce chunk size
voirs synth "$(cat large_text.txt)" output.wav --streaming --chunk-size 128
```
### Debug Mode
```bash
# Enable verbose logging
VOIRS_LOG_LEVEL=debug voirs synth "Hello" output.wav
# Save debug information
voirs synth "Hello" output.wav --debug --debug-output debug.json
```
### Performance Issues
```bash
# Profile synthesis performance
voirs benchmark --voices en-US-female-calm --detailed
# Check system resources
voirs benchmark --memory --gpu-usage
# Optimize models for your hardware
voirs models optimize --device auto
```
## Integration Examples
### Shell Scripts
```bash
#!/bin/bash
# text_to_speech.sh - Convert text files to audio
for file in *.txt; do
    echo "Processing $file..."
    voirs synth "$(cat "$file")" "${file%.txt}.wav" \
        --voice en-US-female-calm \
        --quality high \
        --progress
done
```
### Python Integration
```python
import subprocess


def synthesize_text(text, output_file, voice="en-US-female-calm"):
    """Synthesize text with the VoiRS CLI and return the output path."""
    cmd = [
        "voirs", "synth", text, output_file,
        "--voice", voice,
        "--quality", "high",
    ]
    # Passing a list (no shell) keeps quotes in `text` from being interpreted.
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"Synthesis failed: {result.stderr}")
    return output_file


# Usage
synthesize_text("Hello, world!", "greeting.wav")
```
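
When more options are needed, it can help to build the argument list separately from running it, so the command can be inspected or logged first. The sketch below maps keyword arguments to flags listed in the `synth` options section; the helper name `build_synth_command` is illustrative and not part of VoiRS.

```python
def build_synth_command(text, output_file, *, voice=None, quality=None,
                        sample_rate=None, speed=None, pitch=None,
                        ssml=False, gpu=False):
    """Return the argv list for `voirs synth` without executing it."""
    cmd = ["voirs", "synth", text, output_file]
    # Options that take a value are added only when one was supplied.
    valued = {
        "--voice": voice,
        "--quality": quality,
        "--sample-rate": sample_rate,
        "--speed": speed,
        "--pitch": pitch,
    }
    for flag, value in valued.items():
        if value is not None:
            cmd += [flag, str(value)]
    # Boolean switches are appended only when enabled.
    if ssml:
        cmd.append("--ssml")
    if gpu:
        cmd.append("--gpu")
    return cmd


print(build_synth_command("Hi", "out.wav", voice="en-US-male-news", speed=1.2))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`.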
### Web Integration
```javascript
// Node.js example using child_process
const { execFile } = require('child_process');

function synthesizeText(text, outputFile) {
  return new Promise((resolve, reject) => {
    // execFile passes arguments without a shell, so quotes or `$(...)`
    // inside `text` cannot be interpreted as shell syntax.
    const args = ['synth', text, outputFile, '--quality', 'high'];
    execFile('voirs', args, (error, stdout, stderr) => {
      if (error) {
        reject(error);
      } else {
        resolve(outputFile);
      }
    });
  });
}

// Usage
synthesizeText('Hello from Node.js!', 'greeting.wav')
  .then(file => console.log(`Audio saved to ${file}`))
  .catch(err => console.error(`Error: ${err.message}`));
```
## Contributing
We welcome contributions! Please see the [main repository](https://github.com/cool-japan/voirs) for contribution guidelines.
### Development Setup
```bash
git clone https://github.com/cool-japan/voirs.git
cd voirs/crates/voirs-cli
# Install development dependencies
cargo install cargo-nextest
# Run tests
cargo nextest run
# Run CLI locally
cargo run -- synth "Hello world" test.wav
# Build release version
cargo build --release
```
## License
Licensed under either of:
- Apache License, Version 2.0 ([LICENSE-APACHE](../../LICENSE-APACHE))
- MIT license ([LICENSE-MIT](../../LICENSE-MIT))
at your option.