voirs-cli 0.1.0-alpha.3

Command-line interface for VoiRS speech synthesis

# voirs-cli

[![Crates.io](https://img.shields.io/crates/v/voirs-cli.svg)](https://crates.io/crates/voirs-cli)
[![Documentation](https://docs.rs/voirs-cli/badge.svg)](https://docs.rs/voirs-cli)

**Command-line interface for VoiRS speech synthesis framework.**

A powerful, user-friendly CLI tool for converting text to speech with the VoiRS framework. It provides batch processing, real-time synthesis, voice management, and support for several output formats.

## Features

- **Text-to-Speech Synthesis**: Convert text files or direct input to high-quality audio
- **SSML Support**: Full Speech Synthesis Markup Language processing
- **Voice Management**: Download, list, and manage voices and models
- **Batch Processing**: Process multiple files efficiently with progress tracking
- **Real-time Synthesis**: Interactive mode with live audio playback
- **Multiple Formats**: Output to WAV, FLAC, MP3, Opus, and streaming audio
- **Quality Control**: Configurable quality settings and audio enhancement
- **Cross-platform**: Windows, macOS, and Linux support

## Installation

### Pre-built Binaries

Download the latest release for your platform from [GitHub Releases](https://github.com/cool-japan/voirs/releases).

### From Source

```bash
cargo install voirs-cli
```

### Package Managers

```bash
# Homebrew (macOS/Linux)
brew install voirs

# Scoop (Windows)
scoop install voirs

# Chocolatey (Windows)
choco install voirs
```

## Quick Start

```bash
# Basic text synthesis
voirs synth "Hello, world!" output.wav

# Use specific voice
voirs synth "Hello, world!" output.wav --voice en-US-female-calm

# SSML synthesis
voirs synth '<speak><emphasis level="strong">Hello</emphasis> world!</speak>' output.wav --ssml

# Interactive mode
voirs interactive

# List available voices
voirs voices list
```

## Commands

### `synth` - Text to Speech Synthesis

Convert text to speech audio.

```bash
voirs synth [OPTIONS] <TEXT> <OUTPUT>

# Examples
voirs synth "Hello world" hello.wav
voirs synth "Hello world" hello.wav --voice en-US-male-news
voirs synth "Bonjour le monde" bonjour.wav --voice fr-FR-female-casual
voirs synth "Hello world" hello.flac --quality high
voirs synth "Hello world" hello.mp3 --bitrate 320
```

#### Options

```bash
-v, --voice <VOICE>          Voice to use for synthesis [default: auto]
-q, --quality <QUALITY>      Synthesis quality [low|medium|high|ultra] [default: high]
-r, --sample-rate <RATE>     Output sample rate [default: 22050]
-f, --format <FORMAT>        Output format [wav|flac|mp3|opus] [default: auto]
-s, --ssml                   Input is SSML markup
    --speed <SPEED>          Speaking rate multiplier [default: 1.0]
    --pitch <PITCH>          Pitch shift in semitones [default: 0.0]
    --volume <VOLUME>        Volume adjustment in dB [default: 0.0]
    --enhance                Enable audio enhancement
    --no-normalize           Skip audio normalization
    --gpu                    Use GPU acceleration if available
    --streaming              Enable streaming synthesis for large texts
    --chunk-size <SIZE>      Chunk size for streaming [default: 256]
```
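`--pitch` and `--volume` use standard audio units:

- A shift of *n* semitones scales frequency by 2^(n/12).
- A change of *g* dB scales amplitude by 10^(g/20).

The sketch below only illustrates that arithmetic. How VoiRS applies these factors internally is not documented here, and the function names are hypothetical.

```python
def pitch_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift in semitones (12 semitones per octave)."""
    return 2.0 ** (semitones / 12.0)


def amplitude_gain(db: float) -> float:
    """Linear amplitude factor for a volume change in decibels."""
    return 10.0 ** (db / 20.0)


if __name__ == "__main__":
    # --pitch 12 raises the pitch by one octave (frequency doubles).
    print(f"--pitch 12   -> x{pitch_ratio(12):.3f}")
    # --pitch -0.2 is a very small downward shift.
    print(f"--pitch -0.2 -> x{pitch_ratio(-0.2):.4f}")
    # --volume 3.0 raises amplitude by about 41%.
    print(f"--volume 3.0 -> x{amplitude_gain(3.0):.3f}")
    # --volume -6.0 roughly halves the amplitude.
    print(f"--volume -6  -> x{amplitude_gain(-6.0):.3f}")
```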

### `batch` - Batch Processing

Process multiple texts or files efficiently.

```bash
voirs batch [OPTIONS] <INPUT> <OUTPUT_DIR>

# Examples
voirs batch texts.txt ./audio/
voirs batch sentences.csv ./output/ --format flac
voirs batch book.txt ./chapters/ --split-sentences
```

#### Input Formats

- **Text file**: one sentence per line (e.g. `sentences.txt`)
- **CSV file**: columns `text,output_name,voice,speed` (e.g. `metadata.csv`)
- **JSON file**: an array of synthesis requests (e.g. `requests.json`)

#### Options

```bash
-f, --format <FORMAT>        Output format for all files
-v, --voice <VOICE>          Default voice for all texts
    --split-sentences        Split long texts into sentences
    --split-paragraphs       Split texts into paragraphs
    --max-length <LENGTH>    Maximum text length per file [default: 1000]
    --parallel <N>           Number of parallel synthesis jobs [default: 4]
    --resume                 Resume interrupted batch processing
    --progress               Show detailed progress information
```
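`--split-sentences` and `--max-length` are only described briefly above. The sketch below illustrates what splitting and a length cap involve, under stated assumptions:

- Sentences end at `.`, `!` or `?` followed by whitespace.
- `--max-length` counts characters.

It splits on those boundaries, then greedily packs sentences into chunks no longer than `max_length`. The default of 1000 mirrors the option above. The CLI's actual splitting rules are not documented here, and this naive rule mis-splits abbreviations such as "Dr.".

```python
import re

# Naive rule (assumption): a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences on terminal punctuation."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def pack(sentences: list[str], max_length: int = 1000) -> list[str]:
    """Greedily join sentences into chunks of at most max_length characters.

    A single sentence longer than max_length becomes its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_length or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    parts = split_sentences("Hello there. How are you? I am fine!")
    print(parts)                     # ['Hello there.', 'How are you?', 'I am fine!']
    print(pack(parts, max_length=25))  # ['Hello there. How are you?', 'I am fine!']
```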

### `interactive` - Interactive Mode

Start an interactive synthesis session.

```bash
voirs interactive [OPTIONS]

# Examples
voirs interactive
voirs interactive --voice en-US-female-calm --auto-play
```

#### Interactive Commands

```
> Hello, this is a test.                    # Synthesize text
> :voice en-GB-male-formal                  # Change voice
> :speed 1.2                                # Adjust speaking rate
> :pitch +0.5                               # Adjust pitch
> :quality ultra                            # Change quality
> :save last_synthesis.wav                  # Save last synthesis
> :play                                     # Replay last synthesis
> :ssml <speak><emphasis>Hello</emphasis></speak>  # SSML mode
> :help                                     # Show help
> :quit                                     # Exit
```

### `voices` - Voice Management

Manage available voices and models.

```bash
voirs voices <SUBCOMMAND>

# Subcommands
voirs voices list              # List available voices
voirs voices search <QUERY>    # Search for voices
voirs voices info <VOICE>      # Show voice details
voirs voices download <VOICE>  # Download voice model
voirs voices remove <VOICE>    # Remove voice model
voirs voices update            # Update voice database
```

#### Examples

```bash
# List all voices
voirs voices list

# List voices by language
voirs voices list --language en-US

# Search for female voices
voirs voices search female

# Get voice information
voirs voices info en-US-female-calm

# Download a voice
voirs voices download en-GB-male-formal

# Remove unused voices
voirs voices remove --unused
```

### `models` - Model Management

Manage synthesis models and backends.

```bash
voirs models <SUBCOMMAND>

# Subcommands
voirs models list              # List available models
voirs models info <MODEL>      # Show model details  
voirs models download <MODEL>  # Download model
voirs models remove <MODEL>    # Remove model
voirs models benchmark         # Benchmark models
voirs models optimize         # Optimize models for current hardware
```

#### Examples

```bash
# List installed models
voirs models list

# Download VITS model
voirs models download vits-en-us-female

# Benchmark all models
voirs models benchmark --output benchmark.json

# Optimize for current GPU
voirs models optimize --device cuda:0
```

### `config` - Configuration Management

Manage VoiRS configuration and preferences.

```bash
voirs config <SUBCOMMAND>

# Subcommands
voirs config show             # Show current configuration
voirs config set <KEY> <VALUE>  # Set configuration value
voirs config reset            # Reset to defaults
voirs config export <FILE>    # Export configuration
voirs config import <FILE>    # Import configuration
```

#### Examples

```bash
# Show configuration
voirs config show

# Set default voice
voirs config set default.voice en-US-female-calm

# Set output directory
voirs config set paths.output ~/Downloads/voirs/

# Reset configuration
voirs config reset --confirm

# Export settings
voirs config export my-settings.toml
```
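The dotted keys in these examples match the `[section]` and key names in the configuration file shown under Configuration:

- `default.voice` is `voice` in `[default]`.
- `paths.output` is `output` in `[paths]`.

Assuming that mapping holds in general, this sketch applies a `config set`-style assignment to an in-memory configuration dictionary. `set_dotted` is a hypothetical helper. It stores values as given, whereas the real command presumably parses types.

```python
def set_dotted(config: dict, key: str, value) -> dict:
    """Set a value addressed by a dotted key such as 'default.voice'.

    Intermediate tables (TOML sections) are created as needed.
    """
    *sections, leaf = key.split(".")
    if not leaf or any(not s for s in sections):
        raise ValueError(f"invalid key: {key!r}")
    table = config
    for section in sections:
        table = table.setdefault(section, {})
        if not isinstance(table, dict):
            raise ValueError(f"{section!r} in {key!r} is not a table")
    table[leaf] = value
    return config


if __name__ == "__main__":
    cfg = {"default": {"quality": "high"}}
    set_dotted(cfg, "default.voice", "en-US-female-calm")
    set_dotted(cfg, "paths.output", "~/Downloads/voirs/")
    print(cfg)
    # {'default': {'quality': 'high', 'voice': 'en-US-female-calm'}, 'paths': {'output': '~/Downloads/voirs/'}}
```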

### `server` - HTTP Server Mode

Start VoiRS as an HTTP API server.

```bash
voirs server [OPTIONS]

# Examples
voirs server --port 8080
voirs server --host 0.0.0.0 --port 3000 --workers 4
```

#### Options

```bash
-p, --port <PORT>           Port to listen on [default: 8080]
-h, --host <HOST>           Host to bind to [default: 127.0.0.1]
-w, --workers <N>           Number of worker threads [default: 4]
    --max-text-length <N>   Maximum text length [default: 5000]
    --rate-limit <N>        Requests per minute per IP [default: 60]
    --cors                  Enable CORS headers
    --api-key <KEY>         Require API key authentication
```

#### API Endpoints

```bash
POST /synthesize              # Synthesize text to audio
GET  /voices                  # List available voices
GET  /voices/{id}             # Get voice information
GET  /health                  # Health check
```
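The request and response formats for `POST /synthesize` are not documented here. This client sketch makes three assumptions:

- The request body is JSON with `text` and `voice` fields, borrowed from the batch JSON format.
- The response body is the encoded audio.
- The server runs on its default host and port.

Check all three against a running server. Only the Python standard library is used. Authentication is omitted because how the `--api-key` value is sent is not specified.

```python
import json
import urllib.request


def build_synthesize_request(text: str, voice: str,
                             host: str = "127.0.0.1",
                             port: int = 8080) -> urllib.request.Request:
    """Build a POST /synthesize request (the JSON body schema is an assumption)."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/synthesize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, voice: str, path: str) -> None:
    """Send the request and save the response body (assumed to be audio)."""
    request = build_synthesize_request(text, voice)
    with urllib.request.urlopen(request, timeout=60) as response:
        with open(path, "wb") as f:
            f.write(response.read())
```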

### `benchmark` - Performance Testing

Run performance benchmarks and quality tests.

```bash
voirs benchmark [OPTIONS]

# Examples
voirs benchmark --voices en-US-female-calm,en-GB-male-formal
voirs benchmark --output benchmark.json --detailed
```

#### Options

```bash
-v, --voices <VOICES>       Comma-separated list of voices to test
-o, --output <FILE>         Output results to file
    --detailed              Include detailed metrics
    --quality               Run quality tests (requires reference audio)
    --rtf                   Measure real-time factor
    --memory                Monitor memory usage
    --gpu-usage             Monitor GPU utilization
```
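`--rtf` reports the real-time factor. Under the usual definition, RTF is processing time divided by the duration of the audio produced, so values below 1.0 mean synthesis runs faster than real time. Here is a minimal sketch of that calculation for a PCM WAV file, using the standard `wave` module to read the duration. `synthesize` is a hypothetical stand-in for any callable that writes the file, for example a wrapper around `voirs synth`.

```python
import time
import wave
from typing import Callable


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file: frame count divided by sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(synthesize: Callable[[str], None], path: str) -> float:
    """Time a call that writes `path`, then divide by the audio duration."""
    start = time.perf_counter()
    synthesize(path)
    elapsed = time.perf_counter() - start
    return elapsed / wav_duration_seconds(path)
```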

## Configuration

VoiRS uses a hierarchical configuration system. Settings are resolved in the following order of precedence (highest first):

1. Command-line arguments
2. Environment variables  
3. User configuration file (`~/.voirs/config.toml`)
4. System configuration file (`/etc/voirs/config.toml`)
5. Default values
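The precedence order above amounts to a layered lookup: for each setting, the first layer that defines it wins. This minimal sketch represents each layer as a flat dictionary of dotted keys; the sample values are hypothetical. How VoiRS actually loads and merges its layers is not shown in this README.

```python
from typing import Any, Mapping, Optional


def resolve(key: str, *layers: Optional[Mapping[str, Any]]) -> Any:
    """Return the value from the highest-precedence layer that defines key.

    Layers are passed highest precedence first; None layers are skipped.
    """
    for layer in layers:
        if layer is not None and key in layer:
            return layer[key]
    raise KeyError(key)


# Layers in the documented order, highest precedence first (sample values).
cli_args = {"default.voice": "en-US-male-news"}
env_vars: dict = {}
user_file = {"default.voice": "en-US-female-calm", "default.quality": "high"}
system_file = {"default.quality": "medium"}
defaults = {"default.quality": "high", "default.sample_rate": 22050}
layers = (cli_args, env_vars, user_file, system_file, defaults)

print(resolve("default.voice", *layers))        # en-US-male-news (CLI wins)
print(resolve("default.quality", *layers))      # high (user file beats system file)
print(resolve("default.sample_rate", *layers))  # 22050 (built-in default)
```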

### Configuration File

```toml
# ~/.voirs/config.toml

[default]
voice = "en-US-female-calm"
quality = "high"
sample_rate = 22050
format = "wav"

[paths]
models = "~/.voirs/models/"
cache = "~/.voirs/cache/"
output = "~/Downloads/"

[synthesis]
gpu_acceleration = true
streaming = false
chunk_size = 256
enhance_audio = true
normalize_output = true

[voices]
auto_download = true
preferred_languages = ["en-US", "en-GB"]
fallback_voice = "en-US-female-neutral"

[server]
host = "127.0.0.1"
port = 8080
workers = 4
max_text_length = 5000
rate_limit = 60

[batch]
parallel_jobs = 4
progress_reporting = true
resume_enabled = true
auto_split = true

[advanced]
backend = "candle"              # candle, onnx
device = "auto"                 # auto, cpu, cuda:0, metal
precision = "fp32"              # fp16, fp32
memory_limit = "4GB"
log_level = "info"              # error, warn, info, debug, trace
```

### Environment Variables

```bash
# Override configuration with environment variables
export VOIRS_DEFAULT_VOICE="en-US-male-news"
export VOIRS_SYNTHESIS_GPU_ACCELERATION="true"
export VOIRS_PATHS_MODELS="/custom/models/path"
export VOIRS_LOG_LEVEL="debug"
```

## Output Formats

### WAV (Uncompressed)
```bash
voirs synth "Hello" output.wav --sample-rate 44100 --bit-depth 24
```

### FLAC (Lossless Compression)
```bash
voirs synth "Hello" output.flac --compression-level 8
```

### MP3 (Lossy Compression)
```bash
voirs synth "Hello" output.mp3 --bitrate 320 --quality high
```

### Opus (Modern Codec)
```bash
voirs synth "Hello" output.opus --bitrate 128 --application audio
```
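The sample formats and bit rates above translate directly into approximate file sizes:

- Uncompressed PCM WAV: bytes per second = sample rate × (bit depth / 8) × channels.
- Lossy streams: bytes per second = bitrate in kbit/s × 1000 / 8. This treats the MP3 and Opus `--bitrate` values as kbit/s averages, which is an assumption.

The sketch assumes mono output, because the channel count VoiRS produces is not stated here. It ignores headers and container overhead. FLAC is omitted because its size depends on the signal.

```python
def pcm_bytes_per_second(sample_rate: int, bit_depth: int, channels: int = 1) -> int:
    """Uncompressed PCM data rate (WAV payload only, header ignored)."""
    return sample_rate * (bit_depth // 8) * channels


def cbr_bytes_per_second(bitrate_kbps: int) -> int:
    """Data rate of a stream at the given (average) bitrate in kbit/s."""
    return bitrate_kbps * 1000 // 8


minute = 60
print("WAV 44.1 kHz / 24-bit:", pcm_bytes_per_second(44_100, 24) * minute, "bytes/min")  # 7938000
print("MP3 320 kbit/s:       ", cbr_bytes_per_second(320) * minute, "bytes/min")       # 2400000
print("Opus 128 kbit/s:      ", cbr_bytes_per_second(128) * minute, "bytes/min")       # 960000
```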

### Streaming Audio
```bash
# Stream to system audio output
voirs synth "Hello world" --play

# Stream to file while playing
voirs synth "Hello world" output.wav --play --streaming
```

## SSML Support

VoiRS supports Speech Synthesis Markup Language (SSML) for advanced speech control.

### Basic SSML

```bash
voirs synth '<speak>Hello <emphasis level="strong">world</emphasis>!</speak>' output.wav --ssml
```

### Advanced SSML Examples

```xml
<!-- Prosody control -->
<speak>
  <prosody rate="slow" pitch="low" volume="soft">
    This is spoken slowly, in a low pitch, and softly.
  </prosody>
</speak>

<!-- Pauses and breaks -->
<speak>
  Step 1. <break time="1s"/> Step 2. <break time="500ms"/> Step 3.
</speak>

<!-- Phonetic pronunciation -->
<speak>
  You say <phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>,
  I say <phoneme alphabet="ipa" ph="təˈmɑːtoʊ">tomato</phoneme>.
</speak>

<!-- Voice selection -->
<speak>
  <voice name="en-US-female-calm">This is a calm female voice.</voice>
  <voice name="en-US-male-energetic">This is an energetic male voice!</voice>
</speak>

<!-- Language switching -->
<speak xml:lang="en-US">
  Hello! <span xml:lang="es-ES">¡Hola!</span> 
  <span xml:lang="fr-FR">Bonjour!</span>
</speak>
```
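When SSML is generated from arbitrary text, characters such as `&` and `<` must be escaped or the markup becomes invalid. This sketch builds the SSML with the standard library's `xml.sax.saxutils.escape` and `quoteattr`. It passes the result to `voirs synth --ssml` as an argument list, which also avoids shell-quoting problems. The helper names are hypothetical; the only elements used are `<speak>`, `<prosody>` and `<break>`, as shown above.

```python
import subprocess
from xml.sax.saxutils import escape, quoteattr


def ssml_with_pause(first: str, second: str, pause: str = "500ms",
                    rate: str = "medium") -> str:
    """Wrap two plain-text segments in SSML with a break between them."""
    return (
        f"<speak><prosody rate={quoteattr(rate)}>"
        f"{escape(first)} <break time={quoteattr(pause)}/> {escape(second)}"
        "</prosody></speak>"
    )


def synth_ssml(ssml: str, output: str) -> None:
    """Run `voirs synth <SSML> <OUTPUT> --ssml`; raises CalledProcessError on failure."""
    subprocess.run(["voirs", "synth", ssml, output, "--ssml"], check=True)


markup = ssml_with_pause("Fish & chips", "Salt < pepper?")
print(markup)
# <speak><prosody rate="medium">Fish &amp; chips <break time="500ms"/> Salt &lt; pepper?</prosody></speak>
```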

## Batch Processing

### Text File Input

Contents of `sentences.txt`, one sentence per line:

```text
Hello, this is the first sentence.
This is the second sentence.
And this is the third sentence.
```

```bash
voirs batch sentences.txt ./output/ --voice en-US-female-calm
```

### CSV Input with Metadata

```csv
text,output_name,voice,speed,pitch
"Hello world",hello,en-US-female-calm,1.0,0.0
"Bonjour le monde",bonjour,fr-FR-female-casual,1.1,0.5
"Hola mundo",hola,es-ES-male-news,0.9,-0.2
```

```bash
voirs batch metadata.csv ./output/ --format flac
```
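Quoting matters in the CSV file because the `text` column may contain commas or double quotes. Python's `csv` module produces a well-formed `metadata.csv` by quoting only the fields that need it. The column names come from the example above, and the rows are sample data.

```python
import csv

FIELDNAMES = ["text", "output_name", "voice", "speed", "pitch"]

rows = [
    {"text": "Hello, world", "output_name": "hello",
     "voice": "en-US-female-calm", "speed": 1.0, "pitch": 0.0},
    {"text": 'Say "cheese"', "output_name": "cheese",
     "voice": "en-US-female-calm", "speed": 1.1, "pitch": 0.5},
]


def write_metadata(path: str, rows: list[dict]) -> None:
    """Write batch rows under a header line; commas and quotes are escaped."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    write_metadata("metadata.csv", rows)
```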

### JSON Input with Full Control

```json
[
  {
    "text": "Hello, world!",
    "output": "hello.wav",
    "voice": "en-US-female-calm",
    "quality": "high",
    "ssml": false,
    "effects": {
      "speed": 1.0,
      "pitch": 0.0,
      "volume": 0.0
    }
  },
  {
    "text": "<speak><emphasis>Important</emphasis> announcement!</speak>",
    "output": "announcement.wav", 
    "voice": "en-US-male-formal",
    "quality": "ultra",
    "ssml": true
  }
]
```

```bash
voirs batch requests.json ./output/
```
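The standard `json` module is enough to generate `requests.json` programmatically. This sketch builds one request per non-empty line, using the field names from the example above. The numbered output file names are a hypothetical convention. Marking inputs that start with `<speak>` as SSML is likewise an assumption about how you want to flag them.

```python
import json


def build_requests(lines: list[str], voice: str, quality: str = "high") -> list[dict]:
    """One synthesis request per non-empty line, with numbered output files."""
    requests = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        requests.append({
            "text": text,
            "output": f"line_{len(requests) + 1:03d}.wav",
            "voice": voice,
            "quality": quality,
            "ssml": text.startswith("<speak>"),
        })
    return requests


if __name__ == "__main__":
    reqs = build_requests(
        ["Hello, world!", "", "<speak><emphasis>Important</emphasis> announcement!</speak>"],
        "en-US-female-calm",
    )
    # ensure_ascii=False keeps any non-ASCII text readable in the file.
    with open("requests.json", "w", encoding="utf-8") as f:
        json.dump(reqs, f, ensure_ascii=False, indent=2)
```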

## Performance Optimization

### GPU Acceleration

```bash
# Use GPU if available
voirs synth "Hello world" output.wav --gpu

# Specify GPU device
CUDA_VISIBLE_DEVICES=0 voirs synth "Hello world" output.wav --gpu

# Benchmark GPU performance
voirs benchmark --gpu-usage --voices en-US-female-calm
```

### Streaming for Long Texts

```bash
# Enable streaming for reduced latency
voirs synth "Very long text..." output.wav --streaming --chunk-size 512

# Pipe text in through standard input (TEXT given as -)
echo "Long text content" | voirs synth - output.wav --streaming
```

### Parallel Batch Processing

```bash
# Process with 8 parallel jobs
voirs batch large_dataset.txt ./output/ --parallel 8

# Show detailed progress information
voirs batch large_dataset.txt ./output/ --parallel 4 --progress
```
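You can get similar behaviour from your own scripts when driving `voirs synth` directly:

- A thread pool plays the role of `--parallel`.
- Skipping outputs that already exist roughly approximates `--resume`.

This is a hypothetical driver, not how `voirs batch` is implemented. `synthesize` is any callable taking `(text, output_path)`, for example the `synthesize_text` function from the Python integration example below.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_batch(jobs: dict[str, str], synthesize: Callable[[str, str], object],
              parallel: int = 4, resume: bool = True) -> list[str]:
    """Run synthesize(text, output_path) for each job with `parallel` workers.

    jobs maps output path -> text. With resume=True, outputs that already
    exist are skipped. Returns the sorted paths that were generated.
    """
    pending = {path: text for path, text in jobs.items()
               if not (resume and os.path.exists(path))}
    done = []
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {pool.submit(synthesize, text, path): path
                   for path, text in pending.items()}
        for future in as_completed(futures):
            future.result()  # re-raise any synthesis error
            done.append(futures[future])
    return sorted(done)
```

Threads are sufficient here because each job mostly waits on an external process. The existence check cannot tell a complete file from one truncated by an interrupted run. Writing to a temporary name and renaming on success avoids that.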

## Audio Quality Enhancement

### Basic Enhancement

```bash
voirs synth "Hello world" output.wav --enhance
```

### Advanced Audio Processing

```bash
# Custom quality settings
voirs synth "Hello world" output.wav \
  --quality ultra \
  --enhance \
  --volume +3.0 \
  --sample-rate 48000

# Professional audio settings
voirs synth "Hello world" broadcast.wav \
  --quality ultra \
  --enhance \
  --format wav \
  --sample-rate 48000 \
  --bit-depth 24 \
  --no-normalize  # Skip normalization for professional workflow
```

## Troubleshooting

### Common Issues

**Voice not found:**
```bash
# List available voices
voirs voices list

# Download missing voice
voirs voices download en-US-female-calm
```

**GPU not working:**
```bash
# Check whether GPU acceleration is enabled in the configuration
voirs config show | grep gpu

# Force CPU mode
voirs synth "Hello" output.wav --device cpu
```

**Poor audio quality:**
```bash
# Try higher quality settings
voirs synth "Hello" output.wav --quality ultra --enhance

# Try a higher output sample rate
voirs synth "Hello" output.wav --sample-rate 48000
```

**Memory issues:**
```bash
# Enable streaming for large texts
voirs synth "$(cat large_text.txt)" output.wav --streaming

# Reduce chunk size
voirs synth "$(cat large_text.txt)" output.wav --streaming --chunk-size 128
```

### Debug Mode

```bash
# Enable verbose logging
VOIRS_LOG_LEVEL=debug voirs synth "Hello" output.wav

# Save debug information
voirs synth "Hello" output.wav --debug --debug-output debug.json
```

### Performance Issues

```bash
# Profile synthesis performance
voirs benchmark --voices en-US-female-calm --detailed

# Check system resources
voirs benchmark --memory --gpu-usage

# Optimize models for your hardware
voirs models optimize --device auto
```

## Integration Examples

### Shell Scripts

```bash
#!/bin/bash
# text_to_speech.sh - Convert text files to audio
shopt -s nullglob  # skip the loop entirely when no .txt files exist

for file in *.txt; do
    echo "Processing $file..."
    voirs synth "$(cat "$file")" "${file%.txt}.wav" \
        --voice en-US-female-calm \
        --quality high
done
```

### Python Integration

```python
import subprocess

def synthesize_text(text, output_file, voice="en-US-female-calm"):
    """Synthesize text with the VoiRS CLI and return the output path."""
    # Passing arguments as a list (no shell) avoids quoting problems with the text.
    cmd = [
        "voirs", "synth", text, output_file,
        "--voice", voice,
        "--quality", "high",
    ]

    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"Synthesis failed: {result.stderr}")

    return output_file

# Usage
synthesize_text("Hello, world!", "greeting.wav")
```

### Web Integration

```javascript
// Node.js example using child_process
const { execFile } = require('child_process');

function synthesizeText(text, outputFile) {
    return new Promise((resolve, reject) => {
        // execFile passes arguments directly (no shell), so quotes or $ in
        // the text cannot break the command or be interpreted by a shell.
        const args = ['synth', text, outputFile, '--quality', 'high'];

        execFile('voirs', args, (error, stdout, stderr) => {
            if (error) {
                reject(new Error(`Synthesis failed: ${stderr || error.message}`));
            } else {
                resolve(outputFile);
            }
        });
    });
}

// Usage
synthesizeText("Hello from Node.js!", "greeting.wav")
    .then(file => console.log(`Audio saved to ${file}`))
    .catch(err => console.error(`Error: ${err.message}`));
```

## Contributing

We welcome contributions! Please see the [main repository](https://github.com/cool-japan/voirs) for contribution guidelines.

### Development Setup

```bash
git clone https://github.com/cool-japan/voirs.git
cd voirs/crates/voirs-cli

# Install development dependencies
cargo install cargo-nextest

# Run tests
cargo nextest run

# Run CLI locally
cargo run -- synth "Hello world" test.wav

# Build release version
cargo build --release
```

## License

Licensed under either of:

- Apache License, Version 2.0 ([LICENSE-APACHE](../../LICENSE-APACHE))
- MIT license ([LICENSE-MIT](../../LICENSE-MIT))

at your option.