docs.rs failed to build pmetal-vocoder-0.1.0
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
pmetal-vocoder
BigVGAN neural vocoder for text-to-speech synthesis.
Overview
This crate provides a Rust implementation of BigVGAN, a high-fidelity neural vocoder that converts mel spectrograms to audio waveforms. It's designed for integration with speech synthesis systems.
Architecture
BigVGAN uses an anti-aliased multi-periodicity architecture:
Mel Spectrogram → Conv1d → [AMP Blocks × N] → Conv1d → Audio Waveform
↓
Snake Activations
Anti-aliased Upsampling
Multi-period Processing
Features
- High Fidelity: 24kHz+ audio generation
- 256× Upsampling: Mel frames to audio samples
- Snake Activations: Learnable periodic activations
- Anti-Aliased Upsampling: Reduced aliasing artifacts
- Multi-Resolution Discriminators: MPD, CQT-D for training
Components
Generator
- Anti-Aliased Multi-Periodicity (AMP) blocks
- Snake/SnakeBeta activations for periodic signals
- Configurable upsampling ratios
Discriminators (for training)
- MPD: Multi-Period Discriminator
- MSD: Multi-Scale Discriminator
- CQT-D: Constant-Q Transform Discriminator
Audio Processing
- STFT/iSTFT for spectral analysis
- Mel filterbank computation
- Resampling utilities
Usage
use ;
// Load vocoder
let config = default;
let vocoder = new?;
// Convert mel spectrogram to audio
let mel = /* mel spectrogram [batch, mel_bins, frames] */;
let audio = vocoder.forward?;
// audio: [batch, 1, samples]
Configuration
| Parameter | Description | Default |
|---|---|---|
upsample_rates |
Upsampling factors | [8, 8, 2, 2] |
upsample_kernel_sizes |
Kernel sizes | [16, 16, 4, 4] |
resblock_kernel_sizes |
ResBlock kernels | [3, 7, 11] |
resblock_dilation_sizes |
Dilation rates | [[1,3,5], ...] |
Modules
| Module | Description |
|---|---|
generator |
BigVGAN generator architecture |
discriminator |
Training discriminators |
nn |
Custom neural network layers |
audio |
Audio processing utilities |
loss |
Training loss functions |
License
MIT OR Apache-2.0