Crate voxcpm_rs


§voxcpm-rs

Pure-Rust inference for VoxCPM2, built on the Burn ML framework. Supports GPU inference via Vulkan (through wgpu) and a CPU fallback via ndarray.
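Because Burn selects its backend at the type level, switching between the two supported backends is a matter of changing one type alias. A minimal sketch (the `wgpu` feature flag shown here is a hypothetical example for illustration, not a documented feature of this crate):

```rust
// Backend selection is a compile-time type choice in Burn.
// NOTE: gating on a `wgpu` cargo feature is an assumption made for
// this sketch; check the crate's Cargo.toml for its real feature set.
#[cfg(feature = "wgpu")]
type Backend = burn::backend::Wgpu; // GPU via wgpu (Vulkan, Metal, DX12)
#[cfg(not(feature = "wgpu"))]
type Backend = burn::backend::NdArray<f32>; // portable CPU fallback
```

The rest of the API is generic over the backend, so `VoxCPM<Backend>` works unchanged with either alias.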

§Quick start

use voxcpm_rs::{GenerateOptions, Prompt, PromptAudio, VoxCPM};

type B = burn::backend::NdArray<f32>;
let device = Default::default();
let model: VoxCPM<B> = VoxCPM::from_local("./pretrained_models/VoxCPM2", &device).unwrap();

// Zero-shot:
let wav = model.generate("Hello, world!", GenerateOptions::default()).unwrap();

// Voice cloning from a reference wav:
let opts = GenerateOptions::builder()
    .timesteps(10)
    .prompt(Prompt::Reference { audio: "speaker.wav".into() })
    .build();
let wav = model.generate("Hello, world!", opts).unwrap();

voxcpm_rs::audio::write_wav("out.wav", &wav, model.sample_rate()).unwrap();

See the VoxCPM struct for the convenience API, or the individual submodules (minicpm4, locdit, locenc, audiovae) for low-level access.

Re-exports§

pub use audiovae::AudioVae;
pub use config::AudioVaeConfig;
pub use config::CfmConfig;
pub use config::LoraConfig;
pub use config::MiniCpm4Config;
pub use config::RopeScalingConfig;
pub use config::VoxCpm2Config;
pub use config::VoxCpmDitConfig;
pub use config::VoxCpmEncoderConfig;
pub use error::Error;
pub use error::Result;
pub use voxcpm2::CancelToken;
pub use voxcpm2::GenerateOptions;
pub use voxcpm2::GenerateOptionsBuilder;
pub use voxcpm2::GenerateStream;
pub use voxcpm2::Prompt;
pub use voxcpm2::PromptAudio;
pub use voxcpm2::VoxCPM;

Modules§

audio
audiovae
AudioVAE v2 decoder port (inference-only, non-streaming).
config
Configuration structs matching the JSON files shipped with VoxCPM2 checkpoints.
error
fsq
Finite scalar quantization (FSQ) layer used between the base LM and the residual LM.
locdit
Local DiT v2 and Conditional Flow-Matching sampler.
locenc
Local encoder: a cls-pooled MiniCPM-4 over [B, T, P, D].
minicpm4
MiniCPM-4 transformer backbone used by the text-semantic LM, the residual acoustic LM, the local encoder, and the local DiT estimator.
tokenizer
Tokenizer wrapper around tokenizers::Tokenizer that implements the mask_multichar_chinese_tokens pre-processing step from the reference Python implementation.
voxcpm2
Top-level VoxCPM2 model and the high-level VoxCPM convenience wrapper.
weights
Pretrained weight loading for VoxCPM2.