Crate voxcpm_rs


§voxcpm-rs

Pure-Rust inference for VoxCPM2, built on the Burn ML framework. Supports GPU inference via Vulkan (through wgpu) and a CPU fallback via ndarray.
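Because Burn selects its backend at the type level, switching between the two supported backends is a matter of changing one type alias. A minimal sketch (the `wgpu` feature flag shown here is a hypothetical example for illustration, not a documented feature of this crate):

```rust
// Backend selection is a compile-time type choice in Burn.
// NOTE: gating on a `wgpu` cargo feature is an assumption made for
// this sketch; check the crate's Cargo.toml for its real feature set.
#[cfg(feature = "wgpu")]
type Backend = burn::backend::Wgpu; // GPU via wgpu (Vulkan, Metal, DX12)
#[cfg(not(feature = "wgpu"))]
type Backend = burn::backend::NdArray<f32>; // portable CPU fallback
```

The rest of the API is generic over the backend, so `VoxCPM<Backend>` works unchanged with either alias.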

§Quick start

use voxcpm_rs::{GenerateOptions, Prompt, PromptAudio, VoxCPM};

type B = burn::backend::NdArray<f32>;
let device = Default::default();
let model: VoxCPM<B> = VoxCPM::from_local("./pretrained_models/VoxCPM2", &device).unwrap();

// Zero-shot:
let wav = model.generate("Hello, world!", GenerateOptions::default()).unwrap();

// Voice cloning from a reference wav:
let opts = GenerateOptions::builder()
    .timesteps(10)
    .prompt(Prompt::Reference { audio: "speaker.wav".into() })
    .build();
let wav = model.generate("Hello, world!", opts).unwrap();

voxcpm_rs::audio::write_wav("out.wav", &wav, model.sample_rate()).unwrap();

See the VoxCPM struct for the convenience API, or the individual submodules (minicpm4, locdit, locenc, audiovae) for low-level access.

Re-exports§

pub use audiovae::AudioVae;
pub use config::AudioVaeConfig;
pub use config::CfmConfig;
pub use config::LoraConfig;
pub use config::MiniCpm4Config;
pub use config::RopeScalingConfig;
pub use config::VoxCpm2Config;
pub use config::VoxCpmDitConfig;
pub use config::VoxCpmEncoderConfig;
pub use error::Error;
pub use error::Result;
pub use voxcpm2::CancelToken;
pub use voxcpm2::GenerateOptions;
pub use voxcpm2::GenerateOptionsBuilder;
pub use voxcpm2::GenerateStream;
pub use voxcpm2::Prompt;
pub use voxcpm2::PromptAudio;
pub use voxcpm2::VoxCPM;

Modules§

audio
audiovae
AudioVAE v2 decoder port (inference-only, non-streaming).
config
Configuration structs matching the JSON files shipped with VoxCPM2 checkpoints.
error
fsq
Finite scalar quantization (FSQ) layer used between the base LM and the residual LM.
locdit
Local DiT v2 and Conditional Flow-Matching sampler.
locenc
Local encoder: a cls-pooled MiniCPM-4 over [B, T, P, D].
minicpm4
MiniCPM-4 transformer backbone used by the text-semantic LM, the residual acoustic LM, the local encoder, and the local DiT estimator.
tokenizer
Tokenizer wrapper around tokenizers::Tokenizer that implements the mask_multichar_chinese_tokens pre-processing step from the reference Python implementation.
voxcpm2
Top-level VoxCPM2 model and the high-level VoxCPM convenience wrapper.
weights
Pretrained weight loading for VoxCPM2.