§voxcpm-rs
Pure-Rust inference for VoxCPM2, built on the Burn ML framework.
Runs on the GPU through Burn's wgpu backend (Vulkan) or on the CPU
through the ndarray fallback backend.
§Quick start
use voxcpm_rs::{GenerateOptions, Prompt, PromptAudio, VoxCPM};
type B = burn::backend::NdArray<f32>;
let device = Default::default();
let model: VoxCPM<B> = VoxCPM::from_local("./pretrained_models/VoxCPM2", &device).unwrap();
// Zero-shot:
let wav = model.generate("Hello, world!", GenerateOptions::default()).unwrap();
// Voice cloning from a reference wav:
let opts = GenerateOptions::builder()
.timesteps(10)
.prompt(Prompt::Reference { audio: "speaker.wav".into() })
.build();
let wav = model.generate("Hello, world!", opts).unwrap();
voxcpm_rs::audio::write_wav("out.wav", &wav, model.sample_rate()).unwrap();
See the VoxCPM struct for the convenience API, or the individual submodules
(minicpm4, locdit, locenc, audiovae) for low-level access.
Re-exports§
pub use audiovae::AudioVae;
pub use config::AudioVaeConfig;
pub use config::CfmConfig;
pub use config::LoraConfig;
pub use config::MiniCpm4Config;
pub use config::RopeScalingConfig;
pub use config::VoxCpm2Config;
pub use config::VoxCpmDitConfig;
pub use config::VoxCpmEncoderConfig;
pub use error::Error;
pub use error::Result;
pub use voxcpm2::CancelToken;
pub use voxcpm2::GenerateOptions;
pub use voxcpm2::GenerateOptionsBuilder;
pub use voxcpm2::GenerateStream;
pub use voxcpm2::Prompt;
pub use voxcpm2::PromptAudio;
pub use voxcpm2::VoxCPM;
Modules§
- audio
- audiovae
- AudioVAE v2 decoder port (inference-only, non-streaming).
- config
- Configuration structs matching the JSON files shipped with VoxCPM2 checkpoints.
- error
- fsq
- Scalar quantization layer used between the base LM and residual LM.
- locdit
- Local DiT v2 and Conditional Flow-Matching sampler.
- locenc
- Local encoder: a cls-pooled MiniCPM-4 over [B, T, P, D].
- minicpm4
- MiniCPM-4 transformer backbone used by the text-semantic LM, the residual acoustic LM, the local encoder, and the local DiT estimator.
- tokenizer
- Tokenizer wrapper around tokenizers::Tokenizer, implementing the mask_multichar_chinese_tokens pre-processing behavior from the reference Python implementation.
- voxcpm2
- Top-level VoxCPM2 model and the high-level VoxCPM convenience wrapper.
- weights
- Pretrained weight loading for VoxCPM2.