Crate pyin


pYIN [1] is a pitch (fundamental frequency) detection algorithm.

For each frame of the audio signal, this crate provides a pitch estimate and the probability that the frame is voiced.

This crate provides a C language FFI and an executable binary. For the details of the executable binary, refer to the repo.

The implementation is based on librosa v0.9.1. To ease translation from Python + NumPy to Rust, it is written on top of the ndarray crate.

§Usage

use ndarray::prelude::*;
use pyin::{PYINExecutor, Framing, PadMode};

let fmin = 60f64;  // minimum frequency in Hz
let fmax = 600f64; // maximum frequency in Hz
let sr = 24000;    // sampling rate of audio data in Hz
let frame_length = 2048; // frame length in samples
let (win_length, hop_length, resolution) = (None, None, None);  // None to use default values
let mut pyin_exec = PYINExecutor::new(fmin, fmax, sr, frame_length, win_length, hop_length, resolution);

let wav: Vec<f64> = (0..24000).map(|i| (2. * std::f64::consts::PI * (i as f64) / 200.).sin()).collect();
let fill_unvoiced = f64::NAN;
let framing = Framing::Center(PadMode::Constant(0.));  // Zero-pad both sides of the signal (padding applies only with Framing::Center).

// timestamp (Array1<f64>) contains the timestamp (in seconds) of each frame.
// f0 (Array1<f64>) contains the pitch estimate in Hz (NAN if unvoiced).
// voiced_flag (Array1<bool>) indicates whether each frame is voiced.
// voiced_prob (Array1<f64>) contains the probability that each frame is voiced.
let (timestamp, f0, voiced_flag, voiced_prob) = pyin_exec.pyin(&wav, fill_unvoiced, framing);
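
The test signal above completes one cycle every 200 samples, so at sr = 24000 its fundamental is 24000 / 200 = 120 Hz, which lies inside the [fmin, fmax] range. The following is a minimal sketch, using only the standard library, of one way the returned values might be summarized: it averages f0 over the voiced frames. The helper name `mean_voiced_f0` is hypothetical and not part of this crate, plain slices stand in for the Array1 values, and the sample numbers in `main` are made up for illustration rather than taken from a real run.

```rust
/// Hypothetical helper (not part of the pyin crate): mean pitch in Hz
/// over the frames flagged as voiced. Plain slices stand in for the
/// Array1 values returned by the estimator.
fn mean_voiced_f0(f0: &[f64], voiced_flag: &[bool]) -> Option<f64> {
    let (sum, count) = f0
        .iter()
        .zip(voiced_flag)
        // Keep voiced frames only, and skip NAN fill values defensively.
        .filter(|(hz, voiced)| **voiced && !hz.is_nan())
        .fold((0.0_f64, 0usize), |(s, n), (hz, _)| (s + *hz, n + 1));
    if count == 0 {
        None // no voiced frame, so there is no meaningful average
    } else {
        Some(sum / count as f64)
    }
}

fn main() {
    // Fundamental of the test tone: one cycle every 200 samples at 24 kHz.
    let sr = 24000.0_f64;
    let period_samples = 200.0_f64;
    let expected_hz = sr / period_samples;
    println!("expected f0: {expected_hz} Hz");

    // Made-up per-frame values, for illustration only.
    let f0 = [f64::NAN, 119.8, 120.2, f64::NAN];
    let voiced_flag = [false, true, true, false];
    match mean_voiced_f0(&f0, &voiced_flag) {
        Some(hz) => println!("mean voiced f0: {hz:.1} Hz"),
        None => println!("no voiced frames"),
    }
}
```

Returning an Option keeps the all-unvoiced case explicit instead of yielding NAN from a division by zero.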

  1. Mauch, Matthias, and Simon Dixon. “pYIN: A fundamental frequency estimator using probabilistic threshold distributions.” 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2014. 

Structs§

Enums§

Functions§

  • pyin
    C lang FFI for pYIN