loqa-voice-dsp 0.2.0

Shared DSP library for voice analysis, providing core digital signal processing functionality for both Loqa backend and Voiceline mobile app.

Features

  • Pitch Detection: YIN algorithm for fundamental frequency (F0) estimation
  • Formant Extraction: Linear Predictive Coding (LPC) for formant analysis
  • FFT Utilities: Fast Fourier Transform for spectral analysis
  • Spectral Analysis: Spectral centroid, tilt, and rolloff calculations
  • HNR (Harmonics-to-Noise Ratio): Breathiness measurement using Boersma's autocorrelation method
  • H1-H2 Amplitude Difference: Vocal weight analysis (lighter vs fuller voice quality)
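
The YIN detector listed above estimates F0 by computing a difference function over candidate lags and normalizing it so that true periods produce deep minima. A simplified sketch of that core idea (illustration only, not the crate's actual implementation, which adds windowing, parabolic interpolation, and confidence scoring):

```rust
/// Simplified YIN core: the cumulative mean normalized difference (CMND).
/// `cmnd[tau]` is small when `tau` matches the signal's period.
fn yin_cmnd(samples: &[f32], max_lag: usize) -> Vec<f32> {
    // Squared difference function d(tau) over a fixed analysis window.
    let mut d = vec![0.0f32; max_lag];
    for tau in 1..max_lag {
        for i in 0..samples.len() - max_lag {
            let diff = samples[i] - samples[i + tau];
            d[tau] += diff * diff;
        }
    }
    // Normalize: d'(tau) = d(tau) * tau / sum(d(1..=tau)), with d'(0) = 1.
    let mut cmnd = vec![1.0f32; max_lag];
    let mut running_sum = 0.0f32;
    for tau in 1..max_lag {
        running_sum += d[tau];
        cmnd[tau] = if running_sum > 0.0 {
            d[tau] * tau as f32 / running_sum
        } else {
            1.0
        };
    }
    cmnd
}

/// Period estimate: first lag dipping below `threshold`, refined to the
/// following local minimum (the standard YIN absolute-threshold step).
fn yin_period(cmnd: &[f32], threshold: f32) -> Option<usize> {
    let mut tau = (2..cmnd.len()).find(|&t| cmnd[t] < threshold)?;
    while tau + 1 < cmnd.len() && cmnd[tau + 1] < cmnd[tau] {
        tau += 1;
    }
    Some(tau)
}
```

For a 100 Hz sine at 16 kHz, `yin_period` on the CMND of a couple thousand samples returns a lag near 160, i.e. F0 ≈ 16000 / 160 = 100 Hz.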

Installation

iOS (CocoaPods)

Add to your Podfile:

pod 'LoqaVoiceDSP', '~> 0.2.0'

Then run:

pod install

iOS (Swift Package Manager)

In Xcode:

  1. File → Add Packages
  2. Enter repository URL: https://github.com/loqalabs/loqa
  3. Select version: 0.2.0 or later

Or add to Package.swift:

dependencies: [
    .package(url: "https://github.com/loqalabs/loqa", from: "0.2.0")
]

Rust (Cargo)

Add to your Cargo.toml:

[dependencies]
loqa-voice-dsp = "0.2.0"

Usage

Rust (Loqa Backend)

use loqa_voice_dsp::{detect_pitch, extract_formants, compute_fft, calculate_hnr, calculate_h1h2};

let audio_samples: Vec<f32> = /* your audio data */;
let sample_rate = 16000;

// Pitch detection
let pitch = detect_pitch(&audio_samples, sample_rate, 80.0, 400.0)?;
println!("Frequency: {} Hz, Confidence: {}", pitch.frequency, pitch.confidence);

// Formant extraction
let formants = extract_formants(&audio_samples, sample_rate, 14)?;
println!("F1: {} Hz, F2: {} Hz", formants.f1, formants.f2);

// HNR (breathiness)
let hnr = calculate_hnr(&audio_samples, sample_rate, 75.0, 500.0)?;
println!("HNR: {} dB, Voiced: {}", hnr.hnr, hnr.is_voiced);

// H1-H2 (vocal weight)
let h1h2 = calculate_h1h2(&audio_samples, sample_rate, Some(pitch.frequency))?;
println!("H1-H2: {} dB", h1h2.h1h2);

// FFT
let fft_result = compute_fft(&audio_samples, sample_rate, 2048)?;
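
The spectral centroid reported by the spectral-analysis module is, conceptually, a magnitude-weighted mean frequency. A standalone sketch of the idea, computed here directly from a magnitude spectrum rather than through the crate's API:

```rust
/// Spectral centroid: magnitude-weighted mean of bin frequencies.
/// `magnitudes[k]` is the magnitude of FFT bin k; bin k corresponds to
/// k * sample_rate / fft_size Hz.
fn spectral_centroid(magnitudes: &[f32], sample_rate: f32, fft_size: usize) -> f32 {
    let bin_hz = sample_rate / fft_size as f32;
    let (mut weighted, mut total) = (0.0f32, 0.0f32);
    for (k, &m) in magnitudes.iter().enumerate() {
        weighted += k as f32 * bin_hz * m; // frequency of bin k, weighted by magnitude
        total += m;
    }
    if total > 0.0 { weighted / total } else { 0.0 }
}
```

For example, a spectrum with all energy in bin 64 of a 2048-point FFT at 16 kHz has its centroid at exactly 64 × 16000 / 2048 = 500 Hz; broadband "bright" voices pull the centroid upward.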

iOS (Swift via FFI)

// Call C-compatible FFI functions
let samples: [Float] = /* your audio data */

// Pitch detection
let pitchResult = samples.withUnsafeBufferPointer { buffer in
    loqa_detect_pitch(
        buffer.baseAddress!,
        buffer.count,
        16000,  // sample rate
        80.0,   // min freq
        400.0   // max freq
    )
}
if pitchResult.success {
    print("Pitch: \(pitchResult.frequency)Hz, Confidence: \(pitchResult.confidence)")
}

// HNR (breathiness)
let hnrResult = samples.withUnsafeBufferPointer { buffer in
    loqa_calculate_hnr(
        buffer.baseAddress!,
        buffer.count,
        16000,  // sample rate
        75.0,   // min freq
        500.0   // max freq
    )
}
if hnrResult.success {
    print("HNR: \(hnrResult.hnr) dB, Voiced: \(hnrResult.is_voiced)")
}

// H1-H2 (vocal weight) - pass 0.0 for f0 to auto-detect
let h1h2Result = samples.withUnsafeBufferPointer { buffer in
    loqa_calculate_h1h2(
        buffer.baseAddress!,
        buffer.count,
        16000,  // sample rate
        pitchResult.frequency  // use detected pitch, or 0.0 to auto-detect
    )
}
if h1h2Result.success {
    print("H1-H2: \(h1h2Result.h1h2) dB")
}

Android (Java via JNI)

// Build with --features android-jni
import com.voiceline.VoicelineDSP;

float[] audioSamples = /* your audio data */;
VoicelineDSP.PitchResult pitch = VoicelineDSP.detectPitch(
    audioSamples,
    16000,  // sample rate
    80.0f,  // min freq
    400.0f  // max freq
);

System.out.println("Frequency: " + pitch.frequency + " Hz");

Note: Android JNI requires building with --features android-jni

Implementation Status

  • Crate structure created
  • Pitch detection (YIN + autocorrelation)
  • Formant extraction (LPC-based)
  • FFT utilities
  • Spectral analysis (centroid, tilt, rolloff)
  • HNR calculation (Boersma's autocorrelation method)
  • H1-H2 amplitude difference (vocal weight)
  • iOS FFI layer (C exports for all functions)
  • Android JNI layer (with android-jni feature)
  • Unit tests (68 passing)
  • FFI integration tests (9 passing)
  • Voice sample validation tests (30 passing)
  • Documentation tests (8 passing)
  • Benchmarks harness
  • Performance benchmarks (validated)

Performance Benchmarks

Validated Performance (2025-11-07) - All targets exceeded ✅

Operation                          Target   Actual (mean)   Result    Speedup
Pitch detection (100ms audio)      <20ms    0.125ms         ✅ PASS   160x faster
Formant extraction (500ms audio)   <50ms    0.134ms         ✅ PASS   373x faster
FFT (2048 points)                  <10ms    ~0.020ms        ✅ PASS   500x faster
Spectral analysis                  <5ms     ~0.003ms        ✅ PASS   1667x faster
HNR calculation (100ms window)     <30ms    <1ms            ✅ PASS   >30x faster
H1-H2 with F0 provided             <20ms    <1ms            ✅ PASS   >20x faster

Note: Benchmarks run on Apple M-series silicon. All latency targets easily met with significant performance headroom for real-time voice processing.

Acoustic Measures Reference

HNR (Harmonics-to-Noise Ratio)

Measures the ratio of harmonic (periodic) to noise (aperiodic) energy in voice - the primary acoustic indicator of breathiness.

HNR Range   Interpretation
18-25+ dB   Clear, less breathy voice
12-18 dB    Moderate breathiness
<10 dB      Very breathy or pathological voice
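
In Boersma's method, the final HNR value comes from the peak r of the normalized autocorrelation at the detected period: r is treated as the harmonic fraction of the energy and 1 − r as the noise fraction. A worked sketch of that last step (assuming r has already been computed upstream):

```rust
/// Boersma-style HNR from the normalized autocorrelation peak r (0 < r < 1):
/// HNR(dB) = 10 * log10(r / (1 - r)).
fn hnr_db(r: f32) -> f32 {
    10.0 * (r / (1.0 - r)).log10()
}
```

So r = 0.99 (99% harmonic energy) gives ≈ 20 dB, a clear voice by the table above, while r = 0.90 gives ≈ 9.5 dB, already in the very breathy range.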

H1-H2 (First/Second Harmonic Difference)

Measures the amplitude difference between the first harmonic (the fundamental) and the second harmonic - an indicator of vocal weight.

H1-H2 Range   Interpretation
>5 dB         Lighter, breathier vocal quality
0-5 dB        Balanced vocal weight
<0 dB         Fuller, heavier vocal quality
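
The bands in the table above translate directly into code; a small helper for illustration (hypothetical, not part of the crate's API):

```rust
/// Map an H1-H2 value (dB) to the interpretation bands used in this README.
fn interpret_h1h2(h1h2_db: f32) -> &'static str {
    if h1h2_db > 5.0 {
        "lighter, breathier vocal quality"
    } else if h1h2_db >= 0.0 {
        "balanced vocal weight"
    } else {
        "fuller, heavier vocal quality"
    }
}
```

For example, an H1-H2 of 7 dB from `calculate_h1h2` falls in the lighter band, while −3 dB indicates a fuller, heavier quality.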

Test Data

Saarbrücken Voice Database

This library uses samples from the Saarbrücken Voice Database for consistency validation testing.

License: CC BY 4.0

Attribution: Pützer, M. & Barry, W.J., Former Institute of Phonetics, Saarland University. Available at Zenodo.

The SVD provides lab-quality voice recordings including:

  • Sustained vowels (/a:/, /i:/, /u:/) at low, normal, and high pitch
  • 851 healthy control speakers
  • 1002 speakers with documented voice pathologies
  • 50 kHz sample rate, controlled recording conditions

Setting Up Test Data

# 1. Download SVD from Zenodo (CC BY 4.0 license)
#    https://zenodo.org/records/16874898

# 2. Install conversion dependencies
pip install scipy numpy

# 3. Convert SVD files to test format
python scripts/download_svd.py /path/to/extracted/svd

Test Sample Requirements

For comprehensive validation, the library needs test samples with these characteristics:

Function             Sample Requirements                                               Recommended Datasets
Pitch Detection      Male (80-180 Hz), Female (160-300 Hz), varied intonation          Saarbrücken Voice Database, PTDB-TUG
Formant Extraction   Sustained vowels /a/, /i/, /u/, /e/, /o/ from multiple speakers   Hillenbrand Vowel Database, VTR-TIMIT
HNR                  Breathy, modal, and clear voice qualities                         Saarbrücken Voice Database
H1-H2                Light to full voice qualities, different phonation types          UCLA Voice Quality Database, VoiceSauce reference recordings
Spectral             Dark to bright voice qualities                                    Voice quality databases with perceptual labels

Development

# Build
cargo build --release

# Test
cargo test

# Benchmark
cargo bench

# Documentation
cargo doc --open

License

MIT