loqa-voice-dsp 0.2.0

Shared DSP library for voice analysis, providing core digital signal processing functionality for both Loqa backend and Voiceline mobile app.

Features

  • Pitch Detection: YIN algorithm for fundamental frequency (F0) estimation
  • Formant Extraction: Linear Predictive Coding (LPC) for formant analysis
  • FFT Utilities: Fast Fourier Transform for spectral analysis
  • Spectral Analysis: Spectral centroid, tilt, and rolloff calculations
  • HNR (Harmonics-to-Noise Ratio): Breathiness measurement using Boersma's autocorrelation method
  • H1-H2 Amplitude Difference: Vocal weight analysis (lighter vs fuller voice quality)
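
The YIN detector listed above estimates F0 by computing a difference function over candidate lags and normalizing it so that true periods produce deep minima. A simplified sketch of that core idea (illustration only, not the crate's actual implementation, which adds windowing, parabolic interpolation, and confidence scoring):

```rust
/// Simplified YIN core: the cumulative mean normalized difference (CMND).
/// `cmnd[tau]` is small when `tau` matches the signal's period.
fn yin_cmnd(samples: &[f32], max_lag: usize) -> Vec<f32> {
    // Squared difference function d(tau) over a fixed analysis window.
    let mut d = vec![0.0f32; max_lag];
    for tau in 1..max_lag {
        for i in 0..samples.len() - max_lag {
            let diff = samples[i] - samples[i + tau];
            d[tau] += diff * diff;
        }
    }
    // Normalize: d'(tau) = d(tau) * tau / sum(d(1..=tau)), with d'(0) = 1.
    let mut cmnd = vec![1.0f32; max_lag];
    let mut running_sum = 0.0f32;
    for tau in 1..max_lag {
        running_sum += d[tau];
        cmnd[tau] = if running_sum > 0.0 {
            d[tau] * tau as f32 / running_sum
        } else {
            1.0
        };
    }
    cmnd
}

/// Period estimate: first lag dipping below `threshold`, refined to the
/// following local minimum (the standard YIN absolute-threshold step).
fn yin_period(cmnd: &[f32], threshold: f32) -> Option<usize> {
    let mut tau = (2..cmnd.len()).find(|&t| cmnd[t] < threshold)?;
    while tau + 1 < cmnd.len() && cmnd[tau + 1] < cmnd[tau] {
        tau += 1;
    }
    Some(tau)
}
```

For a 100 Hz sine at 16 kHz, `yin_period` on the CMND of a couple thousand samples returns a lag near 160, i.e. F0 ≈ 16000 / 160 = 100 Hz.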

Installation

iOS (CocoaPods)

Add to your Podfile:

pod 'LoqaVoiceDSP', '~> 0.2.0'

Then run:

pod install

iOS (Swift Package Manager)

In Xcode:

  1. File → Add Packages
  2. Enter repository URL: https://github.com/loqalabs/loqa
  3. Select version: 0.2.0 or later

Or add to Package.swift:

dependencies: [
    .package(url: "https://github.com/loqalabs/loqa", from: "0.2.0")
]

Rust (Cargo)

Add to your Cargo.toml:

[dependencies]
loqa-voice-dsp = "0.2.0"

Usage

Rust (Loqa Backend)

use loqa_voice_dsp::{detect_pitch, extract_formants, compute_fft, calculate_hnr, calculate_h1h2};

let audio_samples: Vec<f32> = /* your audio data */;
let sample_rate = 16000;

// Pitch detection
let pitch = detect_pitch(&audio_samples, sample_rate, 80.0, 400.0)?;
println!("Frequency: {} Hz, Confidence: {}", pitch.frequency, pitch.confidence);

// Formant extraction
let formants = extract_formants(&audio_samples, sample_rate, 14)?;
println!("F1: {} Hz, F2: {} Hz", formants.f1, formants.f2);

// HNR (breathiness)
let hnr = calculate_hnr(&audio_samples, sample_rate, 75.0, 500.0)?;
println!("HNR: {} dB, Voiced: {}", hnr.hnr, hnr.is_voiced);

// H1-H2 (vocal weight)
let h1h2 = calculate_h1h2(&audio_samples, sample_rate, Some(pitch.frequency))?;
println!("H1-H2: {} dB", h1h2.h1h2);

// FFT
let fft_result = compute_fft(&audio_samples, sample_rate, 2048)?;
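
The spectral centroid reported by the spectral-analysis module is, conceptually, a magnitude-weighted mean frequency. A standalone sketch of the idea, computed here directly from a magnitude spectrum rather than through the crate's API:

```rust
/// Spectral centroid: magnitude-weighted mean of bin frequencies.
/// `magnitudes[k]` is the magnitude of FFT bin k; bin k corresponds to
/// k * sample_rate / fft_size Hz.
fn spectral_centroid(magnitudes: &[f32], sample_rate: f32, fft_size: usize) -> f32 {
    let bin_hz = sample_rate / fft_size as f32;
    let (mut weighted, mut total) = (0.0f32, 0.0f32);
    for (k, &m) in magnitudes.iter().enumerate() {
        weighted += k as f32 * bin_hz * m; // frequency of bin k, weighted by magnitude
        total += m;
    }
    if total > 0.0 { weighted / total } else { 0.0 }
}
```

For example, a spectrum with all energy in bin 64 of a 2048-point FFT at 16 kHz has its centroid at exactly 64 × 16000 / 2048 = 500 Hz; broadband "bright" voices pull the centroid upward.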

iOS (Swift via FFI)

// Call C-compatible FFI functions
let samples: [Float] = /* your audio data */

// Pitch detection
let pitchResult = samples.withUnsafeBufferPointer { buffer in
    loqa_detect_pitch(
        buffer.baseAddress!,
        buffer.count,
        16000,  // sample rate
        80.0,   // min freq
        400.0   // max freq
    )
}
if pitchResult.success {
    print("Pitch: \(pitchResult.frequency)Hz, Confidence: \(pitchResult.confidence)")
}

// HNR (breathiness)
let hnrResult = samples.withUnsafeBufferPointer { buffer in
    loqa_calculate_hnr(
        buffer.baseAddress!,
        buffer.count,
        16000,  // sample rate
        75.0,   // min freq
        500.0   // max freq
    )
}
if hnrResult.success {
    print("HNR: \(hnrResult.hnr) dB, Voiced: \(hnrResult.is_voiced)")
}

// H1-H2 (vocal weight) - pass 0.0 for f0 to auto-detect
let h1h2Result = samples.withUnsafeBufferPointer { buffer in
    loqa_calculate_h1h2(
        buffer.baseAddress!,
        buffer.count,
        16000,  // sample rate
        pitchResult.frequency  // use detected pitch, or 0.0 to auto-detect
    )
}
if h1h2Result.success {
    print("H1-H2: \(h1h2Result.h1h2) dB")
}

Android (Java via JNI)

// Build with --features android-jni
import com.voiceline.VoicelineDSP;

float[] audioSamples = /* your audio data */;
VoicelineDSP.PitchResult pitch = VoicelineDSP.detectPitch(
    audioSamples,
    16000,  // sample rate
    80.0f,  // min freq
    400.0f  // max freq
);

System.out.println("Frequency: " + pitch.frequency + " Hz");

Note: Android JNI requires building with --features android-jni

Implementation Status

  • Crate structure created
  • Pitch detection (YIN + autocorrelation)
  • Formant extraction (LPC-based)
  • FFT utilities
  • Spectral analysis (centroid, tilt, rolloff)
  • HNR calculation (Boersma's autocorrelation method)
  • H1-H2 amplitude difference (vocal weight)
  • iOS FFI layer (C exports for all functions)
  • Android JNI layer (with android-jni feature)
  • Unit tests (68 passing)
  • FFI integration tests (9 passing)
  • Voice sample validation tests (30 passing)
  • Documentation tests (8 passing)
  • Benchmarks harness
  • Performance benchmarks (validated)

Performance Benchmarks

Validated Performance (2025-11-07) - All targets exceeded ✅

Operation                          Target   Actual (mean)   Result    Speedup
Pitch detection (100ms audio)      <20ms    0.125ms         ✅ PASS   160x faster
Formant extraction (500ms audio)   <50ms    0.134ms         ✅ PASS   373x faster
FFT (2048 points)                  <10ms    ~0.020ms        ✅ PASS   500x faster
Spectral analysis                  <5ms     ~0.003ms        ✅ PASS   1667x faster
HNR calculation (100ms window)     <30ms    <1ms            ✅ PASS   >30x faster
H1-H2 with F0 provided             <20ms    <1ms            ✅ PASS   >20x faster

Note: Benchmarks run on Apple M-series silicon. All latency targets easily met with significant performance headroom for real-time voice processing.

Acoustic Measures Reference

HNR (Harmonics-to-Noise Ratio)

Measures the ratio of harmonic (periodic) to noise (aperiodic) energy in voice - the primary acoustic indicator of breathiness.

HNR Range   Interpretation
18-25+ dB   Clear, less breathy voice
12-18 dB    Moderate breathiness
<10 dB      Very breathy or pathological voice
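
In Boersma's method, the final HNR value comes from the peak r of the normalized autocorrelation at the detected period: r is treated as the harmonic fraction of the energy and 1 − r as the noise fraction. A worked sketch of that last step (assuming r has already been computed upstream):

```rust
/// Boersma-style HNR from the normalized autocorrelation peak r (0 < r < 1):
/// HNR(dB) = 10 * log10(r / (1 - r)).
fn hnr_db(r: f32) -> f32 {
    10.0 * (r / (1.0 - r)).log10()
}
```

So r = 0.99 (99% harmonic energy) gives ≈ 20 dB, a clear voice by the table above, while r = 0.90 gives ≈ 9.5 dB, already in the very breathy range.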

H1-H2 (First/Second Harmonic Difference)

Measures the amplitude difference between the first harmonic (the fundamental) and the second harmonic - an indicator of vocal weight.

H1-H2 Range   Interpretation
>5 dB         Lighter, breathier vocal quality
0-5 dB        Balanced vocal weight
<0 dB         Fuller, heavier vocal quality
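
The bands in the table above translate directly into code; a small helper for illustration (hypothetical, not part of the crate's API):

```rust
/// Map an H1-H2 value (dB) to the interpretation bands used in this README.
fn interpret_h1h2(h1h2_db: f32) -> &'static str {
    if h1h2_db > 5.0 {
        "lighter, breathier vocal quality"
    } else if h1h2_db >= 0.0 {
        "balanced vocal weight"
    } else {
        "fuller, heavier vocal quality"
    }
}
```

For example, an H1-H2 of 7 dB from `calculate_h1h2` falls in the lighter band, while −3 dB indicates a fuller, heavier quality.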

Test Data

Saarbrücken Voice Database

This library uses samples from the Saarbrücken Voice Database for consistency validation testing.

License: CC BY 4.0

Attribution: Pützer, M. & Barry, W.J., Former Institute of Phonetics, Saarland University. Available at Zenodo.

The SVD provides lab-quality voice recordings including:

  • Sustained vowels (/a:/, /i:/, /u:/) at low, normal, and high pitch
  • 851 healthy control speakers
  • 1002 speakers with documented voice pathologies
  • 50 kHz sample rate, controlled recording conditions

Setting Up Test Data

# 1. Download SVD from Zenodo (CC BY 4.0 license)
#    https://zenodo.org/records/16874898

# 2. Install conversion dependencies
pip install scipy numpy

# 3. Convert SVD files to test format
python scripts/download_svd.py /path/to/extracted/svd

Test Sample Requirements

For comprehensive validation, the library needs test samples with these characteristics:

Function             Sample Requirements                                               Recommended Datasets
Pitch Detection      Male (80-180 Hz), Female (160-300 Hz), varied intonation          Saarbrücken Voice Database, PTDB-TUG
Formant Extraction   Sustained vowels /a/, /i/, /u/, /e/, /o/ from multiple speakers   Hillenbrand Vowel Database, VTR-TIMIT
HNR                  Breathy, modal, and clear voice qualities                         Saarbrücken Voice Database
H1-H2                Light to full voice qualities, different phonation types          UCLA Voice Quality Database, VoiceSauce reference recordings
Spectral             Dark to bright voice qualities                                    Voice quality databases with perceptual labels

Development

# Build
cargo build --release

# Test
cargo test

# Benchmark
cargo bench

# Documentation
cargo doc --open

License

MIT