loqa-voice-dsp
Shared DSP library for voice analysis that provides the core digital signal processing functionality for both the Loqa backend and the Voiceline mobile app.
Features
- Pitch Detection: YIN algorithm for fundamental frequency (F0) estimation
- Formant Extraction: Linear Predictive Coding (LPC) for formant analysis
- FFT Utilities: Fast Fourier Transform for spectral analysis
- Spectral Analysis: Spectral centroid, tilt, and rolloff calculations
- HNR (Harmonics-to-Noise Ratio): Breathiness measurement using Boersma's autocorrelation method
- H1-H2 Amplitude Difference: Vocal weight analysis (lighter vs fuller voice quality)
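The YIN method named above estimates F0 in four steps: compute a difference function over candidate lags, normalize it by its cumulative mean, take the first lag that falls below a threshold (refined to the local minimum of that dip), and convert the lag to Hz. The following is a minimal sketch of those steps in plain Rust, not the crate's implementation. The `yin_pitch` name, its parameters, and the 0.1 threshold used below are illustrative assumptions, and the sketch skips YIN's parabolic interpolation, so the result is quantized to whole-sample lags.

```rust
/// Minimal YIN pitch estimator (illustrative sketch, not the crate's API).
/// Returns the estimated F0 in Hz, or None if no lag passes the threshold.
pub fn yin_pitch(
    samples: &[f32],
    sample_rate: f32,
    min_hz: f32,
    max_hz: f32,
    threshold: f32,
) -> Option<f32> {
    // Candidate lags (in samples) that correspond to the allowed pitch range.
    let min_lag = (sample_rate / max_hz).floor() as usize;
    let max_lag = (sample_rate / min_hz).ceil() as usize;
    // Integration window: the first half of the buffer, which must exceed the longest lag.
    let window = samples.len() / 2;
    if min_lag < 1 || max_lag >= window {
        return None;
    }

    // Step 1: difference function d(tau) = sum_j (x[j] - x[j + tau])^2.
    let mut diff = vec![0.0f32; max_lag + 1];
    for tau in 1..=max_lag {
        diff[tau] = (0..window)
            .map(|j| {
                let delta = samples[j] - samples[j + tau];
                delta * delta
            })
            .sum::<f32>();
    }

    // Step 2: cumulative mean normalized difference d'(tau) = d(tau) * tau / sum_{k<=tau} d(k).
    let mut cmnd = vec![1.0f32; max_lag + 1];
    let mut running_sum = 0.0f32;
    for tau in 1..=max_lag {
        running_sum += diff[tau];
        if running_sum > 0.0 {
            cmnd[tau] = diff[tau] * tau as f32 / running_sum;
        }
    }

    // Step 3: first lag under the threshold, then follow the dip down to its local minimum.
    for start in min_lag..=max_lag {
        if cmnd[start] < threshold {
            let mut tau = start;
            while tau < max_lag && cmnd[tau + 1] < cmnd[tau] {
                tau += 1;
            }
            // Step 4: a period of `tau` samples corresponds to sample_rate / tau Hz.
            return Some(sample_rate / tau as f32);
        }
    }
    None
}
```

For example, `yin_pitch(&samples, 16_000.0, 80.0, 400.0, 0.1)` on 100 ms of a 200 Hz sine sampled at 16 kHz should return `Some(200.0)`, because the period is exactly 80 samples; on silence it returns `None`.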
Installation
iOS (CocoaPods)
Add to your Podfile:
pod ,
Then run:
pod install
iOS (Swift Package Manager)
In Xcode:
- File → Add Packages
- Enter repository URL: https://github.com/loqalabs/loqa
- Select version: 0.1.0 or later
Or add to Package.swift:
dependencies: [
.package(url: "https://github.com/loqalabs/loqa", from: "0.1.0")
]
Rust (Cargo)
Add to your Cargo.toml:
[dependencies]
loqa-voice-dsp = "0.1.0"
Usage
Rust (Loqa Backend)
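use loqa_voice_dsp::{calculate_h1h2, calculate_hnr, compute_fft, detect_pitch, extract_formants};
let audio_samples: Vec<f32> = /* your audio data */;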
let sample_rate = 16000;
// Pitch detection
let pitch = detect_pitch?;
println!;
// Formant extraction
let formants = extract_formants?;
println!;
// HNR (breathiness)
let hnr = calculate_hnr?;
println!;
// H1-H2 (vocal weight)
let h1h2 = calculate_h1h2?;
println!;
// FFT
let fft_result = compute_fft?;
iOS (Swift via FFI)
// Call C-compatible FFI functions
let samples: [Float] = /* your audio data */
// Pitch detection
let pitchResult = samples.withUnsafeBufferPointer { buffer in
loqa_detect_pitch(
buffer.baseAddress!,
buffer.count,
16000, // sample rate
80.0, // min freq
400.0 // max freq
)
}
if pitchResult.success {
print("Pitch: \(pitchResult.frequency)Hz, Confidence: \(pitchResult.confidence)")
}
// HNR (breathiness)
let hnrResult = samples.withUnsafeBufferPointer { buffer in
loqa_calculate_hnr(
buffer.baseAddress!,
buffer.count,
16000, // sample rate
75.0, // min freq
500.0 // max freq
)
}
if hnrResult.success {
print("HNR: \(hnrResult.hnr) dB, Voiced: \(hnrResult.is_voiced)")
}
// H1-H2 (vocal weight) - pass 0.0 for f0 to auto-detect
let h1h2Result = samples.withUnsafeBufferPointer { buffer in
loqa_calculate_h1h2(
buffer.baseAddress!,
buffer.count,
16000, // sample rate
pitchResult.frequency // use detected pitch, or 0.0 to auto-detect
)
}
if h1h2Result.success {
print("H1-H2: \(h1h2Result.h1h2) dB")
}
Android (Java via JNI)
// Build with --features android-jni
;
float[] audioSamples ;
VoicelineDSP.PitchResult pitch ;
System.out.;
Note: The Android JNI bindings require building with --features android-jni.
Implementation Status
- Crate structure created
- Pitch detection (YIN + autocorrelation)
- Formant extraction (LPC-based)
- FFT utilities
- Spectral analysis (centroid, tilt, rolloff)
- HNR calculation (Boersma's autocorrelation method)
- H1-H2 amplitude difference (vocal weight)
- iOS FFI layer (C exports for all functions)
- Android JNI layer (behind the android-jni feature)
- Unit tests (68 passing)
- FFI integration tests (9 passing)
- Voice sample validation tests (30 passing)
- Documentation tests (8 passing)
- Benchmark harness
- Performance benchmarks (validated)
Performance Benchmarks
Validated Performance (2025-11-07): all targets met ✅
| Operation | Target | Actual (mean) | Result | Headroom (target / mean) |
|---|---|---|---|---|
| Pitch detection (100ms audio) | <20ms | 0.125ms | ✅ PASS | 160x |
| Formant extraction (500ms audio) | <50ms | 0.134ms | ✅ PASS | 373x |
| FFT (2048 points) | <10ms | ~0.020ms | ✅ PASS | ~500x |
| Spectral analysis | <5ms | ~0.003ms | ✅ PASS | ~1667x |
| HNR calculation (100ms window) | <30ms | <1ms | ✅ PASS | >30x |
| H1-H2 with F0 provided | <20ms | <1ms | ✅ PASS | >20x |
Note: Benchmarks were run on Apple M-series silicon. Every latency target is met with ample headroom for real-time voice processing.
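The ratio in the table's last column is the latency target divided by the measured mean; for pitch detection, 20 ms / 0.125 ms = 160. The sketch below shows one way to produce that kind of figure using only the standard library's `std::time::Instant`. It is an illustration, not the project's benchmark harness (which this README does not show), and `mean_latency_ms` and `headroom` are hypothetical helper names.

```rust
use std::time::Instant;

/// Runs `op` `iterations` times and returns the mean wall-clock latency in milliseconds.
/// (Hypothetical helper for illustration; not part of the crate.)
pub fn mean_latency_ms<F: FnMut()>(iterations: u32, mut op: F) -> f64 {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        op();
    }
    start.elapsed().as_secs_f64() * 1000.0 / iterations as f64
}

/// Headroom factor: how many times the measured mean fits inside the latency target.
pub fn headroom(target_ms: f64, mean_ms: f64) -> f64 {
    target_ms / mean_ms
}
```

Wrap the timed call in `std::hint::black_box` so the optimizer cannot discard the work, for example `mean_latency_ms(1_000, || { black_box(process(black_box(&samples))); })`, where `process` is a placeholder for the operation being measured. For published numbers, a statistics-aware harness such as Criterion is more reliable than a single wall-clock mean, which is sensitive to warm-up and system noise.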
Acoustic Measures Reference
HNR (Harmonics-to-Noise Ratio)
Measures the ratio of harmonic (periodic) to noise (aperiodic) energy in the voice signal, in dB. It is the primary acoustic indicator of breathiness: lower values mean a breathier voice.
| HNR Range | Interpretation |
|---|---|
| 18-25+ dB | Clear, less breathy voice |
| 12-18 dB | Moderate breathiness |
| <10 dB | Very breathy or pathological voice |
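Boersma's method derives HNR from the height r of the normalized autocorrelation peak at the pitch period: r approximates the fraction of the signal's energy that is periodic, so HNR = 10 · log10(r / (1 - r)) dB. The sketch below applies that formula to a plain normalized autocorrelation. It is a simplified illustration, not the crate's `calculate_hnr`: it omits the windowing, window-autocorrelation correction, and peak interpolation of Boersma's full method, and `hnr_db` is a hypothetical name.

```rust
/// Simplified autocorrelation HNR in dB (illustrative sketch of the idea behind
/// Boersma's method; omits his windowing and window-autocorrelation correction).
pub fn hnr_db(samples: &[f32], sample_rate: f32, min_hz: f32, max_hz: f32) -> Option<f32> {
    // Lags (in samples) covering the allowed pitch range.
    let min_lag = ((sample_rate / max_hz).floor() as usize).max(1);
    let max_lag = (sample_rate / min_hz).ceil() as usize;
    if max_lag >= samples.len() {
        return None;
    }

    // Highest normalized autocorrelation r(lag) over the candidate lags.
    let mut best_r = 0.0f64;
    for lag in min_lag..=max_lag {
        let (mut xy, mut xx, mut yy) = (0.0f64, 0.0f64, 0.0f64);
        for j in 0..samples.len() - lag {
            let (x, y) = (samples[j] as f64, samples[j + lag] as f64);
            xy += x * y;
            xx += x * x;
            yy += y * y;
        }
        if xx > 0.0 && yy > 0.0 {
            best_r = best_r.max(xy / (xx * yy).sqrt());
        }
    }
    if best_r <= 0.0 {
        return None; // no periodic component found (e.g. silence)
    }

    // r ~ periodic energy fraction, so HNR = 10*log10(r / (1 - r)).
    // Cap r below 1 so a perfectly periodic input gives a finite value (~60 dB).
    let r = best_r.min(0.999_999);
    Some((10.0 * (r / (1.0 - r)).log10()) as f32)
}
```

A pure sine gives the capped value of about 60 dB, far above any natural voice. Adding uniform noise with one sixth of the tone's power pulls the result down to roughly 8 dB, inside the "Very breathy or pathological voice" band of the table.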
H1-H2 (First/Second Harmonic Difference)
Measures the amplitude difference, in dB, between the first harmonic (H1, the fundamental) and the second harmonic (H2). It indicates vocal weight: higher values correspond to a lighter voice quality.
| H1-H2 Range | Interpretation |
|---|---|
| >5 dB | Lighter, breathier vocal quality |
| 0-5 dB | Balanced vocal weight |
| <0 dB | Fuller, heavier vocal quality |
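H1-H2 compares the spectral amplitudes at F0 and 2·F0: H1-H2 = 20 · log10(|X(F0)| / |X(2·F0)|) dB. Given a known F0, a minimal sketch evaluates a Hann-windowed DFT at exactly those two frequencies. This is an assumption-level illustration rather than the crate's `calculate_h1h2`: `h1_h2_db` is a hypothetical name, it does not search for harmonic peaks near each multiple, it applies no formant correction, and unlike the FFI call it does not auto-detect F0 when passed 0.0.

```rust
use std::f64::consts::PI;

/// Magnitude of the Hann-windowed DFT of `samples` evaluated at `freq_hz`.
fn windowed_magnitude(samples: &[f32], sample_rate: f64, freq_hz: f64) -> f64 {
    let n = samples.len() as f64;
    let (mut re, mut im) = (0.0f64, 0.0f64);
    for (i, &x) in samples.iter().enumerate() {
        let t = i as f64;
        let w = 0.5 - 0.5 * (2.0 * PI * t / (n - 1.0)).cos(); // Hann window
        let phase = 2.0 * PI * freq_hz * t / sample_rate;
        re += w * x as f64 * phase.cos();
        im -= w * x as f64 * phase.sin();
    }
    (re * re + im * im).sqrt()
}

/// H1-H2 in dB for a known F0 (sketch: no harmonic peak search, no formant correction).
pub fn h1_h2_db(samples: &[f32], sample_rate: f32, f0_hz: f32) -> Option<f32> {
    if samples.len() < 2 || f0_hz <= 0.0 {
        return None;
    }
    let sr = sample_rate as f64;
    let h1 = windowed_magnitude(samples, sr, f0_hz as f64);
    let h2 = windowed_magnitude(samples, sr, 2.0 * f0_hz as f64);
    if h1 <= 0.0 || h2 <= 0.0 {
        return None;
    }
    Some((20.0 * (h1 / h2).log10()) as f32)
}
```

With a 200 Hz component plus a 400 Hz component at half its amplitude, the function returns about 6.02 dB (20 · log10 2), which falls in the "Lighter, breathier vocal quality" band of the table.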
Test Data
Saarbrücken Voice Database
This library uses samples from the Saarbrücken Voice Database for consistency validation testing.
License: CC BY 4.0
Attribution: Pützer, M. & Barry, W.J., Former Institute of Phonetics, Saarland University. Available at Zenodo.
The SVD provides lab-quality voice recordings including:
- Sustained vowels (/a:/, /i:/, /u:/) at low, normal, and high pitch
- 851 healthy control speakers
- 1002 speakers with documented voice pathologies
- 50 kHz sample rate, controlled recording conditions
Setting Up Test Data
# 1. Download SVD from Zenodo (CC BY 4.0 license)
# https://zenodo.org/records/16874898
# 2. Install conversion dependencies
# 3. Convert SVD files to test format
Test Sample Requirements
For comprehensive validation, the library needs test samples with these characteristics:
| Function | Sample Requirements | Recommended Datasets |
|---|---|---|
| Pitch Detection | Male (80-180 Hz), Female (160-300 Hz), varied intonation | Saarbrücken Voice Database, PTDB-TUG |
| Formant Extraction | Sustained vowels /a/, /i/, /u/, /e/, /o/ from multiple speakers | Hillenbrand Vowel Database, VTR-TIMIT |
| HNR | Breathy, modal, and clear voice qualities | Saarbrücken Voice Database |
| H1-H2 | Light to full voice qualities, different phonation types | UCLA Voice Quality Database, VoiceSauce reference recordings |
| Spectral | Dark to bright voice qualities | Voice quality databases with perceptual labels |
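The "dark to bright" dimension in the Spectral row is commonly quantified with the spectral centroid, the magnitude-weighted mean frequency of the spectrum: darker voices concentrate energy at lower frequencies and yield a lower centroid. The sketch below computes it with a naive DFT so it stays dependency-free. It is an illustration, not the crate's spectral module, and `spectral_centroid_hz` is a hypothetical name; a real implementation would use an FFT and usually a window.

```rust
use std::f64::consts::PI;

/// Spectral centroid in Hz: magnitude-weighted mean frequency of the one-sided
/// DFT spectrum (naive O(N^2) DFT for clarity; an FFT is far faster).
pub fn spectral_centroid_hz(frame: &[f32], sample_rate: f32) -> Option<f32> {
    let n = frame.len();
    if n == 0 {
        return None;
    }
    let (mut weighted, mut total) = (0.0f64, 0.0f64);
    for k in 0..=n / 2 {
        // DFT bin k.
        let (mut re, mut im) = (0.0f64, 0.0f64);
        for (i, &x) in frame.iter().enumerate() {
            let phase = 2.0 * PI * (k * i) as f64 / n as f64;
            re += x as f64 * phase.cos();
            im -= x as f64 * phase.sin();
        }
        let magnitude = (re * re + im * im).sqrt();
        let freq = k as f64 * sample_rate as f64 / n as f64; // bin center in Hz
        weighted += freq * magnitude;
        total += magnitude;
    }
    if total <= 0.0 {
        None // silent frame: centroid undefined
    } else {
        Some((weighted / total) as f32)
    }
}
```

For a 512-sample frame at 16 kHz, a 250 Hz sine gives a centroid near 250 Hz and a 2000 Hz sine gives one near 2000 Hz, so "brighter" material moves the value upward.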
Development
# Build
cargo build
# Test
cargo test
# Benchmark
cargo bench
# Documentation
cargo doc
License
MIT