cathar 0.5.1

Audio toolkit in pure Rust — denoise, de-hum, de-click, de-clip, de-reverb, normalise, and more.
Documentation

Cathar

cathar — restore, enhance & level any recording, in pure Rust

Name: cathar is from Greek katharós (καθαρός), "pure, clean" — the same root as catharsis (κάθαρσις), a cleansing. That's the whole job: take a noisy recording and give back clean audio.

CI License MSRV Edition Pure Rust Stars Follow @vbasky

Cathar is a transparent, dependency-free audio restoration toolkit — in pure Rust. It works on a standalone audio file (WAV, MP3, FLAC, OGG, M4A) just as readily as the audio track inside a video (MP4, MKV); video is never required. Every stage is inspectable, tunable DSP — no opaque neural models, no black boxes, so a result you don't like is a knob you can turn rather than a model you have to re-roll. Cathar does three things and writes WAV, FLAC, or AIFF (chosen by the output extension):

  • Restore — denoise (phase-coherent stereo), de-hum, de-wind, de-click, de-clip, de-reverb, spectral repair, de-plosive, de-rustle.
  • Enhance — de-ess, breath removal, voice isolation, bandwidth extension.
  • Level — loudness (LUFS) and peak normalisation for delivery.

No ffmpeg, no C/C++, no system libraries. Decoding is symphonia, the FFT is realfft/rustfft, WAV writing is hound and FLAC is flacenc — all pure Rust, so a single cargo build gives you a self-contained binary. Every effect is also a plain function over &[f32], so the same pipeline drops straight into a Rust program or a larger media-processing pipeline.

Quick start

cargo install cathar-cli                     # from crates.io — installs the `cathar` binary

Or from a checkout:

cargo install --path crates/cathar-cli       # install the binary from source
just setup        # one-time: enable the auto-format pre-commit hook
just build        # build the workspace
just test         # run all tests
# A noisy interview straight off a camera → clean dialogue:
cathar denoise interview.mp4 --out clean.wav

# Learn the room tone from a silent segment, then denoise with it:
cathar noiseprint room_tone.wav --out room.np.json
cathar denoise interview.mp4 --noiseprint room.np.json --out clean.wav

# A restoration chain, one stage at a time:
cathar dehum     recording.wav --freq 60        # kill 60 Hz mains buzz
cathar declick   recording.wav                  # interpolate impulse clicks
cathar declip    recording.wav                  # rebuild clipped peaks
cathar normalize recording.wav --target -16     # to -16 LUFS (podcast)

# Generate a synthetic noisy tone to experiment with:
cathar wave --out test.wav --duration 3 --freq 440 --noise 0.15

The toolkit

Every command reads any supported format and writes WAV (32-bit float), FLAC (24-bit lossless), or AIFF (24-bit) — the container follows the --out extension (.wav / .flac / .aif/.aiff, defaulting to WAV). They are grouped here by what they fix; run them in any order, or chain them.

Reduce — pull noise out of the signal

Command What it does Key flags
denoise Broadband denoiser — spectral subtraction (default) or Wiener filter; --coherent keeps the stereo image stable --alpha 3.0, --beta 0.01, --noiseprint <f>, --wiener, --coherent
noiseprint Learn a noise profile from a silence/room-tone clip → JSON --out noise.np.json
dehum Notch out mains hum (50/60 Hz) and its harmonics --freq 60, --harmonics 5
dewind Cut low-frequency wind rumble with a 4th-order high-pass --cutoff 80
dereverb Suppress room reverb by gating the spectral decay tail --strength 2.0
voiceisolate Keep speech, gate everything else (energy VAD + spectral gate) --noiseprint <f>
deesser Tame harsh sibilance ("sss"); --bands >1 is multiband + adaptive --freq 4000, --threshold -24, --bands 1
deplosive Tame plosive "p"/"b" pops (low-frequency transient bursts) --strength 4
derustle Suppress lavalier / clothing rustle (mid-band transient bursts) --strength 4
breath Detect and high-pass the breaths before speech onsets

Repair — reconstruct damaged samples

Command What it does Key flags
declick Detect impulse clicks against the local RMS and interpolate across them --threshold 10.0
declip Find flat-topped clipped runs and rebuild the missing peaks --threshold 0.95
repair Paint out isolated transient spectral artifacts (whistles, bursts, glitches) --strength 4.0

Enhance & level

Command What it does Key flags
enhance Bandwidth extension — resample up and synthesise the missing highs --rate 48000
normalize Loudness (LUFS, true EBU R128) or peak (dBFS) normalisation --target -16, --peak, --true-peak -1

Utility

Command What it does Key flags
resample Resample to a different rate (anti-aliased, any ratio) --rate 48000
wave Generate a synthetic sine + noise test tone --freq 440, --duration 3, --noise 0.1, --sample-rate 44100
batch Denoise (and optionally de-hum / normalise) a whole directory --indir, --outdir, --dehum <hz>, --normalize <lufs>, --exts

--target for normalize is roughly: -23 broadcast (EBU R128), -16 podcast, -14 streaming.

How denoising works

Cathar decodes to interleaved f32 PCM, then most reduction stages run as an STFT (short-time Fourier transform) → modify the spectrum → inverse STFT loop. The denoiser uses a 2048-point FFT with a 512-sample hop (75 % overlap) and a Hann window on both analysis and synthesis, reconstructed by overlap-add:

cathar STFT denoise pipeline: input.mp4 → symphonia decode → f32 PCM → STFT (Hann, 2048-pt FFT, 512 hop) → magnitude + phase → spectral subtraction (phase preserved) → recombine → inverse FFT / overlap-add → clean.wav

Two denoiser flavours share that frame loop:

  • Spectral subtraction (default) — estimate the noise magnitude per bin and subtract α × it, held above a spectral floor β·mag so you trade artifacts ("musical noise") against aggressiveness. α from 1→6 goes gentle→aggressive.
  • Wiener filter (--wiener) — apply the statistically optimal per-bin gain gain = S / (S + N) from the estimated signal and noise power; smoother on stationary noise.

The noise spectrum comes either from minimum-statistics (the quietest ~15 % of frames are taken as noise) or, for a cleaner result, from a noiseprint learned off a dedicated silent segment.

Inside each tool

Every stage is classic, inspectable DSP — no black boxes.

Tool Technique
denoise STFT 2048/512, Hann; spectral subtraction max(mag−α·N, β·mag) or Wiener S/(S+N). --coherent derives one gain mask from the mid (L+R) signal and applies it to every channel, so the stereo image stays put
dewind 4th-order Butterworth high-pass (two cascaded biquads, ~24 dB/oct) at --cutoff
noiseprint Per-bin magnitude spectrum of a noise clip, serialised to JSON
dehum Cascade of 2nd-order IIR notch biquads (Q = 30) at the base frequency and each harmonic up to Nyquist
declick Sliding-window local RMS; samples exceeding threshold × RMS are clicks, replaced by cubic-Hermite interpolation
declip Detect runs at/above threshold (shoulders extended ±4 samples), rebuild with cubic-Hermite interpolation
repair STFT 2048/512; per bin, compare magnitude to its temporal median (±4 frames) and pull transient outliers back to the median, phase preserved — sustained content is untouched, overlap-add is window-normalised to unity
dereverb Two-pass spectral-decay gating: track each bin's envelope (8 ms attack / 50 ms release), gate bins sitting near their reverb floor
voiceisolate Energy VAD on 20 ms frames (gap-fill < 120 ms, drop segments < 50 ms) + spectral gating of non-speech (tighter with a noiseprint)
deesser STFT 2048/256; single-band compresses the HF region when its power ratio exceeds the threshold. --bands >1 splits the sibilant region into sub-bands, each compressed when it rises threshold dB above its own EMA-tracked running level (multiband + adaptive)
deplosive / derustle STFT; per frame measure energy in a band (plosive < 250 Hz, rustle 1.5–6 kHz); frames whose band energy spikes above the temporal median are scaled back toward it, phase preserved, sustained content untouched
breath VAD-flag the frames just before a speech onset (≤ 150 ms) and high-pass them at 200 Hz, mixed 40 / 60 dry/wet
resample Kaiser-windowed sinc (16 lobes, β = 9), arbitrary ratio; cutoff tracks the lower Nyquist so downsampling is anti-aliased and upsampling adds no imaging
enhance Shared resampler to the target rate, then spectral band replication (4096 FFT) folds the existing top band into the empty highs with a tiled rolloff
normalize Peak: scale so the loudest sample hits the dBFS target. Loudness: ITU-R BS.1770-4 / EBU R128 integrated LUFS (K-weighting, gated) measured jointly across channels, applied as one broadband gain and held back to the --true-peak dBTP ceiling (4× oversampled) so it never clips

Library usage

The cathar crate is the same engine the CLI drives.

use cathar::{AudioData, Denoiser, SpectralDenoiser, dehum};

let audio = AudioData::from_file("interview.mp4")?;   // symphonia decode → f32
let sr = audio.sample_rate;

// Denoise and de-hum per channel via `map_channels`, then normalise to
// -16 LUFS (EBU R128) with a -1 dBTP true-peak ceiling. Loudness is measured
// across all channels jointly, so normalisation is a whole-signal method.
let clean = SpectralDenoiser::default()
    .denoise(&audio)?
    .map_channels(|ch| dehum(ch, sr, 60.0, 5))
    .normalize_r128(-16.0, -1.0);

clean.to_file("clean.wav")?;   // 32-bit float WAV via hound

Learn a noise print once and reuse it for a tighter subtraction:

use cathar::{AudioData, Denoiser, SpectralDenoiser, learn_noise_print};

let print = learn_noise_print(&AudioData::from_file("room_tone.wav")?)?;

let audio = AudioData::from_file("interview.mp4")?;
let clean = SpectralDenoiser::with_noise_print(print, /* alpha */ 3.0, /* beta */ 0.01)
    .denoise(&audio)?;
clean.to_file("clean.wav")?;

The public surface is small and direct:

  • AudioData { sample_rate, channels: Vec<Vec<f32>> }from_file, to_file, map_channels(|&[f32]| -> Vec<f32>) for per-channel effects, normalize_r128(target_lufs, true_peak_ceiling_db) for whole-signal loudness, and resample(target_rate) for the main-path resampler.
  • Denoiser trait + SpectralDenoiser (configurable fft_size, hop_size, alpha, beta, noise_frame_ratio, optional noise_print); denoise and denoise_coherent (phase-coherent stereo).
  • NoisePrint + learn_noise_print + wiener_denoise.
  • Free functions: dehum, dewind, declick, declip, spectral_repair, deplosive, derustle, dereverb, voice_isolate, deesser, deess_multiband, breath_remove, bandwidth_extend, resample, normalize_peak, integrated_loudness, true_peak_dbtp, generate_wave.

Formats & I/O

Stage Detail
Reads MP4, M4A, MKV, MP3, FLAC, WAV, OGG — any container/codec symphonia decodes (built with features = ["all"])
Decodes to 32-bit float PCM, one Vec<f32> per channel, at the file's native sample rate
Writes 32-bit float WAV via hound — no inter-stage quantisation
Resampling Only on the enhance path (windowed sinc); every other stage runs at the source rate
Channels Preserved; effects run independently per channel

Architecture

A deliberately small two-crate workspace — a library and the binary that drives it.

cathar/
├─ crates/
│  ├─ cathar/        # the engine: decode (symphonia) · DSP · encode (hound)
│  └─ cathar-cli/    # the `cathar` binary — clap subcommands over the engine
└─ docs/             # banner + assets
Dependency Role
symphonia (all) Decode every supported container/codec to f32 PCM
realfft / rustfft Forward/inverse real FFT behind every STFT stage
hound Write 32-bit float WAV
clap (derive) CLI parsing
serde / serde_json NoisePrint serialisation (*.np.json)
thiserror / anyhow Library error type / CLI error reporting
candle-core, candle-nn (optional ml feature) scaffolding for a future learned denoiser

Design

Principle What it means
Pure Rust No ffmpeg, no C/C++ FFI, no pkg-config — one cargo build produces a self-contained binary
Lossless float pipeline Decode → f32 → process → 32-bit float WAV; nothing is quantised between stages
Composable Every effect is a plain fn(&[f32], …) -> Vec<f32>; chain them in any order, in the CLI or as a library
Inspectable DSP Classic, documented algorithms (STFT subtraction, Wiener, IIR notches, cubic interpolation) — not opaque models
Deterministic Single-threaded and frame-synchronous: the same input always yields the same output

Pipeline integration

Because the whole toolbox is a library of &[f32] functions plus a single static binary with no system dependencies, cathar slots cleanly into a larger media pipeline: call it in-process through the cathar crate, or shell out to cathar <stage> … between other steps. Inputs are read straight from the container files, so it can sit immediately after ingest and before encoding.

Roadmap

Cathar is 0.4.x, restoration-first, and growing — before 1.0 — into a general-purpose, pure-Rust audio swiss-army knife (a SoX-class tool with no ffmpeg and no C/C++ FFI). See ROADMAP.md for the full plan and SoX-parity checklist. The 0.20.4 foundations are complete:

  • True EBU R128 loudness (normalize) — K-weighted gated LUFS with a --true-peak dBTP ceiling.
  • Main-path resampling — the resample command + AudioData::resample, a shared anti-aliased Kaiser-windowed sinc any stage can call.
  • Encode beyond WAV — 24-bit lossless FLAC and 24-bit AIFF on the pure-Rust default path, selected by the output extension.

Next up is restoration depth (Phase 1 0.5) and the swiss-army expansion (Phase 2) — see ROADMAP.md.

The optional ml feature wires in candle for a learned denoiser (0.6); the neural model itself is not implemented yet.

Development

just check-all runs fmt-check, clippy (-D warnings), tests, and docs — the same gate CI enforces on Linux and macOS.

Task Command
Build just build / just build-release
Format just fmt (just fmt-check to verify)
Lint just lint
Test just test
Docs just docs
Audit just deny (needs cargo install cargo-deny)
Run just run -- <args>

License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.