audioscan 0.1.0

Fast CLI audio analysis for format metadata, EBU R128 loudness, and silence windows

audioscan

Decode an audio file once and report its format, EBU R128 loudness, and silence windows as JSON. One fast native pass instead of two or three ffmpeg shellouts.

Why

I mix and master music, and a private catalog of mine needs two boring things from every recording: how loud it is (so tracks sit at a consistent volume) and where the silent gaps are (so it can split a long recording into songs). The first version got both by running ffmpeg, the standard audio command-line tool, and reading the numbers out of its status text. That approach fully decodes the file once per measurement, and it is fragile: it already caused a real bug, where the parser read ffmpeg's first per-frame I: -70 line instead of the final Summary block and stored a loudness that was off by tens of decibels.

audioscan decodes each file a single time with symphonia, measures loudness with the real ebur128 library (the same math ffmpeg uses, on the EBU R128 standard that streaming services use to keep volume consistent), finds silence in the same pass, and prints structured JSON. Same numbers, fewer decodes, nothing to scrape.

Install

cargo install audioscan

That installs the audioscan binary from crates.io. To install the latest from source without cloning, use cargo install --git https://github.com/KiwiMaddog2020/audioscan. Prebuilt macOS and Linux binaries are attached to each release.

Build

cargo build --release      # binary at target/release/audioscan

Use

audioscan [--compact|--pretty] [--strict] [--timeout <s>] [--threshold <RMS-dBFS>] [--min-gap <s>] <file>
  • --pretty: pretty-printed JSON (default)
  • --compact: one-line JSON
  • --strict: fail instead of returning status: "partial" when decode is incomplete
  • --timeout <s>: per-file soft decode deadline in seconds (default: none / unbounded)
  • --threshold <RMS-dBFS>: silence threshold in RMS dBFS (default -30)
  • --min-gap <s>: shortest silence to report, in seconds (default 5.0)
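To make --threshold and --min-gap concrete, here is a minimal sketch of windowed-RMS silence detection. It is not audioscan's implementation: the function name, the mono float samples in [-1, 1], and the exact 30 ms window are assumptions chosen for illustration.

```python
import math


def find_silences(samples, sample_rate, threshold_db=-30.0,
                  min_gap_sec=5.0, window_sec=0.03):
    """Return [start, end] pairs (seconds) where windowed RMS stays below threshold_db.

    Hypothetical helper: assumes mono float samples in [-1.0, 1.0].
    """
    window = max(1, round(sample_rate * window_sec))
    n_windows = math.ceil(len(samples) / window)
    silences = []
    start = None  # start time of the current quiet run, if any

    for i in range(n_windows):
        chunk = samples[i * window:(i + 1) * window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        level_db = 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")
        t = i * window / sample_rate  # boundaries land on window edges

        if level_db < threshold_db:
            if start is None:
                start = t
        else:
            # A quiet run just ended; keep it only if it is long enough.
            if start is not None and t - start >= min_gap_sec:
                silences.append([start, t])
            start = None

    # Close a quiet run that extends to the end of the input.
    end = len(samples) / sample_rate
    if start is not None and end - start >= min_gap_sec:
        silences.append([start, end])
    return silences
```

Because each decision is made per window, reported boundaries can only fall on window edges, which is why a gap built at exact second marks may come back shifted by up to one window.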

--timeout <secs> bounds how long a single file may spend decoding. It is a cooperative soft deadline checked between packets, so a slow or wedged file stops at the limit instead of running unbounded. A timed-out file is reported as status: "partial" with a decode exceeded timeout of <N>s warning, or, under --strict, an error. The default is no timeout, so legitimately long recordings are never truncated unless you set one. In batch mode the deadline applies per file and the batch continues past a timed-out file.
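A cooperative soft deadline of this kind can be sketched as follows. This is an illustration of the idea, not audioscan's code: decode_with_deadline, the injected clock, and the packet iterable are all hypothetical, and the real decode work is replaced by a counter.

```python
import time


def decode_with_deadline(packets, timeout_sec=None, clock=time.monotonic):
    """Consume packets until they run out or the soft deadline passes.

    Returns (decoded_count, status, warnings). Hypothetical sketch.
    """
    deadline = None if timeout_sec is None else clock() + timeout_sec
    decoded = 0
    for _packet in packets:
        # The deadline is checked only between packets, so one slow packet
        # can overrun it. That is what makes the deadline "soft".
        if deadline is not None and clock() >= deadline:
            warning = f"decode exceeded timeout of {timeout_sec}s"
            return decoded, "partial", [warning]
        decoded += 1  # stand-in for real decode work
    return decoded, "ok", []
```

With timeout_sec=None the loop never consults the clock, which mirrors the default of no timeout.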

On success, single-file mode prints audioscan: analyzed <path> in <N.NN>s to stderr, so the JSON on stdout stays clean and pipeable.
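Since the JSON goes to stdout and the status line to stderr, a caller can capture the two streams separately. The following is a minimal sketch of such a caller, assuming the audioscan binary is on PATH; run_audioscan and parse_report are hypothetical helper names, not part of the tool.

```python
import json
import subprocess


def parse_report(stdout_text):
    """Parse one audioscan JSON object and check its schema version."""
    report = json.loads(stdout_text)
    version = report.get("schema_version")
    if version != 1:
        raise ValueError(f"unexpected schema_version: {version!r}")
    return report


def run_audioscan(path, binary="audioscan"):
    """Run audioscan on one file and return the parsed report.

    Assumes `binary` resolves to an installed audioscan executable.
    """
    proc = subprocess.run(
        [binary, "--compact", path],
        capture_output=True,  # keep stdout (JSON) apart from stderr (status line)
        text=True,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"audioscan exited {proc.returncode}: {proc.stderr.strip()}")
    return parse_report(proc.stdout)
```

Keeping parse_report separate from the subprocess call makes the parsing step testable without the binary installed.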

Batch

audioscan batch <dir> [--out <file.jsonl>] [--jobs auto|<N>] [--strict] [--timeout <s>] [--threshold <RMS-dBFS>] [--min-gap <s>]

Batch mode recursively scans known audio extensions under <dir> and emits compact JSON Lines, one row per file. Without --out, rows are written to stdout. --jobs auto uses rayon's default worker count; --jobs <N> pins the batch to a fixed positive worker count.

Each batch JSONL row, success or error, also includes "bytes": <input file size in bytes on disk>, a deterministic per-row field for sorting or spotting large inputs. Successful rows contain the analysis object shown below plus bytes. Per-file failures are written as {"schema_version":1,"path":"...","error":"...","bytes":1234}. bytes is a batch-row-only operational field; the single-file output object below does not include it. Each file is isolated with panic capture, so a panic or decode failure for one recording becomes an error row instead of aborting the batch.
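A consumer of the batch output might separate success, partial, and error rows like this. It is a sketch under the row shapes described above; split_rows is a hypothetical name.

```python
import json


def split_rows(jsonl_text):
    """Partition batch JSONL rows into (ok, partial, failed) lists."""
    ok, partial, failed = [], [], []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        row = json.loads(line)
        if "error" in row:
            failed.append(row)  # per-file failure row
        elif row.get("status") == "partial":
            partial.append(row)
        else:
            ok.append(row)
    return ok, partial, failed
```

From there, sorted(ok, key=lambda row: row["bytes"], reverse=True) lists the largest successfully scanned inputs first.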

Batch mode prints a live per-file progress line to stderr as each file completes, followed by the summary and slowest-file timing report:

audioscan: [3/2000] /archive/take_03.wav (1182ms)
audioscan: scanned 2000 file(s): 1996 ok, 3 partial, 1 failed in 41.7s
audioscan: slowest: big.flac 3201ms (118.0 MB), long.wav 1980ms (90.2 MB), take_03.wav 1182ms (44.1 MB)

Because progress lines stream as files finish rather than only at the end, a wedged or slow file shows up live as the one with no completion line yet. The slowest: line lists the slowest files with each file's elapsed time in milliseconds and its size. To keep the JSON Lines on stdout byte-identical across --jobs counts, per-file wall-clock timing and progress live on stderr instead of in the JSONL stream.

Exit codes:

  • 0 when the command completes and writes its requested output
  • 1 for fatal runtime failures such as unreadable output paths, no discovered audio files, or a failed single-file scan
  • 2 for usage or invalid-config errors

Batch per-file error rows do not by themselves make the batch command fail once the JSONL output has been written.

Output

{
  "schema_version": 1,
  "path": "take.wav",
  "container": "wav",
  "codec": "pcm_s16le",
  "sample_rate": 48000,
  "channels": 2,
  "bits_per_sample": 16,
  "duration_sec": 212.5,
  "integrated_lufs": -14.2,
  "loudness_range_lu": 8.6,
  "true_peak_dbtp": -1.1,
  "silence_threshold_db": -30.0,
  "silence_min_gap_sec": 5.0,
  "silences": [[6.0, 12.0]],
  "status": "ok",
  "skipped_packets": 0,
  "warnings": []
}

status is ok for a clean decode and partial when the scan completed after skipping corrupt packets, detecting an incomplete stream, or exceeding a configured timeout. warnings[] holds human-readable diagnostics for partial output; it is empty for clean output. With --strict, partial decodes become errors instead of JSON analysis rows. container is the lowercased file extension from the input path, or "" for an extensionless path.

integrated_lufs and loudness_range_lu are null together when the input is too short or quiet to measure. true_peak_dbtp is null only for digital silence, where there is no inter-sample peak to report. silences uses a simple [start, end] seconds convention. Silence boundaries are quantized to the roughly 30 ms analysis window, matching ffmpeg silencedetect.
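The two uses this output was built for, leveling tracks and splitting a recording at silences, can be sketched on top of the report object. Both helpers are hypothetical, and the -14 LUFS target is an assumed example value, not something audioscan prescribes.

```python
def gain_to_target_db(report, target_lufs=-14.0):
    """Gain in dB that would bring the track to target_lufs.

    Returns None when integrated_lufs is null (input too short or quiet).
    The default target is an assumption for illustration.
    """
    lufs = report.get("integrated_lufs")
    if lufs is None:
        return None
    return target_lufs - lufs


def segments_between_silences(report):
    """Return [start, end] pairs (seconds) of the audio between silences."""
    segments = []
    cursor = 0.0  # end of the last silence seen so far
    for start, end in report["silences"]:
        if start > cursor:
            segments.append([cursor, start])
        cursor = max(cursor, end)
    if report["duration_sec"] > cursor:
        segments.append([cursor, report["duration_sec"]])
    return segments
```

For the sample output above, the segments come out as [0.0, 6.0] and [12.0, 212.5], and the gain toward the assumed -14 LUFS target is about +0.2 dB.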

Validation

Checked against ffmpeg's ebur128 filter on generated signals:

| signal | metric | audioscan | ffmpeg / reference |
|---|---|---|---|
| 1 kHz @ -3 dBFS + 6 s silence | integrated | -6.26 LUFS | -6.3 LUFS |
| | true peak | -3.0 dBTP | (-3 dBFS sine) |
| | silence | [6.0, 12.0] | (built at 6-12 s) |
| varied -6/-18/-3/-14/-9 dBFS | integrated | -9.46 LUFS | -9.5 LUFS |
| | loudness range | 11.0 LU | 11.0 LU |

Reproduce:

python3 tools/make_signal.py samples/signal.wav
cargo run -- samples/signal.wav
ffmpeg -hide_banner -nostats -i samples/signal.wav -af ebur128 -f null -

Note: LRA only agrees on signals with real loudness variation. On a degenerate two-level signal the percentile gating is unstable in both tools and they disagree, which is expected, not a bug.
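The ffmpeg values in the comparison above carry one decimal while audioscan reports more, so an automated check has to allow for rounding. A minimal sketch of such a check follows; agrees_with_rounded is a hypothetical helper, and the half-step tolerance is an assumption about what counts as agreement.

```python
def agrees_with_rounded(measured, rounded_reference, decimals=1):
    """True if `measured` is within half a rounding step of the reference.

    With decimals=1 the allowed difference is 0.05, plus a tiny epsilon
    to absorb floating-point error in the subtraction.
    """
    half_step = 0.5 * 10 ** -decimals
    return abs(measured - rounded_reference) <= half_step + 1e-9
```

Under this rule -6.26 agrees with -6.3 and -9.46 agrees with -9.5, while a value such as -6.26 against -6.4 would be flagged.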

Formats

Enabled: wav, flac, mp3, aac/m4a, ogg/vorbis, adpcm, mkv (symphonia defaults plus mp3, aac, isomp4). Not yet enabled: aiff, alac, opus. Add the feature in Cargo.toml when a recording needs it.

Status and next steps

Standalone by design, intentionally not yet wired into the catalog it was built for. Swapping a production pipeline's measurement path for an audioscan subprocess is a separate, careful change. Because the contract is "run a binary, read JSON," that swap stays clean when I make it.

Candidate directions:

  • a C interface so a Swift app can call the same core directly, with no subprocess
  • bump symphonia to 0.6