basinski 0.1.0

# basinski

**Rescues media files from their own disintegration.**

Named for William Basinski's *The Disintegration Loops* — music made from tape
loops that crumbled as they played back. This tool is for when your files do
the same thing, except you'd rather have them back.

```
$ basinski rescue head_truncated.mp4
◌ head_truncated.mp4 — 686556 bytes
  forensics:  80%  ISO BMFF / MP4 (damaged front)  [self-consistent atom chain: moov@678775 ...]

  diagnosis: head-truncated MP4
    bytes cut from front : 4096  (determined by stsz/NAL-chain correlation)
    media data destroyed : 4048 bytes
    keyframes destroyed  : 1 (first clean keyframe at 1.000s)

  ☼ regrew 4096-byte prefix (ftyp + free silence)
  container restored: mov,mp4, 6.000000s, video:h264 640x360, audio:aac
  ✂ clipping to first clean keyframe at 1.000s

  ✔ rescued → head_truncated.rescued.mp4
    full decode: clean (0 errors)
```

## What it does

- **`identify`** — forensic identification of audio/video, *headers or no
  headers*. Instead of magic bytes at offset 0, it hunts for codec-level
  structure anywhere in the file: MP3 frame-sync chains, ADTS headers,
  MPEG-TS packet cadence, H.264 NAL units (both Annex B start codes and
  MP4-style length prefixes), MP4 atom skeletons, EBML cluster IDs, Ogg pages.

- **`rescue`** — puts head-truncated MP4s back together. Most muxers write `moov`
  (the index, with every sample's *absolute byte offset*) at the **end** of
  the file, so cutting the head off kills `ftyp` and wounds `mdat`, but the
  index survives. basinski finds the surviving index, works out exactly how
  many bytes `K` died, and regrows a prosthetic prefix — a synthetic `ftyp`
  plus a `free` atom of structural silence — so every offset in the index is
  true again. Two ways it finds `K`:
  1. **mdat header anchor** — the surviving `mdat` header says where it sits
     now; the sample table says where it used to sit.
  2. **stsz/NAL-chain correlation** — when even the mdat header is gone, slide
     the sample-size table along the surviving bytes until the H.264
     length-prefixed NAL structure lines up. The index fits the data like a
     dental record.

  Media that was genuinely destroyed is zero-filled and then clipped away at
  the first clean keyframe (computed from the index, verified against the
  real packet list). Self-synchronizing formats with damaged heads (MP3,
  ADTS AAC, MPEG-TS, raw H.264) are trimmed to their first verifiable sync
  and re-headed.

- **`rescue --reference <intact-sibling>`** — the **moov transplant**, for
  when the index is gone *entirely*: a recording cut off mid-write, or a
  `faststart` file whose front died. This is what `untrunc` does, done by
  hand. An intact file from the same device speaks the same dialect — same
  SPS/PPS, same frame cadence, same muxing habits — so basinski harvests its
  organs (`stsd`, `tkhd`, timing) and regrows the index by walking the
  orphaned payload one NAL unit at a time: AVCC length prefixes chain sample
  to sample, `first_mb_in_slice == 0` marks each new access unit, IDR NALs
  rebuild the keyframe table. B-frame display order is recovered by parsing
  `pic_order_cnt_lsb` out of every slice header (field widths from the
  donor's SPS) and ranking POCs within each GOP — the `ctts` box regrows
  from the slices themselves, which is more than untrunc bothers with.
  Constant-sample-size audio (PCM) is split exactly. Variable-frame **AAC**
  is salvaged from the interleave gaps by recognizing the recurring CPE
  element header that opens every frame, wrapping each audio chunk in a
  synthetic ADTS header, and letting the decoder walk the blocks — no audio
  donor required, since ADTS is self-describing. The recovered audio doubles
  as a clock: its true duration over the video frame count **reveals the real
  frame rate**, which a parameter-less stream otherwise can't tell you.

  Because that stream carries no timing of its own, the donor's frame rate is
  only a guess. If a rescue comes out fast or slow, **`rescue --fps N`**
  re-times it without re-divining (basinski prints the audio-implied rate to
  tell you what N should be). `--no-audio` skips the audio salvage;
  `--audio-rate` overrides the assumed 44100 Hz.

- **`divine`** — when there is no index, no SPS/PPS, *and no intact sibling
  to borrow from*, basinski dowses for the lost codec parameters. The space
  of plausible parameter sets is small and testable: synthesize a candidate
  donor for each guess, decode the stream's own keyframe under it, and ask
  whether what came out looks like a *picture* — heavy-tailed edge
  statistics, quiet macroblock seams, sane chroma, frames that cohere with
  their neighbors. A wrong guess decodes to static; the right one produces
  borders and shapes. Resolution, entropy mode (the CABAC init-QP seed is
  swept outward from 26 — x264 writes its CRF straight into it), SPS field
  widths, reference counts, weighted prediction: all gridded, judged in
  parallel, and the winner is written out as a ready-made `--reference`
  donor. Drop a `mobilenetv2.onnx` in `~/.cache/basinski/` (or pass
  `--model`) and a tiny image classifier re-ranks the survivors — useful
  against the one illusion the classical metrics fall for, a wrong-geometry
  decode whose macroblocks reflow into locally-plausible shear.

- **`clip`** — identifies keyframes and clips artifacted-but-playable video
  to the first cleanly-decoding one. `--list` shows the keyframes; `--from`
  picks your own.

- **`convert --the-correct-format`** — converts any media to The Correct
  Format. The Correct Format is mp4 (H.264 + AAC) for video and mp3 for
  audio. There are no other formats. The flag is the entire format menu.
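The structure-hunting that `identify` relies on can be illustrated with the simplest of the listed signals, MPEG-TS packet cadence. The following is a minimal sketch, not basinski's actual scanner: it assumes plain 188-byte packets (no 192- or 204-byte variants), and the function name `find_ts_cadence` is hypothetical.

```rust
/// Size of one plain MPEG-TS packet and the sync byte that opens each one.
const TS_PACKET: usize = 188;
const TS_SYNC: u8 = 0x47;

/// Return the first offset at which `need` consecutive packets all begin
/// with the sync byte, or `None` if no such run exists.
/// Hypothetical helper for illustration; not part of basinski's API.
pub fn find_ts_cadence(data: &[u8], need: usize) -> Option<usize> {
    if need == 0 {
        return None;
    }
    // Distance from the first sync byte of a run to the last one.
    let span = (need - 1) * TS_PACKET;
    if data.len() <= span {
        return None;
    }
    (0..data.len() - span)
        .find(|&start| (0..need).all(|i| data[start + i * TS_PACKET] == TS_SYNC))
}
```

A single `0x47` byte proves nothing, since it appears in arbitrary data all the time. Requiring several hits at exact 188-byte spacing is what turns a coincidence into evidence, and the same idea carries over to MP3 frame-sync chains and ADTS headers.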

## A field guide: using the subcommands together

The five subcommands form a ladder of escalating desperation. You climb only
as far as the damage forces you to.

```mermaid
flowchart TD
    A([a damaged file]) --> B[basinski identify]
    B --> C{does ffprobe<br/>open it?}

    C -->|"yes, but it stutters / artifacts"| CLIP[basinski clip<br/><i>cut to the first clean keyframe</i>]
    C -->|"yes, and it is fine"| DONE([keep it])
    C -->|"no — it won't open"| R[basinski rescue]

    R --> D{what does the<br/>diagnosis say?}
    D -->|"head-truncated MP4<br/>(front cut, moov survived)"| SURG[surgical rescue<br/><i>regrow the prefix, no flags</i>]
    D -->|"headless stream<br/>(mp3 / adts / ts / h264)"| RESYNC[trim to first sync<br/>+ re-head]
    D -->|"no index at all<br/>(tail cut / faststart front gone)"| E{got an intact<br/>sibling file?}

    E -->|yes| REF[rescue --reference sibling.mp4<br/><i>moov transplant</i>]
    E -->|no| DIV[basinski divine<br/><i>brute-force a donor from the bytes</i>]
    DIV --> REF2[rescue --reference name.donor.mp4]

    SURG --> CHK{plays right?}
    REF --> CHK
    REF2 --> CHK
    RESYNC --> CHK
    CLIP --> CHK

    CHK -->|"too fast / slow"| FPS[re-run with --fps N<br/><i>audio prints the true rate</i>]
    CHK -->|"want it normalized"| CONV[convert --the-correct-format<br/><i>or rescue --the-correct-format</i>]
    CHK -->|"good"| DONE
    FPS --> DONE
    CONV --> DONE
```

**The walkthrough.** You start every time with `identify` — it tells you what
the bytes *are* even when the extension lies and the header is gone. Then one
of three stories unfolds.

*The lucky case.* The file opens but hiccups — a corrupt run near the front,
clean footage after. `clip` finds the first keyframe that decodes cleanly and
cuts to it. Done.

*The surgical case.* `rescue` looks at an MP4 that ffprobe won't touch and
finds the `moov` index alive at the tail (most muxers write it there). The
front — `ftyp`, the `mdat` header — is what died. basinski works out exactly
how many bytes `K` were lost and regrows a prosthetic prefix so every offset
in the surviving index points true again. No flags, no donor; the rescue is
deterministic and, for a clean head-truncation, bit-identical to the original.
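The arithmetic behind the prosthetic prefix is small enough to sketch. The code below is an illustration under stated assumptions, not basinski's implementation: the `isom` brand, the 20-byte `ftyp` layout, and the names `bytes_lost` and `regrow_prefix` are all hypothetical, and the sketch ignores the `mdat` header and any destroyed media.

```rust
/// K from the mdat-header anchor: where the first sample used to sit
/// (per the index) minus where it sits now. Hypothetical helper.
pub fn bytes_lost(original_offset: u64, surviving_offset: u64) -> Option<u64> {
    original_offset.checked_sub(surviving_offset)
}

/// Build a `k`-byte prefix: a minimal `ftyp` atom followed by a zero-filled
/// `free` atom that pads the file back out so index offsets are true again.
/// Returns `None` if `k` is too small to hold both atoms.
pub fn regrow_prefix(k: usize) -> Option<Vec<u8>> {
    // ftyp = size(4) + type(4) + major brand(4) + minor version(4) + one brand(4).
    let mut out = Vec::with_capacity(k);
    out.extend_from_slice(&20u32.to_be_bytes());
    out.extend_from_slice(b"ftyp");
    out.extend_from_slice(b"isom"); // major brand: an assumption
    out.extend_from_slice(&0u32.to_be_bytes()); // minor version
    out.extend_from_slice(b"isom"); // compatible brand: an assumption

    // The free atom needs at least its own 8-byte header.
    let free_len = k.checked_sub(out.len())?;
    if free_len < 8 || free_len > u32::MAX as usize {
        return None;
    }
    out.extend_from_slice(&(free_len as u32).to_be_bytes());
    out.extend_from_slice(b"free");
    out.resize(k, 0); // structural silence
    Some(out)
}
```

Prepending `regrow_prefix(k)` to the surviving bytes leaves every later byte at the absolute position the index expects, which is why nothing in `moov` has to be rewritten.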

*The hard case.* The index itself is gone — the tail was cut, or the file was
`faststart` and its front (where `moov` then lived) is the casualty. Now there
is nothing saying where samples begin. Two ways forward:

- **You have a sibling** — any intact clip from the same camera or app.
  `rescue --reference sibling.mp4` harvests its codec parameters and walks the
  orphaned `mdat` to rebuild the index (this is what `untrunc` does).
- **You have nothing but the wreck.** `divine` manufactures the donor. It
  brute-forces the lost parameters by synthesizing candidates, decoding the
  file's own keyframe under each, and scoring whether the result *looks like a
  picture*. It writes `<name>.donor.mp4`, which you then hand to
  `rescue --reference`.
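Walking an orphaned payload one NAL unit at a time looks roughly like the sketch below. It assumes 4-byte AVCC length prefixes (the common case, but really set by the donor's `avcC`) and a payload that holds video only. The `Nal` struct and function names are hypothetical. The `first_mb_in_slice == 0` test relies on the H.264 Exp-Golomb coding, in which the value 0 is the single bit `1`, so the top bit of the first slice-header byte is enough.

```rust
/// One NAL unit found in a length-prefixed (AVCC) payload.
#[derive(Debug, PartialEq)]
pub struct Nal {
    /// Offset of the NAL header byte, just past the length prefix.
    pub offset: usize,
    pub len: usize,
    pub nal_type: u8,
    /// True for a slice whose first_mb_in_slice is 0 (a new access unit).
    pub starts_picture: bool,
}

/// Follow the length prefixes from the start of `payload` until the chain breaks.
pub fn walk_avcc(payload: &[u8]) -> Vec<Nal> {
    let mut nals = Vec::new();
    let mut pos = 0usize;
    while pos + 4 <= payload.len() {
        let len = u32::from_be_bytes([
            payload[pos], payload[pos + 1], payload[pos + 2], payload[pos + 3],
        ]) as usize;
        let body = pos + 4;
        // A zero or overrunning length means the chain is broken here.
        if len == 0 || len > payload.len() - body {
            break;
        }
        let header = payload[body];
        // forbidden_zero_bit must be 0 in a valid NAL header.
        if (header & 0x80) != 0 {
            break;
        }
        let nal_type = header & 0x1F;
        let is_slice = nal_type == 1 || nal_type == 5; // non-IDR or IDR slice
        let starts_picture = is_slice && len >= 2 && (payload[body + 1] & 0x80) != 0;
        nals.push(Nal { offset: body, len, nal_type, starts_picture });
        pos = body + len;
    }
    nals
}

/// Offsets of IDR slices that open a picture: the raw material for a keyframe table.
pub fn keyframe_offsets(nals: &[Nal]) -> Vec<usize> {
    nals.iter()
        .filter(|n| n.nal_type == 5 && n.starts_picture)
        .map(|n| n.offset)
        .collect()
}
```

Each `starts_picture` hit marks a sample boundary, so the sample sizes and chunk offsets of a rebuilt index fall out of the same walk.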

**The last mile.** A transplanted stream carries no timing of its own, so it
may play back too fast or too slow. basinski salvages the AAC audio from the
gaps between video frames, and that audio is a clock: it prints the frame rate
it implies. Re-run `rescue --fps N` to set it right, with no re-divining. Finally, if
you want the survivor in a tidy, universally-playable container,
`--the-correct-format` (on `rescue` or `convert`) re-encodes it to H.264+AAC
mp4, or mp3 for audio.
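The audio salvage works by wrapping each raw AAC chunk in a synthetic ADTS header. The sketch below shows the standard 7-byte ADTS header layout (no CRC) for AAC-LC. The function name and signature are hypothetical, and the defaults basinski actually writes are not shown in this document beyond "stereo AAC-LC at 44100 Hz".

```rust
/// Build a 7-byte ADTS header (MPEG-4, no CRC) for one AAC-LC frame.
/// `freq_index` is the ADTS sampling-frequency index (4 means 44100 Hz);
/// `channels` is the channel configuration (2 means stereo).
/// Hypothetical helper for illustration; not part of basinski's API.
pub fn adts_header(payload_len: usize, freq_index: u8, channels: u8) -> Option<[u8; 7]> {
    // The 13-bit frame length counts the header itself.
    let frame_len = payload_len + 7;
    if frame_len > 0x1FFF || freq_index > 12 || channels > 7 {
        return None;
    }
    let profile = 1u8; // AAC-LC: audio object type 2, stored as type minus 1
    Some([
        0xFF, // syncword, high 8 bits
        0xF1, // syncword low 4 bits, MPEG-4, layer 0, no CRC
        (profile << 6) | (freq_index << 2) | (channels >> 2),
        ((channels & 3) << 6) | ((frame_len >> 11) as u8),
        ((frame_len >> 3) & 0xFF) as u8,
        (((frame_len & 7) as u8) << 5) | 0x1F, // buffer fullness: all ones
        0xFC, // rest of buffer fullness, one raw data block
    ])
}
```

Because the header carries the sample rate, channel layout, and frame length, a decoder can read the wrapped chunks with no `moov` at all, which is the sense in which ADTS is self-describing.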

A real session, start to finish, on a file with no index and no sibling
(identify it, then the two-step `divine` and `rescue`):

```sh
basinski identify wreck.bin            # → "H.264 in MP4 framing", no moov
basinski divine   wreck.bin            # → wreck.donor.mp4  (prints "~25 fps")
basinski rescue   wreck.bin --reference wreck.donor.mp4 --fps 25
                                       # → wreck.rescued.mp4, video + sound
```
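The value handed to `--fps` can be derived from the salvaged audio. A minimal sketch, assuming AAC-LC frames of 1024 samples each. The function name is hypothetical and the real tool's rounding is not documented here.

```rust
/// Frame rate implied by the audio clock: video frame count divided by the
/// true duration of the recovered audio.
/// Assumes every AAC frame carries 1024 samples (true for AAC-LC).
pub fn audio_implied_fps(video_frames: u64, aac_frames: u64, sample_rate: u32) -> Option<f64> {
    if aac_frames == 0 || sample_rate == 0 {
        return None;
    }
    let seconds = (aac_frames * 1024) as f64 / sample_rate as f64;
    Some(video_frames as f64 / seconds)
}
```

For instance, 150 video frames against 258 AAC frames at 44100 Hz comes out just above 25 fps. If a few audio gaps were skipped during salvage, the audio runs short and the estimate reads slightly high, so rounding to the nearest standard rate is a reasonable last step.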

## Install / build

```sh
cargo build --release   # binary at target/release/basinski
```

Requires `ffmpeg` and `ffprobe` on PATH for rescue/clip/convert/divine (the
forensic `identify` works without them; `divine` additionally wants libx264
in your ffmpeg for donor synthesis). Everything else is plain Rust: `clap`,
`anyhow`, `serde`, and `tract-onnx` for the optional neural second opinion —
the parsing, scanning, bit-splicing, and reconstruction are done by hand, by
candlelight.

## Shell completions

`basinski completions <shell>` prints a completion script to stdout, with every
subcommand, flag, and file argument described. For zsh:

```sh
mkdir -p ~/.zfunc                               # create the completions directory if needed
basinski completions zsh > ~/.zfunc/_basinski   # ensure ~/.zfunc is in your $fpath
# then in ~/.zshrc, before `compinit`:           fpath=(~/.zfunc $fpath)
```

`bash`, `fish`, `powershell`, and `elvish` work the same way: swap the shell
name. The script is generated from the CLI definition, so a freshly generated
script always matches the installed binary; regenerate it after upgrading.

## Honest limitations

- A head-truncated MP4 without a surviving `moov` needs a `--reference` file from
  the same device — or `divine`, which manufactures one by brute force. The
  divination grid covers progressive H.264 in its common shapes; interlaced
  field coding, multi-PPS streams, and 10-bit profiles are outside it (so
  far), and basinski will tell you so plainly.
- `divine`'s evidence is one keyframe plus a handful of trailing slices;
  parameters that only misbehave deeper into a stream (frame_num wrapping
  past the evidence, say) can slip through.
- Salvaged AAC assumes stereo AAC-LC at 44100 Hz (override with
  `--audio-rate`); a few gaps that don't frame cleanly are skipped, so the
  audio can run a touch short and drift slightly from the video by the end.
- A transplanted stream has no timing of its own, so playback speed depends
  on `--fps`; the recovered audio reports the true rate but you have to pass
  it back in.
- The transplant walks H.264 (avc1/avc3) payloads only; HEVC grammar is not
  implemented yet.
- Transplanted AAC audio cannot be split exactly: raw AAC frames in an MP4 have
  no sync words, so the salvage relies on the heuristics described above, and any
  gap that does not frame cleanly is dropped. PCM audio recovers exactly.
- A truncation that lands mid-GOP costs the trailing B-frames; the last
  recovered reference frame then displays a beat early. Data that died
  stays dead.
- The whole file is read into memory; fine for phone videos, rude for 100 GB
  masters.
- `K` detection is verified against the data (≥80% of sampled video samples
  must parse as valid NAL chains at the proposed alignment) before any
  reconstruction is attempted.
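The stsz/NAL-chain check can be sketched as follows. This is an illustration under simplifying assumptions, not basinski's code: 4-byte length prefixes, video samples stored back to back with no interleaved audio, and `sizes` holding only samples expected to survive. All function names are hypothetical; the 0.8 threshold mirrors the 80% figure above.

```rust
/// True if exactly `size` bytes at `start` split into whole length-prefixed NALs.
fn sample_parses(data: &[u8], start: usize, size: usize) -> bool {
    let end = match start.checked_add(size) {
        Some(e) if e <= data.len() && size > 0 => e,
        _ => return false,
    };
    let mut pos = start;
    while pos + 4 <= end {
        let len = u32::from_be_bytes([data[pos], data[pos + 1], data[pos + 2], data[pos + 3]])
            as usize;
        if len == 0 || len > end - (pos + 4) {
            return false;
        }
        pos += 4 + len;
    }
    // The NAL chain must land exactly on the sample boundary.
    pos == end
}

/// Fraction of samples that parse when the first one is assumed to start at `first`.
pub fn alignment_score(data: &[u8], sizes: &[usize], first: usize) -> f64 {
    if sizes.is_empty() {
        return 0.0;
    }
    let mut pos = first;
    let mut ok = 0usize;
    for &size in sizes {
        if sample_parses(data, pos, size) {
            ok += 1;
        }
        pos = pos.saturating_add(size);
    }
    ok as f64 / sizes.len() as f64
}

/// Slide the size table along the data; return the first offset that clears `threshold`.
pub fn find_alignment(data: &[u8], sizes: &[usize], threshold: f64) -> Option<usize> {
    (0..data.len()).find(|&first| alignment_score(data, sizes, first) >= threshold)
}
```

Once an alignment is found, `K` is the offset the index records for that first sample minus the offset where it was actually found. Random data almost never chains length prefixes onto exact sample boundaries, which is why a high pass rate is strong evidence.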

## Tests

```sh
cargo test
```

Unit tests cover the frame-sync scanners, the full head-truncate→analyze→rebuild
cycle on synthetic MP4s (clean head-truncation, cut through the mdat header, cut
deep into media, moov loss), and the transplant (tail cut, head cut at unknown
phase, exact PCM splitting, honest AAC dropping). The end-to-end script
head-truncates a real mp4 two ways and rescues both, then cuts the last 40% off
another — taking the entire index with it — and rescues that through a
donor file. All three decode with zero errors; the first is bit-identical.

## License

basinski is free software, licensed under the **GNU General Public License,
version 3 or later** (GPL-3.0-or-later). See [`LICENSE`](LICENSE) for the full
text. The copyleft is deliberate: the recovery work leans on GPL codec tooling
(libx264 for donor synthesis), so basinski is GPL too — use it, study it, share
it, and pass on the same freedoms.