disk-forensic 0.4.0

Forensic disk partitioning-scheme orchestrator — auto-detects MBR/GPT/APM and dispatches to the right parser
# Plan: grow `disk-forensic` into a full container → partition → filesystem → report pipeline

> Authored 2026-06-06 by the Claude session working on `iso9660-forensic`, for the
> Claude session developing `disk-forensic`. Self-contained; no shared chat context assumed.

## Goal

Today `disk-forensic` is a **partition-scheme** orchestrator: `analyse_disk(reader, size) -> DiskReport`
(MBR / GPT / APM, via `mbr-forensic` / `gpt-forensic` / `apm-forensic` + `forensicnomicon`).
It stops at the partition table — it does **not** open image containers and does **not** descend
into filesystems.

The target is a single entry point that accepts **any** image — `.E01`, `.vmdk`, `.vhdx`, `.vhd`,
`.qcow2`, `.dmg`, `.aff4`, raw `dd`, **and optical `.iso` / `.mds` / `.cue` / `.nrg` / `.ccd` /
`.cdi` / `.toc`** — and emits one **very verbose, examiner-grade forensic report** covering every
layer it can reach.

## Layered architecture (the whole collection already fits this)

```mermaid
flowchart TD
    IN["input image (any format)"]
    OPEN["open_any(path) -> Box dyn Read+Seek   [container detect + unwrap]"]
    PART["partition detect: MBR / GPT / APM   (existing analyse_disk)"]
    OPT["optical: bare ISO/UDF (no partition table); Mac hybrid = APM; El Torito = nested FAT"]
    FS["filesystem detect per partition / per volume"]
    AFS["filesystem .analyse(reader) -> FsAnalysis   (iso9660 / ntfs / ext4 / hfsplus / udf)"]
    APART["partition .analyse(reader) -> Analysis   (gpt / mbr / apm)"]
    AGG["aggregate findings + merge super-timeline + cross-layer correlation"]
    REP["render: text + JSON + DFXML + HTML"]
    IN --> OPEN --> PART
    OPEN --> OPT
    PART --> FS
    OPT --> FS
    PART --> APART
    FS --> AFS
    APART --> AGG
    AFS --> AGG
    AGG --> REP
```

Roles already established in `~/src`:
- **Container readers** (`ewf`, `vmdk`, `vhdx`, `vhd`, `qcow2`, `dmg`, `aff4`, `dd`): each exposes
  `std::io::Read + Seek` over the decoded raw image. Uniform; compose freely.
- **Optical container readers** live *inside* `iso9660-forensic` (`open_reader` resolves
  `.cue/.ccd/.nrg/.mds/.cdi/.toc` to the data track). Ask that crate to promote them to a public
  `iso9660_forensic::open(path) -> impl Read+Seek` (tracked on its side).
- **Partition readers** (`mbr/gpt/apm-forensic`): `analyse(reader, …) -> XxxAnalysis { anomalies, … }`.
- **Filesystem readers** (`iso9660/ntfs/ext4fs/hfsplus/udf-forensic`): two surfaces —
  `ForensicFs` (navigation/mount, consumed by `4n6mount`) **and now** `analyse(reader) -> FsAnalysis`
  (batch findings, consumed by *you*).
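
Because every container reader presents the same `Read + Seek` surface, the layers can be composed through a single trait object. A minimal sketch, assuming a local `ReadSeek` helper trait with a blanket impl (the name matches the trait `iso9660-forensic` re-exports, but this definition and the `read_sector` helper are illustrative, not taken from any crate):

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Object-safe shorthand for "readable and seekable" (illustrative local definition).
pub trait ReadSeek: Read + Seek {}
impl<T: Read + Seek> ReadSeek for T {}

/// Hypothetical helper: read one sector from any decoded image,
/// regardless of which container reader produced the stream.
pub fn read_sector(src: &mut dyn ReadSeek, lba: u64, sector_size: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; sector_size];
    src.seek(SeekFrom::Start(lba * sector_size as u64))?;
    src.read_exact(&mut buf)?;
    Ok(buf)
}
```

Anything downstream that takes `&mut dyn ReadSeek` (or `Box<dyn ReadSeek>`) then works unchanged over an E01, a VMDK, or a plain `File`.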

## The sibling analyzer contract (now real in `iso9660-forensic`)

`iso9660-forensic` just gained the `gpt-forensic`-shaped contract you should consume and that the
other filesystem crates should mirror:

```rust
// iso9660_forensic
pub fn analyse<R: Read + Seek>(reader: &mut R) -> Result<IsoAnalysis, IsoError>;
pub fn analyse_with_options<R: Read + Seek>(reader: &mut R, opts: AnalyseOptions) -> Result<IsoAnalysis, IsoError>;

pub struct IsoAnalysis {
    pub volume: IsoVolumeInfo,     // provenance: tool fingerprints, timestamps, extension flags, sessions
    pub anomalies: Vec<findings::Anomaly>,
}
impl IsoAnalysis { pub fn max_severity(&self) -> Option<Severity>; }

// iso9660_forensic::findings  (mirrors gpt-forensic / mbr-forensic / apm-forensic exactly)
pub enum Severity { Info, Low, Medium, High, Critical }   // Ord
pub enum AnomalyKind { BothEndianMismatch { context, field, byte_offset, le, be }, /* …growing… */ }
pub struct Anomaly { pub severity: Severity, pub code: &'static str, pub kind: AnomalyKind, pub note: String }
// serde::Serialize behind the `serde` feature on every type.
```
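
Since `Severity` is `Ord`, the per-layer maximum and a whole-report roll-up both reduce to `max()`. A sketch using local stand-in types (the real ones live in `iso9660_forensic`; `overall_severity` is a hypothetical helper for the aggregate stage):

```rust
// Local stand-ins for the contract above, reduced to what the example needs.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity { Info, Low, Medium, High, Critical }

pub struct Anomaly {
    pub severity: Severity,
    pub code: &'static str,
    pub note: String,
}

/// Highest severity among one layer's findings; `None` when the layer is clean.
pub fn max_severity(anomalies: &[Anomaly]) -> Option<Severity> {
    anomalies.iter().map(|a| a.severity).max()
}

/// Hypothetical roll-up across layers (partition table, each filesystem, ...).
pub fn overall_severity(layers: &[Option<Severity>]) -> Option<Severity> {
    layers.iter().copied().flatten().max()
}
```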

## Concrete work items (suggested order)

1. **`open_any(path) -> Result<Box<dyn ReadSeek>>`** front-end: detect + unwrap the container by
   magic/extension. E01→`ewf::EwfReader`, VMDK→`vmdk`, VHDX→`vhdx`, VHD→`vhd`, QCOW2→`qcow2`,
   DMG→`dmg`, AFF4→`aff4`, raw→`dd`/`File`; optical (`.iso/.mds/.cue/.nrg/.ccd/.cdi/.toc`)→
   `iso9660_forensic::open`. This is what makes "feed it an E01 or an MDS" real.
2. **Filesystem stage**: after the partition stage (or directly, for bare optical volumes), detect the
   filesystem per partition (reuse `4n6mount`'s `detect::detect_filesystem` if suitable) and call the
   matching crate's `analyse(reader)`. Window each partition with an offset-reader so the fs crate
   sees its own volume starting at 0.
3. **Aggregate report type** that holds: the container/acquisition metadata, the `DiskReport`
   (partitions), and a `Vec` of per-volume `FsAnalysis`, plus a merged **super-timeline**.
4. **Renderers**: verbose text (default), JSON (serde — already the pattern), and ideally **DFXML**
   and an **HTML** report. Lead each section with provenance (observed facts), then anomalies ranked
   by severity.
5. **Cross-layer correlation** (high value): e.g. compare EWF acquisition timestamps against
   filesystem volume/file timestamps; flag a filesystem newer than its acquisition.
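
The offset-reader from item 2 could look roughly like this. It is a sketch under assumptions: `PartitionWindow` is a hypothetical name, and the byte offset and length are expected to come from the partition stage.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Presents bytes `[start, start + len)` of `inner` as a standalone stream starting at 0.
pub struct PartitionWindow<R> {
    inner: R,
    start: u64,
    len: u64,
    pos: u64, // current position, relative to the window
}

impl<R: Read + Seek> PartitionWindow<R> {
    pub fn new(inner: R, start: u64, len: u64) -> Self {
        Self { inner, start, len, pos: 0 }
    }
}

impl<R: Read + Seek> Read for PartitionWindow<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Never hand out bytes past the end of the window.
        let remaining = self.len.saturating_sub(self.pos);
        let want = buf.len().min(remaining.min(usize::MAX as u64) as usize);
        if want == 0 {
            return Ok(0);
        }
        // Position the underlying stream from our own cursor on every read.
        self.inner.seek(SeekFrom::Start(self.start + self.pos))?;
        let n = self.inner.read(&mut buf[..want])?;
        self.pos += n as u64;
        Ok(n)
    }
}

impl<R: Read + Seek> Seek for PartitionWindow<R> {
    fn seek(&mut self, to: SeekFrom) -> io::Result<u64> {
        let target = match to {
            SeekFrom::Start(o) => Some(o),
            SeekFrom::End(d) => self.len.checked_add_signed(d),
            SeekFrom::Current(d) => self.pos.checked_add_signed(d),
        };
        match target {
            Some(p) => {
                self.pos = p;
                Ok(p)
            }
            None => Err(io::Error::new(io::ErrorKind::InvalidInput, "seek out of range")),
        }
    }
}
```

Wrapping the decoded image in one window per partition lets each filesystem crate's `analyse(&mut window)` run without knowing where its volume sits on the disk.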

## Open decisions (please choose — they affect every crate)

- **Shared finding schema.** Today `Severity`/`Anomaly`/`AnomalyKind` are **copy-pasted** across
  `mbr/gpt/apm-forensic` and now `iso9660-forensic`. For a *uniform* verbose report across 3 partition
  schemes + 6 filesystems + containers, strongly consider extracting a tiny **`forensic-core`** crate
  (or hosting the schema in the existing `forensicnomicon`) defining `Severity`, `Finding`,
  `Evidence`, `TimelineEvent`, `Report`. Otherwise `disk-forensic` must normalize N bespoke
  `XxxAnalysis` types. (Recommendation: shared crate.)
- **Who owns container detection:** `disk-forensic` directly, or a new thin `disk4n6` binary crate
  that composes `open_any` + `disk-forensic` + the filesystem analyzers.
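
Whichever crate ends up owning detection, the magic-sniffing half of it is small. The sketch below covers only formats with a leading signature (EWF, QCOW2, VHDX, sparse VMDK); the `Container` enum and `sniff_container` are hypothetical names. Formats identified by a trailer or by extension alone fall through to `Unknown` here and would need further checks.

```rust
use std::io::{self, Read, Seek, SeekFrom};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Container { Ewf, Qcow2, Vhdx, VmdkSparse, Unknown }

/// Classify a stream by its leading magic bytes, then rewind so the
/// caller can hand the same stream to the matching container reader.
pub fn sniff_container<R: Read + Seek>(src: &mut R) -> io::Result<Container> {
    let mut head = [0u8; 8];
    src.seek(SeekFrom::Start(0))?;
    // Fill as much of `head` as the stream provides; short files are allowed.
    let mut filled = 0;
    while filled < head.len() {
        let n = src.read(&mut head[filled..])?;
        if n == 0 {
            break;
        }
        filled += n;
    }
    src.seek(SeekFrom::Start(0))?;
    let head = &head[..filled];

    let kind = if head.starts_with(b"EVF\x09\x0d\x0a\xff\x00") {
        Container::Ewf
    } else if head.starts_with(b"QFI\xfb") {
        Container::Qcow2
    } else if head.starts_with(b"vhdxfile") {
        Container::Vhdx
    } else if head.starts_with(b"KDMV") {
        Container::VmdkSparse
    } else {
        Container::Unknown
    };
    Ok(kind)
}
```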

## What a "maximally verbose" report should surface (drives `AnomalyKind` growth)

The engine is **redundancy + slack**: enumerate every redundant copy and diff it; carve every
non-file byte. Distinguish three epistemic layers in the output — **observed fact** → **"consistent
with" inference** → leave conclusions to the examiner (never assert intent). Present a **benign and a
suspicious reading** for each finding (most history is innocent).

Per-layer breadcrumbs to mine (each becomes a `Finding`/`AnomalyKind` in the relevant crate):
- **Provenance / tool fingerprint**: PVD identifier strings + versions, padding/fill signatures,
  system-area contents, container/acquisition metadata (examiner, drive, tool version).
- **Cross-redundancy disagreement** (primary tamper detector): both-endian fields, path-table vs
  directory tree, L vs M path table, primary vs Joliet tree, primary vs backup GPT, multisession PVDs.
- **Temporal**: ISO dir time vs Rock Ridge `TF` (7 POSIX times) vs Joliet; epoch leaks, mixed
  timezones (multiple authoring envs), clustering, future dates → merge into a **super-timeline**.
- **Slack & unused space**: file slack (leaked buffer/RAM), unallocated sectors, **post-`volume_space_size`
  appended payload**, raw-sector **EDC/ECC validity** (genuine dump vs synthesized image).
- **Multisession history**: deleted/replaced files recoverable in earlier sessions; per-session
  burn timeline.
- **Identity intel**: Rock Ridge `PX` uid/gid (authoring account) + inode patterns, `SL` symlink
  targets (leak source machine paths), three-name divergence (ISO/Joliet/`NM`), version suffixes.
- **Boot/executables**: El Torito platform IDs + boot-image hash; embedded PE/ELF (hand off to
  `exec-pe-forensic`).
- **Structural attacks**: overlapping/out-of-bounds extents, path traversal in names, cyclic dirs,
  non-zero reserved fields.
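
As one concrete instance of the cross-redundancy idea, ISO 9660 stores each both-endian 32-bit field as 4 little-endian bytes followed by 4 big-endian bytes, so the two halves can be diffed directly. A minimal sketch; the `BothEndian` type and both function names are hypothetical and not the `iso9660-forensic` API:

```rust
/// Outcome of decoding one ISO 9660 both-endian 32-bit field.
#[derive(Debug, PartialEq, Eq)]
pub enum BothEndian {
    Agree(u32),
    Mismatch { le: u32, be: u32 },
}

/// Decode the little-endian half and the big-endian half and compare them.
pub fn decode_both_endian_u32(field: &[u8; 8]) -> BothEndian {
    let le = u32::from_le_bytes([field[0], field[1], field[2], field[3]]);
    let be = u32::from_be_bytes([field[4], field[5], field[6], field[7]]);
    if le == be {
        BothEndian::Agree(le)
    } else {
        BothEndian::Mismatch { le, be }
    }
}

/// Check `volume_space_size`, which sits at byte offset 80 of the
/// Primary Volume Descriptor. Returns `None` if the buffer is too short.
pub fn check_volume_space_size(pvd: &[u8]) -> Option<BothEndian> {
    let bytes = pvd.get(80..88)?;
    let mut field = [0u8; 8];
    field.copy_from_slice(bytes);
    Some(decode_both_endian_u32(&field))
}
```

A `Mismatch` is only the observed fact; whether it reflects a buggy mastering tool or a hand edit is left to the examiner.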

## Conventions (collection-wide)

- **Strict TDD** (separate RED then GREEN commits) and **validate against real data**, not synthetic
  fixtures (generate via `xorriso` / `Aaru` / `hdiutil`, or commit small real images).
- Mirror the existing per-crate `Anomaly { severity, code, kind, note }` schema (derive
  severity/code/note from `kind` so they cannot drift).
- `serde` behind a feature flag on all public output types.
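
Deriving `severity`, `code`, and `note` from `kind` can be sketched as follows. The two variants and their payload fields are illustrative; only the codes and severities (`ISO-BOTH-ENDIAN` High, `ISO-TRAILING-DATA` Medium) come from the list in this document.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity { Info, Low, Medium, High, Critical }

/// Two illustrative variants; the payload fields are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AnomalyKind {
    BothEndianMismatch { byte_offset: u64, le: u32, be: u32 },
    TrailingData { bytes: u64 },
}

impl AnomalyKind {
    pub fn severity(&self) -> Severity {
        match self {
            AnomalyKind::BothEndianMismatch { .. } => Severity::High,
            AnomalyKind::TrailingData { .. } => Severity::Medium,
        }
    }

    pub fn code(&self) -> &'static str {
        match self {
            AnomalyKind::BothEndianMismatch { .. } => "ISO-BOTH-ENDIAN",
            AnomalyKind::TrailingData { .. } => "ISO-TRAILING-DATA",
        }
    }

    pub fn note(&self) -> String {
        match self {
            AnomalyKind::BothEndianMismatch { byte_offset, le, be } => {
                format!("both-endian field at offset {byte_offset}: LE={le} BE={be}")
            }
            AnomalyKind::TrailingData { bytes } => {
                format!("{bytes} bytes present after the declared volume end")
            }
        }
    }
}

pub struct Anomaly {
    pub severity: Severity,
    pub code: &'static str,
    pub kind: AnomalyKind,
    pub note: String,
}

/// The only constructor: every derived field is computed from `kind`.
impl From<AnomalyKind> for Anomaly {
    fn from(kind: AnomalyKind) -> Self {
        Anomaly { severity: kind.severity(), code: kind.code(), note: kind.note(), kind }
    }
}
```

With `From<AnomalyKind>` as the single construction path, adding a variant forces the compiler to demand its severity, code, and note in the same change.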

## Status of the `iso9660-forensic` side

**The analyzer data structure is READY to consume — there is NO analysis CLI in
`iso9660-forensic` by design.** That crate is a library returning `IsoAnalysis`; `disk-forensic`
(disk4n6) owns the CLI and renders the report. Consume the typed struct directly:

```rust
let analysis: iso9660_forensic::IsoAnalysis = iso9660_forensic::analyse(&mut reader)?;
// IsoAnalysis { volume: IsoVolumeInfo, anomalies: Vec<findings::Anomaly> } + .max_severity()
// IsoVolumeInfo: volume_label, system_id, volume_set_id, publisher_id,
//   data_preparer_id (mastering-tool fingerprint), application_id, creation_time,
//   modification_time, sector_mode, session_count, has_rock_ridge, has_joliet,
//   has_enhanced_volume_descriptor.
// Anomaly { severity: Severity{Info,Low,Medium,High,Critical}, code: &'static str,
//           kind: AnomalyKind, note: String }  -- group by severity, dedupe by code.
// All public; every type derives serde::Serialize behind the `serde` feature
// (Serialize only -- you hold the struct from analyse(), no round-trip needed).
```
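
One possible shape for "group by severity, dedupe by code" on the renderer side, using local stand-in types rather than the real `iso9660_forensic` ones (`group_and_dedupe` is a hypothetical helper):

```rust
use std::cmp::Reverse;
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity { Info, Low, Medium, High, Critical }

pub struct Anomaly {
    pub severity: Severity,
    pub code: &'static str,
    pub note: String,
}

/// Bucket findings by severity, most severe first, keeping each code once per bucket.
/// Codes within a bucket come out in alphabetical order.
pub fn group_and_dedupe(anomalies: &[Anomaly]) -> Vec<(Severity, Vec<&'static str>)> {
    let mut buckets: BTreeMap<Reverse<Severity>, BTreeSet<&'static str>> = BTreeMap::new();
    for a in anomalies {
        buckets.entry(Reverse(a.severity)).or_default().insert(a.code);
    }
    buckets
        .into_iter()
        .map(|(Reverse(sev), codes)| (sev, codes.into_iter().collect()))
        .collect()
}
```

A code that can fire at two severities (for example `ISO-PATHTABLE-DIVERGENCE`) appears once in each bucket it reaches.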

- DONE: `analyse()` shipped on `main` with **23 finding codes**, each deriving severity/code/note
  from its `kind` (same model as gpt/mbr/apm-forensic) so your renderer treats every layer's
  findings uniformly. Group by severity, dedupe by code:
  - **Cross-redundancy (tamper):** ISO-BOTH-ENDIAN (High), ISO-PATHTABLE-ENDIAN (High, L↔M table),
    ISO-PATHTABLE-DIVERGENCE (Medium phantom / High ghost, table↔tree), ISO-TREE-DIVERGENCE (High,
    primary↔Joliet).
  - **Slack & appended:** ISO-SLACK-DATA (Low), ISO-TRAILING-DATA (Medium), ISO-PRESYS-DATA
    (Low/Medium), ISO-RESERVED-DATA (Low, non-zero PVD reserved fields).
  - **Structural:** ISO-OOB-EXTENT (High), ISO-OVERLAP-EXTENT (High), ISO-DIR-CYCLE (High),
    ISO-ORPHAN-FILE (Medium), ISO-SYMLINK (Low absolute / High traversal).
  - **Temporal:** ISO-TIME-AFTER-VOL (Medium), ISO-MIXED-TZ (Low), ISO-TIME-IMPLAUSIBLE (Medium,
    < 1985 or > 2100).
  - **History:** ISO-SUPERSEDED-FILE (Medium, recoverable deleted/replaced content across sessions).
  - **Concealment / authenticity:** ISO-NAME-DIVERGENCE (High, the Rock Ridge and Joliet long names
    disagree for the same file, i.e. OS-specific filename concealment), ISO-DISGUISED-EXEC (High, a
    document/media-extension file whose content is a PE/ELF/Mach-O executable — hand the executable
    itself to a PE/ELF analyzer), ISO-EDC-INVALID / ISO-ECC-INVALID (Medium, raw 2352 Mode-1 sectors
    with invalid/zero EDC or Reed-Solomon P/Q ECC — a synthesized/repackaged image rather than a
    faithful drive dump, or tampered data; ECC additionally catches tampering where EDC was
    recomputed but ECC was not).
  - **Temporal (per-file):** ISO-TIME-MISMATCH (Medium, ISO directory recorded time vs Rock Ridge TF
    modify time disagree — an edited stamp).
  - **Authoring oddity:** ISO-FILE-VERSION (Low, a name version suffix other than ;1).
  - Bonus public helpers: `iso9660_forensic::sector::cd_edc(&[u8]) -> u32` and
    `mode1_ecc_valid(&[u8]) -> bool` / `cd_ecc_stamp(&mut [u8])` (CD-ROM Mode-1 EDC + Reed-Solomon
    P/Q ECC) for verifying dump authenticity directly.
- DONE: `IsoVolumeInfo` provenance now also carries `boot_entries: Vec<BootRecord { platform,
  bootable, load_lba, sectors, sha256 }>` (El Torito — BIOS/UEFI boot capability, boot-image LBA,
  and the boot image's SHA-256 for matching known-malicious images), `rock_ridge_uids` /
  `rock_ridge_gids` / `rock_ridge_inodes` (PX authoring-account + inode identity), and
  `earliest_file_time` / `latest_file_time` (the authoring-time window — feed these into your
  super-timeline). `BootRecord` is re-exported from the crate root.
- DONE: `analyse()` and `IsoReader::walk()` are crash-resistant on corrupt/truncated images —
  out-of-bounds extents (file *and* directory) and directory cycles are surveyed and reported as
  findings rather than erroring out (EOF-tolerant audits + cycle-safe traversal).
- DONE: `iso9660_forensic::open(path) -> Result<Box<dyn ReadSeek>, IsoError>` is now public — it
  resolves a raw `.iso` or a `.cue`/`.ccd`/`.nrg`/`.mds`/`.toc` container to a `Read + Seek` over the
  ISO 9660 data track. Wire it straight into your `open_any` for the optical branch; the returned
  `Box<dyn ReadSeek>` already implements `Read + Seek`, so feed it to `analyse(&mut src)` or
  `IsoReader::open(src)` directly. `ReadSeek` is also re-exported if you want the trait.
- NOTE: the standalone `iso9660-cli` binary has been **removed**; `iso9660-forensic` is now
  library-only and `disk4n6` is the single CLI for the whole collection. Nothing was lost; the only
  library-worthy capability (container resolution) is the `open()` above.
- Boundary: `iso9660-forensic` reads **ISO 9660 + optical layers only**; UDF/HFS+/APM are their own
  sibling crates you compose, each exposing (or to expose) the same `analyse()->Analysis` shape.
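
To show how the `earliest_file_time` / `latest_file_time` window might feed the super-timeline and the acquisition cross-check, here is a rough sketch. Everything in it is an assumption: the `TimelineEvent` fields, the `"container"` layer label, and Unix-second timestamps are placeholders until the shared schema is decided.

```rust
/// One dated observation from any layer. Times are Unix seconds (assumed for this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TimelineEvent {
    pub unix_time: i64,
    pub layer: &'static str,
    pub what: String,
}

/// Merge per-layer event lists into one chronologically ordered super-timeline.
pub fn merge_timeline(layers: Vec<Vec<TimelineEvent>>) -> Vec<TimelineEvent> {
    let mut all: Vec<TimelineEvent> = layers.into_iter().flatten().collect();
    // Stable sort: events with equal timestamps keep their layer order.
    all.sort_by_key(|e| e.unix_time);
    all
}

/// Non-container events stamped after the acquisition time, i.e. content
/// that appears newer than the image that holds it.
pub fn newer_than_acquisition(timeline: &[TimelineEvent], acquired_at: i64) -> Vec<&TimelineEvent> {
    timeline
        .iter()
        .filter(|e| e.layer != "container" && e.unix_time > acquired_at)
        .collect()
}
```

Each hit is only an observed fact: a skewed authoring clock is the benign reading, post-acquisition modification the suspicious one.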