vigb-decoder 0.1.0

Decoder for PaperPort 2 .max (ViGBe) image scans
Documentation
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.1.0] — 2026-06-09

### Added
- First public release.
- Library crate `vigb_decoder` with `decode_max`, `decode_max_file`,
  `write_pdf`, `Config`, `Page`, `MaxError`. `Config`, `Page`, and
  `Preview` are `#[non_exhaustive]`; construct `Config` via
  `Config::default()` / `Config::builder()` and obtain `Page` /
  `Preview` from the decoder functions.
- `vigb-max2pdf` binary with CLI flag parity vs the Python reference
  decoder (canonical fixes ON by default, diagnostic flags opt-in).
  Rust-only additions: `--stats` and `--max-pages`.
- Pure-Python sibling implementation
  (`python-reference/vigb_max2pdf.py`) — same algorithm, same
  canonical bit-perfect output, same CLI flags.
- Per-line CCITT-T.6 decoder with the bug4 canonical reference-table
  walk and lazy-bit-loading toggle.
- Preview thumbnail decoder (102×146 RLE → upscaled 1-bit), off by
  default; `--preview` appends the thumbnail as an extra PDF page per
  source page (fallback when the main CCITT decode fails on
  hand-drawn content or stamps).
- Hand-written PDF writer (no PDF crate dependency).
- Smart-resync state machine (`fail_resync_max` / `lookahead` /
  `min_confidence` / `budget`).
- CCITT-T.6 lookup tables derived from CCITT T.6 (1988) +
  TIFF 6.0 (Aldus, 1992) — clean-room from public standards.
- Local-only corpus regression test (`cargo test --features corpus`)
  that pixel-compares the Rust decoder's output against reference
  PDFs produced by `python-reference/vigb_max2pdf.py` over a private
  archive.

### Security
- Hardening against adversarial `.max` input, designed in from the
  start:
  - Image chunks shorter than `IMAGE_CHUNK_MIN_LEN` (`0x42`) are
    rejected at discovery time; preview decoding bails safely when
    `preview_size > chunk_length` would otherwise underflow
    (CRIT-01, SEC-H01).
  - `width × height` capped at `MAX_IMAGE_PIXELS` (200 MP) and
    `padded_x × preview_y` at `MAX_PREVIEW_PIXELS` (16 MP);
    `checked_mul` on the bitmap byte-count for 32-bit safety.
    Returns `MaxError::ImageTooLarge` instead of allocating ~537 MB /
    ~4 GB from a 64-byte malicious header (CRIT-02, SEC-H02).
  - `Config::fail_resync_max`, `fail_resync_lookahead`, and
    `fail_resync_budget` clamped at use site to safe upper bounds;
    `fail_resync_budget == 0` means "use the default cap of 1024"
    (SEC-M02; the Python reference keeps 0 = unlimited).
  - Per-chunk dispatcher bounded to `chunk_start + chunk_length`,
    closing a quadratic-CPU vector via crafted files packing many
    minimum-size image chunks back-to-back. Mirrored in the Python
    reference (SEC-M03).
  - `Config::max_pages` (default 1024, CLI `--max-pages`) caps the
    per-file image-chunk count, bounding resident memory on crafted
    many-chunk files; `MaxError::TooManyPages` (SEC-M04, Rust-only).
- The CLI's output-path trust model (it honours `-o` verbatim,
  including `..` traversal) is documented in `docs/cli.md` together
  with a canonicalize-and-contain recipe for service operators
  wrapping the binary on untrusted input.

### Verified
- Bit-perfect against PaperPort 3.6 (run under `otvdm`) on the
  author's 159-page private test corpus (median IoU = 1.000);
  pixel-identical to the Python reference, n_fail=0 across the board.
- ~4× faster end-to-end than the Python reference (38 ms/page vs
  151 ms/page on 2464×3508 scans).

[Unreleased]: https://github.com/christian-regg/vigb-decoder/compare/v0.1.0...HEAD
[0.1.0]: https://github.com/christian-regg/vigb-decoder/releases/tag/v0.1.0