vigb-decoder 0.1.0

Decoder for PaperPort 2 .max (ViGBe) image scans
vigb-decoder

Decoder for PaperPort 2 (.max) image scans.

Why this exists

The PaperPort 2 file format ("ViGBe") is dead. Until now, the only way to open these files was PaperPort 3.6, a 1996 Windows app that doesn't run natively on modern Windows. No current PaperPort version (Tungsten / Kofax / Nuance) reads the format; see Nuance KB 1473.

vigb-decoder is the first known tool that decodes PaperPort 2-era files. The closest existing OSS projects (paperman and max2pdf) explicitly do not support this format.

Install

cargo install vigb-decoder

This installs the vigb-max2pdf binary in ~/.cargo/bin/.

Pre-built binaries for Linux x86_64, Windows x86_64, and macOS aarch64 are attached to each release.

Use

Convert a single file:

vigb-max2pdf scan.max

Convert a batch into a directory:

vigb-max2pdf -o out/ *.max

Print per-file decode stats:

vigb-max2pdf --stats scan.max

Each .max page also has an embedded 102×146 grayscale preview thumbnail. By default the converter ignores it (the main bit-perfect image is what you want). Pass --preview to append the thumbnail as an extra PDF page per source page — useful as a fallback when the main decode fails on hand-drawn content or stamps:

vigb-max2pdf --preview scan.max

See docs/cli.md for the full flag list.

Pure-Python alternative

If you can't install Rust, a pure-Python sibling implementation lives at python-reference/vigb_max2pdf.py. Same algorithm, same canonical bit-perfect output, ~4× slower. Same CLI flags. Same MIT/Apache-2.0 license.

python python-reference/vigb_max2pdf.py scan.max -o out/

Library use

use vigb_decoder::{decode_max_file, write_pdf, Config, MaxError};
use std::path::Path;

fn main() -> Result<(), MaxError> {
    // Decode every page of the scan with the default settings.
    let pages = decode_max_file("scan.max", &Config::default())?;
    // Write the decoded pages out as a single PDF.
    write_pdf(&pages, Path::new("scan.pdf"))?;
    Ok(())
}

Page::bitmap is 1-bit packed, MSB-first per byte. Bit value 1 means BLACK (matches the PDF /Indexed [/DeviceGray 1 <FF 00>] convention).
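
As a rough illustration of the layout described above, the sketch below reads one pixel from an MSB-first packed buffer. It works on a plain byte slice rather than the crate's Page type. The function name is_black, the width parameter, and the assumption that each row is padded to a whole number of bytes are hypothetical and may not match the crate's actual row stride.

```rust
/// Returns true when the pixel at (x, y) is black (bit value 1).
///
/// Assumption (not confirmed by the crate docs): every row starts on a
/// byte boundary, so the row stride is ceil(width / 8) bytes.
fn is_black(bitmap: &[u8], width: usize, x: usize, y: usize) -> bool {
    assert!(x < width, "x is outside the row");
    let stride = (width + 7) / 8;
    let byte = bitmap[y * stride + x / 8];
    // MSB-first: pixel 0 of each byte lives in bit 7, pixel 7 in bit 0.
    (byte >> (7 - (x % 8))) & 1 == 1
}
```

For a 10-pixel-wide image each row occupies two bytes under this assumption, so pixel (9, 0) is bit 6 of the second byte.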

Status

Bit-perfect against the canonical PaperPort 3.6 reference on every file we have ground truth for. Median IoU = 1.000 across a 159-page test corpus (private — the test corpus is the author's personal document archive).
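
The README does not say how the IoU figure was computed. One common definition for bilevel images is the number of pixels that are black in both images divided by the number that are black in either. The sketch below applies that definition to two packed 1-bit buffers; the function name iou and the choice to return 1.0 for two all-white images are assumptions made for illustration.

```rust
/// Intersection-over-union of the black (1) bits in two packed 1-bit bitmaps.
///
/// Both buffers must have the same dimensions and the same packing, and any
/// padding bits at the end of a row must be identical (for example, all zero),
/// otherwise they are counted as pixels.
fn iou(a: &[u8], b: &[u8]) -> f64 {
    assert_eq!(a.len(), b.len(), "bitmaps must have the same packed size");
    let mut both: u64 = 0; // bits set in a AND b
    let mut either: u64 = 0; // bits set in a OR b
    for (&x, &y) in a.iter().zip(b) {
        both += u64::from((x & y).count_ones());
        either += u64::from((x | y).count_ones());
    }
    if either == 0 {
        // Two all-white images: treated as a perfect match by convention here.
        1.0
    } else {
        both as f64 / either as f64
    }
}
```

Under this definition an IoU of exactly 1.0 means the two bitmaps have identical black pixels, which is consistent with the bit-perfect claim.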

Format reverse-engineering

See docs/format.md for the file structure and docs/decoder.md for the canonical decoder behaviour (including the four canonical fixes the decoder implements).

Reverse-engineering legal basis

This decoder was reverse-engineered for interoperability under:

The decoder ships zero bytes from PaperPort. CCITT-T.6 lookup tables are derived from the CCITT T.6 Recommendation (1988, a public standard) cross-checked against the TIFF 6.0 Specification (1992, public domain); format dispatch logic was developed against bit-traces of the author's own .max files cross-checked against the disassembly of ScanSoft's MAXKER2.DLL (extracted from the publicly distributed Visioneer 5.2 installer ISO, archive.org, 2020).

See docs/provenance.md for component-level clean-room separation notes.

Credits

  • PaperPort 3.6 (ScanSoft, 1996) — bridge that made the RE possible.
  • CCITT T.6 (1988) + TIFF 6.0 (Aldus, 1992) — source for CCITT Group 4 table values.
  • paperman (Simon Glass) and max2pdf (orangeturtle739) — prior-art OSS projects, used as cross-checks during RE. Both GPL-2-or-later; no code is copied from either project.
  • otvdm (otya128) — runs the PP3.6 bridge under modern Windows.

License

Licensed under either of:

  • Apache License, Version 2.0
  • MIT license

at your option.

PaperPort is a trademark of its respective owner (Tungsten Automation, formerly Kofax / Nuance / ScanSoft). This project is independent and not affiliated with or endorsed by the trademark holder.