vigb-decoder
Decoder for PaperPort 2 (.max) image scans.
Why this exists
The PaperPort 2 file format ("ViGBe") is dead. The only way to open these files used to be PaperPort 3.6 — a 1996 Windows app that doesn't run natively on modern Windows and isn't supported by any current PaperPort version (Tungsten / Kofax / Nuance — see Nuance KB 1473).
vigb-decoder is the first known tool that decodes PaperPort 2-era
files. The closest existing OSS projects (paperman
and max2pdf) explicitly do
not support this format.
Install
cargo install vigb-decoder
This installs the vigb-max2pdf binary in ~/.cargo/bin/.
Pre-built binaries for Linux x86_64, Windows x86_64, and macOS aarch64 are attached to each release.
Use
Convert a single file:
vigb-max2pdf scan.max
Convert a batch into a directory:
vigb-max2pdf -o out/ *.max
Print per-file decode stats:
vigb-max2pdf --stats scan.max
Each .max page also has an embedded 102×146 grayscale preview
thumbnail. By default the converter ignores it (the main bit-perfect
image is what you want). Pass --preview to append the thumbnail as
an extra PDF page per source page — useful as a fallback when the
main decode fails on hand-drawn content or stamps:
vigb-max2pdf --preview scan.max
See docs/cli.md for the full flag list.
Pure-Python alternative
If you can't install Rust, a pure-Python sibling implementation lives
at python-reference/vigb_max2pdf.py.
Same algorithm, same canonical bit-perfect output, ~4× slower. Same
CLI flags. Same MIT/Apache-2.0 license.
python python-reference/vigb_max2pdf.py scan.max -o out/
Library use
use ;
use Path;
Page::bitmap is 1-bit packed, MSB-first per byte. Bit value 1 means
BLACK (matches the PDF /Indexed [/DeviceGray 1 <FF 00>] convention).
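As an illustration of that bit layout, here is a minimal sketch of reading pixels out of a packed 1-bit buffer. The helper names (`row_stride`, `is_black`, `to_gray8`) are hypothetical and not part of the crate's API, and the sketch assumes each row is padded to a whole number of bytes; check docs/format.md for the actual row layout before relying on it.

```rust
/// Bytes per row, ASSUMING every row is padded up to a whole byte.
/// (The padding rule is an assumption of this sketch, not taken from the crate.)
pub fn row_stride(width: usize) -> usize {
    (width + 7) / 8
}

/// Returns true if the pixel at (x, y) is black (bit value 1).
pub fn is_black(bitmap: &[u8], width: usize, x: usize, y: usize) -> bool {
    let byte = bitmap[y * row_stride(width) + x / 8];
    // MSB-first: the leftmost pixel of each byte is stored in bit 7.
    (byte >> (7 - (x % 8))) & 1 == 1
}

/// Expands the packed bitmap to 8-bit gray, one byte per pixel
/// (0 = black, 255 = white), dropping any row padding bits.
pub fn to_gray8(bitmap: &[u8], width: usize, height: usize) -> Vec<u8> {
    let mut out = Vec::with_capacity(width * height);
    for y in 0..height {
        for x in 0..width {
            out.push(if is_black(bitmap, width, x, y) { 0 } else { 255 });
        }
    }
    out
}
```

With a 10-pixel-wide row stored as the bytes `0x81 0x40`, pixels 0, 7 and 9 are black and the rest are white.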
Status
Bit-perfect against the canonical PaperPort 3.6 reference on every file we have ground truth for. Median IoU = 1.000 across a 159-page test corpus (private — the test corpus is the author's personal document archive).
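For readers unfamiliar with the metric, the following is a rough sketch of how a per-page IoU (intersection over union) and its median could be computed on packed 1-bit bitmaps. It is not the project's test harness. It assumes IoU is taken over black pixels, that both bitmaps have identical dimensions, and that any row padding bits are zero in both. The function names are hypothetical.

```rust
/// IoU of the black pixels in two equally sized packed 1-bit bitmaps.
/// Two all-white pages have an empty union; this sketch scores them 1.0.
pub fn iou(a: &[u8], b: &[u8]) -> f64 {
    assert_eq!(a.len(), b.len(), "bitmaps must have the same packed size");
    let mut inter_bits: u64 = 0;
    let mut union_bits: u64 = 0;
    for (&x, &y) in a.iter().zip(b.iter()) {
        // Bit value 1 means black, so AND/OR plus popcount counts black pixels.
        inter_bits += u64::from((x & y).count_ones());
        union_bits += u64::from((x | y).count_ones());
    }
    if union_bits == 0 {
        1.0
    } else {
        inter_bits as f64 / union_bits as f64
    }
}

/// Median of a non-empty slice of scores (sorts the slice in place).
pub fn median(values: &mut [f64]) -> f64 {
    assert!(!values.is_empty(), "median of an empty set is undefined");
    values.sort_by(|a, b| a.total_cmp(b));
    let mid = values.len() / 2;
    if values.len() % 2 == 1 {
        values[mid]
    } else {
        (values[mid - 1] + values[mid]) / 2.0
    }
}
```

A page only scores exactly 1.0 when every black pixel matches the reference. Note that a median of 1.000 by itself means that at least half of the pages match exactly.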
Format reverse-engineering
See docs/format.md for the file structure and
docs/decoder.md for the canonical decoder behaviour
(including the four canonical fixes the decoder implements).
Reverse-engineering legal basis
This decoder was reverse-engineered for interoperability under:
- Switzerland: URG Art. 21 (decompilation for interface info).
- EU: Software Directive 2009/24/EC Art. 6.
- US: DMCA §1201(f) safe harbour and Sega v. Accolade, 977 F.2d 1510 (9th Cir. 1992).
The decoder ships zero bytes from PaperPort. CCITT-T.6 lookup tables
are derived from the CCITT T.6 Recommendation
(1988, a public standard) cross-checked against the TIFF 6.0
Specification (1992, publicly available); format dispatch logic was
developed against bit-traces of the author's own .max files
cross-checked against the disassembly of ScanSoft's MAXKER2.DLL
(extracted from the publicly distributed Visioneer 5.2 installer ISO,
archive.org, 2020).
See docs/provenance.md for component-level
clean-room separation notes.
Credits
- PaperPort 3.6 (ScanSoft, 1996) — bridge that made the RE possible.
- CCITT T.6 (1988) + TIFF 6.0 (Aldus, 1992) — source for CCITT Group 4 table values.
- paperman (Simon Glass) and max2pdf (orangeturtle739) — prior-art OSS projects, used as cross-checks during RE. Both GPL-2-or-later; no code is copied from either project.
- otvdm (otya128) — runs the PP3.6 bridge under modern Windows.
License
Licensed under either of:
- Apache License, Version 2.0
- MIT license
at your option.
PaperPort is a trademark of its respective owner (Tungsten Automation, formerly Kofax / Nuance / ScanSoft). This project is independent and not affiliated with or endorsed by the trademark holder.