SafeBundle generation: OCR + Gaze redact → on-disk artifacts.
The top-level clean function is the public adopter entry point. It
routes any supported input (PNG / JPG / single-page PDF) through OCR,
pipes the extracted text through a gaze::Pipeline, and persists the
result as three files in a target directory:
    out/
      clean.md       # OCR text with PII replaced by reversible tokens
      manifest.json  # gaze::Manifest: restorable, canonical
      report.json    # BundleReport: OCR + PII counts + provenance

The manifest contract is the same one the rest of the gaze runtime
uses (gaze::Manifest). Adopters can pair clean.md with manifest.json
and restore via the standard gaze session APIs.
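As a rough illustration of the on-disk layout described above, the following std-only sketch checks whether a target directory holds all three artifacts. It does not call into the crate: the three constants are local stand-ins whose values are assumed to match the filenames shown in the tree, and `bundle_paths` and `missing_artifacts` are hypothetical helper names, not part of the documented API.

```rust
use std::path::{Path, PathBuf};

// Local stand-ins for the crate's filename constants. The values are an
// assumption taken from the directory tree above, not read from the crate.
const CLEAN_MARKDOWN_FILE: &str = "clean.md";
const MANIFEST_FILE: &str = "manifest.json";
const REPORT_FILE: &str = "report.json";

/// Hypothetical helper: the three artifact paths expected inside `out_dir`.
fn bundle_paths(out_dir: &Path) -> Vec<PathBuf> {
    vec![
        out_dir.join(CLEAN_MARKDOWN_FILE),
        out_dir.join(MANIFEST_FILE),
        out_dir.join(REPORT_FILE),
    ]
}

/// Hypothetical helper: the expected artifacts that are not present as
/// regular files in `out_dir`. An empty result means the layout is complete.
fn missing_artifacts(out_dir: &Path) -> Vec<PathBuf> {
    bundle_paths(out_dir)
        .into_iter()
        .filter(|p| !p.is_file())
        .collect()
}
```

A caller could run `missing_artifacts(Path::new("out"))` after a bundle is written and treat a non-empty result as an incomplete bundle. The check only covers file presence; it says nothing about whether manifest.json is a valid gaze::Manifest.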
Structs
- BundleReport - Bundle audit + provenance report serialized to report.json.
- ClassCount - Per-class PII detection count for BundleReport.
- LayoutSummary - Opaque layout summary placeholder.
- SafeBundle - Post-ingestion artifact paired with a Gaze Manifest.
Constants
- BUNDLE_VERSION - Versioned report.json schema tag (bump on breaking shape changes).
- CLEAN_MARKDOWN_FILE - Bundle filename written into --out for tokenized Markdown.
- MANIFEST_FILE - Bundle filename written into --out for the restorable manifest.
- REPORT_FILE - Bundle filename written into --out for the OCR + PII provenance report.
Functions
- clean (ocr-tesseract) - Top-level entry point: ingest one document, write a SafeBundle to disk.
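The module description says clean accepts PNG, JPG, and single-page PDF input. How the crate decides which kind it has been given is not documented here, so the sketch below is only one plausible pre-check an adopter might run before calling it: a classification by file extension. `InputKind` and `input_kind` are hypothetical names, accepting `.jpeg` alongside `.jpg` is an assumption, and the sketch does not verify that a PDF has a single page.

```rust
use std::path::Path;

/// Hypothetical classification of the input kinds named in the module docs.
#[derive(Debug, PartialEq, Eq)]
enum InputKind {
    Png,
    Jpg,
    Pdf,
}

/// Hypothetical pre-check: map a path to an input kind by its extension,
/// case-insensitively. Returns `None` for missing or unsupported extensions.
/// This inspects the file name only; it never opens the file.
fn input_kind(path: &Path) -> Option<InputKind> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "png" => Some(InputKind::Png),
        "jpg" | "jpeg" => Some(InputKind::Jpg), // ".jpeg" is an assumption
        "pdf" => Some(InputKind::Pdf),
        _ => None,
    }
}
```

Rejecting unsupported extensions up front keeps an obviously wrong file from reaching the OCR stage, but an extension match is not proof of file content, so errors from clean itself still need handling.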