
Module bundle

SafeBundle generation: OCR + Gaze redaction → on-disk artifacts.

The top-level clean function is the public entry point for adopters. It routes any supported input (PNG / JPG / single-page PDF) through OCR, pipes the extracted text through a gaze::Pipeline, and persists the result as three files in a target directory:

out/
  clean.md        # OCR text with PII replaced by reversible tokens
  manifest.json   # gaze::Manifest — restorable, canonical
  report.json     # BundleReport — OCR + PII counts + provenance
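The persistence step at the end of clean can be pictured as writing three already-serialized strings under the output directory. The sketch below assumes the OCR and gaze::Pipeline stages have already produced those strings; write_bundle is a hypothetical helper, and the constants are local copies whose values come from the layout listed above, not imports from this module.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// Local copies for this sketch; values taken from the documented layout.
const CLEAN_MARKDOWN_FILE: &str = "clean.md";
const MANIFEST_FILE: &str = "manifest.json";
const REPORT_FILE: &str = "report.json";

/// Hypothetical helper: persist already-serialized bundle contents into `out`.
/// The real `clean` derives these strings via OCR + gaze::Pipeline; here the
/// caller passes them in directly.
fn write_bundle(
    out: &Path,
    clean_md: &str,
    manifest_json: &str,
    report_json: &str,
) -> io::Result<Vec<PathBuf>> {
    // Create the target directory (and parents) if it does not exist yet.
    fs::create_dir_all(out)?;
    let files = [
        (CLEAN_MARKDOWN_FILE, clean_md),
        (MANIFEST_FILE, manifest_json),
        (REPORT_FILE, report_json),
    ];
    let mut written = Vec::with_capacity(files.len());
    for (name, contents) in files {
        let path = out.join(name);
        // Overwrites any existing file with the same name.
        fs::write(&path, contents)?;
        written.push(path);
    }
    Ok(written)
}
```

This sketch writes each file in place and stops at the first I/O error; whether the real implementation writes atomically or cleans up partial output is not stated here.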

The manifest contract is the same one the rest of the gaze runtime uses (gaze::Manifest). Adopters can pair clean.md with manifest.json and restore the original text via the standard gaze session APIs.
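Conceptually, restoring pairs each reversible token in clean.md with the original value recorded in the manifest. The sketch below is not the gaze session API: it assumes a hypothetical `<CLASS_N>` token format and stands in for gaze::Manifest with a plain HashMap from token to original value, scanning the text in a single pass so that restored values are never re-scanned for tokens.

```rust
use std::collections::HashMap;

/// Hypothetical restore step. `tokens` maps each token (assumed format,
/// e.g. "<EMAIL_1>") to its original value; unknown `<...>` spans are kept.
fn restore(clean_md: &str, tokens: &HashMap<String, String>) -> String {
    let mut out = String::with_capacity(clean_md.len());
    let mut rest = clean_md;
    while let Some(start) = rest.find('<') {
        // Copy the plain text before the candidate token unchanged.
        out.push_str(&rest[..start]);
        let tail = &rest[start..];
        if let Some(end) = tail.find('>') {
            if let Some(original) = tokens.get(&tail[..=end]) {
                out.push_str(original);
                rest = &tail[end + 1..];
                continue;
            }
        }
        // Not a known token: keep the '<' literally and resume scanning after it.
        out.push('<');
        rest = &tail[1..];
    }
    out.push_str(rest);
    out
}
```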

Structs

BundleReport
Bundle audit + provenance report serialized to report.json.
ClassCount
Per-class PII detection count for BundleReport.
LayoutSummary
Opaque layout summary placeholder.
SafeBundle
Post-ingestion artifact paired with a Gaze Manifest.
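BundleReport carries per-class PII counts as ClassCount entries. As a rough illustration of how such counts could be aggregated, the sketch below folds detected class labels into sorted counts. The ClassCount fields here (class, count) and the count_classes function are hypothetical; the real struct's fields are not listed on this page.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of ClassCount; the real field names may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ClassCount {
    class: String,
    count: usize,
}

/// Hypothetical aggregation: one label per detection in, one entry per class out.
/// BTreeMap keeps classes sorted by name, so serialized output stays stable.
fn count_classes<'a, I>(detections: I) -> Vec<ClassCount>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for class in detections {
        *counts.entry(class).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .map(|(class, count)| ClassCount { class: class.to_string(), count })
        .collect()
}
```

Sorting by class name is a design choice in this sketch: it makes report.json diffs reproducible across runs regardless of detection order.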

Constants

BUNDLE_VERSION
Versioned report.json schema tag (bump on breaking shape changes).
CLEAN_MARKDOWN_FILE
Bundle filename written into --out for tokenized Markdown.
MANIFEST_FILE
Bundle filename written into --out for the restorable manifest.
REPORT_FILE
Bundle filename written into --out for the OCR + PII provenance report.
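Because BUNDLE_VERSION is bumped only on breaking shape changes, a consumer reading report.json can gate on an exact match instead of guessing at an unfamiliar layout. The sketch below assumes the tag is a string; its value "safebundle/v1", the SchemaCheck enum, and check_report_version are all hypothetical, since neither the constant's type nor its value is shown on this page.

```rust
/// Hypothetical value and type; the real BUNDLE_VERSION is not shown here.
const BUNDLE_VERSION: &str = "safebundle/v1";

#[derive(Debug, PartialEq, Eq)]
enum SchemaCheck {
    Compatible,
    Mismatch { expected: &'static str, found: String },
}

/// Reject any report whose schema tag differs from the one this reader
/// was built against, since a different tag implies a breaking change.
fn check_report_version(found: &str) -> SchemaCheck {
    if found == BUNDLE_VERSION {
        SchemaCheck::Compatible
    } else {
        SchemaCheck::Mismatch {
            expected: BUNDLE_VERSION,
            found: found.to_string(),
        }
    }
}
```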

Functions

clean (requires crate feature ocr-tesseract)
Top-level entry point: ingest one document, write a SafeBundle to disk.
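Before OCR runs, clean must decide whether an input is one of the supported kinds (PNG / JPG / single-page PDF). The sketch below routes by file extension only; InputKind and classify_input are hypothetical names, and extension-based routing is an assumption (the real function may inspect file contents and does enforce the single-page PDF limit, which this sketch does not check).

```rust
use std::path::Path;

/// Hypothetical classification of the inputs `clean` accepts.
#[derive(Debug, PartialEq, Eq)]
enum InputKind {
    Png,
    Jpeg,
    Pdf, // page count is not checked in this sketch
}

/// Route by case-insensitive file extension; anything else is rejected.
fn classify_input(path: &Path) -> Result<InputKind, String> {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());
    match ext.as_deref() {
        Some("png") => Ok(InputKind::Png),
        Some("jpg") | Some("jpeg") => Ok(InputKind::Jpeg),
        Some("pdf") => Ok(InputKind::Pdf),
        _ => Err(format!("unsupported input: {}", path.display())),
    }
}
```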