
Module redaction

True (destructive) redaction and document sanitization (#231).

Replaces the prior cosmetic redaction (a filled rectangle drawn over content whose underlying bytes survived) with physical content removal and a document-wide sanitization pass, per ISO 32000-1:2008 §12.5.6.23: “shall remove all traces of the specified content … clipping or image masks shall not be used to hide that data.”
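The difference between the two approaches can be illustrated with a toy model. This is a minimal sketch, not this module's implementation: it assumes a content stream with one operator per line, and the names `cosmetic_redact`, `destructive_redact`, and `OVERLAY` are hypothetical. Real content streams need a proper tokenizer and a geometric hit test rather than a substring match.

```rust
// Toy model only: one PDF operator per line. All names are hypothetical
// and do not correspond to this crate's API.

/// An opaque black rectangle: set fill colour, add rectangle path, fill.
const OVERLAY: &str = "0 0 0 rg\n72 700 200 14 re\nf";

/// Cosmetic redaction: paint over the content and leave its bytes in place.
fn cosmetic_redact(stream: &str) -> String {
    format!("{}\n{}", stream, OVERLAY)
}

/// Destructive redaction: drop every text-showing (`Tj`) line that contains
/// `secret`, then draw the overlay where the content used to be.
fn destructive_redact(stream: &str, secret: &str) -> String {
    let kept: Vec<&str> = stream
        .lines()
        .filter(|line| !(line.trim_end().ends_with("Tj") && line.contains(secret)))
        .collect();
    format!("{}\n{}", kept.join("\n"), OVERLAY)
}
```

In this sketch the output of `cosmetic_redact` still contains the secret string, so anyone who extracts text from the stream recovers it. The output of `destructive_redact` does not contain it, and the overlay is the only thing drawn in its place.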

The capability is built incrementally, following the feature plan tracked in https://github.com/yfedoseev/pdf_oxide/issues/231. Each submodule has a single responsibility (SRP). The geometric region model lands first because it is the shared input to every pruner; the pruners (text, image, path, XObject), the font scrubber, the sanitizer, and the orchestrating engine follow as subsequent submodules.
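To make the idea of a region model as shared pruner input concrete, here is a minimal sketch of what such a model could look like. It is an assumption for illustration only: `Rect`, `Regions`, `padded`, and `hits` are hypothetical names, not the types exported from `region`, and the padding value used with it is arbitrary rather than the crate's `DEFAULT_EDGE_PADDING`.

```rust
/// Hypothetical axis-aligned rectangle in page space (PDF user units).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    x0: f64,
    y0: f64,
    x1: f64,
    y1: f64,
}

impl Rect {
    /// Build from any two opposite corners; (x0, y0) becomes the lower-left.
    fn new(xa: f64, ya: f64, xb: f64, yb: f64) -> Self {
        Rect {
            x0: xa.min(xb),
            y0: ya.min(yb),
            x1: xa.max(xb),
            y1: ya.max(yb),
        }
    }

    /// Grow the rectangle by `pad` on every side so that content touching
    /// the edge is still treated as covered.
    fn padded(self, pad: f64) -> Self {
        Rect::new(self.x0 - pad, self.y0 - pad, self.x1 + pad, self.y1 + pad)
    }

    /// True when the two rectangles overlap with non-zero area.
    fn intersects(self, other: Rect) -> bool {
        self.x0 < other.x1 && other.x0 < self.x1 && self.y0 < other.y1 && other.y0 < self.y1
    }
}

/// Hypothetical region set: every pruner asks the same question of it.
struct Regions(Vec<Rect>);

impl Regions {
    /// Does the bounding box of a content mark touch any redaction region?
    fn hits(&self, mark: Rect) -> bool {
        self.0.iter().any(|region| region.intersects(mark))
    }
}
```

With one shared hit test of this kind, a text pruner, an image pruner, and a path pruner can each compute a bounding box for their own kind of content mark and defer the geometric decision to the same place.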

Re-exports

pub use classify::Classification;
pub use engine::redact_content_stream;
pub use engine::FontInfoMetrics;
pub use options::OcgPolicy;
pub use options::RedactionOptions;
pub use options::RedactionReport;
pub use region::RedactionRegion;
pub use region::RegionSet;
pub use region::DEFAULT_EDGE_PADDING;
pub use sanitize::sanitize_catalog;
pub use sanitize::CatalogScrub;
pub use sanitize::SanitizeCounts;

Modules

classify
Classify content marks against redaction regions in page space (#231, T3).
engine
Destructive-redaction orchestration for a single content stream (#231, T11 — the parse → prune → re-serialize → overlay pipeline that turns the pure primitives into a working redaction of real bytes).
font_scrub
Font scrubbing for destructive redaction (#231, T9 — completes guarantee G2: no width/shift side channel).
image_prune
Image-placement redaction planning for destructive redaction (#231, T6 core / G3).
options
Public configuration and reporting types for destructive redaction (#231, feature plan §5.1).
overlay
Redaction overlay content-stream generation (#231, T13 — guarantee G7: an opaque mark is the only thing drawn where content was removed).
path_prune
Vector-path geometry primitives for destructive redaction (#231, T7).
region
Geometric model for redaction regions (ISO 32000-1:2008 §12.5.6.23).
sanitize
Standalone document sanitization (#231, T10): the catalog-scrub decision layer (feature-231 §4.6 / §5.1 sanitize_document).
serialize
Content-stream re-serialization for destructive redaction (#231, T1).
text_engine
Content-stream text-redaction engine (#231, T4 + the text path of T11 — guarantees G1 “no recoverable text” and G2 “no width/shift side channel”).
text_prune
Text-run pruning for destructive redaction (#231, T5 — guarantees G1 “no recoverable text” and G2 “no width/shift side channel”).