True/destructive redaction and document sanitization (#231).
Replaces the prior cosmetic redaction (a filled rectangle drawn over content whose underlying bytes survived) with physical content removal and a document-wide sanitization pass, per ISO 32000-1:2008 §12.5.6.23: “shall remove all traces of the specified content … clipping or image masks shall not be used to hide that data.”
The capability is built incrementally per the feature plan tracked in https://github.com/yfedoseev/pdf_oxide/issues/231. One responsibility per submodule (SRP); the geometric region model lands first and is the shared input to every pruner. The pruners (text/image/path/xobject), the font scrubber, the sanitizer and the orchestrating engine follow as subsequent submodules.
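As a rough illustration of how a single geometric region model can serve as the shared input to every pruner, the sketch below models padded rectangles in page space and one hit test that a text, image, or path pruner could all call. The names `Rect`, `Regions`, and `ASSUMED_PADDING` are hypothetical and are not the crate's `RedactionRegion`, `RegionSet`, or `DEFAULT_EDGE_PADDING`; the padding value is likewise an assumption.

```rust
// Hypothetical sketch: illustrative types only, NOT the crate's
// `RedactionRegion` / `RegionSet` API.

/// Axis-aligned rectangle in page space (PDF user units).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x0: f64,
    pub y0: f64,
    pub x1: f64,
    pub y1: f64,
}

impl Rect {
    /// Build a rectangle, normalising the corners so x0 <= x1 and y0 <= y1.
    pub fn new(xa: f64, ya: f64, xb: f64, yb: f64) -> Self {
        Rect { x0: xa.min(xb), y0: ya.min(yb), x1: xa.max(xb), y1: ya.max(yb) }
    }

    /// Grow the rectangle by `pad` on every side.
    pub fn padded(self, pad: f64) -> Self {
        Rect { x0: self.x0 - pad, y0: self.y0 - pad, x1: self.x1 + pad, y1: self.y1 + pad }
    }

    /// True when the two rectangles share a region of positive area.
    pub fn intersects(&self, other: &Rect) -> bool {
        self.x0 < other.x1 && other.x0 < self.x1 && self.y0 < other.y1 && other.y0 < self.y1
    }
}

/// A set of padded regions that every pruner could query.
pub struct Regions {
    rects: Vec<Rect>,
}

impl Regions {
    /// Assumed padding value; the crate's `DEFAULT_EDGE_PADDING` may differ.
    pub const ASSUMED_PADDING: f64 = 0.5;

    /// Store each input rectangle grown by the assumed edge padding.
    pub fn new(rects: &[Rect]) -> Self {
        Regions { rects: rects.iter().map(|r| r.padded(Self::ASSUMED_PADDING)).collect() }
    }

    /// True when `mark` (a glyph, image, or path bounding box) touches any region.
    pub fn hits(&self, mark: &Rect) -> bool {
        self.rects.iter().any(|r| r.intersects(mark))
    }
}
```

Padding the regions once, at construction, means each pruner asks the same question of the same geometry, so a mark that grazes the edge of a region is treated identically regardless of which pruner inspects it.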
Re-exports
- pub use classify::Classification;
- pub use engine::redact_content_stream;
- pub use engine::FontInfoMetrics;
- pub use options::OcgPolicy;
- pub use options::RedactionOptions;
- pub use options::RedactionReport;
- pub use region::RedactionRegion;
- pub use region::RegionSet;
- pub use region::DEFAULT_EDGE_PADDING;
- pub use sanitize::sanitize_catalog;
- pub use sanitize::CatalogScrub;
- pub use sanitize::SanitizeCounts;
Modules
- classify
- Classify content marks against redaction regions in page space (#231, T3).
- engine
- Destructive-redaction orchestration for a single content stream (#231, T11 — the parse → prune → re-serialize → overlay pipeline that turns the pure primitives into a working redaction of real bytes).
- font_scrub
- Font scrubbing for destructive redaction (#231, T9; completes guarantee G2: no width/shift side channel).
- image_prune
- Image-placement redaction planning for destructive redaction (#231, T6 core / G3).
- options
- Public configuration and reporting types for destructive redaction (#231, feature plan §5.1).
- overlay
- Redaction overlay content-stream generation (#231, T13 — guarantee G7: an opaque mark is the only thing drawn where content was removed).
- path_prune
- Vector-path geometry primitives for destructive redaction (#231, T7).
- region
- Geometric model for redaction regions (ISO 32000-1:2008 §12.5.6.23).
- sanitize
- Standalone document sanitization (#231, T10): the catalog-scrub decision layer (feature-231 §4.6 / §5.1 sanitize_document).
- serialize
- Content-stream re-serialization for destructive redaction (#231, T1).
- text_engine
- Content-stream text-redaction engine (#231, T4 plus the text path of T11; guarantees G1 “no recoverable text” and G2 “no width/shift side channel”).
- text_prune
- Text-run pruning for destructive redaction (#231, T5; guarantees G1 “no recoverable text” and G2 “no width/shift side channel”).
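
The parse → prune → re-serialize → overlay pipeline named in the engine entry can be sketched on a toy model. The code below is a minimal illustration under stated assumptions, not the crate's `redact_content_stream`: `Bbox`, `Op`, `prune`, `serialize`, `overlay`, and `redact` are all hypothetical names, the parse step is represented by an already-parsed operation list, string escaping is omitted, and the toy ignores the effect that removing a text operation has on the position of the text that follows it.

```rust
// Hypothetical sketch: a toy model, NOT the crate's `redact_content_stream`.

/// Rectangle as (x, y, width, height) in page space.
#[derive(Debug, Clone, Copy)]
pub struct Bbox {
    pub x: f64,
    pub y: f64,
    pub w: f64,
    pub h: f64,
}

impl Bbox {
    /// True when the two boxes share a region of positive area.
    fn overlaps(&self, o: &Bbox) -> bool {
        self.x < o.x + o.w && o.x < self.x + self.w
            && self.y < o.y + o.h && o.y < self.y + self.h
    }
}

/// An already-parsed content-stream operation (the parse step is assumed done).
#[derive(Debug, Clone)]
pub enum Op {
    /// A text-showing operation with its computed page-space bounding box.
    ShowText { text: String, bbox: Bbox },
    /// Any other operator, kept verbatim.
    Raw(String),
}

/// Prune: drop every text operation that overlaps the region.
/// Returns the surviving operations and the number removed.
pub fn prune(ops: Vec<Op>, region: &Bbox) -> (Vec<Op>, usize) {
    let before = ops.len();
    let kept: Vec<Op> = ops
        .into_iter()
        .filter(|op| match op {
            Op::ShowText { bbox, .. } => !bbox.overlaps(region),
            Op::Raw(_) => true,
        })
        .collect();
    let removed = before - kept.len();
    (kept, removed)
}

/// Re-serialize the surviving operations (string escaping omitted in this toy).
pub fn serialize(ops: &[Op]) -> String {
    ops.iter()
        .map(|op| match op {
            Op::ShowText { text, .. } => format!("({}) Tj\n", text),
            Op::Raw(s) => format!("{}\n", s),
        })
        .collect()
}

/// Overlay: an opaque black rectangle drawn where content was removed.
pub fn overlay(region: &Bbox) -> String {
    format!("q 0 g {} {} {} {} re f Q\n", region.x, region.y, region.w, region.h)
}

/// Run prune, re-serialize, and overlay in order on the parsed operations.
pub fn redact(ops: Vec<Op>, region: &Bbox) -> (String, usize) {
    let (kept, removed) = prune(ops, region);
    let mut out = serialize(&kept);
    out.push_str(&overlay(region));
    (out, removed)
}
```

The point of the ordering is that the overlay is appended only after the overlapping operations are gone from the serialized output, so the black rectangle marks the removal rather than hiding surviving bytes, which is the distinction the ISO 32000-1 quotation above draws.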