pub struct ArchiveProcessor { /* private fields */ }
Processes archives by sanitizing each contained file and rebuilding the archive with the same format and preserved metadata.
§Usage
use sanitize_engine::processor::archive::{ArchiveProcessor, ArchiveFormat};
use sanitize_engine::processor::registry::ProcessorRegistry;
use sanitize_engine::scanner::{StreamScanner, ScanPattern, ScanConfig};
use sanitize_engine::generator::HmacGenerator;
use sanitize_engine::store::MappingStore;
use sanitize_engine::category::Category;
use std::sync::Arc;
// `generator` rather than `gen`, which is a reserved keyword in the 2024 edition.
let generator = Arc::new(HmacGenerator::new([42u8; 32]));
let store = Arc::new(MappingStore::new(generator, None));
let patterns = vec![
    ScanPattern::from_regex(r"secret\w+", Category::Custom("secret".into()), "secrets").unwrap(),
];
let scanner = Arc::new(
    StreamScanner::new(patterns, Arc::clone(&store), ScanConfig::default()).unwrap(),
);
let registry = Arc::new(ProcessorRegistry::with_builtins());
let archive_proc = ArchiveProcessor::new(registry, scanner, store, vec![]);
Implementations§
impl ArchiveProcessor
pub fn new(
    registry: Arc<ProcessorRegistry>,
    scanner: Arc<StreamScanner>,
    store: Arc<MappingStore>,
    profiles: Vec<FileTypeProfile>,
) -> Self
Create a new archive processor.
§Arguments
registry: structured processor registry.
scanner: streaming scanner used as a fallback.
store: shared mapping store for one-way dedup replacements.
profiles: file-type profiles for structured matching.
pub fn with_max_depth(self, depth: u32) -> Self
Override the maximum nesting depth for recursive archive processing.
The default is DEFAULT_MAX_ARCHIVE_DEPTH (3). Values above 10 are clamped to 10.
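The clamping rule can be illustrated with a minimal sketch. `DepthConfig` and `MAX_ALLOWED_DEPTH` are hypothetical stand-ins invented for this example; only the default of 3 and the limit of 10 come from the text above, and the crate's actual implementation may differ.

```rust
// Hypothetical stand-in for the builder; not the crate's implementation.
const DEFAULT_MAX_ARCHIVE_DEPTH: u32 = 3; // default stated in the docs
const MAX_ALLOWED_DEPTH: u32 = 10; // assumed name for the documented upper limit

struct DepthConfig {
    max_depth: u32,
}

impl DepthConfig {
    fn new() -> Self {
        Self { max_depth: DEFAULT_MAX_ARCHIVE_DEPTH }
    }

    // Consuming builder, mirroring the `self -> Self` shape of `with_max_depth`.
    fn with_max_depth(mut self, depth: u32) -> Self {
        // Values above the limit are clamped rather than rejected.
        self.max_depth = depth.min(MAX_ALLOWED_DEPTH);
        self
    }
}

fn main() {
    println!("{}", DepthConfig::new().max_depth); // 3
    println!("{}", DepthConfig::new().with_max_depth(5).max_depth); // 5
    println!("{}", DepthConfig::new().with_max_depth(50).max_depth); // 10
}
```

Because the builder consumes `self`, calls of this shape can be chained directly after the constructor.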
pub fn with_parallel_threshold(self, threshold: usize) -> Self
Override the minimum entry count required to enable parallel
entry sanitization. Set to usize::MAX to disable parallelism
entirely for this processor instance (e.g. when outer file-level
parallelism is already saturating the thread budget).
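As a rough illustration of how such a threshold might gate parallel work, the sketch below uses a hypothetical helper, `should_parallelize`. It assumes that "minimum entry count" means an inclusive comparison, which the text above does not spell out.

```rust
// Hypothetical helper, not the crate's code: parallelize only when the entry
// count reaches the configured threshold (inclusive comparison is an assumption).
fn should_parallelize(entry_count: usize, threshold: usize) -> bool {
    entry_count >= threshold
}

fn main() {
    let threshold = 8; // illustrative value; the crate's default is not stated here
    println!("{}", should_parallelize(3, threshold)); // false
    println!("{}", should_parallelize(8, threshold)); // true
    // With usize::MAX no realistic archive reaches the threshold.
    println!("{}", should_parallelize(1_000_000, usize::MAX)); // false
}
```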
pub fn with_progress_callback(
    self,
    callback: Arc<dyn Fn(&ArchiveProgress) + Send + Sync>,
) -> Self
Register a per-entry archive progress callback.
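A callback of this shape has to be `Send + Sync`, so any state it updates needs to be shareable across threads. The sketch below shows one way to build such a callback with an atomic counter. `Progress`, `run_entries`, and `tracking_callback` are hypothetical stand-ins; the fields of the real `ArchiveProgress` are not shown on this page.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical stand-in for `ArchiveProgress`; the real struct's fields are unknown here.
struct Progress {
    entries_done: usize,
}

// Same shape as the callback type accepted by `with_progress_callback`.
type ProgressCallback = Arc<dyn Fn(&Progress) + Send + Sync>;

// Simulates a processor invoking the callback once per entry.
fn run_entries(total: usize, callback: ProgressCallback) {
    for done in 1..=total {
        callback(&Progress { entries_done: done });
    }
}

// Builds a callback that records the latest progress value in shared state.
fn tracking_callback(latest: Arc<AtomicUsize>) -> ProgressCallback {
    Arc::new(move |p: &Progress| {
        latest.store(p.entries_done, Ordering::SeqCst);
    })
}

fn main() {
    let latest = Arc::new(AtomicUsize::new(0));
    run_entries(4, tracking_callback(Arc::clone(&latest)));
    println!("entries processed: {}", latest.load(Ordering::SeqCst));
}
```

An atomic is used instead of a `RefCell` or plain counter because the closure must satisfy the `Sync` bound.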
pub fn with_filter(self, filter: ArchiveFilter) -> Self
Apply an ArchiveFilter that controls which file entries are
included in the output archive.
Entries that do not pass the filter are removed from the output entirely. Directory / symlink entries are never filtered.
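The filtering rule described above (files are tested, directories and symlinks always pass) can be sketched as follows. `SuffixFilter`, `Entry`, and `EntryKind` are hypothetical types made up for illustration; the real `ArchiveFilter` API is not documented on this page and may select entries by entirely different criteria.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum EntryKind {
    File,
    Directory,
    Symlink,
}

struct Entry {
    path: &'static str,
    kind: EntryKind,
}

// Hypothetical filter: keep only files whose path ends with an allowed suffix.
struct SuffixFilter {
    allowed: Vec<&'static str>,
}

impl SuffixFilter {
    fn keeps(&self, entry: &Entry) -> bool {
        match entry.kind {
            // Directory and symlink entries are never filtered, per the docs.
            EntryKind::Directory | EntryKind::Symlink => true,
            EntryKind::File => self.allowed.iter().any(|s| entry.path.ends_with(s)),
        }
    }
}

// Returns the paths that would survive into the output archive.
fn kept_paths(entries: &[Entry], f: &SuffixFilter) -> Vec<&'static str> {
    entries.iter().filter(|e| f.keeps(e)).map(|e| e.path).collect()
}

fn main() {
    let entries = [
        Entry { path: "logs/", kind: EntryKind::Directory },
        Entry { path: "logs/app.log", kind: EntryKind::File },
        Entry { path: "logs/core.bin", kind: EntryKind::File },
        Entry { path: "latest", kind: EntryKind::Symlink },
    ];
    let filter = SuffixFilter { allowed: vec![".log"] };
    println!("{:?}", kept_paths(&entries, &filter));
}
```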
pub fn with_force_text(self, force_text: bool) -> Self
When set, bypass all structured processors and use only the streaming scanner for every archive entry.
Trades format preservation for maximum sanitization coverage. Useful when the user is uncertain about field rules or wants a belt-and-suspenders guarantee that every byte is scanned.
pub fn discover_profiles_tar<R: Read>(&self, reader: R) -> Result<()>
Run the structured processor on every profile-matched entry in a
.tar archive, recording replacements into the store. Output is
discarded; the archive is not modified.
pub fn discover_profiles_tar_gz<R: Read>(&self, reader: R) -> Result<()>
Run the structured processor on every profile-matched entry in a
.tar.gz archive, recording replacements into the store. Output is
discarded; the archive is not modified.
pub fn discover_profiles_zip<R: Read + Seek>(&self, reader: R) -> Result<()>
Run the structured processor on every profile-matched entry in a
.zip archive, recording replacements into the store. Output is
discarded; the archive is not modified.
pub fn process_tar<R: Read, W: Write>(
    &self,
    reader: R,
    writer: W,
) -> Result<ArchiveStats>
Process a .tar archive, sanitizing each file entry and
rebuilding the archive with preserved metadata.
Entries that are not regular files (directories, symlinks, etc.) are copied through unchanged.
§Errors
Returns SanitizeError::ArchiveError on I/O failures, or
SanitizeError::RecursionDepthExceeded when nested archives exceed the maximum nesting depth.
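One plausible reading of the recursion error is a depth check performed before descending into each nested archive. The sketch below models that with hypothetical types (`SketchError`, `Nested`); it assumes the outermost archive counts as depth 1, which the docs do not state, and it is not the crate's actual logic.

```rust
// Hypothetical error type standing in for the recursion variant of `SanitizeError`.
#[derive(Debug, PartialEq)]
enum SketchError {
    RecursionDepthExceeded { depth: u32, max: u32 },
}

// Hypothetical model: an archive that may contain further archives.
struct Nested {
    children: Vec<Nested>,
}

// Counts archives visited, failing as soon as the depth limit is exceeded.
fn count_archives(archive: &Nested, depth: u32, max: u32) -> Result<u32, SketchError> {
    if depth > max {
        return Err(SketchError::RecursionDepthExceeded { depth, max });
    }
    let mut total = 1;
    for child in &archive.children {
        total += count_archives(child, depth + 1, max)?;
    }
    Ok(total)
}

// Builds a chain of `levels` archives, each nested inside the next.
fn chain(levels: u32) -> Nested {
    let mut node = Nested { children: Vec::new() };
    for _ in 1..levels {
        node = Nested { children: vec![node] };
    }
    node
}

fn main() {
    println!("{:?}", count_archives(&chain(3), 1, 3)); // Ok(3)
    println!("{:?}", count_archives(&chain(4), 1, 3)); // Err(RecursionDepthExceeded { depth: 4, max: 3 })
}
```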
pub fn process_tar_gz<R: Read, W: Write>(
    &self,
    reader: R,
    writer: W,
) -> Result<ArchiveStats>
Process a .tar.gz archive (gzip-compressed tar).
Decompresses on the fly, processes each entry, and recompresses the output.
§Errors
Returns SanitizeError::ArchiveError on I/O failures, or
SanitizeError::RecursionDepthExceeded when nested archives exceed the maximum nesting depth.
pub fn process_zip<R: Read + Seek, W: Write + Seek>(
    &self,
    reader: R,
    writer: W,
) -> Result<ArchiveStats>
Process a .zip archive, sanitizing each file entry and
rebuilding the archive with preserved metadata.
§Type Bounds
Zip requires seekable I/O for both reading and writing.
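To see what the `Read + Seek` and `Write + Seek` bounds allow, the sketch below uses only the standard library. `copy_seekable` is a hypothetical function with the same bounds as `process_zip`; it copies bytes rather than sanitizing anything. In-memory `Cursor` buffers satisfy both bounds, as does `std::fs::File`.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom, Write};

// Hypothetical stand-in with the same bounds as `process_zip`.
// Returns (input length found by seeking, bytes copied).
fn copy_seekable<R: Read + Seek, W: Write + Seek>(
    mut reader: R,
    mut writer: W,
) -> io::Result<(u64, u64)> {
    // A zip's central directory sits at the end of the file, so readers
    // need to jump there first; here we only measure the length.
    let total_len = reader.seek(SeekFrom::End(0))?;
    reader.seek(SeekFrom::Start(0))?;
    let copied = io::copy(&mut reader, &mut writer)?;
    // A seekable writer can move back over bytes it has already written.
    writer.seek(SeekFrom::Start(0))?;
    Ok((total_len, copied))
}

fn main() {
    let input = Cursor::new(b"placeholder bytes, not a real archive".to_vec());
    let mut output = Cursor::new(Vec::<u8>::new());
    let (len, copied) = copy_seekable(input, &mut output).unwrap();
    println!("input length {}, copied {}", len, copied);
}
```

A pipe or standard input does not implement `Seek`, so zip data arriving that way has to be buffered (for example into a `Cursor<Vec<u8>>`) before it can satisfy these bounds.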
§Errors
Returns SanitizeError::ArchiveError on I/O failures, or
SanitizeError::RecursionDepthExceeded when nested archives exceed the maximum nesting depth.
pub fn process<R: Read + Seek, W: Write + Seek>(
    &self,
    reader: R,
    writer: W,
    format: ArchiveFormat,
) -> Result<ArchiveStats>
Process an archive according to the supplied format.
Zip requires Seek on both the reader and the writer, so this method bounds
both by Seek to give all formats a single entry point. Tar and tar.gz do
not need seeking themselves; the bound applies to them only for uniformity.
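The effect of a uniform bound can be sketched with a small dispatcher. Everything here is hypothetical: `Format` stands in for `ArchiveFormat` (whose variants are not listed on this page), and the `handle_*` functions copy bytes instead of processing archives. The point is only that the streaming branch compiles under the stricter bound and simply never seeks.

```rust
use std::io::{self, Cursor, Read, Seek, Write};

// Hypothetical stand-in; the crate's `ArchiveFormat` variants may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Format {
    Tar,
    TarGz,
    Zip,
}

// Stand-in for the tar paths: needs only sequential Read/Write.
fn handle_stream<R: Read, W: Write>(mut r: R, mut w: W) -> io::Result<u64> {
    io::copy(&mut r, &mut w)
}

// Stand-in for the zip path: uses Seek on the reader.
fn handle_seekable<R: Read + Seek, W: Write + Seek>(mut r: R, mut w: W) -> io::Result<u64> {
    r.rewind()?;
    io::copy(&mut r, &mut w)
}

// Single entry point: the strictest bound is imposed for every format.
fn handle<R: Read + Seek, W: Write + Seek>(r: R, w: W, format: Format) -> io::Result<u64> {
    match format {
        Format::Tar | Format::TarGz => handle_stream(r, w),
        Format::Zip => handle_seekable(r, w),
    }
}

fn main() {
    let data = b"example bytes".to_vec();
    for format in [Format::Tar, Format::TarGz, Format::Zip] {
        let mut out = Cursor::new(Vec::<u8>::new());
        let n = handle(Cursor::new(data.clone()), &mut out, format).unwrap();
        println!("{:?}: {} bytes", format, n);
    }
}
```

Going by the signatures on this page, a caller holding a non-seekable tar stream can call `process_tar` or `process_tar_gz` directly, since those require only `Read` and `Write`.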
§Errors
Returns SanitizeError::ArchiveError on I/O failures, or
SanitizeError::RecursionDepthExceeded when nested archives exceed the maximum nesting depth.
Auto Trait Implementations§
impl Freeze for ArchiveProcessor
impl !RefUnwindSafe for ArchiveProcessor
impl Send for ArchiveProcessor
impl Sync for ArchiveProcessor
impl Unpin for ArchiveProcessor
impl UnsafeUnpin for ArchiveProcessor
impl !UnwindSafe for ArchiveProcessor
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.