pub struct StreamScanner { /* private fields */ }
Streaming scanner that detects and replaces sensitive patterns.
Thread-safe: can be shared via Arc<StreamScanner> for concurrent
scanning of multiple files. Each call to scan_reader
is independent and maintains its own chunking state.
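The sharing model described above can be sketched with the standard library alone. `ToyScanner`, `count_matches`, and `scan_all` are hypothetical stand-ins, not part of sanitize_engine; the sketch only assumes that a scanner is immutable after construction, so its `&self` methods can be called from several threads through one `Arc`.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a scanner: immutable after construction.
struct ToyScanner {
    needle: Vec<u8>,
}

impl ToyScanner {
    /// Count occurrences of the needle; each call keeps its own local state.
    fn count_matches(&self, input: &[u8]) -> usize {
        input
            .windows(self.needle.len())
            .filter(|w| *w == self.needle.as_slice())
            .count()
    }
}

/// Scan every input on its own thread, sharing one scanner through `Arc`.
fn scan_all(scanner: Arc<ToyScanner>, inputs: Vec<Vec<u8>>) -> Vec<usize> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            let scanner = Arc::clone(&scanner);
            thread::spawn(move || scanner.count_matches(&input))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    let scanner = Arc::new(ToyScanner { needle: b"secret".to_vec() });
    let inputs = vec![
        b"a secret".to_vec(),
        b"secret secret".to_vec(),
        b"none".to_vec(),
    ];
    println!("{:?}", scan_all(scanner, inputs));
}
```

With the real type, the same shape would presumably apply: wrap the scanner in `Arc`, clone the handle per worker, and call a scan method on each thread.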
§Usage
use sanitize_engine::scanner::{StreamScanner, ScanPattern, ScanConfig};
use sanitize_engine::category::Category;
use sanitize_engine::generator::HmacGenerator;
use sanitize_engine::store::MappingStore;
use std::sync::Arc;
// 1. Build the replacement store.
let gen = Arc::new(HmacGenerator::new([42u8; 32]));
let store = Arc::new(MappingStore::new(gen, None));
// 2. Define patterns.
let patterns = vec![
    ScanPattern::from_regex(
        r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
        Category::Email,
        "email",
    ).unwrap(),
];
// 3. Create the scanner.
let scanner = StreamScanner::new(patterns, store, ScanConfig::default()).unwrap();
// 4. Scan.
let input = b"Contact alice@corp.com for details.";
let (output, stats) = scanner.scan_bytes(input).unwrap();
assert_eq!(stats.matches_found, 1);
assert!(!output.windows(b"alice@corp.com".len())
    .any(|w| w == b"alice@corp.com"));
Implementations§
impl StreamScanner
pub fn new(
    patterns: Vec<ScanPattern>,
    store: Arc<MappingStore>,
    config: ScanConfig,
) -> Result<Self>
Create a new streaming scanner.
§Arguments
patterns: the set of patterns to scan for.
store: the mapping store for dedup-consistent replacements.
config: chunking / overlap configuration.
§Errors
Returns SanitizeError::InvalidConfig if the configuration is
invalid (e.g. chunk_size == 0 or overlap_size >= chunk_size).
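The two invalid cases named above can be sketched as a standalone check. `ToyConfig` and `validate` are hypothetical; the real `ScanConfig` may carry more fields and reports failures as `SanitizeError::InvalidConfig` rather than a `String`.

```rust
/// Hypothetical mirror of the two documented fields.
#[derive(Debug, Clone, Copy)]
struct ToyConfig {
    chunk_size: usize,
    overlap_size: usize,
}

/// Reject the two configurations the documentation calls out as invalid.
fn validate(cfg: &ToyConfig) -> Result<(), String> {
    if cfg.chunk_size == 0 {
        return Err("chunk_size must be greater than zero".to_string());
    }
    if cfg.overlap_size >= cfg.chunk_size {
        return Err(format!(
            "overlap_size ({}) must be smaller than chunk_size ({})",
            cfg.overlap_size, cfg.chunk_size
        ));
    }
    Ok(())
}

fn main() {
    let candidates = [
        ToyConfig { chunk_size: 4096, overlap_size: 256 },
        ToyConfig { chunk_size: 0, overlap_size: 0 },
        ToyConfig { chunk_size: 64, overlap_size: 64 },
    ];
    for cfg in &candidates {
        println!("{:?} -> {:?}", cfg, validate(cfg));
    }
}
```

The overlap has to stay below the chunk size because otherwise no bytes would be committed per chunk and the scan could not make progress.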
pub fn new_with_max_patterns(
    patterns: Vec<ScanPattern>,
    store: Arc<MappingStore>,
    config: ScanConfig,
    max_patterns: usize,
) -> Result<Self>
Create a new streaming scanner with a custom pattern limit.
This is identical to new but allows overriding the
default pattern cap (10 000). Use this
when you have a legitimate need for more patterns and have
verified that your system has enough memory for the resulting
RegexSet.
§Errors
Returns SanitizeError::InvalidConfig if the configuration is
invalid or the pattern count exceeds max_patterns.
pub fn with_extra_literals(&self, extra: Vec<ScanPattern>) -> Result<Self>
Create a copy of this scanner extended with additional literal patterns.
Clones the existing pattern set and appends extra, then rebuilds
the internal Aho-Corasick and RegexSet automata. Used by the
format-preserving structured pass to scan original bytes with
discovered field-value literals added to the base pattern set.
§Errors
Returns SanitizeError if automaton construction fails or the
combined pattern count exceeds the default limit.
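The pattern cap and the clone-and-extend behaviour can be sketched together. Everything here is a hypothetical stand-in: `ToyScanner` holds plain strings instead of `ScanPattern` values, builds no automata, and returns `String` errors. Only the default cap of 10 000 and the rule that the extended scanner is checked against that default come from the documentation above.

```rust
/// Default cap quoted in the documentation above.
const DEFAULT_MAX_PATTERNS: usize = 10_000;

/// Hypothetical stand-in that stores literal patterns only.
#[derive(Debug)]
struct ToyScanner {
    literals: Vec<String>,
}

impl ToyScanner {
    /// Build a scanner, rejecting pattern sets larger than `max_patterns`.
    fn new_with_max_patterns(literals: Vec<String>, max_patterns: usize) -> Result<Self, String> {
        if literals.len() > max_patterns {
            return Err(format!(
                "{} patterns exceed the limit of {}",
                literals.len(),
                max_patterns
            ));
        }
        Ok(Self { literals })
    }

    /// Return a new scanner with `extra` appended; `self` is left untouched.
    fn with_extra_literals(&self, extra: Vec<String>) -> Result<Self, String> {
        let mut combined = self.literals.clone();
        combined.extend(extra);
        Self::new_with_max_patterns(combined, DEFAULT_MAX_PATTERNS)
    }

    fn pattern_count(&self) -> usize {
        self.literals.len()
    }
}

fn main() {
    let base = ToyScanner::new_with_max_patterns(vec!["alice@corp.com".to_string()], 2)
        .expect("within the custom limit");
    let extended = base
        .with_extra_literals(vec!["token-123".to_string()])
        .expect("within the default limit");
    println!("base = {}, extended = {}", base.pattern_count(), extended.pattern_count());
}
```

Because the method takes `&self` and returns a fresh value, the base scanner stays usable after the extended copy is built.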
pub fn scan_reader<R: Read, W: Write>(
    &self,
    reader: R,
    writer: W,
) -> Result<ScanStats>
Scan a reader and write sanitized output to a writer.
Processes the input in chunks of config.chunk_size bytes,
maintaining an overlap window of config.overlap_size bytes to
catch matches spanning chunk boundaries. All detected matches
are replaced one-way via the MappingStore.
§Arguments
reader: input source (file, network stream, &[u8], …).
writer: output sink (file, Vec<u8>, …).
§Returns
ScanStats with counters for bytes processed, matches found, etc.
§Errors
Returns SanitizeError on I/O failures or if a replacement
cannot be generated (e.g. store capacity exceeded).
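The chunk-plus-overlap idea can be sketched for the simplest case, a single literal needle. This is not the library's implementation: the real scanner matches with Aho-Corasick and RegexSet automata and takes its overlap from config.overlap_size, whereas this sketch derives the overlap from the needle length. `replace_stream` and `replace_bytes` are hypothetical names.

```rust
use std::io::{self, Read, Write};

/// Replace every occurrence of `needle` while reading `chunk_size` bytes at a time.
/// The last `needle.len() - 1` bytes are held back until more input arrives,
/// so a match that straddles a chunk boundary is still seen whole.
fn replace_stream<R: Read, W: Write>(
    mut reader: R,
    mut writer: W,
    needle: &[u8],
    replacement: &[u8],
    chunk_size: usize,
) -> io::Result<usize> {
    assert!(!needle.is_empty(), "needle must not be empty");
    assert!(chunk_size > 0, "chunk_size must be greater than zero");
    let overlap = needle.len() - 1;
    let mut buf: Vec<u8> = Vec::new(); // held-back tail plus the newest chunk
    let mut chunk = vec![0u8; chunk_size];
    let mut matches = 0;
    loop {
        let n = reader.read(&mut chunk)?;
        let eof = n == 0;
        buf.extend_from_slice(&chunk[..n]);
        // At end of input everything is committed; otherwise keep the overlap.
        let commit_limit = if eof { buf.len() } else { buf.len().saturating_sub(overlap) };
        let mut pos = 0;
        while pos < commit_limit {
            if buf[pos..].starts_with(needle) {
                writer.write_all(replacement)?;
                pos += needle.len();
                matches += 1;
            } else {
                // Byte-at-a-time writes keep the sketch short, not fast.
                writer.write_all(&buf[pos..pos + 1])?;
                pos += 1;
            }
        }
        buf.drain(..pos);
        if eof {
            break;
        }
    }
    Ok(matches)
}

/// In-memory helper: `&[u8]` is a reader and `&mut Vec<u8>` is a writer.
fn replace_bytes(
    input: &[u8],
    needle: &[u8],
    replacement: &[u8],
    chunk_size: usize,
) -> io::Result<(Vec<u8>, usize)> {
    let mut out = Vec::new();
    let matches = replace_stream(input, &mut out, needle, replacement, chunk_size)?;
    Ok((out, matches))
}

fn main() -> io::Result<()> {
    // A 4-byte chunk forces the 14-byte needle to span several chunks.
    let (out, matches) = replace_bytes(b"xxalice@corp.comyy", b"alice@corp.com", b"<email>", 4)?;
    println!("{} match(es): {}", matches, String::from_utf8_lossy(&out));
    Ok(())
}
```

Every position below the commit limit has a full needle-length window available in the buffer, which is why deciding match or no match there is safe before the next chunk arrives.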
pub fn scan_reader_with_progress<R: Read, W: Write, F>(
    &self,
    reader: R,
    writer: W,
    total_bytes: Option<u64>,
    on_progress: F,
) -> Result<ScanStats>
where
    F: FnMut(&ScanProgress),
Scan a reader and emit progress snapshots after each committed chunk.
Pass total_bytes as Some(n) when the caller knows the full input
size. When it is None, progress consumers should avoid showing percentages or an ETA.
§Errors
Returns SanitizeError on I/O failures or if a replacement
cannot be generated (e.g. store capacity exceeded).
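A progress consumer that respects the optional total might look like the sketch below. The fields of `ScanProgress` are not shown on this page, so `ToyProgress`, its two fields, `describe`, and `feed_chunks` are all assumptions made for illustration.

```rust
/// Hypothetical snapshot; the real `ScanProgress` fields may differ.
struct ToyProgress {
    bytes_processed: u64,
    total_bytes: Option<u64>,
}

/// Show a percentage only when the total is known and non-zero.
fn describe(p: &ToyProgress) -> String {
    match p.total_bytes {
        Some(total) if total > 0 => {
            let done = p.bytes_processed.min(total);
            let pct = done as f64 / total as f64 * 100.0;
            format!("{:.1}% ({} / {} bytes)", pct, p.bytes_processed, total)
        }
        _ => format!("{} bytes processed", p.bytes_processed),
    }
}

/// Stand-in driver: emits one snapshot per committed chunk.
fn feed_chunks<F: FnMut(&ToyProgress)>(
    chunk_lens: &[u64],
    total_bytes: Option<u64>,
    mut on_progress: F,
) {
    let mut done: u64 = 0;
    for &len in chunk_lens {
        done += len;
        on_progress(&ToyProgress { bytes_processed: done, total_bytes });
    }
}

fn main() {
    // Known size: percentages are meaningful.
    feed_chunks(&[512, 512], Some(2048), |p| println!("{}", describe(p)));
    // Unknown size (for example a pipe): fall back to a byte counter.
    feed_chunks(&[512, 512], None, |p| println!("{}", describe(p)));
}
```

Taking the callback as `FnMut` lets the caller update captured state, such as a progress bar or a log buffer, on every snapshot.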
pub fn scan_bytes(&self, input: &[u8]) -> Result<(Vec<u8>, ScanStats)>
Convenience: scan a byte slice in memory and return the sanitized output.
Equivalent to scan_reader(input, Vec::new()) but returns the
output buffer directly.
§Errors
Returns SanitizeError if a replacement cannot be generated
(e.g. store capacity exceeded).
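The equivalence stated above rests on two standard-library facts: `&[u8]` implements `Read` and `Vec<u8>` implements `Write`. The sketch below shows the wrapper shape with a hypothetical `process` function that merely copies bytes; the real scan_reader would sanitize them instead.

```rust
use std::io::{self, Read, Write};

/// Hypothetical streaming routine; here it only copies bytes through.
fn process<R: Read, W: Write>(mut reader: R, mut writer: W) -> io::Result<u64> {
    io::copy(&mut reader, &mut writer)
}

/// In-memory convenience wrapper: run the streaming routine on a slice
/// and hand the output buffer back to the caller.
fn process_bytes(input: &[u8]) -> io::Result<(Vec<u8>, u64)> {
    let mut out = Vec::with_capacity(input.len());
    let bytes = process(input, &mut out)?;
    Ok((out, bytes))
}

fn main() -> io::Result<()> {
    let (out, bytes) = process_bytes(b"Contact alice@corp.com for details.")?;
    println!("{} bytes: {}", bytes, String::from_utf8_lossy(&out));
    Ok(())
}
```

Writing into a `Vec<u8>` cannot fail with an I/O error, which is consistent with the narrower error list documented for scan_bytes.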
pub fn scan_bytes_with_progress<F>(
    &self,
    input: &[u8],
    on_progress: F,
) -> Result<(Vec<u8>, ScanStats)>
where
    F: FnMut(&ScanProgress),
Scan a byte slice in memory and emit progress snapshots.
§Errors
Returns SanitizeError if a replacement cannot be generated
(e.g. store capacity exceeded).
pub fn config(&self) -> &ScanConfig
Access the scanner’s configuration.
pub fn store(&self) -> &Arc<MappingStore>
Access the underlying mapping store.
pub fn pattern_count(&self) -> usize
Number of patterns registered in this scanner.
pub fn from_encrypted_secrets(
    encrypted_bytes: &[u8],
    password: &str,
    format: Option<SecretsFormat>,
    store: Arc<MappingStore>,
    config: ScanConfig,
    extra_patterns: Vec<ScanPattern>,
) -> Result<(Self, Vec<(usize, SanitizeError)>)>
Create a scanner from an encrypted secrets file.
Decrypts the file in memory, parses the entries, compiles patterns, and returns the scanner ready to scan. Decrypted plaintext is scrubbed from memory after parsing.
§Arguments
encrypted_bytes: raw bytes of the .enc file.
password: user password.
format: optional format override for the plaintext.
store: mapping store for dedup-consistent replacements.
config: chunking / overlap configuration.
extra_patterns: additional patterns to merge in.
§Returns
(scanner, warnings) where warnings lists entries that
failed to compile (index + error).
§Errors
Returns SanitizeError::SecretsError on decryption failure
or SanitizeError::InvalidConfig on invalid scanner config.
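The (scanner, warnings) return shape is a partial-success pattern: entries that compile are kept, and failures are reported by index instead of aborting the whole load. The sketch below illustrates that shape only. `ToyPattern`, `compile_entry`, `compile_all`, and the "empty entry is invalid" rule are hypothetical and unrelated to how the library parses secrets.

```rust
/// Hypothetical compiled pattern.
#[derive(Debug, PartialEq)]
struct ToyPattern(String);

/// Hypothetical compile step: blank entries are treated as invalid.
fn compile_entry(raw: &str) -> Result<ToyPattern, String> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        Err("empty pattern".to_string())
    } else {
        Ok(ToyPattern(trimmed.to_string()))
    }
}

/// Keep every entry that compiles and record (index, error) for the rest.
fn compile_all(entries: &[&str]) -> (Vec<ToyPattern>, Vec<(usize, String)>) {
    let mut compiled = Vec::new();
    let mut warnings = Vec::new();
    for (index, raw) in entries.iter().enumerate() {
        match compile_entry(raw) {
            Ok(pattern) => compiled.push(pattern),
            Err(err) => warnings.push((index, err)),
        }
    }
    (compiled, warnings)
}

fn main() {
    let (compiled, warnings) = compile_all(&["alice@corp.com", "   ", "token-123"]);
    println!("compiled {} pattern(s)", compiled.len());
    for (index, err) in &warnings {
        eprintln!("entry {} skipped: {}", index, err);
    }
}
```

A caller of the real constructor would presumably log or surface the warnings in the same way, since a skipped entry means that secret is not being scanned for.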
pub fn from_plaintext_secrets(
    plaintext: &[u8],
    format: Option<SecretsFormat>,
    store: Arc<MappingStore>,
    config: ScanConfig,
    extra_patterns: Vec<ScanPattern>,
) -> Result<(Self, Vec<(usize, SanitizeError)>)>
Create a scanner from a plaintext secrets file.
Convenience for development / testing without encryption.
§Errors
Returns SanitizeError::SecretsError on parse failure
or SanitizeError::InvalidConfig on invalid scanner config.
Auto Trait Implementations§
impl Freeze for StreamScanner
impl !RefUnwindSafe for StreamScanner
impl Send for StreamScanner
impl Sync for StreamScanner
impl Unpin for StreamScanner
impl UnsafeUnpin for StreamScanner
impl !UnwindSafe for StreamScanner
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.