pub struct StreamScanner { /* private fields */ }
Streaming scanner that detects and replaces sensitive patterns.
Thread-safe: can be shared via Arc<StreamScanner> for concurrent
scanning of multiple files. Each call to scan_reader
is independent and maintains its own chunking state.
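The sharing model described above can be sketched with the standard library alone. `LiteralScanner`, `scan_str`, and `scan_all` are hypothetical stand-ins invented for this illustration, not part of the crate; the sketch only assumes that a scanner is immutable after construction, so each call keeps its working state in local variables.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a scanner: immutable after construction,
/// so it can be shared across threads without locking.
struct LiteralScanner {
    needle: String,
    replacement: String,
}

impl LiteralScanner {
    /// Each call is independent: all working state lives in locals.
    fn scan_str(&self, input: &str) -> (String, usize) {
        let count = input.matches(self.needle.as_str()).count();
        let output = input.replace(self.needle.as_str(), &self.replacement);
        (output, count)
    }
}

/// Scan every input on its own thread, sharing one scanner via `Arc`.
fn scan_all(scanner: Arc<LiteralScanner>, inputs: Vec<String>) -> Vec<(String, usize)> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|text| {
            let scanner = Arc::clone(&scanner);
            thread::spawn(move || scanner.scan_str(&text))
        })
        .collect();
    // Join in spawn order so results line up with the inputs.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

With the real type, the same shape would presumably apply: clone the `Arc<StreamScanner>` once per worker and call `scan_reader` on each file.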
§Usage
use sanitize_engine::scanner::{StreamScanner, ScanPattern, ScanConfig};
use sanitize_engine::category::Category;
use sanitize_engine::generator::HmacGenerator;
use sanitize_engine::store::MappingStore;
use std::sync::Arc;
// 1. Build the replacement store.
let gen = Arc::new(HmacGenerator::new([42u8; 32]));
let store = Arc::new(MappingStore::new(gen, None));
// 2. Define patterns.
let patterns = vec![
    ScanPattern::from_regex(
        r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
        Category::Email,
        "email",
    ).unwrap(),
];
// 3. Create the scanner.
let scanner = StreamScanner::new(patterns, store, ScanConfig::default()).unwrap();
// 4. Scan.
let input = b"Contact alice@corp.com for details.";
let (output, stats) = scanner.scan_bytes(input).unwrap();
assert_eq!(stats.matches_found, 1);
assert!(!output.windows(b"alice@corp.com".len())
    .any(|w| w == b"alice@corp.com"));
Implementations
impl StreamScanner
pub fn new(
    patterns: Vec<ScanPattern>,
    store: Arc<MappingStore>,
    config: ScanConfig,
) -> Result<Self>
Create a new streaming scanner.
§Arguments
- patterns: the set of patterns to scan for.
- store: the mapping store for dedup-consistent replacements.
- config: chunking / overlap configuration.
§Errors
Returns SanitizeError::InvalidConfig if the configuration is
invalid (e.g. chunk_size == 0 or overlap_size >= chunk_size).
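The two checks named above can be written out as a minimal sketch. `ChunkConfig`, `ConfigError`, and `validate` are hypothetical names used only for illustration; the crate's actual `ScanConfig` and `SanitizeError::InvalidConfig` may carry different fields and messages.

```rust
/// Hypothetical stand-in for the chunking configuration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ChunkConfig {
    chunk_size: usize,
    overlap_size: usize,
}

/// Hypothetical error type mirroring the two documented failure cases.
#[derive(Debug, PartialEq)]
enum ConfigError {
    ZeroChunkSize,
    OverlapTooLarge { overlap_size: usize, chunk_size: usize },
}

/// Reject configurations that cannot make forward progress.
fn validate(config: ChunkConfig) -> Result<ChunkConfig, ConfigError> {
    if config.chunk_size == 0 {
        return Err(ConfigError::ZeroChunkSize);
    }
    // The overlap must be strictly smaller than the chunk; otherwise
    // every chunk would consist entirely of carried-over bytes.
    if config.overlap_size >= config.chunk_size {
        return Err(ConfigError::OverlapTooLarge {
            overlap_size: config.overlap_size,
            chunk_size: config.chunk_size,
        });
    }
    Ok(config)
}
```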
pub fn new_with_max_patterns(
    patterns: Vec<ScanPattern>,
    store: Arc<MappingStore>,
    config: ScanConfig,
    max_patterns: usize,
) -> Result<Self>
Create a new streaming scanner with a custom pattern limit.
This is identical to new but allows overriding the
default pattern cap (10 000). Use this
when you have a legitimate need for more patterns and have
verified that your system has enough memory for the resulting
RegexSet.
§Errors
Returns SanitizeError::InvalidConfig if the configuration is
invalid or the pattern count exceeds max_patterns.
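One plausible way to relate the two constructors is for the default one to delegate to the capped one. The sketch below assumes exactly that; `PatternSet` and `DEFAULT_MAX_PATTERNS` are hypothetical names, and only the value 10 000 comes from the text above.

```rust
/// Default cap quoted in the text above; the constant name is hypothetical.
const DEFAULT_MAX_PATTERNS: usize = 10_000;

/// Hypothetical stand-in that only tracks the pattern list.
#[derive(Debug)]
struct PatternSet {
    patterns: Vec<String>,
}

impl PatternSet {
    /// Default constructor: applies the default cap.
    fn new(patterns: Vec<String>) -> Result<Self, String> {
        Self::new_with_max_patterns(patterns, DEFAULT_MAX_PATTERNS)
    }

    /// Same as `new`, but the caller chooses the cap.
    fn new_with_max_patterns(patterns: Vec<String>, max_patterns: usize) -> Result<Self, String> {
        if patterns.len() > max_patterns {
            return Err(format!(
                "{} patterns exceeds the limit of {}",
                patterns.len(),
                max_patterns
            ));
        }
        Ok(Self { patterns })
    }
}
```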
pub fn scan_reader<R: Read, W: Write>(
    &self,
    reader: R,
    writer: W,
) -> Result<ScanStats>
Scan a reader and write sanitized output to a writer.
Processes the input in chunks of config.chunk_size bytes,
maintaining an overlap window of config.overlap_size bytes to
catch matches spanning chunk boundaries. All detected matches
are replaced one-way via the MappingStore.
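To illustrate why an overlap window catches matches that straddle chunk boundaries, here is a simplified sketch using only the standard library. It is not the crate's implementation: `scan_chunked` is a hypothetical function that searches for a single literal byte string instead of a regex set, and it derives the carried-over tail from the needle length rather than from `config.overlap_size`.

```rust
use std::io::{self, Read, Write};

/// Replace every occurrence of `needle` while reading in small chunks.
/// Returns the number of replacements made.
fn scan_chunked<R: Read, W: Write>(
    mut reader: R,
    mut writer: W,
    needle: &[u8],
    replacement: &[u8],
    chunk_size: usize,
) -> io::Result<usize> {
    assert!(!needle.is_empty() && chunk_size > 0);
    let mut pending: Vec<u8> = Vec::new(); // carried-over tail + new chunk
    let mut chunk = vec![0u8; chunk_size];
    let mut matches = 0;

    loop {
        let n = reader.read(&mut chunk)?;
        let eof = n == 0;
        pending.extend_from_slice(&chunk[..n]);

        // Scan every position where a full needle still fits.
        // (Byte-at-a-time writes keep the sketch short, not fast.)
        let mut pos = 0;
        while pos + needle.len() <= pending.len() {
            if &pending[pos..pos + needle.len()] == needle {
                writer.write_all(replacement)?;
                matches += 1;
                pos += needle.len();
            } else {
                writer.write_all(&pending[pos..pos + 1])?;
                pos += 1;
            }
        }

        if eof {
            // Fewer than needle.len() bytes remain, so no match is possible.
            writer.write_all(&pending[pos..])?;
            return Ok(matches);
        }
        // Keep the unscanned tail (at most needle.len() - 1 bytes): it may
        // be the start of a match that completes in the next chunk.
        pending.drain(..pos);
    }
}
```

Calling it with a 4-byte chunk size on `b"key=SECRET;key=SECRET"` exercises the boundary case: both occurrences of `SECRET` are split across reads and are still replaced.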
§Arguments
- reader: input source (file, network stream, &[u8], …).
- writer: output sink (file, Vec<u8>, …).
§Returns
ScanStats with counters for bytes processed, matches found, etc.
§Errors
Returns SanitizeError on I/O failures or if a replacement
cannot be generated (e.g. store capacity exceeded).
pub fn scan_bytes(&self, input: &[u8]) -> Result<(Vec<u8>, ScanStats)>
Convenience method: scans a byte slice in memory and returns the sanitized output.
Equivalent to scan_reader(input, Vec::new()) but returns the
output buffer directly.
§Errors
Returns SanitizeError if a replacement cannot be generated
(e.g. store capacity exceeded).
pub fn config(&self) -> &ScanConfig
Access the scanner’s configuration.
pub fn store(&self) -> &Arc<MappingStore>
Access the underlying mapping store.
pub fn pattern_count(&self) -> usize
Number of patterns registered in this scanner.
pub fn from_encrypted_secrets(
    encrypted_bytes: &[u8],
    password: &str,
    format: Option<SecretsFormat>,
    store: Arc<MappingStore>,
    config: ScanConfig,
    extra_patterns: Vec<ScanPattern>,
) -> Result<(Self, Vec<(usize, SanitizeError)>)>
Create a scanner from an encrypted secrets file.
Decrypts the file in memory, parses the entries, compiles patterns, and returns the scanner ready to scan. Decrypted plaintext is scrubbed from memory after parsing.
§Arguments
- encrypted_bytes: raw bytes of the .enc file.
- password: user password.
- format: optional format override for the plaintext.
- store: mapping store for dedup-consistent replacements.
- config: chunking / overlap configuration.
- extra_patterns: additional patterns to merge in.
§Returns
(scanner, warnings) where warnings lists entries that
failed to compile (index + error).
§Errors
Returns SanitizeError::SecretsError on decryption failure
or SanitizeError::InvalidConfig on invalid scanner config.
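The `(scanner, warnings)` return shape suggests that a bad entry is reported and skipped without aborting the load. A minimal sketch of that collection pattern follows; `compile_entry` and `compile_all` are hypothetical helpers, with `String` standing in for both the compiled pattern and the error type.

```rust
/// Hypothetical per-entry compile step: here it only rejects blank entries.
fn compile_entry(entry: &str) -> Result<String, String> {
    let trimmed = entry.trim();
    if trimmed.is_empty() {
        Err("empty entry".to_string())
    } else {
        Ok(trimmed.to_string())
    }
}

/// Compile every entry, keeping the successes and recording each failure
/// together with the index of the entry that caused it.
fn compile_all(entries: &[&str]) -> (Vec<String>, Vec<(usize, String)>) {
    let mut compiled = Vec::new();
    let mut warnings = Vec::new();
    for (index, entry) in entries.iter().enumerate() {
        match compile_entry(entry) {
            Ok(pattern) => compiled.push(pattern),
            Err(error) => warnings.push((index, error)),
        }
    }
    (compiled, warnings)
}
```

A caller can then log each warning by index and still proceed with the patterns that compiled.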
pub fn from_plaintext_secrets(
    plaintext: &[u8],
    format: Option<SecretsFormat>,
    store: Arc<MappingStore>,
    config: ScanConfig,
    extra_patterns: Vec<ScanPattern>,
) -> Result<(Self, Vec<(usize, SanitizeError)>)>
Create a scanner from a plaintext secrets file.
Convenience constructor for development and testing without encryption.
§Errors
Returns SanitizeError::SecretsError on parse failure
or SanitizeError::InvalidConfig on invalid scanner config.