Struct StreamScanner 

pub struct StreamScanner { /* private fields */ }
Streaming scanner that detects and replaces sensitive patterns.

Thread-safe: can be shared via Arc<StreamScanner> for concurrent scanning of multiple files. Each call to scan_reader is independent and maintains its own chunking state.
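
The sharing pattern described above can be sketched with the standard library alone. `Redactor` here is a hypothetical stand-in for the scanner, not a type from this crate; the point is only that a value which is immutable after construction can be wrapped in `Arc` and called through `&self` from several threads, each call keeping its own local state.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a scanner: immutable after construction,
/// so `&self` methods can run concurrently without locking.
struct Redactor {
    needle: Vec<u8>,
}

impl Redactor {
    /// Replaces every occurrence of `needle` with `*` bytes of the same
    /// length and returns the output together with the match count.
    fn scan_bytes(&self, input: &[u8]) -> (Vec<u8>, usize) {
        let mut out = Vec::with_capacity(input.len());
        let mut matches = 0;
        let mut i = 0;
        while i < input.len() {
            if !self.needle.is_empty() && input[i..].starts_with(&self.needle) {
                out.extend(std::iter::repeat(b'*').take(self.needle.len()));
                matches += 1;
                i += self.needle.len();
            } else {
                out.push(input[i]);
                i += 1;
            }
        }
        (out, matches)
    }
}

/// Scans each input on its own thread, sharing one `Redactor` via `Arc`.
/// Results are returned in the same order as the inputs.
fn scan_concurrently(inputs: Vec<Vec<u8>>) -> Vec<usize> {
    let scanner = Arc::new(Redactor { needle: b"secret".to_vec() });
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            let scanner = Arc::clone(&scanner);
            thread::spawn(move || scanner.scan_bytes(&input).1)
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

With the real type, the same shape would presumably apply: clone the `Arc<StreamScanner>` once per worker and call `scan_reader` or `scan_bytes` on each clone.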

§Usage

use sanitize_engine::scanner::{StreamScanner, ScanPattern, ScanConfig};
use sanitize_engine::category::Category;
use sanitize_engine::generator::HmacGenerator;
use sanitize_engine::store::MappingStore;
use std::sync::Arc;

// 1. Build the replacement store.
let generator = Arc::new(HmacGenerator::new([42u8; 32]));
let store = Arc::new(MappingStore::new(generator, None));

// 2. Define patterns.
let patterns = vec![
    ScanPattern::from_regex(
        r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
        Category::Email,
        "email",
    ).unwrap(),
];

// 3. Create the scanner.
let scanner = StreamScanner::new(patterns, store, ScanConfig::default()).unwrap();

// 4. Scan.
let input = b"Contact alice@corp.com for details.";
let (output, stats) = scanner.scan_bytes(input).unwrap();
assert_eq!(stats.matches_found, 1);
assert!(!output.windows(b"alice@corp.com".len())
    .any(|w| w == b"alice@corp.com"));

Implementations§

impl StreamScanner

pub fn new( patterns: Vec<ScanPattern>, store: Arc<MappingStore>, config: ScanConfig, ) -> Result<Self>

Create a new streaming scanner.

§Arguments
  • patterns — the set of patterns to scan for.
  • store — the mapping store for dedup-consistent replacements.
  • config — chunking / overlap configuration.
§Errors

Returns SanitizeError::InvalidConfig if the configuration is invalid (e.g. chunk_size == 0 or overlap_size >= chunk_size).
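
As an illustration of the two invalid cases named above, here is a minimal sketch of the check. `ChunkConfig`, `ConfigError`, and `validate` are hypothetical names introduced for this example; the crate's own `ScanConfig` may have more fields and reports the failure as `SanitizeError::InvalidConfig`.

```rust
/// Hypothetical mirror of the two fields named above; not the crate's `ScanConfig`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ChunkConfig {
    chunk_size: usize,
    overlap_size: usize,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ConfigError {
    ZeroChunkSize,
    OverlapTooLarge { overlap_size: usize, chunk_size: usize },
}

/// Rejects the two configurations the documentation calls invalid.
fn validate(cfg: ChunkConfig) -> Result<ChunkConfig, ConfigError> {
    if cfg.chunk_size == 0 {
        return Err(ConfigError::ZeroChunkSize);
    }
    // The overlap must be strictly smaller than the chunk, otherwise no
    // new bytes would be consumed between consecutive windows.
    if cfg.overlap_size >= cfg.chunk_size {
        return Err(ConfigError::OverlapTooLarge {
            overlap_size: cfg.overlap_size,
            chunk_size: cfg.chunk_size,
        });
    }
    Ok(cfg)
}
```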

pub fn new_with_max_patterns( patterns: Vec<ScanPattern>, store: Arc<MappingStore>, config: ScanConfig, max_patterns: usize, ) -> Result<Self>

Create a new streaming scanner with a custom pattern limit.

This is identical to new but allows overriding the default pattern cap (10 000). Use this when you have a legitimate need for more patterns and have verified that your system has enough memory for the resulting RegexSet.

§Errors

Returns SanitizeError::InvalidConfig if the configuration is invalid or the pattern count exceeds max_patterns.

pub fn scan_reader<R: Read, W: Write>( &self, reader: R, writer: W, ) -> Result<ScanStats>

Scan a reader and write sanitized output to a writer.

Processes the input in chunks of config.chunk_size bytes, maintaining an overlap window of config.overlap_size bytes to catch matches spanning chunk boundaries. All detected matches are replaced one-way via the MappingStore.
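
To make the chunk-and-overlap idea concrete, the following sketch redacts a single literal from a stream using only the standard library. It is not the crate's implementation: `redact_stream` is a hypothetical function, the pattern is a fixed byte string rather than a regex, and the overlap is assumed to be `needle.len() - 1`, which is the smallest carry-over that guarantees a literal split across two reads is still seen whole.

```rust
use std::io::{self, Read, Write};

/// Copies `reader` to `writer`, replacing each occurrence of `needle`
/// with `*` bytes. Reads at most `chunk_size` bytes at a time and holds
/// back the last `needle.len() - 1` bytes of each window so a match that
/// straddles a chunk boundary is completed by the next read.
fn redact_stream<R: Read, W: Write>(
    mut reader: R,
    mut writer: W,
    needle: &[u8],
    chunk_size: usize,
) -> io::Result<usize> {
    if needle.is_empty() || chunk_size == 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "needle and chunk_size must be non-empty",
        ));
    }
    let overlap = needle.len() - 1;
    let mut window: Vec<u8> = Vec::new(); // carried tail + newest chunk
    let mut chunk = vec![0u8; chunk_size];
    let mut matches = 0;
    loop {
        let n = match reader.read(&mut chunk) {
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        let eof = n == 0;
        window.extend_from_slice(&chunk[..n]);
        // Before EOF, only positions with a full needle-length lookahead
        // are decided; the rest waits for more input.
        let safe_end = if eof {
            window.len()
        } else {
            window.len().saturating_sub(overlap)
        };
        let mut out = Vec::new();
        let mut i = 0;
        while i < safe_end {
            if window[i..].starts_with(needle) {
                out.extend(std::iter::repeat(b'*').take(needle.len()));
                matches += 1;
                i += needle.len();
            } else {
                out.push(window[i]);
                i += 1;
            }
        }
        writer.write_all(&out)?;
        window.drain(..i); // keep only the undecided tail
        if eof {
            break;
        }
    }
    writer.flush()?;
    Ok(matches)
}
```

A regex-based scanner cannot derive the overlap from the pattern in general, which is presumably why `overlap_size` is a configuration value here rather than something computed.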

§Arguments
  • reader — input source (file, network stream, &[u8], …).
  • writer — output sink (file, Vec<u8>, …).
§Returns

ScanStats with counters for bytes processed, matches found, etc.

§Errors

Returns SanitizeError on I/O failures or if a replacement cannot be generated (e.g. store capacity exceeded).

pub fn scan_bytes(&self, input: &[u8]) -> Result<(Vec<u8>, ScanStats)>

Convenience method: scans a byte slice in memory and returns the sanitized output.

Equivalent to scan_reader(input, Vec::new()) but returns the output buffer directly.
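
The equivalence works because `&[u8]` implements `Read` and `Vec<u8>` implements `Write`. The sketch below shows the same wrapper shape around a toy transform; `mask_digits` and `mask_digits_bytes` are hypothetical names used only for illustration.

```rust
use std::io::{self, Read, Write};

/// Hypothetical streaming transform: copies input to output, replacing
/// ASCII digits with `#`. Returns the number of bytes masked.
fn mask_digits<R: Read, W: Write>(mut reader: R, mut writer: W) -> io::Result<usize> {
    let mut buf = [0u8; 8];
    let mut masked = 0;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break;
        }
        for b in &mut buf[..n] {
            if b.is_ascii_digit() {
                *b = b'#';
                masked += 1;
            }
        }
        writer.write_all(&buf[..n])?;
    }
    Ok(masked)
}

/// In-memory wrapper: feeds the slice in as the reader, collects the
/// output in a `Vec<u8>`, and hands the buffer back to the caller.
fn mask_digits_bytes(input: &[u8]) -> io::Result<(Vec<u8>, usize)> {
    let mut out = Vec::with_capacity(input.len());
    let masked = mask_digits(input, &mut out)?;
    Ok((out, masked))
}
```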

§Errors

Returns SanitizeError if a replacement cannot be generated (e.g. store capacity exceeded).

pub fn config(&self) -> &ScanConfig

Access the scanner’s configuration.

pub fn store(&self) -> &Arc<MappingStore>

Access the underlying mapping store.

pub fn pattern_count(&self) -> usize

Number of patterns registered in this scanner.

pub fn from_encrypted_secrets( encrypted_bytes: &[u8], password: &str, format: Option<SecretsFormat>, store: Arc<MappingStore>, config: ScanConfig, extra_patterns: Vec<ScanPattern>, ) -> Result<(Self, Vec<(usize, SanitizeError)>)>

Create a scanner from an encrypted secrets file.

Decrypts the file in memory, parses the entries, compiles patterns, and returns the scanner ready to scan. Decrypted plaintext is scrubbed from memory after parsing.

§Arguments
  • encrypted_bytes — raw bytes of the .enc file.
  • password — user password.
  • format — optional format override for the plaintext.
  • store — mapping store for dedup-consistent replacements.
  • config — chunking / overlap configuration.
  • extra_patterns — additional patterns to merge in.
§Returns

(scanner, warnings) where warnings lists entries that failed to compile (index + error).
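
The `(index, error)` shape of the warnings list suggests that a bad entry is skipped rather than aborting construction. A minimal sketch of that collect-and-continue pattern follows; `compile_entries` and the `label=literal` entry format are assumptions made for this example and say nothing about the crate's actual secrets formats.

```rust
/// Hypothetical compile step: accepts `label=literal` entries, keeps the
/// valid ones, and records the index and reason for each rejected entry
/// instead of failing on the first one.
fn compile_entries(entries: &[&str]) -> (Vec<(String, String)>, Vec<(usize, String)>) {
    let mut compiled = Vec::new();
    let mut warnings = Vec::new();
    for (index, entry) in entries.iter().enumerate() {
        match entry.split_once('=') {
            Some((label, literal)) if !literal.is_empty() => {
                compiled.push((label.to_string(), literal.to_string()));
            }
            _ => warnings.push((index, format!("entry {index} is not `label=literal`"))),
        }
    }
    (compiled, warnings)
}
```

A caller would typically log each warning with its index so the offending entry can be located in the secrets file, then proceed with the scanner built from the entries that did compile.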

§Errors

Returns SanitizeError::SecretsError on decryption failure or SanitizeError::InvalidConfig on invalid scanner config.

pub fn from_plaintext_secrets( plaintext: &[u8], format: Option<SecretsFormat>, store: Arc<MappingStore>, config: ScanConfig, extra_patterns: Vec<ScanPattern>, ) -> Result<(Self, Vec<(usize, SanitizeError)>)>

Create a scanner from a plaintext secrets file.

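Convenience constructor for development and testing without encryption. Takes the same arguments as from_encrypted_secrets, with plaintext in place of encrypted_bytes and no password, and returns the same (scanner, warnings) pair.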

§Errors

Returns SanitizeError::SecretsError on parse failure or SanitizeError::InvalidConfig on invalid scanner config.

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Same for T

type Output = T

Should always be Self.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V