Struct StreamScanner 

pub struct StreamScanner { /* private fields */ }

Streaming scanner that detects and replaces sensitive patterns.

Thread-safe: can be shared via Arc<StreamScanner> for concurrent scanning of multiple files. Each call to scan_reader is independent and maintains its own chunking state.
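The sharing pattern described above can be sketched without the crate itself. The example below uses a hypothetical stand-in type, `SketchScanner`, whose scan method takes `&self` and keeps all working state in local variables. The type, its method body, and the `scan_all` helper are assumptions made for illustration and are not part of sanitize_engine.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a scanner whose scan method takes `&self`.
/// It is not the real `StreamScanner`; it only masks one fixed literal.
pub struct SketchScanner {
    needle: Vec<u8>,
}

impl SketchScanner {
    pub fn new(needle: &[u8]) -> Self {
        Self { needle: needle.to_vec() }
    }

    /// Replace every occurrence of the needle with `*` bytes of equal length.
    /// All working state lives in locals, so concurrent calls do not interfere.
    pub fn scan_bytes(&self, input: &[u8]) -> Vec<u8> {
        let mut out = Vec::with_capacity(input.len());
        let mut i = 0;
        while i < input.len() {
            if !self.needle.is_empty() && input[i..].starts_with(&self.needle) {
                out.extend(std::iter::repeat(b'*').take(self.needle.len()));
                i += self.needle.len();
            } else {
                out.push(input[i]);
                i += 1;
            }
        }
        out
    }
}

/// Scan several inputs concurrently, one thread per input, sharing one scanner.
pub fn scan_all(scanner: Arc<SketchScanner>, inputs: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            let scanner = Arc::clone(&scanner);
            thread::spawn(move || scanner.scan_bytes(&input))
        })
        .collect();
    // Join in spawn order so outputs line up with inputs.
    handles
        .into_iter()
        .map(|h| h.join().expect("scan thread panicked"))
        .collect()
}
```

Each thread holds its own `Arc` clone, so the scanner is dropped only after the last scan finishes.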

§Usage

use sanitize_engine::scanner::{StreamScanner, ScanPattern, ScanConfig};
use sanitize_engine::category::Category;
use sanitize_engine::generator::HmacGenerator;
use sanitize_engine::store::MappingStore;
use std::sync::Arc;

// 1. Build the replacement store.
let gen = Arc::new(HmacGenerator::new([42u8; 32]));
let store = Arc::new(MappingStore::new(gen, None));

// 2. Define patterns.
let patterns = vec![
    ScanPattern::from_regex(
        r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
        Category::Email,
        "email",
    ).unwrap(),
];

// 3. Create the scanner.
let scanner = StreamScanner::new(patterns, store, ScanConfig::default()).unwrap();

// 4. Scan.
let input = b"Contact alice@corp.com for details.";
let (output, stats) = scanner.scan_bytes(input).unwrap();
assert_eq!(stats.matches_found, 1);
assert!(!output.windows(b"alice@corp.com".len())
    .any(|w| w == b"alice@corp.com"));

Implementations§

impl StreamScanner

pub fn new(patterns: Vec<ScanPattern>, store: Arc<MappingStore>, config: ScanConfig) -> Result<Self>

Create a new streaming scanner.

§Arguments
  • patterns — the set of patterns to scan for.
  • store — the mapping store for dedup-consistent replacements.
  • config — chunking / overlap configuration.
§Errors

Returns SanitizeError::InvalidConfig if the configuration is invalid (e.g. chunk_size == 0 or overlap_size >= chunk_size).
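The two example conditions above can be expressed as a small check. This is only a sketch: `SketchConfig` and `validate` are hypothetical names, the error is a plain `String` rather than `SanitizeError::InvalidConfig`, and the real validation may cover more fields.

```rust
/// Hypothetical mirror of the two fields named in the error description.
#[derive(Debug, Clone, Copy)]
pub struct SketchConfig {
    pub chunk_size: usize,
    pub overlap_size: usize,
}

/// Check the two example conditions listed under Errors.
/// Returns a human-readable reason on failure.
pub fn validate(config: &SketchConfig) -> Result<(), String> {
    if config.chunk_size == 0 {
        return Err("chunk_size must be greater than 0".to_string());
    }
    if config.overlap_size >= config.chunk_size {
        return Err(format!(
            "overlap_size ({}) must be smaller than chunk_size ({})",
            config.overlap_size, config.chunk_size
        ));
    }
    Ok(())
}
```

A plausible reason for the second rule is that an overlap as large as the chunk would leave no room for new bytes to be committed on each pass.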

pub fn new_with_max_patterns(patterns: Vec<ScanPattern>, store: Arc<MappingStore>, config: ScanConfig, max_patterns: usize) -> Result<Self>

Create a new streaming scanner with a custom pattern limit.

This is identical to new but allows overriding the default pattern cap (10 000). Use this when you have a legitimate need for more patterns and have verified that your system has enough memory for the resulting RegexSet.

§Errors

Returns SanitizeError::InvalidConfig if the configuration is invalid or the pattern count exceeds max_patterns.

pub fn with_extra_literals(&self, extra: Vec<ScanPattern>) -> Result<Self>

Create a copy of this scanner extended with additional literal patterns.

Clones the existing pattern set and appends extra, then rebuilds the internal Aho-Corasick and RegexSet automata. Used by the format-preserving structured pass to scan original bytes with discovered field-value literals added to the base pattern set.

§Errors

Returns SanitizeError if automaton construction fails or the combined pattern count exceeds the default limit.
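The clone, append, and rebuild flow with a pattern cap might look roughly like the sketch below. `SketchPatternSet`, `SKETCH_MAX_PATTERNS`, and the `String` error type are hypothetical. The value 10 000 is the default cap stated above, and the real method also rebuilds its automata, which this sketch omits.

```rust
/// Default cap from the documentation above; the constant name is hypothetical.
pub const SKETCH_MAX_PATTERNS: usize = 10_000;

/// Hypothetical stand-in for a scanner's pattern set (literals only).
#[derive(Debug, Clone, PartialEq)]
pub struct SketchPatternSet {
    literals: Vec<String>,
}

impl SketchPatternSet {
    /// Build a set, rejecting inputs that exceed the cap.
    pub fn new(literals: Vec<String>) -> Result<Self, String> {
        if literals.len() > SKETCH_MAX_PATTERNS {
            return Err(format!(
                "pattern count {} exceeds limit {}",
                literals.len(),
                SKETCH_MAX_PATTERNS
            ));
        }
        Ok(Self { literals })
    }

    /// Return a new set containing the existing literals plus `extra`.
    /// The receiver is left unchanged; the cap applies to the combined count.
    pub fn with_extra(&self, extra: Vec<String>) -> Result<Self, String> {
        let mut combined = self.literals.clone();
        combined.extend(extra);
        Self::new(combined)
    }

    pub fn pattern_count(&self) -> usize {
        self.literals.len()
    }
}
```

Returning a new value instead of mutating in place keeps the base set usable for other passes.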

pub fn scan_reader<R: Read, W: Write>(&self, reader: R, writer: W) -> Result<ScanStats>

Scan a reader and write sanitized output to a writer.

Processes the input in chunks of config.chunk_size bytes, maintaining an overlap window of config.overlap_size bytes to catch matches spanning chunk boundaries. All detected matches are replaced one-way via the MappingStore.
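To make the chunk and overlap idea concrete, here is a minimal standard-library sketch for a single literal pattern. It is not the crate's implementation: `scan_chunked` and its parameters are hypothetical, and it assumes the needle is at most `overlap_size + 1` bytes long, so that any match starting in the committed region is already fully buffered.

```rust
use std::io::{self, Read, Write};

/// Replace every occurrence of `needle` with `replacement`, reading the input
/// in chunks and carrying an uncommitted tail across chunk boundaries.
/// Returns the number of matches replaced.
pub fn scan_chunked<R: Read, W: Write>(
    mut reader: R,
    mut writer: W,
    needle: &[u8],
    replacement: &[u8],
    chunk_size: usize,
    overlap_size: usize,
) -> io::Result<usize> {
    assert!(chunk_size > 0 && overlap_size < chunk_size);
    assert!(!needle.is_empty() && needle.len() <= overlap_size + 1);

    let mut window: Vec<u8> = Vec::new();
    let mut matches = 0;
    loop {
        // Append up to chunk_size fresh bytes to the carried-over tail.
        let read = reader
            .by_ref()
            .take(chunk_size as u64)
            .read_to_end(&mut window)?;
        let at_eof = read < chunk_size;
        // Before EOF, matches may only start ahead of the overlap region.
        let limit = if at_eof {
            window.len()
        } else {
            window.len() - overlap_size
        };

        let mut pos = 0;
        while pos < limit {
            if window[pos..].starts_with(needle) {
                writer.write_all(replacement)?;
                pos += needle.len();
                matches += 1;
            } else {
                writer.write_all(&window[pos..pos + 1])?;
                pos += 1;
            }
        }
        // Drop what was committed; the rest is rescanned with the next chunk.
        window.drain(..pos);
        if at_eof {
            return Ok(matches);
        }
    }
}
```

With `chunk_size = 4` and `overlap_size = 2`, the input `xxabcyyabc` splits as `xxab | cyya | bc`, so the first `abc` straddles a chunk boundary and is still replaced because its first two bytes are held back in the tail.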

§Arguments
  • reader — input source (file, network stream, &[u8], …).
  • writer — output sink (file, Vec<u8>, …).
§Returns

ScanStats with counters for bytes processed, matches found, etc.

§Errors

Returns SanitizeError on I/O failures or if a replacement cannot be generated (e.g. store capacity exceeded).
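The "store capacity exceeded" case can be pictured with a small capacity-limited map that hands out the same replacement for repeated inputs. `SketchStore` is a hypothetical stand-in: it uses a counter-based placeholder and `&mut self`, whereas the real `MappingStore` is built from a generator and shared via `Arc`, so its internals will differ.

```rust
use std::collections::HashMap;

/// Hypothetical capacity-limited store; not the real `MappingStore`.
pub struct SketchStore {
    map: HashMap<Vec<u8>, String>,
    capacity: usize,
}

impl SketchStore {
    pub fn new(capacity: usize) -> Self {
        Self { map: HashMap::new(), capacity }
    }

    /// Return the replacement for `original`, creating one on first sight.
    /// Repeated inputs get the same replacement; new inputs fail once full.
    pub fn get_or_insert(&mut self, original: &[u8]) -> Result<String, String> {
        if let Some(existing) = self.map.get(original) {
            return Ok(existing.clone());
        }
        if self.map.len() >= self.capacity {
            return Err(format!("store capacity ({}) exceeded", self.capacity));
        }
        let replacement = format!("REDACTED_{}", self.map.len());
        self.map.insert(original.to_vec(), replacement.clone());
        Ok(replacement)
    }
}
```

In this sketch, lookups for values already in the store keep succeeding after it is full; only previously unseen values produce the error.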

pub fn scan_reader_with_progress<R: Read, W: Write, F>(&self, reader: R, writer: W, total_bytes: Option<u64>, on_progress: F) -> Result<ScanStats>
where F: FnMut(&ScanProgress),

Scan a reader and emit progress snapshots after each committed chunk.

Pass total_bytes as Some(n) when the caller knows the full input size. When it is None, progress consumers should avoid showing percentages or ETAs.
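A progress consumer following this guidance could branch on whether the total is known. The sketch below assumes a hypothetical `SketchProgress` struct; its field names are illustrative and do not mirror the real `ScanProgress`.

```rust
/// Hypothetical progress snapshot; field names are assumptions.
pub struct SketchProgress {
    pub bytes_processed: u64,
    pub total_bytes: Option<u64>,
}

/// Format a progress line, showing a percentage only when the total is known.
pub fn describe(progress: &SketchProgress) -> String {
    match progress.total_bytes {
        Some(total) if total > 0 => {
            // Clamp so a stale total never reports more than 100%.
            let done = u128::from(progress.bytes_processed.min(total));
            let pct = done * 100 / u128::from(total);
            format!("{} / {} bytes ({}%)", progress.bytes_processed, total, pct)
        }
        // Unknown (or zero) total: report raw bytes only.
        _ => format!("{} bytes processed", progress.bytes_processed),
    }
}
```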

§Errors

Returns SanitizeError on I/O failures or if a replacement cannot be generated (e.g. store capacity exceeded).

pub fn scan_bytes(&self, input: &[u8]) -> Result<(Vec<u8>, ScanStats)>

Convenience method: scan a byte slice in memory and return the sanitized output.

Equivalent to scan_reader(input, Vec::new()) but returns the output buffer directly.
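The equivalence rests on two standard-library facts: `&[u8]` implements `Read` and `Vec<u8>` implements `Write`. The sketch below shows the wrapper shape with a hypothetical streaming function, `transform_stream`, which upper-cases ASCII bytes in place of real sanitization.

```rust
use std::io::{self, Read, Write};

/// Hypothetical streaming routine standing in for a reader/writer scan.
/// It upper-cases ASCII bytes so the transformation is visible.
pub fn transform_stream<R: Read, W: Write>(mut reader: R, mut writer: W) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    let mut total = 0u64;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            return Ok(total);
        }
        buf[..n].make_ascii_uppercase();
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }
}

/// In-memory convenience wrapper: the slice is the reader, a `Vec<u8>` is the
/// writer, and the filled buffer is returned directly to the caller.
pub fn transform_bytes(input: &[u8]) -> io::Result<(Vec<u8>, u64)> {
    let mut output = Vec::with_capacity(input.len());
    let total = transform_stream(input, &mut output)?;
    Ok((output, total))
}
```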

§Errors

Returns SanitizeError if a replacement cannot be generated (e.g. store capacity exceeded).

pub fn scan_bytes_with_progress<F>(&self, input: &[u8], on_progress: F) -> Result<(Vec<u8>, ScanStats)>
where F: FnMut(&ScanProgress),

Scan a byte slice in memory and emit progress snapshots.

§Errors

Returns SanitizeError if a replacement cannot be generated (e.g. store capacity exceeded).

pub fn config(&self) -> &ScanConfig

Access the scanner’s configuration.

pub fn store(&self) -> &Arc<MappingStore>

Access the underlying mapping store.

pub fn pattern_count(&self) -> usize

Number of patterns registered in this scanner.

pub fn from_encrypted_secrets(encrypted_bytes: &[u8], password: &str, format: Option<SecretsFormat>, store: Arc<MappingStore>, config: ScanConfig, extra_patterns: Vec<ScanPattern>) -> Result<(Self, Vec<(usize, SanitizeError)>)>

Create a scanner from an encrypted secrets file.

Decrypts the file in memory, parses the entries, compiles patterns, and returns the scanner ready to scan. Decrypted plaintext is scrubbed from memory after parsing.

§Arguments
  • encrypted_bytes — raw bytes of the .enc file.
  • password — user password.
  • format — optional format override for the plaintext.
  • store — mapping store for dedup-consistent replacements.
  • config — chunking / overlap configuration.
  • extra_patterns — additional patterns to merge in.
§Returns

(scanner, warnings) where warnings lists entries that failed to compile (index + error).
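The `(scanner, warnings)` return shape, where bad entries are reported by index instead of aborting the whole load, follows a common collect-and-continue pattern. In the sketch below, `compile_all` is a hypothetical helper, and parsing a `u32` stands in for compiling a pattern.

```rust
/// Parse each entry, keeping successes and recording failures by index.
/// `u32` parsing stands in for pattern compilation here.
pub fn compile_all(entries: &[&str]) -> (Vec<u32>, Vec<(usize, String)>) {
    let mut compiled = Vec::new();
    let mut warnings = Vec::new();
    for (index, entry) in entries.iter().enumerate() {
        match entry.trim().parse::<u32>() {
            Ok(value) => compiled.push(value),
            // Keep going: one bad entry should not discard the rest.
            Err(err) => warnings.push((index, err.to_string())),
        }
    }
    (compiled, warnings)
}
```

Callers can then decide whether a non-empty warnings list is fatal or should only be logged.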

§Errors

Returns SanitizeError::SecretsError on decryption failure or SanitizeError::InvalidConfig on invalid scanner config.

pub fn from_plaintext_secrets(plaintext: &[u8], format: Option<SecretsFormat>, store: Arc<MappingStore>, config: ScanConfig, extra_patterns: Vec<ScanPattern>) -> Result<(Self, Vec<(usize, SanitizeError)>)>

Create a scanner from a plaintext secrets file.

Convenience for development / testing without encryption.

§Errors

Returns SanitizeError::SecretsError on parse failure or SanitizeError::InvalidConfig on invalid scanner config.

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> Same for T

type Output = T

Should always be Self

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V