pub struct ScanState {
    pub matches: BinaryHeap<Reverse<RawMatch>>,
    pub credential_interner: HashSet<Arc<str>>,
    pub metadata_interner: HashSet<Arc<str>>,
    pub static_intern: Option<Arc<StaticInterner>>,
    pub ml_score_cache: HashMap<(String, String), f64>,
    pub ml_cache_order: VecDeque<(String, String)>,
    pub ml_cache_bytes: usize,
    pub ml_pending: Vec<MlPendingMatch>,
}
Internal state for a single scan operation (tracks matches and ML cache).
Fields
matches: BinaryHeap<Reverse<RawMatch>>
Matches collected for this chunk, prioritized by confidence. Wrapping each match in Reverse turns the max-heap into a min-heap, so the lowest-confidence match is always at the top and easy to pop.
credential_interner: HashSet<Arc<str>>
Interner for credentials found in this chunk, so duplicate credentials share one allocation to save memory.
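This is a minimal sketch of the interning pattern using only std. CredentialInterner is a hypothetical name, not this crate's type. On a hit, the stored Arc<str> is cloned, which only bumps a refcount. On a miss, one Arc<str> is allocated and a clone is kept in the set.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical interner: duplicate strings share one Arc<str> allocation.
#[derive(Default)]
pub struct CredentialInterner {
    set: HashSet<Arc<str>>,
}

impl CredentialInterner {
    pub fn intern(&mut self, s: &str) -> Arc<str> {
        // Hit: Arc<str>: Borrow<str> lets the set be queried with a plain &str.
        if let Some(existing) = self.set.get(s) {
            return Arc::clone(existing); // refcount bump, no new allocation
        }
        // Miss: allocate once and keep a clone in the set for future hits.
        let arc: Arc<str> = Arc::from(s);
        self.set.insert(Arc::clone(&arc));
        arc
    }

    pub fn len(&self) -> usize {
        self.set.len()
    }
}
```

Every duplicate credential is a clone of the same Arc<str>. Arc::ptr_eq therefore holds between duplicates, and the bytes are stored once.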
metadata_interner: HashSet<Arc<str>>
Per-scan string cache for metadata. It is a HashSet<Arc<str>> rather than a HashMap<String, Arc<str>>, so a cache miss allocates only the Arc<str>. The previous shape also allocated a String to serve as the HashMap key, paying twice for a single dedup slot. Lookups by &str work because Arc<str>: Borrow<str>, so hits do not allocate.
It is now used only for dynamic strings: the scanner-wide StaticInterner (vyre CHD perfect hash) handles every (detector_id, detector_name, service, source_type) lookup without per-scan allocation.
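For contrast, here is a sketch of the previous HashMap<String, Arc<str>> shape described above. intern_via_map is a hypothetical helper. The String key and the Arc<str> value are separate heap buffers holding the same bytes. That duplication is the double payment the HashSet<Arc<str>> shape avoids.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical sketch of the old shape. A miss allocates twice: once for the
/// Arc<str> value and once for the String key, though both hold the same bytes.
pub fn intern_via_map(map: &mut HashMap<String, Arc<str>>, s: &str) -> Arc<str> {
    // Hits are allocation-free here too, because String: Borrow<str>.
    if let Some(v) = map.get(s) {
        return Arc::clone(v);
    }
    let value: Arc<str> = Arc::from(s); // allocation 1: the interned value
    map.insert(s.to_owned(), Arc::clone(&value)); // allocation 2: the key
    value
}
```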
static_intern: Option<Arc<StaticInterner>>
Optional reference to the scanner's frozen static-string interner. When Some, intern_metadata checks it first before falling back to the per-scan metadata_interner. Reads are lock-free, so concurrent rayon workers share one instance without contention.
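This sketch shows one frozen interner shared across workers. It makes two substitutions: a plain HashSet stands in for the CHD-based StaticInterner, and std::thread::scope replaces rayon. FrozenInterner and lookup_in_parallel are hypothetical names. Lookups take &self and the set never changes after construction. Each worker therefore only needs its own Arc handle, and no Mutex or RwLock is involved.

```rust
use std::collections::HashSet;
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for StaticInterner: an immutable set built once.
/// A HashSet is used only because it is in std; the real one is a CHD perfect hash.
pub struct FrozenInterner {
    set: HashSet<Arc<str>>,
}

impl FrozenInterner {
    pub fn new(strings: &[&str]) -> Self {
        Self { set: strings.iter().map(|s| Arc::<str>::from(*s)).collect() }
    }

    /// Read-only lookup through &self, so sharing needs no lock.
    pub fn get(&self, s: &str) -> Option<Arc<str>> {
        self.set.get(s).cloned()
    }
}

/// Each worker thread receives its own Arc handle to the same frozen interner.
pub fn lookup_in_parallel(intern: Arc<FrozenInterner>, queries: Vec<String>) -> Vec<bool> {
    thread::scope(|scope| {
        let handles: Vec<_> = queries
            .iter()
            .map(|q| {
                let intern = Arc::clone(&intern);
                scope.spawn(move || intern.get(q).is_some())
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```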
ml_score_cache: HashMap<(String, String), f64>
ml_cache_order: VecDeque<(String, String)>
ml_cache_bytes: usize
ml_pending: Vec<MlPendingMatch>
Detector matches queued for batch ML scoring at the end of the scan.
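The ml_score_cache, ml_cache_order, and ml_cache_bytes fields are undocumented. Their names suggest a score cache bounded by a byte budget, with an insertion-order queue used for eviction. The sketch below is a guess at that shape, not the crate's code. ScoreCache, max_bytes, the FIFO eviction policy, and the per-entry byte cost are all assumptions.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical byte-budgeted score cache. Assumes FIFO eviction in insertion
/// order and charges each entry its two key lengths plus one f64.
pub struct ScoreCache {
    scores: HashMap<(String, String), f64>,
    order: VecDeque<(String, String)>,
    bytes: usize,
    max_bytes: usize,
}

fn entry_cost(key: &(String, String)) -> usize {
    key.0.len() + key.1.len() + std::mem::size_of::<f64>()
}

impl ScoreCache {
    pub fn new(max_bytes: usize) -> Self {
        Self { scores: HashMap::new(), order: VecDeque::new(), bytes: 0, max_bytes }
    }

    pub fn get(&self, a: &str, b: &str) -> Option<f64> {
        // (String, String) cannot be borrowed as (&str, &str), so this lookup allocates.
        self.scores.get(&(a.to_owned(), b.to_owned())).copied()
    }

    pub fn insert(&mut self, a: &str, b: &str, score: f64) {
        let key = (a.to_owned(), b.to_owned());
        if let Some(existing) = self.scores.get_mut(&key) {
            *existing = score; // update in place; byte cost unchanged
            return;
        }
        self.bytes += entry_cost(&key);
        self.order.push_back(key.clone());
        self.scores.insert(key, score);
        // Evict the oldest entries until the cache is back under budget.
        while self.bytes > self.max_bytes {
            match self.order.pop_front() {
                Some(oldest) => {
                    self.scores.remove(&oldest);
                    self.bytes -= entry_cost(&oldest);
                }
                None => break,
            }
        }
    }

    pub fn bytes(&self) -> usize {
        self.bytes
    }

    pub fn len(&self) -> usize {
        self.scores.len()
    }
}
```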
Implementations
impl ScanState
pub fn intern_credential(&mut self, s: &str) -> Arc<str>
Intern a credential string, returning an Arc<str>.
pub fn intern_metadata(&mut self, s: &str) -> Arc<str>
Intern a metadata string (detector_id, name, service, source_type, …).
Lookup order:
- Scanner-wide StaticInterner (vyre CHD perfect hash) for detector metadata that is frozen at scanner construction: O(1), no allocation, no lock contention.
- Per-scan metadata_interner HashSet for dynamic strings (file paths, commit SHAs, author names, dates).
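This sketch follows the two-tier lookup order above, with a HashSet<Arc<str>> standing in for the frozen StaticInterner. MetaInterner, with_static, and dynamic_len are hypothetical names. Strings found in the frozen tier are returned without touching the per-scan set. Only misses there fall through to the per-scan tier, which grows on demand. With no frozen tier (the default() case), every string lands in the per-scan tier.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical minimal state: an optional frozen tier plus a per-scan tier.
#[derive(Default)]
pub struct MetaInterner {
    static_tier: Option<Arc<HashSet<Arc<str>>>>,
    dynamic_tier: HashSet<Arc<str>>,
}

impl MetaInterner {
    /// Mirrors the "consult the static interner first" constructor idea.
    pub fn with_static(tier: Arc<HashSet<Arc<str>>>) -> Self {
        Self { static_tier: Some(tier), ..Self::default() }
    }

    pub fn intern_metadata(&mut self, s: &str) -> Arc<str> {
        // 1. Frozen scanner-wide tier: read-only, never grows during a scan.
        if let Some(hit) = self.static_tier.as_ref().and_then(|t| t.get(s)) {
            return Arc::clone(hit);
        }
        // 2. Per-scan tier for dynamic strings (paths, SHAs, authors, dates).
        if let Some(hit) = self.dynamic_tier.get(s) {
            return Arc::clone(hit);
        }
        let arc: Arc<str> = Arc::from(s);
        self.dynamic_tier.insert(Arc::clone(&arc));
        arc
    }

    pub fn dynamic_len(&self) -> usize {
        self.dynamic_tier.len()
    }
}
```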
pub fn with_static_intern(intern: Arc<StaticInterner>) -> Self
Construct a ScanState that consults the scanner-wide static
interner first. Use this from any path that has a
&CompiledScanner in scope; stand-alone unit tests can fall
back to default().
pub fn push_match(&mut self, m: RawMatch, limit: usize)
Push a match into the state, keeping matches ordered by confidence and capped at limit. Once the cap is reached, a higher-confidence secret displaces the lowest-confidence finding.
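This sketch shows a capped min-heap push. It uses a hypothetical Match type whose derived Ord compares confidence first; the real RawMatch ordering is not shown on this page. When the heap is full, its root is the lowest-confidence match kept so far. A new match only gets in by beating that root.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical stand-in for RawMatch; derived Ord compares `confidence` first.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Match {
    pub confidence: u32,
    pub id: u32,
}

/// Keeps at most `limit` matches, retaining the highest-confidence ones.
pub fn push_bounded(heap: &mut BinaryHeap<Reverse<Match>>, m: Match, limit: usize) {
    if limit == 0 {
        return;
    }
    if heap.len() < limit {
        heap.push(Reverse(m));
        return;
    }
    // Heap is full: Reverse puts the lowest-confidence match at the root.
    let beats_lowest = heap.peek().map_or(false, |Reverse(lowest)| m > *lowest);
    if beats_lowest {
        heap.pop();
        heap.push(Reverse(m));
    }
}
```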
pub fn into_matches(self) -> Vec<RawMatch>
Drain all matches into a sorted vector. Identical findings (same detector, same credential, same offset) are deduplicated, because two engines can produce the same finding for the same pattern. For example, ac_map's literal hit and its homoglyph fallback variant both fire on plain ASCII, because the homoglyph character class includes the original character. The caller only wants one of them in the result set.
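This sketch shows the dedup step using a hypothetical flattened Finding type. The actual sort order of into_matches is not documented here. This version sorts by (offset, detector, credential) so that duplicates become adjacent. As a further assumption, it breaks ties on descending confidence before Vec::dedup_by, so the highest-confidence copy of each duplicate survives.

```rust
/// Hypothetical flattened match; (detector, credential, offset) identifies a finding.
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub detector: String,
    pub credential: String,
    pub offset: usize,
    pub confidence: f64,
}

/// Sorts so duplicates are adjacent, then keeps the first (highest-confidence)
/// finding of each (detector, credential, offset) run.
pub fn sort_and_dedup(mut v: Vec<Finding>) -> Vec<Finding> {
    v.sort_by(|a, b| {
        (a.offset, &a.detector, &a.credential)
            .cmp(&(b.offset, &b.detector, &b.credential))
            // Assumed tie-break: higher confidence first within a duplicate run.
            .then_with(|| b.confidence.total_cmp(&a.confidence))
    });
    // dedup_by removes the later element `a` when it matches the kept element `b`.
    v.dedup_by(|a, b| {
        a.offset == b.offset && a.detector == b.detector && a.credential == b.credential
    });
    v
}
```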
Trait Implementations
Auto Trait Implementations
impl Freeze for ScanState
impl RefUnwindSafe for ScanState
impl Send for ScanState
impl Sync for ScanState
impl Unpin for ScanState
impl UnsafeUnpin for ScanState
impl UnwindSafe for ScanState
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.