pub struct CompiledScanner {
pub config: ScannerConfig,
pub alphabet_screen: Option<AlphabetScreen>,
/* private fields */
}
Fields§
config: ScannerConfig
alphabet_screen: Option<AlphabetScreen>
Implementations§
impl CompiledScanner
pub fn compile(detectors: Vec<DetectorSpec>) -> Result<Self>
pub fn compile_with_gpu_policy( detectors: Vec<DetectorSpec>, gpu_policy: GpuInitPolicy, ) -> Result<Self>
pub fn with_config(self, config: ScannerConfig) -> Self
Apply a custom configuration to the compiled scanner.
impl CompiledScanner
pub fn scan_coalesced_gpu_ac_phase1(&self, chunks: &[Chunk]) -> GpuPhase1Output
impl CompiledScanner
pub fn gpu_matcher(&self) -> Option<&GpuLiteralSet>
Lazily compile the GPU literal-set on first call. Returns None
when no compatible adapter was detected at probe time.
Persists the compiled matcher to ~/.cache/keyhog/programs/<hash>.bin.
On a cache hit the matcher is loaded from disk and the GPU
recompile is skipped entirely, which is the biggest cold-start win
for keyhog scan / scan-system runs that relaunch repeatedly.
A cache miss (no file, version mismatch, or corrupt blob) silently
recompiles and re-caches.
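The load-or-recompile behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not keyhog's implementation: the blob layout (`MAGIC` plus a little-endian `VERSION` header), the `compile_matcher` stand-in, and the temp-directory path are all assumptions made for the example.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical blob layout: 4-byte magic, 4-byte little-endian version, payload.
const MAGIC: &[u8] = b"KHGP";
const VERSION: u32 = 1;

/// Stand-in for the expensive GPU compile; returns a serialized matcher.
fn compile_matcher(patterns: &[&str]) -> Vec<u8> {
    patterns.join("\n").into_bytes()
}

fn encode_blob(payload: &[u8]) -> Vec<u8> {
    let mut blob = Vec::with_capacity(8 + payload.len());
    blob.extend_from_slice(MAGIC);
    blob.extend_from_slice(&VERSION.to_le_bytes());
    blob.extend_from_slice(payload);
    blob
}

/// Returns None for a truncated blob, a bad magic, or a version mismatch.
fn decode_blob(blob: &[u8]) -> Option<Vec<u8>> {
    if blob.len() < 8 || &blob[..4] != MAGIC {
        return None;
    }
    let raw: [u8; 4] = blob[4..8].try_into().ok()?;
    (u32::from_le_bytes(raw) == VERSION).then(|| blob[8..].to_vec())
}

/// Returns the matcher bytes and whether they came from the cache.
fn load_or_compile(cache_file: &Path, patterns: &[&str]) -> io::Result<(Vec<u8>, bool)> {
    // Cache hit: the file exists and decodes cleanly.
    if let Ok(blob) = fs::read(cache_file) {
        if let Some(payload) = decode_blob(&blob) {
            return Ok((payload, true));
        }
    }
    // Cache miss (no file, version mismatch, corrupt blob): recompile and re-cache.
    let payload = compile_matcher(patterns);
    if let Some(dir) = cache_file.parent() {
        fs::create_dir_all(dir)?;
    }
    fs::write(cache_file, encode_blob(&payload))?;
    Ok((payload, false))
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("keyhog-cache-sketch").join("demo.bin");
    let _ = fs::remove_file(&path);
    let (_, first_hit) = load_or_compile(&path, &["AKIA", "ghp_"])?;
    let (_, second_hit) = load_or_compile(&path, &["AKIA", "ghp_"])?;
    println!("first call hit: {first_hit}, second call hit: {second_hit}");
    Ok(())
}
```

Treating every decode failure as a miss keeps the caller's path uniform: a stale or damaged file costs one recompile and is then overwritten.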
pub fn ac_gpu_program(&self) -> Option<&Program>
Lazily build the Aho-Corasick bounded-ranges dispatch Program from the GpuLiteralSet’s CompiledDfa. The two engines share the same DFA - only the dispatch Program (and therefore the per-byte algorithm) differs:
- gpu_matcher().program - build_literal_set_program: walks every pattern × every literal byte per haystack position. O(N × L) per byte. Works for any pattern set that fits the DFA budget.
- ac_gpu_program() - classic_ac_bounded_ranges_program: walks the AC transition table forward L_max bytes per position, emits every pattern in the accepting state’s flat output_links. O(L_max) per byte regardless of N.
Selected at scan time via KEYHOG_GPU_KERNEL=ac. Returns
None when no GPU matcher is available; callers fall through
to the literal-set path or non-GPU backend.
Output is capped at super::rule_pipeline::AC_GPU_MAX_MATCHES_PER_DISPATCH triples per shard
dispatch, matching the existing literal-set output-buffer cap.
Truncation (count > cap on readback) is handled by the same
fall-back-to-CPU branch the literal-set path uses.
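The difference between the two per-position algorithms can be illustrated on the CPU. The sketch below is an assumption-laden stand-in, not the GPU programs themselves: it uses a `HashMap`-based trie instead of a dense DFA table, assumes non-empty patterns, and all function names are hypothetical. The first scan does O(N × L) work per position; the second walks at most `l_max` transitions per position regardless of how many patterns there are.

```rust
use std::collections::HashMap;

/// Literal-set style: at every position, test every pattern byte by byte.
fn scan_literal_set(hay: &[u8], patterns: &[&[u8]]) -> Vec<(usize, usize)> {
    let mut hits = Vec::new();
    for start in 0..hay.len() {
        for (id, pat) in patterns.iter().enumerate() {
            if hay[start..].starts_with(pat) {
                hits.push((start, id));
            }
        }
    }
    hits
}

/// Trie node: byte transitions plus a flat list of pattern ids ending here.
#[derive(Default)]
struct Node {
    next: HashMap<u8, usize>,
    outputs: Vec<usize>,
}

/// Builds the transition table and returns it with the longest pattern length.
fn build_trie(patterns: &[&[u8]]) -> (Vec<Node>, usize) {
    let mut nodes = vec![Node::default()];
    let mut l_max = 0;
    for (id, pat) in patterns.iter().enumerate() {
        let mut state = 0;
        for &b in pat.iter() {
            let existing = nodes[state].next.get(&b).copied();
            state = match existing {
                Some(s) => s,
                None => {
                    nodes.push(Node::default());
                    let s = nodes.len() - 1;
                    nodes[state].next.insert(b, s);
                    s
                }
            };
        }
        nodes[state].outputs.push(id);
        l_max = l_max.max(pat.len());
    }
    (nodes, l_max)
}

/// Bounded walk: from each position, follow at most `l_max` transitions and
/// emit every pattern id stored on the states visited.
fn scan_bounded_walk(hay: &[u8], nodes: &[Node], l_max: usize) -> Vec<(usize, usize)> {
    let mut hits = Vec::new();
    for start in 0..hay.len() {
        let mut state = 0;
        for &b in hay[start..].iter().take(l_max) {
            match nodes[state].next.get(&b) {
                Some(&s) => state = s,
                None => break,
            }
            for &id in &nodes[state].outputs {
                hits.push((start, id));
            }
        }
    }
    hits
}

fn main() {
    let patterns = ["ab".as_bytes(), "abc".as_bytes(), "bc".as_bytes()];
    let hay = b"xabcab";
    let (nodes, l_max) = build_trie(&patterns);
    let mut a = scan_literal_set(hay, &patterns);
    let mut b = scan_bounded_walk(hay, &nodes, l_max);
    a.sort_unstable();
    b.sort_unstable();
    assert_eq!(a, b);
    println!("{a:?}");
}
```

Both scans report the same (start offset, pattern id) pairs; only the amount of work per haystack position differs.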
pub fn rule_pipeline(&self) -> Option<&RulePipeline>
Lazily compile the regex-NFA RulePipeline on first call.
Returns None if the regex compile failed; the OnceLock caches the
failure, so later calls return None without retrying. Compilation
typically fails because the combined NFA exceeds vyre’s
per-subgroup state cap (LANES * 32) or because one of the detector
regexes uses a feature the byte-NFA frontend can’t represent
(Unicode classes, lookaround, backrefs). Callers should fall back
to the literal-set GPU dispatch on None.
The pipeline is sized for super::rule_pipeline::megascan_input_len() bytes; batches
larger than that must take a different path. The orchestrator
caps batches at the same value (256 MiB default, up to 1 GiB
on 24+ GiB-VRAM cards), so this matches normal scan flow.
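A lazily initialized, fallible accessor of this shape is commonly written as a `OnceLock<Option<T>>`. The following is a sketch under stated assumptions: `Scanner`, `Pipeline`, `STATE_CAP`, and the one-state-per-byte "compile" are hypothetical stand-ins, chosen only to show that a failed compile is cached and that the caller falls back on `None`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Hypothetical state budget standing in for a per-subgroup cap.
const STATE_CAP: usize = 64;

struct Pipeline {
    states: usize,
}

struct Scanner {
    regexes: Vec<String>,
    pipeline: OnceLock<Option<Pipeline>>,
    compile_attempts: AtomicUsize,
}

impl Scanner {
    fn new(regexes: Vec<String>) -> Self {
        Scanner {
            regexes,
            pipeline: OnceLock::new(),
            compile_attempts: AtomicUsize::new(0),
        }
    }

    /// Stand-in compile: one state per pattern byte, failing past the cap.
    fn try_compile(&self) -> Option<Pipeline> {
        self.compile_attempts.fetch_add(1, Ordering::Relaxed);
        let states: usize = self.regexes.iter().map(|r| r.len()).sum();
        (states <= STATE_CAP).then(|| Pipeline { states })
    }

    /// Compiles on first call; a failure is cached as None and never retried.
    fn rule_pipeline(&self) -> Option<&Pipeline> {
        self.pipeline.get_or_init(|| self.try_compile()).as_ref()
    }

    /// Caller-side fallback: use the pipeline if present, else the literal-set path.
    fn dispatch_label(&self) -> &'static str {
        match self.rule_pipeline() {
            Some(_) => "rule-pipeline",
            None => "literal-set",
        }
    }
}

fn main() {
    let small = Scanner::new(vec!["ab+c".to_string()]);
    let states = small.rule_pipeline().map(|p| p.states);
    println!("{} ({states:?} states)", small.dispatch_label());

    let big = Scanner::new(vec!["x".repeat(STATE_CAP + 1)]);
    println!("{}", big.dispatch_label());
    println!("{}", big.dispatch_label());
    println!("attempts: {}", big.compile_attempts.load(Ordering::Relaxed));
}
```

Storing the `Option` inside the `OnceLock` means an oversized pattern set pays for the failed compile once, not on every dispatch.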
pub fn fused_decode_programs(&self) -> Option<&FusedDecodeScanPrograms>
Lazily build fused GPU decode→scan programs (base64 + hex).
Returns None when no GPU matcher is available (no literals, no
adapter). The fused programs share the same DFA transition tables
as the literal-set engine but prepend an on-GPU decode stage,
eliminating the CPU→GPU round-trip for encoded content.
impl CompiledScanner
pub fn scan_coalesced_gpu_phase1(&self, chunks: &[Chunk]) -> GpuPhase1Output
impl CompiledScanner
pub fn fused_program(&self) -> Option<&Program>
Lazily build a fused Program that merges the AC literal-set
program with the rule pipeline program (when available) into a
single GPU dispatch.
Returns None when:
- No AC GPU program is available (no GPU adapter, no literals).
- Fusion fails due to incompatible buffer layouts, over-dispatch geometry, or self-aliasing constraints.
- Only one program is available (fusion is identity; we skip the overhead of the fused wrapper and dispatch the original directly).
The fused program is cached on disk alongside individual programs so cold starts after the first successful fusion are free.
impl CompiledScanner
pub fn scan_coalesced(&self, chunks: &[Chunk]) -> Vec<Vec<RawMatch>>
High-throughput coalesced scan: all files scanned in parallel, zero overhead for non-hit files.
Architecture:
- Phase 1: Parallel HS prefilter on raw bytes (no prep, no alloc)
- Phase 2: Full extraction only on hit files (~5% of total)
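The two-phase shape can be sketched with the standard library. This is an illustration only: the substring prefilter stands in for the real prefilter engine, `extract` stands in for full extraction, and spawning one scoped thread per chunk is a simplification (a production scanner would more plausibly use a thread pool). All names here are hypothetical.

```rust
use std::thread;

/// Phase 1 stand-in: cheap check on raw bytes, no preprocessing.
fn prefilter_hit(data: &[u8], literals: &[&[u8]]) -> bool {
    literals
        .iter()
        .any(|lit| !lit.is_empty() && data.windows(lit.len()).any(|w| w == *lit))
}

/// Phase 2 stand-in for full extraction: sorted offsets of every literal occurrence.
fn extract(data: &[u8], literals: &[&[u8]]) -> Vec<usize> {
    let mut offsets = Vec::new();
    for lit in literals {
        if lit.is_empty() {
            continue;
        }
        for (i, w) in data.windows(lit.len()).enumerate() {
            if w == *lit {
                offsets.push(i);
            }
        }
    }
    offsets.sort_unstable();
    offsets
}

/// Returns one result vector per input chunk, in input order.
fn scan_coalesced_sketch(chunks: &[&[u8]], literals: &[&[u8]]) -> Vec<Vec<usize>> {
    // Phase 1: prefilter every chunk in parallel.
    let hits: Vec<bool> = thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| s.spawn(move || prefilter_hit(chunk, literals)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("prefilter thread panicked"))
            .collect()
    });
    // Phase 2: run the expensive extraction only on chunks that hit.
    chunks
        .iter()
        .zip(hits)
        .map(|(chunk, hit)| if hit { extract(chunk, literals) } else { Vec::new() })
        .collect()
}

fn main() {
    let chunks = [
        "no secrets here".as_bytes(),
        "key=AKIA123".as_bytes(),
        "".as_bytes(),
    ];
    let literals = ["AKIA".as_bytes()];
    println!("{:?}", scan_coalesced_sketch(&chunks, &literals));
}
```

Chunks that miss the prefilter get an empty result without ever reaching the extraction code.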
impl CompiledScanner
pub fn detector_count(&self) -> usize
Number of loaded detectors.
pub fn pattern_count(&self) -> usize
Total number of patterns (AC + fallback).
pub fn warm(&self)
Eagerly compile every pattern’s regex, in parallel, up front.
Patterns compile lazily on first use (see crate::types::LazyRegex),
which makes a one-shot CLI scan start in milliseconds instead of
paying ~450ms-2.3s to build the whole corpus. For a LONG-lived or
LARGE scan - the daemon, watch, scan-system, or a big repo where a
detector fires across thousands of files - it’s better to pay the
compile once, in parallel, before the hot loop rather than stalling
the first file that touches each detector. Callers on those paths
should warm() after building the scanner.
Idempotent and cheap to repeat: an already-compiled pattern is a
OnceLock hit. Also the correct setup for a per-scan perf benchmark,
which is meant to measure match throughput, not one-time compilation.
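The eager, parallel warm-up over lazily compiled patterns can be sketched as follows. `LazyPattern` is a hypothetical stand-in for crate::types::LazyRegex, the "compile" step just copies bytes, and one scoped thread per pattern is a simplification; the point is only that `warm()` forces every `OnceLock` once and that repeating it does no further work.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

/// Hypothetical lazily compiled pattern; Vec<u8> stands in for a compiled regex.
struct LazyPattern {
    source: String,
    compiled: OnceLock<Vec<u8>>,
}

impl LazyPattern {
    /// Compiles on first use; later calls are a OnceLock hit.
    fn get(&self, compiles: &AtomicUsize) -> &[u8] {
        self.compiled
            .get_or_init(|| {
                compiles.fetch_add(1, Ordering::Relaxed);
                self.source.bytes().collect()
            })
            .as_slice()
    }
}

struct Scanner {
    patterns: Vec<LazyPattern>,
    compiles: AtomicUsize,
}

impl Scanner {
    fn new(sources: &[&str]) -> Self {
        Scanner {
            patterns: sources
                .iter()
                .map(|s| LazyPattern {
                    source: s.to_string(),
                    compiled: OnceLock::new(),
                })
                .collect(),
            compiles: AtomicUsize::new(0),
        }
    }

    /// Forces every pattern to compile, in parallel, before the hot loop.
    fn warm(&self) {
        thread::scope(|s| {
            for p in &self.patterns {
                s.spawn(move || {
                    p.get(&self.compiles);
                });
            }
        });
    }
}

fn main() {
    let scanner = Scanner::new(&["a+", "b+", "c+"]);
    scanner.warm();
    scanner.warm(); // idempotent: every pattern is already compiled
    let first = scanner.patterns[0].get(&scanner.compiles);
    println!(
        "compiles: {}, first pattern bytes: {}",
        scanner.compiles.load(Ordering::Relaxed),
        first.len()
    );
}
```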
pub fn pattern_regex_strs(&self) -> Vec<&str>
Returns the FINAL regex source strings (post anchoring / group extraction / normalization) that the scanner uses, collected into a Vec.
pub fn select_backend_for_file(&self, file_size: u64) -> ScanBackend
Return the preferred backend for a file of the given size.
pub fn gpu_backend_label(&self) -> Option<&'static str>
Identifier of the GPU backend acquired at compile time, or
None if scanning routes to CPU/SIMD only. Mirrors
VyreBackend::id() which returns “cuda”, “wgpu”, or the
driver-defined name. The startup banner uses this so the
operator can tell at a glance whether they got CUDA (the
headline 5-10x faster path on NVIDIA hardware) or the WGPU
fallback, rather than just “Gpu” which collapses both.
pub fn last_gpu_degrade_reason(&self) -> Option<String>
Most recent concrete GPU runtime-degrade reason for this compiled scanner, if one has occurred. Used by health probes to emit machine-readable failure causes without scraping stderr.
pub fn preferred_backend_label(&self) -> &'static str
Return the steady-state backend label used for startup reporting.
pub fn warm_backend(&self, backend: ScanBackend) -> bool
Warm backend resources that are initialized lazily during scanning.
pub fn scan(&self, chunk: &Chunk) -> Vec<RawMatch>
Scan a chunk of text and return all raw credential matches.
pub fn scan_with_backend( &self, chunk: &Chunk, backend: ScanBackend, ) -> Vec<RawMatch>
Scan a chunk using a caller-selected backend.
pub fn scan_chunks_with_backend( &self, chunks: &[Chunk], backend: ScanBackend, ) -> Vec<Vec<RawMatch>>
Scan multiple chunks using a caller-selected backend.
Sourcepub fn clear_fragment_cache(&self)
pub fn clear_fragment_cache(&self)
Reset the cross-file fragment-reassembly cache.
pub fn scan_with_deadline( &self, chunk: &Chunk, deadline: Option<Instant>, ) -> Vec<RawMatch>
Scan a chunk of text against all compiled detectors.
pub fn scan_with_deadline_and_backend( &self, chunk: &Chunk, deadline: Option<Instant>, backend: Option<ScanBackend>, ) -> Vec<RawMatch>
Auto Trait Implementations§
impl !Freeze for CompiledScanner
impl !RefUnwindSafe for CompiledScanner
impl !UnwindSafe for CompiledScanner
impl Send for CompiledScanner
impl Sync for CompiledScanner
impl Unpin for CompiledScanner
impl UnsafeUnpin for CompiledScanner
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.