Struct CompiledScanner 

Source
pub struct CompiledScanner {
    pub config: ScannerConfig,
    pub alphabet_screen: Option<AlphabetScreen>,
    /* private fields */
}

Fields§

§config: ScannerConfig

§alphabet_screen: Option<AlphabetScreen>

Implementations§

Source§

impl CompiledScanner

Source

pub fn compile(detectors: Vec<DetectorSpec>) -> Result<Self>

Source

pub fn compile_with_gpu_policy(detectors: Vec<DetectorSpec>, gpu_policy: GpuInitPolicy) -> Result<Self>

Source

pub fn with_config(self, config: ScannerConfig) -> Self

Apply a custom configuration to the compiled scanner.

Source§

impl CompiledScanner

Source

pub fn gpu_matcher(&self) -> Option<&GpuLiteralSet>

Lazily compile the GPU literal-set on first call. Returns None when no compatible adapter was detected at probe time.

Persists the compiled matcher to ~/.cache/keyhog/programs/<hash>.bin. On a cache hit the matcher is loaded from disk and the GPU recompile is skipped entirely, which is the biggest cold-start win for keyhog scan / scan-system runs that relaunch repeatedly. On a cache miss (no file, version mismatch, or corrupt blob) the matcher is silently recompiled and re-cached.
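The load-or-recompile behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's implementation: `load_or_compile`, `compile_matcher`, `cache_key`, the one-byte `CACHE_VERSION` tag, and the FNV-1a file naming are all hypothetical, and corruption is detected only through the version byte.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical on-disk format version; a mismatch is treated as a cache miss.
const CACHE_VERSION: u8 = 1;

/// Stand-in for the expensive GPU compile step (assumption: returns a blob).
fn compile_matcher(literals: &[&str]) -> Vec<u8> {
    literals.join("\n").into_bytes()
}

/// FNV-1a over the literal set, used only to name the cache file.
fn cache_key(literals: &[&str]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for lit in literals {
        // A 0 separator keeps ["ab", "c"] and ["a", "bc"] distinct.
        for b in lit.bytes().chain(std::iter::once(0u8)) {
            h ^= b as u64;
            h = h.wrapping_mul(0x100000001b3);
        }
    }
    h
}

/// Returns the matcher blob and whether it was served from disk.
fn load_or_compile(cache_dir: &Path, literals: &[&str]) -> (Vec<u8>, bool) {
    let path = cache_dir.join(format!("{:016x}.bin", cache_key(literals)));
    if let Ok(bytes) = fs::read(&path) {
        // Missing file, empty file, or wrong version all fall through to a recompile.
        if bytes.first() == Some(&CACHE_VERSION) {
            return (bytes[1..].to_vec(), true);
        }
    }
    let blob = compile_matcher(literals);
    let mut out = vec![CACHE_VERSION];
    out.extend_from_slice(&blob);
    // Cache writes are best-effort: a failed write only costs a later recompile.
    let _ = fs::create_dir_all(cache_dir);
    let _ = fs::write(&path, &out);
    (blob, false)
}
```

The first call for a given literal set misses and writes the file; a second call with the same set reads it back without recompiling.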

Source

pub fn ac_gpu_program(&self) -> Option<&Program>

Lazily build the Aho-Corasick bounded-ranges dispatch Program from the GpuLiteralSet’s CompiledDfa. The two engines share the same DFA; only the dispatch Program (and therefore the per-byte algorithm) differs:

  • gpu_matcher().program - build_literal_set_program: walks every pattern × every literal byte per haystack position. O(N × L) per byte. Works for any pattern set that fits the DFA budget.
  • ac_gpu_program() - classic_ac_bounded_ranges_program: walks the AC transition table forward L_max bytes per position, emits every pattern in the accepting state’s flat output_links. O(L_max) per byte regardless of N.

Selected at scan time via KEYHOG_GPU_KERNEL=ac. Returns None when no GPU matcher is available; callers fall through to the literal-set path or non-GPU backend.
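The difference between the two per-position algorithms can be illustrated on the CPU. The sketch below is not the GPU code: `scan_literal_set`, `build_trie`, and `scan_trie` are hypothetical stand-ins, the trie plays the role of the AC transition table, and patterns are assumed to be non-empty. Both functions report `(pattern_id, start)` pairs.

```rust
use std::collections::HashMap;

/// Literal-set style: test every pattern at every position (N patterns x L bytes).
fn scan_literal_set(hay: &[u8], pats: &[&[u8]]) -> Vec<(usize, usize)> {
    let mut hits = Vec::new();
    for pos in 0..hay.len() {
        for (id, p) in pats.iter().enumerate() {
            if hay[pos..].starts_with(p) {
                hits.push((id, pos));
            }
        }
    }
    hits
}

/// Transition table plus per-state output list, as a plain trie.
struct Trie {
    next: Vec<HashMap<u8, usize>>,
    out: Vec<Vec<usize>>,
    l_max: usize,
}

fn build_trie(pats: &[&[u8]]) -> Trie {
    let mut t = Trie { next: vec![HashMap::new()], out: vec![Vec::new()], l_max: 0 };
    for (id, p) in pats.iter().enumerate() {
        let mut s = 0;
        for &b in p.iter() {
            let existing = t.next[s].get(&b).copied();
            s = match existing {
                Some(n) => n,
                None => {
                    t.next.push(HashMap::new());
                    t.out.push(Vec::new());
                    let n = t.next.len() - 1;
                    t.next[s].insert(b, n);
                    n
                }
            };
        }
        t.out[s].push(id);
        t.l_max = t.l_max.max(p.len());
    }
    t
}

/// AC style: from each position walk at most l_max transitions, independent of N.
fn scan_trie(hay: &[u8], t: &Trie) -> Vec<(usize, usize)> {
    let mut hits = Vec::new();
    for pos in 0..hay.len() {
        let mut s = 0;
        for &b in hay[pos..].iter().take(t.l_max) {
            match t.next[s].get(&b) {
                Some(&n) => s = n,
                None => break,
            }
            // Emit every pattern that ends in the state just reached.
            for &id in &t.out[s] {
                hits.push((id, pos));
            }
        }
    }
    hits
}
```

The two functions return the same set of hits, possibly in a different order, so a caller comparing them should sort first.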

Each shard dispatch emits at most super::rule_pipeline::AC_GPU_MAX_MATCHES_PER_DISPATCH match triples, the same cap as the existing literal-set output buffer. Truncation (count > cap on readback) is handled by the same fall-back-to-CPU branch that the literal-set path uses.
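A minimal sketch of the truncation check, assuming the dispatch reports a total hit count alongside a bounded output buffer. Everything here is hypothetical: `MAX_MATCHES` is a small stand-in for the real constant, `bounded_dispatch` simulates the GPU side with a single-byte needle, and the triple layout `(pattern_id, start, end)` is an assumption.

```rust
/// Hypothetical cap; the crate's real constant is not shown in this page.
const MAX_MATCHES: usize = 4;

/// Assumed triple layout: (pattern_id, start, end).
type Triple = (u32, u32, u32);

struct Readback {
    /// Total hits seen, which may exceed what was stored.
    count: usize,
    /// At most `cap` stored triples.
    triples: Vec<Triple>,
}

/// Stand-in for a GPU dispatch: counts every hit but stores at most `cap`.
fn bounded_dispatch(hay: &[u8], needle: u8, cap: usize) -> Readback {
    let mut rb = Readback { count: 0, triples: Vec::new() };
    for (i, &b) in hay.iter().enumerate() {
        if b == needle {
            if rb.triples.len() < cap {
                rb.triples.push((0, i as u32, i as u32 + 1));
            }
            rb.count += 1;
        }
    }
    rb
}

/// Unbounded CPU reference path used when the buffer overflowed.
fn cpu_scan(hay: &[u8], needle: u8) -> Vec<Triple> {
    let mut out = Vec::new();
    for (i, &b) in hay.iter().enumerate() {
        if b == needle {
            out.push((0, i as u32, i as u32 + 1));
        }
    }
    out
}

/// Returns the hits and whether the CPU fallback was taken.
fn scan(hay: &[u8], needle: u8) -> (Vec<Triple>, bool) {
    let rb = bounded_dispatch(hay, needle, MAX_MATCHES);
    if rb.count > MAX_MATCHES {
        // The stored triples are incomplete, so discard them and rescan.
        (cpu_scan(hay, needle), true)
    } else {
        (rb.triples, false)
    }
}
```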

Source

pub fn rule_pipeline(&self) -> Option<&RulePipeline>

Lazily compile the regex-NFA RulePipeline on first call. If the regex compile fails, the OnceLock caches the failure and every later call returns None. Typical causes are a combined NFA that exceeds vyre’s per-subgroup state cap (LANES * 32) or a detector regex that uses a feature the byte-NFA frontend can’t represent (Unicode classes, lookaround, backrefs). Callers should fall back to the literal-set GPU dispatch on None.
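The "compile once, remember failure" pattern can be sketched with `std::sync::OnceLock<Option<T>>`. The `Scanner` and `Pipeline` types below are hypothetical stand-ins, and the state-cap check is a simplified assumption about why a compile might fail.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Stand-in for the compiled pipeline.
struct Pipeline {
    states: usize,
}

struct Scanner {
    nfa_states: usize,
    state_cap: usize,
    /// `None` inside the lock records a failed compile so it is never retried.
    pipeline: OnceLock<Option<Pipeline>>,
    /// Counts how often the compile closure actually ran (for illustration).
    compile_calls: AtomicUsize,
}

impl Scanner {
    fn new(nfa_states: usize, state_cap: usize) -> Self {
        Scanner {
            nfa_states,
            state_cap,
            pipeline: OnceLock::new(),
            compile_calls: AtomicUsize::new(0),
        }
    }

    fn rule_pipeline(&self) -> Option<&Pipeline> {
        self.pipeline
            .get_or_init(|| {
                self.compile_calls.fetch_add(1, Ordering::Relaxed);
                if self.nfa_states > self.state_cap {
                    None // over the cap: cache the failure
                } else {
                    Some(Pipeline { states: self.nfa_states })
                }
            })
            .as_ref()
    }
}
```

Because the failure is stored in the lock, repeated calls on an oversized pattern set stay cheap, and the caller can take the fallback path every time without re-attempting the compile.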

The pipeline is sized for super::rule_pipeline::megascan_input_len() bytes; larger batches must take a different path. The orchestrator caps batches at the same value (256 MiB by default, up to 1 GiB on cards with 24 GiB or more of VRAM), so normal scan flow stays within this size.

Source

pub fn fused_decode_programs(&self) -> Option<&FusedDecodeScanPrograms>

Lazily build fused GPU decode→scan programs (base64 + hex).

Returns None when no GPU matcher is available (no literals, no adapter). The fused programs share the same DFA transition tables as the literal-set engine but prepend an on-GPU decode stage, eliminating the CPU→GPU round-trip for encoded content.

Source§

impl CompiledScanner

Source

pub fn scan_coalesced_megascan(&self, chunks: &[Chunk]) -> Vec<Vec<RawMatch>>

Source§

impl CompiledScanner

Source

pub fn scan_coalesced_gpu_phase2(&self, chunks: &[Chunk], per_chunk_hits: Vec<Vec<(u32, u32, u32)>>) -> Vec<Vec<RawMatch>>

Source§

impl CompiledScanner

Source

pub fn fused_program(&self) -> Option<&Program>

Lazily build a fused Program that merges the AC literal-set program with the rule pipeline program (when available) into a single GPU dispatch.

Returns None when:

  • No AC GPU program is available (no GPU adapter, no literals).
  • Fusion fails due to incompatible buffer layouts, over-dispatch geometry, or self-aliasing constraints.
  • Only one program is available (fusion is identity; we skip the overhead of the fused wrapper and dispatch the original directly).

The fused program is cached on disk alongside individual programs so cold starts after the first successful fusion are free.
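The three None cases listed above amount to a short decision chain. The sketch below is illustrative only: `Program` here is a local stand-in rather than the real type, and `fuse` rejects mismatched workgroup counts as a simplified assumption standing in for the layout, geometry, and aliasing checks.

```rust
/// Stand-in for a compiled GPU program; the real `Program` type is not shown here.
#[derive(Debug, Clone, PartialEq)]
struct Program {
    name: String,
    workgroups: u32,
}

/// Hypothetical fusion step: refuses when dispatch geometry differs.
fn fuse(a: &Program, b: &Program) -> Option<Program> {
    if a.workgroups != b.workgroups {
        return None;
    }
    Some(Program {
        name: format!("{}+{}", a.name, b.name),
        workgroups: a.workgroups,
    })
}

fn fused_program(ac: Option<&Program>, rules: Option<&Program>) -> Option<Program> {
    // No AC program: nothing to fuse.
    let ac = ac?;
    // Only one program available: fusion would be the identity, so skip it.
    let rules = rules?;
    // Fusion itself may still fail.
    fuse(ac, rules)
}
```

On None the caller dispatches whichever individual program it has, so the fused path is purely an optimisation.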

Source§

impl CompiledScanner

Source

pub fn scan_coalesced_gpu(&self, chunks: &[Chunk]) -> Vec<Vec<RawMatch>>

Source

pub fn scan_coalesced_gpu_ac(&self, chunks: &[Chunk]) -> Vec<Vec<RawMatch>>

Source§

impl CompiledScanner

Source

pub fn scan_coalesced(&self, chunks: &[Chunk]) -> Vec<Vec<RawMatch>>

High-throughput coalesced scan: all files scanned in parallel, zero overhead for non-hit files.

Architecture:

  • Phase 1: Parallel HS prefilter on raw bytes (no prep, no alloc).
  • Phase 2: Full extraction only on hit files (~5% of total).
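The two-phase shape can be sketched with the standard library alone. This is an assumption-laden illustration: a substring test stands in for the HS prefilter, one scoped thread per file stands in for the real work scheduling, and `prefilter`, `extract`, and this `scan_coalesced` signature are hypothetical.

```rust
use std::thread;

/// Phase 1 stand-in: cheap yes/no check on raw bytes, no allocation.
fn prefilter(bytes: &[u8], needles: &[&[u8]]) -> bool {
    needles
        .iter()
        .any(|n| !n.is_empty() && bytes.windows(n.len()).any(|w| w == *n))
}

/// Phase 2 stand-in: full extraction, returning (needle_id, offset) pairs.
fn extract(bytes: &[u8], needles: &[&[u8]]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for (id, n) in needles.iter().enumerate() {
        if n.is_empty() {
            continue;
        }
        for (pos, w) in bytes.windows(n.len()).enumerate() {
            if w == *n {
                out.push((id, pos));
            }
        }
    }
    out
}

fn scan_coalesced(files: &[Vec<u8>], needles: &[&[u8]]) -> Vec<Vec<(usize, usize)>> {
    // Phase 1: run the prefilter on every file in parallel.
    let hit: Vec<bool> = thread::scope(|s| {
        let handles: Vec<_> = files
            .iter()
            .map(|f| s.spawn(move || prefilter(f, needles)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // Phase 2: only files that passed the prefilter pay for extraction.
    files
        .iter()
        .zip(hit)
        .map(|(f, h)| if h { extract(f, needles) } else { Vec::new() })
        .collect()
}
```

The result has one entry per input file, in input order; files that fail the prefilter get an empty `Vec`, which does not allocate.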

Source§

impl CompiledScanner

Source

pub fn detector_count(&self) -> usize

Number of loaded detectors.

Source

pub fn pattern_count(&self) -> usize

Total number of patterns (AC + fallback).

Source

pub fn warm(&self)

Eagerly compile every pattern’s regex, in parallel, up front.

Patterns compile lazily on first use (see crate::types::LazyRegex), which makes a one-shot CLI scan start in milliseconds instead of paying ~450ms-2.3s to build the whole corpus. For a long-lived or large scan (the daemon, watch, scan-system, or a big repo where a detector fires across thousands of files) it’s better to pay the compile once, in parallel, before the hot loop rather than stalling the first file that touches each detector. Callers on those paths should call warm() after building the scanner.

Idempotent and cheap to repeat: an already-compiled pattern is a OnceLock hit. This is also the correct setup for a per-scan perf benchmark, which is meant to measure match throughput, not one-time compilation.
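A minimal sketch of lazy compilation with an eager, parallel warm-up. The `PatternSet` and `LazyPattern` types are hypothetical, the "compile" step is a trivial placeholder rather than a regex build, and one scoped thread per pattern is a simplification.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

struct LazyPattern {
    source: String,
    /// Filled on first use; later reads are a plain OnceLock hit.
    compiled: OnceLock<Vec<u8>>,
}

struct PatternSet {
    patterns: Vec<LazyPattern>,
    /// Counts real compiles, to show that warming twice does no extra work.
    compiles: AtomicUsize,
}

impl PatternSet {
    fn new(sources: &[&str]) -> Self {
        PatternSet {
            patterns: sources
                .iter()
                .map(|s| LazyPattern { source: s.to_string(), compiled: OnceLock::new() })
                .collect(),
            compiles: AtomicUsize::new(0),
        }
    }

    /// Lazy path: compile on first access only.
    fn get(&self, idx: usize) -> &Vec<u8> {
        let p = &self.patterns[idx];
        p.compiled.get_or_init(|| {
            self.compiles.fetch_add(1, Ordering::Relaxed);
            // Placeholder for the expensive compile step.
            p.source.to_uppercase().into_bytes()
        })
    }

    /// Eager path: touch every pattern in parallel before the hot loop.
    fn warm(&self) {
        thread::scope(|s| {
            for idx in 0..self.patterns.len() {
                s.spawn(move || {
                    self.get(idx);
                });
            }
        });
    }
}
```

Calling `warm()` a second time spawns threads that immediately find each `OnceLock` already set, so the compile count does not grow.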

Source

pub fn pattern_regex_strs(&self) -> Vec<&str>

Returns the final regex source strings (after anchoring, group extraction, and normalization) that the scanner uses.

Source

pub fn select_backend_for_file(&self, file_size: u64) -> ScanBackend

Return the preferred backend for a file of the given size.

Source

pub fn gpu_backend_label(&self) -> Option<&'static str>

Identifier of the GPU backend acquired at compile time, or None if scanning routes to CPU/SIMD only. Mirrors VyreBackend::id(), which returns “cuda”, “wgpu”, or the driver-defined name. The startup banner uses this so the operator can tell at a glance whether they got CUDA (the headline 5-10x faster path on NVIDIA hardware) or the WGPU fallback, rather than just “Gpu”, which collapses both.

Source

pub fn last_gpu_degrade_reason(&self) -> Option<String>

Most recent concrete GPU runtime-degrade reason for this compiled scanner, if one has occurred. Used by health probes to emit machine-readable failure causes without scraping stderr.

Source

pub fn preferred_backend_label(&self) -> &'static str

Return the steady-state backend label used for startup reporting.

Source

pub fn warm_backend(&self, backend: ScanBackend) -> bool

Warm backend resources that are initialized lazily during scanning.

Source

pub fn scan(&self, chunk: &Chunk) -> Vec<RawMatch>

Scan a chunk of text and return all raw credential matches.

Source

pub fn scan_with_backend(&self, chunk: &Chunk, backend: ScanBackend) -> Vec<RawMatch>

Scan a chunk using a caller-selected backend.

Source

pub fn scan_chunks_with_backend(&self, chunks: &[Chunk], backend: ScanBackend) -> Vec<Vec<RawMatch>>

Scan multiple chunks using a caller-selected backend.

Source

pub fn clear_fragment_cache(&self)

Reset the cross-file fragment-reassembly cache.

Source

pub fn scan_with_deadline(&self, chunk: &Chunk, deadline: Option<Instant>) -> Vec<RawMatch>

Scan a chunk of text against all compiled detectors.

Source

pub fn scan_with_deadline_and_backend(&self, chunk: &Chunk, deadline: Option<Instant>, backend: Option<ScanBackend>) -> Vec<RawMatch>

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<T> Downcast<T> for T

Source§

fn downcast(&self) -> &T

Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.
Source§

impl<T> Read<Exclusive, BecauseExclusive> for T
where T: ?Sized,

Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<T> Upcast<T> for T

Source§

fn upcast(&self) -> Option<&T>

Source§

impl<T> WasmNotSend for T
where T: Send,

Source§

impl<T> WasmNotSendSync for T

Source§

impl<T> WasmNotSync for T
where T: Sync,

Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.