KeyHog Scanner: A high-performance, multi-layered secret detection engine.
This crate implements the core scanning logic, combining SIMD pre-filtering, Aho-Corasick literal matching, regex fallback, and ML-based confidence scoring.
Re-exports§
- pub use multiline::fragment_cache;
- pub use engine::GpuPhase1Output;
- pub use engine::CompiledScanner;
- pub use engine::GpuInitPolicy;
- pub use error::Result;
- pub use error::ScanError;
- pub use hw_probe::probe_hardware;
- pub use hw_probe::select_backend;
- pub use hw_probe::HardwareCaps;
- pub use hw_probe::ScanBackend;
- pub use types::ScannerConfig;
Modules§
- alphabet_filter
- SIMD-accelerated alphabet pre-filtering. Alphabet-based bitmask pre-filtering for ultra-fast chunk skipping.
- aws
- Offline AWS account-ID recovery from an access-key ID (no network/verify). Offline AWS account-ID recovery + canary-token classification.
- bigram_bloom
- Bigram bloom filter for fast chunk gating. Bigram-bloom prefilter - Layer 0.5 between alphabet screening and AC/HS.
- checksum
- Service-specific credential checksum validation (GitHub, npm, Slack, etc.). Checksum-aware credential validation.
- compiler
- Detector compilation into high-performance matching structures. Logic for compiling detector specifications into an efficient scanning engine.
- confidence
- Heuristic and ML-based confidence scoring for candidate matches. Confidence scoring: combines multiple signals into a 0.0–1.0 score. Higher confidence means more likely to be a real secret.
- context
- Code context analysis (comments, assignments, test files). Structural context analysis: understand WHERE in code a potential secret appears.
- decode
- Decode-through pipeline for nested encodings (base64, hex, URL, etc.). Decode-through scanning: decode base64 and hex strings before pattern matching.
- decode_structure
- Decode-structure analysis: classify what a candidate base64/hex-decodes to (binary asset magic bytes, protobuf wire) so decode-through feeds scoring. Decode-structure analysis: keyhog’s decode-through advantage, fed into scoring.
- engine
- Core scan execution engine. Core scanning engine implementation.
- entropy
- Shannon entropy analysis for secret detection. Shannon entropy analysis for distinguishing secrets from ordinary text.
- entropy_fast
- Fast scalar entropy calculation. Fast vectorized entropy calculation with architecture-specific implementations.
- error
- Specialized error types for the scanner. Specialized error types for the scanner engine.
- gpu
- GPU-accelerated matching via wgpu. GPU-accelerated batch inference for the MoE classifier via wgpu compute shaders.
- hw_probe
- Hardware capability detection and backend selection. Hardware capability probing with once-cached results.
- jwt
- JWT structural validation and anomaly detection. JWT structural validation.
- ml_scorer
- Machine learning inference for secret scoring. ML-based secret scoring with a tiny mixture-of-experts network.
- multiline
- Multiline secret reassembly logic. Multi-line string concatenation preprocessor.
- pipeline
- Internal scan pipeline orchestration. Scan pipeline: context windows, scan-loop helpers, and post-match processing.
- prefix_trie
- Prefix trie for efficient keyword propagation. Prefix trie for efficient literal prefix extraction from detector regex patterns.
- resolution
- Match resolution and deduplication. Match resolution: when multiple detectors match the same region, keep only the most specific, highest-confidence match. Eliminates duplicates.
- scanner_config
- Scanner configuration and state. Scanner configuration and scan state types.
- static_intern
- Static-string interner backed by vyre’s CHD perfect hash. Used by CompiledScanner to pre-intern detector metadata strings so the per-scan ScanState interner is hit only by dynamic strings (file paths, commit SHAs). Static-string interner for the frozen detector-metadata universe.
- telemetry
- Per-scan telemetry: always-on counters + opt-in --dogfood events. Lightweight per-scan telemetry.
- testing
- types
- Shared types for the scanner engine. Internal types and constants for the scanning engine.
- unicode_hardening
- Unicode normalization and homoglyph defense. Unicode hardening: detect and normalize Unicode evasion attacks.
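The entropy module's summary refers to Shannon entropy as a signal for telling secrets apart from ordinary text. The block below is a minimal, standard-library-only sketch of that measure over a byte slice. The function name `shannon_entropy_sketch` is hypothetical and is not taken from this crate; the crate's own implementation and signature may differ.

```rust
/// Hypothetical helper (not part of this crate's API): Shannon entropy of a
/// byte slice in bits per byte, in the range 0.0..=8.0.
fn shannon_entropy_sketch(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }

    // Count how often each of the 256 possible byte values occurs.
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }

    // H = -sum(p * log2(p)) over the byte values that actually occur.
    let len = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}
```

Under this measure a run of one repeated byte scores 0 bits per byte, and four distinct bytes that each occur once score 2 bits per byte. Random-looking tokens tend to score higher than natural-language text, which is the property an entropy signal relies on.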
Functions§
- compute_line_offsets
- Compute line offsets for a block of text.
- find_companion
- Search for a companion pattern near a primary match.
- floor_char_boundary
- Find the largest char boundary <= index.
- is_within_hex_context
- Check if a match is within a hex-encoded context.
- match_entropy
- Measure the Shannon entropy of a byte slice.
- match_line_number
- Map a byte offset to a line number using pre-computed offsets.
- normalize_chunk_data
- Normalize scannable text by removing evasion characters and handling homoglyphs.
- normalize_scannable_chunk
- Pre-process a chunk of text for scanning.
- should_suppress_known_example_credential
- Check if a credential should be suppressed because it is a known example.
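Several of the functions above deal with byte offsets: computing line offsets, mapping an offset to a line number, and snapping an index down to a char boundary. The sketch below shows one common way to do each using only the standard library. All three function names are hypothetical stand-ins, and the signatures are assumptions. The crate's real functions may take different arguments or use 0-based line numbers.

```rust
/// Hypothetical helper: byte offset at which each line starts.
/// The first line always starts at offset 0.
fn line_starts_sketch(text: &str) -> Vec<usize> {
    let mut starts = vec![0];
    for (i, b) in text.bytes().enumerate() {
        if b == b'\n' {
            // The next line begins right after the newline byte.
            starts.push(i + 1);
        }
    }
    starts
}

/// Hypothetical helper: 1-based line number containing `offset`.
/// `partition_point` binary-searches for the number of line starts <= offset.
fn line_number_sketch(starts: &[usize], offset: usize) -> usize {
    starts.partition_point(|&s| s <= offset)
}

/// Hypothetical helper: largest char boundary <= `index`, clamped to the
/// length of `text`, so slicing `&text[..result]` never panics.
fn floor_boundary_sketch(text: &str, index: usize) -> usize {
    let mut i = index.min(text.len());
    // Offset 0 is always a char boundary, so this loop terminates.
    while !text.is_char_boundary(i) {
        i -= 1;
    }
    i
}
```

Precomputing the line starts once per chunk makes each later offset-to-line lookup a binary search rather than a rescan of the text, which matters when a chunk yields many candidate matches.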