Crate keyhog_scanner

KeyHog Scanner: A high-performance, multi-layered secret detection engine.

This crate implements the core scanning logic, combining SIMD pre-filtering, Aho-Corasick literal matching, regex fallback, and ML-based confidence scoring.
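The layering idea can be illustrated with a minimal sketch: a cheap byte-set gate runs first, and the more expensive literal search runs only on chunks that pass it. This is not the crate's implementation or API; `ByteSet` and `chunk_may_contain` are hypothetical names, and the scalar loop and `str::contains` stand in for the SIMD and Aho-Corasick stages.

```rust
/// 256-bit set with one bit per possible byte value (hypothetical helper).
#[derive(Clone, Copy, Default)]
struct ByteSet([u64; 4]);

impl ByteSet {
    /// Record every byte value that occurs in `data`.
    fn from_bytes(data: &[u8]) -> Self {
        let mut set = ByteSet::default();
        for &b in data {
            set.0[(b >> 6) as usize] |= 1u64 << (b & 63);
        }
        set
    }

    /// True if every byte value in `self` also occurs in `other`.
    fn is_subset_of(&self, other: &ByteSet) -> bool {
        self.0.iter().zip(other.0.iter()).all(|(a, b)| a & !b == 0)
    }
}

/// Two-stage check (assumption, for illustration only):
/// 1. byte-set gate: skip the chunk if it lacks a byte the literal needs;
/// 2. literal search: run only when the gate passes.
fn chunk_may_contain(chunk: &str, literal: &str) -> bool {
    let required = ByteSet::from_bytes(literal.as_bytes());
    let present = ByteSet::from_bytes(chunk.as_bytes());
    required.is_subset_of(&present) && chunk.contains(literal)
}
```

The gate can only produce false positives, never false negatives, so skipping a chunk that fails it does not change the final result.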

Re-exports

pub use multiline::fragment_cache;
pub use engine::GpuPhase1Output;
pub use engine::CompiledScanner;
pub use engine::GpuInitPolicy;
pub use error::Result;
pub use error::ScanError;
pub use hw_probe::probe_hardware;
pub use hw_probe::select_backend;
pub use hw_probe::HardwareCaps;
pub use hw_probe::ScanBackend;
pub use types::ScannerConfig;

Modules

alphabet_filter
SIMD-accelerated, alphabet-based bitmask pre-filtering for fast chunk skipping.
aws
Offline AWS account-ID recovery from an access-key ID (no network/verify), plus canary-token classification.
bigram_bloom
Bigram bloom filter for fast chunk gating: Layer 0.5, between alphabet screening and AC/HS.
checksum
Service-specific credential checksum validation (GitHub, npm, Slack, etc.).
compiler
Compiles detector specifications into high-performance matching structures for the scanning engine.
confidence
Heuristic and ML-based confidence scoring for candidate matches. Combines multiple signals into a 0.0–1.0 score; a higher score means the match is more likely to be a real secret.
context
Structural context analysis: determines where in code a potential secret appears (comments, assignments, test files).
decode
Decode-through scanning for nested encodings (base64, hex, URL, etc.): strings are decoded before pattern matching.
decode_structure
Decode-structure analysis: classifies what a candidate base64/hex string decodes to (binary asset magic bytes, protobuf wire format) so that decode-through results feed into scoring.
engine
Core scanning engine implementation.
entropy
Shannon entropy analysis for distinguishing secrets from ordinary text.
entropy_fast
Fast scalar entropy calculation. Fast vectorized entropy calculation with architecture-specific implementations.
error
Specialized error types for the scanner engine.
gpu
GPU-accelerated matching via wgpu. GPU-accelerated batch inference for the MoE classifier via wgpu compute shaders.
hw_probe
Hardware capability detection and backend selection; probe results are computed once and cached.
jwt
JWT structural validation and anomaly detection.
ml_scorer
ML-based secret scoring: inference with a tiny mixture-of-experts network.
multiline
Multi-line secret reassembly: a preprocessor that concatenates strings split across lines.
pipeline
Internal scan pipeline orchestration: context windows, scan-loop helpers, and post-match processing.
prefix_trie
Prefix trie for efficient keyword propagation and literal-prefix extraction from detector regex patterns.
resolution
Match resolution and deduplication: when multiple detectors match the same region, only the most specific, highest-confidence match is kept.
scanner_config
Scanner configuration and scan state types.
static_intern
Static-string interner for the frozen set of detector-metadata strings, backed by vyre’s CHD perfect hash. Used by CompiledScanner to pre-intern detector metadata strings so that the per-scan ScanState interner is hit only by dynamic strings (file paths, commit SHAs).
telemetry
Lightweight per-scan telemetry: always-on counters plus opt-in --dogfood events.
testing
types
Shared internal types and constants for the scanning engine.
unicode_hardening
Unicode hardening: detects and normalizes Unicode evasion attacks, including homoglyphs.
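
As background for the entropy modules, Shannon entropy over a byte slice can be computed as in the sketch below. It is the textbook scalar formulation, not the crate's code; the name `shannon_entropy` and the choice to return 0.0 for empty input are assumptions made for illustration.

```rust
/// Shannon entropy of `data` in bits per byte.
/// Returns 0.0 for empty input (an assumption of this sketch).
fn shannon_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }

    // Histogram of byte values.
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }

    // H = -sum(p * log2(p)) over byte values that actually occur.
    let len = data.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}
```

A run of one repeated byte scores 0 bits, while a slice in which all 256 byte values are equally frequent reaches the maximum of 8 bits. Random-looking tokens tend to score higher than ordinary prose, which is why entropy helps separate secrets from surrounding text.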

Functions

compute_line_offsets
Compute line offsets for a block of text.
find_companion
Search for a companion pattern near a primary match.
floor_char_boundary
Find the largest char boundary <= index.
is_within_hex_context
Check if a match is within a hex-encoded context.
match_entropy
Measure the Shannon entropy of a byte slice.
match_line_number
Map a byte offset to a line number using pre-computed offsets.
normalize_chunk_data
Normalize scannable text by removing evasion characters and handling homoglyphs.
normalize_scannable_chunk
Pre-process a chunk of text for scanning.
should_suppress_known_example_credential
Check if a credential should be suppressed because it is a known example.
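
The offset helpers listed above (line offsets, offset-to-line lookup, char-boundary flooring) follow a common pattern, sketched below. The signatures of the crate's functions are not shown on this page, so `line_starts`, `line_number`, and `floor_boundary` are hypothetical stand-ins, and 1-based line numbering is an assumption.

```rust
/// Byte offset at which each line starts; the first line starts at 0.
fn line_starts(text: &str) -> Vec<usize> {
    let mut starts = vec![0];
    for (i, b) in text.bytes().enumerate() {
        if b == b'\n' {
            starts.push(i + 1);
        }
    }
    starts
}

/// 1-based line number containing byte `offset` (assumed convention).
/// `starts` is sorted, so a binary search suffices.
fn line_number(starts: &[usize], offset: usize) -> usize {
    // Number of line starts that are <= offset.
    starts.partition_point(|&s| s <= offset)
}

/// Largest char boundary <= `index`, clamped to the length of `text`.
fn floor_boundary(text: &str, index: usize) -> usize {
    if index >= text.len() {
        return text.len();
    }
    let mut i = index;
    // Offset 0 is always a boundary, so this loop terminates.
    while !text.is_char_boundary(i) {
        i -= 1;
    }
    i
}
```

Precomputing the line starts once per chunk makes each later lookup O(log n), and flooring to a char boundary keeps any slice taken at a match offset valid UTF-8.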