Crate keyhog_scanner

KeyHog Scanner: A high-performance, multi-layered secret detection engine.

This crate implements the core scanning logic, combining SIMD pre-filtering, Aho-Corasick literal matching, regex fallback, and ML-based confidence scoring.
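
The layering described above can be pictured as a cheap gate placed in front of a more expensive matcher. The following is a minimal sketch of that idea using only the standard library. It is not this crate's implementation: `byte_mask`, `may_contain`, `find_literal`, and `scan_chunk` are hypothetical names, the byte-presence mask stands in for the SIMD pre-filter, and the naive window search stands in for Aho-Corasick.

```rust
/// Hypothetical stage 1: a 256-bit presence mask of the bytes in a chunk.
fn byte_mask(chunk: &[u8]) -> [u64; 4] {
    let mut mask = [0u64; 4];
    for &b in chunk {
        mask[(b >> 6) as usize] |= 1u64 << (b & 63);
    }
    mask
}

/// True when every byte of `literal` occurs somewhere in the chunk.
/// A `false` result proves the literal is absent; `true` proves nothing.
fn may_contain(chunk_mask: &[u64; 4], literal: &[u8]) -> bool {
    let need = byte_mask(literal);
    need.iter().zip(chunk_mask).all(|(n, c)| n & c == *n)
}

/// Hypothetical stage 2: exact literal search (a stand-in for Aho-Corasick).
fn find_literal(chunk: &[u8], literal: &[u8]) -> Option<usize> {
    if literal.is_empty() {
        return None;
    }
    chunk.windows(literal.len()).position(|w| w == literal)
}

/// Run the cheap check first and search only for literals that survive it.
/// Returns (literal index, byte offset of first occurrence) pairs.
fn scan_chunk(chunk: &[u8], literals: &[&[u8]]) -> Vec<(usize, usize)> {
    let mask = byte_mask(chunk);
    let mut hits = Vec::new();
    for (id, lit) in literals.iter().enumerate() {
        if !may_contain(&mask, lit) {
            continue; // skipped without touching the chunk again
        }
        if let Some(pos) = find_literal(chunk, lit) {
            hits.push((id, pos));
        }
    }
    hits
}
```

In this sketch a chunk that lacks any byte of a literal never reaches the search stage for that literal, which is the point of placing a pre-filter ahead of the matcher. Candidates that survive both stages would then go on to regex validation and scoring.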

Re-exports

pub use multiline::fragment_cache;
pub use engine::GpuPhase1Output;
pub use engine::CompiledScanner;
pub use engine::GpuInitPolicy;
pub use error::Result;
pub use error::ScanError;
pub use hw_probe::probe_hardware;
pub use hw_probe::select_backend;
pub use hw_probe::HardwareCaps;
pub use hw_probe::ScanBackend;
pub use types::ScannerConfig;

Modules

alphabet_filter: SIMD-accelerated alphabet pre-filtering: a bitmask-based pre-filter for ultra-fast chunk skipping.
aws: Offline AWS account-ID recovery from an access-key ID (no network/verify), plus canary-token classification.
bigram_bloom: Bigram bloom filter for fast chunk gating (Layer 0.5, between alphabet screening and AC/HS).
checksum: Service-specific, checksum-aware credential validation (GitHub, npm, Slack, etc.).
compiler: Compiles detector specifications into high-performance matching structures for the scanning engine.
confidence: Heuristic and ML-based confidence scoring for candidate matches. Combines multiple signals into a 0.0–1.0 score; a higher score means the match is more likely to be a real secret.
context: Structural context analysis: determines where in code a potential secret appears (comments, assignments, test files).
decode: Decode-through scanning for nested encodings (base64, hex, URL, etc.): strings are decoded before pattern matching.
decode_structure: Decode-structure analysis: classifies what a candidate decodes to from base64/hex (binary asset magic bytes, protobuf wire) so that decode-through results feed into scoring.
engine: Core scan execution engine.
entropy: Shannon entropy analysis for distinguishing secrets from ordinary text.
entropy_fast: Fast vectorized entropy calculation with architecture-specific implementations.
error: Specialized error types for the scanner engine.
gpu: GPU acceleration via wgpu: matching, and batch inference for the MoE classifier using compute shaders.
hw_probe: Hardware capability probing and backend selection, with once-cached results.
jwt: JWT structural validation and anomaly detection.
ml_scorer: ML-based secret scoring: inference with a tiny mixture-of-experts network.
multiline: Multiline secret reassembly: a multi-line string concatenation preprocessor.
pipeline: Internal scan pipeline orchestration: context windows, scan-loop helpers, and post-match processing.
prefix_trie: Prefix trie for efficient literal-prefix extraction from detector regex patterns and for keyword propagation.
resolution: Match resolution and deduplication: when multiple detectors match the same region, only the most specific, highest-confidence match is kept.
scanner_config: Scanner configuration and scan state types.
static_intern: Static-string interner for the frozen detector-metadata universe, backed by vyre’s CHD perfect hash. Used by CompiledScanner to pre-intern detector metadata strings so the per-scan ScanState interner is hit only by dynamic strings (file paths, commit SHAs).
telemetry: Lightweight per-scan telemetry: always-on counters plus opt-in --dogfood events.
testing
types: Shared internal types and constants for the scanning engine.
unicode_hardening: Unicode hardening: detects and normalizes Unicode evasion attacks (normalization and homoglyph defense).
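
Several modules above rely on Shannon entropy to separate random-looking secrets from ordinary text. As an illustration of the quantity involved, here is a minimal scalar sketch of byte-level Shannon entropy, H = -Σ p(b) · log2 p(b). The function name `shannon_entropy` is hypothetical; this is not the crate's `entropy` or `entropy_fast` code, and any threshold applied to the result would be an assumption.

```rust
/// Shannon entropy of a byte slice, in bits per byte (between 0.0 and 8.0).
/// Hypothetical helper for illustration only.
fn shannon_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Histogram of byte values.
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let len = data.len() as f64;
    // Sum -p * log2(p) over the byte values that actually occur.
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}
```

A string made of one repeated byte scores 0 bits per byte, while four distinct bytes appearing once each score exactly 2. Randomly generated tokens tend to score higher than natural-language text of the same length, which is what makes entropy useful as one scoring signal among several.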

Functions

compute_line_offsets: Compute line offsets for a block of text.
find_companion: Search for a companion pattern near a primary match.
floor_char_boundary: Find the largest char boundary <= index.
is_within_hex_context: Check if a match is within a hex-encoded context.
match_entropy: Measure the Shannon entropy of a byte slice.
match_line_number: Map a byte offset to a line number using pre-computed offsets.
normalize_chunk_data: Normalize scannable text by removing evasion characters and handling homoglyphs.
normalize_scannable_chunk: Pre-process a chunk of text for scanning.
should_suppress_known_example_credential: Check if a credential should be suppressed because it is a known example.
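
The offset helpers listed above (line offsets, offset-to-line mapping, and char-boundary flooring) follow a common pattern that can be sketched with the standard library alone. The signatures of the crate's own functions are not shown on this page, so the names `line_starts`, `line_number`, and `floor_boundary` below are hypothetical and their behaviour is an assumption about what such helpers typically do.

```rust
/// Byte offset at which each line starts; the first line starts at 0.
fn line_starts(text: &str) -> Vec<usize> {
    let mut starts = vec![0];
    starts.extend(
        text.bytes()
            .enumerate()
            .filter(|&(_, b)| b == b'\n')
            .map(|(i, _)| i + 1), // the next line begins after the newline
    );
    starts
}

/// 1-based line number containing `offset`, found by binary search
/// over the pre-computed, sorted line starts.
fn line_number(starts: &[usize], offset: usize) -> usize {
    starts.partition_point(|&s| s <= offset)
}

/// Largest char boundary that is <= `index`, clamped to the text length,
/// so that slicing `&text[..result]` never splits a UTF-8 sequence.
fn floor_boundary(text: &str, index: usize) -> usize {
    let mut i = index.min(text.len());
    while !text.is_char_boundary(i) {
        i -= 1; // terminates: offset 0 is always a boundary
    }
    i
}
```

Computing the line starts once per chunk and binary-searching them per match keeps the lookup at O(log n) for each reported finding, instead of rescanning the text for newlines every time.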
