Crate keyhog_scanner


Two-phase secret scanning engine.

Phase 1 builds an Aho-Corasick automaton from literal prefixes extracted from detector regex patterns and runs a single O(n) pass over the input. Phase 2 confirms candidate regions with the full regex. Patterns without extractable prefixes fall back to sequential regex scanning.
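The two-phase flow described above can be sketched with the standard library alone. This is only an illustration under stated assumptions, not the crate's implementation: a naive per-detector substring search stands in for the Aho-Corasick automaton, a hand-written character check stands in for the full regex, and `Detector`, `Finding`, `scan`, and the `demo_`/`tok_` prefixes used with them are hypothetical names.

```rust
/// Hypothetical detector: a literal prefix plus a confirmation rule.
struct Detector {
    name: &'static str,
    /// Literal prefix that a real engine would extract from the regex.
    prefix: &'static str,
    /// Number of characters expected after the prefix.
    body_len: usize,
    /// Stand-in for the full regex: validates one body character.
    body_ok: fn(char) -> bool,
}

/// Hypothetical result type: which detector fired and on which byte range.
#[derive(Debug, PartialEq)]
struct Finding {
    detector: &'static str,
    start: usize,
    end: usize,
}

fn is_hex(c: char) -> bool {
    c.is_ascii_hexdigit()
}

fn is_alnum(c: char) -> bool {
    c.is_ascii_alphanumeric()
}

/// Phase 1 finds candidate offsets by literal prefix.
/// Phase 2 confirms each candidate with the more expensive check.
fn scan(input: &str, detectors: &[Detector]) -> Vec<Finding> {
    let mut findings = Vec::new();
    for d in detectors {
        // Phase 1 (naive stand-in): one substring search per detector.
        // An Aho-Corasick automaton would find all prefixes in one pass.
        for (start, _) in input.match_indices(d.prefix) {
            let body_start = start + d.prefix.len();
            // Phase 2: inspect only the candidate region after the prefix.
            let body: String = input[body_start..].chars().take(d.body_len).collect();
            if body.chars().count() == d.body_len && body.chars().all(d.body_ok) {
                findings.push(Finding {
                    detector: d.name,
                    start,
                    end: body_start + body.len(),
                });
            }
        }
    }
    findings.sort_by_key(|f| f.start);
    findings
}
```

The point of the split is that the cheap literal search rejects most of the input, so the costly confirmation step runs only on the few regions where a prefix actually occurs.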

Feature flags

  • ml — MoE ML classifier for confidence scoring (default: on)
  • entropy — Shannon entropy-based detection (default: on)
  • decode — Decode-through scanning: base64, hex, URL, HTML, MIME (default: on)
  • multiline — Multi-line concatenation joining (default: on)
  • gpu — GPU-accelerated batch ML inference (optional)

Additional layers: base64/hex decode-through, ML confidence scoring, structural context analysis, and multi-match resolution.
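As a rough illustration of the decode-through layer, the sketch below scans the raw text first and then the decoded form of every hex-looking token. It covers hex only, and the function names, the 8-character minimum token length, and the `marker` parameter are assumptions made for this example, not the crate's API.

```rust
/// Decode a hex string into bytes.
/// Returns None for odd-length or non-hex input.
fn decode_hex(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 || !s.is_ascii() {
        return None;
    }
    s.as_bytes()
        .chunks(2)
        .map(|pair| {
            let hi = (pair[0] as char).to_digit(16)?;
            let lo = (pair[1] as char).to_digit(16)?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}

/// Hypothetical helper: true if `marker` appears in the raw input or in
/// the decoded text of any hex token of at least 8 characters.
fn contains_marker_decode_through(input: &str, marker: &str) -> bool {
    if input.contains(marker) {
        return true;
    }
    input
        .split(|c: char| !c.is_ascii_hexdigit())
        .filter(|tok| tok.len() >= 8) // skip short runs that are unlikely to be encoded data
        .filter_map(decode_hex)
        .filter_map(|bytes| String::from_utf8(bytes).ok())
        .any(|decoded| decoded.contains(marker))
}
```

A real scanner would presumably run its full detector set on the decoded text instead of a single substring check; the substring check keeps the example short.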

Modules

confidence
Confidence scoring helpers that combine multiple heuristic signals into a 0.0–1.0 score. A higher score means the candidate is more likely to be a real secret.
context
Structural code-context inference: determines where in the code a potential secret appears and uses that to adjust confidence.
decode
Decode-through scanning helpers for layered encodings: decode base64 and hex strings before pattern matching.
entropy
Entropy-based fallback detection for unknown secret formats, using Shannon entropy to distinguish secrets from ordinary text.
ml_scorer
Embedded ML scorer, a tiny mixture-of-experts network, used to downrank likely placeholders and noise.
multiline
Multi-line preprocessor that joins string concatenations and line continuations before scanning.
prefix_trie
Prefix trie and prefix propagation tables for literal-prefix matching, used to extract literal prefixes efficiently from detector regex patterns.
resolution
Match-resolution helpers: when multiple detectors match the same region, keep only the most specific, highest-confidence match, suppressing lower-quality overlaps and duplicates.
simd
Vectorscan/Hyperscan SIMD regex backend for high-throughput scanning (optional, feature-gated).
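
The entropy module listed above relies on Shannon entropy, H = -Σ p_i · log2(p_i), computed over the symbol frequencies of a string. A minimal byte-level sketch follows; the function name `shannon_entropy` and the choice of bytes as symbols are assumptions for this example, and the module's own alphabet and thresholds are not documented here.

```rust
use std::collections::HashMap;

/// Shannon entropy of `s` in bits per byte.
/// Returns 0.0 for the empty string.
fn shannon_entropy(s: &str) -> f64 {
    if s.is_empty() {
        return 0.0;
    }
    // Count how often each byte value occurs.
    let mut counts: HashMap<u8, usize> = HashMap::new();
    for &b in s.as_bytes() {
        *counts.entry(b).or_insert(0) += 1;
    }
    let len = s.len() as f64;
    // Sum -p * log2(p) over the observed byte values.
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / len;
            -p * p.log2()
        })
        .sum()
}
```

Random-looking tokens spread their bytes evenly and score high, while repetitive text and most natural-language words score low, which is why entropy can serve as a fallback signal for formats no detector pattern covers.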

Structs

CompiledScanner
The compiled scanner: all detector patterns fused into a single Aho-Corasick automaton for prefiltering, backed by individual regexes for extraction.

Enums

ScanError
Errors returned while compiling detector patterns into a scanner.