Two-phase secret scanning engine.
Phase 1 builds an Aho-Corasick automaton from literal prefixes extracted from detector regex patterns and runs a single O(n) pass over the input. Phase 2 confirms candidate regions with the full regex. Patterns without extractable prefixes fall back to sequential regex scanning.
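The two phases can be illustrated with a minimal, std-only sketch. It is not this crate's implementation: `Detector`, `confirm_aws_key`, and `scan` are hypothetical names, `str::match_indices` stands in for the Aho-Corasick automaton (so it makes one pass per prefix rather than a single pass), and a handwritten check stands in for the full regex.

```rust
/// Hypothetical detector: a literal prefix plus a confirmation step.
struct Detector {
    name: &'static str,
    /// Literal prefix used by the phase 1 prefilter.
    prefix: &'static str,
    /// Stand-in for the full regex. Given text that starts at a prefix hit,
    /// returns the byte length of the confirmed match, if any.
    confirm: fn(&str) -> Option<usize>,
}

/// Handwritten stand-in for a pattern such as `AKIA[0-9A-Z]{16}`.
fn confirm_aws_key(s: &str) -> Option<usize> {
    let body = s.strip_prefix("AKIA")?;
    let n = body
        .bytes()
        .take_while(|b| b.is_ascii_uppercase() || b.is_ascii_digit())
        .count();
    // The prefix is 4 bytes and the pattern consumes exactly 16 more.
    if n >= 16 { Some(4 + 16) } else { None }
}

/// Returns `(detector name, matched text)` pairs in input order.
fn scan<'a>(input: &'a str, detectors: &[Detector]) -> Vec<(&'static str, &'a str)> {
    // Phase 1: cheap literal search collects candidate offsets.
    // A real automaton would find every prefix in one pass over `input`.
    let mut candidates: Vec<(usize, usize)> = Vec::new(); // (offset, detector index)
    for (i, d) in detectors.iter().enumerate() {
        for (pos, _) in input.match_indices(d.prefix) {
            candidates.push((pos, i));
        }
    }
    candidates.sort();

    // Phase 2: run the expensive confirmation only at candidate offsets.
    let mut found = Vec::new();
    for (pos, i) in candidates {
        let d = &detectors[i];
        if let Some(len) = (d.confirm)(&input[pos..]) {
            found.push((d.name, &input[pos..pos + len]));
        }
    }
    found
}
```

Calling `scan` on `"id=AKIAIOSFODNN7EXAMPLE note=AKIA-short"` with a single detector built from `confirm_aws_key` yields one match: the second `AKIA` hit is a phase 1 candidate that phase 2 rejects. The benefit of the split is that the costly check runs only where a prefix was seen, not at every offset.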
Feature flags
- ml: MoE ML classifier for confidence scoring (default: on)
- entropy: Shannon entropy-based detection (default: on)
- decode: Decode-through scanning: base64, hex, URL, HTML, MIME (default: on)
- multiline: Multi-line concatenation joining (default: on)
- gpu: GPU-accelerated batch ML inference (optional)
Additional layers: base64/hex decode-through, ML confidence scoring, structural context analysis, and multi-match resolution.
Modules
- confidence
- Confidence scoring helpers that combine multiple heuristic signals into a 0.0–1.0 score. A higher score means the match is more likely to be a real secret.
- context
- Structural code-context inference: determines where in the code a potential secret appears and uses that to adjust confidence.
- decode
- Decode-through scanning helpers for layered encodings: base64 and hex strings are decoded before pattern matching.
- entropy
- Entropy-based fallback detection for unknown secret formats, using Shannon entropy to distinguish secrets from ordinary text.
- ml_scorer
- Embedded ML scorer, a tiny mixture-of-experts network, used to downrank likely placeholders and noise.
- multiline
- Multi-line preprocessing for string concatenation and line continuations.
- prefix_trie
- Prefix propagation tables for literal-prefix matching. Prefix trie for efficient literal prefix extraction from detector regex patterns.
- resolution
- Match-resolution helpers for suppressing lower-quality overlaps: when multiple detectors match the same region, only the most specific, highest-confidence match is kept, which eliminates duplicates.
- simd
- Vectorscan/Hyperscan SIMD regex backend for high-throughput scanning (optional, feature-gated).
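As an illustration of the Shannon entropy measure that the entropy module is described as using, the following is a minimal std-only sketch. The function name `shannon_entropy` and its signature are assumptions made for this example; the module's actual API is not shown on this page.

```rust
/// Shannon entropy of `data` in bits per byte: H = -sum(p(b) * log2(p(b))).
/// Hypothetical helper for illustration, not the crate's API.
fn shannon_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Count how often each byte value occurs.
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let len = data.len() as f64;
    // Sum the contribution of every byte value that actually appears.
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}
```

A string made of one repeated byte scores 0 bits per byte, while a string of 16 distinct bytes scores 4. Randomly generated keys tend toward the high end of that range and ordinary identifiers toward the low end, which is why entropy can serve as a fallback signal when no detector pattern applies. Any cutoff between the two is a tuning choice and is not specified here.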
Structs
- CompiledScanner
- The compiled scanner: all detector patterns fused into a single Aho-Corasick automaton for prefiltering, backed by individual regexes for extraction.
Enums
- ScanError
- Errors returned while compiling detector patterns into a scanner.