Shared internal types and constants for the scanner engine.
Re-exports
- pub use crate::scanner_config::ScanState;
- pub use crate::scanner_config::ScannerConfig;
- pub use crate::scanner_config::MlPendingMatch;
Structs
- CompiledCompanion - An optional compiled companion pattern for a detector.
- CompiledPattern - A compiled entry: one pattern from one detector. The regex is compiled lazily on first use; see LazyRegex.
- LazyRegex - A detector pattern whose Regex is compiled on first use, not at load.
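The compile-on-first-use behaviour described for LazyRegex can be sketched with `std::sync::OnceLock`. This is a minimal illustration, not the crate's implementation: `LazyPattern`, `Compiled`, `compile`, and `COMPILE_CALLS` are hypothetical names, and the stand-in `compile` step only copies the pattern source so that the sketch needs no external regex crate.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the stand-in compile step runs (for illustration only).
static COMPILE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a compiled regex; it just stores the source.
struct Compiled(String);

/// Hypothetical "expensive" compile step. A real scanner would build a
/// regex here; this sketch only records that the work happened.
fn compile(source: &str) -> Compiled {
    COMPILE_CALLS.fetch_add(1, Ordering::SeqCst);
    Compiled(source.to_string())
}

/// A pattern whose compiled form is created on first use, not at load.
struct LazyPattern {
    source: String,
    compiled: OnceLock<Compiled>,
}

impl LazyPattern {
    /// Loading a pattern stores only its source text.
    fn new(source: &str) -> Self {
        Self {
            source: source.to_string(),
            compiled: OnceLock::new(),
        }
    }

    /// Returns the compiled form, compiling at most once.
    fn get(&self) -> &Compiled {
        self.compiled.get_or_init(|| compile(&self.source))
    }

    /// True once the first `get` call has compiled the pattern.
    fn is_compiled(&self) -> bool {
        self.compiled.get().is_some()
    }
}
```

With this shape, detectors whose patterns never reach the matching stage never pay the compile cost, and `OnceLock` keeps the first-use initialization safe when several threads call `get` concurrently.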
Constants
- FIRST_CAPTURE_GROUP_INDEX
- FIRST_LINE_NUMBER
- FULL_MATCH_INDEX - Minimum Aho-Corasick (AC) literal prefix length. Shorter prefixes (e.g., “1”, “x”, “_”) match too many positions and degrade Aho-Corasick throughput.
- HEX_CONTEXT_RADIUS_CHARS - How many characters around a hex match to inspect for structural context (assignment operators, quotes, keywords).
- LARGE_FALLBACK_SCAN_THRESHOLD
- MAX_HEX_CONTEXT_SEPARATORS - Maximum non-hex separators (colons, dashes) tolerated within a hex context window before the match is treated as a non-hex string.
- MAX_ML_CACHE_BYTES
- MAX_ML_CACHE_ENTRIES
- MAX_SCAN_CHUNK_BYTES - Maximum bytes scanned in a single chunk. Files larger than this are split into overlapping windows. 1 MiB keeps peak RSS predictable under parallel scanning with rayon (N threads × 1 MiB per chunk = bounded memory).
- MAX_WINDOW_DEDUP_ENTRIES - Hard cap on the dedup set to prevent unbounded memory growth when scanning repositories with millions of duplicate credential-like strings.
- MIN_FALLBACK_LINE_LENGTH - Minimum line length considered for fallback pattern scanning. Lines shorter than 8 bytes cannot contain a credential prefix plus a meaningful secret.
- MIN_HEX_CONTEXT_DIGITS - Minimum hex digits required in the context window around a match to trigger hex-aware false-positive suppression.
- MIN_HEX_DIGITS_IN_MATCH
- MIN_HEX_MATCH_LEN - Minimum length for a standalone hex string to qualify as a potential secret. Shorter hex runs (e.g., CSS colors like #ff00ff) are too common.
- MIN_LITERAL_PREFIX_CHARS
- ML_CONTEXT_RADIUS_LINES
- PREVIOUS_LINE_DISTANCE
- REGEX_SIZE_LIMIT_BYTES - Default per-regex AST + lazy-DFA-cache size limit. 1 MiB is large enough for complex detectors while preventing pathological patterns from consuming unbounded memory during regex compilation.
- WINDOW_OVERLAP_BYTES - Overlap between adjacent scan windows when a file exceeds MAX_SCAN_CHUNK_BYTES. Must be larger than the longest secret the scanner can detect to avoid missing secrets that straddle a chunk boundary. 128 KiB covers PEM-encoded RSA-8192 keys, large JWTs, and multi-line concatenated secrets with generous margin.
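How MAX_SCAN_CHUNK_BYTES and WINDOW_OVERLAP_BYTES might interact can be sketched as follows. The two constant values (1 MiB and 128 KiB) come from the descriptions above; the `scan_windows` function and its exact splitting rule are assumptions made for illustration, since this page does not show how the scanner actually computes its windows.

```rust
/// 1 MiB, as stated in the MAX_SCAN_CHUNK_BYTES description.
const MAX_SCAN_CHUNK_BYTES: usize = 1024 * 1024;
/// 128 KiB, as stated in the WINDOW_OVERLAP_BYTES description.
const WINDOW_OVERLAP_BYTES: usize = 128 * 1024;

/// Hypothetical helper: splits `len` bytes into half-open `(start, end)`
/// windows of at most `chunk` bytes, where each window starts `overlap`
/// bytes before the previous one ends.
fn scan_windows(len: usize, chunk: usize, overlap: usize) -> Vec<(usize, usize)> {
    // The overlap must be smaller than the chunk or the scan never advances.
    assert!(overlap < chunk, "overlap must be smaller than the chunk size");
    let mut windows = Vec::new();
    if len == 0 {
        return windows;
    }
    let step = chunk - overlap;
    let mut start = 0;
    loop {
        let end = (start + chunk).min(len);
        windows.push((start, end));
        if end == len {
            break;
        }
        // Advance by less than a full chunk so adjacent windows share
        // `overlap` bytes and a secret straddling a boundary stays whole
        // in at least one window.
        start += step;
    }
    windows
}
```

Under these assumptions, a 2 MiB input yields three windows, and any secret no longer than the overlap appears intact in at least one of them. Matches found inside the shared region can be reported twice, which is presumably the kind of duplicate the dedup set capped by MAX_WINDOW_DEDUP_ENTRIES absorbs.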
Functions
- regex_dfa_limit - The effective per-regex DFA size limit: the override if set, else the compiled default REGEX_SIZE_LIMIT_BYTES.
- set_regex_dfa_limit - Override the per-regex DFA size limit for this process. Call before scanning. 0 resets to the compiled default. Tier-A config knob (default → TOML → CLI).
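The override semantics described for these two functions can be sketched with a process-wide atomic. The function names and the 1 MiB default come from this page; the `usize` signatures, the `OVERRIDE` static, and the use of 0 as the internal "unset" marker are assumptions, not the crate's confirmed implementation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Compiled default: 1 MiB, per the REGEX_SIZE_LIMIT_BYTES description.
const REGEX_SIZE_LIMIT_BYTES: usize = 1024 * 1024;

/// Hypothetical process-wide override; 0 means "no override set".
static OVERRIDE: AtomicUsize = AtomicUsize::new(0);

/// Sets the per-regex DFA size limit for this process.
/// Passing 0 resets to the compiled default.
pub fn set_regex_dfa_limit(bytes: usize) {
    OVERRIDE.store(bytes, Ordering::Relaxed);
}

/// Returns the override if one is set, else the compiled default.
pub fn regex_dfa_limit() -> usize {
    match OVERRIDE.load(Ordering::Relaxed) {
        0 => REGEX_SIZE_LIMIT_BYTES,
        bytes => bytes,
    }
}
```

In this sketch a caller would invoke `set_regex_dfa_limit` once at startup, after resolving the TOML and CLI layers, so that every pattern compiled afterwards sees the same limit. That matches the "call before scanning" guidance, since patterns already compiled would not pick up a later change.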