pub struct ScannerConfig {
    pub max_decode_depth: usize,
    pub validate_decode: bool,
    pub entropy_enabled: bool,
    pub entropy_threshold: f64,
    pub entropy_in_source_files: bool,
    pub entropy_ml_authoritative: bool,
    pub generic_keyword_low_entropy: bool,
    pub ml_enabled: bool,
    pub ml_weight: f64,
    pub min_confidence: f64,
    pub unicode_normalization: bool,
    pub max_decode_bytes: usize,
    pub max_matches_per_chunk: usize,
    pub scan_comments: bool,
    pub multiline: MultilineConfig,
    pub known_prefixes: Vec<String>,
    pub secret_keywords: Vec<String>,
    pub test_keywords: Vec<String>,
    pub placeholder_keywords: Vec<String>,
    pub penalize_test_paths: bool,
}
Configuration for the scanner’s decoding and processing heuristics.
Fields
max_decode_depth: usize
Maximum recursion depth for decode-through (base64, hex, etc.)
validate_decode: bool
Validate decoded strings (e.g. check if decoded base64 is UTF-8)
entropy_enabled: bool
Enable entropy-based detection
entropy_threshold: f64
Threshold for entropy-based detection
entropy_in_source_files: bool
Enable entropy-based detection in source code files
entropy_ml_authoritative: bool
Route entropy-fallback candidates through the MoE with the model
AUTHORITATIVE (no entropy-magnitude floor) instead of the bare entropy
heuristic. Mirrors keyhog_core::config::ScanConfig::entropy_ml_authoritative
and the CLI --no-entropy-ml-scoring opt-out. No-op unless both
entropy_enabled and ml_enabled are set. See apply_ml_batch_scores
and scan_entropy_fallback.
generic_keyword_low_entropy: bool
Admit generic keyword-bridge values (PASSWORD=, *_PASS=, secret:,
api_key= …) on the relaxed generic-keyword-secret entropy floor
instead of the high generic-secret floor. Mirrors
keyhog_core::config::ScanConfig::generic_keyword_low_entropy and the CLI
--no-keyword-low-entropy opt-out. The keyword key is the evidence;
precision is carried by the MoE + shape filters. See
scan_generic_assignments.
ml_enabled: bool
Enable ML-based confidence scoring
ml_weight: f64
ML weight for confidence scoring, 0.0-1.0
min_confidence: f64
Minimum confidence threshold for matches
unicode_normalization: bool
Enable Unicode normalization
max_decode_bytes: usize
Maximum bytes for decode-through processing
max_matches_per_chunk: usize
Maximum matches to collect per chunk before stopping. Prevents OOM on extremely noisy files.
scan_comments: bool
When true, credentials inside source-code comments are
treated as first-class findings (no confidence downgrade,
no comment-context multiplier). Mirrors
keyhog_core::config::ScanConfig::scan_comments and the
CLI’s --scan-comments flag. See that field’s doc for why
the default is off.
multiline: MultilineConfig
Configuration for multiline concatenation
known_prefixes: Vec<String>
Known secret prefixes used to boost confidence.
secret_keywords: Vec<String>
Keywords indicating a secret context (e.g. “api_key”, “token”).
test_keywords: Vec<String>
Keywords indicating a test/mock context (e.g. “test”, “fake”).
placeholder_keywords: Vec<String>
Keywords indicating a placeholder value (e.g. “change_me”, “todo”).
penalize_test_paths: bool
Apply test/example path confidence and hard-suppression heuristics.
The CLI disables this for --no-suppress-test-fixtures.
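To make the entropy_enabled / entropy_threshold pair concrete, here is a minimal sketch of a threshold gate. It assumes Shannon entropy over bytes, which this page does not confirm as the scanner's actual measure; shannon_entropy, passes_entropy_gate, and the 3.5 threshold are hypothetical names and values chosen for illustration.

```rust
use std::collections::HashMap;

/// Shannon entropy of `s` in bits per byte (0.0 for an empty string).
/// Assumption: the real scanner may use a different entropy measure.
fn shannon_entropy(s: &str) -> f64 {
    if s.is_empty() {
        return 0.0;
    }
    // Count how often each byte value occurs.
    let mut counts: HashMap<u8, usize> = HashMap::new();
    for b in s.bytes() {
        *counts.entry(b).or_insert(0) += 1;
    }
    let len = s.len() as f64;
    // Sum -p * log2(p) over the observed byte frequencies.
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical gate: a candidate passes only when entropy detection is
/// enabled and its entropy reaches the configured threshold.
fn passes_entropy_gate(candidate: &str, entropy_enabled: bool, entropy_threshold: f64) -> bool {
    entropy_enabled && shannon_entropy(candidate) >= entropy_threshold
}

fn main() {
    let threshold = 3.5; // illustrative value, not the crate's default
    for candidate in ["aaaaaaaaaaaaaaaa", "q9Zr7Lx2Vb8Tn4Kd"] {
        println!(
            "{candidate}: flagged = {}",
            passes_entropy_gate(candidate, true, threshold)
        );
    }
}
```

A repeated-character string carries no entropy and stays below any positive threshold, while a string of 16 distinct characters reaches 4 bits per byte and passes the illustrative 3.5 gate.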
Implementations
impl ScannerConfig
pub const HIGH_PRECISION_MIN_CONFIDENCE: f64 = 0.85
Confidence floor for ScannerConfig::high_precision. Distinct from the
canonical ScanConfig::default() floor (0.40) on purpose: precision mode
trades recall for a near-zero false-positive rate at mass-scan scale.
pub fn fast() -> Self
pub fn thorough() -> Self
pub fn high_precision() -> Self
High-precision mass-scan preset: minimise false positives at the cost of some recall, for scanning huge corpora where every FP is expensive to triage. Fully offline (no entropy sweep, shallow decode); ML scoring stays enabled.
- entropy_enabled = false: generic high-entropy matching is the single largest FP source; precision mode drops it entirely.
- ml_enabled = true (inherited): ML is the confidence discriminator that lifts genuine secrets over the high floor while leaving FP-shaped tokens below it. Disabling it would crater the scores the 0.85 bar relies on, so precision mode keeps ML. This mode trades recall for precision, not for speed; use --fast when speed is the goal.
- min_confidence = HIGH_PRECISION_MIN_CONFIDENCE (0.85): combined with the engine’s checksum policy (valid token → floored at 0.9, invalid → capped at 0.1) and clamped over every detector’s self-declared floor, this bar admits checksum-validated tokens and strong ML-scored findings while dropping checksum failures and weak-signal matches.
- max_decode_depth = 1: deep-decoded payloads are a FP source at scale.
penalize_test_paths stays on (the default) to suppress fixture-shaped
hits. A --min-confidence override still layers on top of this preset.
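The interaction between the preset, the checksum policy, and a later min_confidence override can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: SketchConfig is a stand-in with only the fields the preset is documented to touch, the default values marked "assumed" are guesses, and apply_checksum_policy and is_reported are hypothetical helpers.

```rust
const HIGH_PRECISION_MIN_CONFIDENCE: f64 = 0.85;

/// Stand-in for ScannerConfig, reduced to the fields the preset mentions.
#[derive(Debug, Clone, PartialEq)]
struct SketchConfig {
    entropy_enabled: bool,
    ml_enabled: bool,
    min_confidence: f64,
    max_decode_depth: usize,
    penalize_test_paths: bool,
}

impl Default for SketchConfig {
    fn default() -> Self {
        Self {
            entropy_enabled: true,     // assumed default
            ml_enabled: true,          // documented as inherited by the preset
            min_confidence: 0.40,      // assumed to match the ScanConfig::default() floor
            max_decode_depth: 4,       // assumed default
            penalize_test_paths: true, // documented default
        }
    }
}

impl SketchConfig {
    /// Override only the documented fields; inherit everything else.
    fn high_precision() -> Self {
        Self {
            entropy_enabled: false,
            min_confidence: HIGH_PRECISION_MIN_CONFIDENCE,
            max_decode_depth: 1,
            ..Self::default()
        }
    }

    /// Builder-style override that layers on top of any preset.
    fn min_confidence(mut self, min_confidence: f64) -> Self {
        self.min_confidence = min_confidence;
        self
    }
}

/// Checksum policy as described: valid -> at least 0.9, invalid -> at most 0.1.
/// `None` models a detector without a checksum (score left untouched).
fn apply_checksum_policy(score: f64, checksum_valid: Option<bool>) -> f64 {
    match checksum_valid {
        Some(true) => score.max(0.9),
        Some(false) => score.min(0.1),
        None => score,
    }
}

fn is_reported(score: f64, checksum_valid: Option<bool>, cfg: &SketchConfig) -> bool {
    apply_checksum_policy(score, checksum_valid) >= cfg.min_confidence
}

fn main() {
    let cfg = SketchConfig::high_precision();
    println!("valid checksum, raw 0.50: {}", is_reported(0.50, Some(true), &cfg));
    println!("failed checksum, raw 0.95: {}", is_reported(0.95, Some(false), &cfg));
    println!("no checksum, raw 0.60: {}", is_reported(0.60, None, &cfg));
    let stricter = SketchConfig::high_precision().min_confidence(0.95);
    println!("stricter floor: {}", stricter.min_confidence);
}
```

Under these assumptions a checksum-valid token clears the 0.85 bar regardless of its raw score, a checksum failure never does, and findings without a checksum rely entirely on their ML-backed score.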
pub fn min_confidence(self, min_confidence: f64) -> Self
pub fn sanitise(&mut self)
Clamp every float field into its valid range and replace any
NaN with a safe default. Without this, a user-supplied
--min-confidence=-5.0 or a corrupt config TOML feeding
min_confidence = nan would reach the confidence-comparison
path unchecked. NaN is the dangerous case: every ordered
comparison against NaN is false, so conf < min_confidence and
conf >= min_confidence are both false, and findings are silently
dropped or silently kept depending on which one the call site uses.
Idempotent - sanitising an already-sane config is a no-op.
Called inside From<ScanConfig> so any path that constructs
a ScannerConfig from a user-influenced source pays this
once at config-build time.
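The NaN hazard and the clamp-or-default repair described above can be sketched as follows. FloatFields and sanitise_float are hypothetical stand-ins, and the ranges and fallback defaults (other than the documented 0.0-1.0 range for ml_weight) are assumptions made for illustration.

```rust
/// Hypothetical stand-in holding only the float fields listed on the struct.
#[derive(Debug, Clone, PartialEq)]
struct FloatFields {
    entropy_threshold: f64,
    ml_weight: f64,
    min_confidence: f64,
}

/// Replace NaN with `default`, otherwise clamp into `[lo, hi]`.
/// The explicit NaN check matters: `f64::clamp` returns NaN for a NaN input.
fn sanitise_float(value: f64, lo: f64, hi: f64, default: f64) -> f64 {
    if value.is_nan() {
        default
    } else {
        value.clamp(lo, hi)
    }
}

impl FloatFields {
    fn sanitise(&mut self) {
        // Ranges and defaults are assumptions, not the crate's actual values.
        self.entropy_threshold = sanitise_float(self.entropy_threshold, 0.0, 8.0, 4.5);
        self.ml_weight = sanitise_float(self.ml_weight, 0.0, 1.0, 0.5);
        self.min_confidence = sanitise_float(self.min_confidence, 0.0, 1.0, 0.40);
    }
}

fn main() {
    let conf = 0.99_f64;
    let nan = f64::NAN;
    // Both ordered comparisons against NaN evaluate to false.
    println!("conf < NaN: {}, conf >= NaN: {}", conf < nan, conf >= nan);

    let mut cfg = FloatFields {
        entropy_threshold: 4.5,
        ml_weight: 2.0,
        min_confidence: f64::NAN,
    };
    cfg.sanitise();
    let once = cfg.clone();
    cfg.sanitise(); // second pass changes nothing
    println!("{cfg:?}, idempotent: {}", cfg == once);
}
```

Because the output of one pass already lies inside every range and contains no NaN, a second pass leaves it unchanged, which is the idempotence property the documentation states.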
Trait Implementations
impl Clone for ScannerConfig
fn clone(&self) -> ScannerConfig
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for ScannerConfig
impl Default for ScannerConfig
impl From<ScanConfig> for ScannerConfig
fn from(config: ScanConfig) -> Self
Auto Trait Implementations
impl Freeze for ScannerConfig
impl RefUnwindSafe for ScannerConfig
impl Send for ScannerConfig
impl Sync for ScannerConfig
impl Unpin for ScannerConfig
impl UnsafeUnpin for ScannerConfig
impl UnwindSafe for ScannerConfig
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.