Module testing

Re-exports

pub use crate::compiler::rewrite_alternation_prefix;
pub use crate::compiler::split_leading_inline_flag;
pub use crate::confidence::penalties::finalize_confidence;
pub use crate::engine::boundary::scan_chunk_boundaries;
pub use crate::engine::gpu_regex_dfa::extract_literal_core;
pub use crate::entropy::keywords::looks_like_program_identifier;
pub use crate::ml_scorer::compute_features_with_config;
pub use crate::static_intern::seed_source_type_count;
pub use crate::decode::caesar::caesar_shift;
pub use crate::decode::caesar::is_source_code_path;
pub use crate::decode::caesar::looks_credential_shaped;
pub use crate::decode::caesar::CaesarDecoder;
pub use crate::decode::hex::find_hex_strings;
pub use crate::decode::reverse::looks_reversible;
pub use crate::decode::reverse::reverse_str;
pub use crate::decode::reverse::ReverseDecoder;
pub use crate::decode::util::take_hex_digits;
pub use crate::gpu::env_no_gpu;
pub use crate::gpu::is_ci_environment;

Modules

ascii_ci
compiler_prefix
entropy_keywords
Internal prose/decoy/strict-secret predicates, exposed for the unit tests migrated out of src/entropy/keywords.rs (KH-GAP-004).
entropy_scanner
Internal entropy shape-classification predicates, exposed for the canonical-shape unit tests migrated out of src/entropy/scanner.rs (KH-GAP-004). credential_keyword_context builds the production credential anchor so tests need not know the private tuning constants.
shape

Structs

HsScanner
Compiled Hyperscan databases for all detector patterns, sharded across cores at compile time.
ProbabilisticGate
A tiny statistical gate for fast candidate rejection.

Constants

HOT_PATTERNS
Common high-value secret prefixes that trigger Layer 1 SIMD.
HOT_PATTERN_DETECTOR_IDS
Canonical detector_id per hot pattern: the id of the named detector that the fast path represents, so scan output (JSON/SARIF/text/baselines) is identical regardless of which engine path found the match. sq0csp- keeps hot-square_secret because no standalone square-secret detector exists yet, so it is genuinely fast-path-only (keyhog explain documents this). The ids are static strings (not built with format! per match), which preserves the perf audit's removal of the per-hit allocation.
HOT_PATTERN_DISPLAY_NAMES
Canonical human-readable detector name per hot pattern (matches the name field of the corresponding detectors/*.toml). Square has no canonical detector, so it carries a plain “Square Secret” label.
HOT_PATTERN_NAMES
service field per hot pattern: the CANONICAL service of the detector this fast path stands in for, NOT an internal *_key label. The hot path is a perf optimization, not a distinct detector: a leaked AKIA… is an aws-access-key finding however the engine found it. Before 2026-05-29 these were aws_key/github_pat/…, so the SAME secret surfaced as hot-aws_key with service aws_key on Linux (Hyperscan path) but as aws-access-key with service aws on macOS/Windows (portable, no hot path), a cross-platform id divergence. Emitting the canonical identity here makes all platforms agree and matches what keyhog explain already resolves hot ids to. Index-parallel with HOT_PATTERNS and the other two HOT_PATTERN_* arrays.

Functions

attribute_matches_to_chunks
Attribute each global GPU match to its source chunk using the coalesce-entry table (chunk_index, offset, len). Matches that straddle a chunk boundary are dropped (the coalesce separator makes a true cross-chunk hit impossible; this skip is the safety net for any pid > total_patterns smuggled through).
calculate_shannon_entropy
Shannon entropy of chunk in bits/byte.
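A minimal sketch of a bits-per-byte Shannon entropy computation, included only to illustrate the quantity described above. The function name `shannon_entropy_sketch` and its signature are hypothetical; the crate's actual `calculate_shannon_entropy` may differ in signature, numeric type, and edge-case handling (for example, what it returns for an empty chunk).

```rust
/// Shannon entropy of `data` in bits per byte.
/// Hypothetical stand-in, not the crate's implementation.
/// Assumption: an empty input yields 0.0. The result is at most 8.0.
pub fn shannon_entropy_sketch(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Histogram of byte values.
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let len = data.len() as f64;
    // H = -sum(p * log2(p)) over the byte values that actually occur.
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}
```

A chunk made of one repeated byte scores 0 bits/byte, while a chunk in which all 256 byte values are equally frequent scores the maximum of 8.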
fold_overlapping_same_pid_inplace
Sort by (pid, start, end), fold same-pid overlapping spans, then re-sort by start. The downstream chunk-attribution walk expects matches in start-ascending order; the per-pid fold collapses the duplicate (pid, start, end) triples that subgroup-ballot can emit when a hit straddles a workgroup boundary.
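The sort, fold, re-sort sequence described above can be sketched as follows. Everything here is an assumption made for illustration: the `SpanMatch` type, its field names and widths, the half-open `[start, end)` span convention, and the tie-break used in the final sort. The crate's real match type and overlap rule are not shown on this page.

```rust
/// Hypothetical match record; the crate's real type is not shown here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SpanMatch {
    pub pid: u32,
    pub start: u32,
    pub end: u32, // assumed exclusive: the span is [start, end)
}

/// Sketch of an in-place fold of overlapping spans that share a pid.
pub fn fold_overlapping_sketch(matches: &mut Vec<SpanMatch>) {
    // 1. Group by pid, ordered by span within each pid.
    matches.sort_unstable_by_key(|m| (m.pid, m.start, m.end));

    // 2. Compact in place: merge each span into the previously kept span
    //    when both have the same pid and they overlap.
    let mut write = 0usize;
    for read in 0..matches.len() {
        let cur = matches[read];
        if write > 0 {
            let prev = &mut matches[write - 1];
            if prev.pid == cur.pid && cur.start < prev.end {
                prev.end = prev.end.max(cur.end);
                continue;
            }
        }
        matches[write] = cur;
        write += 1;
    }
    matches.truncate(write);

    // 3. Restore start-ascending order for the downstream walk.
    //    The (end, pid) tie-break is an assumption added for determinism.
    matches.sort_by_key(|m| (m.start, m.end, m.pid));
}
```

Exact duplicate `(pid, start, end)` triples are a special case of overlap, so they collapse into a single entry in step 2.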
gpu_phase2_hits_are_dense
Large many-file batches with dense literal-prefix output are pathological for the two-phase literal GPU path: phase 1 is fast, but phase 2 has to confirm too many broad detector prefixes on CPU. Rerouting that batch through the existing SIMD coalesced scanner preserves the finding contract and avoids turning permissive prefixes into thousands of whole-chunk regex confirmations.
hash_fast
FNV-1a hash of data. Non-cryptographic; used as a content key for dedup and memoization across the scanner. Define the seed and prime here only: every cache that keys on this hash depends on the value being identical.
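For reference, a sketch of the standard 64-bit FNV-1a algorithm. The 64-bit width is an assumption (this page does not state the return type of `hash_fast`), and `fnv1a_64_sketch` is a hypothetical name. The constants are the published 64-bit FNV offset basis and prime, which may or may not match the seed and prime the crate uses.

```rust
/// Published 64-bit FNV offset basis and prime.
/// Assumption: the crate's `hash_fast` may use other values or widths.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Standard 64-bit FNV-1a: XOR each byte into the state, then multiply.
pub fn fnv1a_64_sketch(data: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &byte in data {
        hash ^= u64::from(byte);
        // Wrapping multiply: overflow is part of the algorithm.
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}
```

Because the output depends on both constants, changing either one silently changes every key derived from the hash, which is why keeping them in a single place matters.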
looks_like_standard_base64_blob
True if credential is a standard-base64-encoded arbitrary-bytes blob (protobuf wire format, marshalled binary, etc.) rather than a credential token.
memoize_by_hash
Look up key in a thread-local HashMap<u64, T>, computing and inserting the value via compute on a miss.
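One way such a helper could be shaped, as a sketch under stated assumptions: the cache is passed in as a `&'static LocalKey<RefCell<HashMap<u64, T>>>`, values are returned by clone, and `memoize_sketch` and `LEN_CACHE` are hypothetical names. The real `memoize_by_hash` signature is not shown on this page and may differ (it could, for instance, be macro-based).

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::thread::LocalKey;

/// Sketch of hash-keyed memoization over a thread-local map.
/// Returns the cached value on a hit; on a miss, runs `compute`,
/// stores the result, and returns it.
pub fn memoize_sketch<T: Clone + 'static>(
    cache: &'static LocalKey<RefCell<HashMap<u64, T>>>,
    key: u64,
    compute: impl FnOnce() -> T,
) -> T {
    if let Some(hit) = cache.with(|c| c.borrow().get(&key).cloned()) {
        return hit;
    }
    // Compute outside any RefCell borrow so that `compute` may itself
    // touch the same cache without causing a borrow panic.
    let value = compute();
    cache.with(|c| c.borrow_mut().insert(key, value.clone()));
    value
}

thread_local! {
    /// Hypothetical example cache, one independent map per thread.
    static LEN_CACHE: RefCell<HashMap<u64, usize>> = RefCell::new(HashMap::new());
}
```

A call such as `memoize_sketch(&LEN_CACHE, key, || expensive())` runs the closure only the first time a given key is seen on the current thread. Being thread-local, the map needs no locking, at the cost of each thread warming its own cache.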
parse_docker_compose
Parse docker-compose.yml environment blocks.
parse_env
Parse KEY=VALUE lines from an .env file.
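A rough sketch of KEY=VALUE extraction, to make the input shape concrete. The handling of comments, blank lines, an `export ` prefix, and surrounding quotes shown here is assumed for illustration; the page does not say which of these `parse_env` supports, nor what type it returns. `parse_env_sketch` and `strip_matching_quotes` are hypothetical names.

```rust
/// Sketch: collect (key, value) pairs from `.env`-style text.
/// Assumptions: `#` comment lines and blank lines are skipped, a leading
/// `export ` is ignored, and one pair of matching quotes is removed.
pub fn parse_env_sketch(text: &str) -> Vec<(String, String)> {
    let mut pairs = Vec::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let line = line.strip_prefix("export ").unwrap_or(line);
        // Split on the first '=' only, so values may contain '='.
        if let Some((key, value)) = line.split_once('=') {
            let key = key.trim();
            if key.is_empty() {
                continue;
            }
            let value = strip_matching_quotes(value.trim());
            pairs.push((key.to_string(), value.to_string()));
        }
    }
    pairs
}

/// Remove one pair of matching single or double quotes, if present.
fn strip_matching_quotes(value: &str) -> &str {
    for quote in ['"', '\''] {
        if value.len() >= 2 && value.starts_with(quote) && value.ends_with(quote) {
            return &value[1..value.len() - 1];
        }
    }
    value
}
```

Lines without an `=` or with an empty key are skipped rather than reported, which suits a scanner that treats malformed lines as noise.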
parse_hcl
Parse Terraform / HCL variable "<name>" { default = "<value>" } blocks, flat .tfvars assignments, and simple locals { x = "v" } assignment shapes into (context, value) pairs.
parse_jupyter
Parse Jupyter notebook JSON and extract code cell sources.
parse_k8s_secret
Parse a Kubernetes Secret YAML and decode base64 values under data:.
parse_tfstate
Parse Terraform state JSON and recursively extract value fields.