pub struct ClassifiersConfig {
pub enabled: bool,
pub timeout_ms: u64,
pub hf_token: Option<String>,
pub scan_user_input: bool,
pub injection_model: String,
pub enforcement_mode: InjectionEnforcementMode,
pub injection_threshold_soft: f32,
pub injection_threshold: f32,
pub injection_model_sha256: Option<String>,
pub three_class_model: Option<String>,
pub three_class_threshold: f32,
pub three_class_model_sha256: Option<String>,
pub pii_enabled: bool,
pub pii_model: String,
pub pii_threshold: f32,
pub pii_model_sha256: Option<String>,
pub pii_ner_max_chars: usize,
pub pii_ner_allowlist: Vec<String>,
pub pii_ner_circuit_breaker: u32,
}
Configuration for the ML-backed classifier subsystem.
Placed under [classifiers] in config.toml. All fields are optional with safe defaults
so existing configs continue to work when this section is absent.
When enabled = false (the default), all classifier code is bypassed and the existing
regex-based detection runs unchanged.
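For illustration, a minimal `[classifiers]` section in config.toml might look like the sketch below. This assumes the field names map directly to TOML keys; values marked as defaults are the ones quoted elsewhere on this page, while `timeout_ms` is an arbitrary illustrative value, not a documented default.

```toml
[classifiers]
enabled = true
timeout_ms = 500                  # illustrative only; no default is stated
injection_threshold_soft = 0.5    # default
injection_threshold = 0.95        # default
pii_enabled = true
pii_threshold = 0.75              # default
pii_ner_max_chars = 8192          # default
pii_ner_allowlist = ["Zeph", "Rust", "OpenAI", "Ollama", "Claude"]
pii_ner_circuit_breaker = 2       # default
```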
Fields
enabled: bool
Master switch. When false, classifiers are never loaded or invoked.
timeout_ms: u64
Per-inference timeout in milliseconds. On timeout the call site falls back to
regex detection. This is separate from model download time.
hf_token: Option<String>
Resolved HuggingFace Hub API token. Must be the token value (not a vault key
name); it is resolved by the caller before constructing ClassifiersConfig.
When None, model downloads are unauthenticated, which fails for gated or
private repos.
scan_user_input: bool
When true, the ML injection classifier also runs on direct user chat messages.
Default false: the DeBERTa model is intended for external/untrusted content
(tool output, web scrapes), not for direct user input. Enabling this may cause
false positives on benign conversational messages.
injection_model: String
HuggingFace repo ID for the injection detection model.
enforcement_mode: InjectionEnforcementMode
Enforcement mode for the injection classifier.
warn (default): scores above injection_threshold emit WARN and increment
metrics but do NOT block content. Use this when deploying classifiers on tool
outputs: a false-positive rate of 12–37% on benign content makes hard-blocking
unsafe.
block: scores above injection_threshold block content. Only safe for
well-calibrated models, or when the false-positive rate has been verified on
your workload.
injection_threshold_soft: f32
Soft threshold: a classifier score at or above this value emits a WARN log and
increments the suspicious-injection metric, but the content is allowed
through. Range: (0.0, 1.0]. Default 0.5. Must be ≤ injection_threshold.
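The two threshold fields carry an invariant (soft ≤ hard, both in (0.0, 1.0]) that the page states but does not show being enforced. A hypothetical validation sketch, not the crate's actual code:

```rust
// Hypothetical check for the documented invariants on the two
// injection thresholds: both in (0.0, 1.0], and soft <= hard.
fn validate_thresholds(soft: f32, hard: f32) -> Result<(), String> {
    let in_range = |t: f32| t > 0.0 && t <= 1.0;
    if !in_range(soft) || !in_range(hard) {
        return Err("thresholds must be in (0.0, 1.0]".to_string());
    }
    if soft > hard {
        return Err("injection_threshold_soft must be <= injection_threshold".to_string());
    }
    Ok(())
}

fn main() {
    // Defaults from this page: soft 0.5, hard 0.95.
    assert!(validate_thresholds(0.5, 0.95).is_ok());
    assert!(validate_thresholds(0.96, 0.95).is_err());
    assert!(validate_thresholds(0.0, 0.95).is_err());
}
```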
injection_threshold: f32
Hard threshold: a classifier score at or above this value blocks the content
(in block mode) or emits WARN (in warn mode).
Range: (0.0, 1.0]. The conservative default of 0.95 minimises false positives.
Real-world ML injection classifiers have 12–37% recall gaps at high
thresholds, so defense-in-depth via the regex fallback and spotlighting is
mandatory.
injection_model_sha256: Option<String>
Optional SHA-256 hex digest of the injection model safetensors file. When set,
the file is verified before loading, and a mismatch aborts startup with an
error. Useful in security-sensitive deployments to detect corruption or
tampering.
three_class_model: Option<String>
Optional HuggingFace repo ID or local path for the three-class AlignSentinel
model. When set, content flagged as Suspicious or Blocked by the binary
DeBERTa classifier is passed to this model for refinement. If the three-class
model classifies the content as aligned-instruction or no-instruction, the
verdict is downgraded to Clean. This directly reduces false positives from
legitimate instruction-style content.
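The two-stage refinement described above can be sketched as follows. This is an illustrative model of the decision logic, not the crate's implementation; the enum and function names here are invented for the example.

```rust
// Illustrative sketch of the two-stage verdict refinement: content the
// binary classifier flags as Suspicious or Blocked is downgraded to
// Clean when the three-class model sees an aligned instruction or no
// instruction at all. Type and function names are hypothetical.
#[derive(Debug, PartialEq)]
enum Verdict { Clean, Suspicious, Blocked }

#[derive(Debug, PartialEq)]
enum ThreeClass { AlignedInstruction, NoInstruction, MisalignedInstruction }

fn refine(binary: Verdict, three_class: Option<ThreeClass>) -> Verdict {
    match binary {
        // Only flagged content is re-examined.
        Verdict::Suspicious | Verdict::Blocked => match three_class {
            Some(ThreeClass::AlignedInstruction) | Some(ThreeClass::NoInstruction) => Verdict::Clean,
            // Misaligned, or no three-class model configured: keep the verdict.
            _ => binary,
        },
        Verdict::Clean => Verdict::Clean,
    }
}

fn main() {
    assert_eq!(refine(Verdict::Suspicious, Some(ThreeClass::AlignedInstruction)), Verdict::Clean);
    assert_eq!(refine(Verdict::Blocked, Some(ThreeClass::MisalignedInstruction)), Verdict::Blocked);
    assert_eq!(refine(Verdict::Clean, None), Verdict::Clean);
}
```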
three_class_threshold: f32
Confidence threshold for the three-class model's misaligned-instruction label.
Content is kept as Suspicious/Blocked only when the misaligned score meets
this threshold. Range: (0.0, 1.0]. Default 0.7.
three_class_model_sha256: Option<String>
Optional SHA-256 hex digest of the three-class model safetensors file.
pii_enabled: bool
Enable PII detection via the NER model (pii_model). When true,
CandlePiiClassifier runs on user messages in addition to the regex-based
PiiFilter, and both result sets are merged (union with deduplication).
pii_model: String
HuggingFace repo ID for the PII NER model.
pii_threshold: f32
Minimum per-token confidence to accept a PII label; tokens below this
threshold are treated as O (no entity). The default of 0.75 balances recall on
rarer entity types (DRIVERLICENSE, PASSPORT, IBAN) against precision. Raise it
to 0.85 to prefer precision over recall.
pii_model_sha256: Option<String>
Optional SHA-256 hex digest of the PII model safetensors file.
pii_ner_max_chars: usize
Maximum number of bytes passed to the NER PII classifier per call. Input is
truncated at a valid UTF-8 boundary before classification, preventing timeouts
on large tool outputs (e.g. search_code). Default 8192.
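Truncating at a valid UTF-8 boundary can be sketched with the standard library alone. This is a minimal illustration of the technique the field describes, not the crate's code: walk the cut point back from the byte limit until it lands between characters.

```rust
// Sketch of byte-limit truncation at a valid UTF-8 boundary, as
// described for pii_ner_max_chars. str::is_char_boundary lets us
// back up until the slice is guaranteed to be valid UTF-8.
fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1; // never underflows: index 0 is always a boundary
    }
    &s[..end]
}

fn main() {
    // 'é' is 2 bytes, so cutting at byte 2 must back up to byte 1.
    assert_eq!(truncate_utf8("héllo", 2), "h");
    assert_eq!(truncate_utf8("abc", 10), "abc");
}
```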
pii_ner_allowlist: Vec<String>
Allowlist of tokens that are never redacted by the NER PII classifier,
regardless of model confidence. Matching is case-insensitive and exact: the
whole span text must equal an allowlist entry. This suppresses common false
positives from the piiranha model; for example, the base model misclassifies
"Zeph" as a city (PII:CITY).
Default entries: ["Zeph", "Rust", "OpenAI", "Ollama", "Claude"].
Set to [] to disable the allowlist entirely.
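The "case-insensitive and exact" matching rule can be sketched in a few lines. This is an assumed shape, not the crate's implementation, and it uses ASCII-only case folding for simplicity (the real matcher may differ for non-ASCII text):

```rust
// Sketch of the allowlist check: the entire detected span must equal
// an allowlist entry, compared case-insensitively (ASCII folding here,
// as a simplifying assumption).
fn is_allowlisted(span: &str, allowlist: &[String]) -> bool {
    allowlist.iter().any(|entry| entry.eq_ignore_ascii_case(span))
}

fn main() {
    let allowlist: Vec<String> =
        ["Zeph", "Rust", "OpenAI"].iter().map(|s| s.to_string()).collect();
    assert!(is_allowlisted("zeph", &allowlist));   // case-insensitive
    assert!(!is_allowlisted("Zephyr", &allowlist)); // exact span, not prefix
}
```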
pii_ner_circuit_breaker: u32
Number of consecutive NER timeouts before the circuit breaker trips and
disables NER for the remainder of the session. Once tripped, all subsequent
chunks fall back to regex-only PII detection, preventing repeated timeout
stalls on paginated reads (e.g. 12 chunks × 30 s = 6 min). Set to 0 to
disable the circuit breaker (NER is always attempted). Default: 2. A
mid-session change takes effect at the next session start.
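A consecutive-timeout breaker with these semantics can be sketched as a small state machine. The type and method names below are hypothetical, written only to illustrate the behaviour the field documents (trip after N consecutive timeouts, reset on success, 0 disables):

```rust
// Hypothetical consecutive-timeout circuit breaker matching the
// documented semantics of pii_ner_circuit_breaker.
struct NerBreaker {
    limit: u32,       // trip threshold; 0 disables the breaker
    consecutive: u32, // consecutive timeouts observed so far
    tripped: bool,    // once true, NER stays off for the session
}

impl NerBreaker {
    fn new(limit: u32) -> Self {
        Self { limit, consecutive: 0, tripped: false }
    }
    fn is_open(&self) -> bool {
        self.tripped
    }
    fn record_timeout(&mut self) {
        if self.limit == 0 {
            return; // 0 disables the breaker: NER is always attempted
        }
        self.consecutive += 1;
        if self.consecutive >= self.limit {
            self.tripped = true;
        }
    }
    fn record_success(&mut self) {
        self.consecutive = 0; // only *consecutive* timeouts count
    }
}

fn main() {
    let mut b = NerBreaker::new(2); // the documented default
    b.record_timeout();
    assert!(!b.is_open());
    b.record_success(); // a success resets the count
    b.record_timeout();
    b.record_timeout();
    assert!(b.is_open());
}
```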
Trait Implementations
impl Clone for ClassifiersConfig
fn clone(&self) -> ClassifiersConfig
fn clone_from(&mut self, source: &Self)