pub struct ClassifiersConfig {
pub enabled: bool,
pub timeout_ms: u64,
pub hf_token: Option<String>,
pub scan_user_input: bool,
pub injection_model: String,
pub enforcement_mode: InjectionEnforcementMode,
pub injection_threshold_soft: f32,
pub injection_threshold: f32,
pub injection_model_sha256: Option<String>,
pub three_class_model: Option<String>,
pub three_class_threshold: f32,
pub three_class_model_sha256: Option<String>,
pub pii_enabled: bool,
pub pii_model: String,
pub pii_threshold: f32,
pub pii_model_sha256: Option<String>,
pub pii_ner_max_chars: usize,
pub pii_ner_allowlist: Vec<String>,
pub pii_ner_circuit_breaker: u32,
}
Configuration for the ML-backed classifier subsystem.
Placed under [classifiers] in config.toml. All fields are optional with safe defaults
so existing configs continue to work when this section is absent.
When enabled = false (the default), all classifier code is bypassed and the existing
regex-based detection runs unchanged.
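A minimal config.toml fragment for this section might look like the following sketch. All values other than the documented defaults are illustrative, and the string spelling of the enforcement-mode variant is an assumption about the serde representation, not confirmed by this page:

```toml
# All fields are optional; omitting [classifiers] entirely keeps the defaults.
[classifiers]
enabled = true                 # default is false (classifiers bypassed)
timeout_ms = 500               # illustrative per-inference timeout
enforcement_mode = "warn"      # assumed serde spelling of the warn variant
injection_threshold_soft = 0.5
injection_threshold = 0.95
pii_enabled = true
```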
Fields
enabled: bool
Master switch. When false, classifiers are never loaded or invoked.
timeout_ms: u64
Per-inference timeout in milliseconds. On timeout the call site falls back to regex. Separate from model download time.
hf_token: Option<String>
Resolved HuggingFace Hub API token. Must be the token value (not a vault key name), resolved by the caller before constructing ClassifiersConfig. When None, model downloads are unauthenticated, which fails for gated or private repos.
scan_user_input: bool
When true, the ML injection classifier also runs on direct user chat messages. Default false: the DeBERTa model is intended for external/untrusted content (tool output, web scrapes), not for direct user input. Enabling this may cause false positives on benign conversational messages.
injection_model: String
HuggingFace repo ID for the injection detection model.
enforcement_mode: InjectionEnforcementMode
Enforcement mode for the injection classifier.
warn (default): scores above injection_threshold emit WARN and increment metrics but do NOT block content. Use this when deploying classifiers on tool outputs; a false-positive rate of 12–37% on benign content makes hard-blocking unsafe.
block: scores above injection_threshold block content. Only safe for well-calibrated models, or when the false-positive rate has been verified on your workload.
injection_threshold_soft: f32
Soft threshold: a classifier score at or above this emits a WARN log and increments the suspicious-injection metric, but the content is allowed through.
Range: (0.0, 1.0]. Default 0.5. Must be ≤ injection_threshold.
injection_threshold: f32
Hard threshold: a classifier score at or above this blocks the content (in block mode) or emits a WARN (in warn mode).
Range: (0.0, 1.0]. The conservative default of 0.95 minimises false positives. Real-world ML injection classifiers have 12–37% recall gaps at high thresholds, so defense-in-depth via the regex fallback and spotlighting is mandatory.
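The interaction between the two thresholds and the enforcement mode can be sketched as below; the enum and function names are illustrative assumptions, not the crate's actual types:

```rust
// Illustrative decision logic for the soft/hard injection thresholds.
// Action and evaluate are hypothetical names, not the crate's API.
#[derive(Debug, PartialEq)]
enum Action { Allow, Warn, Block }

fn evaluate(score: f32, soft: f32, hard: f32, block_mode: bool) -> Action {
    if score >= hard {
        // Hard threshold reached: block only in block mode, otherwise WARN.
        if block_mode { Action::Block } else { Action::Warn }
    } else if score >= soft {
        // Soft threshold reached: WARN + metric, content allowed through.
        Action::Warn
    } else {
        Action::Allow
    }
}

fn main() {
    assert_eq!(evaluate(0.97, 0.5, 0.95, false), Action::Warn); // warn mode never blocks
    assert_eq!(evaluate(0.97, 0.5, 0.95, true), Action::Block);
    assert_eq!(evaluate(0.60, 0.5, 0.95, true), Action::Warn);
    assert_eq!(evaluate(0.10, 0.5, 0.95, true), Action::Allow);
}
```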
injection_model_sha256: Option<String>
Optional SHA-256 hex digest of the injection model's safetensors file. When set, the file is verified before loading; a mismatch aborts startup with an error. Useful in security-sensitive deployments to detect corruption or tampering.
three_class_model: Option<String>
Optional HuggingFace repo ID or local path for the three-class AlignSentinel model. When set, content flagged as Suspicious or Blocked by the binary DeBERTa classifier is passed to this model for refinement. If the three-class model classifies the content as aligned-instruction or no-instruction, the verdict is downgraded to Clean. This directly reduces false positives from legitimate instruction-style content.
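The two-stage refinement described above can be sketched as follows; the enums and the refine function are illustrative, not the crate's actual API:

```rust
// Hypothetical types modelling the binary verdict and three-class labels.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict { Clean, Suspicious, Blocked }

#[derive(Debug, Clone, Copy, PartialEq)]
enum ThreeClass { AlignedInstruction, NoInstruction, MisalignedInstruction }

fn refine(binary: Verdict, label: ThreeClass, misaligned_score: f32, threshold: f32) -> Verdict {
    match binary {
        // Clean content never reaches the second stage.
        Verdict::Clean => Verdict::Clean,
        flagged => {
            // Keep the flag only when the three-class model is confident the
            // content is a misaligned instruction; otherwise downgrade to Clean.
            if label == ThreeClass::MisalignedInstruction && misaligned_score >= threshold {
                flagged
            } else {
                Verdict::Clean
            }
        }
    }
}

fn main() {
    // Legitimate instruction-style content is downgraded to Clean.
    assert_eq!(refine(Verdict::Suspicious, ThreeClass::AlignedInstruction, 0.9, 0.7), Verdict::Clean);
    // A confident misaligned verdict keeps the original flag.
    assert_eq!(refine(Verdict::Blocked, ThreeClass::MisalignedInstruction, 0.8, 0.7), Verdict::Blocked);
}
```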
three_class_threshold: f32
Confidence threshold for the three-class model's misaligned-instruction label. Content is kept as Suspicious/Blocked only when the misaligned score meets this threshold.
Range: (0.0, 1.0]. Default 0.7.
three_class_model_sha256: Option<String>
Optional SHA-256 hex digest of the three-class model's safetensors file.
pii_enabled: bool
Enable PII detection via the NER model (pii_model). When true, CandlePiiClassifier runs on user messages in addition to the regex-based PiiFilter, and both result sets are merged (union with deduplication).
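The union-with-deduplication merge might look like the following sketch; the PiiSpan type, field names, and ordering are assumptions, not the crate's actual API:

```rust
use std::collections::HashSet;

// Hypothetical span type; the real crate's representation may differ.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct PiiSpan { start: usize, end: usize, label: String }

fn merge_spans(regex_spans: Vec<PiiSpan>, ner_spans: Vec<PiiSpan>) -> Vec<PiiSpan> {
    let mut seen = HashSet::new();
    let mut merged = Vec::new();
    // Union of both detectors, keeping the first copy of each duplicate.
    for span in regex_spans.into_iter().chain(ner_spans) {
        if seen.insert(span.clone()) {
            merged.push(span);
        }
    }
    merged
}

fn main() {
    let email = PiiSpan { start: 0, end: 15, label: "EMAIL".into() };
    let iban = PiiSpan { start: 20, end: 42, label: "IBAN".into() };
    // Both detectors found the email; only the NER model found the IBAN.
    let merged = merge_spans(vec![email.clone()], vec![email, iban]);
    assert_eq!(merged.len(), 2);
}
```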
pii_model: String
HuggingFace repo ID for the PII NER model.
pii_threshold: f32
Minimum per-token confidence to accept a PII label; tokens below this threshold are treated as O (no entity). The default of 0.75 balances recall on rarer entity types (DRIVERLICENSE, PASSPORT, IBAN) with precision. Raise to 0.85 to prefer precision over recall.
pii_model_sha256: Option<String>
Optional SHA-256 hex digest of the PII model's safetensors file.
pii_ner_max_chars: usize
Maximum number of bytes (despite the name, a byte count, not a character count) passed to the NER PII classifier per call. Input is truncated at a valid UTF-8 boundary before classification to prevent timeouts on large tool outputs (e.g. search_code). Default 8192.
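The byte-limit truncation at a valid UTF-8 boundary can be sketched as follows; the function name is illustrative:

```rust
// Minimal sketch: cap a string at max_bytes without splitting a multi-byte
// character, as the pii_ner_max_chars doc describes.
fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Walk back until we land on a char boundary so the slice stays valid UTF-8.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

fn main() {
    // 'é' is two bytes; cutting at byte 2 would split it, so we back up to 1.
    assert_eq!(truncate_utf8("héllo", 2), "h");
    assert_eq!(truncate_utf8("short", 8192), "short");
}
```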
pii_ner_allowlist: Vec<String>
Allowlist of tokens that are never redacted by the NER PII classifier, regardless of model confidence. Matching is case-insensitive and exact (the whole span text must equal an allowlist entry). This suppresses common false positives from the piiranha model; for example, "Zeph" is misclassified as a city (PII:CITY) by the base model.
Default entries: ["Zeph", "Rust", "OpenAI", "Ollama", "Claude"]. Set to [] to disable the allowlist entirely.
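A case-insensitive exact-match check might look like this sketch; the function name is hypothetical, and ASCII case folding is an assumption (the real implementation may fold Unicode case as well):

```rust
// Illustrative allowlist check: case-insensitive, whole-span exact match.
fn is_allowlisted(span_text: &str, allowlist: &[String]) -> bool {
    allowlist.iter().any(|entry| entry.eq_ignore_ascii_case(span_text))
}

fn main() {
    let allowlist = vec!["Zeph".to_string(), "Rust".to_string()];
    assert!(is_allowlisted("zeph", &allowlist));    // case-insensitive hit
    assert!(!is_allowlisted("Zephyr", &allowlist)); // not an exact span match
}
```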
pii_ner_circuit_breaker: u32
Number of consecutive NER timeouts before the circuit breaker trips and disables NER for the remainder of the session. When the breaker trips, all subsequent chunks fall back to regex-only PII detection, preventing repeated timeout stalls on paginated reads (e.g. 12 chunks × 30 s = 6 min). Set to 0 to disable the circuit breaker (NER is always attempted).
Default: 2. Changes made mid-session take effect at the next session start.
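The consecutive-timeout behaviour can be sketched as a small state machine; the type and method names are illustrative, not the crate's actual API:

```rust
// Hypothetical circuit breaker: trips after N consecutive timeouts and
// stays tripped for the rest of the session.
struct NerBreaker {
    threshold: u32,
    consecutive_timeouts: u32,
    tripped: bool,
}

impl NerBreaker {
    fn new(threshold: u32) -> Self {
        Self { threshold, consecutive_timeouts: 0, tripped: false }
    }

    /// Record one NER call outcome; returns true if NER should still be
    /// attempted on the next chunk.
    fn record(&mut self, timed_out: bool) -> bool {
        if self.threshold == 0 {
            return true; // 0 disables the breaker entirely
        }
        if timed_out {
            self.consecutive_timeouts += 1;
            if self.consecutive_timeouts >= self.threshold {
                self.tripped = true;
            }
        } else {
            self.consecutive_timeouts = 0; // a success resets the count
        }
        !self.tripped
    }
}

fn main() {
    let mut breaker = NerBreaker::new(2);
    assert!(breaker.record(true));   // 1 timeout: keep trying
    assert!(!breaker.record(true));  // 2 consecutive: tripped
    assert!(!breaker.record(false)); // stays tripped for the session
}
```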
Trait Implementations
impl Clone for ClassifiersConfig
fn clone(&self) -> ClassifiersConfig
fn clone_from(&mut self, source: &Self)
impl Debug for ClassifiersConfig
impl Default for ClassifiersConfig
impl<'de> Deserialize<'de> for ClassifiersConfig
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
impl PartialEq for ClassifiersConfig
impl Serialize for ClassifiersConfig
impl StructuralPartialEq for ClassifiersConfig
Auto Trait Implementations
impl Freeze for ClassifiersConfig
impl RefUnwindSafe for ClassifiersConfig
impl Send for ClassifiersConfig
impl Sync for ClassifiersConfig
impl Unpin for ClassifiersConfig
impl UnsafeUnpin for ClassifiersConfig
impl UnwindSafe for ClassifiersConfig
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true; otherwise converts self into a Right variant.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true; otherwise converts self into a Right variant.

impl<T> IntoRequest<T> for T
fn into_request(self) -> Request<T>
Wraps the message T in a tonic::Request.