Struct ClassifiersConfig

pub struct ClassifiersConfig {
    pub enabled: bool,
    pub timeout_ms: u64,
    pub hf_token: Option<String>,
    pub scan_user_input: bool,
    pub injection_model: String,
    pub enforcement_mode: InjectionEnforcementMode,
    pub injection_threshold_soft: f32,
    pub injection_threshold: f32,
    pub injection_model_sha256: Option<String>,
    pub three_class_model: Option<String>,
    pub three_class_threshold: f32,
    pub three_class_model_sha256: Option<String>,
    pub pii_enabled: bool,
    pub pii_model: String,
    pub pii_threshold: f32,
    pub pii_model_sha256: Option<String>,
    pub pii_ner_max_chars: usize,
    pub pii_ner_allowlist: Vec<String>,
    pub pii_ner_circuit_breaker: u32,
}

Configuration for the ML-backed classifier subsystem.

Placed under [classifiers] in config.toml. All fields are optional with safe defaults so existing configs continue to work when this section is absent.

When enabled = false (the default), all classifier code is bypassed and the existing regex-based detection runs unchanged.

Fields§

§enabled: bool

Master switch. When false, classifiers are never loaded or invoked.

§timeout_ms: u64

Per-inference timeout in milliseconds.

On timeout the call site falls back to regex. Separate from model download time.
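The timeout-then-fallback behaviour can be sketched with the standard library alone. This is only an illustration under stated assumptions: `classify_with_timeout` and `DetectedBy` are hypothetical names, and the real call site may use an async runtime rather than a worker thread.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical result type: which detector produced the answer.
#[derive(Debug, PartialEq)]
pub enum DetectedBy {
    Classifier(f32),
    RegexFallback,
}

/// Runs `infer` on a worker thread and waits at most `timeout_ms` for a score.
/// If no score arrives in time (or the worker dies), the caller falls back
/// to regex-based detection.
pub fn classify_with_timeout<F>(timeout_ms: u64, infer: F) -> DetectedBy
where
    F: FnOnce() -> f32 + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore send errors.
        let _ = tx.send(infer());
    });
    match rx.recv_timeout(Duration::from_millis(timeout_ms)) {
        Ok(score) => DetectedBy::Classifier(score),
        Err(_) => DetectedBy::RegexFallback,
    }
}
```

Note that in this sketch the worker thread keeps running after the deadline; only its result is discarded.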

§hf_token: Option<String>

Resolved HuggingFace Hub API token.

Must be the token value (not a vault key name) — resolved by the caller before constructing ClassifiersConfig. When None, model downloads are unauthenticated, which fails for gated or private repos.

§scan_user_input: bool

When true, the ML injection classifier runs on direct user chat messages.

Default false: the DeBERTa model is intended for external/untrusted content (tool output, web scrapes) — not for direct user input. Enabling this may cause false positives on benign conversational messages.

§injection_model: String

HuggingFace repo ID for the injection detection model.

§enforcement_mode: InjectionEnforcementMode

Enforcement mode for the injection classifier.

warn (default): scores at or above injection_threshold emit WARN and increment metrics but do NOT block content. Use this when deploying classifiers on tool outputs, where a false-positive rate (FPR) of 12–37% on benign content makes hard-blocking unsafe.

block: scores at or above injection_threshold block content. Only safe for well-calibrated models or when the FPR has been verified on your workload.

§injection_threshold_soft: f32

Soft threshold: classifier score at or above this emits a WARN log and increments the suspicious-injection metric, but content is allowed through.

Range: (0.0, 1.0]. Default 0.5. Must be ≤ injection_threshold.

§injection_threshold: f32

Hard threshold: classifier score at or above this blocks the content (in block mode) or emits WARN (in warn mode).

Range: (0.0, 1.0]. Conservative default of 0.95 minimises false positives. Real-world ML injection classifiers have 12–37% recall gaps at high thresholds — defense-in-depth via regex fallback and spotlighting is mandatory.
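How the two thresholds and the enforcement mode interact can be sketched as follows. The names `Mode`, `Action`, `validate_thresholds`, and `decide` are hypothetical stand-ins (the variant names of InjectionEnforcementMode are not shown here), and the crate's actual validation may differ.

```rust
/// Hypothetical stand-in for InjectionEnforcementMode.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Warn,
    Block,
}

/// Hypothetical outcome of scoring one piece of content.
#[derive(Debug, PartialEq)]
pub enum Action {
    Allow,
    WarnOnly,
    Block,
}

/// Checks the documented constraints: both thresholds in (0.0, 1.0]
/// and soft <= hard. NaN fails the range check.
pub fn validate_thresholds(soft: f32, hard: f32) -> Result<(), String> {
    let in_range = |t: f32| t > 0.0 && t <= 1.0;
    if !in_range(soft) || !in_range(hard) {
        return Err("thresholds must lie in (0.0, 1.0]".to_string());
    }
    if soft > hard {
        return Err("injection_threshold_soft must be <= injection_threshold".to_string());
    }
    Ok(())
}

/// Maps a classifier score to an action.
pub fn decide(score: f32, soft: f32, hard: f32, mode: Mode) -> Action {
    if score >= hard {
        // At or above the hard threshold: block only in block mode.
        match mode {
            Mode::Block => Action::Block,
            Mode::Warn => Action::WarnOnly,
        }
    } else if score >= soft {
        // Between soft and hard: log and count, but let content through.
        Action::WarnOnly
    } else {
        Action::Allow
    }
}
```

With the documented defaults (soft 0.5, hard 0.95), a score of 0.6 only warns in either mode, while a score of 0.97 blocks only when the mode is block.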

§injection_model_sha256: Option<String>

Optional SHA-256 hex digest of the injection model safetensors file.

When set, the file is verified before loading. Mismatch aborts startup with an error. Useful for security-sensitive deployments to detect corruption or tampering.

§three_class_model: Option<String>

Optional HuggingFace repo ID or local path for the three-class AlignSentinel model.

When set, content flagged as Suspicious or Blocked by the binary DeBERTa classifier is passed to this model for refinement. If the three-class model classifies the content as aligned-instruction or no-instruction, the verdict is downgraded to Clean. This directly reduces false positives from legitimate instruction-style content.

§three_class_threshold: f32

Confidence threshold for the three-class model’s misaligned-instruction label.

Content is only kept as Suspicious/Blocked when the misaligned score meets this threshold. Range: (0.0, 1.0]. Default 0.7.
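The refinement step described above might look roughly like this. `Verdict` and `refine` are hypothetical names for illustration; the sketch assumes the three-class model exposes a single score for its misaligned-instruction label.

```rust
/// Hypothetical verdict type mirroring the Clean / Suspicious / Blocked outcomes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Verdict {
    Clean,
    Suspicious,
    Blocked,
}

/// Keeps a flagged verdict only when the misaligned-instruction score
/// meets `three_class_threshold`; otherwise downgrades it to Clean.
/// Content the binary classifier already judged Clean is left untouched.
pub fn refine(binary: Verdict, misaligned_score: f32, three_class_threshold: f32) -> Verdict {
    match binary {
        Verdict::Clean => Verdict::Clean,
        flagged if misaligned_score >= three_class_threshold => flagged,
        _ => Verdict::Clean,
    }
}
```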

§three_class_model_sha256: Option<String>

Optional SHA-256 hex digest of the three-class model safetensors file.

§pii_enabled: bool

Enable PII detection via the NER model (pii_model).

When true, CandlePiiClassifier runs on user messages in addition to the regex-based PiiFilter. Both results are merged (union with deduplication).
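One plausible reading of "union with deduplication" is sketched below. The `PiiSpan` type and `merge_spans` function are hypothetical, and the sketch assumes two detections count as duplicates only when offsets and label match exactly; the real merge may also handle overlapping spans.

```rust
/// Hypothetical PII detection: byte offsets plus an entity label.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct PiiSpan {
    pub start: usize,
    pub end: usize,
    pub label: String,
}

/// Combines regex and NER detections, dropping exact duplicates.
/// The result is sorted by (start, end, label).
pub fn merge_spans(regex_hits: Vec<PiiSpan>, ner_hits: Vec<PiiSpan>) -> Vec<PiiSpan> {
    let mut all = regex_hits;
    all.extend(ner_hits);
    all.sort();
    all.dedup(); // removes adjacent equal spans, which sorting has grouped together
    all
}
```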

§pii_model: String

HuggingFace repo ID for the PII NER model.

§pii_threshold: f32

Minimum per-token confidence to accept a PII label.

Tokens below this threshold are treated as O (no entity). Default 0.75 balances recall on rarer entity types (DRIVERLICENSE, PASSPORT, IBAN) with precision. Raise to 0.85 to prefer precision over recall.
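The effect of the threshold on per-token labels can be illustrated with a small sketch. `TokenPrediction` and `effective_labels` are hypothetical names; the sketch assumes each token carries one label and one confidence value.

```rust
/// Hypothetical per-token prediction: a label plus the model's confidence.
pub struct TokenPrediction {
    pub label: String,
    pub confidence: f32,
}

/// Replaces every label whose confidence is below `pii_threshold` with "O"
/// (no entity) and keeps the rest unchanged.
pub fn effective_labels(tokens: &[TokenPrediction], pii_threshold: f32) -> Vec<&str> {
    tokens
        .iter()
        .map(|t| {
            if t.confidence >= pii_threshold {
                t.label.as_str()
            } else {
                "O"
            }
        })
        .collect()
}
```

Raising the threshold from 0.75 to 0.85 turns a token predicted with confidence 0.80 into O, which is the precision-over-recall trade-off described above.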

§pii_model_sha256: Option<String>

Optional SHA-256 hex digest of the PII model safetensors file.

§pii_ner_max_chars: usize

Maximum number of bytes (not characters, despite the field name) passed to the NER PII classifier per call.

Input is truncated at a valid UTF-8 boundary before classification to prevent timeout on large tool outputs (e.g. search_code). Default 8192.
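Truncating at a valid UTF-8 boundary can be done with `str::is_char_boundary` from the standard library. The helper name `truncate_at_char_boundary` is hypothetical, and the sketch treats the limit as a byte count, as the description states.

```rust
/// Returns the longest prefix of `input` that is at most `max_bytes` long
/// and ends on a UTF-8 character boundary.
pub fn truncate_at_char_boundary(input: &str, max_bytes: usize) -> &str {
    if input.len() <= max_bytes {
        return input;
    }
    let mut end = max_bytes;
    // Step back until the cut no longer splits a multi-byte character.
    // Index 0 is always a boundary, so the loop terminates.
    while !input.is_char_boundary(end) {
        end -= 1;
    }
    &input[..end]
}
```

Slicing a `&str` in the middle of a multi-byte character panics, which is why the cut point has to be moved back first.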

§pii_ner_allowlist: Vec<String>

Allowlist of tokens that are never redacted by the NER PII classifier, regardless of model confidence.

Matching is case-insensitive and exact (whole span text must equal an allowlist entry). This suppresses common false positives from the piiranha model — for example, “Zeph” is misclassified as a city (PII:CITY) by the base model.

Default entries: ["Zeph", "Rust", "OpenAI", "Ollama", "Claude"]. Set to [] to disable the allowlist entirely.
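A minimal sketch of the allowlist check, assuming "case-insensitive and exact" means the whole span text is compared after lowercasing. The function name `is_allowlisted` is hypothetical.

```rust
/// Returns true when the whole span text equals an allowlist entry,
/// ignoring case. Substring matches do not count.
pub fn is_allowlisted(span_text: &str, allowlist: &[String]) -> bool {
    let needle = span_text.to_lowercase();
    allowlist.iter().any(|entry| entry.to_lowercase() == needle)
}
```

With the default entries, "zeph" is suppressed but "Zephyr" is not, because only whole-span matches apply.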

§pii_ner_circuit_breaker: u32

Number of consecutive NER timeouts before the circuit breaker trips and disables NER for the remainder of the session.

When the breaker trips, all subsequent chunks fall back to regex-only PII detection, preventing repeated timeout stalls on paginated reads (e.g. 12 chunks × 30 s = 6 min). Set to 0 to disable the circuit breaker (NER is always attempted).

Default: 2. Takes effect on the next session start if changed mid-session.
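The breaker logic can be sketched as a small counter. `NerCircuitBreaker` and its methods are hypothetical names; the sketch assumes a successful inference resets the count (since the description speaks of consecutive timeouts) and that a tripped breaker stays tripped for the session.

```rust
/// Hypothetical per-session circuit breaker for NER inference.
pub struct NerCircuitBreaker {
    limit: u32,
    consecutive_timeouts: u32,
}

impl NerCircuitBreaker {
    /// `limit` corresponds to pii_ner_circuit_breaker; 0 disables the breaker.
    pub fn new(limit: u32) -> Self {
        Self { limit, consecutive_timeouts: 0 }
    }

    /// True once the breaker has tripped and NER should be skipped.
    pub fn is_open(&self) -> bool {
        self.limit != 0 && self.consecutive_timeouts >= self.limit
    }

    /// Call when an NER inference times out.
    pub fn record_timeout(&mut self) {
        self.consecutive_timeouts = self.consecutive_timeouts.saturating_add(1);
    }

    /// Call when an NER inference completes in time.
    pub fn record_success(&mut self) {
        // A tripped breaker stays tripped for the rest of the session.
        if !self.is_open() {
            self.consecutive_timeouts = 0;
        }
    }
}
```

A caller would check `is_open()` before each chunk and go straight to regex-only detection when it returns true.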

Trait Implementations§

impl Clone for ClassifiersConfig

fn clone(&self) -> ClassifiersConfig

fn clone_from(&mut self, source: &Self)

impl Debug for ClassifiersConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Default for ClassifiersConfig

fn default() -> Self

impl<'de> Deserialize<'de> for ClassifiersConfig

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

impl PartialEq for ClassifiersConfig

fn eq(&self, other: &ClassifiersConfig) -> bool

fn ne(&self, other: &Rhs) -> bool

impl Serialize for ClassifiersConfig

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

impl StructuralPartialEq for ClassifiersConfig

Auto Trait Implementations§

impl Freeze for ClassifiersConfig

impl RefUnwindSafe for ClassifiersConfig

impl Send for ClassifiersConfig

impl Sync for ClassifiersConfig

impl Unpin for ClassifiersConfig

impl UnsafeUnpin for ClassifiersConfig

impl UnwindSafe for ClassifiersConfig

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> DynClone for T
where T: Clone,

fn __clone_box(&self, _: Private) -> *mut ()

impl<T> From<T> for T

fn from(t: T) -> T

impl<T> FromRef<T> for T
where T: Clone,

fn from_ref(input: &T) -> T

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

fn in_current_span(self) -> Instrumented<Self>

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

impl<T> IntoRequest<T> for T

fn into_request(self) -> Request<T>

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

impl<T> Same for T

type Output = T

impl<T> ToOwned for T
where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

fn with_current_subscriber(self) -> WithDispatch<Self>

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,
