Struct ClassifiersConfig 

pub struct ClassifiersConfig {
    pub enabled: bool,
    pub timeout_ms: u64,
    pub hf_token: Option<String>,
    pub scan_user_input: bool,
    pub injection_model: String,
    pub enforcement_mode: InjectionEnforcementMode,
    pub injection_threshold_soft: f32,
    pub injection_threshold: f32,
    pub injection_model_sha256: Option<String>,
    pub three_class_model: Option<String>,
    pub three_class_threshold: f32,
    pub three_class_model_sha256: Option<String>,
    pub pii_enabled: bool,
    pub pii_model: String,
    pub pii_threshold: f32,
    pub pii_model_sha256: Option<String>,
    pub pii_ner_max_chars: usize,
    pub pii_ner_allowlist: Vec<String>,
    pub pii_ner_circuit_breaker: u32,
}

Configuration for the ML-backed classifier subsystem.

Placed under [classifiers] in config.toml. All fields are optional with safe defaults so existing configs continue to work when this section is absent.

When enabled = false (the default), all classifier code is bypassed and the existing regex-based detection runs unchanged.

Fields§

§enabled: bool

Master switch. When false, classifiers are never loaded or invoked.

§timeout_ms: u64

Per-inference timeout in milliseconds.

On timeout the call site falls back to regex. This timeout covers inference only and is separate from model download time.
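The fallback behaviour can be illustrated with a minimal sketch. It assumes a blocking inference call run on a worker thread; the crate's real call site may well use an async runtime instead. The names `Detection` and `classify_with_timeout` are hypothetical and not part of this crate.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Which detector produced the result (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Detection {
    Classifier(f32),
    RegexFallback(bool),
}

/// Runs `infer` on a worker thread and waits at most `timeout_ms` for its score.
/// If the deadline passes (or the worker dies), `regex_check` is used instead.
/// Note: the worker is not cancelled on timeout; it simply finishes unobserved.
pub fn classify_with_timeout<F, G>(timeout_ms: u64, infer: F, regex_check: G) -> Detection
where
    F: FnOnce() -> f32 + Send + 'static,
    G: FnOnce() -> bool,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore send errors.
        let _ = tx.send(infer());
    });
    match rx.recv_timeout(Duration::from_millis(timeout_ms)) {
        Ok(score) => Detection::Classifier(score),
        Err(_) => Detection::RegexFallback(regex_check()),
    }
}
```

A call such as `classify_with_timeout(cfg.timeout_ms, run_model, run_regex)` would return the regex result whenever the model does not answer in time.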

§hf_token: Option<String>

Resolved HuggingFace Hub API token.

Must be the token value itself, not a vault key name; the caller resolves it before constructing ClassifiersConfig. When None, model downloads are unauthenticated, which fails for gated or private repos.

§scan_user_input: bool

When true, the ML injection classifier runs on direct user chat messages.

Default false: the DeBERTa model is intended for external/untrusted content (tool output, web scrapes) — not for direct user input. Enabling this may cause false positives on benign conversational messages.

§injection_model: String

HuggingFace repo ID for the injection detection model.

§enforcement_mode: InjectionEnforcementMode

Enforcement mode for the injection classifier.

warn (default): scores at or above injection_threshold emit WARN and increment metrics but do NOT block content. Use this when deploying classifiers on tool outputs: an FPR of 12–37% on benign content makes hard-blocking unsafe.

block: scores at or above injection_threshold block content. Only safe for well-calibrated models or when FPR is verified on your workload.

§injection_threshold_soft: f32

Soft threshold: classifier score at or above this emits a WARN log and increments the suspicious-injection metric, but content is allowed through.

Range: (0.0, 1.0]. Default 0.5. Must be ≤ injection_threshold.
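The range and ordering constraints stated for the two injection thresholds can be checked with a small helper. This is a sketch only: `validate_injection_thresholds` is a hypothetical name, and whether the crate performs such a check (or how it reports failures) is not stated here.

```rust
/// Checks the documented constraints on the two injection thresholds:
/// each must lie in (0.0, 1.0], and the soft one must not exceed the hard one.
pub fn validate_injection_thresholds(soft: f32, hard: f32) -> Result<(), String> {
    // NaN is rejected because `NaN > 0.0` is false.
    let in_range = |v: f32| v > 0.0 && v <= 1.0;
    if !in_range(soft) {
        return Err(format!("injection_threshold_soft {} is outside (0.0, 1.0]", soft));
    }
    if !in_range(hard) {
        return Err(format!("injection_threshold {} is outside (0.0, 1.0]", hard));
    }
    if soft > hard {
        return Err(format!(
            "injection_threshold_soft {} exceeds injection_threshold {}",
            soft, hard
        ));
    }
    Ok(())
}
```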

§injection_threshold: f32

Hard threshold: classifier score at or above this blocks the content (in block mode) or emits WARN (in warn mode).

Range: (0.0, 1.0]. Conservative default of 0.95 minimises false positives. Real-world ML injection classifiers have 12–37% recall gaps at high thresholds — defense-in-depth via regex fallback and spotlighting is mandatory.
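How the soft threshold, hard threshold, and enforcement mode interact can be sketched as a single decision function. `Mode` is a local stand-in for InjectionEnforcementMode with variants assumed from the descriptions above, and `Outcome` and `decide` are hypothetical names used only for illustration.

```rust
/// Local stand-in for `InjectionEnforcementMode` (variants assumed from the docs).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Warn,
    Block,
}

/// Hypothetical outcome type for this sketch.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Allow,
    AllowWithWarning,
    Block,
}

/// Maps a classifier score to an outcome.
/// Assumes `soft <= hard`, as required by the field documentation.
pub fn decide(score: f32, soft: f32, hard: f32, mode: Mode) -> Outcome {
    if score >= hard {
        match mode {
            // Hard threshold reached: only block mode actually blocks.
            Mode::Block => Outcome::Block,
            Mode::Warn => Outcome::AllowWithWarning,
        }
    } else if score >= soft {
        // Soft threshold reached: log and count, but let the content through.
        Outcome::AllowWithWarning
    } else {
        Outcome::Allow
    }
}
```

With the documented defaults (0.5 and 0.95), a score of 0.97 is blocked only in block mode, and a score of 0.6 produces a warning in either mode.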

§injection_model_sha256: Option<String>

Optional SHA-256 hex digest of the injection model safetensors file.

When set, the file is verified before loading. Mismatch aborts startup with an error. Useful for security-sensitive deployments to detect corruption or tampering.

§three_class_model: Option<String>

Optional HuggingFace repo ID or local path for the three-class AlignSentinel model.

When set, content flagged as Suspicious or Blocked by the binary DeBERTa classifier is passed to this model for refinement. If the three-class model classifies the content as aligned-instruction or no-instruction, the verdict is downgraded to Clean. This directly reduces false positives from legitimate instruction-style content.

§three_class_threshold: f32

Confidence threshold for the three-class model’s misaligned-instruction label.

Content is only kept as Suspicious/Blocked when the misaligned score meets this threshold. Range: (0.0, 1.0]. Default 0.7.
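The refinement step can be sketched as follows, assuming the three-class model exposes a single confidence value for its misaligned-instruction label. `Verdict` and `refine` are hypothetical names; the crate's own types may differ.

```rust
/// Hypothetical verdict type mirroring the Clean / Suspicious / Blocked wording.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Verdict {
    Clean,
    Suspicious,
    Blocked,
}

/// Keeps a flagged verdict only when the three-class model's
/// misaligned-instruction score meets `three_class_threshold`.
pub fn refine(binary: Verdict, misaligned_score: f32, three_class_threshold: f32) -> Verdict {
    match binary {
        // The second model is only consulted for flagged content.
        Verdict::Clean => Verdict::Clean,
        flagged if misaligned_score >= three_class_threshold => flagged,
        // Aligned-instruction or no-instruction: downgrade to Clean.
        _ => Verdict::Clean,
    }
}
```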

§three_class_model_sha256: Option<String>

Optional SHA-256 hex digest of the three-class model safetensors file.

§pii_enabled: bool

Enable PII detection via the NER model (pii_model).

When true, CandlePiiClassifier runs on user messages in addition to the regex-based PiiFilter. Both results are merged (union with deduplication).
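A union with deduplication might look like the sketch below. `PiiSpan` and `merge_spans` are hypothetical, and the sketch only removes exact duplicates (same offsets and label); how the real merge treats overlapping spans is not documented here.

```rust
use std::collections::BTreeSet;

/// Hypothetical PII span: byte offsets plus an entity label.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct PiiSpan {
    pub start: usize,
    pub end: usize,
    pub label: String,
}

/// Returns the union of both result sets, with exact duplicates removed
/// and spans ordered by (start, end, label).
pub fn merge_spans(regex_hits: Vec<PiiSpan>, ner_hits: Vec<PiiSpan>) -> Vec<PiiSpan> {
    let unique: BTreeSet<PiiSpan> = regex_hits.into_iter().chain(ner_hits).collect();
    unique.into_iter().collect()
}
```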

§pii_model: String

HuggingFace repo ID for the PII NER model.

§pii_threshold: f32

Minimum per-token confidence to accept a PII label.

Tokens below this threshold are treated as O (no entity). Default 0.75 balances recall on rarer entity types (DRIVERLICENSE, PASSPORT, IBAN) with precision. Raise to 0.85 to prefer precision over recall.
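The per-token rule amounts to a single comparison, sketched here with a hypothetical helper `accept_label`. The label strings are plain `&str` values for illustration; the crate's internal label representation is not shown in this page.

```rust
/// Returns the label to keep for one token: the predicted label if its
/// confidence reaches `pii_threshold`, otherwise "O" (no entity).
pub fn accept_label<'a>(predicted: &'a str, confidence: f32, pii_threshold: f32) -> &'a str {
    if confidence >= pii_threshold {
        predicted
    } else {
        "O"
    }
}
```

For a token predicted as IBAN with confidence 0.8, the default threshold of 0.75 keeps the label, while a threshold of 0.85 discards it.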

§pii_model_sha256: Option<String>

Optional SHA-256 hex digest of the PII model safetensors file.

§pii_ner_max_chars: usize

Maximum number of bytes (not characters, despite the field name) passed to the NER PII classifier per call.

Input is truncated at a valid UTF-8 boundary before classification to prevent timeout on large tool outputs (e.g. search_code). Default 8192.
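Truncating at a valid UTF-8 boundary can be done with the standard library's `str::is_char_boundary`. The function name below is hypothetical and this is a sketch of the technique, not the crate's implementation.

```rust
/// Returns the longest prefix of `input` that is at most `max_bytes` long
/// and ends on a UTF-8 character boundary.
pub fn truncate_to_max_bytes(input: &str, max_bytes: usize) -> &str {
    if input.len() <= max_bytes {
        return input;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !input.is_char_boundary(end) {
        end -= 1;
    }
    &input[..end]
}
```

Because the limit is applied in bytes, text with multi-byte characters is cut to fewer characters than the configured number.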

§pii_ner_allowlist: Vec<String>

Allowlist of tokens that are never redacted by the NER PII classifier, regardless of model confidence.

Matching is case-insensitive and exact (whole span text must equal an allowlist entry). This suppresses common false positives from the piiranha model — for example, “Zeph” is misclassified as a city (PII:CITY) by the base model.

Default entries: ["Zeph", "Rust", "OpenAI", "Ollama", "Claude"]. Set to [] to disable the allowlist entirely.
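The matching rule (case-insensitive, whole span) can be sketched as below. `is_allowlisted` is a hypothetical helper, and Unicode lowercasing is an assumption about how case is folded.

```rust
/// Returns true when the whole span text equals an allowlist entry,
/// ignoring case. Substring matches do not count.
pub fn is_allowlisted(span_text: &str, allowlist: &[String]) -> bool {
    let needle = span_text.to_lowercase();
    allowlist.iter().any(|entry| entry.to_lowercase() == needle)
}
```

With the default entries, "zeph" and "ZEPH" are both suppressed, while "Zephyr" is still subject to redaction because it is not an exact match.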

§pii_ner_circuit_breaker: u32

Number of consecutive NER timeouts before the circuit breaker trips and disables NER for the remainder of the session.

When the breaker trips, all subsequent chunks fall back to regex-only PII detection, preventing repeated timeout stalls on paginated reads (e.g. 12 chunks × 30 s = 6 min). Set to 0 to disable the circuit breaker (NER is always attempted).

Default: 2. Takes effect on the next session start if changed mid-session.
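The breaker logic can be sketched as a small counter. `NerCircuitBreaker` is a hypothetical type; resetting the count on a successful call is an assumption drawn from the word "consecutive", not something this page states.

```rust
/// Hypothetical per-session circuit breaker for NER timeouts.
#[derive(Debug)]
pub struct NerCircuitBreaker {
    limit: u32, // value of pii_ner_circuit_breaker; 0 disables the breaker
    consecutive_timeouts: u32,
    tripped: bool,
}

impl NerCircuitBreaker {
    pub fn new(limit: u32) -> Self {
        Self { limit, consecutive_timeouts: 0, tripped: false }
    }

    /// False once the breaker has tripped; callers then use regex-only detection.
    pub fn ner_allowed(&self) -> bool {
        !self.tripped
    }

    /// Records one NER timeout and trips the breaker when the limit is reached.
    pub fn record_timeout(&mut self) {
        self.consecutive_timeouts = self.consecutive_timeouts.saturating_add(1);
        if self.limit != 0 && self.consecutive_timeouts >= self.limit {
            self.tripped = true;
        }
    }

    /// Assumed behaviour: a completed NER call resets the consecutive count.
    pub fn record_success(&mut self) {
        self.consecutive_timeouts = 0;
    }
}
```

With the default limit of 2, a 12-chunk paginated read stops attempting NER after the second consecutive timeout instead of waiting out every chunk.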

Trait Implementations§

impl Clone for ClassifiersConfig

fn clone(&self) -> ClassifiersConfig

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for ClassifiersConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for ClassifiersConfig

fn default() -> Self

Returns the “default value” for a type.

impl<'de> Deserialize<'de> for ClassifiersConfig

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl PartialEq for ClassifiersConfig

fn eq(&self, other: &ClassifiersConfig) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Serialize for ClassifiersConfig

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

impl StructuralPartialEq for ClassifiersConfig
