// pii 0.1.0: PII detection and anonymization with deterministic, capability-aware NLP pipelines.

//! Capability flags produced by an NLP engine.
//!
//! Capabilities describe which linguistic artifacts are available and can be
//! trusted for the current analysis. They enable controlled degradation:
//! recognizers and enhancers adapt their behavior when lemma, POS, or NER
//! data is missing instead of failing or guessing.
//!
//! For example:
//! - If `lemma` is false, context enhancement should fall back to surface tokens.
//! - If `ner` is false, NER-backed recognizers must return no detections.
//! - If `sentences` is false, sentence-based windows fall back to token windows.

use serde::{Deserialize, Serialize};

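/// Flags describing NLP artifacts available for a given analysis.
///
/// The derived `Default` sets every flag to `false`; use
/// [`Capabilities::basic`] for an engine that only provides tokenization.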
#[derive(Clone, Debug, Default, Serialize, Deserialize)]
pub struct Capabilities {
    /// True if token offsets are stable and aligned to the input text.
    pub token_offsets: bool,
    /// True if lemma forms are available.
    pub lemma: bool,
    /// True if part-of-speech tags are available.
    pub pos: bool,
    /// True if named-entity spans are available.
    pub ner: bool,
    /// True if sentence segmentation is available.
    pub sentences: bool,
}

impl Capabilities {
    /// Baseline capabilities for an engine that only tokenizes: `token_offsets`
    /// is set and every other flag is cleared.
    pub fn basic() -> Self {
        Self {
            token_offsets: true,
            lemma: false,
            pos: false,
            ner: false,
            sentences: false,
        }
    }
}
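
// A minimal sketch of how a caller might honor these flags, written as unit
// tests. `context_term` is a hypothetical helper used only for illustration;
// it is not part of this crate's API. It shows the lemma-to-surface fallback
// described in the module docs, under the assumption that a token carries an
// optional lemma string.
#[cfg(test)]
mod tests {
    use super::Capabilities;

    /// Hypothetical helper: use the lemma only when `caps.lemma` says lemmas
    /// can be trusted, otherwise fall back to the surface token.
    fn context_term<'a>(
        caps: &Capabilities,
        surface: &'a str,
        lemma: Option<&'a str>,
    ) -> &'a str {
        match lemma {
            Some(l) if caps.lemma => l,
            _ => surface,
        }
    }

    #[test]
    fn basic_enables_only_token_offsets() {
        let caps = Capabilities::basic();
        assert!(caps.token_offsets);
        assert!(!caps.lemma && !caps.pos && !caps.ner && !caps.sentences);
    }

    #[test]
    fn default_disables_every_flag() {
        let caps = Capabilities::default();
        assert!(!caps.token_offsets);
        assert!(!caps.lemma && !caps.pos && !caps.ner && !caps.sentences);
    }

    #[test]
    fn lemma_is_used_only_when_flag_is_set() {
        // With `basic()`, a lemma that happens to be present is ignored.
        let caps = Capabilities::basic();
        assert_eq!(context_term(&caps, "running", Some("run")), "running");

        // Struct update syntax turns on a single flag on top of the baseline.
        let with_lemma = Capabilities {
            lemma: true,
            ..Capabilities::basic()
        };
        assert_eq!(context_term(&with_lemma, "running", Some("run")), "run");
    }
}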