Text anonymization library backed by a built-in multilingual NER model.
tiktag loads one bundled profile and exposes a small library surface:
construct Tiktag once, then call Tiktag::anonymize for each input.
The built-in pipeline uses the Xenova
distilbert-base-multilingual-cased-ner-hrl model for person, org, and
location entities, then applies additive regex recognizers such as email.
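The two-stage pipeline described above can be sketched roughly as follows. This is a minimal illustration, not tiktag's implementation: it assumes "additive" means recognizer matches are added to the model's spans without replacing them, and the overlap rule shown (model spans win) is a guess, since the docs do not say how overlaps are resolved. `Span`, `find_emails`, and `merge_additive` are hypothetical names, and the email matcher is a hand-rolled scanner standing in for a real regex so the sketch needs only the standard library.

```rust
/// A detected entity as byte offsets into the input text.
/// Hypothetical type for illustration; not tiktag's `Replacement`.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    start: usize,
    end: usize,
    label: &'static str,
}

/// Stand-in for an email regex: accepts whitespace-delimited tokens
/// shaped like `local@domain.tld`, ignoring trailing punctuation.
fn find_emails(text: &str) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut token_start: Option<usize> = None;
    // A sentinel space at the end flushes the final token.
    let chars = text.char_indices().chain(std::iter::once((text.len(), ' ')));
    for (i, c) in chars {
        if c.is_whitespace() {
            if let Some(s) = token_start.take() {
                let token = text[s..i].trim_end_matches(|p: char| ".,;:!?".contains(p));
                if let Some((local, domain)) = token.split_once('@') {
                    let valid = !local.is_empty()
                        && !domain.contains('@')
                        && domain.contains('.')
                        && !domain.starts_with('.');
                    if valid {
                        spans.push(Span { start: s, end: s + token.len(), label: "EMAIL" });
                    }
                }
            }
        } else if token_start.is_none() {
            token_start = Some(i);
        }
    }
    spans
}

/// Adds recognizer spans to the model's spans. A candidate is skipped if
/// it overlaps any span already accepted (assumed rule, see lead-in).
fn merge_additive(mut accepted: Vec<Span>, extra: Vec<Span>) -> Vec<Span> {
    for cand in extra {
        let overlaps = accepted
            .iter()
            .any(|s| cand.start < s.end && s.start < cand.end);
        if !overlaps {
            accepted.push(cand);
        }
    }
    accepted.sort_by_key(|s| s.start);
    accepted
}
```

Given a model span for a person name and a text that also contains an email address, `merge_additive(model_spans, find_emails(text))` would return both spans in text order.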
§Example
use std::path::Path;
use tiktag::Tiktag;
let mut tiktag = Tiktag::new(Path::new("models/profiles.toml"))?;
let out = tiktag.anonymize("Maria Garcia from OpenAI visited Berlin.")?;
println!("{}", out.anonymization.anonymized_text);
§Notes
- Tiktag::new is the expensive step: it loads the profile, tokenizer, and ONNX runtime session.
- Tiktag::anonymize takes &mut self and reuses that loaded runtime.
- Placeholder numbering is stable within one call only.
- Model-based anonymization can miss entities; treat tiktag as an assistive control, not a sole safety boundary.
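To make the per-call numbering note concrete, here is a small sketch of one way such numbering could behave. It is an assumption-laden illustration, not tiktag's code: `PlaceholderNumbering` is a hypothetical type, and the `[FAMILY_n]` placeholder format is invented for the example. The point is only that state created for one call is discarded afterwards, so the same entity may get a different number in the next call.

```rust
use std::collections::HashMap;

/// Hypothetical numbering state that lives for exactly one call.
struct PlaceholderNumbering {
    /// (family, entity text) -> index already assigned in this call.
    seen: HashMap<(String, String), usize>,
    /// family -> last index handed out in this call.
    last: HashMap<String, usize>,
}

impl PlaceholderNumbering {
    fn new() -> Self {
        Self { seen: HashMap::new(), last: HashMap::new() }
    }

    /// Returns the placeholder for `entity`, reusing its index if the
    /// same entity was already seen during this call.
    fn placeholder(&mut self, family: &str, entity: &str) -> String {
        let key = (family.to_string(), entity.to_string());
        if let Some(&n) = self.seen.get(&key) {
            return format!("[{family}_{n}]");
        }
        let counter = self.last.entry(family.to_string()).or_insert(0);
        *counter += 1;
        let n = *counter;
        self.seen.insert(key, n);
        format!("[{family}_{n}]")
    }
}
```

Under this sketch, a caller who needs consistent placeholders across documents would have to keep their own mapping, because a fresh `PlaceholderNumbering` starts counting from 1 again.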
Structs§
- AnonymizationResult: Complete anonymization payload for one input document.
- Profiles: Parsed built-in profile configuration before resolution.
- Replacement: One accepted replacement span in the original text.
- ResolvedProfile: The built-in model config after validation and path resolution.
- Tiktag: Reusable anonymizer instance backed by one loaded profile and ONNX session.
- TiktagOutput: Result of one Tiktag::anonymize call.
Enums§
- PlaceholderFamily: Placeholder family written into anonymized output.
- TiktagError: Public error type for the tiktag library surface.
Constants§
- BUILTIN_PROFILE_NAME: Logical name of the built-in bundled profile.
- REQUIRED_MODEL_FILES: Relative file paths required in a valid model bundle directory.
Functions§
- missing_model_files: Returns required bundle files that are missing from model_dir.
- validate_model_bundle: Validates that model_dir contains the files required by the built-in bundle layout.
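The relationship between the two functions above might look something like the following sketch. The signatures of `missing_model_files` and `validate_model_bundle` are not shown on this page, so the functions here use different, hypothetical names and signatures. The file list is also a placeholder, not the actual contents of `REQUIRED_MODEL_FILES`.

```rust
use std::path::{Path, PathBuf};

/// Placeholder list; the real `REQUIRED_MODEL_FILES` entries are not
/// documented on this page.
const REQUIRED_FILES_SKETCH: &[&str] = &["config.json", "tokenizer.json", "onnx/model.onnx"];

/// Returns the required relative paths that do not exist as files
/// under `model_dir`, in the order they are listed.
fn missing_files_sketch(model_dir: &Path) -> Vec<PathBuf> {
    REQUIRED_FILES_SKETCH
        .iter()
        .map(PathBuf::from)
        .filter(|rel| !model_dir.join(rel).is_file())
        .collect()
}

/// Succeeds only if nothing is missing; otherwise reports what is.
/// A real library would likely return its own error type, not `String`.
fn validate_bundle_sketch(model_dir: &Path) -> Result<(), String> {
    let missing = missing_files_sketch(model_dir);
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!("missing model files in {}: {:?}", model_dir.display(), missing))
    }
}
```

Splitting the check this way lets a caller either fail fast with the validating function or inspect the list of missing files, for example to print download instructions.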
Type Aliases§
- TiktagResult: Library-wide Result alias that defaults to TiktagError.
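A `Result` alias with a default error type is usually written with a defaulted type parameter. The sketch below shows that pattern with stand-in names (`SketchError`, `SketchResult`). It assumes tiktag's alias follows the same shape, which the one-line description suggests but does not confirm.

```rust
use std::fmt;

/// Stand-in error type for illustration; not tiktag's `TiktagError`.
#[derive(Debug)]
enum SketchError {
    MissingModelFile(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::MissingModelFile(name) => write!(f, "missing model file: {name}"),
        }
    }
}

impl std::error::Error for SketchError {}

/// The error parameter defaults to `SketchError` but can be overridden.
type SketchResult<T, E = SketchError> = std::result::Result<T, E>;

/// Uses the default error type.
fn require_file(name: &str, present: bool) -> SketchResult<()> {
    if present {
        Ok(())
    } else {
        Err(SketchError::MissingModelFile(name.to_string()))
    }
}

/// Overrides the error type while still using the alias.
fn parse_port(s: &str) -> SketchResult<u16, std::num::ParseIntError> {
    s.parse()
}
```

With this shape, most signatures can say `SketchResult<T>` and still interoperate with `?`, while the occasional function with a different error type does not need a second alias.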