
Crate tiktag

Text anonymization library backed by a built-in multilingual NER model.

tiktag loads one bundled profile and exposes a small library surface: construct Tiktag once, then call Tiktag::anonymize for each input. The built-in pipeline uses the Xenova distilbert-base-multilingual-cased-ner-hrl model to detect person, organization, and location entities, then applies additive regex recognizers, such as one for email addresses.
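The description calls the regex recognizers additive but does not say how their matches combine with the model's spans. A minimal sketch, assuming "additive" means a recognizer match is kept only where it does not overlap a span the NER model already produced. `Span`, `find_emails`, and `merge_additive` are hypothetical names, and the hand-rolled email check stands in for a real regex so the example needs no external crates; this is not tiktag's implementation.

```rust
/// Hypothetical span: byte offsets into the original text plus an entity label.
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    pub start: usize, // inclusive byte offset
    pub end: usize,   // exclusive byte offset
    pub label: &'static str,
}

/// Toy email detector standing in for a regex recognizer: scans
/// whitespace-separated tokens and keeps those shaped like `local@domain.tld`.
pub fn find_emails(text: &str) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut token_start: Option<usize> = None;
    // A trailing sentinel space flushes the final token.
    for (i, c) in text.char_indices().chain(std::iter::once((text.len(), ' '))) {
        if c.is_whitespace() {
            if let Some(s) = token_start.take() {
                push_if_email(text, s, i, &mut spans);
            }
        } else if token_start.is_none() {
            token_start = Some(i);
        }
    }
    spans
}

fn push_if_email(text: &str, start: usize, end: usize, out: &mut Vec<Span>) {
    // Drop trailing sentence punctuation, e.g. the final "." in "x@y.com.".
    let token = text[start..end]
        .trim_end_matches(|c: char| matches!(c, '.' | ',' | ';' | ':' | ')'));
    if let Some((local, domain)) = token.split_once('@') {
        let plausible = !local.is_empty()
            && !domain.contains('@')
            && domain.contains('.')
            && !domain.starts_with('.');
        if plausible {
            out.push(Span { start, end: start + token.len(), label: "EMAIL" });
        }
    }
}

/// Additive merge: model spans are kept as-is; a recognizer span is added
/// only if it overlaps none of the spans accepted so far.
pub fn merge_additive(model: &[Span], recognizers: &[Span]) -> Vec<Span> {
    let mut accepted = model.to_vec();
    for cand in recognizers {
        let overlaps = accepted
            .iter()
            .any(|s| cand.start < s.end && s.start < cand.end);
        if !overlaps {
            accepted.push(cand.clone());
        }
    }
    accepted.sort_by_key(|s| s.start);
    accepted
}
```

Letting model spans always win keeps the merge to a single pass; a real pipeline might instead rank overlapping candidates by confidence.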

Example

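use std::path::Path;

use tiktag::{Tiktag, TiktagResult};

fn main() -> TiktagResult<()> {
    // Expensive: loads the profile, tokenizer, and ONNX runtime session once.
    let mut tiktag = Tiktag::new(Path::new("models/profiles.toml"))?;
    // Cheap per input: anonymize takes &mut self and reuses the loaded session.
    let out = tiktag.anonymize("Maria Garcia from OpenAI visited Berlin.")?;
    println!("{}", out.anonymization.anonymized_text);
    Ok(())
}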

Notes

  • Tiktag::new is the expensive step: it loads the profile, tokenizer, and ONNX runtime session.
  • Tiktag::anonymize takes &mut self and reuses that loaded runtime.
  • Placeholder numbering is consistent only within a single call; the same entity is not guaranteed to receive the same placeholder across calls.
  • Model-based anonymization can miss entities; treat tiktag as an assistive control, not a sole safety boundary.
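The per-call numbering caveat can be illustrated with a small sketch. It assumes placeholders look like `[PERSON_1]` and that repeated surface strings of the same family share a number; neither the placeholder format nor the `Family` enum (a hypothetical stand-in for `PlaceholderFamily`) comes from the crate. Because the lookup tables are created inside the function, numbering is consistent within one call and starts over in the next.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's PlaceholderFamily enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Family {
    Person,
    Org,
    Location,
    Email,
}

impl Family {
    /// Hypothetical label used inside the placeholder text.
    fn tag(self) -> &'static str {
        match self {
            Family::Person => "PERSON",
            Family::Org => "ORG",
            Family::Location => "LOCATION",
            Family::Email => "EMAIL",
        }
    }
}

/// Assigns one placeholder per detected entity for a single call.
/// Both tables are local, so numbering never carries over between calls.
pub fn number_placeholders(entities: &[(Family, &str)]) -> Vec<String> {
    let mut seen: HashMap<(Family, String), usize> = HashMap::new();
    let mut counters: HashMap<Family, usize> = HashMap::new();
    let mut out = Vec::with_capacity(entities.len());
    for &(family, surface) in entities {
        let key = (family, surface.to_string());
        let n = match seen.get(&key) {
            // Same family and surface text seen earlier in this call: reuse it.
            Some(&n) => n,
            // First occurrence: take the next number for this family.
            None => {
                let c = counters.entry(family).or_insert(0);
                *c += 1;
                let n = *c;
                seen.insert(key, n);
                n
            }
        };
        out.push(format!("[{}_{}]", family.tag(), n));
    }
    out
}
```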

Structs

AnonymizationResult
Complete anonymization payload for one input document.
Profiles
Parsed built-in profile configuration before resolution.
Replacement
One accepted replacement span in the original text.
ResolvedProfile
The built-in model config after validation and path resolution.
Tiktag
Reusable anonymizer instance backed by one loaded profile and ONNX session.
TiktagOutput
Result of one Tiktag::anonymize call.
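Replacement records one accepted span in the original text. A minimal sketch of how such spans could be turned back into anonymized text, assuming byte offsets with an exclusive end and one placeholder string per span; `ReplacementSpan`, its fields, and `apply_replacements` are hypothetical, not the crate's API. Spans are applied in start order and rejected if they overlap, run past the end, or split a UTF-8 character.

```rust
/// Hypothetical mirror of a replacement span; the real Replacement fields may differ.
#[derive(Debug, Clone)]
pub struct ReplacementSpan {
    pub start: usize, // inclusive byte offset into the original text
    pub end: usize,   // exclusive byte offset into the original text
    pub placeholder: String,
}

/// Rebuilds anonymized text from `original` and a set of spans.
/// Returns None if spans overlap, are out of bounds, or split a UTF-8 char.
pub fn apply_replacements(original: &str, spans: &[ReplacementSpan]) -> Option<String> {
    let mut sorted: Vec<&ReplacementSpan> = spans.iter().collect();
    sorted.sort_by_key(|s| s.start);
    let mut out = String::with_capacity(original.len());
    let mut cursor = 0; // end of the last span written
    for s in sorted {
        let invalid = s.start < cursor
            || s.end < s.start
            || s.end > original.len()
            || !original.is_char_boundary(s.start)
            || !original.is_char_boundary(s.end);
        if invalid {
            return None;
        }
        // Copy untouched text, then the placeholder in place of the span.
        out.push_str(&original[cursor..s.start]);
        out.push_str(&s.placeholder);
        cursor = s.end;
    }
    out.push_str(&original[cursor..]);
    Some(out)
}
```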

Enums

PlaceholderFamily
Placeholder family written into anonymized output.
TiktagError
Public error type for the tiktag library surface.

Constants

BUILTIN_PROFILE_NAME
Logical name of the built-in bundled profile.
REQUIRED_MODEL_FILES
Relative file paths required in a valid model bundle directory.

Functions

missing_model_files
Returns required bundle files that are missing from model_dir.
validate_model_bundle
Validates that model_dir contains the files required by the built-in bundle layout.
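The two bundle-checking functions can be approximated with the standard library alone. This sketch is hypothetical: `REQUIRED_FILES` uses made-up paths in place of the real `REQUIRED_MODEL_FILES`, and `missing_files` / `validate_bundle` return simpler types than the crate's functions may.

```rust
use std::path::Path;

/// Hypothetical list; the crate's real REQUIRED_MODEL_FILES may differ.
pub const REQUIRED_FILES: &[&str] = &["tokenizer.json", "config.json", "onnx/model.onnx"];

/// Returns every required relative path that is not an existing file under `model_dir`.
pub fn missing_files(model_dir: &Path) -> Vec<&'static str> {
    REQUIRED_FILES
        .iter()
        .copied()
        .filter(|rel| !model_dir.join(rel).is_file())
        .collect()
}

/// Ok(()) when nothing is missing, otherwise an error naming all missing files.
pub fn validate_bundle(model_dir: &Path) -> Result<(), String> {
    let missing = missing_files(model_dir);
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!("model bundle is missing: {}", missing.join(", ")))
    }
}
```

Splitting the check into a listing function and a pass/fail wrapper lets callers report every missing file at once instead of stopping at the first.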

Type Aliases

TiktagResult
Library-wide Result alias that defaults to TiktagError.
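Saying that `TiktagResult` defaults to `TiktagError` suggests a type alias with a defaulted error parameter. A self-contained sketch of that pattern, with `SketchError` and `SketchResult` as hypothetical stand-ins; whether tiktag declares its alias exactly this way is an assumption.

```rust
use std::fmt;

/// Hypothetical error type standing in for TiktagError.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    EmptyInput,
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::EmptyInput => write!(f, "input text is empty"),
        }
    }
}

impl std::error::Error for SketchError {}

/// Result alias whose error parameter defaults to SketchError.
pub type SketchResult<T, E = SketchError> = Result<T, E>;

/// Uses the default error type: SketchResult<usize> = Result<usize, SketchError>.
pub fn check_input(text: &str) -> SketchResult<usize> {
    if text.trim().is_empty() {
        Err(SketchError::EmptyInput)
    } else {
        Ok(text.len())
    }
}

/// Overrides the default error type with ParseIntError.
pub fn parse_count(s: &str) -> SketchResult<u32, std::num::ParseIntError> {
    s.trim().parse::<u32>()
}
```

The default keeps most signatures short while still allowing an occasional function to return a different error type through the same alias.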