pii-vault 0.2.0

Presidio-compatible PII detection, anonymization, and reversible tokenization
Documentation

PII Vault

Presidio-compatible PII detection, anonymization, and reversible tokenization.

Crates.io npm PyPI License: MIT

Multi-language implementations sharing a common specification. Detect 40+ PII entity types, anonymize with multiple strategies (replace, mask, hash, redact), and reversibly tokenize with a persistent vault.

Install

# Rust
cargo add pii-vault

# TypeScript / JavaScript
npm install pii-vault

# Python
pip install pii-vault

# Go
go get github.com/Jiansen/pii-vault/go

Features

  • 29 built-in recognizers covering 15 countries (US, UK, CN, IN, AU, DE, IT, ES, KR, SG, FI, SE, PL, JP, FR, CA, BR)
  • Presidio-aligned regex patterns for core entity types (email, credit card, IP, crypto)
  • Shared spec: Recognizer patterns defined as JSON, consumed by all language implementations
  • Vault: Deterministic, reversible tokenization with collision handling and context disambiguation
  • Multiple anonymization strategies: Replace, Mask, Hash, Redact, Vault
  • Luhn validation for credit cards, checksum validation for Chinese ID cards
  • Context-aware scoring: Boost detection confidence when context words appear nearby
  • Zero runtime dependencies beyond regex and JSON parsing

Quick Start

Rust

[dependencies]
pii-vault = "0.1"
use pii_vault::{Analyzer, Anonymizer, Operator, Vault, load_recognizers_from_dir};
use std::collections::HashMap;
use std::path::Path;

// Load recognizers from spec/
let recognizers = load_recognizers_from_dir(Path::new("spec/recognizers"));
let analyzer = Analyzer::new(recognizers);

// Analyze text
let text = "Email alice@company.com, SSN 123-45-6789";
let result = analyzer.analyze(text, &[], 0.0);

// Anonymize with vault (reversible)
let mut vault = Vault::new();
let mut ops = HashMap::new();
ops.insert("EMAIL_ADDRESS".to_string(), Operator::Vault);
ops.insert("US_SSN".to_string(), Operator::Vault);

let anon = Anonymizer::anonymize(text, &result.entities, &ops, &Operator::default(), Some(&mut vault));
println!("{}", anon.text);
// "Email [EMAIL_ADDRESS:a1b2], SSN [US_SSN:c3d4]"

// Restore original
let restored = vault.detokenize(&anon.text);
assert_eq!(restored, text);

TypeScript

npm install pii-vault
import { Analyzer, Anonymizer, RegexRecognizer, Vault } from 'pii-vault';
import * as fs from 'fs';

// Load recognizers from spec/
const specDir = './spec/recognizers';
const recognizers = fs.readdirSync(specDir)
  .filter(f => f.endsWith('.json'))
  .map(f => new RegexRecognizer(JSON.parse(fs.readFileSync(`${specDir}/${f}`, 'utf-8'))));

const analyzer = new Analyzer(recognizers);

// Analyze
const text = 'Email alice@company.com, SSN 123-45-6789';
const result = analyzer.analyze(text);

// Anonymize with vault
const vault = new Vault();
const ops = { EMAIL_ADDRESS: { type: 'vault' }, US_SSN: { type: 'vault' } };
const anon = Anonymizer.anonymize(text, result.entities, ops, { type: 'replace' }, vault);

// Restore
const restored = vault.detokenize(anon.text);

Architecture

pii-vault/
├── spec/                     # Shared specification (language-agnostic)
│   ├── entities.json         # 45 entity type definitions
│   ├── recognizers/          # 29 regex recognizer definitions (JSON)
│   └── test-cases/           # Cross-language test cases
├── rust/                     # Rust implementation → crates.io: pii-vault
├── typescript/               # TypeScript implementation → npm: pii-vault
├── go/                       # Go implementation (planned)
├── java/                     # Java implementation (planned)
├── haskell/                  # Haskell implementation (planned)
└── wasm/                     # WASM from Rust (planned)

The spec/recognizers/*.json files are the single source of truth. All language implementations load these patterns at runtime or compile time.

Supported Entity Types

Generic (all languages)

EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD, CRYPTO, IP_ADDRESS, MAC_ADDRESS, IBAN_CODE, URL, UUID

Country-Specific

Country Entities
US SSN, ITIN, Passport, Driver License, Bank Routing
UK NHS, NINO
China ID Card (18-digit), Phone, Passport, Bank Card
India Aadhaar, PAN, Passport
Australia TFN, Medicare, ABN
Germany Steuer-ID
Italy Fiscal Code
Spain NIE, NIF
Korea RRN
Singapore NRIC
Finland Personal ID
Sweden Personal Number
Poland PESEL
Japan My Number, Passport
France NIR
Canada SIN
Brazil CPF

Anonymization Strategies

Strategy Description Reversible
Replace Replace with <ENTITY_TYPE> or custom string No
Mask Partially mask characters (e.g., ****1111) No
Hash FNV hash of original value No
Redact Remove entirely No
Vault Deterministic token [ENTITY:xxxx] with persistent mapping Yes

Contributing

Add a new recognizer:

  1. Create spec/recognizers/your_entity.json following the existing format
  2. Add test cases to spec/test-cases/
  3. Run tests in both Rust and TypeScript to verify

License

MIT