High-performance PII detection and anonymization engine
A production-ready, Rust-based solution designed as a drop-in replacement for Microsoft Presidio.
Features
- High Performance — 10-100x faster than Python-based solutions, with sub-millisecond median latency for pattern-based detection
- Memory Safe — Rust's borrow checker eliminates entire classes of security vulnerabilities
- Production Ready — 36 pattern-based entity types with validation, plus transformer-based NER
- Multi-Platform — Native server and CLI support
- ML-Powered — Full ONNX Runtime integration for transformer models (BERT, RoBERTa, DistilBERT)
- Lightweight — ~20-50MB memory footprint vs ~300MB for Presidio
- Extensible — Plugin architecture for custom recognizers and anonymization strategies
Quick Start
Install the CLI
Analyze Text for PII
Output:
Detected 2 PII entities:
EmailAddress at 21..37 (score: 0.80): john@example.com
PhoneNumber at 46..60 (score: 0.70): (555) 123-4567
Processing time: 2ms
Anonymize PII
# Replace with placeholders (default)
# Output: My SSN is [US_SSN]
# Mask sensitive data
# Output: Email: jo**@****le.com
# Hash for consistent pseudonymization
# Output: Card: [CREDIT_CARD_a1b2c3d4]
Process Files
# Analyze a file
# Pipe from stdin
# Output as JSON
Filter by Entity Type
# Only detects EmailAddress and UsSsn, ignores PhoneNumber
Installation
Using Cargo (Recommended)
From Source
Using Docker
Multi-architecture images available for linux/amd64 and linux/arm64:
The image uses a minimal distroless base (~37MB) optimized for ARM64 (AWS Graviton, Apple Silicon) and AMD64.
Full image (pattern + ONNX NER)
To enable all entities including ONNX NER (PERSON, ORGANIZATION, LOCATION, DATE_TIME), use the full image. It is published on every release to GHCR with tags full, X.Y.Z-full, etc.:
To build locally instead:
The full image uses a pre-built NER base layer (NER_BASE, default ghcr.io/censgate/redact-ner-base:v2). Override with --build-arg NER_BASE=... only if you publish a different tag.
The full image bakes in a pre-exported NER model (dslim/bert-base-NER) and sets NER_MODEL_PATH=/app/model/model.onnx, so NER is enabled at startup. To enable NER with the default image, mount a directory containing model.onnx and tokenizer.json and set:
Rust Version
This project requires Rust 1.93.0. Use Mise, ASDF, or rustup to pin the toolchain version:
# Using Mise (recommended)
# Using ASDF
# Using rustup
Library Usage
Add to your Cargo.toml:
[dependencies]
redact-core = "0.8.2"
redact-ner = "0.8.2" # Optional: for ML-based NER
Basic Pattern Detection
use ;
ML-Powered NER
For detecting contextual entities like person names, organizations, and locations:
use redact_core::AnalyzerEngine;
use ;
use std::sync::Arc;
REST API
Start the Server
# Server listening on http://0.0.0.0:8080
Analyze Endpoint
Response:
Anonymize Endpoint
Supported Entity Types
Pattern-Based (36 types)
| Category | Entity Types |
|---|---|
| Contact | EMAIL_ADDRESS, PHONE_NUMBER, IP_ADDRESS, URL, DOMAIN_NAME |
| Financial | CREDIT_CARD, IBAN_CODE, US_BANK_NUMBER |
| US | US_SSN, US_DRIVER_LICENSE, US_PASSPORT, US_ZIP_CODE |
| UK | UK_NHS, UK_NINO, UK_POSTCODE, UK_PHONE_NUMBER, UK_MOBILE_NUMBER, UK_SORT_CODE, UK_DRIVER_LICENSE, UK_PASSPORT_NUMBER, UK_COMPANY_NUMBER |
| Healthcare | MEDICAL_LICENSE, MEDICAL_RECORD_NUMBER |
| Crypto | CRYPTO_WALLET, BTC_ADDRESS, ETH_ADDRESS |
| Technical | GUID, MAC_ADDRESS, MD5_HASH, SHA1_HASH, SHA256_HASH |
| Generic | PASSPORT_NUMBER, AGE, ISBN, PO_BOX, DATE_TIME |
Pattern-based detection includes validation (Luhn for credit cards, mod-11 for NHS, IBAN checksums) to reduce false positives.
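The three checks named above can be sketched in plain Rust. This is a minimal illustration of the standard algorithms (Luhn, NHS mod-11, IBAN mod-97), not Redact's actual validators; the function names `luhn_valid`, `nhs_valid`, and `iban_valid` are hypothetical and chosen only for this example.

```rust
/// Parses a string into decimal digits, skipping spaces and hyphens.
/// Returns an empty vector if any other character is present.
fn digits_of(input: &str) -> Vec<u32> {
    input
        .chars()
        .filter(|c| !matches!(c, ' ' | '-'))
        .map(|c| c.to_digit(10))
        .collect::<Option<Vec<u32>>>()
        .unwrap_or_default()
}

/// Luhn: double every second digit from the right, subtract 9 from any
/// result above 9, and require the total to be divisible by 10.
fn luhn_valid(number: &str) -> bool {
    let digits = digits_of(number);
    if digits.len() < 2 {
        return false;
    }
    let sum: u32 = digits
        .iter()
        .rev()
        .enumerate()
        .map(|(i, &d)| match (i % 2, d * 2) {
            (1, doubled) if doubled > 9 => doubled - 9,
            (1, doubled) => doubled,
            _ => d,
        })
        .sum();
    sum % 10 == 0
}

/// NHS mod-11: weight the first nine digits 10 down to 2; the tenth digit
/// must equal 11 - (sum mod 11), where 11 maps to 0 and 10 is never valid.
fn nhs_valid(number: &str) -> bool {
    let digits = digits_of(number);
    if digits.len() != 10 {
        return false;
    }
    let weighted: u32 = digits[..9]
        .iter()
        .zip((2..=10u32).rev())
        .map(|(&d, w)| d * w)
        .sum();
    let check = match 11 - (weighted % 11) {
        11 => 0,
        10 => return false,
        c => c,
    };
    check == digits[9]
}

/// IBAN mod-97: move the first four characters to the end, map letters to
/// 10..=35, and require the resulting number mod 97 to equal 1.
/// Country-specific length rules are deliberately left out of this sketch.
fn iban_valid(iban: &str) -> bool {
    let compact: String = iban
        .chars()
        .filter(|c| !c.is_whitespace())
        .map(|c| c.to_ascii_uppercase())
        .collect();
    if !(15..=34).contains(&compact.len())
        || !compact.chars().all(|c| c.is_ascii_alphanumeric())
    {
        return false;
    }
    let rearranged = format!("{}{}", &compact[4..], &compact[..4]);
    let mut remainder: u32 = 0;
    for c in rearranged.chars() {
        // Radix 36 maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
        let value = match c.to_digit(36) {
            Some(v) => v,
            None => return false,
        };
        // Letters contribute two decimal digits, so shift by 100 for them.
        let shift = if value < 10 { 10 } else { 100 };
        remainder = (remainder * shift + value) % 97;
    }
    remainder == 1
}
```

A regex match that fails its checksum can then be dropped before it is reported, which is how validation cuts false positives on digit runs that merely look like card, NHS, or IBAN numbers.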
NER-Based (ML-Powered)
| Entity Type | Description |
|---|---|
| PERSON | Person names (e.g., "John Doe", "Marie Curie") |
| ORGANIZATION | Organization names (e.g., "Acme Corp", "Microsoft") |
| LOCATION | Location names (e.g., "New York", "London") |
| DATE_TIME | Date/time expressions in context |
Requires an ONNX model. See the ML-Powered NER section.
Anonymization Strategies
| Strategy | Description | Example |
|---|---|---|
| Replace | Simple placeholder | [EMAIL_ADDRESS] |
| Mask | Partial masking | jo**@****le.com |
| Hash | Irreversible hashing | [EMAIL_ADDRESS_a1b2c3d4] |
| Encrypt | Reversible encryption | <TOKEN_uuid> |
use ;
let config = AnonymizerConfig ;
// "john@example.com" → "jo**@****le.com"
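To make the Replace and Hash rows of the table above concrete, here is a dependency-free sketch of span-based anonymization. The `Entity` struct, `Strategy` enum, and `anonymize` function are hypothetical names for this example and are not Redact's API. The hash is FNV-1a, swapped in only so the sketch compiles without external crates; it is not cryptographic, and a real pseudonymizer would be expected to use something like a salted SHA-256 or an HMAC.

```rust
/// A detected entity: a type label plus a half-open byte range into the text.
/// (Hypothetical shape for this sketch.)
#[derive(Debug, Clone)]
struct Entity {
    entity_type: &'static str,
    start: usize,
    end: usize,
}

#[derive(Clone, Copy)]
enum Strategy {
    Replace,
    Hash,
}

/// FNV-1a 64-bit. A stand-in only: NOT suitable for real pseudonymization.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Rewrites each entity span according to `strategy`.
/// Assumes the spans do not overlap and fall on character boundaries.
fn anonymize(text: &str, entities: &[Entity], strategy: Strategy) -> String {
    let mut sorted = entities.to_vec();
    // Work from the end of the text so earlier offsets stay valid.
    sorted.sort_by(|a, b| b.start.cmp(&a.start));

    let mut out = text.to_string();
    for e in &sorted {
        let original = &text[e.start..e.end];
        let token = match strategy {
            Strategy::Replace => format!("[{}]", e.entity_type),
            // The same input always yields the same 8-hex-digit suffix.
            Strategy::Hash => format!(
                "[{}_{:08x}]",
                e.entity_type,
                fnv1a(original.as_bytes()) as u32
            ),
        };
        out.replace_range(e.start..e.end, &token);
    }
    out
}
```

Calling `anonymize` with `Strategy::Replace` on "My SSN is 123-45-6789" and a `US_SSN` span covering the number yields "My SSN is [US_SSN]". With `Strategy::Hash`, repeated occurrences of the same value map to the same token, which is what makes the output usable for consistent pseudonymization.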
ML-Powered NER
Redact includes full ONNX Runtime integration for transformer-based Named Entity Recognition.
Setup
1. Export a HuggingFace model to ONNX:
2. Use in your code:
use ;
use redact_core::AnalyzerEngine;
use std::sync::Arc;
let config = NerConfig ;
let ner = from_config?;
let mut engine = AnalyzerEngine::new();
engine.recognizer_registry_mut().add_recognizer(Arc::new(ner));
Model Directory Structure
The export script creates a directory with the following files:
models/bert-base-ner/
├── model.onnx # ONNX model file (REQUIRED)
├── tokenizer.json # HuggingFace tokenizer (REQUIRED)
├── config.json # Model config with label mappings
├── special_tokens_map.json
└── tokenizer_config.json
Required files for inference:
- model.onnx - The ONNX-exported transformer model
- tokenizer.json - HuggingFace fast tokenizer (must be in the same directory as the model, or specified via tokenizer_path)
Recommended Models
| Model | Size | Use Case |
|---|---|---|
| dslim/bert-base-NER | ~420MB | Best accuracy/size balance (default) |
| dbmdz/bert-large-cased-finetuned-conll03-english | ~1.2GB | Highest accuracy |
| Davlan/distilbert-base-multilingual-cased-ner-hrl | ~500MB | Multilingual support |
| elastic/distilbert-base-cased-finetuned-conll03-english | ~250MB | Smaller/faster |
All models must be trained on CoNLL-2003 or a similar NER dataset with a BIO tagging scheme (B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC labels).
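For readers unfamiliar with BIO tags, the sketch below shows one common way to turn a per-token tag sequence into entity spans. It is an illustration of the tagging scheme only, not the decoding code in redact-ner; `Span` and `decode_bio` are hypothetical names, and spans here index tokens rather than characters.

```rust
/// An entity span over token indices (half-open), labelled e.g. "PER".
#[derive(Debug, PartialEq)]
struct Span {
    label: String,
    start: usize,
    end: usize,
}

/// Groups BIO tags into spans: "B-X" opens a span, "I-X" extends an open
/// span with the same label, and "O" closes whatever is open.
/// A stray "I-X" with no matching open span is leniently treated as "B-X".
fn decode_bio(tags: &[&str]) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut current: Option<Span> = None;

    for (i, tag) in tags.iter().enumerate() {
        // Tags without a '-' (including "O") carry no entity label.
        let (prefix, label) = match tag.split_once('-') {
            Some((p, l)) => (p, l),
            None => ("O", ""),
        };
        match prefix {
            "I" if current.as_ref().map_or(false, |s| s.label == label) => {
                if let Some(s) = current.as_mut() {
                    s.end = i + 1;
                }
            }
            "B" | "I" => {
                if let Some(s) = current.take() {
                    spans.push(s);
                }
                current = Some(Span { label: label.to_string(), start: i, end: i + 1 });
            }
            _ => {
                if let Some(s) = current.take() {
                    spans.push(s);
                }
            }
        }
    }
    // Flush a span that runs to the end of the sequence.
    if let Some(s) = current {
        spans.push(s);
    }
    spans
}
```

For the tokens "John Doe works at Acme Corp" tagged B-PER, I-PER, O, O, B-ORG, I-ORG, this produces a PER span over tokens 0..2 and an ORG span over tokens 4..6. Mapping token spans back to character offsets would additionally need the tokenizer's offset information, which this sketch leaves out.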
Performance
- Inference: ~2-10ms per text (depending on model and text length)
- Memory: ~50-200MB (depending on model)
- Startup: ~100-500ms model load time
- Concurrency: Thread-safe via mutex-wrapped sessions
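The "mutex-wrapped sessions" note can be pictured with a small standard-library sketch. `FakeSession` and `SharedRecognizer` are hypothetical stand-ins invented for this example; the real recognizer would hold an ONNX Runtime session instead, and its internals may differ from what is shown here.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Stand-in for an inference session whose `run` needs exclusive access.
struct FakeSession {
    calls: u64,
}

impl FakeSession {
    /// Pretend inference: counts whitespace-separated tokens.
    fn run(&mut self, text: &str) -> usize {
        self.calls += 1;
        text.split_whitespace().count()
    }
}

/// Cheap to clone; every clone shares the same session behind a mutex.
#[derive(Clone)]
struct SharedRecognizer {
    session: Arc<Mutex<FakeSession>>,
}

impl SharedRecognizer {
    fn new() -> Self {
        Self { session: Arc::new(Mutex::new(FakeSession { calls: 0 })) }
    }

    /// Takes `&self`, so it can be called from many threads at once.
    fn analyze(&self, text: &str) -> usize {
        // The lock is held only for this call; other threads wait here.
        let mut session = self.session.lock().expect("session mutex poisoned");
        session.run(text)
    }

    fn calls(&self) -> u64 {
        self.session.lock().expect("session mutex poisoned").calls
    }
}

/// Runs one `analyze` call per input on its own thread and collects the
/// results in input order.
fn analyze_concurrently(recognizer: &SharedRecognizer, inputs: Vec<String>) -> Vec<usize> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|text| {
            let r = recognizer.clone();
            thread::spawn(move || r.analyze(&text))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .collect()
}
```

The trade-off with this pattern is that inference calls on a single session are serialized: it is safe to share, but throughput under load is bounded by one call at a time unless several sessions are pooled.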
Performance
Benchmark Results (2026-04-18)
Measured using oha with both services running in Docker containers. See docs/benchmarks/results-20260418-175909.md.
| Metric | Redact (Rust) | Presidio (Python) | Speedup |
|---|---|---|---|
| p50 Latency | 0.196 ms | 6.25 ms | 32x |
| p99 Latency | 1.90 ms | 21.68 ms | 11x |
| Throughput | 19,416 req/s | 170 req/s | 114x |
Test payload: Contact john.doe@example.com or call (555) 123-4567. SSN: 123-45-6789.
Run Benchmarks
# REST API comparison vs Presidio (requires Docker; oha on PATH or auto-downloaded)
# Criterion micro-benchmarks (Redact internals)
See docs/benchmarks/ for methodology and detailed results.
Project Structure
redact/
├── crates/
│ ├── redact-core/ # Core detection & anonymization engine
│ ├── redact-ner/ # ONNX NER integration
│ ├── redact-api/ # REST API service (Axum)
│ ├── redact-cli/ # Command-line tool
│ └── redact-wasm/ # WebAssembly bindings
├── patterns/ # PII detection patterns (GDPR, HIPAA, CCPA)
├── scripts/ # Utility scripts (model export)
├── examples/ # Usage examples
└── docs/ # Documentation
Testing
# Run all tests
# Run with output
# Run benchmarks
# Run NER E2E tests (requires ONNX model)
# Run specific test suites
See TEST_COVERAGE.md for detailed coverage report.
Documentation
- API Documentation — Rust API docs
- Test Coverage — Testing details
- Contributing Guide — How to contribute
- Examples — Code examples
Roadmap
Pre-1.0.0
v0.8.2 (Current)
- Complete Rust rewrite (replacing Go v0.1.0-v0.4.1)
- 36 pattern-based entity types with checksum validation
- Full ONNX NER integration (PERSON, ORGANIZATION, LOCATION)
- 4 anonymization strategies (replace, mask, hash, encrypt)
- REST API service
- CLI tool
- Multi-arch Docker images (AMD64/ARM64)
- Full Docker image with embedded NER model (ghcr.io/censgate/redact:full)
- Comprehensive test suite (~75% coverage)
- Entity overlap resolution with specificity scoring
v0.9.0 (Planned)
- Publish crates to crates.io
- WebAssembly (WASM) browser support
- Streaming API for large texts
- Enhanced documentation
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
# Fork and clone
# Create a feature branch
# Make changes and test
# Commit and push
License
Censgate Redact is licensed under the Apache License 2.0.
See the LICENSE file for the complete license terms.
Copyright (c) 2026 Censgate LLC
Acknowledgments
- Inspired by Microsoft Presidio
- Built with ONNX Runtime
- Powered by Rust
- ML models from HuggingFace
Support
- GitHub Issues — Bug reports and feature requests
- GitHub Discussions — Questions and general discussion
- Email: support@censgate.com
Star us on GitHub if you find this project useful!