# privacy-filter-rs
Pure-Rust inference for [OpenAI Privacy Filter](https://huggingface.co/openai/privacy-filter) — a 1.5B-parameter bidirectional transformer with Sparse MoE for PII detection. Built on [Burn 0.20](https://burn.dev).
## Benchmark

| Implementation | Throughput (higher is better) | Correct output | Notes |
|---|---|---|---|
| Python (transformers, CPU, bf16) | 286 | ✓ | Baseline; bfloat16 (half the bytes per value of f32) |
| Rust, `wgpu-f16` / Metal GPU | 140 | ✗ | f16 precision breaks MoE routing |
| **Rust, MLX / Apple Silicon, f32** | **122** | **✓** | **Fastest correct Rust backend** |
| Rust, NdArray + Accelerate, f32 | 91 | ✓ | CPU with Apple BLAS |
| Rust, NdArray (plain), f32 | 85 | ✓ | CPU, no BLAS |
| Rust, wgpu / Metal GPU, f32 | 39 | ✓ | GPU transfer overhead dominates on small batches |

*Workload: 96 tokens across 6 samples, 5 iterations, on an Apple Silicon M4 Mac mini. Python's lead comes from running in bfloat16 rather than f32; Python and every ✓ Rust backend produce identical predictions on all test cases.*
## Quick Start
```bash
# Clone
git clone https://github.com/eugenehp/privacy-filter-rs
cd privacy-filter-rs
# Download weights (2.6 GB; requires Git LFS)
git clone https://huggingface.co/eugenehp/privacy-filter-rs data
# Run inference (MLX backend, recommended on Apple Silicon)
cargo run --release --no-default-features --features mlx -- \
-m data "My name is Alice Smith and my email is alice@example.com"
```
Output:
```json
[
{"entity_group": "private_person", "score": 0.999995, "word": " Alice Smith", "start": 10, "end": 22},
{"entity_group": "private_email", "score": 0.999999, "word": " alice@example.com", "start": 38, "end": 56}
]
```
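
Each span carries `start`/`end` offsets into the input alongside the matched `word`. The following is a minimal sketch of a consistency check for such spans. It assumes the offsets are half-open byte ranges (`start..end`), that `word` should equal the sliced text, and that spans are sorted and non-overlapping. The `Span` struct and `check_offsets` function are hypothetical and not part of this crate's API.

```rust
/// Hypothetical span shape mirroring the JSON fields `word`, `start`, `end`.
struct Span<'a> {
    word: &'a str,
    start: usize, // assumed: byte offset of the first byte (inclusive)
    end: usize,   // assumed: byte offset one past the last byte (exclusive)
}

/// Checks that each span is a non-empty, in-bounds byte range on UTF-8
/// character boundaries whose slice equals `word`, and that spans are
/// sorted and non-overlapping.
fn check_offsets(text: &str, spans: &[Span]) -> Result<(), String> {
    let mut prev_end = 0;
    for (k, s) in spans.iter().enumerate() {
        if s.start >= s.end || s.end > text.len() {
            return Err(format!("span {k}: empty or out-of-bounds range {}..{}", s.start, s.end));
        }
        // Slicing a &str at a non-boundary would panic, so check first.
        if !text.is_char_boundary(s.start) || !text.is_char_boundary(s.end) {
            return Err(format!("span {k}: offset splits a UTF-8 character"));
        }
        if &text[s.start..s.end] != s.word {
            return Err(format!("span {k}: slice does not match `word`"));
        }
        if s.start < prev_end {
            return Err(format!("span {k}: overlaps the previous span"));
        }
        prev_end = s.end;
    }
    Ok(())
}

fn main() {
    let text = "My name is Alice Smith";
    let spans = [Span { word: " Alice Smith", start: 10, end: 22 }];
    match check_offsets(text, &spans) {
        Ok(()) => println!("offsets ok"),
        Err(e) => println!("invalid: {e}"),
    }
}
```

The `is_char_boundary` check matters for non-ASCII input: an offset that lands inside a multi-byte character would otherwise make the slice panic.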
## Usage as Library
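```rust
use privacy_filter_rs::{PrivacyFilterInference, backend::{B, Device}};
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Device for whichever backend `B` the Cargo features selected.
    let device = <Device as Default>::default();
    // Load weights and config from the directory cloned in Quick Start.
    let engine = PrivacyFilterInference::<B>::load(Path::new("data"), device)?;
    let spans = engine.predict("My name is Alice Smith")?;
    for s in &spans {
        // `word` keeps the tokenizer's leading space, so trim it for display.
        println!("{}: {} (score: {:.4})", s.entity_group, s.word.trim(), s.score);
    }
    // private_person: Alice Smith (score: 1.0000)
    Ok(())
}
```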
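
A common next step is masking the detected spans. The following is a minimal sketch that assumes spans expose half-open byte offsets `start`/`end` as in the JSON output, do not overlap, and are kept only above a caller-chosen score threshold. The `Span` struct and `redact` function are hypothetical stand-ins, not the crate's own types.

```rust
/// Hypothetical stand-in for a detected span, mirroring the JSON output
/// fields `entity_group`, `score`, `start` and `end` (byte offsets, half-open).
struct Span {
    entity_group: String,
    score: f32,
    start: usize,
    end: usize,
}

/// Replaces every span scoring at least `min_score` with `[ENTITY_GROUP]`.
/// Assumes spans do not overlap and their offsets lie on UTF-8 boundaries.
/// Leading whitespace inside a span (e.g. " Alice Smith") is kept in the output.
fn redact(text: &str, spans: &[Span], min_score: f32) -> String {
    let mut kept: Vec<&Span> = spans.iter().filter(|s| s.score >= min_score).collect();
    kept.sort_by_key(|s| s.start);
    let mut out = String::with_capacity(text.len());
    let mut cursor = 0;
    for s in kept {
        let slice = &text[s.start..s.end];
        // Bytes of leading whitespace to copy through unchanged.
        let lead = slice.len() - slice.trim_start().len();
        out.push_str(&text[cursor..s.start + lead]);
        out.push('[');
        out.push_str(&s.entity_group.to_uppercase());
        out.push(']');
        cursor = s.end;
    }
    out.push_str(&text[cursor..]);
    out
}

fn main() {
    let text = "My name is Alice Smith and my email is alice@example.com";
    let spans = vec![
        Span { entity_group: "private_person".into(), score: 0.99, start: 10, end: 22 },
        Span { entity_group: "private_email".into(), score: 0.99, start: 38, end: 56 },
    ];
    // My name is [PRIVATE_PERSON] and my email is [PRIVATE_EMAIL]
    println!("{}", redact(text, &spans, 0.5));
}
```

Copying text between spans by offset, rather than searching for `word`, avoids masking the wrong occurrence when the same string appears more than once.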
## Building
```bash
# CPU (default) — portable, no GPU required
cargo build --release
# CPU + Apple Accelerate BLAS (macOS, faster matmul)
cargo build --release --features blas-accelerate
# MLX — Apple Silicon GPU (fastest correct backend)
cargo build --release --no-default-features --features mlx
# wgpu — Metal/Vulkan GPU (f32)
cargo build --release --no-default-features --features wgpu
```
## Features
| Feature | Backend | Hardware | Notes |
|---|---|---|---|
| `ndarray` (default) | NdArray | CPU | Portable, multi-threaded |
| `blas-accelerate` | NdArray + Accelerate | CPU | macOS only, faster matmul |
| `openblas-system` | NdArray + OpenBLAS | CPU | Linux |
| `mlx` | burn-mlx | Apple Silicon | Unified memory, no copy overhead |
| `wgpu` | wgpu | Metal/Vulkan | GPU, f32 |
| `wgpu-f16` | wgpu | Metal/Vulkan | GPU, f16; fast but produces incorrect predictions |
## Architecture
The model is a bidirectional transformer encoder with:
- **Token embedding**: 200K-token vocabulary (o200k_base) → 640-dim
- **8 transformer layers**, each with:
  - RMSNorm, then Grouped Query Attention (14 query / 2 KV heads, sliding window 257, YaRN RoPE, attention sinks)
  - RMSNorm, then Sparse MoE (128 experts, top-4 routing, custom GELU gating)
- **Classification head**: 640 → 33 BIOES labels (O plus B/I/E/S for each of the 8 categories: 1 + 8 × 4 = 33)
- **Viterbi decoder**: constrained BIOES transitions, tunable operating points

Detects 8 PII categories: `account_number`, `private_address`, `private_date`, `private_email`, `private_person`, `private_phone`, `private_url`, `secret`.
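
Each Sparse MoE block sends a token through only 4 of its 128 experts. The following is a minimal sketch of top-k routing for a single token, assuming a Mixtral-style scheme (softmax over the selected experts' router logits, then a weighted sum of their outputs). The model's actual gating, including its custom GELU gating, may differ; `route_top_k`, `moe_forward` and the `expert` closure are hypothetical.

```rust
/// Selects the `k` experts with the largest router logits and converts those
/// logits into mixing weights with a softmax over only the selected `k`.
/// Softmax-after-top-k is an assumed convention, not confirmed for this model.
fn route_top_k(router_logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    assert!(k > 0 && k <= router_logits.len(), "k must be in 1..=num_experts");
    let mut idx: Vec<usize> = (0..router_logits.len()).collect();
    // Sort expert ids by descending logit (stable sort, so ties keep id order).
    // partial_cmp().unwrap() panics on NaN, which should not occur here.
    idx.sort_by(|&a, &b| router_logits[b].partial_cmp(&router_logits[a]).unwrap());
    idx.truncate(k);
    // Numerically stable softmax: subtract the largest selected logit.
    let max = router_logits[idx[0]];
    let exps: Vec<f32> = idx.iter().map(|&i| (router_logits[i] - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    idx.into_iter().zip(exps).map(|(i, e)| (i, e / sum)).collect()
}

/// Weighted sum of the selected experts' outputs for a single token.
/// `expert(e, x)` is a hypothetical stand-in for expert `e`'s feed-forward net.
fn moe_forward<F>(x: &[f32], router_logits: &[f32], k: usize, expert: F) -> Vec<f32>
where
    F: Fn(usize, &[f32]) -> Vec<f32>,
{
    let mut out = vec![0.0f32; x.len()];
    for (e, w) in route_top_k(router_logits, k) {
        for (o, y) in out.iter_mut().zip(expert(e, x)) {
            *o += w * y;
        }
    }
    out
}

fn main() {
    // 128 experts, almost all with a very low router logit.
    let mut logits = vec![-10.0f32; 128];
    logits[5] = 2.0;
    logits[17] = 2.0;
    logits[42] = 1.0;
    logits[99] = 1.0;
    logits[3] = 0.5; // fifth best, so top-4 routing drops it
    for (e, w) in route_top_k(&logits, 4) {
        println!("expert {e:3}  weight {w:.3}");
    }
    // With identity "experts", weights summing to 1 return x unchanged.
    let y = moe_forward(&[1.0, 2.0], &logits, 4, |_e, x| x.to_vec());
    println!("{y:?}");
}
```

Only the 4 selected experts run, which is why a 1.5B-parameter MoE costs far less per token than a dense model of the same size.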
## Tests
```bash
# Run all 14 tests (requires weights in ./data)
cargo test --release -- --test-threads=1
# With MLX backend
cargo test --release --no-default-features --features mlx -- --test-threads=1
```
Tests verify:
- Tokenization IDs match Python reference
- Argmax labels identical to HuggingFace transformers on 6 inputs
- Span extraction produces correct entity groups and text
- High confidence (>0.95) on clear PII
- No false positives on clean text
- Byte offsets are valid
- Viterbi decoder enforces BIOES constraints
- Config and label parsing
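
The Viterbi check relies on the BIOES transition rule: an open entity (B or I) must continue with I or close with E of the same category, any other tag must be followed by O, B or S, a sequence may only start with O/B/S, and it may only end with O/E/S. The following is a minimal sketch of constrained Viterbi decoding over the 33-label set. It assumes per-token log-probability scores and a hypothetical label ordering; it omits the tunable operating points and is not the crate's decoder.

```rust
/// One BIOES tag; the `usize` is an index into CATEGORIES.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tag { O, B(usize), I(usize), E(usize), S(usize) }

const CATEGORIES: [&str; 8] = [
    "account_number", "private_address", "private_date", "private_email",
    "private_person", "private_phone", "private_url", "secret",
];

/// O plus B/I/E/S per category: 1 + 4 * n_cat labels (33 for 8 categories).
/// This ordering is a hypothetical choice, not the model's label map.
fn label_set(n_cat: usize) -> Vec<Tag> {
    let mut tags = vec![Tag::O];
    for c in 0..n_cat {
        tags.extend([Tag::B(c), Tag::I(c), Tag::E(c), Tag::S(c)]);
    }
    tags
}

fn allowed(prev: Tag, next: Tag) -> bool {
    match prev {
        // Inside an entity: continue or close the same category.
        Tag::B(c) | Tag::I(c) => matches!(next, Tag::I(d) | Tag::E(d) if d == c),
        // Outside any entity: cannot jump into the middle or end of one.
        Tag::O | Tag::E(_) | Tag::S(_) => matches!(next, Tag::O | Tag::B(_) | Tag::S(_)),
    }
}

fn can_start(t: Tag) -> bool { matches!(t, Tag::O | Tag::B(_) | Tag::S(_)) }
fn can_end(t: Tag) -> bool { matches!(t, Tag::O | Tag::E(_) | Tag::S(_)) }

/// Highest-scoring label sequence that obeys the BIOES rules.
/// `scores[t][j]` is an assumed log-probability of label `tags[j]` at token `t`.
fn viterbi(scores: &[Vec<f32>], tags: &[Tag]) -> Vec<usize> {
    let (steps, n) = (scores.len(), tags.len());
    if steps == 0 {
        return Vec::new();
    }
    let neg = f32::NEG_INFINITY;
    // dp[j]: best score of a legal prefix ending in label j at the current step.
    let mut dp: Vec<f32> = (0..n)
        .map(|j| if can_start(tags[j]) { scores[0][j] } else { neg })
        .collect();
    // back[t][j]: previous label on the best path reaching label j at step t.
    let mut back = vec![vec![0usize; n]; steps];
    for t in 1..steps {
        let mut next = vec![neg; n];
        for j in 0..n {
            for i in 0..n {
                if dp[i] == neg || !allowed(tags[i], tags[j]) {
                    continue;
                }
                let s = dp[i] + scores[t][j];
                if s > next[j] {
                    next[j] = s;
                    back[t][j] = i;
                }
            }
        }
        dp = next;
    }
    // Best legal final label, then follow back-pointers to step 0.
    let mut j = (0..n)
        .filter(|&j| can_end(tags[j]) && dp[j] > neg)
        .max_by(|&a, &b| dp[a].partial_cmp(&dp[b]).unwrap())
        .expect("no BIOES-consistent path");
    let mut path = vec![j; steps];
    for t in (1..steps).rev() {
        j = back[t][j];
        path[t - 1] = j;
    }
    path
}

fn main() {
    println!("{} labels", label_set(CATEGORIES.len()).len());
    // One category, 3 tokens. Greedy argmax picks O, I, O, which is illegal
    // (I cannot follow O); Viterbi returns the best legal path instead.
    let tags = label_set(1); // [O, B(0), I(0), E(0), S(0)]
    let scores: Vec<Vec<f32>> = vec![
        vec![0.0, -5.0, -5.0, -5.0, -5.0],
        vec![-1.0, -3.0, 0.0, -3.0, -2.0],
        vec![0.0, -5.0, -5.0, -5.0, -5.0],
    ];
    let path: Vec<Tag> = viterbi(&scores, &tags).into_iter().map(|j| tags[j]).collect();
    println!("{path:?}");
}
```

The cost is O(tokens × labels²), roughly 1,000 transition checks per token for 33 labels, which is small next to the transformer forward pass.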
## Benchmarking
```bash
# Run Rust benchmark
cargo run --example bench --release --no-default-features --features mlx -- -m data
# Run Python baseline
python3 bench.py
# Generate chart
python3 bench_chart.py
```
## CLI
```bash
# Detect spans (default JSON output)
privacy-filter -m data "My name is Alice Smith"
# Per-token labels
privacy-filter -m data -f labels "My name is Alice Smith"
# Raw logits
privacy-filter -m data -f logits "My name is Alice Smith"
# Read from stdin
```
## License
Apache 2.0 — same as the upstream [openai/privacy-filter](https://huggingface.co/openai/privacy-filter) model.