# privacy-filter-rs

Pure-Rust inference for OpenAI Privacy Filter — a 1.5B-parameter bidirectional transformer with Sparse MoE for PII detection. Built on Burn 0.20.
## Benchmark

| Backend | tok/sec | Correct | Notes |
|---|---|---|---|
| Python (transformers, CPU bf16) | 286 | ✓ | Baseline — bfloat16, half the FLOPs |
| Rust — wgpu-f16 / Metal GPU | 140 | ✗ | f16 precision kills MoE routing |
| Rust — MLX / Apple Silicon f32 | 122 | ✓ | Fastest correct Rust backend |
| Rust — NdArray + Accelerate f32 | 91 | ✓ | CPU with Apple BLAS |
| Rust — NdArray plain f32 | 85 | ✓ | CPU, no BLAS |
| Rust — wgpu / Metal GPU f32 | 39 | ✓ | GPU transfer overhead on small batches |
Measured over 96 tokens across 6 samples, 5 iterations, on an Apple Silicon M4 Mac mini. Python's lead comes from bfloat16 (2× fewer FLOPs); both implementations produce identical predictions on all test cases.
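The tok/sec figures are plain wall-clock throughput over repeated runs. A minimal timing harness in the same spirit (the workload closure is a stand-in, not the crate's actual engine API):

```rust
use std::time::Instant;

/// Tokens per second over `iters` runs of a workload that processes
/// `tokens_per_iter` tokens each run. In the real benchmark the closure
/// would call into the inference engine.
fn tokens_per_sec<F: FnMut()>(tokens_per_iter: u64, iters: u64, mut work: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        work();
    }
    let secs = start.elapsed().as_secs_f64();
    (tokens_per_iter * iters) as f64 / secs
}

fn main() {
    // 96 tokens, 5 iterations, matching the benchmark setup above.
    let tps = tokens_per_sec(96, 5, || {
        std::thread::sleep(std::time::Duration::from_millis(1))
    });
    println!("{tps:.1} tok/sec");
}
```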
## Quick Start

1. Clone the repository.
2. Download the weights (2.6 GB).
3. Run inference with the MLX backend (recommended on Apple Silicon).
## Usage as a Library

```rust
// NOTE: type and function names below are illustrative; check the
// crate's API docs for the exact signatures.
use privacy_filter::Engine;
use std::path::Path;

let device = Default::default();
let engine = Engine::load(Path::new("./data"), &device)?;
let spans = engine.predict("Contact Alice Smith")?;
for s in &spans {
    println!("{}: {} (score: {:.4})", s.label, s.text, s.score);
}
// private_person: Alice Smith (score: 1.0000)
```
## Building

```sh
# CPU (default) — portable, no GPU required
cargo build --release

# CPU + Apple Accelerate BLAS (macOS, faster matmul)
cargo build --release --features blas-accelerate

# MLX — Apple Silicon GPU (fastest correct backend)
cargo build --release --features mlx

# wgpu — Metal/Vulkan GPU (f32)
cargo build --release --features wgpu
```
## Features

| Feature | Backend | Device | Notes |
|---|---|---|---|
| `ndarray` (default) | NdArray | CPU | Portable, multi-threaded |
| `blas-accelerate` | NdArray + Accelerate | CPU | macOS only, faster matmul |
| `openblas-system` | NdArray + OpenBLAS | CPU | Linux |
| `mlx` | burn-mlx | Apple Silicon | Unified memory, no copy overhead |
| `wgpu` | wgpu | Metal/Vulkan | GPU f32 |
| `wgpu-f16` | wgpu | Metal/Vulkan | GPU f16 — fast but wrong results |
## Architecture

The model is a bidirectional transformer encoder with:

- Token embedding: 200K vocab (o200k_base) → 640-dim
- 8 transformer layers, each with:
  - RMSNorm, Grouped Query Attention (14Q / 2KV heads, sliding window 257, YaRN RoPE, attention sinks)
  - RMSNorm, Sparse MoE (128 experts, top-4 routing, custom GELU gating)
- Classification head: 640 → 33 BIOES labels
- Viterbi decoder: constrained BIOES transitions, tunable operating points
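With 14 query heads sharing 2 KV heads, each KV head serves a contiguous group of 7 query heads. A sketch of the usual grouping rule (assumed to match this model's layout, per the standard GQA formulation):

```rust
/// Map a query head to the KV head it attends with under grouped-query
/// attention: consecutive query heads share one KV head.
fn kv_head_for(q_head: usize, n_q_heads: usize, n_kv_heads: usize) -> usize {
    let group_size = n_q_heads / n_kv_heads; // 14 / 2 = 7 for this model
    q_head / group_size
}

fn main() {
    // Query heads 0..=6 use KV head 0; heads 7..=13 use KV head 1.
    for q in 0..14 {
        println!("Q{q} -> KV{}", kv_head_for(q, 14, 2));
    }
}
```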
Detects 8 PII categories: `account_number`, `private_address`, `private_date`, `private_email`, `private_person`, `private_phone`, `private_url`, `secret`.
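Per token, the MoE router scores all 128 experts but only the top 4 run. The sketch below uses a softmax over the selected logits for the gate weights; the model's custom GELU gating differs in detail, so treat this as the generic top-k routing pattern rather than the exact scheme:

```rust
/// Top-k expert routing for a sparse MoE layer: rank experts by router
/// logit, keep the k best, and normalize their gate weights to sum to 1.
fn top_k_route(router_logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    // Rank expert indices by descending logit.
    let mut idx: Vec<usize> = (0..router_logits.len()).collect();
    idx.sort_by(|&a, &b| router_logits[b].partial_cmp(&router_logits[a]).unwrap());
    let top: Vec<usize> = idx.into_iter().take(k).collect();

    // Softmax over only the selected logits (numerically stabilized).
    let max = top.iter().map(|&i| router_logits[i]).fold(f32::MIN, f32::max);
    let exps: Vec<f32> = top.iter().map(|&i| (router_logits[i] - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    top.iter().zip(exps).map(|(&i, e)| (i, e / sum)).collect()
}

fn main() {
    // 8 experts for illustration (the model has 128), routing top-4.
    let logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.9];
    for (expert, gate) in top_k_route(&logits, 4) {
        println!("expert {expert}: gate {gate:.3}");
    }
}
```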
## Tests

```sh
# Run all 14 tests (requires weights in ./data)
cargo test --release

# With MLX backend
cargo test --release --features mlx
```
Tests verify:
- Tokenization IDs match Python reference
- Argmax labels identical to HuggingFace transformers on 6 inputs
- Span extraction produces correct entity groups and text
- High confidence (>0.95) on clear PII
- No false positives on clean text
- Byte offsets are valid
- Viterbi decoder enforces BIOES constraints
- Config and label parsing
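The BIOES constraint the Viterbi decoder enforces can be stated as a simple transition predicate: B and I must continue into I or E of the same entity type, while O, E and S can only be followed by a fresh B, S or O. A sketch (labels modeled as `(tag, entity type)` pairs, an illustrative encoding rather than the crate's internal one):

```rust
/// Whether `next` may follow `prev` under BIOES decoding rules.
fn valid_transition(prev: (char, &str), next: (char, &str)) -> bool {
    match prev.0 {
        // An open span must continue (I) or close (E) with the same type.
        'B' | 'I' => matches!(next.0, 'I' | 'E') && prev.1 == next.1,
        // Outside a span, only a new begin/single tag or outside is legal.
        'O' | 'E' | 'S' => matches!(next.0, 'B' | 'S' | 'O'),
        _ => false,
    }
}

fn main() {
    assert!(valid_transition(('B', "private_person"), ('E', "private_person")));
    assert!(!valid_transition(('B', "private_person"), ('O', "")));
    assert!(!valid_transition(('B', "private_person"), ('I', "private_email")));
    println!("BIOES constraints hold");
}
```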
## Benchmarking

1. Run the Rust benchmark.
2. Run the Python baseline.
3. Generate the comparison chart.
## CLI

The binary supports several output modes:

- Detect spans (default JSON output)
- Per-token labels
- Raw logits
- Read from stdin
## License
Apache 2.0 — same as the upstream openai/privacy-filter model.