# privacy-filter-rs

Pure-Rust inference for OpenAI Privacy Filter — a 1.5B-parameter bidirectional transformer with Sparse MoE for PII detection. Built on Burn 0.20.
## Benchmark

| Backend | tok/sec | Correct | Notes |
|---|---|---|---|
| Python (transformers, CPU bf16) | 286 | ✓ | Baseline — bfloat16, half the FLOPs |
| Rust — wgpu-f16 / Metal GPU | 140 | ✗ | f16 precision kills MoE routing |
| Rust — MLX / Apple Silicon f32 | 122 | ✓ | Fastest correct Rust backend |
| Rust — NdArray + Accelerate f32 | 91 | ✓ | CPU with Apple BLAS |
| Rust — NdArray plain f32 | 85 | ✓ | CPU, no BLAS |
| Rust — wgpu / Metal GPU f32 | 39 | ✓ | GPU transfer overhead on small batches |
Measured over 96 tokens across 6 samples, 5 iterations, on an Apple Silicon M4 Mac mini. Python's lead comes from bfloat16 (2× fewer FLOPs); both implementations produce identical predictions on all test cases.
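The tok/sec figures are plain wall-clock throughput over repeated runs. A minimal timing harness in the same spirit (the workload closure is a stand-in, not the crate's actual engine API):

```rust
use std::time::Instant;

/// Tokens per second over `iters` runs of a workload that processes
/// `tokens_per_iter` tokens each run. In the real benchmark the closure
/// would call into the inference engine.
fn tokens_per_sec<F: FnMut()>(tokens_per_iter: u64, iters: u64, mut work: F) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        work();
    }
    let secs = start.elapsed().as_secs_f64();
    (tokens_per_iter * iters) as f64 / secs
}

fn main() {
    // 96 tokens, 5 iterations, matching the benchmark setup above.
    let tps = tokens_per_sec(96, 5, || {
        std::thread::sleep(std::time::Duration::from_millis(1))
    });
    println!("{tps:.1} tok/sec");
}
```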
## Quick Start

1. Clone the repository.
2. Download the weights (2.6 GB).
3. Run inference with the MLX backend (recommended on Apple Silicon).
## Usage as a Library

```rust
// NOTE: type and function names below are illustrative; check the
// crate's API docs for the exact signatures.
use privacy_filter::Engine;
use std::path::Path;

let device = Default::default();
let engine = Engine::load(Path::new("./data"), &device)?;
let spans = engine.predict("Contact Alice Smith")?;
for s in &spans {
    println!("{}: {} (score: {:.4})", s.label, s.text, s.score);
}
// private_person: Alice Smith (score: 1.0000)
```
## Building

```sh
# CPU (default) — portable, no GPU required
cargo build --release

# CPU + Apple Accelerate BLAS (macOS, faster matmul)
cargo build --release --features blas-accelerate

# MLX — Apple Silicon GPU (fastest correct backend)
cargo build --release --features mlx

# wgpu — Metal/Vulkan GPU (f32)
cargo build --release --features wgpu
```
## Features

| Feature | Backend | Device | Notes |
|---|---|---|---|
| `ndarray` (default) | NdArray | CPU | Portable, multi-threaded |
| `blas-accelerate` | NdArray + Accelerate | CPU | macOS only, faster matmul |
| `openblas-system` | NdArray + OpenBLAS | CPU | Linux |
| `mlx` | burn-mlx | Apple Silicon | Unified memory, no copy overhead |
| `wgpu` | wgpu | Metal/Vulkan | GPU f32 |
| `wgpu-f16` | wgpu | Metal/Vulkan | GPU f16 — fast but wrong results |
## Architecture

The model is a bidirectional transformer encoder with:

- Token embedding: 200K vocab (o200k_base) → 640-dim
- 8 transformer layers, each with:
  - RMSNorm, Grouped Query Attention (14Q / 2KV heads, sliding window 257, YaRN RoPE, attention sinks)
  - RMSNorm, Sparse MoE (128 experts, top-4 routing, custom GELU gating)
- Classification head: 640 → 33 BIOES labels
- Viterbi decoder: constrained BIOES transitions, tunable operating points
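With 14 query heads sharing 2 KV heads, each KV head serves a contiguous group of 7 query heads. A sketch of the usual grouping rule (assumed to match this model's layout, per the standard GQA formulation):

```rust
/// Map a query head to the KV head it attends with under grouped-query
/// attention: consecutive query heads share one KV head.
fn kv_head_for(q_head: usize, n_q_heads: usize, n_kv_heads: usize) -> usize {
    let group_size = n_q_heads / n_kv_heads; // 14 / 2 = 7 for this model
    q_head / group_size
}

fn main() {
    // Query heads 0..=6 use KV head 0; heads 7..=13 use KV head 1.
    for q in 0..14 {
        println!("Q{q} -> KV{}", kv_head_for(q, 14, 2));
    }
}
```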
Detects 8 PII categories: `account_number`, `private_address`, `private_date`, `private_email`, `private_person`, `private_phone`, `private_url`, `secret`.
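Per token, the MoE router scores all 128 experts but only the top 4 run. The sketch below uses a softmax over the selected logits for the gate weights; the model's custom GELU gating differs in detail, so treat this as the generic top-k routing pattern rather than the exact scheme:

```rust
/// Top-k expert routing for a sparse MoE layer: rank experts by router
/// logit, keep the k best, and normalize their gate weights to sum to 1.
fn top_k_route(router_logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    // Rank expert indices by descending logit.
    let mut idx: Vec<usize> = (0..router_logits.len()).collect();
    idx.sort_by(|&a, &b| router_logits[b].partial_cmp(&router_logits[a]).unwrap());
    let top: Vec<usize> = idx.into_iter().take(k).collect();

    // Softmax over only the selected logits (numerically stabilized).
    let max = top.iter().map(|&i| router_logits[i]).fold(f32::MIN, f32::max);
    let exps: Vec<f32> = top.iter().map(|&i| (router_logits[i] - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    top.iter().zip(exps).map(|(&i, e)| (i, e / sum)).collect()
}

fn main() {
    // 8 experts for illustration (the model has 128), routing top-4.
    let logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.9];
    for (expert, gate) in top_k_route(&logits, 4) {
        println!("expert {expert}: gate {gate:.3}");
    }
}
```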
## Tests

```sh
# Run all 14 tests (requires weights in ./data)
cargo test --release

# With MLX backend
cargo test --release --features mlx
```
Tests verify:
- Tokenization IDs match Python reference
- Argmax labels identical to HuggingFace transformers on 6 inputs
- Span extraction produces correct entity groups and text
- High confidence (>0.95) on clear PII
- No false positives on clean text
- Byte offsets are valid
- Viterbi decoder enforces BIOES constraints
- Config and label parsing
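The BIOES constraint the Viterbi decoder enforces can be stated as a simple transition predicate: B and I must continue into I or E of the same entity type, while O, E and S can only be followed by a fresh B, S or O. A sketch (labels modeled as `(tag, entity type)` pairs, an illustrative encoding rather than the crate's internal one):

```rust
/// Whether `next` may follow `prev` under BIOES decoding rules.
fn valid_transition(prev: (char, &str), next: (char, &str)) -> bool {
    match prev.0 {
        // An open span must continue (I) or close (E) with the same type.
        'B' | 'I' => matches!(next.0, 'I' | 'E') && prev.1 == next.1,
        // Outside a span, only a new begin/single tag or outside is legal.
        'O' | 'E' | 'S' => matches!(next.0, 'B' | 'S' | 'O'),
        _ => false,
    }
}

fn main() {
    assert!(valid_transition(('B', "private_person"), ('E', "private_person")));
    assert!(!valid_transition(('B', "private_person"), ('O', "")));
    assert!(!valid_transition(('B', "private_person"), ('I', "private_email")));
    println!("BIOES constraints hold");
}
```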
## Benchmarking

1. Run the Rust benchmark.
2. Run the Python baseline.
3. Generate the comparison chart.
## CLI

The binary supports several output modes:

- Detect spans (default JSON output)
- Per-token labels
- Raw logits
- Read from stdin
## License
Apache 2.0 — same as the upstream openai/privacy-filter model.