privacy-filter-rs 0.1.0

OpenAI Privacy Filter — PII detection inference in pure Rust with Burn ML

Pure-Rust inference for OpenAI Privacy Filter — a 1.5B-parameter bidirectional transformer with Sparse MoE for PII detection. Built on Burn 0.20.

Benchmark

Backend                            tok/sec   Correct   Notes
Python (transformers, CPU bf16)    286       Yes       Baseline — bfloat16, half the FLOPs
Rust — wgpu-f16 / Metal GPU        140       No        f16 precision kills MoE routing
Rust — MLX / Apple Silicon f32     122       Yes       Fastest correct Rust backend
Rust — NdArray + Accelerate f32    91        Yes       CPU with Apple BLAS
Rust — NdArray plain f32           85        Yes       CPU, no BLAS
Rust — wgpu / Metal GPU f32        39        Yes       GPU transfer overhead on small batches

96 tokens across 6 samples, 5 iterations, Apple Silicon (M4 Mac mini). Python's lead is from bfloat16 (2x fewer FLOPs); both produce identical predictions on all test cases.

Quick Start

# Clone
git clone https://github.com/eugenehp/privacy-filter-rs
cd privacy-filter-rs

# Download weights (2.6 GB)
git clone https://huggingface.co/eugenehp/privacy-filter-rs data

# Run inference (MLX backend, recommended on Apple Silicon)
cargo run --release --no-default-features --features mlx -- \
  -m data "My name is Alice Smith and my email is alice@example.com"

Output:

[
  {"entity_group": "private_person", "score": 0.999995, "word": " Alice Smith", "start": 10, "end": 22},
  {"entity_group": "private_email", "score": 0.999999, "word": " alice@example.com", "start": 39, "end": 57}
]

Usage as Library

use privacy_filter_rs::{PrivacyFilterInference, backend::{B, Device}};
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let device = <Device as Default>::default();
    let engine = PrivacyFilterInference::<B>::load(Path::new("data"), device)?;

    let spans = engine.predict("My name is Alice Smith")?;
    for s in &spans {
        println!("{}: {} (score: {:.4})", s.entity_group, s.word, s.score);
    }
    // private_person:  Alice Smith (score: 1.0000)
    Ok(())
}
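Because spans carry byte offsets into the original string, they can drive redaction directly. A minimal sketch, assuming a stand-in `Span` struct that mirrors the fields shown above (it is not the crate's actual return type):

```rust
// Hypothetical Span mirroring the fields returned by predict().
struct Span {
    entity_group: String,
    start: usize,
    end: usize,
}

/// Replace each detected span with "[ENTITY_GROUP]". Spans are processed
/// right-to-left so earlier byte offsets stay valid after each replacement.
fn redact(text: &str, spans: &[Span]) -> String {
    let mut out = text.to_string();
    let mut sorted: Vec<&Span> = spans.iter().collect();
    sorted.sort_by_key(|s| std::cmp::Reverse(s.start));
    for s in sorted {
        out.replace_range(s.start..s.end, &format!("[{}]", s.entity_group.to_uppercase()));
    }
    out
}

fn main() {
    let spans = vec![Span { entity_group: "private_person".into(), start: 11, end: 22 }];
    println!("{}", redact("My name is Alice Smith", &spans));
    // My name is [PRIVATE_PERSON]
}
```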

Building

# CPU (default) — portable, no GPU required
cargo build --release

# CPU + Apple Accelerate BLAS (macOS, faster matmul)
cargo build --release --features blas-accelerate

# MLX — Apple Silicon GPU (fastest correct backend)
cargo build --release --no-default-features --features mlx

# wgpu — Metal/Vulkan GPU (f32)
cargo build --release --no-default-features --features wgpu

Features

Feature            Backend               Device            Notes
ndarray (default)  NdArray               CPU               Portable, multi-threaded
blas-accelerate    NdArray + Accelerate  CPU               macOS only, faster matmul
openblas-system    NdArray + OpenBLAS    CPU               Linux
mlx                burn-mlx              Apple Silicon     Unified memory, no copy overhead
wgpu               wgpu                  Metal/Vulkan GPU  f32
wgpu-f16           wgpu                  Metal/Vulkan GPU  f16 — fast but wrong results

Architecture

The model is a bidirectional transformer encoder with:

  • Token embedding: 200K vocab (o200k_base) to 640-dim
  • 8 transformer layers, each with:
    • RMSNorm, Grouped Query Attention (14Q / 2KV heads, sliding window 257, YaRN RoPE, attention sinks)
    • RMSNorm, Sparse MoE (128 experts, top-4 routing, custom GELU gating)
  • Classification head: 640 to 33 BIOES labels
  • Viterbi decoder: constrained BIOES transitions, tunable operating points
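The top-4 expert routing above can be sketched as plain top-k softmax gating: pick the k highest router logits and normalize over just those, so each token mixes only k of the 128 experts. This is a simplified illustration, not the crate's actual router, which additionally applies the custom GELU gating:

```rust
/// Select the top-k experts for one token and return (expert index, weight)
/// pairs. Weights are a softmax over only the selected logits, so they sum to 1.
fn route_top_k(logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    // Indices sorted by descending logit.
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    let top = &idx[..k];
    // Softmax over the selected logits only (subtract max for stability).
    let m = logits[top[0]];
    let exps: Vec<f32> = top.iter().map(|&i| (logits[i] - m).exp()).collect();
    let z: f32 = exps.iter().sum();
    top.iter().zip(exps).map(|(&i, e)| (i, e / z)).collect()
}

fn main() {
    // 8 experts stand in for the model's 128.
    let logits = [0.1, 2.0, -1.0, 1.5, 0.3, 1.9, -0.5, 0.0];
    for (expert, weight) in route_top_k(&logits, 4) {
        println!("expert {expert}: weight {weight:.3}");
    }
}
```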

Detects 8 PII categories: account_number, private_address, private_date, private_email, private_person, private_phone, private_url, secret.
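The constrained-transition idea behind the Viterbi decoder can be sketched for a single entity type with labels O/B/I/E/S. This is a simplified illustration; the actual decoder covers all 33 labels across the 8 categories and supports tunable operating points:

```rust
const O: usize = 0; // outside
const B: usize = 1; // begin entity
const I: usize = 2; // inside entity
const E: usize = 3; // end entity
const S: usize = 4; // single-token entity

/// Is `next` a legal BIOES successor of `prev`?
fn allowed(prev: usize, next: usize) -> bool {
    match prev {
        B | I => matches!(next, I | E), // inside an entity: must continue or end
        _ => matches!(next, O | B | S), // outside: may stay out, begin, or single
    }
}

/// Viterbi over per-token label scores (log-probabilities), keeping only
/// legal BIOES transitions.
fn viterbi(scores: &[[f32; 5]]) -> Vec<usize> {
    let n = scores.len();
    let mut dp = vec![[f32::NEG_INFINITY; 5]; n];
    let mut back = vec![[0usize; 5]; n];
    for l in [O, B, S] {
        // A sequence may not start inside an entity (no I/E at position 0).
        dp[0][l] = scores[0][l];
    }
    for t in 1..n {
        for next in 0..5 {
            for prev in 0..5 {
                if allowed(prev, next) && dp[t - 1][prev] + scores[t][next] > dp[t][next] {
                    dp[t][next] = dp[t - 1][prev] + scores[t][next];
                    back[t][next] = prev;
                }
            }
        }
    }
    // The best final label must not leave an entity open (no B/I at the end).
    let mut best = O;
    for l in [E, S] {
        if dp[n - 1][l] > dp[n - 1][best] {
            best = l;
        }
    }
    let mut path = vec![best];
    for t in (1..n).rev() {
        path.push(back[t][*path.last().unwrap()]);
    }
    path.reverse();
    path
}

fn main() {
    let scores = vec![
        [0.0, 2.0, -9.0, -9.0, -9.0],
        [0.0, -9.0, -9.0, 2.0, -9.0],
        [2.0, -9.0, -9.0, -9.0, -9.0],
    ];
    println!("{:?}", viterbi(&scores)); // [1, 3, 0] = B, E, O
}
```

Greedy argmax could emit an I without a preceding B; the transition mask makes that path score negative infinity, so only well-formed spans survive.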

Tests

# Run all 14 tests (requires weights in ./data)
cargo test --release -- --test-threads=1

# With MLX backend
cargo test --release --no-default-features --features mlx -- --test-threads=1

Tests verify:

  • Tokenization IDs match Python reference
  • Argmax labels identical to HuggingFace transformers on 6 inputs
  • Span extraction produces correct entity groups and text
  • High confidence (>0.95) on clear PII
  • No false positives on clean text
  • Byte offsets are valid
  • Viterbi decoder enforces BIOES constraints
  • Config and label parsing

Benchmarking

# Run Rust benchmark
cargo run --example bench --release --no-default-features --features mlx -- -m data

# Run Python baseline
python3 bench.py

# Generate chart
python3 bench_chart.py

CLI

# Detect spans (default JSON output)
privacy-filter -m data "My name is Alice Smith"

# Per-token labels
privacy-filter -m data -f labels "My name is Alice Smith"

# Raw logits
privacy-filter -m data -f logits "My name is Alice Smith"

# Read from stdin
echo "Call me at 555-0123" | privacy-filter -m data

License

Apache 2.0 — same as the upstream openai/privacy-filter model.