ScrubbeRS
ScrubbeRS is a Rust-first, zero-copy (in-place) redaction engine with:
- A stdin → stdout CLI for shell pipelines.
- A Rust library API.
- Optional Python and Node.js bindings.
- Built-in high-confidence detector signatures for direct redaction.
- Optional .scrub signature files for custom org-specific patterns.
Why this is fast
- Redaction happens in place (&mut [u8]) with byte-mask filling.
- Literal signatures are matched with Aho-Corasick (single-pass multi-pattern automaton).
- Regex signatures use compiled regex::bytes::Regex and run against raw bytes.
- Release profile is tuned (lto=fat, codegen-units=1, panic=abort).
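The in-place model can be illustrated with a std-only sketch. The helper mask_literals_in_place is hypothetical and not part of the scrubbers API, and it deliberately swaps Aho-Corasick for a naive scan that tries every literal at every offset (O(n × m)); the real engine builds one automaton and makes a single pass. What the sketch shares with the engine is the output contract: matched bytes are overwritten with a mask byte and the buffer length never changes.

```rust
/// Hypothetical helper (not part of the scrubbers API): overwrite every
/// occurrence of any literal in `buf` with `mask`, in place.
/// Naive scan for illustration only; a real engine would use one Aho-Corasick pass.
/// Returns the number of bytes that were masked.
pub fn mask_literals_in_place(buf: &mut [u8], literals: &[&[u8]], mask: u8) -> usize {
    let mut masked = 0;
    let mut i = 0;
    while i < buf.len() {
        // Pick the longest literal that starts at offset `i` (leftmost-longest).
        let mut best = 0;
        for &lit in literals {
            if !lit.is_empty() && buf[i..].starts_with(lit) {
                best = best.max(lit.len());
            }
        }
        if best > 0 {
            // Fill the match with the mask byte; nothing is shifted or reallocated.
            buf[i..i + best].fill(mask);
            masked += best;
            i += best;
        } else {
            i += 1;
        }
    }
    masked
}
```

Because only bytes inside matches are rewritten, offsets of the surrounding data stay stable, which is what lets the API work on a borrowed &mut [u8] directly.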
CLI usage
- Build the release binary.
- Pipe mode (stdin -> stdout).
- Custom signatures.
- Custom mask byte.
- Line-oriented streaming mode for log pipelines.
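A line-oriented streaming loop can be sketched with std::io::BufRead::read_until. This is an assumption about the general shape, not the crate's implementation; stream_lines and its closure parameter are hypothetical. Each record is scrubbed in place and written out immediately, so memory use is bounded by the longest line rather than by the whole input.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical line-oriented scrub loop (not the crate's implementation).
/// Reads newline-delimited records, scrubs each one in place via `scrub`,
/// and writes it straight back out.
pub fn stream_lines<R, W, F>(mut reader: R, mut writer: W, mut scrub: F) -> io::Result<()>
where
    R: BufRead,
    W: Write,
    F: FnMut(&mut [u8]),
{
    let mut line = Vec::new();
    loop {
        line.clear();
        // read_until keeps the trailing b'\n' (if any) at the end of `line`.
        if reader.read_until(b'\n', &mut line)? == 0 {
            break; // EOF
        }
        // Scrub only the record body so the newline itself is never masked.
        let body_len = if line.last() == Some(&b'\n') { line.len() - 1 } else { line.len() };
        scrub(&mut line[..body_len]);
        writer.write_all(&line)?;
        // Flush per line so downstream tools in a pipeline see output promptly.
        writer.flush()?;
    }
    Ok(())
}
```

Flushing after every line trades some throughput for latency; a batch job could flush once at the end instead.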
.scrub format
Each non-empty, non-comment line is either:
- name=regex_or_literal
- regex_or_literal (auto-named)
Example:
# redact internal session tokens
session_token=sess_[A-Za-z0-9]{32}
# redact literal phrase
MY_INTERNAL_SECRET_PREFIX
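A parser for this format could look like the following sketch. The SigLine type, the identifier rule for names, and the custom_<line number> auto-naming scheme are all hypothetical; the README only states that unnamed lines are auto-named. Comments are assumed to start with #, as in the example above.

```rust
/// One parsed signature line (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct SigLine {
    pub name: String,
    pub pattern: String,
}

/// Assumed name rule: non-empty, ASCII letters, digits, and '_' only.
fn is_name(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'_')
}

/// Parse .scrub text into signatures, skipping blank lines and '#' comments.
/// Surrounding whitespace is trimmed. Unnamed lines get a hypothetical
/// auto-name of the form "custom_<1-based line number>".
pub fn parse_scrub(text: &str) -> Vec<SigLine> {
    let mut out = Vec::new();
    for (idx, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        match line.split_once('=') {
            // Treat the line as `name=pattern` only when the left side is a plain
            // identifier, so a pattern such as `a{2}=b` is not mistaken for a name.
            Some((name, pattern)) if is_name(name) => out.push(SigLine {
                name: name.to_string(),
                pattern: pattern.to_string(),
            }),
            _ => out.push(SigLine {
                name: format!("custom_{}", idx + 1),
                pattern: line.to_string(),
            }),
        }
    }
    out
}
```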
Rust API
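use std::io::Cursor;
use scrubbers::Scrubber;

// Build a scrubber with the built-in signatures (constructor arguments assumed).
let scrubber = Scrubber::new()?;

// Redact a byte buffer in place; its length does not change.
let mut bytes = b"ghp_123456789012345678901234567890123456".to_vec();
scrubber.scrub_in_place(&mut bytes);

// Stream newline-delimited input into `output` (reader/writer arguments assumed).
let mut output = Vec::new();
scrubber.scrub_lines(Cursor::new(b"safe\nprefix ghp_123456789012345678901234567890123456 suffix\n"), &mut output)?;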
Python bindings
Install from PyPI once published:
Build Python distributions locally with uv:
Exposed functions:
- scrubbers.scrub_bytes(data: bytes) -> bytes
- scrubbers.scrub_text(data: str) -> str
- scrubbers.scrub_lines_bytes(data: bytes) -> bytes
- scrubbers.scrub_lines_text(data: str) -> str
The scrub_lines_* helpers apply the library's newline-delimited streaming path over the provided input.
Example:
# "prefix **************************************** suffix"
# "safe\nprefix **************************************** suffix\n"
Smoke test the built wheel and sdist locally:
You can still exercise the raw extension crate directly with:
Node.js bindings
Build the Node extension crate:
Exposed functions:
- scrubBuffer(buf: Buffer) -> Buffer
- scrubLinesBuffer(buf: Buffer) -> Buffer
scrubLinesBuffer(...) applies the library's newline-delimited streaming path over the provided buffer.
Example:
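const { scrubBuffer, scrubLinesBuffer } = require(...);
scrubBuffer(Buffer.from("prefix ghp_123456789012345678901234567890123456 suffix"));
// <Buffer 70 72 65 66 69 78 20 2a ...>
scrubLinesBuffer(Buffer.from("safe\nprefix ghp_123456789012345678901234567890123456 suffix\n")).toString();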
// "safe\nprefix **************************************** suffix\n"
Run binding smoke tests locally:
Verify the publishable Python package locally:
Benchmark the Python binding in a logging-style path:
Publishing
Release publishing is tag-driven through publish.yml:
That workflow:
- builds and smoke tests Python wheels on Linux, macOS, and Windows with uv build
- builds and smoke tests a source distribution
- publishes Python distributions to PyPI with Trusted Publishing
- verifies and publishes the scrubbers crate to crates.io
Local preflight checks:
Before the release workflow can publish, configure trusted publishers on both registries:
- PyPI: add the GitHub repository/workflow as a trusted publisher for the scrubbers project and create the pypi environment.
- crates.io: publish the crate manually once, then add this repository/workflow as a trusted publisher for scrubbers and create the crates-io environment.
TruffleHog parity workflow
TruffleHog detector coverage is tracked in src/generated_trufflehog.rs:
CI runs these commands and fails if:
- any upstream detector directory is missing from our generated signature surface,
- generated signatures are missing when tests run, or
- extracted positive fixtures are missing when tests run.
The generated TruffleHog data is tracked for parity and audit purposes, but it is not applied by default as raw redaction rules. Many upstream detectors rely on keyword gating and verifier callbacks, and running their extracted regexes directly creates false positives.
src/generated_trufflehog.rs is treated as a parity inventory, not a public API surface. Generated signature names are content-addressed hashes of the pattern data, so reordering upstream extraction no longer renumbers the whole file.
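Content addressing can be sketched as hashing each pattern and embedding the digest in its name. The README does not say which hash the generator uses; this sketch uses 64-bit FNV-1a only because it is short and stable across builds (unlike std's DefaultHasher, whose algorithm is explicitly unspecified across releases). signature_name and the trufflehog_<detector>_<hex> format are hypothetical.

```rust
/// 64-bit FNV-1a: small and deterministic. Chosen for illustration only;
/// the hash the real generator uses is not specified here.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &b in data {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
    }
    hash
}

/// Hypothetical naming scheme: the name depends only on the detector and the
/// pattern bytes, never on the position of the entry in the extracted list.
pub fn signature_name(detector: &str, pattern: &str) -> String {
    let mut key = Vec::with_capacity(detector.len() + 1 + pattern.len());
    key.extend_from_slice(detector.as_bytes());
    key.push(0); // separator so ("ab", "c") and ("a", "bc") hash differently
    key.extend_from_slice(pattern.as_bytes());
    format!("trufflehog_{}_{:016x}", detector, fnv1a64(&key))
}
```

Inserting or reordering upstream entries leaves every existing name unchanged, so diffs of the generated file only show patterns that actually changed.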
The extracted positive fixtures are also used in the Rust test suite as inline redaction cases. Each case builds literal secret fragments from the upstream positive example and asserts the scrubber preserves length while masking the matched spans in place.
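The property those fixture tests check can be written as a reusable assertion. assert_masked_in_place is a hypothetical helper, not code from the repository; it assumes each case knows the half-open byte spans it expects to be masked.

```rust
/// Hypothetical test helper: checks that `scrubbed` has the same length as
/// `original`, that every byte inside a span equals `mask`, and that every
/// byte outside the spans is unchanged. Spans are half-open `(start, end)`.
pub fn assert_masked_in_place(original: &[u8], scrubbed: &[u8], spans: &[(usize, usize)], mask: u8) {
    assert_eq!(original.len(), scrubbed.len(), "scrubbing must preserve length");
    // Mark which byte positions are expected to be masked.
    let mut inside = vec![false; original.len()];
    for &(start, end) in spans {
        assert!(start <= end && end <= original.len(), "span out of bounds");
        inside[start..end].fill(true);
    }
    for (i, (&before, &after)) in original.iter().zip(scrubbed).enumerate() {
        if inside[i] {
            assert_eq!(after, mask, "byte {i} inside a span was not masked");
        } else {
            assert_eq!(after, before, "byte {i} outside the spans changed");
        }
    }
}
```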
Benchmark
Run the native Criterion benchmark:
It generates a 64 MiB synthetic payload, injects multiple secret shapes, and compares:
- raw memcpy
- straight std::io::copy pass-through into a fixed buffer
- scrubber/in_place
- scrubber/stream_lines
For a quick single-number smoke run, you can still use: