rust-sanitize
Scrub sensitive data from logs and configs before sharing them — with support teams, vendors, or AI tools.
rust-sanitize replaces API keys, emails, IPs, passwords, tokens, and other secrets with structurally plausible substitutes. Replacements are one-way: no mapping file is stored, there's no restore mode, and nothing sensitive persists after the run.
Works as a CLI, a Rust library, and an MCP server — so AI assistants like Claude and Cursor can sanitize files on your behalf before raw content ever reaches the model.
MCP: Clean Before the LLM Sees It
Install the MCP server and your AI assistant can sanitize files directly — secrets stay inside the audited Rust process and never enter the context window.
Step 1 — Install the binaries
Download sanitize and sanitize-mcp for your platform from the Releases page and place them on your $PATH (e.g. /usr/local/bin/). No Deno or Node required — the runtime is embedded.
Step 2 — Register with your AI tool
Claude Code:
Claude Desktop (claude_desktop_config.json):
Once connected, the assistant can sanitize inline text or files, scan for leaks without modifying anything, or produce a pre-structured prompt ready for incident triage or config review. See docs/mcp.md for the full tool reference, all parameter examples, and setup instructions for Cursor, Neovim, and OpenCode.
Install
CLI from crates.io:
From source:
# Binary: target/release/sanitize
Windows: Requires the MSVC linker. Install Visual Studio Build Tools and select the Desktop development with C++ workload.
As a Rust library:
MCP binary: Download sanitize-mcp for your platform from the Releases page — no Deno or Node required, the runtime is embedded.
Quick Start
No setup — scan immediately
When you run sanitize with no secrets file or app bundle, the built-in patterns activate automatically. They cover the most common secrets: API keys (AWS, GCP, GitHub, Stripe, Slack, OpenAI, Anthropic, HuggingFace, and more), JWTs, emails, IPv4/IPv6, UUIDs, MAC addresses, PEM headers, credential URLs, and password/secret key=value pairs.
Output goes to server-sanitized.log next to the source. Use -o /path/to/output to override, or -o - for stdout.
# Dry-run — see what would be replaced without writing anything:
# CI gate — fail if secrets are detected:
Guided setup — answer a few questions, get a tailored config
The wizard asks for your workspace type (Generic, Web app, Kubernetes, Database, AWS), replacement strictness, company domains, and which file formats to cover. It produces two files:
- secrets.guided.yaml: your pattern set, optionally encrypted
- secrets.guided.profile.yaml: structured field rules for the formats you chose
Aggressive strictness also matches broad hostnames, short container IDs, and high-entropy token patterns — recommended when sharing logs with an LLM.
App bundles — zero config for common applications
Built-in bundles for 22 applications pair a secrets pattern set with a structured field profile so field-level sanitization works out of the box, no authoring required.
# See all available bundles:
Built-in bundles: ansible, aws-cli, circleci, django, docker-compose, elasticsearch, fstab, github-actions, gitlab, grafana, heroku, kubernetes, laravel, mongodb, mysql, nginx, postgresql, rails, redis, splunk, spring-boot, terraform.
Common Workflows
Multiple files and archives in one pass
# Produces: server-sanitized.log config-sanitized.yaml backup.sanitized.zip
# Send all outputs to a directory:
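The sibling naming shown in the example output above (a -sanitized suffix for plain files, .sanitized inserted before the extension for archives) can be sketched as a small path helper. This is an illustration only: `sanitized_sibling` is a hypothetical name, not part of the crate's API, and the archive check covers only single .zip and .tar extensions.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not the crate's API): derive the sibling output path.
/// Plain files get `-sanitized` appended to the stem; `.zip` / `.tar` archives
/// get `.sanitized` inserted before the extension, mirroring the example names.
/// Multi-part extensions such as `.tar.gz` are not handled in this sketch.
fn sanitized_sibling(input: &Path) -> PathBuf {
    let stem = input.file_stem().and_then(|s| s.to_str()).unwrap_or("output");
    let ext = input.extension().and_then(|e| e.to_str());
    let is_archive = matches!(ext, Some("zip") | Some("tar"));
    let name = match (ext, is_archive) {
        (Some(e), true) => format!("{stem}.sanitized.{e}"),
        (Some(e), false) => format!("{stem}-sanitized.{e}"),
        (None, _) => format!("{stem}-sanitized"),
    };
    // Keep the output next to the source by replacing only the file name.
    input.with_file_name(name)
}
```

Under these assumptions, logs/server.log maps to logs/server-sanitized.log and backup.zip maps to backup.sanitized.zip.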
Pipe from another command
Stdin is sanitized and sent to stdout automatically.
# Mix stdin with file inputs (stdin → stdout; files → per-file siblings):
When reading from stdin and the format can't be inferred from a filename, use -f to specify it: -f yaml, -f json, -f csv, -f log, etc.
CI secrets gate
# Fail the build if secrets are detected:
# Same with encrypted patterns file:
SANITIZE_PASSWORD="..."
# Stream per-match findings as NDJSON for jq or SIEM ingest:
Archive entry filtering
Filter which entries inside an archive are processed. * matches within a single directory segment, ** crosses directory boundaries, trailing / matches a subtree.
# Keep only the config directory:
# Keep all JSON, drop the secrets file:
# Independent filters per archive in one command:
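To make the three filter rules concrete (* stays inside one path segment, ** crosses directory boundaries, a trailing / selects a subtree), here is a minimal matcher sketch. It is not the tool's implementation: `glob_match` and `match_bytes` are hypothetical names, the recursion is exponential in the worst case, and details such as whether **/ also matches zero directories are left out because the text above does not specify them.

```rust
/// Hypothetical matcher illustrating the filter rules; not the tool's code.
fn glob_match(pattern: &str, path: &str) -> bool {
    // A trailing `/` is treated as "everything under this directory".
    if pattern.ends_with('/') {
        let subtree = format!("{pattern}**");
        return match_bytes(subtree.as_bytes(), path.as_bytes());
    }
    match_bytes(pattern.as_bytes(), path.as_bytes())
}

fn match_bytes(p: &[u8], s: &[u8]) -> bool {
    match p {
        // Pattern exhausted: match only if the path is exhausted too.
        [] => s.is_empty(),
        // `**` may consume any number of bytes, including `/`.
        [b'*', b'*', rest @ ..] => (0..=s.len()).any(|i| match_bytes(rest, &s[i..])),
        // `*` may consume bytes up to, but not past, the next `/`.
        [b'*', rest @ ..] => {
            let limit = s.iter().position(|&b| b == b'/').unwrap_or(s.len());
            (0..=limit).any(|i| match_bytes(rest, &s[i..]))
        }
        // Any other byte must match literally.
        [c, rest @ ..] => s.first() == Some(c) && match_bytes(rest, &s[1..]),
    }
}
```

With this sketch, config/*.yaml matches config/app.yaml but not config/sub/app.yaml, while config/ matches both.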
Structured field rules (--profile)
Replace specific named fields in YAML, JSON, TOML, CSV, .env, and INI files. Comments, indentation, key ordering, and unmatched values are preserved exactly.
# fields.yaml
- processor: yaml
  extensions:
  fields:
    - pattern: "*.password"
      category: "custom:password"
    - pattern: "*.username"
      category: email
- processor: jsonl
  extensions:
  options:
    skip_invalid: "true" # pass non-JSON lines through unchanged
  fields:
    - pattern: "*.email"
      category: email
    - pattern: "*.ip"
      category: ipv4
When --profile is active, values discovered in structured fields are automatically written back to the patterns file as literals so the streaming scanner can match them in other files too. See Structured Processing for the full field pattern syntax and two-phase pipeline.
Encrypted patterns file
# Encrypt once:
# Use interactively:
# Non-interactive (CI / pipes):
SANITIZE_PASSWORD="my-password"
# Or read from a file:
Allowlist — pass specific values through unchanged
For project-stable allowlists, add kind: allow entries directly to your patterns file:
- pattern: "*.internal"
  kind: allow
- pattern: "192.168.1.*"
  kind: allow
Deterministic mode
Same seed + same input produces identical replacements across runs and machines — useful when correlating sanitized data across multiple files or sharing a reproducible dataset with a team.
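The idea behind seeded determinism can be sketched without any dependencies: derive the replacement purely from (seed, input), so nothing needs to be stored between runs. The code below is an assumption-laden illustration, not the crate's generator. It swaps in FNV-1a and xorshift as non-cryptographic stand-ins for a keyed HMAC, and `keyed_hash` and `deterministic_replacement` are hypothetical names.

```rust
/// FNV-1a over seed then input. A non-cryptographic stand-in that keeps this
/// sketch dependency-free; it is NOT a substitute for a real HMAC.
fn keyed_hash(seed: &[u8], input: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in seed.iter().chain(input) {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical helper: produce a same-length lowercase string that depends
/// only on (seed, value), so repeated runs yield identical output.
fn deterministic_replacement(seed: &[u8], value: &str) -> String {
    let mut state = keyed_hash(seed, value.as_bytes());
    value
        .chars()
        .map(|_| {
            // Advance a simple xorshift generator and pick a letter.
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            (b'a' + (state % 26) as u8) as char
        })
        .collect()
}
```

Calling `deterministic_replacement` twice with the same seed and value returns the same string, which is what lets sanitized values be correlated across files without keeping a mapping.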
Shannon entropy detection
Catch high-entropy tokens not covered by any pattern — useful for novel API keys, obfuscated secrets, or anything that wasn't anticipated when the patterns file was written.
# Dry-run prints a calibration histogram to help tune the threshold:
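For readers unfamiliar with the measure, Shannon entropy here means the average number of bits per symbol implied by a token's own byte frequencies. The sketch below computes it over bytes and compares it with a cutoff. The function names and the idea of a single fixed threshold are assumptions for illustration; the tool's actual tokenization and scoring may differ.

```rust
/// Shannon entropy of `token` in bits per byte, from its byte frequencies.
fn shannon_entropy(token: &str) -> f64 {
    if token.is_empty() {
        return 0.0;
    }
    let mut counts = [0usize; 256];
    for b in token.bytes() {
        counts[b as usize] += 1;
    }
    let len = token.len() as f64;
    counts
        .iter()
        .filter(|&&n| n > 0)
        .map(|&n| {
            let p = n as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical threshold check; the cutoff is a tuning parameter.
fn looks_like_secret(token: &str, threshold_bits: f64) -> bool {
    shannon_entropy(token) >= threshold_bits
}
```

A token of one repeated byte such as "aaaa" scores 0 bits, while "abcd" scores 2 bits, so random-looking keys rise above ordinary words as the threshold is tuned.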
LLM-ready output
Sanitize and produce a structured prompt in one step:
The prompt includes a ## Files Analyzed manifest and embeds sanitized content inline (<content> blocks). For large file sets or agentic LLMs that can read files with their own tools, add --output to switch to reference mode — files are written to disk and the prompt lists their absolute paths instead:
Supported Formats
| Format | Detection |
|---|---|
| Plain text / log | Default fallback for all files |
| JSON | Profile match or {/[ heuristic |
| NDJSON / JSON Lines | Profile match or multi-line { heuristic; streaming — bounded memory for GB-scale log files |
| YAML | Profile match or ---/- /: heuristic |
| TOML | Profile match or [section] heuristic |
| XML | Profile match or <?xml/< heuristic |
| CSV / TSV | Profile match only |
| .env | Profile match only |
| INI / .conf | Profile match only |
| Key-value | Profile match only |
| Log lines (mixed) | Profile match only |
| Tar | .tar extension |
| Tar.gz / .tgz | .tar.gz / .tgz extension |
| Zip | .zip extension |
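The bounded-memory claim for NDJSON in the table above rests on processing one line at a time rather than parsing the whole file. A minimal sketch of that pattern follows. `sanitize_lines` and the `scrub` callback are hypothetical names, not the crate's API, and a single extremely long line would still be buffered in full.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: stream line-delimited input through `scrub`,
/// holding only one line in memory at a time. Returns the line count.
/// Line endings are normalized to `\n` in the output.
fn sanitize_lines<R: BufRead, W: Write>(
    mut reader: R,
    mut writer: W,
    scrub: impl Fn(&str) -> String,
) -> io::Result<usize> {
    let mut line = String::new();
    let mut count = 0;
    loop {
        line.clear(); // reuse the buffer instead of growing it
        if reader.read_line(&mut line)? == 0 {
            break; // end of input
        }
        let body = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
        writer.write_all(scrub(body).as_bytes())?;
        writer.write_all(b"\n")?;
        count += 1;
    }
    Ok(count)
}
```

Because the reader and writer are generic, the same function works with files, stdin and stdout, or in-memory buffers in tests.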
Library
use Category;
use HmacGenerator;
use MappingStore;
use Arc;
// Deterministic generator seeded with a fixed 32-byte key.
let generator = new;
let store = new;
// One-way replacement.
let sanitized = store.get_or_insert.unwrap;
assert!;
assert_eq!;
// Same input → same output within a run.
let again = store.get_or_insert.unwrap;
assert_eq!;
See Library API Reference for the full module-by-module API.
Security Model
Replacements are one-way by design. No reverse mapping is stored or recoverable from sanitized output alone. The MappingStore forward map lives only in process memory and is zeroized on drop.
Key properties:
- Encryption at rest — secrets files use AES-256-GCM (PBKDF2-HMAC-SHA256, 600 000 iterations). Plaintext files are also supported.
- Zeroization — HMAC keys, secret entries, mapping-store keys, and decrypted blobs are zeroized on drop.
- Regex hardening — per-pattern automaton and DFA size limits (1 MiB each) prevent ReDoS and unbounded memory growth.
- Defensive limits — input size caps, recursion depth limits, node-count caps, and pattern-count limits bound every parser.
- Zero unsafe: thread safety through DashMap and Arc; Send + Sync bounds verified at compile time.
See SECURITY.md for the full threat model and mitigations.
Documentation
| Document | Description |
|---|---|
| MCP Reference | MCP server setup, all tool parameters, JSON examples, IDE configs (Cursor, Neovim, OpenCode), and namespace-based multi-tenant setup. |
| CLI Reference | Full sanitize command reference including all flags, subcommands, secrets file format, and examples. |
| Structured Processing | --profile usage, field patterns, two-phase pipeline, format preservation, and processor options. |
| Supported Categories | All 18 built-in replacement categories with strategies and examples, plus custom categories. |
| Pluggable Strategies | The Strategy trait, 5 built-in strategies, and guide to writing custom strategies. |
| Library API Reference | Module-by-module public API tables. |
| Defensive Limits & Streaming | Streaming chunking model, archive processing flow, and all defensive size/depth/count limits. |
| Architecture | Internal architecture, data flow, module map, concurrency model, and streaming design. |
| Security | Security properties, threat mitigations, encryption details, and zeroization strategy. |
| Contributing | Build instructions, test suite, fuzz targets, linting, and PR guidelines. |
| Changelog | Release history and version notes. |
Limitations
- No restore. Replacements are one-way by design. No undo, decrypt-output, or reverse-mapping capability.
- Structured-to-scanner handoff. When --profile is active, discovered values are appended to your secrets file as kind: literal entries so the scanner can find them in other files. Use --no-structured-handoff to suppress the write if needed.
- Structured processor size limit. Files over 256 MiB (or --max-structured-size) fall back to the streaming scanner, which replaces raw bytes without document awareness. In practice this only affects large serialized data dumps, not real config files.
- Deterministic mode caveats. Identical output requires the same secrets file and the same seed. Changing either produces completely different replacements.
- Zeroization scope. Covers secrets, HMAC keys, and mapping-store keys. Incidental copies the Rust compiler creates during optimization passes are not covered, an inherent limitation of safe Rust zeroization.
- Large archive sequential fallback. Zip and tar archives whose total uncompressed content exceeds 256 MiB are processed sequentially rather than in parallel to avoid unbounded memory use.
- Binary detection. Entries detected as binary are skipped by default. Use --include-binary to override.
Security Disclosure
Do not open a public issue for security-sensitive findings. Report privately via a security advisory on the repository or via the maintainer contact in Cargo.toml. Include a description, reproduction steps, and potential impact. Maintainers will acknowledge within 5 business days and provide a fix or mitigation timeline within 30 days.
Stability
This project follows Semantic Versioning. As of 0.8.0, the public library API and CLI interface are considered stable. Breaking changes will be avoided but may occur in minor releases until 1.0.0. The MSRV is 1.74 (stable toolchain), declared under rust-version in Cargo.toml and enforced in CI.
See CHANGELOG.md for release history.
License
Licensed under the Apache License, Version 2.0. See LICENSE for the full text.