Qubit Sanitize

Reusable sanitization utilities for Rust.

Overview

Qubit Sanitize provides reusable tools for masking sensitive data in logs, diagnostics, and structured debug output. The core layer handles the common field-value problem shared by HTTP clients, command runners, configuration objects, and other crates: given a (field, value) pair, decide whether the field is sensitive and return the safe value to display.

The adapter layer builds on that core policy for common structured inputs such as URLs, URL-encoded forms, HTTP headers, HTTP bodies, argv vectors, and environment variables. Adapters parse only the formats they explicitly model; shell command strings and other application-specific protocols should still be handled by caller crates that have the full context.

Features

Field-name canonicalization for separator-insensitive matching
Built-in sensitive field defaults for credentials, tokens, HTTP auth, cookies, and sessions
Configurable sensitivity levels: Low, Medium, High, and Secret
Level-specific masking through MaskPolicies
Mask strategies for fixed replacement, edge preservation, suffix preservation, and full removal
A FieldSanitizer object that sanitizes single field-value pairs
Convenience helpers for sanitizing BTreeMap<String, String> values by key
Adapters for URLs, URL-encoded forms, HTTP headers, HTTP bodies, argv vectors, and environment variables

Quick Start

use qubit_sanitize::{
    FieldSanitizer,
    NameMatchMode,
};

let sanitizer = FieldSanitizer::default();

assert_eq!(
    sanitizer.sanitize_value(
        "password",
        "correct-horse-battery-staple",
        NameMatchMode::Exact,
    ),
    "<redacted>",
);
assert_eq!(
    sanitizer.sanitize_value("mode", "debug", NameMatchMode::Exact),
    "debug",
);

Sensitivity Levels

Sensitive fields are assigned one of four levels:

Level	Intended use	Default mask
`Low`	Values where a small prefix and suffix help diagnostics	`ab****yz`
`Medium`	Identifiers where only the tail should remain visible	`****z`
`High`	Tokens or API keys that should not expose edges	`****`
`Secret`	Passwords, private keys, or client secrets	`<redacted>`

The defaults are conservative for operational logs. You can replace any level's masking strategy through MaskPolicies.

Mask Policies

use qubit_sanitize::MaskPolicy;

let edge = MaskPolicy::preserve_edges(2, 2, "****", 4);
assert_eq!(edge.mask("abcdefgh"), "ab****gh");

let suffix = MaskPolicy::preserve_suffix(4, "****", 4);
assert_eq!(suffix.mask("1234567890"), "****7890");

let fixed = MaskPolicy::fixed("****");
assert_eq!(fixed.mask("secret"), "****");

Empty values are kept empty. This avoids changing the semantics of fields that are present but intentionally blank.

Sensitive Fields

SensitiveFields::default() contains common sensitive names such as:

password, passwd, secret, client_secret, private_key
api_key, x_api_key
token, access_token, refresh_token, id_token
authorization, proxy_authorization, cookie, set_cookie
session, session_id, session_token

Field names are canonicalized before lookup. Separators such as _, -, ., and whitespace are ignored, and names are lowercased:

use qubit_sanitize::canonicalize_field_name;

assert_eq!(canonicalize_field_name(" access-token "), "accesstoken");
assert_eq!(canonicalize_field_name("access_token"), "accesstoken");
assert_eq!(canonicalize_field_name("access.token"), "accesstoken");

Name Matching Modes

Core methods such as sanitize_value and sanitize_map require callers to choose a field-name matching mode. Use NameMatchMode::Exact for exact canonical field-name matching. For contextual names where callers want OPENAI_API_KEY to match the configured field api_key, use NameMatchMode::ExactOrSuffix.

use qubit_sanitize::{
    FieldSanitizer,
    NameMatchMode,
};

let sanitizer = FieldSanitizer::default();

assert_eq!(
    sanitizer.sanitize_value(
        "OPENAI_API_KEY",
        "abcdef",
        NameMatchMode::Exact,
    ),
    "abcdef",
);
assert_eq!(
    sanitizer.sanitize_value(
        "OPENAI_API_KEY",
        "abcdef",
        NameMatchMode::ExactOrSuffix,
    ),
    "****",
);

Custom Fields

use qubit_sanitize::{
    FieldSanitizer,
    NameMatchMode,
    SensitivityLevel,
};

let mut sanitizer = FieldSanitizer::default();
sanitizer.insert_sensitive_field("license_key", SensitivityLevel::Medium);

assert_eq!(
    sanitizer.sanitize_value("license-key", "abcdef", NameMatchMode::Exact),
    "****f",
);

You can also start from an empty policy when you do not want built-in field names:

use qubit_sanitize::{
    FieldSanitizePolicy,
    FieldSanitizer,
    SensitivityLevel,
};

let mut sanitizer = FieldSanitizer::new(FieldSanitizePolicy::empty());
sanitizer.insert_sensitive_field("tenant_secret", SensitivityLevel::Secret);

Map Sanitization

use std::collections::BTreeMap;

use qubit_sanitize::{
    FieldSanitizer,
    NameMatchMode,
};

let sanitizer = FieldSanitizer::default();
let mut values = BTreeMap::new();
values.insert("password".to_string(), "secret".to_string());
values.insert("name".to_string(), "alice".to_string());

let sanitized = sanitizer.sanitize_map(&values, NameMatchMode::Exact);

assert_eq!(sanitized["password"], "<redacted>");
assert_eq!(sanitized["name"], "alice");
assert_eq!(values["password"], "secret");

For mutable structured data, use sanitize_map_in_place with an explicit NameMatchMode.

Adapter Sanitization

use qubit_sanitize::{
    ArgvSanitizer,
    FormUrlEncodedSanitizer,
    HttpBodySanitizer,
    HttpHeaderSanitizer,
    NameMatchMode,
    UrlSanitizer,
};
use http::header::AUTHORIZATION;
use http::HeaderValue;

let url = UrlSanitizer::default()
    .sanitize_url_str(
        "https://alice:secret@example.com/path?access_token=abcdef#callback",
        NameMatchMode::ExactOrSuffix,
    )
    .expect("sample URL should parse");
assert_eq!(
    url,
    "https://****:****@example.com/path?access_token=****#****",
);

let form = FormUrlEncodedSanitizer::default()
    .sanitize_str("username=alice&password=secret", NameMatchMode::ExactOrSuffix);
assert_eq!(form, "username=alice&password=%3Credacted%3E");

let header = HttpHeaderSanitizer::default()
    .sanitize_value(
        &AUTHORIZATION,
        &HeaderValue::from_static("Bearer abcdef"),
        NameMatchMode::ExactOrSuffix,
    );
assert_eq!(header, "****");

let body_content_type = HeaderValue::from_static("application/json");
let body = HttpBodySanitizer::default().sanitize_body(
    br#"{"user":"alice","password":"secret"}"#,
    Some(&body_content_type),
    NameMatchMode::ExactOrSuffix,
);
assert_eq!(body, r#"{"password":"<redacted>","user":"alice"}"#);

let argv = ArgvSanitizer::default()
    .sanitize_argv_display(
        ["docker", "login", "--password", "secret"],
        NameMatchMode::ExactOrSuffix,
    );
assert_eq!(argv, r#"["docker", "login", "--password", "<redacted>"]"#);

Adapter methods require an explicit NameMatchMode, just like the core FieldSanitizer methods. Use NameMatchMode::ExactOrSuffix when contextual names such as OPENAI_API_KEY should match the configured field api_key.

Integration Guidance

The crate has two layers:

Use core / root exports such as FieldSanitizer for field-name matching and value masking.
Use adapter / root exports such as UrlSanitizer, HttpBodySanitizer, and ArgvSanitizer for supported structured inputs.
Keep protocol-specific parsing in caller crates when the adapter cannot model the full context, especially shell command strings and application-specific payloads.

For example, an HTTP crate can use UrlSanitizer for parsed URLs and HttpHeaderSanitizer for http::HeaderMap and http::HeaderValue values. It can use HttpBodySanitizer when it has body bytes plus an optional Content-Type header; the adapter supports JSON, NDJSON, URL-encoded forms, multipart bodies, declared text/* bodies, and binary fallback markers. Unsupported UTF-8 media types are redacted rather than passed through. The returned body string is for logs and diagnostics, not a replayable HTTP body: structured output may be compacted and may not preserve original whitespace, field order, or JSON value types for redacted fields. The caller still owns capture limits, decompression, streaming boundaries, and any application-specific parsing. A command runner can use ArgvSanitizer for structured argv and EnvSanitizer for explicit environment overrides, but should not claim to safely parse arbitrary shell scripts.

Testing

A minimal local run:

cargo test
cargo clippy --all-targets --all-features -- -D warnings

To mirror what continuous integration enforces, run the repository scripts from the project root: ./align-ci.sh brings local tooling and configuration in line with CI, then ./ci-check.sh runs the same checks the pipeline uses. For test coverage, use ./coverage.sh to generate or open reports.

Contributing

Issues and pull requests are welcome.

Keep the core focused on reusable field-value sanitization primitives.
Keep adapters scoped to formats with clear, bounded parsing rules.
Add or update tests when changing matching or masking behavior.
Update this README and public rustdoc when user-visible behavior changes.
Before submitting, run ./align-ci.sh and then ./ci-check.sh.

By contributing, you agree to license your contributions under the Apache License, Version 2.0, the same license as this project.

License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file in the repository for the full text.

Author

Haixing Hu — Qubit Co. Ltd.


Repository	github.com/qubit-ltd/rs-sanitize
Documentation	docs.rs/qubit-sanitize
Crate	crates.io/crates/qubit-sanitize

qubit-sanitize 0.2.1