htmlsanitizer-0.1.1 has been yanked.

htmlsanitizer

A fast, allowlist-based HTML sanitizer, available as a Rust crate and as an npm package (via WebAssembly). In the benchmarks reported under Performance, it is 3.7–44x faster than DOMPurify with jsdom on real HTML content.

Also available in Go.

Features

O(n) streaming parser — DFA-based finite state machine; no DOM tree, no backtracking
Allowlist-based — only explicitly permitted tags and attributes pass through; everything else is stripped
URL sanitization — rejects javascript:, data:, ftp:, control characters, and opaque URIs
Customizable — add/remove tags, modify allowed attributes, supply a custom URL validator
Streaming writer (Rust) — implements std::io::Write; process HTML in chunks without buffering the entire document
Cross-platform — native Rust crate + WASM-powered npm package with identical sanitization logic

Installation

Rust

[dependencies]
htmlsanitizer = "0.1"

npm / TypeScript

npm install @bytevet/htmlsanitizer

The npm package ships a pre-built WASM binary — no native toolchain required.

Quick Start

Rust

use htmlsanitizer::sanitize_string;

let safe = sanitize_string(r#"<p>Hello</p><script>alert("xss")</script>"#);
assert_eq!(safe, "<p>Hello</p>");

TypeScript / JavaScript

import { sanitize } from "@bytevet/htmlsanitizer";

const safe = sanitize('<p>Hello</p><script>alert("xss")</script>');
// => "<p>Hello</p>"

Usage (Rust)

Default Sanitization

Use the convenience functions for one-shot sanitization with the default allow list:

use htmlsanitizer::{sanitize_string, sanitize};

// Sanitize a string
let input = r#"<p>Hello <b>world</b></p><script>alert("xss")</script>"#;
let clean = sanitize_string(input);
// => "<p>Hello <b>world</b></p>"

// Sanitize bytes
let bytes = b"<img src=\"http://example.com/img.png\" onerror=\"alert(1)\">";
let clean_bytes = sanitize(bytes);
// => b"<img src=\"http://example.com/img.png\">"

Custom Allow List

Create an HtmlSanitizer instance to customize which tags and attributes are allowed:

use htmlsanitizer::{HtmlSanitizer, Tag};

// Remove a tag from the default allow list
let mut sanitizer = HtmlSanitizer::new();
sanitizer.allow_list.remove_tag("a");

let input = r#"<a href="http://example.com">click</a> <p>safe</p>"#;
let clean = sanitizer.sanitize_string(input);
// => "click <p>safe</p>"

// Add a custom tag with specific allowed attributes
let mut sanitizer = HtmlSanitizer::new();
sanitizer.allow_list.add_tag(Tag::new("custom-el", &["data-x"], &[]));

let input = r#"<custom-el data-x="1" onclick="bad">content</custom-el>"#;
let clean = sanitizer.sanitize_string(input);
// => "<custom-el data-x=\"1\">content</custom-el>"

Custom URL Sanitizer

Supply a custom URL validator using the builder pattern. The built-in default_url_sanitizer is exported so you can compose it with your own logic:

use htmlsanitizer::HtmlSanitizer;

let sanitizer = HtmlSanitizer::new().with_url_sanitizer(|raw_url| {
    let sanitized = htmlsanitizer::default_url_sanitizer(raw_url)?;
    if sanitized.contains("trusted.com") {
        Some(sanitized)
    } else {
        None
    }
});

let input = r#"<a href="http://trusted.com/page">ok</a> <a href="http://evil.com">bad</a>"#;
let clean = sanitizer.sanitize_string(input);
// => "<a href=\"http://trusted.com/page\">ok</a> <a>bad</a>"
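Note that a substring test such as contains("trusted.com") would presumably also accept a URL like http://evil.com/?next=trusted.com, because the string appears in the query. A minimal sketch of an exact host comparison using only the standard library follows. host_of and is_trusted_host are hypothetical helpers, not part of htmlsanitizer, and the sketch assumes the URL has already passed default_url_sanitizer. A real validator may prefer a full URL parser.

```rust
/// Extract the host from an absolute http(s) URL.
/// Hypothetical helper for illustration; not part of the htmlsanitizer API.
/// The scheme match is case-sensitive, so "HTTP://..." is rejected (conservative).
fn host_of(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // The authority ends at the first '/', '?' or '#'. Browsers also treat
    // '\' like '/' for http(s) URLs, so it is handled as a delimiter too.
    let authority = rest
        .split(|c: char| matches!(c, '/' | '?' | '#' | '\\'))
        .next()?;
    // Drop optional userinfo ("user:pass@") and an optional port (":8080").
    let host_port = authority.rsplit('@').next()?;
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// True only when the URL's host equals `trusted` exactly (ASCII case-insensitive).
fn is_trusted_host(url: &str, trusted: &str) -> bool {
    match host_of(url) {
        Some(host) => host.eq_ignore_ascii_case(trusted),
        None => false,
    }
}
```

Inside the closure above, the contains test could then become is_trusted_host(&sanitized, "trusted.com"), assuming sanitized is a String as the Option<String> return type in the API reference suggests.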

Streaming Writer

The streaming interface implements std::io::Write, enabling you to process HTML in chunks. State is preserved between writes:

use std::io::Write;
use htmlsanitizer::HtmlSanitizer;

let sanitizer = HtmlSanitizer::new();
let mut output = Vec::new();

{
    let mut writer = sanitizer.new_writer(&mut output);

    // Write HTML in chunks — state is preserved between writes
    writer.write_all(b"<p>Hello </p><scr").expect("write failed");
    writer.write_all(b"ipt>alert('xss')</script>").expect("write failed");
    writer.write_all(b"<b>world</b>").expect("write failed");
}

let result = String::from_utf8(output).unwrap();
// => "<p>Hello </p><b>world</b>"
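To illustrate what "state is preserved between writes" means, here is a toy std::io::Write wrapper that drops everything between '<' and '>' and remembers whether the previous chunk ended inside a tag. This is only a sketch of the idea: TagStripper and strip_chunks are hypothetical names, the code does no allowlisting, and it is not how SanitizeWriter is implemented.

```rust
use std::io::{self, Write};

/// Toy writer: forwards text and drops anything between '<' and '>'.
struct TagStripper<W: Write> {
    inner: W,
    in_tag: bool, // parser state carried across write() calls
}

impl<W: Write> TagStripper<W> {
    fn new(inner: W) -> Self {
        TagStripper { inner, in_tag: false }
    }
}

impl<W: Write> Write for TagStripper<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let mut kept = Vec::with_capacity(buf.len());
        for &byte in buf {
            match (self.in_tag, byte) {
                (false, b'<') => self.in_tag = true,
                (false, _) => kept.push(byte),
                (true, b'>') => self.in_tag = false,
                (true, _) => {}
            }
        }
        self.inner.write_all(&kept)?;
        // Report the whole input chunk as consumed, even though some bytes were dropped.
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Feed the chunks through one writer and collect the output.
fn strip_chunks(chunks: &[&str]) -> String {
    let mut out = Vec::new();
    {
        let mut writer = TagStripper::new(&mut out);
        for chunk in chunks {
            writer
                .write_all(chunk.as_bytes())
                .expect("writing to a Vec does not fail");
        }
    }
    // Only ASCII-delimited runs are removed, so the output remains valid UTF-8.
    String::from_utf8(out).expect("output remains valid UTF-8")
}
```

Because in_tag lives in the writer rather than in the write call, a tag split across two chunks is handled the same way as one that arrives whole.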

Usage (npm / TypeScript)

Default Sanitization

import { sanitize } from "@bytevet/htmlsanitizer";

sanitize('<img src=x onerror="alert(1)">');
// => '<img src="x">'

sanitize('<a href="javascript:alert(1)">click</a>');
// => '<a>click</a>'

Custom Configuration

import { HtmlSanitizer } from "@bytevet/htmlsanitizer";

const s = new HtmlSanitizer();

// Remove a tag from the allow list
s.removeTag("a");
s.sanitize('<a href="http://example.com">link</a>');
// => "link"

// Add a custom tag
// Arguments: name, comma-separated attributes, comma-separated URL attributes
s.addTag("custom-el", "data-x,title", "href");
s.sanitize('<custom-el data-x="1" onclick="bad">content</custom-el>');
// => '<custom-el data-x="1">content</custom-el>'

// Add a global attribute (allowed on all tags)
s.addGlobalAttr("data-testid");

// Release WASM memory when done (instance is unusable after this)
s.free();

API Reference

Rust

Function / Type	Description
`sanitize(data: &[u8]) -> Vec<u8>`	One-shot sanitization (bytes) with the default allow list
`sanitize_string(data: &str) -> String`	One-shot sanitization (string) with the default allow list
`HtmlSanitizer::new()`	Create a sanitizer with the default allow list
`HtmlSanitizer::with_url_sanitizer(f)`	Builder: attach a custom URL validator
`HtmlSanitizer::set_url_sanitizer(&mut self, f)`	Set a custom URL validator on an existing instance
`HtmlSanitizer::sanitize(&self, &[u8]) -> Vec<u8>`	Sanitize bytes
`HtmlSanitizer::sanitize_string(&self, &str) -> String`	Sanitize a string
`HtmlSanitizer::new_writer(w) -> SanitizeWriter<W>`	Create a streaming writer (`impl io::Write`)
`AllowList`	Tag/attribute configuration; fields: `tags`, `global_attr`, `non_html_tags`
`AllowList::add_tag(&mut self, tag: Tag)`	Add a tag to the allow list
`AllowList::remove_tag(&mut self, name: &str)`	Remove a tag by name
`Tag::new(name, attr, url_attr)`	Define a tag with its allowed regular and URL attributes
`default_allow_list() -> AllowList`	Returns the built-in default allow list
`default_url_sanitizer(&str) -> Option<String>`	The built-in URL validator (reusable in custom validators)

For complete API documentation, see docs.rs.

npm / TypeScript

Export	Description
`sanitize(input: string): string`	One-shot sanitization with the default allow list
`new HtmlSanitizer()`	Create a configurable sanitizer instance
`.sanitize(input: string): string`	Sanitize HTML using the instance's configuration
`.addTag(name, attrs?, urlAttrs?)`	Add a tag; `attrs` and `urlAttrs` are comma-separated strings
`.removeTag(name: string)`	Remove a tag from the allow list
`.addGlobalAttr(name: string)`	Allow an attribute on all tags
`.free()`	Release WASM memory; the instance is unusable after this

Default Allow List

The default allow list permits 68 commonly used HTML tags. All other tags are stripped — their text content is preserved. Tags in the non-HTML list (script, style, object) have both their tags and content removed.

Global attributes (allowed on every permitted tag): class, id

Category	Tags
Structural	`address`, `article`, `aside`, `footer`, `header`, `h1`–`h6`, `hgroup`, `main`, `nav`, `section`
Block content	`blockquote`, `dd`, `div`, `dl`, `dt`, `figcaption`, `figure`, `hr`, `li`, `ol`, `p`, `pre`, `ul`
Inline text	`a`, `abbr`, `b`, `bdi`, `bdo`, `br`, `cite`, `code`, `data`, `em`, `i`, `kbd`, `mark`, `q`, `s`, `small`, `span`, `strong`, `sub`, `sup`, `time`, `u`
Media	`area`, `audio`, `img`, `map`, `track`, `video`, `picture`, `source`
Table	`caption`, `col`, `colgroup`, `table`, `tbody`, `td`, `tfoot`, `th`, `thead`, `tr`
Edit marks	`del`, `ins`
Interactive	`details`, `summary`

Notable tag-specific attributes:

Tag	Regular attributes	URL attributes
`a`	`rel`, `target`, `referrerpolicy`	`href`
`img`	`alt`, `crossorigin`, `height`, `width`, `loading`, `referrerpolicy`	`src`
`video`	`autoplay`, `buffered`, `controls`, `crossorigin`, `duration`, `loop`, `muted`, `preload`, `height`, `width`	`src`, `poster`
`audio`	`autoplay`, `controls`, `crossorigin`, `duration`, `loop`, `muted`, `preload`	`src`
`td` / `th`	`colspan`, `rowspan` (+ `scope` for `th`)	—
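The lookup implied by these tables can be modeled in a few lines. The sketch below is an assumption about the shape of the decision, not the crate's Tag or AllowList types: ToyTag, AttrDecision, and decide are hypothetical names, only the a and img rows are included, and tag and attribute names are assumed to be lower-cased already.

```rust
/// Hypothetical model of one allow-list entry (not the crate's `Tag` type).
struct ToyTag {
    name: &'static str,
    attrs: &'static [&'static str],
    url_attrs: &'static [&'static str],
}

/// What happens to an attribute found on a permitted tag.
#[derive(Debug, PartialEq)]
enum AttrDecision {
    Keep,
    KeepIfUrlValid,
    Strip,
}

/// Global attributes, as listed in the README.
const GLOBAL_ATTRS: &[&str] = &["class", "id"];

/// Two rows copied from the tag-specific attribute table.
const TOY_TAGS: &[ToyTag] = &[
    ToyTag {
        name: "a",
        attrs: &["rel", "target", "referrerpolicy"],
        url_attrs: &["href"],
    },
    ToyTag {
        name: "img",
        attrs: &["alt", "crossorigin", "height", "width", "loading", "referrerpolicy"],
        url_attrs: &["src"],
    },
];

/// Returns None when the tag itself is not in this toy allow list.
fn decide(tag: &str, attr: &str) -> Option<AttrDecision> {
    let entry = TOY_TAGS.iter().find(|t| t.name == tag)?;
    if entry.url_attrs.contains(&attr) {
        Some(AttrDecision::KeepIfUrlValid)
    } else if entry.attrs.contains(&attr) || GLOBAL_ATTRS.contains(&attr) {
        Some(AttrDecision::Keep)
    } else {
        Some(AttrDecision::Strip)
    }
}
```

For example, decide("img", "onerror") yields Strip, which matches the onerror removal shown in the usage examples.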

URL Sanitization

Attributes marked as URL attributes (href, src, poster, cite, etc.) are validated by the URL sanitizer. The default behavior:

Accepted:

http:// and https:// URLs
Relative URLs (paths, fragments, query strings)

Rejected:

javascript: (including case variations and HTML-entity-encoded forms)
data: URIs
ftp: and all other non-HTTP schemes
URLs containing ASCII control characters (bytes below 0x20, or 0x7F)
Opaque (cannot-be-a-base) URIs
Percent-encoded ASCII in hostnames

When a URL is rejected, the attribute is removed but the tag and its content are preserved (e.g., <a href="javascript:...">text</a> becomes <a>text</a>).

In Rust, you can supply a custom URL validator via with_url_sanitizer (or set_url_sanitizer on an existing instance) to implement domain restrictions or additional checks. The built-in default_url_sanitizer is exported so that a custom validator can call it first and then apply its own rules.
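As a rough illustration of the accept/reject rules listed in this section, the sketch below checks control characters and the scheme using only the standard library. toy_url_check is a hypothetical function, not default_url_sanitizer: it does not decode HTML entities, does not inspect hostnames for percent-encoding, and may differ from the crate in edge cases.

```rust
/// Toy approximation of the documented rules; returns the URL when accepted.
fn toy_url_check(raw: &str) -> Option<&str> {
    // Reject ASCII control characters (below 0x20, or 0x7F).
    if raw.bytes().any(|b| b < 0x20 || b == 0x7F) {
        return None;
    }
    // A scheme is present only if ':' appears before any '/', '?' or '#'.
    let first_delim = raw.find(|c: char| matches!(c, ':' | '/' | '?' | '#'));
    match first_delim {
        Some(i) if raw.as_bytes()[i] == b':' => {
            let scheme = &raw[..i];
            let rest = &raw[i + 1..];
            let is_http = scheme.eq_ignore_ascii_case("http")
                || scheme.eq_ignore_ascii_case("https");
            // Require "//" after the scheme so opaque forms such as "http:foo" are rejected.
            if is_http && rest.starts_with("//") {
                Some(raw)
            } else {
                None
            }
        }
        // No scheme: treat the input as a relative URL (path, query, or fragment).
        _ => Some(raw),
    }
}
```

The scheme comparison is case-insensitive, so JaVaScRiPt: is rejected along with javascript:, and any scheme other than http or https falls through to None.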

Security Considerations

Defense in depth. This sanitizer is designed as one layer of an XSS mitigation strategy. Combine it with Content Security Policy headers and context-aware output encoding.
Not a full HTML parser. The DFA-based approach handles real-world HTML effectively but does not build a DOM tree. It is designed to be conservative — when in doubt, content is stripped.
Fuzz-tested. The project includes a cargo-fuzz harness. If you discover a bypass, please report it via GitHub Issues.
Consistent cross-platform behavior. The Rust and WASM/npm builds share the same sanitization engine, ensuring identical output.
Tested against known XSS vectors. The test suite includes vectors from OWASP and other common XSS payloads.

Performance

The sanitizer operates in a single O(n) pass over the input using a 17-state DFA. It allocates no DOM tree and performs no backtracking.
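For readers unfamiliar with the approach, the following sketch shows what a single-pass DFA over bytes looks like. It is a four-state toy, not the crate's 17-state machine: State, step, and count_tags are hypothetical names, and the only behavior modeled is that '>' inside a quoted attribute value does not end a tag.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Text,
    InTag,
    InDoubleQuoted,
    InSingleQuoted,
}

/// One transition: (current state, input byte) -> next state.
/// Each byte is inspected exactly once, so a full pass is O(n).
fn step(state: State, byte: u8) -> State {
    match (state, byte) {
        (State::Text, b'<') => State::InTag,
        (State::InTag, b'>') => State::Text,
        (State::InTag, b'"') => State::InDoubleQuoted,
        (State::InTag, b'\'') => State::InSingleQuoted,
        (State::InDoubleQuoted, b'"') => State::InTag,
        (State::InSingleQuoted, b'\'') => State::InTag,
        // Every other byte leaves the state unchanged.
        (s, _) => s,
    }
}

/// Count completed tags in one pass, with no backtracking and no tree.
fn count_tags(input: &[u8]) -> usize {
    let mut state = State::Text;
    let mut tags = 0;
    for &byte in input {
        let next = step(state, byte);
        if state == State::InTag && next == State::Text {
            tags += 1;
        }
        state = next;
    }
    tags
}
```

The only memory the loop needs is the current state and a counter, which is why this style of parser can run over a stream without building a document tree.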

npm: @bytevet/htmlsanitizer vs DOMPurify

Benchmarked with Vitest bench on Node.js (DOMPurify uses jsdom):

Payload	@bytevet/htmlsanitizer	DOMPurify + jsdom	Ratio
Simple HTML (small)	56,716 ops/s	15,253 ops/s	3.7x faster
XSS vectors	40,908 ops/s	5,373 ops/s	7.6x faster
Blog post (medium)	33,259 ops/s	1,381 ops/s	24x faster
Mixed safe + dangerous	40,326 ops/s	3,987 ops/s	10x faster
Large document (~50 KB)	1,054 ops/s	24 ops/s	44x faster

DOMPurify is faster on tiny plain-text inputs (no HTML tags) due to WASM call overhead (~10 µs). For any real HTML content, @bytevet/htmlsanitizer is 3.7–44x faster, with the advantage growing as input size increases.
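The Ratio column and the effect of a fixed per-call overhead can be checked with simple arithmetic. The helpers below are hypothetical and only restate the table: speedup divides two throughputs, and micros_per_op converts ops/s into mean time per call.

```rust
/// Speedup of throughput `ops_a` over throughput `ops_b` (both in ops/s).
fn speedup(ops_a: f64, ops_b: f64) -> f64 {
    ops_a / ops_b
}

/// Mean time per operation in microseconds for a given throughput.
fn micros_per_op(ops_per_sec: f64) -> f64 {
    1_000_000.0 / ops_per_sec
}
```

With the table's figures, speedup(56_716.0, 15_253.0) is about 3.7 and speedup(1_054.0, 24.0) is about 44. The small payload takes roughly 17.6 µs per call, so a fixed overhead of about 10 µs is a large share of it, while the ~50 KB document takes roughly 949 µs per call, where the same overhead is negligible.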

Reproduce with:

cd bench-npm && npm install && npm run bench

Rust

Run the native Rust benchmarks with:

cargo bench

Development

# Run tests
cargo test

# Run clippy
cargo clippy --all-targets --all-features

# Run benchmarks
cargo bench

# Build WASM and run npm tests
cd npm && npm run build && npm test

# Fuzz testing
cargo +nightly fuzz run sanitize

Related Projects

sym01/htmlsanitizer — Go version

License

MIT — see LICENSE.
