htmlsanitizer
A fast, allowlist-based HTML sanitizer. Available as a Rust crate and an npm package (via WebAssembly). 3.7–44x faster than DOMPurify on real HTML content.
Also available in Go.
Features
- O(n) streaming parser: DFA-based finite state machine; no DOM tree, no backtracking
- Allowlist-based: only explicitly permitted tags and attributes pass through; everything else is stripped
- URL sanitization: rejects javascript:, data:, ftp:, control characters, and opaque URIs
- Customizable: add/remove tags, modify allowed attributes, supply a custom URL validator
- Streaming writer (Rust): implements std::io::Write; process HTML in chunks without buffering the entire document
- Cross-platform: native Rust crate + WASM-powered npm package with identical sanitization logic
Installation
Rust
[dependencies]
htmlsanitizer = "0.1"
npm / TypeScript
npm install @bytevet/htmlsanitizer
The npm package ships a pre-built WASM binary — no native toolchain required.
Quick Start
Rust
use htmlsanitizer::sanitize_string;
let safe = sanitize_string(r#"<p>Hello</p><script>alert("xss")</script>"#);
assert_eq!(safe, "<p>Hello</p>");
TypeScript / JavaScript
import { sanitize } from "@bytevet/htmlsanitizer";
const safe = sanitize('<p>Hello</p><script>alert("xss")</script>');
// => "<p>Hello</p>"
Usage (Rust)
Default Sanitization
Use the convenience functions for one-shot sanitization with the default allow list:
use htmlsanitizer::{sanitize, sanitize_string};
// Sanitize a string
let input = r#"<p>Hello <b>world</b></p><script>alert("xss")</script>"#;
let clean = sanitize_string(input);
// => "<p>Hello <b>world</b></p>"
// Sanitize bytes
let bytes = b"<img src=\"http://example.com/img.png\" onerror=\"alert(1)\">";
let clean_bytes = sanitize(bytes);
// => b"<img src=\"http://example.com/img.png\">"
Custom Allow List
Create an HtmlSanitizer instance to customize which tags and attributes are allowed:
use htmlsanitizer::{HtmlSanitizer, Tag};
// Remove a tag from the default allow list
let mut sanitizer = HtmlSanitizer::new();
sanitizer.allow_list.remove_tag("a");
let input = r#"<a href="http://example.com">click</a> <p>safe</p>"#;
let clean = sanitizer.sanitize_string(input);
// => "click <p>safe</p>"
// Add a custom tag with specific allowed attributes
// (the argument forms passed to Tag::new are assumed here; see docs.rs for the exact signature)
let mut sanitizer = HtmlSanitizer::new();
sanitizer.allow_list.add_tag(Tag::new("custom-el", &["data-x"], &[]));
let input = r#"<custom-el data-x="1" onclick="bad">content</custom-el>"#;
let clean = sanitizer.sanitize_string(input);
// => "<custom-el data-x=\"1\">content</custom-el>"
Custom URL Sanitizer
Supply a custom URL validator using the builder pattern. The built-in default_url_sanitizer is exported so you can compose it with your own logic:
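use htmlsanitizer::{default_url_sanitizer, HtmlSanitizer};
// Illustrative validator (closure signature assumed to match default_url_sanitizer):
// apply the default checks, then keep only URLs on the trusted host.
let sanitizer = HtmlSanitizer::new().with_url_sanitizer(|url: &str| {
    default_url_sanitizer(url).filter(|u| u.starts_with("http://trusted.com/"))
});
let input = r#"<a href="http://trusted.com/page">ok</a> <a href="http://evil.com">bad</a>"#;
let clean = sanitizer.sanitize_string(input);
// => "<a href=\"http://trusted.com/page\">ok</a> <a>bad</a>"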
Streaming Writer
The streaming interface implements std::io::Write, enabling you to process HTML in chunks. State is preserved between writes:
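use std::io::Write;
use htmlsanitizer::HtmlSanitizer;
let sanitizer = HtmlSanitizer::new();
let mut output = Vec::new();
{
    // Illustrative chunking: the input arrives in two writes, split mid-word
    let mut writer = sanitizer.new_writer(&mut output);
    writer.write_all(b"<p>Hello </p><b>wor").unwrap();
    writer.write_all(b"ld</b>").unwrap();
}
let result = String::from_utf8(output).unwrap();
// => "<p>Hello </p><b>world</b>"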
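To make "state is preserved between writes" concrete, here is a minimal, self-contained sketch of a stateful std::io::Write adapter. It is not the crate's SanitizeWriter: TagStripper and strip_chunks are hypothetical names, and the toy simply drops every byte between `<` and `>`. The point is only that a flag stored on the writer lets a tag that is split across two chunks be handled correctly.

```rust
use std::io::{self, Write};

/// Hypothetical toy adapter (not the crate's `SanitizeWriter`): drops every
/// byte between `<` and `>` and forwards the rest to the inner writer.
struct TagStripper<W: Write> {
    inner: W,
    in_tag: bool, // state carried across `write` calls
}

impl<W: Write> TagStripper<W> {
    fn new(inner: W) -> Self {
        TagStripper { inner, in_tag: false }
    }
}

impl<W: Write> Write for TagStripper<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let mut kept = Vec::with_capacity(buf.len());
        for &b in buf {
            match (self.in_tag, b) {
                (false, b'<') => self.in_tag = true,
                (false, _) => kept.push(b),
                (true, b'>') => self.in_tag = false,
                (true, _) => {}
            }
        }
        self.inner.write_all(&kept)?;
        Ok(buf.len()) // every input byte was consumed
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Feeds the chunks through one `TagStripper` and returns the collected text.
fn strip_chunks(chunks: &[&str]) -> String {
    let mut out = Vec::new();
    {
        let mut writer = TagStripper::new(&mut out);
        for chunk in chunks {
            writer.write_all(chunk.as_bytes()).unwrap();
        }
    }
    String::from_utf8(out).unwrap()
}
```

Calling strip_chunks(&["Hello <", "b>world</b>"]) yields "Hello world" even though the `<b>` tag straddles the chunk boundary, because in_tag survives from the first write to the second.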
Usage (npm / TypeScript)
Default Sanitization
import { sanitize } from "@bytevet/htmlsanitizer";
sanitize('<img src=x onerror="alert(1)">');
// => '<img src="x">'
sanitize('<a href="javascript:alert(1)">click</a>');
// => '<a>click</a>'
Custom Configuration
import { HtmlSanitizer } from "@bytevet/htmlsanitizer";
const s = new HtmlSanitizer();
// Remove a tag from the allow list
s.removeTag("a");
s.sanitize('<a href="http://example.com">link</a>');
// => "link"
// Add a custom tag
// Arguments: name, comma-separated attributes, comma-separated URL attributes
s.addTag("custom-el", "data-x,title", "href");
s.sanitize('<custom-el data-x="1" onclick="bad">content</custom-el>');
// => '<custom-el data-x="1">content</custom-el>'
// Add a global attribute (allowed on all tags)
s.addGlobalAttr("data-testid");
// Release WASM memory when done (instance is unusable after this)
s.free();
API Reference
Rust
| Function / Type | Description |
|---|---|
| `sanitize(data: &[u8]) -> Vec<u8>` | One-shot sanitization (bytes) with the default allow list |
| `sanitize_string(data: &str) -> String` | One-shot sanitization (string) with the default allow list |
| `HtmlSanitizer::new()` | Create a sanitizer with the default allow list |
| `HtmlSanitizer::with_url_sanitizer(f)` | Builder: attach a custom URL validator |
| `HtmlSanitizer::set_url_sanitizer(&mut self, f)` | Set a custom URL validator on an existing instance |
| `HtmlSanitizer::sanitize(&self, &[u8]) -> Vec<u8>` | Sanitize bytes |
| `HtmlSanitizer::sanitize_string(&self, &str) -> String` | Sanitize a string |
| `HtmlSanitizer::new_writer(w) -> SanitizeWriter<W>` | Create a streaming writer (impl io::Write) |
| `AllowList` | Tag/attribute configuration; fields: tags, global_attr, non_html_tags |
| `AllowList::add_tag(&mut self, tag: Tag)` | Add a tag to the allow list |
| `AllowList::remove_tag(&mut self, name: &str)` | Remove a tag by name |
| `Tag::new(name, attr, url_attr)` | Define a tag with its allowed regular and URL attributes |
| `default_allow_list() -> AllowList` | Returns the built-in default allow list |
| `default_url_sanitizer(&str) -> Option<String>` | The built-in URL validator (reusable in custom validators) |
For complete API documentation, see docs.rs.
npm / TypeScript
| Export | Description |
|---|---|
| `sanitize(input: string): string` | One-shot sanitization with the default allow list |
| `new HtmlSanitizer()` | Create a configurable sanitizer instance |
| `.sanitize(input: string): string` | Sanitize HTML using the instance's configuration |
| `.addTag(name, attrs?, urlAttrs?)` | Add a tag; attrs and urlAttrs are comma-separated strings |
| `.removeTag(name: string)` | Remove a tag from the allow list |
| `.addGlobalAttr(name: string)` | Allow an attribute on all tags |
| `.free()` | Release WASM memory; the instance is unusable after this |
Default Allow List
The default allow list permits 68 commonly used HTML tags. All other tags are stripped — their text content is preserved. Tags in the non-HTML list (script, style, object) have both their tags and content removed.
Global attributes (allowed on every permitted tag): class, id
| Category | Tags |
|---|---|
| Structural | address, article, aside, footer, header, h1–h6, hgroup, main, nav, section |
| Block content | blockquote, dd, div, dl, dt, figcaption, figure, hr, li, ol, p, pre, ul |
| Inline text | a, abbr, b, bdi, bdo, br, cite, code, data, em, i, kbd, mark, q, s, small, span, strong, sub, sup, time, u |
| Media | area, audio, img, map, track, video, picture, source |
| Table | caption, col, colgroup, table, tbody, td, tfoot, th, thead, tr |
| Edit marks | del, ins |
| Interactive | details, summary |
Notable tag-specific attributes:
| Tag | Regular attributes | URL attributes |
|---|---|---|
| a | rel, target, referrerpolicy | href |
| img | alt, crossorigin, height, width, loading, referrerpolicy | src |
| video | autoplay, buffered, controls, crossorigin, duration, loop, muted, preload, height, width | src, poster |
| audio | autoplay, controls, crossorigin, duration, loop, muted, preload | src |
| td / th | colspan, rowspan (+ scope for th) | (none) |
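The lookup rule described in this section (a tag must be listed, and an attribute must be either global or listed for that tag) can be sketched as follows. This is a toy model under stated assumptions, not the crate's AllowList: ToyAllowList, sample, and keeps are hypothetical names, regular and URL attributes are merged into one list for brevity, and only two tags from the tables are filled in.

```rust
use std::collections::HashMap;

/// Toy model of an allowlist lookup (hypothetical; not the crate's `AllowList`).
struct ToyAllowList {
    global_attrs: Vec<&'static str>,
    tags: HashMap<&'static str, Vec<&'static str>>,
}

impl ToyAllowList {
    /// Two entries copied from the attribute table, plus the global attributes.
    fn sample() -> Self {
        let mut tags = HashMap::new();
        tags.insert("a", vec!["rel", "target", "referrerpolicy", "href"]);
        tags.insert("td", vec!["colspan", "rowspan"]);
        ToyAllowList {
            global_attrs: vec!["class", "id"],
            tags,
        }
    }

    /// An attribute survives only if its tag is listed and the attribute is
    /// either global or listed for that specific tag.
    fn keeps(&self, tag: &str, attr: &str) -> bool {
        match self.tags.get(tag) {
            Some(attrs) => {
                self.global_attrs.iter().any(|a| *a == attr)
                    || attrs.iter().any(|a| *a == attr)
            }
            None => false, // unlisted tag: nothing on it is kept
        }
    }
}
```

With this model, keeps("a", "href") and keeps("td", "class") are true, while keeps("a", "onclick") and keeps("script", "src") are false, which mirrors the default-deny behavior of an allowlist.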
URL Sanitization
Attributes marked as URL attributes (href, src, poster, cite, etc.) are validated by the URL sanitizer. The default behavior:
Accepted:
- http:// and https:// URLs
- Relative URLs (paths, fragments, query strings)
Rejected:
- javascript: (including case variations and HTML-entity-encoded forms)
- data: URIs
- ftp: and all other non-HTTP schemes
- URLs containing ASCII control characters (bytes < 0x20 or 0x7F)
- Opaque (cannot-be-a-base) URIs
- Percent-encoded ASCII in hostnames
When a URL is rejected, the attribute is removed but the tag and its content are preserved (e.g., <a href="javascript:...">text</a> becomes <a>text</a>).
You can supply a custom URL validator via with_url_sanitizer (Rust) to implement domain restrictions or additional checks. The built-in default_url_sanitizer is exported so you can compose it with your own logic.
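The composition idea can be sketched without the crate, assuming a validator has the same shape as default_url_sanitizer in the API reference (takes &str, returns Option<String>). Everything below is hypothetical: base_check is a stand-in that implements only the control-character rule, and trusted_only is a deliberately strict example that accepts only absolute http(s) URLs whose host is exactly trusted.com, so relative URLs are rejected as well.

```rust
/// Hypothetical stand-in for the crate's `default_url_sanitizer`. It checks
/// only the control-character rule and is NOT the real implementation.
fn base_check(url: &str) -> Option<String> {
    if url.bytes().any(|b| b < 0x20 || b == 0x7F) {
        None
    } else {
        Some(url.to_string())
    }
}

/// Returns the host part of an absolute http(s) URL, or None for anything
/// else (other schemes, relative URLs, scheme-only strings).
fn http_host(url: &str) -> Option<&str> {
    let (scheme, rest) = url.split_once("://")?;
    if !scheme.eq_ignore_ascii_case("http") && !scheme.eq_ignore_ascii_case("https") {
        return None;
    }
    // The host ends at the first path, query, or fragment delimiter.
    rest.split(|c: char| matches!(c, '/' | '?' | '#' | '\\')).next()
}

/// Custom validator: run the base check first, then require an exact host
/// match. `trusted.com` is an example value.
fn trusted_only(url: &str) -> Option<String> {
    base_check(url).filter(|u| http_host(u) == Some("trusted.com"))
}
```

Running the stricter check after the base check keeps the custom logic small: it only ever sees URLs that already passed the baseline rules. Note that a userinfo trick such as http://trusted.com@evil.com/ fails the exact host comparison and is rejected.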
Security Considerations
- Defense in depth. This sanitizer is designed as one layer of an XSS mitigation strategy. Combine it with Content Security Policy headers and context-aware output encoding.
- Not a full HTML parser. The DFA-based approach handles real-world HTML effectively but does not build a DOM tree. It is designed to be conservative — when in doubt, content is stripped.
- Fuzz-tested. The project includes a cargo-fuzz harness. If you discover a bypass, please report it via GitHub Issues.
- Consistent cross-platform behavior. The Rust and WASM/npm builds share the same sanitization engine, ensuring identical output.
- Tested against known XSS vectors. The test suite includes vectors from OWASP and other common XSS payloads.
Performance
The sanitizer operates in a single O(n) pass over the input using a 17-state DFA. It allocates no DOM tree and performs no backtracking.
npm: @bytevet/htmlsanitizer vs DOMPurify
Benchmarked with Vitest bench on Node.js (DOMPurify uses jsdom):
| Payload | @bytevet/htmlsanitizer | DOMPurify + jsdom | Ratio |
|---|---|---|---|
| Simple HTML (small) | 56,716 ops/s | 15,253 ops/s | 3.7x faster |
| XSS vectors | 40,908 ops/s | 5,373 ops/s | 7.6x faster |
| Blog post (medium) | 33,259 ops/s | 1,381 ops/s | 24x faster |
| Mixed safe + dangerous | 40,326 ops/s | 3,987 ops/s | 10x faster |
| Large document (~50 KB) | 1,054 ops/s | 24 ops/s | 44x faster |
DOMPurify is faster on tiny plain-text inputs (no HTML tags) due to WASM call overhead (~10 µs). For any real HTML content, @bytevet/htmlsanitizer is 3.7–44x faster, with the advantage growing as input size increases.
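The Ratio column is simply the quotient of the two throughput figures. The sketch below recomputes it from the numbers in the benchmark table and also converts a throughput into mean time per call; the function names are illustrative and nothing here is part of the package.

```rust
/// (payload, @bytevet/htmlsanitizer ops/s, DOMPurify + jsdom ops/s),
/// copied from the benchmark table.
const RESULTS: [(&str, f64, f64); 5] = [
    ("Simple HTML (small)", 56_716.0, 15_253.0),
    ("XSS vectors", 40_908.0, 5_373.0),
    ("Blog post (medium)", 33_259.0, 1_381.0),
    ("Mixed safe + dangerous", 40_326.0, 3_987.0),
    ("Large document (~50 KB)", 1_054.0, 24.0),
];

/// Throughput ratio: how many times more operations per second.
fn speedup(ours: f64, theirs: f64) -> f64 {
    ours / theirs
}

/// Mean time per call in microseconds for a given throughput.
fn micros_per_op(ops_per_sec: f64) -> f64 {
    1_000_000.0 / ops_per_sec
}

/// One formatted line per payload, with the ratio to one decimal place.
fn report() -> Vec<String> {
    RESULTS
        .iter()
        .map(|(name, ours, theirs)| format!("{name}: {:.1}x", speedup(*ours, *theirs)))
        .collect()
}
```

To one decimal place the ratios come out as 3.7, 7.6, 24.1, 10.1, and 43.9, which the table rounds to 24x, 10x, and 44x for the last three rows. For the small payload, micros_per_op(56_716.0) is about 17.6 µs per call, which puts the quoted ~10 µs WASM call overhead in context.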
Reproduce with:
&& &&
Rust
Development
# Run tests
cargo test
# Run clippy
cargo clippy
# Run benchmarks
cargo bench
# Build WASM and run npm tests
&& &&
# Fuzz testing
Related Projects
- sym01/htmlsanitizer — Go version
License
MIT — see LICENSE.