spider-tendril
A Send-by-default fork of tendril for high-concurrency HTML parsing.
Why this fork?
Upstream tendril declares Tendril<F, A = NonAtomic>, so tendrils use non-atomic reference counting by default. That makes StrTendril and ByteTendril !Send, and the restriction propagates through markup5ever::BufferQueue, html5ever::Tokenizer, html5ever::TreeBuilder, and html5ever::Parser. None of them can cross thread boundaries, so no future that holds an html5ever parser across an .await point can be Send.
This fork flips a single default: Tendril<F, A = Atomic>. The struct is otherwise unchanged. As a result:
- StrTendril and ByteTendril are now Send + Sync by default.
- The whole markup5ever/html5ever parser stack can be made Send simply by transitively depending on spider-tendril (see spider-markup5ever and spider-html5ever).
- Parser state can move freely between tokio worker threads in a multi-threaded async runtime.
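The mechanism behind that list can be modeled with std alone. This is a minimal sketch with hypothetical names (ToyTendril, ToyAtomic, ToyNonAtomic, ToyTokenizer) that are not part of tendril's API. It assumes, as the paragraph above implies, that a tendril's Send impl is gated on its atomicity marker; the real struct has more fields and a real shared buffer.

```rust
#![allow(dead_code)]

use std::cell::Cell;
use std::marker::PhantomData;
use std::sync::atomic::AtomicUsize;
use std::thread;

// Hypothetical stand-ins for tendril's `NonAtomic` and `Atomic` markers.
// `Cell<usize>` is !Sync; `AtomicUsize` is Sync.
pub struct ToyNonAtomic(Cell<usize>);
pub struct ToyAtomic(AtomicUsize);

// `PhantomData<*mut ()>` opts out of the automatic Send/Sync impls, standing
// in for the raw pointer a real tendril keeps to its shared buffer.
// The default `A = ToyAtomic` is the one line that matters here.
pub struct ToyTendril<A = ToyAtomic> {
    text: String,
    _shared: PhantomData<*mut ()>,
    _refcount: PhantomData<A>,
}

// Send is granted back only when the refcount marker is Sync.
unsafe impl<A: Sync> Send for ToyTendril<A> {}

impl<A> ToyTendril<A> {
    pub fn new(text: &str) -> Self {
        ToyTendril { text: text.to_owned(), _shared: PhantomData, _refcount: PhantomData }
    }
}

// A parser-like type that merely contains a tendril inherits its Send-ness
// through Rust's auto-trait rules; nothing in the outer type changes.
pub struct ToyTokenizer<A = ToyAtomic> {
    input: ToyTendril<A>,
}

impl<A> ToyTokenizer<A> {
    pub fn input_len(&self) -> usize {
        self.input.text.len()
    }
}

fn assert_send<T: Send>() {}

fn main() {
    // Both compile because the default refcount marker is Sync.
    assert_send::<ToyTendril>();
    assert_send::<ToyTokenizer>();

    // Uncommenting this line is a compile error: Cell<usize> is not Sync.
    // assert_send::<ToyTokenizer<ToyNonAtomic>>();

    // The annotation is needed: default type parameters do not drive inference.
    let tok: ToyTokenizer = ToyTokenizer { input: ToyTendril::new("<p>hi</p>") };
    // The whole tokenizer moves into another thread.
    let len = thread::spawn(move || tok.input_len()).join().unwrap();
    println!("{len}");
}
```

Because Send is an auto trait, flipping the default on the innermost type is enough; every wrapper that only stores it becomes Send without any code change of its own.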
The cost is a few extra atomic ops per refcount bump (≈5–10 ns each). Behavior, parse output, and the public API are identical to upstream. If you specifically need non-atomic refcounting for a performance-critical single-threaded use case, write Tendril<F, NonAtomic> explicitly.
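To check the per-bump cost on your own hardware, a rough single-threaded micro-benchmark is sketched below. It only times std's Cell<usize> against AtomicUsize (with Relaxed ordering, as Arc::clone uses for its increment); it does not exercise tendril itself, and contended atomics shared across cores cost more than this uncontended loop shows. Build with --release; results vary by CPU.

```rust
use std::cell::Cell;
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Instant;

const N: usize = 5_000_000;

/// Increments a non-atomic counter N times; returns (final value, ns per increment).
fn bench_non_atomic() -> (usize, f64) {
    let count = Cell::new(0usize);
    let start = Instant::now();
    for _ in 0..N {
        // black_box hides the Cell from the optimizer so the loop is not folded away.
        let c = black_box(&count);
        c.set(c.get() + 1);
    }
    let ns = start.elapsed().as_nanos() as f64 / N as f64;
    (count.get(), ns)
}

/// Same loop with an atomic counter and Relaxed ordering.
fn bench_atomic() -> (usize, f64) {
    let count = AtomicUsize::new(0);
    let start = Instant::now();
    for _ in 0..N {
        black_box(&count).fetch_add(1, Ordering::Relaxed);
    }
    let ns = start.elapsed().as_nanos() as f64 / N as f64;
    (count.load(Ordering::Relaxed), ns)
}

fn main() {
    let (n1, cell_ns) = bench_non_atomic();
    let (n2, atomic_ns) = bench_atomic();
    assert_eq!((n1, n2), (N, N));
    println!("Cell: {cell_ns:.2} ns/op, AtomicUsize: {atomic_ns:.2} ns/op");
}
```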
Library name
The crate is published on crates.io as spider-tendril, but the library itself is still imported as tendril:
```toml
[dependencies]
spider-tendril = "0.5"
```

```rust
use tendril::StrTendril;
```
This means existing code that uses tendril types compiles without changes — just swap the dependency.
Original tendril docs
Tendril is a compact string/buffer type optimized for zero-copy parsing. Tendrils have the semantics of owned strings, but are sometimes views into shared buffers. When you mutate a tendril, an owned copy is made if necessary; further mutations occur in-place until the string becomes shared (e.g. via clone() or subtendril()).
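The copy-on-write behavior described above can be modeled with Rc<String>. In this sketch ToyTendril, from_slice, subtendril, and push_slice are hypothetical: the names echo tendril's methods, but the signatures and internals here are illustrative only, and the real type's layout is different.

```rust
use std::rc::Rc;

/// Hypothetical toy: string semantics on the outside, but possibly a
/// (start, len) view into a shared, reference-counted buffer inside.
#[derive(Clone)]
struct ToyTendril {
    buf: Rc<String>,
    start: usize,
    len: usize,
}

impl ToyTendril {
    fn from_slice(s: &str) -> Self {
        ToyTendril { buf: Rc::new(s.to_owned()), start: 0, len: s.len() }
    }

    fn as_str(&self) -> &str {
        &self.buf[self.start..self.start + self.len]
    }

    /// A view into the same buffer: no bytes copied, only a refcount bump.
    fn subtendril(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "subtendril out of bounds");
        ToyTendril { buf: Rc::clone(&self.buf), start: self.start + offset, len }
    }

    fn is_shared(&self) -> bool {
        Rc::strong_count(&self.buf) > 1
    }

    /// Copy-on-write append: copy only if the buffer is shared or this is a
    /// partial view; afterwards the buffer is unique and is mutated in place.
    fn push_slice(&mut self, s: &str) {
        let whole = self.start == 0 && self.len == self.buf.len();
        if self.is_shared() || !whole {
            self.buf = Rc::new(self.as_str().to_owned());
            self.start = 0;
        }
        Rc::get_mut(&mut self.buf)
            .expect("buffer is uniquely owned here")
            .push_str(s);
        self.len += s.len();
    }
}

fn main() {
    let mut a = ToyTendril::from_slice("hello world");
    let b = a.subtendril(6, 5); // shares a's buffer
    assert_eq!(b.as_str(), "world");
    assert!(a.is_shared());

    a.push_slice("!"); // shared, so `a` gets its own copy first
    assert_eq!(a.as_str(), "hello world!");
    assert_eq!(b.as_str(), "world"); // b is unaffected
    assert!(!a.is_shared());

    let before = Rc::as_ptr(&a.buf);
    a.push_slice("?"); // unique, so this appends in place
    assert_eq!(Rc::as_ptr(&a.buf), before);
    assert_eq!(a.as_str(), "hello world!?");
}
```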
Tendril uses phantom types to track a buffer's format. This determines at compile time which operations are available on a given tendril — for example, Tendril<UTF8> and Tendril<Bytes> can be borrowed as &str and &[u8] respectively.
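The phantom-type technique itself can be sketched in std-only Rust. The markers Utf8 and Bytes and the type ToyBuf are hypothetical stand-ins, loosely analogous to tendril's format types; the point is that the format parameter exists only at compile time and decides which methods exist.

```rust
mod toy {
    use std::marker::PhantomData;
    use std::str::Utf8Error;

    // Hypothetical format markers.
    pub struct Utf8;
    pub struct Bytes;

    /// `F` exists only at compile time; it occupies no space at run time.
    pub struct ToyBuf<F> {
        data: Vec<u8>, // private, so only the constructors below can build one
        _format: PhantomData<F>,
    }

    // Operations valid for every format.
    impl<F> ToyBuf<F> {
        pub fn as_bytes(&self) -> &[u8] {
            &self.data
        }
    }

    impl ToyBuf<Bytes> {
        pub fn from_bytes(b: &[u8]) -> Self {
            ToyBuf { data: b.to_vec(), _format: PhantomData }
        }

        /// Validate once; on success the UTF-8 guarantee is recorded in the type.
        pub fn into_utf8(self) -> Result<ToyBuf<Utf8>, Utf8Error> {
            std::str::from_utf8(&self.data)?;
            Ok(ToyBuf { data: self.data, _format: PhantomData })
        }
    }

    impl ToyBuf<Utf8> {
        pub fn from_text(s: &str) -> Self {
            ToyBuf { data: s.as_bytes().to_vec(), _format: PhantomData }
        }

        /// Only UTF-8 buffers offer `as_str`.
        pub fn as_str(&self) -> &str {
            // SAFETY: every ToyBuf<Utf8> comes from a &str or from bytes that
            // `into_utf8` validated, and `data` is private to this module.
            unsafe { std::str::from_utf8_unchecked(&self.data) }
        }
    }
}

use toy::{Bytes, ToyBuf, Utf8};

fn main() {
    let text = ToyBuf::<Utf8>::from_text("héllo");
    assert_eq!(text.as_str(), "héllo");
    assert_eq!(text.as_bytes().len(), 6); // 'é' is two bytes in UTF-8

    let raw = ToyBuf::<Bytes>::from_bytes(b"hi");
    // raw.as_str(); // compile error: no `as_str` on ToyBuf<Bytes>
    let checked = raw.into_utf8().expect("valid UTF-8");
    assert_eq!(checked.as_str(), "hi");

    assert!(ToyBuf::<Bytes>::from_bytes(&[0xff]).into_utf8().is_err());
}
```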
Whereas String allocates on the heap for any non-empty string, Tendril can store small strings (up to 8 bytes) inline. Tendril is also smaller than String on 64-bit platforms — 16 bytes versus 24. Option<Tendril> is the same size as Tendril.
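How those size figures can arise is sketched below with a hypothetical header: one nonzero tag word plus 8 payload bytes. This does not reproduce tendril's real encoding and models only the inline path; it shows how 8 inline bytes, a 16-byte footprint on 64-bit targets, and a same-size Option follow from such a layout.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

const MAX_INLINE: usize = 8;

/// Hypothetical header; only the inline path is modeled here.
struct ToyHeader {
    // Never zero: small values mean "inline, length = tag - 1". A real
    // implementation would store an aligned heap pointer here for long strings.
    tag: NonZeroUsize,
    // Inline bytes (a heap-backed value would keep length data here instead).
    payload: [u8; MAX_INLINE],
}

impl ToyHeader {
    /// Stores `s` inline if it fits in 8 bytes, with no heap allocation.
    fn inline(s: &str) -> Option<Self> {
        if s.len() > MAX_INLINE {
            return None;
        }
        let mut payload = [0u8; MAX_INLINE];
        payload[..s.len()].copy_from_slice(s.as_bytes());
        // len + 1 keeps the tag nonzero even for the empty string.
        let tag = NonZeroUsize::new(s.len() + 1)?;
        Some(ToyHeader { tag, payload })
    }

    fn as_str(&self) -> &str {
        let len = self.tag.get() - 1;
        std::str::from_utf8(&self.payload[..len]).expect("copied from a &str")
    }
}

fn main() {
    if cfg!(target_pointer_width = "64") {
        assert_eq!(size_of::<String>(), 24); // pointer, capacity, length
        assert_eq!(size_of::<ToyHeader>(), 16);
        // The NonZeroUsize niche lets Option use the forbidden zero tag as None.
        assert_eq!(size_of::<Option<ToyHeader>>(), 16);
    }

    let small = ToyHeader::inline("<html>").expect("6 bytes fits inline");
    assert_eq!(small.as_str(), "<html>");
    assert!(ToyHeader::inline("too long!").is_none()); // 9 bytes
}
```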
The maximum length of a tendril is 4 GB. The library will panic if you attempt to go over the limit.
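The 4 GB figure is what a 32-bit length field allows (2^32 - 1 bytes). Assuming lengths are stored as u32, the hypothetical helper grown_len below shows where such a panic comes from: checked arithmetic that refuses to wrap. It is a sketch, not tendril's code.

```rust
use std::panic;

/// Hypothetical helper: computes a new length when the length field is a u32,
/// panicking instead of silently wrapping once the result would exceed u32::MAX.
fn grown_len(current: u32, additional: usize) -> u32 {
    u32::try_from(additional)
        .ok()
        .and_then(|extra| current.checked_add(extra))
        .expect("toy buffer: length would exceed u32::MAX (about 4 GB)")
}

fn main() {
    assert_eq!(grown_len(10, 5), 15);
    assert_eq!(grown_len(u32::MAX - 1, 1), u32::MAX);

    // One byte past the limit panics.
    let result = panic::catch_unwind(|| grown_len(u32::MAX, 1));
    assert!(result.is_err());
}
```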
License
Licensed under either of Apache License, Version 2.0 or MIT license at your option, matching the upstream tendril license.