tiptap-rusty-parser
Fast, schema-agnostic parser & manipulator for Tiptap /
ProseMirror JSONContent documents, in Rust.
- Schema-agnostic: any node/mark type is accepted; unknown JSON fields are preserved for lossless roundtrip.
- Query with predicate closures: find, find_all, walk, descendants.
- Select by type/mark/attr: by_type, by_mark, by_attr.
- Address by index path: node_at, path_to, paths_to.
- Mutate in place: marks, attrs, children, text, and bulk replace_all.
- Extract text: text_content, char_count, word_count (Unicode-aware).
- Validate (opt-in): check against an allow-list Schema (Rust or JSON).
- Build ergonomically: Node::element, Node::text, doc(..), with_* chaining.
- JS/WASM: npm i tiptap-rusty-parser for browser/bundler apps (see JavaScript / WASM).
- Fast: borrow over copy, stack-based traversal (no recursion blowup), lto release profile, criterion benches.
Table of contents
- Install
- Quick start
- Data model
- Parsing & serializing
- Querying
- Selectors
- Node paths
- Mutating
- Normalizing
- Range editing
- Text extraction
- Schema validation
- Diffing
- Transactions
- Rendering to HTML
- Building nodes
- JavaScript / WASM
- Error handling
- Performance
- Examples
- Development
- License
Install
Add to Cargo.toml:
[dependencies]
tiptap-rusty-parser = "0.1"
Requires a recent stable Rust (edition 2021).
Quick start
use ;
Data model
A Tiptap document is a tree of nodes. The crate mirrors Tiptap's JSONContent
shape directly.
Map/Value are re-exported from serde_json. The crate is built with the
preserve_order feature so attribute key order survives a roundtrip.
Why everything is Option — to faithfully distinguish missing from
empty. content: None serializes to no content key; content: Some(vec![])
serializes to "content": []. Unknown node types (custom Tiptap extensions) and
unknown fields land in extra and roundtrip untouched.
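The missing-versus-empty rule can be made concrete with a small std-only sketch. SketchNode and to_json are hypothetical stand-ins written for this illustration, not the crate's types, and the hand-rolled serializer only mimics the behavior described above.

```rust
/// Hypothetical stand-in for a node; NOT the crate's `Node` type.
struct SketchNode {
    node_type: String,
    /// `None` = key absent, `Some(vec![])` = key present but empty.
    content: Option<Vec<SketchNode>>,
}

/// Hand-rolled serializer for illustration only (no string escaping).
fn to_json(node: &SketchNode) -> String {
    let mut out = format!("{{\"type\":\"{}\"", node.node_type);
    // Only emit the `content` key when the field is `Some`, even if empty.
    if let Some(children) = &node.content {
        let inner: Vec<String> = children.iter().map(to_json).collect();
        out.push_str(&format!(",\"content\":[{}]", inner.join(",")));
    }
    out.push('}');
    out
}
```

With content set to None the output has no content key at all; with Some(vec![]) it contains "content":[]. Keeping the field as an Option is what lets the two cases survive a roundtrip.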
Document is a thin owning wrapper around the root Node and derefs to it,
so every Node method below is also callable directly on a Document.
Parsing & serializing
use Document;
use json;
// From a JSON string
let doc = from_json_str?;
// From a serde_json::Value
let doc = from_value?;
// From any reader (file, socket, …)
let file = open?;
let doc = from_reader?;
// Serialize
let compact = doc.to_json_str?; // String, compact
let pretty = doc.to_string_pretty?; // String, indented
let value = doc.to_value?; // serde_json::Value
# Ok::
Roundtrip is lossless — unknown node types, extra fields, and key order are all preserved.
Access the root node explicitly when needed: doc.root(), doc.root_mut(),
doc.into_root(). Wrap an existing node with Document::new(node) or
node.into().
Querying
All traversal is depth-first pre-order (a node is visited before its children). Selection is done with predicate closures — no selector DSL to learn.
use ;
let doc = from_json_str?;
// First match (incl. the node itself)
let first_para: = doc.find;
// All matches
let texts: = doc.find_all;
assert_eq!;
// Predicate can inspect anything: marks, attrs, text…
let bold = doc.find.unwrap;
assert_eq!;
// Lazy iterator over self + all descendants
let count = doc.descendants.count;
// Visit every node
let mut n = 0;
doc.walk;
# Ok::
Mutable variants return &mut access:
# use ;
# let mut doc = from_json_str?;
// Single mutable match
if let Some = doc.find_mut
// All mutable matches (predicate passed by &mut)
let mut is_text = ;
for node in doc.root_mut.find_all_mut
// In-place visit
doc.walk_mut;
# Ok::
| Method | Signature | Returns |
|---|---|---|
| find | find(\|&Node\| -> bool) | Option<&Node> |
| find_mut | find_mut(\|&Node\| -> bool) | Option<&mut Node> |
| find_all | find_all(\|&Node\| -> bool) | Vec<&Node> |
| find_all_mut | find_all_mut(&mut \|&Node\| -> bool) | Vec<&mut Node> |
| walk | walk(&mut \|&Node\|) | () |
| walk_mut | walk_mut(&mut \|&mut Node\|) | () |
| descendants | descendants() | impl Iterator<Item = &Node> |
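The depth-first pre-order visit order, and the stack-based traversal this README mentions, can be illustrated with a small std-only sketch. SketchNode, leaf, and preorder_names are hypothetical names used only here; the crate's own iterator may be implemented differently.

```rust
/// Hypothetical minimal tree; NOT the crate's `Node`.
struct SketchNode {
    name: &'static str,
    children: Vec<SketchNode>,
}

/// Convenience constructor for a childless node.
fn leaf(name: &'static str) -> SketchNode {
    SketchNode { name, children: vec![] }
}

/// Collects names in depth-first pre-order using an explicit stack
/// instead of recursion, so deep trees do not grow the call stack.
fn preorder_names(root: &SketchNode) -> Vec<&'static str> {
    let mut out = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        // Visit the node before any of its children (pre-order).
        out.push(node.name);
        // Push children in reverse so the first child is popped first.
        for child in node.children.iter().rev() {
            stack.push(child);
        }
    }
    out
}
```

For a doc with two paragraphs holding text nodes a, b and c, the visit order is doc, first paragraph, a, b, second paragraph, c.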
Selectors
Convenience wrappers over the closure API for the common cases — no closure to write, and a friendlier surface for future CLI/FFI layers.
use Document;
let doc = from_json_str?;
doc.by_type; // -> Vec<&Node>
doc.first_by_type; // -> Option<&Node>
doc.by_mark; // -> Vec<&Node> (nodes carrying the mark)
doc.by_attr; // -> Vec<&Node> (attr equals value)
// mutable
# let mut doc = doc;
for n in doc.root_mut.by_type_mut
# Ok::
| Method | Returns |
|---|---|
| by_type(t) / first_by_type(t) / by_type_mut(t) | Vec<&Node> / Option<&Node> / Vec<&mut Node> |
| by_mark(mark_type) | Vec<&Node> |
| by_attr(key, value) | Vec<&Node> |
Node paths
Address nodes by index path — a slice of child indices, root = &[]. In a
doc → paragraph → text tree the text node is at &[0, 0]. There are no parent
pointers; parent/sibling navigation is just path slicing.
use Document;
let mut doc = from_json_str?;
doc.node_at; // -> Option<&Node> (the "b" text node)
doc.node_at_mut.unwrap.set_text;
let p = doc.path_to.unwrap; // -> vec![0, 1]
let parent = doc.node_at.unwrap; // its paragraph
doc.paths_to; // every text location
# Ok::
| Method | Returns |
|---|---|
| node_at(path) / node_at_mut(path) | Option<&Node> / Option<&mut Node> |
| path_to(pred) | Option<Vec<usize>> (first match, pre-order) |
| paths_to(pred) | Vec<Vec<usize>> (all matches) |
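Index-path addressing and "parent navigation is just path slicing" can be sketched with std only. The SketchNode type and the node_at and parent_path functions below are hypothetical re-implementations for illustration, not the crate's code.

```rust
/// Hypothetical minimal tree; NOT the crate's `Node`.
struct SketchNode {
    name: &'static str,
    children: Vec<SketchNode>,
}

/// Follows child indices from the root; an empty path is the root itself.
/// Returns `None` as soon as an index is out of bounds.
fn node_at<'a>(root: &'a SketchNode, path: &[usize]) -> Option<&'a SketchNode> {
    let mut current = root;
    for &index in path {
        current = current.children.get(index)?;
    }
    Some(current)
}

/// Parent path = the path minus its last index; the root has no parent.
fn parent_path(path: &[usize]) -> Option<&[usize]> {
    path.split_last().map(|(_, rest)| rest)
}
```

In a doc → paragraph → (a, b) tree, the path [0, 1] resolves to b, and its parent path [0] resolves to the paragraph. No parent pointers are needed.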
Mutating
Mutation is in place on a &mut Node / &mut Document — no copies, no
rebuild. Container fields auto-collapse to None when they become empty (e.g.
removing the last mark sets marks back to None), keeping output clean.
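The auto-collapse rule can be sketched on a bare Option<Vec<String>>. This is an illustration under the assumption that marks are identified by a type string; the remove_mark function here is a hypothetical free function, not the crate's method.

```rust
/// Removes every mark named `mark_type` and returns how many were removed.
/// When the list ends up empty it collapses back to `None`.
fn remove_mark(marks: &mut Option<Vec<String>>, mark_type: &str) -> usize {
    let removed = match marks.as_mut() {
        Some(list) => {
            let before = list.len();
            list.retain(|m| m.as_str() != mark_type);
            before - list.len()
        }
        None => return 0,
    };
    // Collapse an emptied container so serialization omits the key.
    if marks.as_ref().map_or(false, |list| list.is_empty()) {
        *marks = None;
    }
    removed
}
```

Removing the last remaining mark leaves None rather than Some(vec![]), so the serialized node carries no empty marks array.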
Marks
use ;
let mut t = text;
t.add_mark; // -> true (added)
t.add_mark; // -> false (already present; deduped)
t.has_mark; // -> true
t.get_mark; // -> Option<&Mark>
t.toggle_mark; // add if absent, remove if present
t.set_mark_attr; // set attr on an existing mark
t.remove_mark; // -> usize (count removed)
t.clear_marks; // drop all marks
Attributes
use Node;
use json;
let mut h = element;
h.set_attr; // -> previous value, if any
h.attr; // -> Option<&Value> => Some(&json!(2))
h.attrs_mut.insert; // raw map access
h.remove_attr; // -> Option<Value>
Children
use Node;
let mut p = element;
p.push_child;
p.push_child;
p.insert_child; // index clamped to len
p.child_count; // -> 3
p.child; // -> Option<&Node>
p.child_mut; // -> Option<&mut Node>
p.children; // -> &[Node]
p.children_mut; // -> &mut Vec<Node> (creates if absent)
p.replace_child; // -> Option<Node> (old)
p.remove_child; // -> Option<Node> (removed)
p.retain_children; // filter in place
p.clear_children; // remove all
Text
# use Node;
let mut t = text;
t.get_text; // -> Some("old")
t.set_text;
Bulk transforms
replace_all walks the whole subtree, applying a mutation to every node that
matches a predicate, and returns how many were changed.
use ;
let mut doc = from_json_str?;
let changed = doc.replace_all;
assert_eq!;
# Ok::
Normalizing
normalize canonicalizes a tree in place: it merges adjacent text nodes that
share the same marks/attrs (and any extra fields) and drops empty text nodes.
This yields smaller diffs, cleaner roundtrips, and one stable
representation for trees that are semantically identical but split differently.
It is idempotent.
use Document;
let mut doc = from_json_str?;
doc.normalize;
assert_eq!; // collapsed to one text node
assert_eq!;
# Ok::
Tune it with NormalizeOptions (a plain data struct, so it works over WASM/FFI
too): toggle merge_adjacent_text / remove_empty_text, or opt into
remove_empty_nodes to also prune nodes whose content is an empty list (off
by default — an empty paragraph is valid). Absent (None) content is always
left untouched, preserving the empty-vs-missing distinction.
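The merge-and-drop behavior can be sketched on a flat list of text runs. Run and normalize_runs are hypothetical names for this illustration; the sketch compares marks only and drops empty runs before merging, which is an assumption about ordering rather than a statement of how the crate does it.

```rust
/// Hypothetical flat text run; NOT the crate's `Node`.
#[derive(Debug, Clone, PartialEq)]
struct Run {
    text: String,
    marks: Vec<String>,
}

/// Drops empty runs and merges neighbours whose marks are equal.
fn normalize_runs(runs: Vec<Run>) -> Vec<Run> {
    let mut out: Vec<Run> = Vec::new();
    for run in runs {
        if run.text.is_empty() {
            continue; // an empty run carries no content
        }
        let mergeable = out.last().map_or(false, |prev| prev.marks == run.marks);
        if mergeable {
            if let Some(prev) = out.last_mut() {
                prev.text.push_str(&run.text);
            }
        } else {
            out.push(run);
        }
    }
    out
}
```

Running the function on its own output returns the same list, which is the idempotence property described above.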
Range editing
Editor-style commands over a single block's inline content, addressed by a
Position (child index + Unicode-scalar offset into that child's text); a
Range spans two positions in the same block. Text nodes are split at the
boundaries as needed and adjacent equal-mark text is merged again afterwards, so
edits leave the content canonical.
use ;
let mut p = element.with_child;
// Bold "world".
p.add_mark_range?;
assert!; // "Hello " | "world"(bold)
// Insert, delete, replace by position/range.
p.insert_text?;
p.delete_range?;
# Ok::
The methods — insert_text, delete_range, replace_range, add_mark_range,
remove_mark_range, toggle_mark_range — treat self as the block parent. To
edit a nested block, resolve it first: doc.node_at_mut(&path)?.delete_range(r).
Offsets count Unicode scalar values (so splits never land mid-code-point);
out-of-range positions return a RangeError rather than clamping.
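Counting offsets in Unicode scalar values rather than bytes can be sketched with std alone. OutOfRange is a hypothetical stand-in for the crate's RangeError, and split_at_scalar is an illustrative helper, not part of the crate.

```rust
/// Hypothetical error type standing in for the crate's `RangeError`.
#[derive(Debug, PartialEq)]
struct OutOfRange;

/// Splits `text` at a Unicode-scalar offset (not a byte offset).
/// Offsets past the end are an error; they are never clamped.
fn split_at_scalar(text: &str, offset: usize) -> Result<(&str, &str), OutOfRange> {
    let byte_index = text
        .char_indices()
        .map(|(i, _)| i)
        // The end of the string is also a valid split point.
        .chain(std::iter::once(text.len()))
        .nth(offset)
        .ok_or(OutOfRange)?;
    Ok(text.split_at(byte_index))
}
```

Because the byte index always comes from char_indices (or the string length), the split can never land in the middle of a multi-byte character.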
Text extraction
use Document;
let doc = from_json_str?;
doc.text_content; // "Hello worldsecond line" (ProseMirror semantics)
doc.text_content_with_separator; // "Hello world\n\nsecond line"
doc.char_count; // Unicode scalar count of all text
doc.word_count; // 3 (Unicode word segmentation, block-aware)
# Ok::
text_content concatenates all descendant text with no separators (matches
ProseMirror's node.textContent). text_content_with_separator(sep) inserts
sep between adjacent block-level siblings (a node with content that isn't a
text node), so words don't merge across blocks. word_count uses
unicode-segmentation, so CJK
and complex scripts count correctly.
| Method | Returns |
|---|---|
| text_content() | String |
| text_content_with_separator(sep) | String |
| char_count() | usize |
| word_count() | usize |
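The difference between plain concatenation and separator-aware extraction can be sketched on a two-variant tree. Sketch and text_with_sep are hypothetical; the rule "insert sep only between two adjacent block children" is one reading of the description above, and the sketch uses recursion for brevity.

```rust
/// Hypothetical tree: a text leaf or a block with children.
enum Sketch {
    Text(String),
    Block(Vec<Sketch>),
}

/// Concatenates all text, inserting `sep` between adjacent block siblings.
fn text_with_sep(node: &Sketch, sep: &str) -> String {
    match node {
        Sketch::Text(t) => t.clone(),
        Sketch::Block(children) => {
            let mut out = String::new();
            let mut prev_was_block = false;
            for child in children {
                let is_block = matches!(child, Sketch::Block(_));
                if is_block && prev_was_block {
                    out.push_str(sep);
                }
                out.push_str(&text_with_sep(child, sep));
                prev_was_block = is_block;
            }
            out
        }
    }
}
```

Passing an empty separator reproduces the plain concatenation, so words from neighbouring blocks run together; a separator such as a blank line keeps them apart.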
Schema validation
The crate is schema-agnostic by default — validation is opt-in. A Schema
is an allow-list of node types, marks, attributes, and child types. validate
collects every problem in one pass (empty result = valid); each Violation
carries the offending node's index path (see Node paths).
use ;
let schema = new
.node
.node
.node
.node // marks live on text nodes
.mark
.mark;
let doc = from_json_str?;
assert!;
for v in doc.validate
# Ok::
Unset rules mean "anything goes": NodeSpec::new() allows any attrs/marks/children;
content/marks/attrs restrict only once set. required_attrs is always
enforced.
A schema can also be loaded from JSON:
use Schema;
let schema = from_json_str?;
# let _ = schema;
# Ok::
Violation::kind is a ViolationKind: MissingNodeType, UnknownNodeType,
DisallowedChild, InvalidContent, UnknownMark, DisallowedMark,
MissingAttr, UnknownAttr.
| Method | Returns |
|---|---|
| validate(&schema) | Vec<Violation> (empty = valid) |
| is_valid(&schema) | bool |
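Collecting every problem in one pass, each tagged with an index path, can be sketched as follows. This covers only the unknown-node-type check; SketchNode and unknown_types are hypothetical, and the real Schema and Violation types carry more information.

```rust
use std::collections::HashSet;

/// Hypothetical minimal tree; NOT the crate's `Node`.
struct SketchNode {
    node_type: &'static str,
    children: Vec<SketchNode>,
}

/// Returns the index path of every node whose type is not in `allowed`,
/// in pre-order. An empty result means the tree passed this check.
fn unknown_types(root: &SketchNode, allowed: &HashSet<&str>) -> Vec<Vec<usize>> {
    let mut violations = Vec::new();
    let mut stack: Vec<(&SketchNode, Vec<usize>)> = vec![(root, Vec::new())];
    while let Some((node, path)) = stack.pop() {
        if !allowed.contains(node.node_type) {
            violations.push(path.clone());
        }
        // Reverse order so the first child is processed first.
        for (i, child) in node.children.iter().enumerate().rev() {
            let mut child_path = path.clone();
            child_path.push(i);
            stack.push((child, child_path));
        }
    }
    violations
}
```

The walk does not stop at the first failure, so a caller can report all offending paths at once.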
Content expressions
content as an array is a child-type set (any count/order). For
cardinality and ordering, use a ProseMirror content expression —
NodeSpec::content_match("…") in Rust, or a content string in JSON.
Nodes can declare groups that expressions reference by name:
use ;
let schema = new
.node
.node
.node;
// doc must be an optional heading followed by one-or-more block nodes
let bad = from_json_str?; // no block children
assert!; // -> ViolationKind::InvalidContent
# Ok::
Supported syntax: names (a node type or group), sequence (whitespace), |
(choice), grouping ( … ), and quantifiers * + ? {n} {n,} {n,m}
(numeric bounds capped at 1000). In JSON, "content": "paragraph+" is an
expression; "content": ["paragraph"] stays the array form. Invalid
expressions are reported when the schema is built/loaded.
Diffing
Compute a path-addressed list of Change values between two trees, and apply
it to reproduce the target. The change variants mirror the mutation API, so a
diff is a replayable patch — useful for change tracking, undo/redo, edit
persistence, and exact test assertions.
use ;
let a = from_json_str.unwrap;
let b = from_json_str.unwrap;
let changes = a.diff; // Vec<Change>: e.g. [SetText { path: [0,0], text: Some("bye") }]
let mut c = a.clone;
c.apply.unwrap; // reproduce `b`
assert_eq!;
The round-trip property apply(&mut a.clone(), &a.diff(b)) == b always holds.
Undo — invert produces the reverse change list, so a forward diff and its
inverse form an undo/redo pair:
let forward = a.diff;
let undo = a.invert.unwrap; // inverse relative to `a` (the pre-image)
let mut c = b.clone;
c.apply.unwrap;
assert_eq!; // restored
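Why invert needs the pre-image can be seen in a deliberately tiny model: a document reduced to a flat list of strings with a single SetText change. Everything below is hypothetical and much simpler than the crate's tree diff (equal lengths are assumed and there is no Insert, Remove, or Move).

```rust
#[derive(Debug, Clone, PartialEq)]
enum Change {
    SetText { index: usize, text: String },
}

/// Emits one SetText per position whose text differs (equal lengths assumed).
fn diff(a: &[String], b: &[String]) -> Vec<Change> {
    let mut changes = Vec::new();
    for (index, (old, new)) in a.iter().zip(b.iter()).enumerate() {
        if old != new {
            changes.push(Change::SetText { index, text: new.clone() });
        }
    }
    changes
}

/// Replays a change list onto `doc` in place.
fn apply(doc: &mut [String], changes: &[Change]) {
    for Change::SetText { index, text } in changes {
        doc[*index] = text.clone();
    }
}

/// Builds the reverse list by reading the old values from the pre-image.
fn invert(pre_image: &[String], changes: &[Change]) -> Vec<Change> {
    let mut undo = Vec::new();
    for Change::SetText { index, .. } in changes.iter().rev() {
        undo.push(Change::SetText { index: *index, text: pre_image[*index].clone() });
    }
    undo
}
```

A forward change only records the new text, so the old text has to come from somewhere: the document as it was before the change was applied.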
Change variants (path = the target node, except Insert/Remove whose
path is the parent + index):
| Variant | Meaning |
|---|---|
| SetAttr / RemoveAttr | attribute changed / removed |
| SetText | text payload set (None clears) |
| SetMarks | whole mark list replaced (None clears) |
| SetExtra / RemoveExtra | unknown top-level field changed / removed (lossless) |
| Insert / Remove | child inserted / removed at index |
| Replace | node replaced wholesale (its type changed) |
| Move | child relocated within its parent (from → to), no clone |
Change derives serde, so change lists round-trip through JSON.
Move detection — a child relocated within a list is emitted as a single
Move (no subtree clone) rather than a Remove + Insert. After LCS
alignment, leftover deletions and insertions that are equal by value are
paired as moves, and only the genuinely-relocated nodes are moved (matched
anchors stay put), so a drag past several siblings is one Move, and a shuffle
of distinct children is a list of Moves with no clones. invert needs no
special handling — it re-diffs the reverse direction.
v1 limitations: matching is LCS-by-equality; modifications are paired positionally within the gaps between matched anchors (still correct, just not always minimal).
Transactions
A Transform mutates the tree in place and records a replayable, invertible
Change log in the same pass — so instead of editing and then diffing to
recover a patch, you get the patch for free. Builder methods mirror the Change
variants and chain with ?.
use ;
let mut doc = element.with_children;
let original = doc.clone;
let changes = ;
// Replay the log onto a clone of the original to reproduce `doc`…
let mut replay = original.clone;
apply.unwrap;
assert_eq!;
// …and invert it for undo.
let undo = original.invert.unwrap;
# let _ = undo;
# Ok::
Methods: set_attr / remove_attr, set_text, set_marks, set_extra /
remove_extra, insert / remove / replace, move_child. Each returns
Result<&mut Self, ApplyError>; on an unresolvable path the transaction stops
with the edits recorded so far. changes() peeks at the log; finish() returns it.
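The "mutate and record in the same pass" idea, with builder calls that chain through ?, can be sketched on a flat list of strings. Recorder, BadIndex, and record_two_edits are hypothetical names for this illustration and do not mirror the crate's Transform or ApplyError in detail.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Change {
    SetText { index: usize, text: String },
}

/// Hypothetical error for an index that does not resolve.
#[derive(Debug, PartialEq)]
struct BadIndex(usize);

/// Hypothetical recorder: edits `doc` in place and logs each change.
struct Recorder<'a> {
    doc: &'a mut Vec<String>,
    log: Vec<Change>,
}

impl<'a> Recorder<'a> {
    fn new(doc: &'a mut Vec<String>) -> Self {
        Recorder { doc, log: Vec::new() }
    }

    /// Applies one edit and records it; returning `&mut Self` lets calls chain with `?`.
    fn set_text(&mut self, index: usize, text: &str) -> Result<&mut Self, BadIndex> {
        let slot = self.doc.get_mut(index).ok_or(BadIndex(index))?;
        *slot = text.to_string();
        self.log.push(Change::SetText { index, text: text.to_string() });
        Ok(self)
    }

    /// Consumes the recorder and returns the change log.
    fn finish(self) -> Vec<Change> {
        self.log
    }
}

/// Two chained edits; stops at the first index that does not resolve.
fn record_two_edits(doc: &mut Vec<String>) -> Result<Vec<Change>, BadIndex> {
    let mut tx = Recorder::new(doc);
    tx.set_text(0, "Hello")?.set_text(1, "World")?;
    Ok(tx.finish())
}
```

When the second edit fails, the first one has already been applied to the document. In this sketch the partial log is dropped on the error path; keeping the recorder alive and reading its log would preserve it.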
Rendering to HTML
to_html renders a document to an HTML string with Tiptap-sensible, schema-agnostic
defaults. Output is compact and HTML-escaped (text and attribute values).
use Document;
let doc = from_json_str?;
assert_eq!;
# Ok::
Defaults: paragraph→<p>, heading→<h1>–<h6> (clamped), blockquote,
bulletList/orderedList/listItem→<ul>/<ol>/<li>, codeBlock→<pre><code>
(+language class), horizontalRule→<hr>, hardBreak→<br>, image→<img>;
marks bold→<strong>, italic→<em>, strike→<s>, code, underline→<u>,
subscript/superscript, link→<a>. A text node's marks nest in array order
(marks[0] outermost). paragraph/heading textAlign → style="text-align:…".
Customize with to_html_with(&HtmlOptions) — a plain data struct (no closures,
so it works over WASM/FFI): override/extend node & mark tag maps, choose the
unknown-node/mark policy (Transparent default, DataTypeDiv/DataMarkSpan, or
Skip), pick SelfClosingStyle (Html5/Xhtml), and opt into spread_attrs
(emit a node's remaining attributes — off by default; always escaped). In JS:
doc.toHTML() / doc.toHTMLWith({ selfClosing: "xhtml" }).
Security: escaping is not sanitization. Text and attribute values are HTML-escaped, which prevents markup break-out but not dangerous URLs or styles: a link href is emitted verbatim (so javascript:… survives), and spread_attrs (off by default) emits attribute names verbatim (e.g. onclick). textAlign is whitelisted to the standard keywords. For untrusted documents, sanitize the rendered HTML (or the source URLs/attrs) yourself.
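The gap between escaping and sanitizing can be shown with a short std-only sketch. escape_html and render_link are hypothetical helpers; the exact set of characters the crate escapes is an assumption here.

```rust
/// Escapes the five HTML-significant characters.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            other => out.push(other),
        }
    }
    out
}

/// Hypothetical link renderer: escapes the href but does not vet its scheme.
fn render_link(href: &str, label: &str) -> String {
    format!("<a href=\"{}\">{}</a>", escape_html(href), escape_html(label))
}
```

Escaping stops a value from breaking out of its attribute or element, but a javascript: URL contains nothing that needs escaping, so it passes through unchanged. Rejecting such schemes is a separate sanitization step.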
Building nodes
Constructors plus consuming with_* builder methods for fluent assembly.
use ;
// Leaf constructors
let plain = text;
let marked = text_with_marks;
// Element builder
let para = element
.with_attr
.with_mark
.with_text // adds a text child
.with_child; // adds an arbitrary child
// Mark builder
let link = new.attr;
// doc(..) helper for the root
let document = doc;
| Constructor / builder | Purpose |
|---|---|
| Node::element(type) | new element node of type |
| Node::text(s) | new text node |
| Node::text_with_marks(s, marks) | text node with marks |
| doc(children) | a doc root node |
| Mark::new(type) / .attr(k, v) | construct a mark |
| .with_attr(k, v) | set an attr (chaining) |
| .with_child(node) / .with_children(iter) | append child/children |
| .with_text(s) | append a text child |
| .with_mark(mark) | add a mark |
JavaScript / WASM
The crate ships WASM bindings on npm for browser/bundler apps:
import from "tiptap-rusty-parser";
const doc = ;
doc.; // "Title"
const = doc.; // [0]
doc.;
doc.;
doc.; // false
const json = doc.;
const htmlString = doc.; // render to HTML (or toHTMLWith(options))
// Diff two docs and apply the change list
const changes = doc.; // Change[] (tagged objects)
const undo = doc.; // reverse change list (undo)
doc.; // reproduce `other`
doc.; // back to the original
An opaque TiptapDoc handle keeps the tree in WASM; queries return cloned
nodes or number[] index paths, and mutation is path-addressed. Full method
list in bindings/wasm/README.md. Built for the
bundler target.
Error handling
Parsing/serialization returns Result<T, ParseError>:
ParseError implements std::error::Error (via thiserror) and From for both
underlying errors, so ? works directly. A Result<T> alias is also exported.
Performance
Borrow-first API, stack-based descendant iteration (no recursion blowup on deep
docs), serde_json for (de)serialization, and a release profile with
lto = true / codegen-units = 1.
Indicative criterion baselines on a synthetic doc of 500 paragraphs × 20 bold text spans (~10k text nodes, ~10.5k nodes total):
| Operation | Time |
|---|---|
| parse (from JSON string) | ~14 ms |
| serialize (to JSON string) | ~1.0 ms |
| walk (count all nodes) | ~29 µs |
| find_all (all text nodes) | ~108 µs |
| replace_all (add a mark to every text node) | ~5.0 ms |
| normalize (merge-heavy: 20 same-mark spans → 1 per paragraph) | ~1.4 ms |
| normalize (already canonical, nothing to merge) | ~65 µs |
| diff (500 paragraphs fully reordered → Move ops) | ~17 ms |
| apply (the reorder change list) | ~2.7 ms |
| add_mark_range (mark + re-merge a 5000-span block) | ~1.3 ms |
| delete_range (drop 3000 spans from a block) | ~0.3 ms |
| transform (record a 3-op transaction) | ~13 µs |
Run cargo bench to reproduce on your hardware.
Examples
Runnable end-to-end examples live in examples/:
Development
License
MIT