tiptap-rusty-parser
Fast, schema-agnostic parser & manipulator for Tiptap /
ProseMirror JSONContent documents, in Rust.
- Schema-agnostic: any node/mark type is accepted; unknown JSON fields are preserved for lossless roundtrip.
- Query with predicate closures: find, find_all, walk, descendants.
- Select by type/mark/attr: by_type, by_mark, by_attr.
- Address by index path: node_at, path_to, paths_to.
- Mutate in place: marks, attrs, children, text, and bulk replace_all.
- Extract text: text_content, char_count, word_count (Unicode-aware).
- Validate (opt-in): check against an allow-list Schema (Rust or JSON).
- Build ergonomically: Node::element, Node::text, doc(..), with_* chaining.
- JS/WASM: npm i tiptap-rusty-parser for browser/bundler apps (see JavaScript / WASM).
- Fast: borrow over copy, stack-based traversal (no recursion blowup), lto release profile, criterion benches.
Table of contents
- Install
- Quick start
- Data model
- Parsing & serializing
- Querying
- Selectors
- Node paths
- Mutating
- Text extraction
- Schema validation
- Diffing
- Building nodes
- JavaScript / WASM
- Error handling
- Performance
- Development
- License
Install
Add to Cargo.toml:
[dependencies]
tiptap-rusty-parser = "0.1"
Requires a recent stable Rust (edition 2021).
Quick start
use ;
Data model
A Tiptap document is a tree of nodes. The crate mirrors Tiptap's JSONContent
shape directly.
Map/Value are re-exported from serde_json. The crate is built with the
preserve_order feature so attribute key order survives a roundtrip.
Why everything is Option — to faithfully distinguish missing from
empty. content: None serializes to no content key; content: Some(vec![])
serializes to "content": []. Unknown node types (custom Tiptap extensions) and
unknown fields land in extra and roundtrip untouched.
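The missing-versus-empty distinction can be illustrated with a small std-only sketch. `MiniNode` and `to_json` below are hypothetical stand-ins written for this example; they are not the crate's `Node` type or its serde-based serializer.

```rust
/// Hypothetical stand-in for a node; not the crate's `Node` type.
struct MiniNode {
    node_type: String,
    /// `None` = no "content" key at all; `Some(vec![])` = an explicit empty array.
    content: Option<Vec<MiniNode>>,
}

/// Hand-rolled JSON rendering, only to show which keys get emitted.
/// Assumes `node_type` contains nothing that needs escaping.
fn to_json(node: &MiniNode) -> String {
    let mut out = format!("{{\"type\":\"{}\"", node.node_type);
    if let Some(children) = &node.content {
        let items: Vec<String> = children.iter().map(to_json).collect();
        out.push_str(&format!(",\"content\":[{}]", items.join(",")));
    }
    out.push('}');
    out
}
```

With `content: None` the sketch emits only the type key, while `content: Some(vec![])` emits an explicit empty content array, so the two states survive a roundtrip as different documents.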
Document is a thin owning wrapper around the root Node and derefs to it,
so every Node method below is also callable directly on a Document.
Parsing & serializing
use Document;
use json;
// From a JSON string
let doc = from_json_str?;
// From a serde_json::Value
let doc = from_value?;
// From any reader (file, socket, …)
let file = open?;
let doc = from_reader?;
// Serialize
let compact = doc.to_json_str()?; // String, compact
let pretty = doc.to_string_pretty()?; // String, indented
let value = doc.to_value()?; // serde_json::Value
# Ok::
Roundtrip is lossless — unknown node types, extra fields, and key order are all preserved.
Access the root node explicitly when needed: doc.root(), doc.root_mut(),
doc.into_root(). Wrap an existing node with Document::new(node) or
node.into().
Querying
All traversal is depth-first pre-order (a node is visited before its children). Selection is done with predicate closures — no selector DSL to learn.
use ;
let doc = from_json_str?;
// First match (incl. the node itself)
let first_para: = doc.find;
// All matches
let texts: = doc.find_all;
assert_eq!;
// Predicate can inspect anything: marks, attrs, text…
let bold = doc.find.unwrap;
assert_eq!;
// Lazy iterator over self + all descendants
let count = doc.descendants().count();
// Visit every node
let mut n = 0;
doc.walk;
# Ok::
Mutable variants return &mut access:
# use ;
# let mut doc = from_json_str?;
// Single mutable match
if let Some = doc.find_mut
// All mutable matches (predicate passed by &mut)
let mut is_text = ;
for node in doc.root_mut.find_all_mut
// In-place visit
doc.walk_mut;
# Ok::
| Method | Signature | Returns |
|---|---|---|
| find | find(\|&Node\| -> bool) | Option<&Node> |
| find_mut | find_mut(\|&Node\| -> bool) | Option<&mut Node> |
| find_all | find_all(\|&Node\| -> bool) | Vec<&Node> |
| find_all_mut | find_all_mut(&mut \|&Node\| -> bool) | Vec<&mut Node> |
| walk | walk(&mut \|&Node\|) | () |
| walk_mut | walk_mut(&mut \|&mut Node\|) | () |
| descendants | descendants() | impl Iterator<Item = &Node> |
Selectors
Convenience wrappers over the closure API for the common cases — no closure to write, and a friendlier surface for future CLI/FFI layers.
use Document;
let doc = from_json_str?;
doc.by_type; // -> Vec<&Node>
doc.first_by_type; // -> Option<&Node>
doc.by_mark; // -> Vec<&Node> (nodes carrying the mark)
doc.by_attr; // -> Vec<&Node> (attr equals value)
// mutable
# let mut doc = doc;
for n in doc.root_mut.by_type_mut
# Ok::
| Method | Returns |
|---|---|
| by_type(t) / first_by_type(t) / by_type_mut(t) | Vec<&Node> / Option<&Node> / Vec<&mut Node> |
| by_mark(mark_type) | Vec<&Node> |
| by_attr(key, value) | Vec<&Node> |
Node paths
Address nodes by index path — a slice of child indices, root = &[]. In a
doc → paragraph → text tree the text node is at &[0, 0]. There are no parent
pointers; parent/sibling navigation is just path slicing.
use Document;
let mut doc = from_json_str?;
doc.node_at; // -> Option<&Node> (the "b" text node)
doc.node_at_mut.unwrap.set_text;
let p = doc.path_to.unwrap; // -> vec![0, 1]
let parent = doc.node_at.unwrap; // its paragraph
doc.paths_to; // every text location
# Ok::
| Method | Returns |
|---|---|
| node_at(path) / node_at_mut(path) | Option<&Node> / Option<&mut Node> |
| path_to(pred) | Option<Vec<usize>> (first match, pre-order) |
| paths_to(pred) | Vec<Vec<usize>> (all matches) |
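As a rough illustration of index-path addressing and of "parent navigation is just path slicing", here is a std-only sketch. `MiniNode`, `node_at`, `parent_path`, and `sample_doc` are hypothetical names for this example and only model the idea; the crate's own methods may be implemented differently.

```rust
/// Hypothetical minimal tree; the crate's `Node` carries more fields.
struct MiniNode {
    name: &'static str,
    children: Vec<MiniNode>,
}

/// Follow a slice of child indices from `root`; the empty path is the root itself.
fn node_at<'a>(root: &'a MiniNode, path: &[usize]) -> Option<&'a MiniNode> {
    let mut current = root;
    for &index in path {
        current = current.children.get(index)?;
    }
    Some(current)
}

/// Parent path = the path minus its last index; the root has no parent.
fn parent_path(path: &[usize]) -> Option<&[usize]> {
    path.split_last().map(|(_, rest)| rest)
}

fn leaf(name: &'static str) -> MiniNode {
    MiniNode { name, children: Vec::new() }
}

/// Builds doc -> paragraph -> [text "a", text "b"].
fn sample_doc() -> MiniNode {
    MiniNode {
        name: "doc",
        children: vec![MiniNode { name: "paragraph", children: vec![leaf("a"), leaf("b")] }],
    }
}
```

In this model the second text node sits at `&[0, 1]`, and slicing off the last index gives `&[0]`, the path of its paragraph. An out-of-range index simply yields `None`.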
Mutating
Mutation is in place on a &mut Node / &mut Document — no copies, no
rebuild. Container fields auto-collapse to None when they become empty (e.g.
removing the last mark sets marks back to None), keeping output clean.
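The auto-collapse behavior described above can be sketched with std only. `MiniText` is a hypothetical type that stores marks as plain strings; it models the described rule (an emptied container goes back to `None`) and is not the crate's implementation.

```rust
/// Hypothetical text node that stores marks as plain strings.
struct MiniText {
    marks: Option<Vec<String>>,
}

impl MiniText {
    /// Add a mark unless it is already present; returns whether it was added.
    fn add_mark(&mut self, mark: &str) -> bool {
        let marks = self.marks.get_or_insert_with(Vec::new);
        if marks.iter().any(|m| m == mark) {
            return false;
        }
        marks.push(mark.to_string());
        true
    }

    /// Remove every occurrence of `mark`; returns how many were removed.
    /// When the list ends up empty, the field collapses back to `None`.
    fn remove_mark(&mut self, mark: &str) -> usize {
        let Some(marks) = self.marks.as_mut() else { return 0 };
        let before = marks.len();
        marks.retain(|m| m != mark);
        let removed = before - marks.len();
        if marks.is_empty() {
            self.marks = None;
        }
        removed
    }
}
```

Collapsing to `None` rather than leaving `Some(vec![])` means a serializer that skips `None` fields would not emit an empty marks array after the last mark is removed.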
Marks
use ;
let mut t = text;
t.add_mark; // -> true (added)
t.add_mark; // -> false (already present; deduped)
t.has_mark; // -> true
t.get_mark; // -> Option<&Mark>
t.toggle_mark; // add if absent, remove if present
t.set_mark_attr; // set attr on an existing mark
t.remove_mark; // -> usize (count removed)
t.clear_marks; // drop all marks
Attributes
use Node;
use json;
let mut h = element;
h.set_attr; // -> previous value, if any
h.attr; // -> Option<&Value> => Some(&json!(2))
h.attrs_mut.insert; // raw map access
h.remove_attr; // -> Option<Value>
Children
use Node;
let mut p = element;
p.push_child;
p.push_child;
p.insert_child; // index clamped to len
p.child_count; // -> 3
p.child; // -> Option<&Node>
p.child_mut; // -> Option<&mut Node>
p.children; // -> &[Node]
p.children_mut; // -> &mut Vec<Node> (creates if absent)
p.replace_child; // -> Option<Node> (old)
p.remove_child; // -> Option<Node> (removed)
p.retain_children; // filter in place
p.clear_children; // remove all
Text
# use Node;
let mut t = text;
t.get_text; // -> Some("old")
t.set_text;
Bulk transforms
replace_all walks the whole subtree, applying a mutation to every node that
matches a predicate, and returns how many were changed.
use ;
let mut doc = from_json_str?;
let changed = doc.replace_all;
assert_eq!;
# Ok::
Text extraction
use Document;
let doc = from_json_str?;
doc.text_content(); // "Hello worldsecond line" (ProseMirror semantics)
doc.text_content_with_separator("\n\n"); // "Hello world\n\nsecond line"
doc.char_count(); // Unicode scalar count of all text
doc.word_count(); // 3 (Unicode word segmentation, block-aware)
# Ok::
text_content concatenates all descendant text with no separators (matches
ProseMirror's node.textContent). text_content_with_separator(sep) inserts
sep between adjacent block-level siblings (a node with content that isn't a
text node), so words don't merge across blocks. word_count uses
unicode-segmentation, so CJK
and complex scripts count correctly.
| Method | Returns |
|---|---|
| text_content() | String |
| text_content_with_separator(sep) | String |
| char_count() | usize |
| word_count() | usize |
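To make the separator rule concrete, the following std-only sketch models it on a hypothetical `MiniNode`. It assumes "block-level" means a non-text node that has children, and it uses plain recursion for brevity; the crate's actual traversal and edge-case handling may differ.

```rust
/// Hypothetical minimal node: text leaves hold `text`, containers hold `content`.
struct MiniNode {
    text: Option<String>,
    content: Vec<MiniNode>,
}

fn text(s: &str) -> MiniNode {
    MiniNode { text: Some(s.to_string()), content: Vec::new() }
}

fn block(content: Vec<MiniNode>) -> MiniNode {
    MiniNode { text: None, content }
}

/// Assumption for this sketch: a block is a non-text node that has children.
fn is_block(node: &MiniNode) -> bool {
    node.text.is_none() && !node.content.is_empty()
}

fn collect_text(node: &MiniNode, sep: &str, out: &mut String) {
    if let Some(t) = &node.text {
        out.push_str(t);
    }
    let mut previous_was_block = false;
    for child in &node.content {
        let child_is_block = is_block(child);
        // Insert the separator only between two adjacent block-level siblings.
        if child_is_block && previous_was_block {
            out.push_str(sep);
        }
        collect_text(child, sep, out);
        previous_was_block = child_is_block;
    }
}

/// Concatenate all descendant text, separating adjacent blocks with `sep`.
fn text_with_separator(root: &MiniNode, sep: &str) -> String {
    let mut out = String::new();
    collect_text(root, sep, &mut out);
    out
}
```

Passing an empty separator reproduces the plain concatenation, which is why the two paragraphs in the example above merge into "Hello worldsecond line" without one.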
Schema validation
The crate is schema-agnostic by default — validation is opt-in. A Schema
is an allow-list of node types, marks, attributes, and child types. validate
collects every problem in one pass (empty result = valid); each Violation
carries the offending node's index path (see Node paths).
use ;
let schema = new
.node
.node
.node
.node // marks live on text nodes
.mark
.mark;
let doc = from_json_str?;
assert!;
for v in doc.validate
# Ok::
Unset rules mean "anything goes": NodeSpec::new() allows any attrs/marks/children;
content/marks/attrs restrict only once set. required_attrs is always
enforced.
A schema can also be loaded from JSON:
use Schema;
let schema = from_json_str?;
# let _ = schema;
# Ok::
Violation::kind is a ViolationKind: MissingNodeType, UnknownNodeType,
DisallowedChild, UnknownMark, DisallowedMark, MissingAttr, UnknownAttr.
| Method | Returns |
|---|---|
| validate(&schema) | Vec<Violation> (empty = valid) |
| is_valid(&schema) | bool |
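The allow-list idea, including "unset rules mean anything goes" and path-tagged results, can be sketched with std only. Everything here (`MiniNode`, `MiniSchema`, `Problem`, and this `validate` function) is hypothetical and far simpler than the crate's `Schema`, `NodeSpec`, and `Violation` types; marks and attributes are left out.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical minimal node; not the crate's `Node`.
struct MiniNode {
    node_type: &'static str,
    children: Vec<MiniNode>,
}

/// Maps each known node type to its allowed child types.
/// `None` means the rule is unset, so any child is accepted.
type MiniSchema = HashMap<&'static str, Option<HashSet<&'static str>>>;

#[derive(Debug, PartialEq)]
enum Problem {
    UnknownNodeType,
    DisallowedChild,
}

/// Collect every problem in one pass, each tagged with the node's index path.
/// Sibling visit order is not significant here, so children are pushed as-is.
fn validate(root: &MiniNode, schema: &MiniSchema) -> Vec<(Vec<usize>, Problem)> {
    let mut problems = Vec::new();
    let mut stack: Vec<(&MiniNode, Vec<usize>)> = vec![(root, Vec::new())];
    while let Some((node, path)) = stack.pop() {
        let rule = schema.get(node.node_type);
        if rule.is_none() {
            problems.push((path.clone(), Problem::UnknownNodeType));
        }
        for (index, child) in node.children.iter().enumerate() {
            let mut child_path = path.clone();
            child_path.push(index);
            // Only a rule that is both known and set can reject a child.
            if let Some(Some(allowed)) = rule {
                if !allowed.contains(child.node_type) {
                    problems.push((child_path.clone(), Problem::DisallowedChild));
                }
            }
            stack.push((child, child_path));
        }
    }
    problems
}
```

An empty result means the tree is valid under the schema. A node of an unregistered type placed under a restricted parent is reported twice in this sketch, once as a disallowed child and once as an unknown type, both at the same path.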
Diffing
Compute a path-addressed list of Change values between two trees, and apply
it to reproduce the target. The change variants mirror the mutation API, so a
diff is a replayable patch, useful for change tracking, undo/redo, edit
persistence, and exact test assertions.
use ;
let a = from_json_str.unwrap;
let b = from_json_str.unwrap;
let changes = a.diff; // Vec<Change>: e.g. [SetText { path: [0,0], text: Some("bye") }]
let mut c = a.clone;
c.apply.unwrap; // reproduce `b`
assert_eq!;
The round-trip property apply(&mut a.clone(), &a.diff(b)) == b always holds.
Change variants (path = the target node, except Insert/Remove whose
path is the parent + index):
| Variant | Meaning |
|---|---|
| SetAttr / RemoveAttr | attribute changed / removed |
| SetText | text payload set (None clears) |
| SetMarks | whole mark list replaced (None clears) |
| SetExtra / RemoveExtra | unknown top-level field changed / removed (lossless) |
| Insert / Remove | child inserted / removed at index |
| Replace | node replaced wholesale (its type changed) |
Change derives serde, so change lists round-trip through JSON.
v1 limitations: no move detection (a relocated child is emitted as
Remove + Insert); child matching is LCS-by-equality, so pathological
reorders degrade to remove+insert (still correct, just not minimal).
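To show what LCS-by-equality child matching looks like, and why a moved child comes out as a remove plus an insert, here is a std-only sketch over a flat list of strings. `ChildOp`, `diff_children`, and `apply_ops` are hypothetical names for this example; the crate's `Change` type and diff algorithm operate on whole trees and may differ in detail.

```rust
#[derive(Debug, PartialEq)]
enum ChildOp {
    Remove { index: usize },
    Insert { index: usize, value: String },
}

/// Diff two child lists. Indices refer to the list as it evolves while the
/// ops are applied in order.
fn diff_children(old: &[&str], new: &[&str]) -> Vec<ChildOp> {
    let (n, m) = (old.len(), new.len());
    // lcs[i][j] = length of the longest common subsequence of old[i..] and new[j..].
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if old[i] == new[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    let (mut i, mut j, mut pos) = (0, 0, 0);
    let mut ops = Vec::new();
    while i < n || j < m {
        if i < n && j < m && old[i] == new[j] {
            // Equal children are kept as they are.
            i += 1;
            j += 1;
            pos += 1;
        } else if j == m || (i < n && lcs[i + 1][j] >= lcs[i][j + 1]) {
            ops.push(ChildOp::Remove { index: pos });
            i += 1;
        } else {
            ops.push(ChildOp::Insert { index: pos, value: new[j].to_string() });
            j += 1;
            pos += 1;
        }
    }
    ops
}

/// Replay the ops in order against a mutable list.
fn apply_ops(list: &mut Vec<String>, ops: &[ChildOp]) {
    for op in ops {
        match op {
            ChildOp::Remove { index } => {
                list.remove(*index);
            }
            ChildOp::Insert { index, value } => list.insert(*index, value.clone()),
        }
    }
}
```

Diffing `["p", "q"]` against `["q", "p"]` in this sketch yields one remove and one insert rather than a move, which is the kind of non-minimal but still correct output the limitation describes.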
Building nodes
Constructors plus consuming with_* builder methods for fluent assembly.
use ;
// Leaf constructors
let plain = text;
let marked = text_with_marks;
// Element builder
let para = element
.with_attr
.with_mark
.with_text // adds a text child
.with_child; // adds an arbitrary child
// Mark builder
let link = new.attr;
// doc(..) helper for the root
let document = doc;
| Constructor / builder | Purpose |
|---|---|
| Node::element(type) | new element node of type |
| Node::text(s) | new text node |
| Node::text_with_marks(s, marks) | text node with marks |
| doc(children) | a doc root node |
| Mark::new(type) / .attr(k, v) | construct a mark |
| .with_attr(k, v) | set an attr (chaining) |
| .with_child(node) / .with_children(iter) | append child/children |
| .with_text(s) | append a text child |
| .with_mark(mark) | add a mark |
JavaScript / WASM
The crate ships WASM bindings on npm for browser/bundler apps:
import from "tiptap-rusty-parser";
const doc = ;
doc.; // "Title"
const = doc.; // [0]
doc.;
doc.;
doc.; // false
const json = doc.;
// Diff two docs and apply the change list
const changes = doc.; // Change[] (tagged objects)
doc.; // reproduce `other`
An opaque TiptapDoc handle keeps the tree in WASM; queries return cloned
nodes or number[] index paths, and mutation is path-addressed. Full method
list in bindings/wasm/README.md. Built for the
bundler target.
Error handling
Parsing/serialization returns Result<T, ParseError>:
ParseError implements std::error::Error (via thiserror) and From for both
underlying errors, so ? works directly. A Result<T> alias is also exported.
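The pattern of one error enum with `From` conversions, which is what lets `?` work directly, can be sketched with std only. This is a hand-written model, not the crate's definition: it assumes the two underlying errors are an I/O error and a JSON error, `JsonError` is a hypothetical stand-in for the real JSON error type, and the traits that thiserror would derive are written out by hand.

```rust
use std::{fmt, io};

/// Hypothetical stand-in for a JSON decoding error.
#[derive(Debug)]
struct JsonError(String);

#[derive(Debug)]
enum ParseError {
    Io(io::Error),
    Json(JsonError),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::Io(e) => write!(f, "I/O error: {e}"),
            ParseError::Json(e) => write!(f, "JSON error: {}", e.0),
        }
    }
}

impl std::error::Error for ParseError {}

impl From<io::Error> for ParseError {
    fn from(e: io::Error) -> Self {
        ParseError::Io(e)
    }
}

impl From<JsonError> for ParseError {
    fn from(e: JsonError) -> Self {
        ParseError::Json(e)
    }
}

/// Pretend reader with two known "files"; anything else is an I/O error.
fn read_source(name: &str) -> Result<&'static str, io::Error> {
    match name {
        "good.json" => Ok("{}"),
        "bad.json" => Ok("[1, 2"),
        _ => Err(io::Error::new(io::ErrorKind::NotFound, "no such document")),
    }
}

/// Pretend decoder: accepts only input that starts with '{' and returns its length.
fn decode(source: &str) -> Result<usize, JsonError> {
    if source.starts_with('{') {
        Ok(source.len())
    } else {
        Err(JsonError("expected an object".to_string()))
    }
}

fn load(name: &str) -> Result<usize, ParseError> {
    let source = read_source(name)?; // io::Error -> ParseError::Io
    let size = decode(source)?; // JsonError -> ParseError::Json
    Ok(size)
}
```

Each `?` in `load` converts the underlying error through the matching `From` impl, so the caller sees a single error type regardless of which step failed.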
Performance
Borrow-first API, stack-based descendant iteration (no recursion blowup on deep
docs), serde_json for (de)serialization, and a release profile with
lto = true / codegen-units = 1.
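For readers unfamiliar with stack-based traversal, the sketch below shows the general technique on a hypothetical `MiniNode`: a pre-order walk driven by an explicit heap-allocated stack instead of recursive calls. It illustrates the idea only and is not taken from the crate's source.

```rust
/// Hypothetical minimal tree; not the crate's `Node`.
struct MiniNode {
    name: &'static str,
    children: Vec<MiniNode>,
}

/// Pre-order visit using an explicit stack instead of recursion.
fn preorder_names(root: &MiniNode) -> Vec<&'static str> {
    let mut names = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        names.push(node.name);
        // Push children in reverse so the first child is popped next.
        stack.extend(node.children.iter().rev());
    }
    names
}

/// Build a chain nested `depth` levels deep, ending in a single leaf.
fn deep_chain(depth: usize) -> MiniNode {
    let mut node = MiniNode { name: "leaf", children: Vec::new() };
    for _ in 0..depth {
        node = MiniNode { name: "wrapper", children: vec![node] };
    }
    node
}
```

Because pending nodes live in a `Vec`, traversal depth is bounded by heap memory rather than by the call stack. Note that Rust's default `Drop` for a nested structure like this one is still recursive, so the sketch says nothing about how very deep trees are freed.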
Indicative criterion baselines on a synthetic doc of 500 paragraphs × 20 bold text spans (~10k text nodes, ~10.5k nodes total):
| Operation | Time |
|---|---|
| parse (from JSON string) | ~14 ms |
| serialize (to JSON string) | ~1.0 ms |
| walk (count all nodes) | ~29 µs |
| find_all (all text nodes) | ~108 µs |
| replace_all (add a mark to every text node) | ~5.0 ms |
Run cargo bench to reproduce on your hardware.
Development
License
MIT