mdstitch
Streaming markdown preprocessor — auto-completes incomplete syntax during token-by-token streaming.
What it does
Partial markdown renders badly mid-stream: **bold shows two literal
asterisks, [link confuses inline parsers, and an unterminated ``` fence
swallows every subsequent token as code. mdstitch runs before
pulldown-cmark on each accumulated chunk and closes unterminated
markers so every intermediate frame is well-formed CommonMark:
```text
**bold          → **bold**
`code           → `code`
[text](http     → [text](stitch:incomplete-link)
$$\frac{a}{b}   → $$\frac{a}{b}$$
```
When no changes are needed, stitch returns Cow::Borrowed — the zero-
allocation fast path — so streaming renderers can invoke it on every delta
without incurring a copy when the text is already closed.
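The borrowed fast path can be illustrated with a minimal sketch. close_bold and render_frames are hypothetical names, not part of mdstitch, and the closer only balances `**` by counting markers. What the sketch shows is the Cow contract: a frame that needs no change comes back as Cow::Borrowed, and only a frame with an unterminated marker allocates.

```rust
use std::borrow::Cow;

/// Hypothetical toy closer (not mdstitch's stitch): if the text contains an
/// odd number of `**` markers, append one closing `**`; otherwise borrow.
fn close_bold(text: &str) -> Cow<'_, str> {
    if text.matches("**").count() % 2 == 0 {
        // Already balanced: hand back the caller's slice, no allocation.
        Cow::Borrowed(text)
    } else {
        // Unterminated: allocate once and append the missing closer.
        let mut owned = String::with_capacity(text.len() + 2);
        owned.push_str(text);
        owned.push_str("**");
        Cow::Owned(owned)
    }
}

/// Accumulate streamed deltas, close each intermediate frame, and count how
/// many frames actually needed the allocating (Owned) path.
fn render_frames(deltas: &[&str]) -> (Vec<String>, usize) {
    let mut buffer = String::new();
    let mut frames = Vec::new();
    let mut allocations = 0;
    for delta in deltas {
        buffer.push_str(delta);
        let frame = close_bold(&buffer);
        if matches!(frame, Cow::Owned(_)) {
            allocations += 1;
        }
        // Copied here only so every frame can be inspected afterwards; a
        // renderer would parse `frame` directly and then drop it.
        frames.push(frame.into_owned());
    }
    (frames, allocations)
}
```

Fed the deltas "Hello ", "**wor", "ld**", only the middle frame is rewritten (to "Hello **wor**"); the first and last frames take the borrowed branch.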
Status
Published on crates.io. Consumed by tahoe-gpui for
streaming Markdown rendering.
Usage
```toml
[]
= "0.1"
```

```rust
use ;
let partial = "Hello **wor";
let completed = stitch;
assert_eq!;
```
Inside tahoe-gpui, the incremental parser opts in with
with_stitch (see crates/tahoe-gpui/src/markdown/parser/mod.rs:69):
```rust
use StitchOptions;
use IncrementalMarkdownParser;
let mut parser = with_stitch;
parser.push_delta;
let blocks = parser.parse; // parses "# Hello **wor**"
```
Built-in handlers
Handlers run in priority order (lower first). Every option defaults to true
except inline_katex, which is off because a lone $ is ambiguous with
currency.
| Option | Priority | Completes / handles | Default |
|---|---|---|---|
| single_tilde | 0 | Escapes a lone `~` between word characters | on |
| comparison_operators | 5 | Escapes `>` at the start of list items so it doesn't parse as a blockquote | on |
| html_tags | 10 | Strips an incomplete trailing HTML tag | on |
| setext_headings | 15 | Prevents a trailing `===` / `---` line from being misread as a setext underline | on |
| links / images | 20 | `[text](url` → `[text](stitch:incomplete-link)` (see LinkMode) | on |
| bold_italic | 30 | `***x` → `***x***` | on |
| bold | 35 | `**x` → `**x**` | on |
| italic | 40–42 | `__x` / `*x` / `_x` → closed | on |
| inline_code | 50 | `` `x `` → `` `x` `` | on |
| strikethrough | 60 | `~~x` → `~~x~~` | on |
| katex | 70 | `$$eq` → `$$eq$$` | on |
| inline_katex | 75 | `$eq` → `$eq$` | off |
Priorities are re-exported as constants in mdstitch::priority
so custom handlers can slot between the built-ins.
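Why slotting between built-ins works can be seen in a minimal sketch of a priority-ordered pipeline. Everything in it is hypothetical: Handler, MarkerCloser, and the local priority module only mimic the shape described above (the values 35 and 50 are copied from the table) and are not mdstitch's StitchHandler API. Handlers are sorted by ascending priority with a stable sort, and the text is threaded through them as a Cow, so an input no handler touches stays borrowed end to end.

```rust
use std::borrow::Cow;

/// Hypothetical stand-ins for mdstitch::priority; values copied from the
/// built-in handler table.
mod priority {
    pub const BOLD: u32 = 35;
    pub const INLINE_CODE: u32 = 50;
}

/// Hypothetical handler trait, not mdstitch's StitchHandler.
trait Handler {
    fn name(&self) -> &str;
    fn priority(&self) -> u32;
    /// Return the input as-is (no allocation) or a rewritten Owned copy.
    fn apply<'a>(&self, text: Cow<'a, str>) -> Cow<'a, str>;
}

/// Toy handler: appends `marker` when it occurs an odd number of times.
struct MarkerCloser {
    name: &'static str,
    priority: u32,
    marker: &'static str,
}

impl Handler for MarkerCloser {
    fn name(&self) -> &str {
        self.name
    }
    fn priority(&self) -> u32 {
        self.priority
    }
    fn apply<'a>(&self, text: Cow<'a, str>) -> Cow<'a, str> {
        if text.matches(self.marker).count() % 2 == 1 {
            let mut owned = text.into_owned();
            owned.push_str(self.marker);
            Cow::Owned(owned)
        } else {
            text
        }
    }
}

/// Sort by ascending priority (stable, so ties keep registration order),
/// then thread the text through every handler. Also returns the run order.
fn run_pipeline<'a>(
    text: &'a str,
    handlers: &mut [Box<dyn Handler>],
) -> (Cow<'a, str>, Vec<String>) {
    handlers.sort_by_key(|h| h.priority());
    let mut current = Cow::Borrowed(text);
    let mut order = Vec::new();
    for h in handlers.iter() {
        current = h.apply(current);
        order.push(h.name().to_string());
    }
    (current, order)
}
```

In this sketch, a handler registered at priority::BOLD + 1 runs after bold (35) and before inline code (50), whatever order the handlers were registered in.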
LinkMode
Controls what happens when an incomplete [text](url… is detected:
- LinkMode::Protocol (default): rewrite to `[text](stitch:incomplete-link)`. This lets the downstream renderer keep the link text visible; the sentinel URL can be detected to style the link as pending.
- LinkMode::TextOnly: drop the link markup entirely and render only the text.
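The two modes can be sketched as follows, assuming the incomplete link starts at the last `[` in the text. LinkMode here is a local stand-in enum, and complete_trailing_link and is_pending_link are hypothetical helpers; the real handler also respects code spans, images, and nested brackets, which this sketch ignores.

```rust
use std::borrow::Cow;

/// Hypothetical local mirror of the two LinkMode variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LinkMode {
    Protocol,
    TextOnly,
}

/// Sentinel URL written in Protocol mode.
const SENTINEL: &str = "stitch:incomplete-link";

/// Toy completer: if the last `[` starts a `[text](url` with no closing `)`,
/// rewrite it according to `mode`. Ignores code spans, images, and nesting.
fn complete_trailing_link(text: &str, mode: LinkMode) -> Cow<'_, str> {
    let Some(open) = text.rfind('[') else {
        return Cow::Borrowed(text);
    };
    let rest = &text[open..];
    let Some(mid) = rest.find("](") else {
        return Cow::Borrowed(text);
    };
    if rest[mid..].contains(')') {
        return Cow::Borrowed(text); // link already closed
    }
    let (prefix, label) = (&text[..open], &rest[1..mid]);
    Cow::Owned(match mode {
        LinkMode::Protocol => format!("{}[{}]({})", prefix, label, SENTINEL),
        LinkMode::TextOnly => format!("{}{}", prefix, label),
    })
}

/// Renderer-side check: a link pointing at the sentinel is still streaming.
fn is_pending_link(url: &str) -> bool {
    url == SENTINEL
}
```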
Custom handlers
Implement StitchHandler and register it with StitchOptions::handler:

```rust
use Cow;
use ;
;
let opts = default.handler;
let _ = stitch;
```
Handler authors can reuse mdstitch's own scanning helpers so they honour the
same code-block and link boundaries as the built-ins:
- is_inside_code_block(text, pos)
- is_within_link_or_image_url(text, pos)
- is_within_math_block(text, pos)
- is_word_char(ch)
For repeated queries on the same input, share a CodeBlockRanges
instead — it scans once in O(n) and answers subsequent checks in O(log n).
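The scan-once, query-many idea can be sketched with a sorted list of byte ranges and slice::partition_point. FenceRanges is a hypothetical name, not mdstitch's CodeBlockRanges, and its scan only toggles on lines that start with three backticks (no tildes, fence lengths, or inline code spans). What it illustrates is the cost model: one O(n) pass to build, then an O(log n) binary search per query.

```rust
use std::ops::Range;

/// Hypothetical range index in the spirit of CodeBlockRanges: sorted,
/// non-overlapping byte ranges covering fenced code blocks.
struct FenceRanges {
    ranges: Vec<Range<usize>>,
}

impl FenceRanges {
    /// One O(n) pass. Simplified: any line whose trimmed start begins with
    /// three backticks toggles a fence; tildes and fence lengths are ignored.
    fn new(text: &str) -> Self {
        let mut ranges = Vec::new();
        let mut open: Option<usize> = None;
        let mut offset = 0;
        for line in text.split_inclusive('\n') {
            if line.trim_start().starts_with("```") {
                match open.take() {
                    None => open = Some(offset),
                    Some(start) => ranges.push(start..offset + line.len()),
                }
            }
            offset += line.len();
        }
        // An unterminated fence runs to the end of the text.
        if let Some(start) = open {
            ranges.push(start..text.len());
        }
        FenceRanges { ranges }
    }

    /// O(log n): binary-search for the first range ending after `pos`.
    fn contains(&self, pos: usize) -> bool {
        let i = self.ranges.partition_point(|r| r.end <= pos);
        self.ranges.get(i).map_or(false, |r| r.start <= pos)
    }
}
```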
Secondary utilities
mdstitch also exposes helpers that tahoe-gpui uses outside the auto-completion
pipeline:
- `has_incomplete_code_fence(&str) -> bool`: walks lines per CommonMark §4.5 to detect an unclosed fence. Used by IncrementalMarkdownParser to gate code-block styling mid-stream.
- `has_table(&str) -> bool`: detects a GFM table delimiter row (`| --- |`).
- `detect_text_direction(&str) -> TextDirection`: first-strong-character Unicode heuristic that returns Ltr or Rtl. Skips common markdown syntax (headings, emphasis, inline code, links) before sampling.
- `preprocess_custom_tags(markdown, &[tag])`: replaces `\n\n` inside a named HTML tag with an `<!---->` spacer so blank lines don't split the CommonMark block.
- `preprocess_literal_tag_content(markdown, &[tag])`: escapes markdown metacharacters inside the chosen tags so their bodies render as literal text.
- `normalize_html_indentation(&str) -> Cow<'_, str>`: dedents leading whitespace that would otherwise make a tag look like an indented code block.
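For a sense of what the fence walk involves, here is a simplified sketch of the CommonMark §4.5 rules. has_unclosed_fence and fence_run are hypothetical names rather than mdstitch's has_incomplete_code_fence, and the sketch skips container blocks (lists, blockquotes) and tab expansion. A fence opens on up to three spaces of indentation followed by three or more backticks or tildes, and closes only on the same character with a run at least as long and nothing but whitespace after it.

```rust
/// If `line` can open or close a fence, return its character and run length:
/// at most three spaces of indentation, then three or more ` or ~.
fn fence_run(line: &str) -> Option<(char, usize)> {
    let rest = line.trim_start_matches(' ');
    if line.len() - rest.len() > 3 {
        return None; // four spaces: indented code, not a fence
    }
    let ch = rest.chars().next()?;
    if ch != '`' && ch != '~' {
        return None;
    }
    let len = rest.chars().take_while(|&c| c == ch).count();
    // A backtick fence's info string may not contain backticks.
    if len < 3 || (ch == '`' && rest[len..].contains('`')) {
        return None;
    }
    Some((ch, len))
}

/// Hypothetical simplified walk: true if a fence is still open at the end.
fn has_unclosed_fence(text: &str) -> bool {
    let mut open: Option<(char, usize)> = None;
    for line in text.lines() {
        match (open, fence_run(line)) {
            // Any fence line opens a block when none is open.
            (None, Some(run)) => open = Some(run),
            // Close only with the same char, a run at least as long, and
            // nothing but whitespace after the run.
            (Some((ch, len)), Some((c, n)))
                if c == ch && n >= len && line.trim_start_matches(' ')[n..].trim().is_empty() =>
            {
                open = None
            }
            _ => {}
        }
    }
    open.is_some()
}
```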
Module layout
```text
src/
├── lib.rs # entry point, pipeline orchestration, re-exports
├── options.rs # StitchOptions, StitchHandler, LinkMode, priority::*
├── ranges.rs # CodeBlockRanges — shared range index
├── fence.rs # CommonMark §4.5 fence/inline-code scanner
├── bracket.rs # balanced [ / ] matcher (respects code spans)
├── utils.rs # shared predicates (is_word_char, is_escaped, …)
│
│ # One handler per marker class:
├── emphasis.rs # ** *** __ and the three italic variants
├── inline_code.rs # `…`
├── strikethrough.rs # ~~…~~
├── single_tilde.rs # lone ~ escaping
├── link_image.rs # [text](url,  pipeline):
├── detect_direction.rs # RTL/LTR detection
├── incomplete_code.rs # has_incomplete_code_fence, has_table
├── preprocess.rs # custom / literal HTML tag handling
│
└── tests.rs # unit tests + proptest fuzzers
```
Testing
Use cargo nextest, not cargo test — the workspace .config/nextest.toml
tunes parallelism and retries.
The test suite exercises every built-in handler in isolation plus a
proptest! block that fuzzes:
- Arbitrary UTF-8 never panics.
- Every streaming prefix of arbitrary UTF-8 never panics (each cut on a char boundary is stitch'd).
- Idempotency across every option combination: stitch(stitch(x)) == stitch(x).
- Custom-handler order matches the priority sort.
Proptest regressions are committed under proptest-regressions/tests.txt.
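The prefix and idempotency properties can also be checked on a single fixed input without proptest, as a minimal sketch: cut the text at every char boundary, apply the function twice, and compare. toy_stitch and prefixes_are_idempotent are hypothetical names; toy_stitch only closes an odd `**`, whereas the real suite runs mdstitch's stitch over generated inputs and every option combination.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for stitch: closes an odd number of `**` markers.
fn toy_stitch(text: &str) -> Cow<'_, str> {
    if text.matches("**").count() % 2 == 1 {
        Cow::Owned(format!("{text}**"))
    } else {
        Cow::Borrowed(text)
    }
}

/// Feed every streaming prefix (each cut on a char boundary, plus the full
/// text) through `f` and check idempotency: f(f(x)) == f(x). A panic inside
/// `f` would fail the surrounding test, covering the "never panics" property.
fn prefixes_are_idempotent<F>(input: &str, f: F) -> bool
where
    F: for<'a> Fn(&'a str) -> Cow<'a, str>,
{
    let cuts = input
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(input.len()));
    for cut in cuts {
        let once = f(&input[..cut]);
        let twice = f(&*once);
        if twice != once {
            return false;
        }
    }
    true
}
```

Cutting only at char_indices boundaries is what keeps slices like "wö" from panicking on a split multi-byte character.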
License
Apache-2.0. See LICENSE at the workspace root.