§markdown-syntax
A no_std + alloc Rust crate that parses Markdown source into an owned AST and serializes the AST back to canonical Markdown — with opt-in, safe-by-default HTML rendering behind the html feature.
§At a glance
- AST-first: parse returns an owned enum tree over alloc; the output verbs live on the Document you hold.
- Tolerant: problems are collected as diagnostics, never thrown; parse is infallible.
- Maximal default dialect: GFM + footnotes + math + frontmatter + wikilinks + directives + extra inline marks, out of the box.
- Lean core: zero runtime dependencies, no_std + alloc, MSRV 1.82.
§Install
cargo add markdown-syntax
For the opt-in HTML renderer:
cargo add markdown-syntax --features html
Or in Cargo.toml:
[dependencies]
markdown-syntax = "0.1"
§Quickstart
use markdown_syntax::parse;
// `parse` is infallible and returns a `ParseOutput { document, diagnostics }`.
let output = parse("# Title\n\nHello *world*.");
assert!(output.diagnostics.is_empty());
// Serialize the AST back to canonical Markdown (this is the fallible step).
let markdown = output.document.to_markdown()?;
assert_eq!(markdown, "# Title\n\nHello *world*.\n");
parse is infallible; the output verbs live on the Document you hold.
§Common tasks
parse is the one obvious path. When you need to narrow the dialect, read diagnostics, walk the tree, or render HTML, each task is one small snippet below.
- Pick a dialect (presets)
- Tune one construct (builder)
- Walk the AST
- Handle diagnostics (tolerant vs strict)
- Customize serialization
- Source positions (optional)
- Build an AST by hand
§Pick a dialect (presets)
use markdown_syntax::SyntaxOptions;
// Named presets each build a `SyntaxOptions`; call `.parse` to run them.
let cm = SyntaxOptions::commonmark().parse("~~kept literal~~");
let gfm = SyntaxOptions::gfm().parse("~~done~~ and https://example.com");
let mdx = SyntaxOptions::mdx().parse("<Component/>\n\ntext");
// `parse(input)` is exactly `SyntaxOptions::default().parse(input)` — the
// maximal non-MDX dialect.
let default = SyntaxOptions::default().parse("H~2~O and x^2^");
let _ = (cm, gfm, mdx, default);
commonmark / gfm / mdx are the named presets; default is the maximal non-MDX dialect, and parse(input) is sugar for SyntaxOptions::default().parse(input). See SyntaxOptions.
§Tune one construct (builder)
use markdown_syntax::{SyntaxOptions, Construct, WikiLinkOrder};
// Tune a preset with the typo-proof `Construct` builder (grouped constructs
// such as `Math`, `Footnotes`, `Directives` flip every flag in the group).
let no_math = SyntaxOptions::default().disable(Construct::Math).parse("price $5");
let with_wikilinks = SyntaxOptions::commonmark()
    .enable(Construct::Strikethrough)
    .enable(Construct::Wikilinks(WikiLinkOrder::TitleAfterPipe))
    .parse("~~old~~ see [[target|label]]");
let _ = (no_math, with_wikilinks);
Construct is a typo-proof front door over the full Constructs flag set. Grouped constructs (Math, Footnotes, Directives) flip a whole family at once, and Wikilinks is the one parameterized variant.
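The idea of a grouped variant flipping a whole family of flags can be sketched with plain Rust. Everything below (SketchConstruct, SketchFlags, SketchOptions, and the individual flag names) is hypothetical and only illustrates the pattern; it is not the crate's actual Construct or Constructs definition.

```rust
/// Hypothetical flag set; the real `Constructs` has many more fields.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct SketchFlags {
    pub strikethrough: bool,
    pub math_inline: bool,
    pub math_block: bool,
}

/// Hypothetical front door: one variant may map to several flags.
#[derive(Debug, Clone, Copy)]
pub enum SketchConstruct {
    Strikethrough,
    /// Grouped: controls both inline and block math.
    Math,
}

#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct SketchOptions {
    pub flags: SketchFlags,
}

impl SketchOptions {
    /// Shared helper: set every flag that belongs to `construct`.
    fn set(mut self, construct: SketchConstruct, on: bool) -> Self {
        match construct {
            SketchConstruct::Strikethrough => self.flags.strikethrough = on,
            SketchConstruct::Math => {
                self.flags.math_inline = on;
                self.flags.math_block = on;
            }
        }
        self
    }

    pub fn enable(self, construct: SketchConstruct) -> Self {
        self.set(construct, true)
    }

    pub fn disable(self, construct: SketchConstruct) -> Self {
        self.set(construct, false)
    }
}
```

Because callers name an enum variant rather than a field, a misspelled construct fails to compile instead of silently leaving a flag untouched.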
§Walk the AST
use markdown_syntax::{parse, Block, Inline};
let document = parse("Hello *world*.").document;
for block in &document.children {
    if let Block::Paragraph(paragraph) = block {
        for inline in &paragraph.children {
            if let Inline::Text(text) = inline {
                assert_eq!(text.value, "Hello ");
                break;
            }
        }
    }
}
document.children is a Vec<Block>; block content (like Paragraph.children) is a Vec<Inline>. See the ast module, Block, and Inline.
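For trees deeper than one paragraph, a recursive walk is the usual shape. The sketch below uses a deliberately tiny, hypothetical tree (SketchBlock and SketchInline are stand-ins, not the crate's Block and Inline) to show how nested containers and nested inline marks might be flattened to plain text.

```rust
/// Hypothetical inline nodes; the real `Inline` enum has many more variants.
pub enum SketchInline {
    Text(String),
    Emphasis(Vec<SketchInline>),
}

/// Hypothetical block nodes; the real `Block` enum has many more variants.
pub enum SketchBlock {
    Paragraph(Vec<SketchInline>),
    BlockQuote(Vec<SketchBlock>),
}

/// Append the text of one inline node, descending into nested marks.
fn push_inline_text(inline: &SketchInline, out: &mut String) {
    match inline {
        SketchInline::Text(value) => out.push_str(value),
        SketchInline::Emphasis(children) => {
            for child in children {
                push_inline_text(child, out);
            }
        }
    }
}

/// Append the text of every block, descending into nested containers.
fn push_block_text(blocks: &[SketchBlock], out: &mut String) {
    for block in blocks {
        match block {
            SketchBlock::Paragraph(inlines) => {
                for inline in inlines {
                    push_inline_text(inline, out);
                }
            }
            SketchBlock::BlockQuote(children) => push_block_text(children, out),
        }
    }
}

/// Collect all text in document order, without separators between blocks.
pub fn plain_text(blocks: &[SketchBlock]) -> String {
    let mut out = String::new();
    push_block_text(blocks, &mut out);
    out
}
```

The same two-function split (one per node enum) should carry over to a larger tree, with one match arm per variant that has children.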
§Handle diagnostics (tolerant vs strict)
use markdown_syntax::{SyntaxOptions, DiagnosticSeverity, ParseStrictError};
// Tolerant parse: problems are collected, never thrown.
let output = SyntaxOptions::default().parse(":::note\nunclosed container");
for diagnostic in &output.diagnostics {
    let _ = (diagnostic.severity, diagnostic.code, diagnostic.span, &diagnostic.message);
    if diagnostic.severity == DiagnosticSeverity::Error {
        // handle an error-severity diagnostic
    }
}
// `parse_strict` promotes any error-severity diagnostic (or a config conflict)
// to a hard `Err`.
match SyntaxOptions::default().parse_strict("# clean input") {
    Ok(out) => assert!(out.diagnostics.iter().all(|d| d.severity != DiagnosticSeverity::Error)),
    Err(ParseStrictError::Config(_)) => {}
    Err(ParseStrictError::Diagnostic(_)) => {}
}
span is Option<Span> because a hand-built node may lack a source location. Parser diagnostics, AST validation, and serializer/HTML pre-validation are three separate domains that share one Diagnostic type.
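The relationship between a tolerant entry point and a strict one can be sketched independently of the crate. The toy scanner below is an assumption made for illustration (Severity, Diag, scan_tolerant, scan_strict, and the ":::" rule are all hypothetical): the tolerant function always returns a result plus diagnostics, and the strict one promotes the first error-severity diagnostic to Err.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Warning,
    Error,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Diag {
    pub severity: Severity,
    pub message: String,
}

/// Tolerant: always returns a line count plus whatever problems were seen.
pub fn scan_tolerant(source: &str) -> (usize, Vec<Diag>) {
    let mut open_containers = 0usize;
    let mut lines = 0usize;
    for line in source.lines() {
        lines += 1;
        let trimmed = line.trim();
        if trimmed == ":::" {
            // A bare fence closes the innermost open container, if any.
            open_containers = open_containers.saturating_sub(1);
        } else if trimmed.starts_with(":::") {
            // A named fence such as ":::note" opens a container.
            open_containers += 1;
        }
    }
    let mut diagnostics = Vec::new();
    if open_containers > 0 {
        diagnostics.push(Diag {
            severity: Severity::Error,
            message: format!("{open_containers} unclosed container(s)"),
        });
    }
    (lines, diagnostics)
}

/// Strict: same scan, but the first error-severity diagnostic becomes `Err`.
pub fn scan_strict(source: &str) -> Result<(usize, Vec<Diag>), Diag> {
    let (lines, diagnostics) = scan_tolerant(source);
    match diagnostics.iter().find(|d| d.severity == Severity::Error) {
        Some(first_error) => Err(first_error.clone()),
        None => Ok((lines, diagnostics)),
    }
}
```

Layering the strict function on top of the tolerant one keeps a single code path for detection, so the two modes cannot disagree about what counts as a problem.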
§Customize serialization
use markdown_syntax::{parse, SerializeOptions, LineEnding};
// `SerializeOptions` is #[non_exhaustive]: mutate a default rather than using a
// struct literal.
let mut options = SerializeOptions::default();
options.line_ending = LineEnding::CrLf;
options.final_newline = false;
let markdown = parse("# Title").document.to_markdown_with(&options)?;
assert_eq!(markdown, "# Title");
Because SerializeOptions is #[non_exhaustive], external code cannot struct-literal-construct it (even with ..Default::default(), E0639); mutate a default() instead.
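The mutate-a-default pattern can be tried on a small stand-in type. SketchSerializeOptions and finish below are hypothetical and much simpler than the crate's SerializeOptions; note also that #[non_exhaustive] only restricts downstream crates, so inside the defining crate a struct literal still compiles.

```rust
/// Hypothetical options struct; new fields can be added later without
/// breaking downstream code that starts from `default()`.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SketchSerializeOptions {
    pub crlf: bool,
    pub final_newline: bool,
}

impl Default for SketchSerializeOptions {
    fn default() -> Self {
        Self { crlf: false, final_newline: true }
    }
}

/// Re-join the lines of `body` using the configured line ending and
/// optionally append one trailing newline.
pub fn finish(body: &str, options: &SketchSerializeOptions) -> String {
    let newline = if options.crlf { "\r\n" } else { "\n" };
    let mut out = body.lines().collect::<Vec<_>>().join(newline);
    if options.final_newline {
        out.push_str(newline);
    }
    out
}
```

A caller in another crate would write `let mut options = SketchSerializeOptions::default();` and then assign the fields it cares about, which keeps compiling when the struct gains a field.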
§Source positions (optional)
use markdown_syntax::{parse, LineIndex};
let source = "# Title\n\nHello.";
let document = parse(source).document;
let index = LineIndex::new(source);
// Spans are absolute, half-open UTF-8 byte ranges; `None` for hand-built nodes.
if let Some(first) = document.children.first() {
    if let Some(span) = first.span() {
        let (start, end) = index.span(span);
        // 1-based line/column.
        assert_eq!(start.line, 1);
        assert_eq!(start.column, 1);
        let _ = (span.start, span.end, span.len(), end.line, end.column);
    }
}
Spans are absolute half-open UTF-8 byte ranges, None for hand-built nodes. LineIndex turns a Span into 1-based LinePosition line/column.
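One common way to turn byte offsets into line/column pairs is to record where each line starts and binary-search that table. The sketch below is an assumption about the general technique, not the crate's LineIndex: SketchLineIndex is a hypothetical name, and it counts columns in bytes, which may differ from how the crate counts them.

```rust
/// Hypothetical stand-in for a line index.
pub struct SketchLineIndex {
    /// Byte offset at which each line starts; line 1 starts at offset 0.
    line_starts: Vec<usize>,
}

impl SketchLineIndex {
    /// Scan the source once and remember the offset after every '\n'.
    pub fn new(source: &str) -> Self {
        let mut line_starts = vec![0];
        for (offset, byte) in source.bytes().enumerate() {
            if byte == b'\n' {
                line_starts.push(offset + 1);
            }
        }
        Self { line_starts }
    }

    /// Return the 1-based (line, column) of a byte offset.
    /// The column is counted in bytes from the start of the line.
    pub fn position(&self, offset: usize) -> (usize, usize) {
        // `partition_point` yields how many line starts are <= offset,
        // which is exactly the 1-based line number.
        let line = self.line_starts.partition_point(|&start| start <= offset);
        let column = offset - self.line_starts[line - 1] + 1;
        (line, column)
    }

    /// Map a half-open byte range to its start and end positions.
    pub fn span(&self, start: usize, end: usize) -> ((usize, usize), (usize, usize)) {
        (self.position(start), self.position(end))
    }
}
```

Building the table costs one pass over the source, after which each lookup is a binary search, so it suits the case where many spans are resolved against the same input.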
§Build an AST by hand
The prelude imports the common surface in one line:
use markdown_syntax::prelude::*;
let document = Document {
    meta: NodeMeta::default(),
    children: vec![
        Heading::new(1, [Text::from("Title")]).into(),
        Paragraph::new([Text::from("hello")]).into(),
    ],
};
// Hand-built nodes carry no span.
assert_eq!(document.children[0].span(), None);
assert_eq!(document.to_markdown().unwrap(), "# Title\n\nhello\n");
§HTML rendering (opt-in)
The HTML renderer ships behind the non-default html feature and is safe by default: it validates the AST first, escapes raw HTML, blanks dangerous link/image protocols, and disables task-list checkboxes.
cargo add markdown-syntax --features html
// Requires `--features html`; the default doctest build has no html feature,
// so this block is `rust,ignore`.
use markdown_syntax::{parse, HtmlOptions, HtmlError, SafeRawHtmlForm};
let document = parse("# Hi\n\n<script>alert(1)</script>").document;
// Default is safe: raw HTML is escaped, dangerous link/image protocols blanked.
let safe: Result<String, HtmlError> = document.to_html();
assert!(safe.is_ok());
// `HtmlOptions` is #[non_exhaustive]: mutate a default to opt into raw HTML.
let mut options = HtmlOptions::default();
options.allow_dangerous_html = true;
options.safe_raw_html_form = SafeRawHtmlForm::OmitPlaceholder;
let _ = document.to_html_with(&options);
See HtmlOptions. docs.rs builds with the html feature enabled, so the renderer's API is fully documented there.
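To make "escapes raw HTML" and "blanks dangerous protocols" concrete, here is a minimal sketch of both steps in plain Rust. The function names, the escaped character set, and the allow-list of schemes are assumptions chosen for illustration; the crate's renderer may apply different or stricter rules.

```rust
/// Escape the characters that would otherwise be read as markup.
pub fn escape_html(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for ch in raw.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            other => out.push(other),
        }
    }
    out
}

/// Return the URL unchanged when it is relative or uses an allow-listed
/// scheme, and an empty string otherwise.
pub fn blank_dangerous_url(url: &str) -> &str {
    // Hypothetical allow-list; a real renderer may permit other schemes.
    const SAFE_SCHEMES: [&str; 3] = ["http", "https", "mailto"];
    let trimmed = url.trim();
    // A ':' names a scheme only if it appears before any '/', '?' or '#'.
    match trimmed.find(|c: char| matches!(c, ':' | '/' | '?' | '#')) {
        Some(index) if trimmed[index..].starts_with(':') => {
            let scheme = trimmed[..index].to_ascii_lowercase();
            if SAFE_SCHEMES.contains(&scheme.as_str()) {
                url
            } else {
                ""
            }
        }
        // No scheme at all: a relative URL, kept as is.
        _ => url,
    }
}
```

An allow-list is used rather than a deny-list so that an unfamiliar scheme is blanked by default, which matches the safe-by-default stance described above.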
§Dialects & constructs reference
| Preset | .parse builder | Membership note |
|---|---|---|
| commonmark | SyntaxOptions::commonmark() | CommonMark core only |
| gfm | SyntaxOptions::gfm() | CommonMark + tables, task lists, strikethrough, autolinks, footnotes |
| mdx | SyntaxOptions::mdx() | MDX JSX/expressions/ESM on; raw HTML off |
| default (== max) | SyntaxOptions::default() / parse | Maximal non-MDX dialect (see below) |
underline (__text__) is off in default because it would override CommonMark strong; MDX is off by default and conflicts with raw HTML; wikilinks default to title-after-pipe. For the full Construct (~21 variants) and Constructs (~33 fields) surface, see Construct and Constructs on docs.rs.
Cargo features:
| Feature | Default | What it adds |
|---|---|---|
| default | [] (empty) | Byte-stable no_std + alloc core: parser, AST, serializer, validation, Span/LineIndex, prelude. Zero runtime deps. |
| html | off | Opt-in, additive, safe-by-default to_html / to_html_with and the html module. Stays no_std + alloc, zero runtime deps. |
§How it works
- AST-first public API: parse produces an owned Document; parser event streams and internal block operations are private, not v1 compatibility surfaces.
- Owned enum tree over alloc types.
- Optional source spans: half-open absolute byte ranges on every node, None for hand-built nodes; line/column derived via LineIndex.
- Tolerant by default: diagnostics are collected, not thrown.
§Scope & limitations
In scope — the maximal default dialect: GFM (tables, task lists, strikethrough, literal/relaxed autolinks, alerts), footnotes (incl. inline), inline + block math, frontmatter (--- / +++), wikilinks (title-after-pipe default), the extra inline marks (insert ++, highlight ==, subscript ~, superscript ^, spoiler ||, shortcodes :tada:), description lists, and the :name / ::name / :::name directive family.
Non-goals:
- underline (__text__) is off by default because it would override CommonMark strong.
- MDX (JSX / expressions / ESM) is off by default and conflicts with raw HTML.
- Raw HTML and MDX are represented only as Markdown syntax nodes: no HTML rendering/sanitization, no MDX evaluation, no syntax highlighting, and no DOM post-processing in the default build.
- The serializer performs no HTML safety filtering and does not preserve byte-for-byte authoring style from a bare AST.
- Validation is conservative and does not prove every semantic invariant of a hand-written AST.
- Directives (:name / ::name / :::name) are a distinct family and are never MDX.
§Compatibility
no_std + alloc (crate root is #![no_std] + extern crate alloc). Default features are empty; the opt-in html feature also stays no_std + alloc. Zero runtime dependencies. MSRV 1.82 (edition 2021).
§Contributing & conformance
Tests live in tests/. AST→HTML correctness is measured against vendored CommonMark/GFM oracles; observe the current numbers with cargo test --features html --test html_conformance -- --nocapture.
§License
Licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
§Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
Re-exports§
pub use diagnostic::Diagnostic;
pub use diagnostic::DiagnosticCode;
pub use diagnostic::DiagnosticSeverity;
pub use html::HtmlError;
pub use html::HtmlOptions;
pub use html::SafeRawHtmlForm;
pub use html::TasklistAttrOrder;
pub use options::Construct;
pub use options::Constructs;
pub use options::ParseOptions;
pub use options::SyntaxConfigError;
pub use options::SyntaxOptions;
pub use options::WikiLinkOrder;
pub use parse::parse;
pub use parse::ParseOutput;
pub use parse::ParseStrictError;
pub use serialize::LineEnding;
pub use serialize::SerializeError;
pub use serialize::SerializeOptions;
pub use span::LineIndex;
pub use span::LinePosition;
pub use span::Span;
pub use ast::*;
Modules§
- ast: The owned Markdown AST that parse() produces and Document::to_markdown / to_html consume. Every node carries a NodeMeta with an optional source Span. Block and Inline are the two node enums; everything else is a concrete node struct or a small enum describing a node's variant.
- diagnostic: The unified Diagnostic type shared by the parser, AST validation, and the serialize/HTML pre-validation.
- html: AST to HTML rendering.
- options: Parser configuration: which Markdown constructs are recognized and how.
- parse: Markdown source to AST. The entry points are the free parse function (maximal default dialect) and the SyntaxOptions::parse / SyntaxOptions::parse_strict methods. Parsing is tolerant: problems are collected as Diagnostics rather than aborting.
- prelude: Common imports for working with markdown-syntax: use markdown_syntax::prelude::*; brings the AST, options, diagnostics, parse entry points, and serialize/span types (plus the HTML renderer under the html feature) into scope.
- serialize: AST to canonical Markdown. The verbs live on Document (to_markdown / to_markdown_with); SerializeOptions tunes the output style. The document is validated first, so serialization can fail with a SerializeError.
- span: Source locations: byte-offset Spans and their LineIndex translation into human-readable line/column LinePositions.
- validate: AST validation: Document::validate walks the tree and reports each invalid or unsupported node shape as a Diagnostic. Serialization and HTML rendering run this first and refuse an invalid document.