Parse a Markdown input string into a sequence of Markdown abstract syntax
tree Blocks.
This crate is intentionally designed to interoperate well with the
pulldown-cmark crate and the
ecosystem around it. See Motivation and relation to pulldown-cmark
for more information.
The AST types are designed to align with the structure defined by the CommonMark Specification.
§Quick Examples
Parse simple Markdown into an AST:
use markdown_ast::{markdown_to_ast, Block, Inline, Inlines};
let ast = markdown_to_ast("
Hello! This is a paragraph **with bold text**.
");
assert_eq!(ast, vec![
Block::Paragraph(Inlines(vec![
Inline::Text("Hello! This is a paragraph ".to_owned()),
Inline::Strong(Inlines(vec![
Inline::Text("with bold text".to_owned()),
])),
Inline::Text(".".to_owned())
]))
]);
§API Overview
| Function | Input | Output |
|---|---|---|
| markdown_to_ast() | &str | Vec<Block> |
| ast_to_markdown() | &[Block] | String |
| ast_to_events() | &[Block] | Vec<Event> |
| events_to_ast() | &[Event] | Vec<Block> |
| events_to_markdown() | &[Event] | String |
| markdown_to_events() | &str | Vec<Event> |
| canonicalize() | &str | String |
§Terminology
This crate is able to process and manipulate Markdown in three different representations:
| Term | Type | Description |
|---|---|---|
| Markdown | String | Raw Markdown source / output string |
| Events | &[Event] | Markdown parsed by pulldown-cmark into a flat sequence of parser Events |
| AST | Block / &[Block] | Markdown parsed by markdown-ast into a hierarchical structure of Blocks |
§Processing Steps
String => Events => Blocks => Events => String
|_____ A ______|    |______ C _____|
          |______ B _____|    |______ D _____|
|__________ E ___________|
                    |___________ F __________|
|____________________ G _____________________|
- A: markdown_to_events()
- B: events_to_ast()
- C: ast_to_events()
- D: events_to_markdown()
- E: markdown_to_ast()
- F: ast_to_markdown()
- G: canonicalize()
Note: A wraps pulldown_cmark::Parser, and D wraps
pulldown_cmark_to_cmark::cmark().
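To make the Events => Blocks direction (step B) more concrete, here is a minimal sketch of folding a flat start/end event stream into a tree with a stack. The `ToyEvent` and `ToyNode` types and the `events_to_tree` function are hypothetical stand-ins invented for this illustration; they are not the pulldown-cmark `Event` type or the markdown-ast types, and this is not a description of how `events_to_ast()` is actually implemented.

```rust
// Toy stand-ins for illustration only. These are NOT pulldown-cmark's
// `Event` or markdown-ast's `Block` / `Inline` types.
#[derive(Debug, Clone, PartialEq)]
enum ToyEvent {
    Start(&'static str), // opens a container, e.g. "strong"
    End,                 // closes the most recently opened container
    Text(&'static str),  // leaf text
}

#[derive(Debug, Clone, PartialEq)]
enum ToyNode {
    Text(String),
    Container(String, Vec<ToyNode>),
}

/// Fold a flat event stream into a tree using a stack of open containers.
fn events_to_tree(events: &[ToyEvent]) -> Vec<ToyNode> {
    // Each frame is (container tag, children collected so far).
    // The bottom frame represents the document root.
    let mut stack: Vec<(String, Vec<ToyNode>)> = vec![(String::new(), Vec::new())];

    for event in events {
        match event {
            ToyEvent::Start(tag) => stack.push((tag.to_string(), Vec::new())),
            ToyEvent::Text(text) => {
                let top = stack.last_mut().expect("root frame is always present");
                top.1.push(ToyNode::Text(text.to_string()));
            }
            ToyEvent::End => {
                // Ignore an unbalanced End rather than popping the root frame.
                if stack.len() > 1 {
                    let (tag, children) = stack.pop().expect("length checked above");
                    let parent = stack.last_mut().expect("root frame is always present");
                    parent.1.push(ToyNode::Container(tag, children));
                }
            }
        }
    }

    // Close any containers that were still open at the end of the input.
    while stack.len() > 1 {
        let (tag, children) = stack.pop().expect("length checked above");
        let parent = stack.last_mut().expect("root frame is always present");
        parent.1.push(ToyNode::Container(tag, children));
    }

    stack.pop().expect("root frame is always present").1
}
```

Feeding it `Start("paragraph"), Text("Hello "), Start("strong"), Text("bold"), End, End` yields a single `paragraph` container holding a text node followed by a nested `strong` container, which mirrors the shape of the `Block::Paragraph` / `Inline::Strong` nesting in the Quick Examples.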
§Detailed Examples
§Parse varied Markdown to an AST representation:
use markdown_ast::{
markdown_to_ast, Block, HeadingLevel, Inline, Inlines, ListItem
};
let ast = markdown_to_ast("
# An Example Document
This is a paragraph that
is split across *multiple* lines.
* This is a list item
");
assert_eq!(ast, vec![
Block::Heading(
HeadingLevel::H1,
Inlines(vec![
Inline::Text("An Example Document".to_owned())
])
),
Block::Paragraph(Inlines(vec![
Inline::Text("This is a paragraph that".to_owned()),
Inline::SoftBreak,
Inline::Text("is split across ".to_owned()),
Inline::Emphasis(Inlines(vec![
Inline::Text("multiple".to_owned()),
])),
Inline::Text(" lines.".to_owned()),
])),
Block::List(vec![
ListItem(vec![
Block::Paragraph(Inlines(vec![
Inline::Text("This is a list item".to_owned())
]))
])
])
]);
§Synthesize Markdown using programmatic construction of the document:
Note: This is a more user-friendly alternative to a “string builder” approach, in which the raw Markdown string is constructed piece by piece and the caller must do extra bookkeeping to manage things like indent level and soft vs. hard breaks.
use markdown_ast::{
ast_to_markdown, Block, Inline, Inlines, ListItem,
HeadingLevel,
};
let tech_companies = vec![
("Apple", 1976, 164_000),
("Microsoft", 1975, 221_000),
("Nvidia", 1993, 29_600),
];
let ast = vec![
Block::Heading(HeadingLevel::H1, Inlines::plain_text("Tech Companies")),
Block::plain_text_paragraph("The following are major tech companies:"),
Block::List(Vec::from_iter(
tech_companies
.into_iter()
.map(|(company_name, founded, employee_count)| {
ListItem(vec![
Block::paragraph(vec![Inline::plain_text(company_name)]),
Block::List(vec![
ListItem::plain_text(format!("Founded: {founded}")),
ListItem::plain_text(format!("Employee count: {employee_count}"))
])
])
})
))
];
let markdown: String = ast_to_markdown(&ast);
assert_eq!(markdown, "\
# Tech Companies
The following are major tech companies:
* Apple
* Founded: 1976
* Employee count: 164000
* Microsoft
* Founded: 1975
* Employee count: 221000
* Nvidia
* Founded: 1993
* Employee count: 29600\
");
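For contrast, the sketch below shows the kind of indentation bookkeeping that the “string builder” approach mentioned in the note above has to carry by hand. `ToyItem` and `render_list` are hypothetical names made up for this illustration, the two-space indent is an assumption, and the output format is not claimed to match what `ast_to_markdown()` produces.

```rust
// Toy type for illustration only; markdown-ast's real list types are
// `Block::List` and `ListItem`.
struct ToyItem {
    text: String,
    children: Vec<ToyItem>,
}

/// Render nested items as a Markdown bullet list by hand.
/// The caller-visible `depth` parameter is exactly the bookkeeping that
/// building from an AST avoids.
fn render_list(items: &[ToyItem], depth: usize, out: &mut String) {
    for item in items {
        // Assumed convention: two spaces of indent per nesting level.
        out.push_str(&"  ".repeat(depth));
        out.push_str("* ");
        out.push_str(&item.text);
        out.push('\n');
        // Children are rendered one level deeper.
        render_list(&item.children, depth + 1, out);
    }
}
```

Even in this small case, every call site has to thread the depth through correctly; with the AST approach the nesting is expressed by the structure of `Block::List` and `ListItem` values instead.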
§Known Issues
Currently markdown-ast does not escape Markdown content appearing in
leaf inline text:
use markdown_ast::{ast_to_markdown, Block};
let ast = vec![
Block::plain_text_paragraph("In the equation a*b*c ...")
];
let markdown = ast_to_markdown(&ast);
assert_eq!(markdown, "In the equation a*b*c ...");
which will render as:
In the equation abc …
with the asterisks interpreted as emphasis formatting markers (so the "b" is rendered in italics), contrary to the intention of the author.
Fixing this robustly will require either:
- Adding automatic escaping of Markdown characters in Inline::Text during rendering (not ideal)
- Adding pre-construction validation checks for Inline::Text that prevent constructing an Inline with Markdown formatting characters that have not been escaped correctly by the user.
In either case, fixing this bug will be considered a semver-exempt
change in behavior to markdown-ast.
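Until one of those fixes lands, a caller could escape text before building the AST. The following is a minimal sketch of such a helper, assuming backslash escaping of a small, conservative set of inline formatting characters; `escape_markdown_text` is a hypothetical function and not part of markdown-ast, and whether its output round-trips unchanged through `ast_to_markdown()` has not been verified here.

```rust
/// Hypothetical helper (not part of markdown-ast): backslash-escape ASCII
/// punctuation that commonly triggers inline Markdown formatting.
fn escape_markdown_text(text: &str) -> String {
    let mut escaped = String::with_capacity(text.len());
    for ch in text.chars() {
        // Conservative inline set. CommonMark permits a backslash escape
        // before any ASCII punctuation character, so these are all valid.
        if matches!(ch, '\\' | '*' | '_' | '`' | '[' | ']' | '<' | '>') {
            escaped.push('\\');
        }
        escaped.push(ch);
    }
    escaped
}
```

For the example above, `escape_markdown_text("In the equation a*b*c ...")` returns `In the equation a\*b\*c ...`. This sketch does not handle characters that are only significant at the start of a line (such as `#`, `-`, or `1.`), so it is a stopgap rather than the robust fix described above.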
§Motivation and relation to pulldown-cmark
pulldown-cmark is a popular
Markdown parser crate. It provides a streaming event (pull parsing) based
representation of a Markdown document. That representation is useful for
efficient transformation of a Markdown document into another format, often
HTML.
However, a streaming parser representation is less amenable to programmatic construction or human-understandable transformations of Markdown documents.
markdown-ast provides an abstract syntax tree (AST) representation of
Markdown that is easy to construct and work with.
Additionally, pulldown-cmark is widely used in the Rust crate ecosystem,
for example in mdbook extensions.
Interoperability with pulldown-cmark is an intentional design choice for
the usability of markdown-ast; one could imagine markdown-ast instead
abstracting over the underlying parser implementation, but my view is that
doing so would limit the utility of markdown-ast.
Structs§
- Inlines: A sequence of Inline values. (CommonMark: inlines)
- ListItem: An item in a list. (CommonMark: list items)
Enums§
- Block: A piece of structural Markdown content. (CommonMark: blocks, container blocks)
- CodeBlockKind
- HeadingLevel
- Inline: An inline piece of atomic Markdown content. (CommonMark: inlines)
Functions§
- ast_to_events: Convert AST Blocks into an Event sequence.
- ast_to_markdown: Convert AST Blocks into a Markdown string.
- canonicalize: Canonicalize (or format) a Markdown input by parsing and then converting back to a string.
- events_to_ast: Parse Events into AST Blocks.
- events_to_markdown: Convert Events into a Markdown string.
- markdown_to_ast: Parse Markdown input string into AST Blocks.
- markdown_to_events: Parse Markdown input string into Events.