Parse a Markdown input string into a sequence of Markdown abstract syntax tree Blocks.
This crate is intentionally designed to interoperate well with the pulldown-cmark crate and the ecosystem around it. See Motivation and relation to pulldown-cmark for more information.
The AST types are designed to align with the structure defined by the CommonMark Specification.
§Quick Examples
Parse simple Markdown into an AST:
use markdown_ast::{markdown_to_ast, Block, Inline, Inlines};
let ast = markdown_to_ast("
Hello! This is a paragraph **with bold text**.
");
assert_eq!(ast, vec![
    Block::Paragraph(Inlines(vec![
        Inline::Text("Hello! This is a paragraph ".to_owned()),
        Inline::Strong(Inlines(vec![
            Inline::Text("with bold text".to_owned()),
        ])),
        Inline::Text(".".to_owned())
    ]))
]);
§API Overview
Function | Input | Output |
---|---|---|
markdown_to_ast() | &str | Vec<Block> |
ast_to_markdown() | &[Block] | String |
ast_to_events() | &[Block] | Vec<Event> |
events_to_ast() | &[Event] | Vec<Block> |
events_to_markdown() | &[Event] | String |
markdown_to_events() | &str | Vec<Event> |
canonicalize() | &str | String |
§Terminology
This crate is able to process and manipulate Markdown in three different representations:
Term | Type | Description |
---|---|---|
Markdown | String | Raw Markdown source / output string |
Events | &[Event] | Markdown parsed by pulldown-cmark into a flat sequence of parser Events |
AST | Block / &[Block] | Markdown parsed by markdown-ast into a hierarchical structure of Blocks |
§Processing Steps
String => Events => Blocks => Events => String
|_____ A ______|    |______ C _____|
          |______ B _____|    |______ D _____|
|__________ E ___________|
                    |___________ F __________|
|____________________ G _____________________|
- A: markdown_to_events()
- B: events_to_ast()
- C: ast_to_events()
- D: events_to_markdown()
- E: markdown_to_ast()
- F: ast_to_markdown()
- G: canonicalize()
Note: A wraps pulldown_cmark::Parser, and D wraps pulldown_cmark_to_cmark::cmark().
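To make the relationship between the flat and the hierarchical representation concrete, the following is a minimal sketch of steps B and C on toy types. ToyEvent, ToyNode, events_to_tree, tree_to_events, and roundtrip_demo are hypothetical names invented for this illustration. They are not the pulldown-cmark Event type or the markdown-ast Block type, and the crate's real conversions may be implemented differently.

```rust
// Toy stand-ins for illustration only; these are NOT the pulldown-cmark or
// markdown-ast types.
#[derive(Debug, Clone, PartialEq)]
enum ToyEvent {
    Start(&'static str), // opens a container such as "paragraph" or "strong"
    End,                 // closes the most recently opened container
    Text(String),
}

#[derive(Debug, Clone, PartialEq)]
enum ToyNode {
    Container(&'static str, Vec<ToyNode>),
    Text(String),
}

/// Step B in miniature: fold a flat event sequence into a tree using a stack.
fn events_to_tree(events: &[ToyEvent]) -> Vec<ToyNode> {
    // Each frame holds a container tag and the children collected so far.
    // The bottom ("root") frame collects the top-level nodes.
    let mut stack: Vec<(&'static str, Vec<ToyNode>)> = vec![("root", Vec::new())];
    for event in events {
        match event {
            ToyEvent::Start(tag) => stack.push((*tag, Vec::new())),
            ToyEvent::End => {
                let (tag, children) = stack.pop().expect("unbalanced End event");
                stack
                    .last_mut()
                    .expect("End event closed the root frame")
                    .1
                    .push(ToyNode::Container(tag, children));
            }
            ToyEvent::Text(text) => stack
                .last_mut()
                .expect("stack always holds the root frame")
                .1
                .push(ToyNode::Text(text.clone())),
        }
    }
    assert_eq!(stack.len(), 1, "unclosed Start event");
    stack.pop().unwrap().1
}

/// Step C in miniature: flatten the tree back into an event sequence.
fn tree_to_events(nodes: &[ToyNode]) -> Vec<ToyEvent> {
    let mut events = Vec::new();
    for node in nodes {
        match node {
            ToyNode::Container(tag, children) => {
                events.push(ToyEvent::Start(*tag));
                events.extend(tree_to_events(children));
                events.push(ToyEvent::End);
            }
            ToyNode::Text(text) => events.push(ToyEvent::Text(text.clone())),
        }
    }
    events
}

/// Round-trips a small event sequence through the tree form and back.
fn roundtrip_demo() {
    let events = vec![
        ToyEvent::Start("paragraph"),
        ToyEvent::Text("Hello ".to_owned()),
        ToyEvent::Start("strong"),
        ToyEvent::Text("world".to_owned()),
        ToyEvent::End,
        ToyEvent::End,
    ];
    let tree = events_to_tree(&events);
    assert_eq!(tree, vec![ToyNode::Container("paragraph", vec![
        ToyNode::Text("Hello ".to_owned()),
        ToyNode::Container("strong", vec![ToyNode::Text("world".to_owned())]),
    ])]);
    assert_eq!(tree_to_events(&tree), events);
}
```

Calling roundtrip_demo() from a main function runs both checks. The stack in events_to_tree mirrors the nesting that the flat sequence only implies through matching start and end markers, which is why the tree form is easier to construct and inspect by hand.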
§Detailed Examples
§Parse varied Markdown to an AST representation:
use markdown_ast::{
    markdown_to_ast, Block, HeadingLevel, Inline, Inlines, ListItem
};
let ast = markdown_to_ast("
# An Example Document
This is a paragraph that
is split across *multiple* lines.
* This is a list item
");
assert_eq!(ast, vec![
    Block::Heading(
        HeadingLevel::H1,
        Inlines(vec![
            Inline::Text("An Example Document".to_owned())
        ])
    ),
    Block::Paragraph(Inlines(vec![
        Inline::Text("This is a paragraph that".to_owned()),
        Inline::SoftBreak,
        Inline::Text("is split across ".to_owned()),
        Inline::Emphasis(Inlines(vec![
            Inline::Text("multiple".to_owned()),
        ])),
        Inline::Text(" lines.".to_owned()),
    ])),
    Block::List(vec![
        ListItem(vec![
            Block::Paragraph(Inlines(vec![
                Inline::Text("This is a list item".to_owned())
            ]))
        ])
    ])
]);
§Synthesize Markdown using programmatic construction of the document:
Note: This is a more user-friendly alternative to a “string builder” approach, where the raw Markdown string is constructed piece by piece and extra bookkeeping is needed to manage things like indent level and soft vs. hard breaks.
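The bookkeeping that the note above refers to can be pictured with a toy recursive renderer, where indentation is derived from nesting depth instead of being tracked by hand at every call site. This is only a sketch under stated assumptions: ToyItem, render_list, and render_demo are hypothetical names, the two-space indent is an arbitrary choice, and the output is not claimed to match what ast_to_markdown() produces.

```rust
// Toy list model for illustration only; not the markdown-ast `ListItem` type.
struct ToyItem {
    text: String,
    children: Vec<ToyItem>,
}

impl ToyItem {
    /// Build an item with no nested children.
    fn leaf(text: &str) -> Self {
        ToyItem { text: text.to_owned(), children: Vec::new() }
    }
}

/// Render a nested bullet list, deriving each line's indentation from its depth.
fn render_list(items: &[ToyItem], depth: usize, out: &mut String) {
    for item in items {
        out.push_str(&"  ".repeat(depth)); // two spaces per nesting level
        out.push_str("* ");
        out.push_str(&item.text);
        out.push('\n');
        // Children are rendered one level deeper; the caller never computes indents.
        render_list(&item.children, depth + 1, out);
    }
}

/// Builds a one-item list with two nested entries and renders it.
fn render_demo() -> String {
    let items = vec![ToyItem {
        text: "Apple".to_owned(),
        children: vec![
            ToyItem::leaf("Founded: 1976"),
            ToyItem::leaf("Employee count: 164000"),
        ],
    }];
    let mut out = String::new();
    render_list(&items, 0, &mut out);
    out
}
```

With a tree as input, the indent level is a property of the recursion rather than state the caller has to maintain. The crate's own AST types, used in the example that follows, serve the same purpose for full documents.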
use markdown_ast::{
    ast_to_markdown, Block, Inline, Inlines, ListItem,
    HeadingLevel,
};
let tech_companies = vec![
    ("Apple", 1976, 164_000),
    ("Microsoft", 1975, 221_000),
    ("Nvidia", 1993, 29_600),
];
let ast = vec![
    Block::Heading(HeadingLevel::H1, Inlines::plain_text("Tech Companies")),
    Block::plain_text_paragraph("The following are major tech companies:"),
    Block::List(Vec::from_iter(
        tech_companies
            .into_iter()
            .map(|(company_name, founded, employee_count)| {
                ListItem(vec![
                    Block::paragraph(vec![Inline::plain_text(company_name)]),
                    Block::List(vec![
                        ListItem::plain_text(format!("Founded: {founded}")),
                        ListItem::plain_text(format!("Employee count: {employee_count}"))
                    ])
                ])
            })
    ))
];
let markdown: String = ast_to_markdown(&ast);
assert_eq!(markdown, "\
# Tech Companies
The following are major tech companies:
* Apple
* Founded: 1976
* Employee count: 164000
* Microsoft
* Founded: 1975
* Employee count: 221000
* Nvidia
* Founded: 1993
* Employee count: 29600\
");
§Known Issues
Currently markdown-ast does not escape Markdown content appearing in leaf inline text:
use markdown_ast::{ast_to_markdown, Block};
let ast = vec![
    Block::plain_text_paragraph("In the equation a*b*c ...")
];
let markdown = ast_to_markdown(&ast);
assert_eq!(markdown, "In the equation a*b*c ...");
which will render as:
In the equation abc …
with the asterisks interpreted as emphasis formatting markers, contrary to the intention of the author.
Fixing this robustly will require either:
- Adding automatic escaping of Markdown characters in Inline::Text during rendering (not ideal)
- Adding pre-construction validation checks for Inline::Text that prevent constructing an Inline with Markdown formatting characters that have not been escaped correctly by the user.
In either case, fixing this bug will be considered a semver-exempt change in behavior to markdown-ast.
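Until one of these fixes lands, a caller could apply either idea to text before placing it in an Inline::Text. The following is a rough sketch of both options, not part of markdown-ast: SPECIAL, escape_markdown_text, and has_unescaped_markup are hypothetical names, and the character set is an assumption chosen for illustration, not a complete list of CommonMark-significant characters.

```rust
/// Characters treated as Markdown-significant in this sketch. The set is an
/// assumption for illustration, not a complete CommonMark list.
const SPECIAL: &[char] = &['\\', '*', '_', '`', '[', ']', '<', '>', '#'];

/// Option 1: backslash-escape every special character.
fn escape_markdown_text(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        if SPECIAL.contains(&ch) {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Option 2: report whether the text contains a special character that is
/// not preceded by a backslash, so construction could be rejected.
fn has_unescaped_markup(text: &str) -> bool {
    let mut escaped = false;
    for ch in text.chars() {
        if escaped {
            // This character was escaped by the previous backslash.
            escaped = false;
        } else if ch == '\\' {
            escaped = true;
        } else if SPECIAL.contains(&ch) {
            return true;
        }
    }
    false
}
```

For the equation example above, escape_markdown_text("a*b*c") yields the text a\\*b\\*c written as a Rust string literal, that is, each asterisk preceded by one backslash, and has_unescaped_markup returns true for the original input and false for the escaped one. Escaping every occurrence is deliberately conservative: it may add backslashes that are not strictly needed, which is part of why the first option is described as not ideal.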
§Motivation and relation to pulldown-cmark
pulldown-cmark is a popular Markdown parser crate. It provides a streaming event (pull parsing) based representation of a Markdown document. That representation is useful for efficient transformation of a Markdown document into another format, often HTML.
However, a streaming parser representation is less amenable to programmatic construction or human-understandable transformations of Markdown documents.
markdown-ast provides an abstract syntax tree (AST) representation of Markdown that is easy to construct and work with.
Additionally, pulldown-cmark is widely used in the Rust crate ecosystem, for example for mdbook extensions. Interoperability with pulldown-cmark is an intentional design choice for the usability of markdown-ast; one could imagine markdown-ast instead abstracting over the underlying parser implementation, but my view is that this would limit the utility of markdown-ast.
Structs§
- Inlines: A sequence of Inlines. (CommonMark: inlines)
- ListItem: An item in a list. (CommonMark: list items)
Enums§
- Block: A piece of structural Markdown content. (CommonMark: blocks, container blocks)
- CodeBlockKind
- HeadingLevel
- Inline: An inline piece of atomic Markdown content. (CommonMark: inlines)
Functions§
- ast_to_events: Convert AST Blocks into an Event sequence.
- ast_to_markdown: Convert AST Blocks into a Markdown string.
- canonicalize: Canonicalize (or format) a Markdown input by parsing and then converting back to a string.
- events_to_ast: Parse Events into AST Blocks.
- events_to_markdown: Convert Events into a Markdown string.
- markdown_to_ast: Parse Markdown input string into AST Blocks.
- markdown_to_events: Parse Markdown input string into Events.