ferromark
Markdown to HTML at up to 309 MiB/s on Apple Silicon. Faster than pulldown-cmark, md4c (C), and comrak. Passes all 652 CommonMark spec tests. Every GFM extension included.
Quick start
```rust
let html = ferromark::to_html("# Hello\n\n**World**");
```
One function call, no setup. When allocation pressure matters:
```rust
let mut buffer = Vec::new();
ferromark::to_html_into("# Hello\n\n**World**", &mut buffer);
// buffer survives across calls: zero repeated allocation
```
Benchmarks
Numbers, not adjectives. Apple Silicon (M-series), February 2026. All parsers run with GFM tables, strikethrough, and task lists enabled. Output buffers reused where APIs allow. Non-PGO binaries for a fair comparison.
CommonMark 5 KB (wiki-style, mixed content with tables)
| Parser | Throughput | vs ferromark |
|---|---|---|
| ferromark | 289.9 MiB/s | baseline |
| pulldown-cmark | 247.7 MiB/s | 0.85x |
| md4c (C) | 242.3 MiB/s | 0.84x |
| comrak | 73.7 MiB/s | 0.25x |
CommonMark 50 KB (same style, scaled)
| Parser | Throughput | vs ferromark |
|---|---|---|
| ferromark | 309.3 MiB/s | baseline |
| pulldown-cmark | 271.7 MiB/s | 0.88x |
| md4c (C) | 247.4 MiB/s | 0.80x |
| comrak | 76.0 MiB/s | 0.25x |
Per fixture: 17% (5 KB) and 14% (50 KB) faster than pulldown-cmark, 20% and 25% faster than md4c, and roughly 4x faster than comrak.
The fixtures are synthetic wiki-style documents with paragraphs, lists, code blocks, and tables. Nothing cherry-picked. Run them yourself: cargo bench --bench comparison
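The relative figures can be recomputed from the published throughputs. A minimal sketch: the `RESULTS` array copies the two tables, and the `percent_faster` and `speedup` helpers are illustrative, not part of ferromark. The spread comes out at 17% and 14% over pulldown-cmark, 20% and 25% over md4c, and about 3.9x and 4.1x over comrak.

```rust
/// (fixture, ferromark, pulldown-cmark, md4c, comrak) in MiB/s, copied from the tables.
const RESULTS: [(&str, f64, f64, f64, f64); 2] = [
    ("5 KB", 289.9, 247.7, 242.3, 73.7),
    ("50 KB", 309.3, 271.7, 247.4, 76.0),
];

/// Percentage by which `base` throughput exceeds `other`.
fn percent_faster(base: f64, other: f64) -> f64 {
    (base / other - 1.0) * 100.0
}

/// Plain throughput ratio, e.g. 4.07 means "4.07x as fast".
fn speedup(base: f64, other: f64) -> f64 {
    base / other
}
```

Rounding per fixture, rather than quoting the best number for each competitor, keeps the summary honest across both document sizes.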
What you get
Full CommonMark: 652/652 spec tests pass. No filtering, no exceptions.
All five GFM extensions: Tables, strikethrough, task lists, autolink literals, disallowed raw HTML.
Beyond GFM: Footnotes, front matter extraction (---/+++), heading IDs (GitHub-compatible slugs), math spans ($/$$), and callouts (> [!NOTE], > [!WARNING], ...).
MDX support (opt-in via mdx feature): Segment and render .mdx files without a JavaScript toolchain. Covers 90%+ of real-world MDX patterns in Next.js, Docusaurus, and Astro.
12 feature flags to turn on exactly what you need:
allow_html · allow_link_refs · tables · strikethrough · task_lists
autolink_literals · disallowed_raw_html · footnotes · front_matter
heading_ids · math · callouts
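For heading_ids, "GitHub-compatible slugs" refers to the rules used by GitHub's renderer and the github-slugger package: lowercase the text, drop punctuation, turn each space into a hyphen, and suffix duplicates with -1, -2, and so on. A minimal sketch under those assumptions; the `Slugger` type is hypothetical, not ferromark's implementation, and it approximates "punctuation" as anything that is not alphanumeric, `-`, `_`, or a space.

```rust
use std::collections::HashMap;

/// Hypothetical GitHub-style slug generator (approximation of github-slugger).
struct Slugger {
    /// Every slug handed out so far, with a duplicate counter per base slug.
    seen: HashMap<String, usize>,
}

impl Slugger {
    fn new() -> Self {
        Slugger { seen: HashMap::new() }
    }

    fn slug(&mut self, heading: &str) -> String {
        // Lowercase, keep letters/digits/'-'/'_', map spaces to '-', drop the rest.
        let base: String = heading
            .to_lowercase()
            .chars()
            .filter_map(|c| match c {
                ' ' => Some('-'),
                '-' | '_' => Some(c),
                _ if c.is_alphanumeric() => Some(c),
                _ => None,
            })
            .collect();
        // On a collision, bump the base slug's counter until the result is unused.
        let mut slug = base.clone();
        while self.seen.contains_key(&slug) {
            let count = self.seen.get_mut(&base).expect("base slug was recorded first");
            *count += 1;
            slug = format!("{base}-{count}");
        }
        self.seen.insert(slug.clone(), 0);
        slug
    }
}
```

The counter lives on the base slug, so a third "Hello World" becomes hello-world-2 rather than hello-world-1-1.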
Trade-offs
ferromark is built for one job: turning Markdown into HTML as fast as possible. That focus means some things it deliberately skips:
- No AST access. You can't walk a syntax tree or write custom renderers against parsed nodes. If you need that, pulldown-cmark's iterator model or comrak's AST are better fits.
- No source maps. No byte-offset tracking for mapping HTML back to Markdown positions.
- HTML only. No XML, no CommonMark round-tripping, no alternative output formats.
These aren't planned. They'd compromise the streaming architecture that makes ferromark fast.
MDX support
MDX is the standard for component-driven docs in Next.js, Docusaurus, and Astro. Processing it usually requires a full JavaScript toolchain — Node.js, acorn, babel, the works.
ferromark takes a different approach: segment .mdx files into typed blocks and render them at native speed. No JS runtime. No AST.
```toml
[dependencies]
ferromark = { version = "0.1", features = ["mdx"] }
```
Render — one call, full output
render() assembles the final output automatically: Markdown segments become HTML, JSX and expressions pass through unchanged, ESM and front matter are extracted separately.
```rust
use ferromark::mdx::render;

let input = r#"import { Card } from './card'
---
title: Hello
---
# Hello World
<Card title="Example">
Markdown **inside** a component.
</Card>
{new Date().getFullYear()}
"#;
let output = render(input);
// output.body: HTML with JSX/expressions passed through
// output.esm: vec!["import { Card } from './card'\n"]
// output.front_matter: Some("title: Hello\n")
```
Use render_with_options() for custom Markdown settings (heading IDs, math, footnotes, etc.).
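render() hands front matter back separately instead of rendering it. For plain Markdown, splitting off a leading `---` (YAML) or `+++` (TOML) block can be sketched as follows. `split_front_matter` is a hypothetical helper, not ferromark's code. It assumes LF line endings and an opening fence on the very first line, unlike the MDX example above where an import comes first.

```rust
/// Splits a leading front matter block fenced by `---` (YAML) or `+++` (TOML).
/// Returns (front matter body, remaining document), or None if there is none.
fn split_front_matter(input: &str) -> Option<(&str, &str)> {
    // The opening fence must be the first line, exactly.
    let fence = if input.starts_with("---\n") {
        "---"
    } else if input.starts_with("+++\n") {
        "+++"
    } else {
        return None;
    };
    let body_start = fence.len() + 1;
    let mut offset = body_start;
    // Walk line by line (newlines kept) until the matching closing fence.
    for line in input[body_start..].split_inclusive('\n') {
        if line.trim_end() == fence {
            return Some((&input[body_start..offset], &input[offset + line.len()..]));
        }
        offset += line.len();
    }
    None // opening fence was never closed
}
```

Both halves are borrowed slices of the input, so nothing is copied.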
Component — ready-to-use JSX module
to_component() wraps the output as a complete JSX/TSX module with a named export. Works with React 19, Preact, Solid, and any JSX framework.
```rust
let output = render(input);
// Assumption: the component name is passed as an argument.
let tsx = output.to_component("HelloWorld");
```

Resulting module:

```tsx
import { Card } from './card'
export function HelloWorld() {
  return (
    <>
      <h1 id="hello-world">Hello World</h1>
      <Card title="Example">
        <p>Markdown <strong>inside</strong> a component.</p>
      </Card>
      {new Date().getFullYear()}
    </>
  );
}
```
Segment — low-level control
When you need full control over each block, use segment() directly:
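```rust
use ferromark::mdx::{segment, Segment};

for seg in segment(input) {
    // Each `seg` is one typed block (Markdown, ESM, JSX, expression, front matter);
    // match on the Segment variants to handle each kind yourself.
}
```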
The segmenter handles JSX attribute parsing (strings, expressions, spreads), brace-depth tracking (with string/comment/template-literal awareness), fragment syntax, member expressions (<Foo.Bar>), and multiline tags. Invalid constructs fall back to Markdown — no panics, always valid output.
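Brace-depth tracking is what lets a segmenter find where a `{...}` expression ends even when braces appear inside strings or comments. A simplified sketch, assuming a hypothetical `expression_end` helper (not ferromark's expr.rs): it skips quoted strings and both comment styles, but treats a template literal as opaque (it does not re-enter `${...}`), and it does not recognize regex literals or JSX text nested inside an expression.

```rust
/// Given `src` and the byte index of an opening `{`, returns the index just
/// past its matching `}`, or `None` if the braces never balance.
fn expression_end(src: &str, start: usize) -> Option<usize> {
    let b = src.as_bytes();
    if b.get(start) != Some(&b'{') {
        return None;
    }
    let mut depth = 0usize;
    let mut i = start;
    while i < b.len() {
        match b[i] {
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(i + 1);
                }
            }
            // '...', "..." and `...`: braces inside do not count.
            quote @ (b'"' | b'\'' | b'`') => {
                i += 1;
                while i < b.len() && b[i] != quote {
                    if b[i] == b'\\' {
                        i += 1; // skip the escaped byte
                    }
                    i += 1;
                }
                if i >= b.len() {
                    return None; // unterminated string
                }
            }
            // Line comment: skip to the newline (handled on the next pass).
            b'/' if b.get(i + 1) == Some(&b'/') => {
                while i < b.len() && b[i] != b'\n' {
                    i += 1;
                }
                continue;
            }
            // Block comment: jump to the '/' of the closing "*/".
            b'/' if b.get(i + 1) == Some(&b'*') => {
                let close = src[i + 2..].find("*/")?;
                i += 2 + close + 1;
            }
            _ => {}
        }
        i += 1;
    }
    None
}
```

Returning `None` on unbalanced input mirrors the fallback behavior described above: the caller can keep the text as Markdown instead of panicking.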
Full example: cargo run --features mdx --example mdx_segment
The segmenter covers the block-level MDX patterns that make up 90%+ of real-world .mdx files: imports at the top, components wrapping content, expressions between paragraphs. This is what a typical Docusaurus, Next.js, or Astro page looks like — and it works out of the box.
What the segmenter deliberately skips — and why that's fine for most use cases:
| What | Our approach | When it matters |
|---|---|---|
| Inline JSX (text <em>here</em>) | Stays inside Markdown segments | Only if you mix JSX and prose on the same line inside a paragraph, which is rare in practice |
| JS validation | Heuristic detection (keyword + brace counting) instead of acorn/swc | Only if you need to report syntax errors in user-authored MDX at parse time |
| Markdown grammar | Standard CommonMark/GFM rules | Official mdxjs disables indented code and HTML syntax — relevant if your content relies on <div> being JSX, not HTML |
| Container nesting | > <Component> stays Markdown | Only if you put JSX inside blockquotes or list items, which is uncommon |
| TypeScript generics | <Component<T>> not parsed | Only relevant for TSX-heavy content pages, which are very rare in docs |
| Error reporting | Silent fallback to Markdown | Means broken JSX renders as text instead of failing — arguably safer for content pipelines |
The full @mdx-js/mdx compiler exists to produce a React component tree from MDX. It needs a JavaScript parser because it compiles to JSX. ferromark's segmenter exists to answer a simpler question: where does the Markdown stop and the JSX start? That question doesn't need a JS runtime.
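That question can start with a cheap look at each line. The toy classifier below is a hypothetical illustration, not ferromark's splitter (which is a line-based state machine that also follows multi-line tags and expressions). It recognizes ESM keywords, a capitalized tag or fragment, and an opening brace; everything else is Markdown. Lowercase tags such as `<div>` fall through to Markdown, consistent with keeping standard CommonMark/GFM rules for raw HTML.

```rust
#[derive(Debug, PartialEq)]
enum LineKind {
    Esm,
    JsxBlock,
    Expression,
    Markdown,
}

/// Toy heuristic: inspects only the first non-space characters of one line.
fn classify(line: &str) -> LineKind {
    let t = line.trim_start();
    if t.starts_with("import ") || t.starts_with("export ") {
        return LineKind::Esm;
    }
    if t.starts_with('{') {
        return LineKind::Expression;
    }
    if let Some(rest) = t.strip_prefix('<') {
        // Accept opening or closing tags: <Card, </Card, <>, </>.
        let rest = rest.strip_prefix('/').unwrap_or(rest);
        if rest.starts_with(|c: char| c.is_ascii_uppercase() || c == '>') {
            return LineKind::JsxBlock;
        }
    }
    LineKind::Markdown
}
```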
For the detailed technical spec, see src/mdx/mod.rs.
How it works
No AST. Block events stream from the scanner to the HTML writer with nothing in between.
Input bytes (&[u8])
│
▼
Block parser (line-oriented, memchr-driven)
│ emits BlockEvent stream
▼
Inline parser (mark collection → resolution → emit)
│ emits InlineEvent stream
▼
HTML writer (direct buffer writes)
│
▼
Output (Vec<u8>)
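To make "nothing in between" concrete, here is a toy version of the pipeline. A block parser pushes events into a callback, and an HTML writer consumes each event immediately, so no tree is ever built. `BlockEvent`, `parse_blocks`, `write_html`, and `escape_into` are hypothetical stand-ins. The toy skips the inline phase entirely and uses std's `position` where ferromark uses memchr.

```rust
use std::ops::Range;

/// Toy block events. `Text` carries a byte range into the input, not a copy.
enum BlockEvent {
    ParagraphStart,
    Text(Range<usize>),
    ParagraphEnd,
}

/// Toy block parser: every non-empty line becomes its own paragraph.
/// Events go straight to `emit`; nothing is collected.
fn parse_blocks(input: &[u8], mut emit: impl FnMut(BlockEvent)) {
    let mut pos = 0;
    while pos < input.len() {
        let end = input[pos..]
            .iter()
            .position(|&b| b == b'\n')
            .map_or(input.len(), |off| pos + off);
        if end > pos {
            emit(BlockEvent::ParagraphStart);
            emit(BlockEvent::Text(pos..end));
            emit(BlockEvent::ParagraphEnd);
        }
        pos = end + 1;
    }
}

/// HTML writer: appends to a caller-owned buffer as events arrive.
fn write_html(input: &[u8], out: &mut Vec<u8>) {
    parse_blocks(input, |event| match event {
        BlockEvent::ParagraphStart => out.extend_from_slice(b"<p>"),
        BlockEvent::Text(range) => escape_into(&input[range], out),
        BlockEvent::ParagraphEnd => out.extend_from_slice(b"</p>\n"),
    });
}

/// Minimal HTML escaping for text content.
fn escape_into(text: &[u8], out: &mut Vec<u8>) {
    for &b in text {
        match b {
            b'&' => out.extend_from_slice(b"&amp;"),
            b'<' => out.extend_from_slice(b"&lt;"),
            b'>' => out.extend_from_slice(b"&gt;"),
            b'"' => out.extend_from_slice(b"&quot;"),
            _ => out.push(b),
        }
    }
}
```

Because the writer only appends to `out`, the same buffer can be reused across documents, which is the pattern the buffer-reusing entry point in the quick start relies on.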
What makes this fast in practice:
- Block scanning runs on memchr for line boundaries. Container state is a compact stack, not a tree.
- Inline parsing has three phases: collect delimiter marks, resolve precedence (code spans, math, links, emphasis, strikethrough), emit. No backtracking.
- Emphasis resolution uses the CommonMark modulo-3 rule with a delimiter stack instead of expensive rescans.
- SIMD scanning (NEON on ARM) detects special characters in inline content.
- Zero-copy references: events carry Range offsets into the input, not copied strings.
- Compact events: 24 bytes each, cache-line friendly.
- Hot/cold annotation: #[inline] on tight loops, #[cold] on error paths, table-driven byte classification.
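The modulo-3 rule is CommonMark's restriction on pairing emphasis delimiter runs (rules 9 and 10 in the spec's emphasis section). If either run can both open and close emphasis, the two run lengths must not sum to a multiple of 3 unless both lengths are multiples of 3. A minimal sketch of just that predicate; `may_pair` is a hypothetical name, and a full resolver also checks that the delimiter characters match and walks the delimiter stack.

```rust
/// CommonMark "rule of 3" for pairing an opener run with a closer run.
/// `*_len` are the original run lengths; `*_both` is true when that run
/// can both open and close emphasis.
fn may_pair(opener_len: usize, opener_both: bool, closer_len: usize, closer_both: bool) -> bool {
    if !(opener_both || closer_both) {
        return true; // the rule only applies to runs that can do both
    }
    let sum_is_multiple = (opener_len + closer_len) % 3 == 0;
    let both_multiples = opener_len % 3 == 0 && closer_len % 3 == 0;
    !sum_is_multiple || both_multiples
}
```

In `*foo**bar*`, the `**` run can both open and close, and 1 + 2 = 3, so it cannot close the leading `*`; the final `*` closes it instead, giving `<em>foo**bar</em>`.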
Design principles
- Linear time. No regex, no backtracking, no quadratic blowup on adversarial input.
- Low allocation pressure. Compact events, range references, reusable output buffers.
- Operational safety. Depth and size limits guard against pathological nesting.
- Small dependency surface. Minimal crates, straightforward integration.
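A depth limit keeps the container stack bounded on adversarial input such as thousands of consecutive `>` markers. A sketch under stated assumptions: `MAX_QUOTE_DEPTH` and `quote_depth` are hypothetical (the real constants live in src/limits.rs), and CommonMark details like up to three spaces of indentation before `>` are ignored.

```rust
/// Hypothetical cap on blockquote nesting.
const MAX_QUOTE_DEPTH: usize = 32;

/// Counts leading `>` markers, each optionally followed by one space, but
/// never more than `max`. Returns (depth, byte offset where content starts);
/// markers beyond the cap stay in the content as plain text.
fn quote_depth(line: &str, max: usize) -> (usize, usize) {
    let bytes = line.as_bytes();
    let (mut depth, mut i) = (0, 0);
    while depth < max && i < bytes.len() && bytes[i] == b'>' {
        depth += 1;
        i += 1;
        if i < bytes.len() && bytes[i] == b' ' {
            i += 1;
        }
    }
    (depth, i)
}
```

Each byte is examined at most once, so the cap bounds memory for the container stack without giving up linear time.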
How ferromark compares to the other three top-tier parsers (pulldown-cmark, md4c, and comrak) across architecture, features, and output. Ratings use a 4-level heatmap focused on end-to-end Markdown-to-HTML throughput. Scoring is relative per row, so each row has at least one top mark.
Legend: 🟩 strongest 🟨 close behind 🟧 notable tradeoffs 🟥 weakest
Ferromark optimization backlog: docs/arch/ARCH-PLAN-001-performance-opportunities.md
Building
Project structure
src/
├── lib.rs # Public API (to_html, to_html_into, parse, Options)
├── main.rs # CLI binary
├── block/ # Block-level parser
│ ├── parser.rs # Line-oriented block parsing
│ └── event.rs # BlockEvent types
├── inline/ # Inline-level parser
│ ├── mod.rs # Three-phase inline parsing
│ ├── marks.rs # Mark collection + SIMD integration
│ ├── simd.rs # NEON SIMD character scanning
│ ├── event.rs # InlineEvent types
│ ├── code_span.rs
│ ├── emphasis.rs # Modulo-3 stack optimization
│ ├── strikethrough.rs # GFM strikethrough resolution
│ ├── math.rs # Math span resolution ($/$$ delimiters)
│ └── links.rs # Link/image/autolink parsing
├── mdx/ # MDX segmenter + renderer (feature = "mdx")
│ ├── mod.rs # Public API — Segment enum, segment(), render()
│ ├── render.rs # Assembly layer: segments → HTML body + ESM + front matter
│ ├── splitter.rs # Line-based state machine
│ ├── jsx_tag.rs # JSX tag boundary parser
│ └── expr.rs # Expression boundary parser (brace/string/comment tracking)
├── footnote.rs # Footnote store and rendering
├── link_ref.rs # Link reference definitions
├── cursor.rs # Pointer-based byte cursor
├── range.rs # Compact u32 range type
├── render.rs # HTML writer
├── escape.rs # HTML escaping (memchr-optimized)
└── limits.rs # DoS prevention constants
License
MIT -- Copyright 2026 Sebastian Software GmbH, Mainz, Germany