ferromark

Markdown to HTML at 309 MiB/s. Faster than pulldown-cmark, md4c (C), and comrak. Passes all 652 CommonMark spec tests. Every GFM extension included.

Quick start

let html = ferromark::to_html("# Hello\n\n**World**");

One function call, no setup. When allocation pressure matters:

let mut buffer = Vec::new();
ferromark::to_html_into("# Reuse me", &mut buffer);
// buffer survives across calls — zero repeated allocation
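The reuse pattern is worth spelling out. The sketch below uses a hypothetical stand-in, `render_into`, rather than ferromark itself, so it runs with the standard library alone. It clears the buffer explicitly before each call; whether `to_html_into` does that for you is not stated here, so treat the `clear()` as an assumption.

```rust
/// Hypothetical stand-in for a render-into-buffer function.
/// It only wraps the input in <p> tags; it is not ferromark.
fn render_into(src: &str, out: &mut Vec<u8>) {
    out.extend_from_slice(b"<p>");
    out.extend_from_slice(src.as_bytes());
    out.extend_from_slice(b"</p>\n");
}

fn main() {
    let docs = ["first document", "second", "third"];

    // Allocate once, up front.
    let mut buffer: Vec<u8> = Vec::with_capacity(64);
    let capacity = buffer.capacity();

    for doc in docs {
        buffer.clear(); // drops the contents, keeps the allocation
        render_into(doc, &mut buffer);

        // Every output here fits in the initial allocation, so the
        // buffer never had to grow.
        assert_eq!(buffer.capacity(), capacity);
        print!("{}", String::from_utf8_lossy(&buffer));
    }
}
```

`Vec::clear` resets the length but leaves the capacity alone, which is what makes a long-lived output buffer cheap.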

Benchmarks

Numbers, not adjectives. Apple Silicon (M-series), February 2026. All parsers run with GFM tables, strikethrough, and task lists enabled. Output buffers reused where APIs allow. Non-PGO binaries for a fair comparison.

CommonMark 5 KB (wiki-style, mixed content with tables)

Parser	Throughput	vs ferromark
ferromark	289.9 MiB/s	baseline
pulldown-cmark	247.7 MiB/s	0.85x
md4c (C)	242.3 MiB/s	0.84x
comrak	73.7 MiB/s	0.25x

CommonMark 50 KB (same style, scaled)

Parser	Throughput	vs ferromark
ferromark	309.3 MiB/s	baseline
pulldown-cmark	271.7 MiB/s	0.88x
md4c (C)	247.4 MiB/s	0.80x
comrak	76.0 MiB/s	0.25x

14–17% faster than pulldown-cmark. 20–25% faster than md4c. About 4x faster than comrak.
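The relative figures follow directly from the throughput columns. A small sketch of the arithmetic, using the numbers from the two tables above (the helper names `relative` and `percent_faster` are illustrative, not part of ferromark):

```rust
/// Throughput of `other` relative to `baseline` (the "vs ferromark" column).
fn relative(other: f64, baseline: f64) -> f64 {
    other / baseline
}

/// How much faster `baseline` is than `other`, in percent.
fn percent_faster(baseline: f64, other: f64) -> f64 {
    (baseline / other - 1.0) * 100.0
}

fn main() {
    // ferromark throughput in MiB/s: (5 KB fixture, 50 KB fixture).
    let ferromark = (289.9, 309.3);

    // (parser, 5 KB MiB/s, 50 KB MiB/s), copied from the tables.
    let others = [
        ("pulldown-cmark", 247.7, 271.7),
        ("md4c (C)", 242.3, 247.4),
        ("comrak", 73.7, 76.0),
    ];

    for (name, small, large) in others {
        println!(
            "{}: {:.2}x / {:.2}x of ferromark, ferromark is {:.0}% / {:.0}% faster",
            name,
            relative(small, ferromark.0),
            relative(large, ferromark.1),
            percent_faster(ferromark.0, small),
            percent_faster(ferromark.1, large),
        );
    }
}
```

The two readings of the same ratio are easy to confuse: 0.80x of ferromark's throughput means ferromark is 25% faster, not 20%.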

The fixtures are synthetic wiki-style documents with paragraphs, lists, code blocks, and tables. Nothing cherry-picked. Run them yourself: cargo bench --bench comparison

What you get

Full CommonMark: 652/652 spec tests pass. No filtering, no exceptions.

All five GFM extensions: Tables, strikethrough, task lists, autolink literals, disallowed raw HTML.

Beyond GFM: Footnotes, front matter extraction (---/+++), heading IDs (GitHub-compatible slugs), math spans ($/$$), and callouts (> [!NOTE], > [!WARNING], ...).

MDX support (opt-in via mdx feature): Segment and render .mdx files without a JavaScript toolchain. Covers 90%+ of real-world MDX patterns in Next.js, Docusaurus, and Astro.

12 feature flags to turn on exactly what you need:

allow_html · allow_link_refs · tables · strikethrough · task_lists
autolink_literals · disallowed_raw_html · footnotes · front_matter
heading_ids · math · callouts

Trade-offs

ferromark is built for one job: turning Markdown into HTML as fast as possible. That focus means it deliberately skips some things:

No AST access. You can't walk a syntax tree or write custom renderers against parsed nodes. If you need that, pulldown-cmark's iterator model or comrak's AST are better fits.
No source maps. No byte-offset tracking for mapping HTML back to Markdown positions.
HTML only. No XML, no CommonMark round-tripping, no alternative output formats.

These aren't planned. They'd compromise the streaming architecture that makes ferromark fast.

MDX support

MDX is the standard for component-driven docs in Next.js, Docusaurus, and Astro. Processing it usually requires a full JavaScript toolchain — Node.js, acorn, babel, the works.

ferromark takes a different approach: segment .mdx files into typed blocks and render them at native speed. No JS runtime. No AST.

ferromark = { version = "0.1", features = ["mdx"] }

Render — one call, full output

render() assembles the final output automatically: Markdown segments become HTML, JSX and expressions pass through unchanged, ESM and front matter are extracted separately.

use ferromark::mdx::render;

let input = r#"import { Card } from './card'

---
title: Hello
---

# Hello World

<Card title="Example">

Markdown **inside** a component.

</Card>

{new Date().getFullYear()}
"#;

let output = render(input);
// output.body        — HTML with JSX/expressions passed through
// output.esm         — vec!["import { Card } from './card'\n"]
// output.front_matter — Some("title: Hello\n")

Use render_with_options() for custom Markdown settings (heading IDs, math, footnotes, etc.).

Component — ready-to-use JSX module

to_component() wraps the output as a complete JSX/TSX module with a named export. Works with React 19, Preact, Solid, and any other JSX framework.

let output = render(input);
let tsx = output.to_component("HelloWorld");

import { Card } from './card'

export function HelloWorld() {
  return (
    <>
      <h1 id="hello-world">Hello World</h1>
      <Card title="Example">
        <p>Markdown <strong>inside</strong> a component.</p>
      </Card>
      {new Date().getFullYear()}
    </>
  );
}

Segment — low-level control

When you need full control over each block, use segment() directly:

use ferromark::mdx::{segment, Segment};

for seg in segment(input) {
    match seg {
        Segment::Esm(s)              => { /* import/export — pass through */ }
        Segment::Markdown(s)         => { /* parse with ferromark::to_html(s) */ }
        Segment::JsxBlockOpen(s)     => { /* <Component> */ }
        Segment::JsxBlockClose(s)    => { /* </Component> */ }
        Segment::JsxBlockSelfClose(s)=> { /* <Component /> */ }
        Segment::Expression(s)       => { /* {expression} */ }
    }
}

The segmenter handles JSX attribute parsing (strings, expressions, spreads), brace-depth tracking (with string/comment/template-literal awareness), fragment syntax, member expressions (<Foo.Bar>), and multiline tags. Invalid constructs fall back to Markdown — no panics, always valid output.
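To make the brace-depth idea concrete, here is a minimal sketch of finding where a `{...}` expression ends while ignoring braces inside strings and comments. It is not the code in src/mdx/expr.rs; `expression_end` is a hypothetical name, and the sketch simplifies on purpose: template literals are skipped as plain strings (no nested backticks inside `${}`), and regex literals are not recognized.

```rust
/// Returns the byte index just past the `}` that closes the expression
/// starting at `src[0] == '{'`, or `None` if it never closes.
fn expression_end(src: &str) -> Option<usize> {
    let bytes = src.as_bytes();
    if bytes.first() != Some(&b'{') {
        return None;
    }
    let mut depth = 0usize;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(i + 1);
                }
            }
            // String or template literal: skip to the matching quote,
            // stepping over backslash escapes.
            q @ (b'"' | b'\'' | b'`') => {
                i += 1;
                while i < bytes.len() && bytes[i] != q {
                    if bytes[i] == b'\\' {
                        i += 1;
                    }
                    i += 1;
                }
            }
            // Line comment: skip to the end of the line.
            b'/' if bytes.get(i + 1) == Some(&b'/') => {
                while i < bytes.len() && bytes[i] != b'\n' {
                    i += 1;
                }
            }
            // Block comment: skip to the closing "*/".
            b'/' if bytes.get(i + 1) == Some(&b'*') => {
                i += 2;
                while i + 1 < bytes.len() && !(bytes[i] == b'*' && bytes[i + 1] == b'/') {
                    i += 1;
                }
                i += 1; // land on '/', the shared step below moves past it
            }
            _ => {}
        }
        i += 1;
    }
    None // unbalanced: a segmenter would fall back to Markdown here
}

fn main() {
    let expr = "{new Date().getFullYear()}";
    let src = format!("{}\nMore *Markdown* here.", expr);
    assert_eq!(expression_end(&src), Some(expr.len()));

    // Braces inside strings and comments do not change the depth.
    assert_eq!(expression_end(r#"{ "}" } tail"#), Some(r#"{ "}" }"#.len()));
    assert_eq!(expression_end("{ a /* } */ } tail"), Some("{ a /* } */ }".len()));

    // Never closed: no expression boundary.
    assert_eq!(expression_end("{ unclosed"), None);
}
```

Returning `None` instead of panicking on unbalanced input mirrors the fallback behavior described above: anything that does not parse as an expression can simply be treated as Markdown.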

Full example: cargo run --features mdx --example mdx_segment

The segmenter covers the block-level MDX patterns that make up 90%+ of real-world .mdx files: imports at the top, components wrapping content, expressions between paragraphs. This is what a typical Docusaurus, Next.js, or Astro page looks like — and it works out of the box.

What the segmenter deliberately skips — and why that's fine for most use cases:

What	Our approach	When it matters
Inline JSX (`text <em>here</em>`)	Stays inside Markdown segments	Only if you mix JSX and prose on the same line inside a paragraph — rare in practice
JS validation	Heuristic detection (keyword + brace counting) instead of acorn/swc	Only if you need to report syntax errors in user-authored MDX at parse time
Markdown grammar	Standard CommonMark/GFM rules	Official mdxjs disables indented code and HTML syntax — relevant if your content relies on `<div>` being JSX, not HTML
Container nesting	`> <Component>` stays Markdown	Only if you put JSX inside blockquotes or list items — uncommon
TypeScript generics	`<Component<T>>` not parsed	Only relevant for TSX-heavy content pages — very rare in docs
Error reporting	Silent fallback to Markdown	Means broken JSX renders as text instead of failing — arguably safer for content pipelines

The full @mdx-js/mdx compiler exists to produce a React component tree from MDX. It needs a JavaScript parser because it compiles to JSX. ferromark's segmenter exists to answer a simpler question: where does the Markdown stop and the JSX start? That question doesn't need a JS runtime.

For the detailed technical spec, see src/mdx/mod.rs.

How it works

No AST. Events stream from the block parser through the inline parser to the HTML writer, with no intermediate tree.

Input bytes (&[u8])
       │
       ▼
   Block parser (line-oriented, memchr-driven)
       │ emits BlockEvent stream
       ▼
   Inline parser (mark collection → resolution → emit)
       │ emits InlineEvent stream
       ▼
   HTML writer (direct buffer writes)
       │
       ▼
   Output (Vec<u8>)
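A toy version of this pipeline shows the shape of the idea: events carry ranges into the input and are written to the output buffer as soon as they are produced, so no tree is ever built. Everything below (`Range`, `Event`, `write_event`, `render`) is a hypothetical sketch, not ferromark's actual types. It handles only `# ` headings and single-line paragraphs, and skips HTML escaping.

```rust
/// Compact half-open byte range into the input (illustrative stand-in).
#[derive(Clone, Copy)]
struct Range {
    start: u32,
    end: u32,
}

impl Range {
    fn slice(self, input: &[u8]) -> &[u8] {
        &input[self.start as usize..self.end as usize]
    }
}

/// Events reference the input by range; they never own text.
#[derive(Clone, Copy)]
enum Event {
    HeadingStart(u8),
    HeadingEnd(u8),
    ParagraphStart,
    ParagraphEnd,
    Text(Range),
}

/// Writes one event straight into the output buffer.
fn write_event(ev: Event, input: &[u8], out: &mut Vec<u8>) {
    match ev {
        Event::HeadingStart(level) => {
            out.extend_from_slice(b"<h");
            out.push(b'0' + level);
            out.push(b'>');
        }
        Event::HeadingEnd(level) => {
            out.extend_from_slice(b"</h");
            out.push(b'0' + level);
            out.extend_from_slice(b">\n");
        }
        Event::ParagraphStart => out.extend_from_slice(b"<p>"),
        Event::ParagraphEnd => out.extend_from_slice(b"</p>\n"),
        Event::Text(range) => out.extend_from_slice(range.slice(input)),
    }
}

/// Line-oriented toy scanner: each line's events go directly to the writer.
fn render(input: &[u8], out: &mut Vec<u8>) {
    let mut pos = 0usize;
    for line in input.split(|&b| b == b'\n') {
        let start = pos;
        pos += line.len() + 1; // +1 for the newline consumed by split
        if line.is_empty() {
            continue;
        }
        let end = (start + line.len()) as u32;
        let events = if line.starts_with(b"# ") {
            let text = Range { start: start as u32 + 2, end };
            [Event::HeadingStart(1), Event::Text(text), Event::HeadingEnd(1)]
        } else {
            let text = Range { start: start as u32, end };
            [Event::ParagraphStart, Event::Text(text), Event::ParagraphEnd]
        };
        for ev in events {
            write_event(ev, input, out);
        }
    }
}

fn main() {
    let mut out = Vec::new();
    render(b"# Hello\n\nWorld", &mut out);
    assert_eq!(out, b"<h1>Hello</h1>\n<p>World</p>\n".to_vec());
    print!("{}", String::from_utf8_lossy(&out));
}
```

The only heap allocation in the sketch is the output buffer itself; text is copied once, from the input slice to the output.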

What makes this fast in practice:

Block scanning runs on memchr for line boundaries. Container state is a compact stack, not a tree.
Inline parsing has three phases: collect delimiter marks, resolve precedence (code spans, math, links, emphasis, strikethrough), emit. No backtracking.
Emphasis resolution uses the CommonMark modulo-3 rule with a delimiter stack instead of expensive rescans.
SIMD scanning (NEON on ARM) detects special characters in inline content.
Zero-copy references: events carry compact ranges (byte offsets into the input), not copied strings.
Compact events: 24 bytes each, cache-line friendly.
Hot/cold annotation: #[inline] on hot functions, #[cold] on error paths, table-driven byte classification.
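The modulo-3 rule mentioned above comes from the CommonMark spec: when either delimiter run can both open and close emphasis, the pair is rejected if the sum of the run lengths is a multiple of 3, unless both lengths are multiples of 3. A minimal sketch of that single check follows. `Run` and `can_pair` are hypothetical names, and the sketch leaves out the delimiter stack itself.

```rust
/// One delimiter run, e.g. `**`, with its flanking properties.
#[derive(Clone, Copy)]
struct Run {
    ch: u8,          // b'*' or b'_'
    len: usize,      // original length of the run
    can_open: bool,  // left-flanking per the spec
    can_close: bool, // right-flanking per the spec
}

/// Can `opener` and `closer` form an emphasis pair?
fn can_pair(opener: Run, closer: Run) -> bool {
    if opener.ch != closer.ch || !opener.can_open || !closer.can_close {
        return false;
    }
    // The modulo-3 rule only applies when one side is ambiguous,
    // i.e. it can both open and close.
    let ambiguous = opener.can_close || closer.can_open;
    if ambiguous && (opener.len + closer.len) % 3 == 0 {
        return opener.len % 3 == 0 && closer.len % 3 == 0;
    }
    true
}

fn main() {
    // Delimiter runs in `*foo**bar*`, which the spec renders as
    // <em>foo**bar</em>.
    let first = Run { ch: b'*', len: 1, can_open: true, can_close: false };
    let middle = Run { ch: b'*', len: 2, can_open: true, can_close: true };
    let last = Run { ch: b'*', len: 1, can_open: false, can_close: true };

    // 1 + 2 = 3 with an ambiguous run: rejected in both directions.
    assert!(!can_pair(first, middle));
    assert!(!can_pair(middle, last));

    // The outer runs are unambiguous, so they pair up.
    assert!(can_pair(first, last));
}
```

Because the check is a constant-time test on two stack entries, a closer can walk the stack for a compatible opener without rescanning the text.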

Design principles

Linear time. No regex, no backtracking, no quadratic blowup on adversarial input.
Low allocation pressure. Compact events, range references, reusable output buffers.
Operational safety. Depth and size limits guard against pathological nesting.
Small dependency surface. Minimal crates, straightforward integration.
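As an illustration of the depth-limit principle, here is a small sketch of a container stack that refuses to grow past a fixed bound. The constant `MAX_NESTING`, its value, and the `ContainerStack` type are assumptions made for the example; ferromark's real limits live in src/limits.rs.

```rust
/// Hypothetical nesting limit, chosen for the example only.
const MAX_NESTING: usize = 32;

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Container {
    BlockQuote,
    ListItem,
}

/// Compact stack of open containers with a hard depth bound.
struct ContainerStack {
    items: Vec<Container>,
}

impl ContainerStack {
    fn new() -> Self {
        Self { items: Vec::with_capacity(MAX_NESTING) }
    }

    /// Returns false once the limit is reached; the caller then treats
    /// the remaining markers as plain text instead of nesting deeper.
    fn try_push(&mut self, container: Container) -> bool {
        if self.items.len() >= MAX_NESTING {
            return false;
        }
        self.items.push(container);
        true
    }

    fn depth(&self) -> usize {
        self.items.len()
    }
}

/// Toy scanner: opens one blockquote per leading `>` marker, up to the limit.
fn open_blockquotes(line: &str) -> ContainerStack {
    let mut stack = ContainerStack::new();
    let mut rest = line.trim_start();
    while let Some(after) = rest.strip_prefix('>') {
        if !stack.try_push(Container::BlockQuote) {
            break;
        }
        rest = after.trim_start();
    }
    stack
}

fn main() {
    assert_eq!(open_blockquotes("> > quoted").depth(), 2);

    // Adversarial input: ten thousand nested quotes stay bounded.
    let hostile = ">".repeat(10_000);
    assert_eq!(open_blockquotes(&hostile).depth(), MAX_NESTING);
}
```

With the bound in place, both memory use and the work done per line stay independent of how deeply an attacker nests the input.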

How ferromark compares to the other three top-tier parsers across architecture, features, and output. Ratings use a 4-level heatmap focused on end-to-end Markdown-to-HTML throughput. Scoring is relative per row, so each row has at least one top mark.

Legend: 🟩 strongest 🟨 close behind 🟧 notable tradeoffs 🟥 weakest

ferromark optimization backlog: docs/arch/ARCH-PLAN-001-performance-opportunities.md

Building

cargo build            # development
cargo build --release  # optimized (recommended for benchmarks)
cargo test             # run tests
cargo test --test commonmark_spec -- --nocapture  # CommonMark spec
cargo bench            # benchmarks

Project structure

src/
├── lib.rs          # Public API (to_html, to_html_into, parse, Options)
├── main.rs         # CLI binary
├── block/          # Block-level parser
│   ├── parser.rs   # Line-oriented block parsing
│   └── event.rs    # BlockEvent types
├── inline/         # Inline-level parser
│   ├── mod.rs      # Three-phase inline parsing
│   ├── marks.rs    # Mark collection + SIMD integration
│   ├── simd.rs     # NEON SIMD character scanning
│   ├── event.rs    # InlineEvent types
│   ├── code_span.rs
│   ├── emphasis.rs      # Modulo-3 stack optimization
│   ├── strikethrough.rs # GFM strikethrough resolution
│   ├── math.rs          # Math span resolution ($/$$ delimiters)
│   └── links.rs         # Link/image/autolink parsing
├── mdx/            # MDX segmenter + renderer (feature = "mdx")
│   ├── mod.rs      # Public API — Segment enum, segment(), render()
│   ├── render.rs   # Assembly layer: segments → HTML body + ESM + front matter
│   ├── splitter.rs # Line-based state machine
│   ├── jsx_tag.rs  # JSX tag boundary parser
│   └── expr.rs     # Expression boundary parser (brace/string/comment tracking)
├── footnote.rs     # Footnote store and rendering
├── link_ref.rs     # Link reference definitions
├── cursor.rs       # Pointer-based byte cursor
├── range.rs        # Compact u32 range type
├── render.rs       # HTML writer
├── escape.rs       # HTML escaping (memchr-optimized)
└── limits.rs       # DoS prevention constants

ferromark 0.1.3