ferromark 0.1.3

Ultra-high-performance Markdown to HTML compiler

ferromark

License: MIT · Rust 1.85+

Markdown to HTML at 309 MiB/s. Faster than pulldown-cmark, md4c (C), and comrak. Passes all 652 CommonMark spec tests. Every GFM extension included.

Quick start

let html = ferromark::to_html("# Hello\n\n**World**");

One function call, no setup. When allocation pressure matters:

let mut buffer = Vec::new();
ferromark::to_html_into("# Reuse me", &mut buffer);
// reuse the same buffer across calls so the output allocation is not repeated
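
The buffer-reuse idea can be sketched with the standard library alone. In the snippet below, `render_into` is a hypothetical stand-in, not ferromark's code, and it assumes the function clears the buffer before writing. The source does not state whether `to_html_into` clears or appends, so check the crate docs before relying on either behavior.

```rust
/// Hypothetical stand-in for a `to_html_into`-style function (not ferromark's code).
/// Assumption: it clears the buffer, then writes the new output into it.
fn render_into(input: &str, out: &mut Vec<u8>) {
    out.clear(); // drops the old contents but keeps the allocated capacity
    out.extend_from_slice(b"<p>");
    out.extend_from_slice(input.as_bytes());
    out.extend_from_slice(b"</p>\n");
}

/// Renders every document through one shared buffer and returns the
/// total number of output bytes produced.
fn total_rendered_bytes(docs: &[&str]) -> usize {
    let mut buffer = Vec::with_capacity(4096); // one allocation, made up front
    let mut total = 0;
    for doc in docs {
        render_into(doc, &mut buffer);
        // A real pipeline would write `buffer` to a file or socket here,
        // before the next call overwrites it.
        total += buffer.len();
    }
    total
}
```

As long as each rendered page fits in the initial capacity, `Vec::clear` keeps the allocation, so the loop performs no further heap allocation for output.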

Benchmarks

Numbers, not adjectives. Apple Silicon (M-series), February 2026. All parsers run with GFM tables, strikethrough, and task lists enabled. Output buffers reused where APIs allow. Non-PGO binaries for a fair comparison.

CommonMark 5 KB (wiki-style, mixed content with tables)

Parser         | Throughput  | vs ferromark
ferromark      | 289.9 MiB/s | baseline
pulldown-cmark | 247.7 MiB/s | 0.85x
md4c (C)       | 242.3 MiB/s | 0.84x
comrak         | 73.7 MiB/s  | 0.25x

CommonMark 50 KB (same style, scaled)

Parser         | Throughput  | vs ferromark
ferromark      | 309.3 MiB/s | baseline
pulldown-cmark | 271.7 MiB/s | 0.88x
md4c (C)       | 247.4 MiB/s | 0.80x
comrak         | 76.0 MiB/s  | 0.25x

Across both fixtures: 14-17% faster than pulldown-cmark, 20-25% faster than md4c, and about 4x faster than comrak.
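
Both the "vs ferromark" column and the "faster than" percentages come from the same pair of throughput numbers. A small sketch of the arithmetic, using figures from the tables above (the function names are illustrative):

```rust
/// Relative throughput of `other` against `baseline`
/// (the "vs ferromark" column, e.g. 0.85x).
fn relative(other_mib_s: f64, baseline_mib_s: f64) -> f64 {
    other_mib_s / baseline_mib_s
}

/// How much faster `baseline` is than `other`, in percent.
fn percent_faster(baseline_mib_s: f64, other_mib_s: f64) -> f64 {
    (baseline_mib_s / other_mib_s - 1.0) * 100.0
}
```

For the 5 KB fixture, `relative(247.7, 289.9)` rounds to 0.85 and `percent_faster(289.9, 247.7)` rounds to 17. For the 50 KB fixture, `percent_faster(309.3, 271.7)` rounds to 14.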

The fixtures are synthetic wiki-style documents with paragraphs, lists, code blocks, and tables. Nothing cherry-picked. Run them yourself: cargo bench --bench comparison

What you get

Full CommonMark: 652/652 spec tests pass. No filtering, no exceptions.

All five GFM extensions: Tables, strikethrough, task lists, autolink literals, disallowed raw HTML.

Beyond GFM: Footnotes, front matter extraction (---/+++), heading IDs (GitHub-compatible slugs), math spans ($/$$), and callouts (> [!NOTE], > [!WARNING], ...).

MDX support (opt-in via mdx feature): Segment and render .mdx files without a JavaScript toolchain. Covers 90%+ of real-world MDX patterns in Next.js, Docusaurus, and Astro.

12 feature flags to turn on exactly what you need:

allow_html · allow_link_refs · tables · strikethrough · task_lists
autolink_literals · disallowed_raw_html · footnotes · front_matter
heading_ids · math · callouts

Trade-offs

ferromark is built for one job: turning Markdown into HTML as fast as possible. That focus means some things it deliberately skips:

  • No AST access. You can't walk a syntax tree or write custom renderers against parsed nodes. If you need that, pulldown-cmark's iterator model or comrak's AST are better fits.
  • No source maps. No byte-offset tracking for mapping HTML back to Markdown positions.
  • HTML only. No XML, no CommonMark round-tripping, no alternative output formats.

These aren't planned. They'd compromise the streaming architecture that makes ferromark fast.

MDX support

MDX is the standard for component-driven docs in Next.js, Docusaurus, and Astro. Processing it usually requires a full JavaScript toolchain — Node.js, acorn, babel, the works.

ferromark takes a different approach: segment .mdx files into typed blocks and render them at native speed. No JS runtime. No AST.

ferromark = { version = "0.1", features = ["mdx"] }

Render — one call, full output

render() assembles the final output automatically: Markdown segments become HTML, JSX and expressions pass through unchanged, ESM and front matter are extracted separately.

use ferromark::mdx::render;

let input = r#"import { Card } from './card'

---
title: Hello
---

# Hello World

<Card title="Example">

Markdown **inside** a component.

</Card>

{new Date().getFullYear()}
"#;

let output = render(input);
// output.body        — HTML with JSX/expressions passed through
// output.esm         — vec!["import { Card } from './card'\n"]
// output.front_matter — Some("title: Hello\n")

Use render_with_options() for custom Markdown settings (heading IDs, math, footnotes, etc.).

Component — ready-to-use JSX module

to_component() wraps the output as a complete JSX/TSX module with a named export. Works with React 19, Preact, Solid, and any other JSX framework.

let output = render(input);
let tsx = output.to_component("HelloWorld");

Generated module:

import { Card } from './card'

export function HelloWorld() {
  return (
    <>
      <h1 id="hello-world">Hello World</h1>
      <Card title="Example">
        <p>Markdown <strong>inside</strong> a component.</p>
      </Card>
      {new Date().getFullYear()}
    </>
  );
}

Segment — low-level control

When you need full control over each block, use segment() directly:

use ferromark::mdx::{segment, Segment};

for seg in segment(input) {
    match seg {
        Segment::Esm(s)              => { /* import/export — pass through */ }
        Segment::Markdown(s)         => { /* parse with ferromark::to_html(s) */ }
        Segment::JsxBlockOpen(s)     => { /* <Component> */ }
        Segment::JsxBlockClose(s)    => { /* </Component> */ }
        Segment::JsxBlockSelfClose(s)=> { /* <Component /> */ }
        Segment::Expression(s)       => { /* {expression} */ }
    }
}

The segmenter handles JSX attribute parsing (strings, expressions, spreads), brace-depth tracking (with string/comment/template-literal awareness), fragment syntax, member expressions (<Foo.Bar>), and multiline tags. Invalid constructs fall back to Markdown — no panics, always valid output.
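
Brace-depth tracking with string and comment awareness can be illustrated in a few lines. This is a simplified sketch, not the code in src/mdx/expr.rs: `find_expression_end` is a hypothetical helper, it treats template literals as opaque strings (so `${...}` nesting is not tracked), and it does not recognize regex literals.

```rust
/// Returns the byte index of the `}` that closes the `{` at the start of `src`,
/// or `None` if `src` does not start with `{` or the expression never closes.
fn find_expression_end(src: &str) -> Option<usize> {
    let bytes = src.as_bytes();
    if bytes.first() != Some(&b'{') {
        return None;
    }
    let mut depth = 0usize;
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(i);
                }
            }
            // String and template literals: skip to the matching quote,
            // stepping over backslash escapes. Braces inside do not count.
            q @ (b'"' | b'\'' | b'`') => {
                i += 1;
                while i < bytes.len() && bytes[i] != q {
                    if bytes[i] == b'\\' {
                        i += 1;
                    }
                    i += 1;
                }
            }
            // Line comment: skip to the end of the line.
            b'/' if bytes.get(i + 1) == Some(&b'/') => {
                while i < bytes.len() && bytes[i] != b'\n' {
                    i += 1;
                }
            }
            // Block comment: skip to the closing `*/`.
            b'/' if bytes.get(i + 1) == Some(&b'*') => {
                i += 2;
                while i + 1 < bytes.len() && !(bytes[i] == b'*' && bytes[i + 1] == b'/') {
                    i += 1;
                }
                i += 1; // now on the `/`; the shared increment below steps past it
            }
            _ => {}
        }
        i += 1;
    }
    None
}
```

A caller can treat the text as Markdown whenever this returns `None`, which mirrors the fallback behavior described above.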

Full example: cargo run --features mdx --example mdx_segment

The segmenter covers the block-level MDX patterns that make up 90%+ of real-world .mdx files: imports at the top, components wrapping content, expressions between paragraphs. This is what a typical Docusaurus, Next.js, or Astro page looks like — and it works out of the box.

What the segmenter deliberately skips — and why that's fine for most use cases:

What | Our approach | When it matters
Inline JSX (text <em>here</em>) | Stays inside Markdown segments | Only if you mix JSX and prose on the same line inside a paragraph; rare in practice
JS validation | Heuristic detection (keyword + brace counting) instead of acorn/swc | Only if you need to report syntax errors in user-authored MDX at parse time
Markdown grammar | Standard CommonMark/GFM rules | Official mdxjs disables indented code and HTML syntax; relevant if your content relies on <div> being JSX, not HTML
Container nesting | > <Component> stays Markdown | Only if you put JSX inside blockquotes or list items; uncommon
TypeScript generics | <Component<T>> not parsed | Only relevant for TSX-heavy content pages; very rare in docs
Error reporting | Silent fallback to Markdown | Broken JSX renders as text instead of failing; arguably safer for content pipelines

The full @mdx-js/mdx compiler exists to produce a React component tree from MDX. It needs a JavaScript parser because it compiles to JSX. ferromark's segmenter exists to answer a simpler question: where does the Markdown stop and the JSX start? That question doesn't need a JS runtime.

For the detailed technical spec, see src/mdx/mod.rs.

How it works

No AST. Block events stream from the scanner to the HTML writer with nothing in between.

Input bytes (&[u8])
       │
       ▼
   Block parser (line-oriented, memchr-driven)
       │ emits BlockEvent stream
       ▼
   Inline parser (mark collection → resolution → emit)
       │ emits InlineEvent stream
       ▼
   HTML writer (direct buffer writes)
       │
       ▼
   Output (Vec<u8>)
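
The pipeline above can be reduced to a toy model to show what "no AST" means in practice. The sketch below is not ferromark's implementation: the `Event` enum, `scan_blocks`, `write_event`, and `toy_to_html` are invented for illustration, and the scanner only knows level-1 headings and paragraphs, with no inline parsing or HTML escaping.

```rust
use std::io::Write;

/// Illustrative event type; ferromark's real BlockEvent/InlineEvent types differ.
enum Event<'a> {
    HeadingStart(u8),
    HeadingEnd(u8),
    ParagraphStart,
    ParagraphEnd,
    Text(&'a str),
}

/// Writes one event straight into the output buffer. No node is stored.
fn write_event(event: &Event<'_>, out: &mut Vec<u8>) {
    match event {
        Event::HeadingStart(level) => write!(out, "<h{}>", level).unwrap(),
        Event::HeadingEnd(level) => write!(out, "</h{}>\n", level).unwrap(),
        Event::ParagraphStart => out.extend_from_slice(b"<p>"),
        Event::ParagraphEnd => out.extend_from_slice(b"</p>\n"),
        Event::Text(text) => out.extend_from_slice(text.as_bytes()),
    }
}

/// Toy line-oriented scanner: hands each event to `emit` as soon as it is
/// recognized instead of collecting events into a tree.
fn scan_blocks<'a>(input: &'a str, mut emit: impl FnMut(Event<'a>)) {
    for line in input.lines().filter(|l| !l.trim().is_empty()) {
        if let Some(text) = line.strip_prefix("# ") {
            emit(Event::HeadingStart(1));
            emit(Event::Text(text));
            emit(Event::HeadingEnd(1));
        } else {
            emit(Event::ParagraphStart);
            emit(Event::Text(line));
            emit(Event::ParagraphEnd);
        }
    }
}

/// Connects the scanner directly to the writer.
fn toy_to_html(input: &str) -> String {
    let mut out = Vec::new();
    scan_blocks(input, |event| write_event(&event, &mut out));
    String::from_utf8(out).expect("writer only emits UTF-8")
}
```

Because every event is consumed the moment it is produced, memory use in this model is bounded by the output buffer rather than by the size of an intermediate tree.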

What makes this fast in practice:

  • Block scanning runs on memchr for line boundaries. Container state is a compact stack, not a tree.
  • Inline parsing has three phases: collect delimiter marks, resolve precedence (code spans, math, links, emphasis, strikethrough), emit. No backtracking.
  • Emphasis resolution uses the CommonMark modulo-3 rule with a delimiter stack instead of expensive rescans.
  • SIMD scanning (NEON on ARM) detects special characters in inline content.
  • Zero-copy references: events carry compact u32 byte ranges (Range) into the input, not copied strings.
  • Compact events: 24 bytes each, cache-line friendly.
  • Hot/cold annotation: #[inline] on functions called from tight loops, #[cold] on error paths, plus table-driven byte classification.
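
The modulo-3 rule mentioned in the list comes from the CommonMark emphasis rules: if either delimiter run can both open and close emphasis, the two runs may not pair when the sum of their lengths is a multiple of 3, unless both lengths are multiples of 3. A minimal sketch of that check follows. The `DelimRun` struct and `may_pair` function are illustrative names, not ferromark's types, and the open/close flags are taken as given rather than computed from the flanking rules.

```rust
/// One run of `*` or `_` characters, as seen by the emphasis resolver.
#[derive(Clone, Copy)]
struct DelimRun {
    len: usize,      // original number of delimiter characters in the run
    can_open: bool,  // run may open emphasis
    can_close: bool, // run may close emphasis
}

/// CommonMark "rule of 3": decides whether `opener` and `closer` are allowed
/// to form an emphasis pair, based only on run lengths and open/close flags.
fn may_pair(opener: DelimRun, closer: DelimRun) -> bool {
    let either_is_both =
        (opener.can_open && opener.can_close) || (closer.can_open && closer.can_close);
    if !either_is_both {
        return true; // the rule only applies when a run can do both
    }
    let sum_is_multiple_of_3 = (opener.len + closer.len) % 3 == 0;
    let both_multiples_of_3 = opener.len % 3 == 0 && closer.len % 3 == 0;
    !sum_is_multiple_of_3 || both_multiples_of_3
}
```

In `*foo**bar*`, the leading `*` (length 1) cannot pair with the middle `**` (length 2, able to both open and close) because 1 + 2 = 3, so the string renders as a single emphasis around `foo**bar`.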

Design principles

  • Linear time. No regex, no backtracking, no quadratic blowup on adversarial input.
  • Low allocation pressure. Compact events, range references, reusable output buffers.
  • Operational safety. Depth and size limits guard against pathological nesting.
  • Small dependency surface. Minimal crates, straightforward integration.
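
Range references and compact events can be sketched as follows. The types here are illustrative and only loosely modeled on the description above (a compact u32 range, events that point into the input); ferromark's actual `Range` and event layouts may differ.

```rust
/// Half-open byte range into the input, stored as two u32 offsets (8 bytes).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Range {
    start: u32,
    end: u32,
}

impl Range {
    /// Returns `None` when an offset does not fit in u32, which doubles as
    /// an input size limit.
    fn new(start: usize, end: usize) -> Option<Range> {
        if start > end || end > u32::MAX as usize {
            return None;
        }
        Some(Range { start: start as u32, end: end as u32 })
    }

    /// Borrows the referenced bytes from the input; nothing is copied.
    fn slice<'a>(&self, input: &'a [u8]) -> &'a [u8] {
        &input[self.start as usize..self.end as usize]
    }
}

/// Illustrative event: carries offsets into the input, never owned text.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Event {
    Text(Range),
    BlankLine,
}

/// Emits one event per line of input, recording only where each line lives.
fn line_events(input: &[u8]) -> Option<Vec<Event>> {
    let mut events = Vec::new();
    let mut start = 0;
    for line in input.split(|&b| b == b'\n') {
        let end = start + line.len();
        if line.is_empty() {
            events.push(Event::BlankLine);
        } else {
            events.push(Event::Text(Range::new(start, end)?));
        }
        start = end + 1; // step over the newline
    }
    Some(events)
}
```

On a 64-bit target, this `Range` occupies 8 bytes where a `&[u8]` slice takes 16, which is one way to keep events small without copying any text.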

How ferromark compares to the other three top-tier parsers across architecture, features, and output. Ratings use a 4-level heatmap focused on end-to-end Markdown-to-HTML throughput. Scoring is relative per row, so each row has at least one top mark.

Legend: 🟩 strongest   🟨 close behind   🟧 notable tradeoffs   🟥 weakest

ferromark optimization backlog: docs/arch/ARCH-PLAN-001-performance-opportunities.md

Building

cargo build            # development
cargo build --release  # optimized (recommended for benchmarks)
cargo test             # run tests
cargo test --test commonmark_spec -- --nocapture  # CommonMark spec
cargo bench            # benchmarks

Project structure

src/
├── lib.rs          # Public API (to_html, to_html_into, parse, Options)
├── main.rs         # CLI binary
├── block/          # Block-level parser
│   ├── parser.rs   # Line-oriented block parsing
│   └── event.rs    # BlockEvent types
├── inline/         # Inline-level parser
│   ├── mod.rs      # Three-phase inline parsing
│   ├── marks.rs    # Mark collection + SIMD integration
│   ├── simd.rs     # NEON SIMD character scanning
│   ├── event.rs    # InlineEvent types
│   ├── code_span.rs
│   ├── emphasis.rs      # Modulo-3 stack optimization
│   ├── strikethrough.rs # GFM strikethrough resolution
│   ├── math.rs          # Math span resolution ($/$$ delimiters)
│   └── links.rs         # Link/image/autolink parsing
├── mdx/            # MDX segmenter + renderer (feature = "mdx")
│   ├── mod.rs      # Public API — Segment enum, segment(), render()
│   ├── render.rs   # Assembly layer: segments → HTML body + ESM + front matter
│   ├── splitter.rs # Line-based state machine
│   ├── jsx_tag.rs  # JSX tag boundary parser
│   └── expr.rs     # Expression boundary parser (brace/string/comment tracking)
├── footnote.rs     # Footnote store and rendering
├── link_ref.rs     # Link reference definitions
├── cursor.rs       # Pointer-based byte cursor
├── range.rs        # Compact u32 range type
├── render.rs       # HTML writer
├── escape.rs       # HTML escaping (memchr-optimized)
└── limits.rs       # DoS prevention constants

License

MIT -- Copyright 2026 Sebastian Software GmbH, Mainz, Germany