ferromark

Markdown to HTML at 309 MiB/s. Faster than pulldown-cmark, md4c (C), and comrak. Passes all 652 CommonMark spec tests. Every GFM extension included.

Quick start

let html = ferromark::to_html("# Hello\n\n**World**");

One function call, no setup. When allocation pressure matters:

let mut buffer = Vec::new();
ferromark::to_html_into("# Reuse me", &mut buffer);
// reuse `buffer` across calls so the output allocation is not repeated each time

Benchmarks

Numbers, not adjectives. Apple Silicon (M-series), February 2026. All parsers run with GFM tables, strikethrough, and task lists enabled. Output buffers reused where APIs allow. Non-PGO binaries for a fair comparison.

CommonMark 5 KB (wiki-style, mixed content with tables)

Parser	Throughput	vs ferromark
ferromark	289.9 MiB/s	baseline
pulldown-cmark	247.7 MiB/s	0.85x
md4c (C)	242.3 MiB/s	0.84x
comrak	73.7 MiB/s	0.25x

CommonMark 50 KB (same style, scaled)

Parser	Throughput	vs ferromark
ferromark	309.3 MiB/s	baseline
pulldown-cmark	271.7 MiB/s	0.88x
md4c (C)	247.4 MiB/s	0.80x
comrak	76.0 MiB/s	0.25x

14 to 17% faster than pulldown-cmark. 20 to 25% faster than md4c. Roughly 4x faster than comrak.

The fixtures are synthetic wiki-style documents with paragraphs, lists, code blocks, and tables. Nothing cherry-picked. Run them yourself: cargo bench --bench comparison
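The relative figures are plain ratios of the throughputs in the two tables. Here is a small sketch of that arithmetic, with the numbers copied from the tables. The `speedup` helper is illustrative and not part of the crate.

```rust
/// Ratio of ferromark's throughput to another parser's; above 1.0 means ferromark is faster.
fn speedup(ferromark_mib_s: f64, other_mib_s: f64) -> f64 {
    ferromark_mib_s / other_mib_s
}

fn main() {
    // (fixture, ferromark MiB/s, [(parser, MiB/s)]) copied from the benchmark tables.
    let fixtures = [
        ("5 KB", 289.9, [("pulldown-cmark", 247.7), ("md4c", 242.3), ("comrak", 73.7)]),
        ("50 KB", 309.3, [("pulldown-cmark", 271.7), ("md4c", 247.4), ("comrak", 76.0)]),
    ];
    for (fixture, ferromark, others) in fixtures {
        for (parser, mib_s) in others {
            let s = speedup(ferromark, mib_s);
            // The "vs ferromark" column is the reciprocal 1/s; "% faster" is (s - 1) * 100.
            println!(
                "{fixture:>5} {parser:<14} {:.2}x vs ferromark, ferromark +{:.0}%",
                1.0 / s,
                (s - 1.0) * 100.0
            );
        }
    }
}
```

On the 5 KB fixture, ferromark comes out about 17% ahead of pulldown-cmark and 20% ahead of md4c. On the 50 KB fixture, the margins are about 14% and 25%.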

What you get

Full CommonMark: 652/652 spec tests pass. No filtering, no exceptions.

All five GFM extensions: Tables, strikethrough, task lists, autolink literals, disallowed raw HTML.
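"Disallowed raw HTML" is GFM's tagfilter. In raw HTML, the leading `<` of nine specific tags (`title`, `textarea`, `style`, `xmp`, `iframe`, `noembed`, `noframes`, `script`, `plaintext`) is replaced with `&lt;`. The README does not show how ferromark implements this. The following standalone sketch of the spec rule is simplified: it assumes a tag name ends at whitespace, `>`, `/`, or end of input. `filter_raw_html` is a hypothetical name.

```rust
/// Tag names filtered by GFM's tagfilter extension.
const DISALLOWED: [&str; 9] = [
    "title", "textarea", "style", "xmp", "iframe",
    "noembed", "noframes", "script", "plaintext",
];

/// True if `after_lt` (the bytes following a `<`) opens or closes a disallowed tag.
fn is_disallowed_tag(after_lt: &[u8]) -> bool {
    // Skip the `/` of a closing tag such as `</script>`.
    let name = match after_lt.first() {
        Some(&b'/') => &after_lt[1..],
        _ => after_lt,
    };
    DISALLOWED.iter().any(|tag| {
        let n = tag.len();
        name.len() >= n
            && name[..n].eq_ignore_ascii_case(tag.as_bytes())
            // Simplification: the tag name must end at whitespace, `>`, `/`, or end of input.
            && name.get(n).map_or(true, |&b| b.is_ascii_whitespace() || b == b'>' || b == b'/')
    })
}

/// Replaces the leading `<` of each disallowed tag in raw HTML with `&lt;`.
fn filter_raw_html(html: &str) -> String {
    let bytes = html.as_bytes();
    let mut out = String::with_capacity(html.len());
    let mut copied = 0; // bytes of `html` already copied to `out`
    for (i, &b) in bytes.iter().enumerate() {
        if b == b'<' && is_disallowed_tag(&bytes[i + 1..]) {
            out.push_str(&html[copied..i]);
            out.push_str("&lt;");
            copied = i + 1;
        }
    }
    out.push_str(&html[copied..]);
    out
}

fn main() {
    println!("{}", filter_raw_html("<strong> <title> <style> <em>"));
}
```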

Beyond GFM: Footnotes, front matter extraction (---/+++), heading IDs (GitHub-compatible slugs), math spans ($/$$), and callouts (> [!NOTE], > [!WARNING], ...).
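For heading IDs, "GitHub-compatible" usually refers to the behaviour of the github-slugger package. It lowercases the text, drops punctuation, turns spaces into hyphens, and suffixes repeats with `-1`, `-2`, and so on. The sketch below approximates that scheme using Rust's `char::is_alphanumeric`. It reflects an assumption about the general approach, not ferromark's code, and it ignores Unicode edge cases (combining marks, emoji) that the real slugger handles. `slugify` and `SlugRegistry` are hypothetical names.

```rust
use std::collections::HashMap;

/// Simplified GitHub-style slug: lowercase, keep letters, digits, `-` and `_`,
/// turn each space into `-`, and drop everything else.
fn slugify(heading: &str) -> String {
    let mut slug = String::with_capacity(heading.len());
    for c in heading.chars().flat_map(char::to_lowercase) {
        if c.is_alphanumeric() || c == '-' || c == '_' {
            slug.push(c);
        } else if c == ' ' {
            slug.push('-');
        }
    }
    slug
}

/// Hands out unique IDs per document: repeats get `-1`, `-2`, ...
#[derive(Default)]
struct SlugRegistry {
    /// Every slug handed out so far, with a repeat counter for base slugs.
    seen: HashMap<String, usize>,
}

impl SlugRegistry {
    fn unique(&mut self, heading: &str) -> String {
        let base = slugify(heading);
        let mut candidate = base.clone();
        while self.seen.contains_key(&candidate) {
            // `base` is always present here: the first collision is with `base` itself.
            let count = self.seen.get_mut(&base).expect("base slug recorded");
            *count += 1;
            candidate = format!("{base}-{count}");
        }
        self.seen.insert(candidate.clone(), 0);
        candidate
    }
}

fn main() {
    let mut ids = SlugRegistry::default();
    for heading in ["Quick start", "Setup", "Setup", "What's new?"] {
        println!("{}", ids.unique(heading));
    }
}
```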

12 feature flags to turn on exactly what you need:

allow_html · allow_link_refs · tables · strikethrough · task_lists
autolink_literals · disallowed_raw_html · footnotes · front_matter
heading_ids · math · callouts

Trade-offs

ferromark is built for one job: turning Markdown into HTML as fast as possible. That focus means some things it deliberately skips:

No AST access. You can't walk a syntax tree or write custom renderers against parsed nodes. If you need that, pulldown-cmark's iterator model or comrak's AST are better fits.
No source maps. No byte-offset tracking for mapping HTML back to Markdown positions.
HTML only. No XML, no CommonMark round-tripping, no alternative output formats.

These aren't planned. They'd compromise the streaming architecture that makes ferromark fast.

How it works

No AST. Events stream from the block scanner through the inline parser to the HTML writer without a tree being built in between.

Input bytes (&[u8])
       │
       ▼
   Block parser (line-oriented, memchr-driven)
       │ emits BlockEvent stream
       ▼
   Inline parser (mark collection → resolution → emit)
       │ emits InlineEvent stream
       ▼
   HTML writer (direct buffer writes)
       │
       ▼
   Output (Vec<u8>)
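Here is a toy version of that flow that makes "no AST" concrete. The scanner hands each event to a callback, and the writer turns it into bytes immediately. Text events carry only a byte range into the input. `Event`, `ByteRange`, `scan`, and `write_event` are hypothetical stand-ins, far simpler than ferromark's `BlockEvent`/`InlineEvent` types. The scanner also treats every non-empty line as its own paragraph and skips the inline stage entirely.

```rust
/// Compact byte range into the input (hypothetical stand-in, 8 bytes).
#[derive(Clone, Copy)]
struct ByteRange {
    start: u32,
    end: u32,
}

/// Toy event: paragraph boundaries plus text that points into the input.
enum Event {
    ParagraphStart,
    Text(ByteRange),
    ParagraphEnd,
}

/// Toy block scanner: every non-empty line becomes its own paragraph.
/// Events go straight to `emit`; nothing is stored.
fn scan(input: &[u8], mut emit: impl FnMut(Event)) {
    let mut start = 0;
    while start < input.len() {
        let end = input[start..]
            .iter()
            .position(|&b| b == b'\n')
            .map_or(input.len(), |p| start + p);
        if end > start {
            emit(Event::ParagraphStart);
            emit(Event::Text(ByteRange { start: start as u32, end: end as u32 }));
            emit(Event::ParagraphEnd);
        }
        start = end + 1; // skip the newline
    }
}

/// Writer: appends HTML for one event directly to the output buffer.
fn write_event(input: &[u8], event: Event, out: &mut Vec<u8>) {
    match event {
        Event::ParagraphStart => out.extend_from_slice(b"<p>"),
        Event::ParagraphEnd => out.extend_from_slice(b"</p>\n"),
        Event::Text(r) => {
            // Escape while copying the referenced input bytes.
            for &b in &input[r.start as usize..r.end as usize] {
                match b {
                    b'<' => out.extend_from_slice(b"&lt;"),
                    b'>' => out.extend_from_slice(b"&gt;"),
                    b'&' => out.extend_from_slice(b"&amp;"),
                    b'"' => out.extend_from_slice(b"&quot;"),
                    _ => out.push(b),
                }
            }
        }
    }
}

/// Scanner and writer wired together: no tree is ever built.
fn render(input: &[u8], out: &mut Vec<u8>) {
    scan(input, |event| write_event(input, event, out));
}

fn main() {
    let mut out = Vec::new();
    render(b"a < b\n\nx & y", &mut out);
    print!("{}", String::from_utf8(out).unwrap());
}
```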

What makes this fast in practice:

Block scanning runs on memchr for line boundaries. Container state is a compact stack, not a tree.
Inline parsing has three phases: collect delimiter marks, resolve precedence (code spans, math, links, emphasis, strikethrough), emit. No backtracking.
Emphasis resolution uses the CommonMark modulo-3 rule with a delimiter stack instead of expensive rescans.
SIMD scanning (NEON on ARM) detects special characters in inline content.
Zero-copy references: events carry compact Range values (u32 offsets into the input), not copied strings.
Compact events: 24 bytes each, cache-line friendly.
Hot/cold annotation: #[inline] on tight loops, #[cold] on error paths, table-driven byte classification.
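Two of these techniques can be shown in miniature. The first is a 256-entry lookup table that classifies a byte with a single index instead of a chain of comparisons. The second is the CommonMark "rule of 3" check that emphasis resolution applies before pairing an opener with a closer. The byte set and the `Delim` struct are illustrative assumptions, not ferromark's actual tables or types.

```rust
/// 256-entry table: `true` for bytes that may start inline syntax (illustrative set).
const SPECIAL: [bool; 256] = {
    let mut table = [false; 256];
    let bytes = b"\\`*_~$[]!<&\n";
    let mut i = 0;
    while i < bytes.len() {
        table[bytes[i] as usize] = true;
        i += 1;
    }
    table
};

/// Position of the next byte needing inline handling; plain text is skipped.
fn next_special(text: &[u8]) -> Option<usize> {
    text.iter().position(|&b| SPECIAL[b as usize])
}

/// A `*` or `_` delimiter run, as seen by emphasis resolution (simplified).
struct Delim {
    len: usize,
    can_open: bool,
    can_close: bool,
}

/// CommonMark "rule of 3": if either run can both open and close, the sum of
/// the run lengths must not be a multiple of 3 unless both lengths are.
fn may_pair(opener: &Delim, closer: &Delim) -> bool {
    let ambiguous = (opener.can_open && opener.can_close) || (closer.can_open && closer.can_close);
    !ambiguous
        || (opener.len + closer.len) % 3 != 0
        || (opener.len % 3 == 0 && closer.len % 3 == 0)
}

fn main() {
    println!("{:?}", next_special(b"plain text *em*"));
    // `*foo**bar*`: the inner `**` can open and close, so it may not close the first `*`.
    let first = Delim { len: 1, can_open: true, can_close: false };
    let inner = Delim { len: 2, can_open: true, can_close: true };
    println!("{}", may_pair(&first, &inner));
}
```

The rejected pairing is why CommonMark renders `*foo**bar*` as `<em>foo**bar</em>`.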

Design principles

Linear time. No regex, no backtracking, no quadratic blowup on adversarial input.
Low allocation pressure. Compact events, range references, reusable output buffers.
Operational safety. Depth and size limits guard against pathological nesting.
Small dependency surface. Minimal crates, straightforward integration.
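A depth limit can be as simple as a capped loop. This sketch counts leading blockquote markers but stops at a hypothetical `MAX_NESTING`, so a line of 100,000 `>` characters produces only a bounded amount of container state. ferromark's actual constants live in `limits.rs` and are not reproduced here. Leaving the surplus markers in the remainder, to be rendered as text, is one possible policy and is assumed here for illustration.

```rust
/// Hypothetical nesting cap; ferromark's real limits are in `limits.rs`.
const MAX_NESTING: usize = 32;

/// Counts leading `>` markers (each optionally followed by one space) up to
/// MAX_NESTING and returns the depth plus the unconsumed remainder.
/// Simplified: ignores the up-to-three spaces of indentation CommonMark allows.
fn blockquote_depth(line: &[u8]) -> (usize, &[u8]) {
    let mut depth = 0;
    let mut rest = line;
    while depth < MAX_NESTING {
        match rest {
            [b'>', b' ', tail @ ..] | [b'>', tail @ ..] => {
                depth += 1;
                rest = tail;
            }
            _ => break,
        }
    }
    // Markers beyond the cap stay in `rest`; here they would be treated as text.
    (depth, rest)
}

fn main() {
    let hostile = vec![b'>'; 100_000];
    let (depth, rest) = blockquote_depth(&hostile);
    println!("depth {depth}, {} bytes left as text", rest.len());
}
```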

How ferromark compares to the other three top-tier parsers across architecture, features, and output. Ratings use a 4-level heatmap focused on end-to-end Markdown-to-HTML throughput. Scoring is relative per row, so each row has at least one top mark.

Legend: 🟩 strongest 🟨 close behind 🟧 notable tradeoffs 🟥 weakest

Ferromark optimization backlog: docs/arch/ARCH-PLAN-001-performance-opportunities.md

Building

cargo build            # development
cargo build --release  # optimized (recommended for benchmarks)
cargo test             # run tests
cargo test --test commonmark_spec -- --nocapture  # CommonMark spec
cargo bench            # benchmarks

Project structure

src/
├── lib.rs          # Public API (to_html, to_html_into, parse, Options)
├── main.rs         # CLI binary
├── block/          # Block-level parser
│   ├── parser.rs   # Line-oriented block parsing
│   └── event.rs    # BlockEvent types
├── inline/         # Inline-level parser
│   ├── mod.rs      # Three-phase inline parsing
│   ├── marks.rs    # Mark collection + SIMD integration
│   ├── simd.rs     # NEON SIMD character scanning
│   ├── event.rs    # InlineEvent types
│   ├── code_span.rs
│   ├── emphasis.rs      # Modulo-3 stack optimization
│   ├── strikethrough.rs # GFM strikethrough resolution
│   ├── math.rs          # Math span resolution ($/$$ delimiters)
│   └── links.rs         # Link/image/autolink parsing
├── footnote.rs     # Footnote store and rendering
├── link_ref.rs     # Link reference definitions
├── cursor.rs       # Pointer-based byte cursor
├── range.rs        # Compact u32 range type
├── render.rs       # HTML writer
├── escape.rs       # HTML escaping (memchr-optimized)
└── limits.rs       # DoS prevention constants

License

MIT OR Apache-2.0

ferromark 0.1.1