md-formatter 0.3.0

A fast, opinionated Markdown formatter
# Fast, Opinionated Markdown Formatter — Project Plan (Rust)

## 1. Project Overview

Build a **fast**, **opinionated**, **CommonMark/GFM‑focused** Markdown  
formatter written in **Rust**, designed as a clean alternative to Prettier for  
Markdown-only workflows.  
This formatter intentionally omits MDX, embedded language formatting, and  
semantic width analysis, focusing on speed and predictable output.

## 2. High-Level Goals

- Extremely fast formatting (Rust-native performance)
- Deterministic, idempotent output
- Opinionated formatting decisions (no configuration except width)
- Safe round-tripping (no semantic changes)
- Integrates cleanly into Biome-style lint/format pipelines

## 3. Out-of-Scope (Explicitly)

- MDX (JSX/TSX in markdown)
- Formatting embedded code blocks
- Rewriting HTML inside markdown
- Advanced semantic wrapping heuristics

## 4. Core Architecture

### 4.1 Pipeline Overview

```
Raw Markdown
   → Tokenizer (pulldown-cmark)
   → Parsed Events / AST
   → Normalization Pass (custom)
   → Pretty-Printer / Renderer
   → Final Markdown Output
```

### 4.2 Parsing Layer

Use:

- `pulldown-cmark` for parsing
- `pulldown-cmark-to-cmark` OR a custom serializer baseline

Why:

- Fast
- Mature
- Handles GFM reasonably well
- Good balance between lossless-enough events and performance

### 4.3 Internal Representation Strategy

Use one of these two approaches (your implementer can choose):

**A. Event Stream Normalization (Recommended for MVP)**

- Consume `pulldown_cmark::Event` stream
- Track context using a small stack:
  - List depth
  - Blockquote depth
  - Code block state
  - Paragraph state
- Output normalized markdown progressively

This avoids building a full AST and keeps performance maximized.

**B. Build a Full AST**

- More control
- Slower MVP
- Higher complexity

Recommend **A** for initial version.

---

## 5. Formatting Rules (Opinionated Spec)

### 5.1 Paragraphs

- Reflow text into lines of fixed max width (default 80)
- Preserve hard line breaks (two trailing spaces followed by a newline)
- Preserve HTML blocks entirely (no formatting inside; treat as opaque)

### 5.2 Headings

- Normalize ATX headings:
  - `# Heading`
  - No trailing `#`
  - Ensure a single space after the hash group
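The heading rules above could be sketched as a single line-level function. This is only an illustration under simplifying assumptions: `normalize_atx_heading` is a hypothetical name, it works on raw text rather than on parser events, and it ignores the CommonMark rule that four or more leading spaces start a code block.

```rust
/// Normalize one ATX heading line: 1-6 opening `#`, exactly one space,
/// no closing `#` sequence. Returns `None` if the line is not an ATX heading.
/// Hypothetical helper for illustration; not part of any existing crate.
fn normalize_atx_heading(line: &str) -> Option<String> {
    let trimmed = line.trim();
    let level = trimmed.chars().take_while(|&c| c == '#').count();
    if level == 0 || level > 6 {
        return None;
    }
    let rest = &trimmed[level..];
    // The opening hashes must be followed by whitespace or the end of the line.
    if !rest.is_empty() && !rest.starts_with(' ') && !rest.starts_with('\t') {
        return None;
    }
    let mut text = rest.trim();
    // Drop a closing sequence such as `## Title ##`, but keep `# C#` intact:
    // the closing hashes only count when preceded by a space.
    let without_closing = text.trim_end_matches('#');
    if without_closing.len() < text.len()
        && (without_closing.is_empty() || without_closing.ends_with(' '))
    {
        text = without_closing.trim_end();
    }
    let hashes = "#".repeat(level);
    if text.is_empty() {
        Some(hashes)
    } else {
        Some(format!("{hashes} {text}"))
    }
}
```

For example, `normalize_atx_heading("#  Title  ##")` yields `Some("# Title")`, while `#hashtag` is rejected because no space follows the hash.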

### 5.3 Lists

- Normalize ordered lists to always use `1.` style
- Normalize unordered lists to `-` (not `*` or `+`)
- Indentation:
  - 2 spaces per nesting level
- Ensure blank line before and after lists unless tight list rules apply

### 5.4 Blockquotes

- Each line begins with correct number of `>` prefixes
- Reflow text *inside* blockquotes (preserve code blocks)
- Blank lines inside blockquotes get the same prefix rules
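A minimal sketch of the prefix rule, assuming the text has already been reflowed into lines. The name `quote_lines` is hypothetical, and emitting a bare `>` (no trailing space) on blank lines is one possible reading of "same prefix rules", chosen here to avoid trailing whitespace.

```rust
/// Prefix every line of `text` with `depth` blockquote markers.
/// Blank lines get the markers without a trailing space (an assumption,
/// made so the output never ends a line with whitespace).
fn quote_lines(text: &str, depth: usize) -> String {
    let prefix = "> ".repeat(depth);
    let bare = prefix.trim_end();
    text.lines()
        .map(|line| {
            if line.trim().is_empty() {
                bare.to_string()
            } else {
                format!("{prefix}{line}")
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

With `depth = 2`, the input `"a\n\nb"` becomes three lines: `> > a`, `> >`, and `> > b`.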

### 5.5 Code Blocks

- Fenced only (`` ``` ``)
- Preserve language tag if present
- Do NOT format internal code
- Trim trailing whitespace inside the block but preserve all user content

### 5.6 Inline Elements

- Preserve original emphasis markers where possible (`*italic*` kept as-is)
- Normalize link reference format:
  - `[text](url)`
- Autolinks remain `<http://example.com>`

### 5.7 Horizontal Rules

- Normalize to:

```
---
```

### 5.8 Frontmatter

- If file begins with:

```
---
key: value
---
```

then preserve block exactly without changes.
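One way to honor this rule is to split the frontmatter off before the Markdown parser sees the input and write it back verbatim afterwards. The sketch below assumes `\n` line endings on the opening delimiter and uses a hypothetical `split_frontmatter` helper; an input without a closing `---` is treated as ordinary Markdown.

```rust
/// Split a leading `---` frontmatter block from the rest of the document.
/// Returns `(frontmatter, body)`; the frontmatter slice includes both
/// delimiter lines so it can be emitted unchanged.
/// Hypothetical helper for illustration.
fn split_frontmatter(input: &str) -> (Option<&str>, &str) {
    // The opening delimiter must be the very first line of the file.
    let after_open = match input.strip_prefix("---\n") {
        Some(rest) => rest,
        None => return (None, input),
    };
    let mut offset = input.len() - after_open.len();
    for line in after_open.split_inclusive('\n') {
        offset += line.len();
        if line.trim_end_matches(|c: char| c == '\r' || c == '\n') == "---" {
            return (Some(&input[..offset]), &input[offset..]);
        }
    }
    // No closing delimiter found: nothing is treated as frontmatter.
    (None, input)
}
```

Only the returned body would be passed to the formatter; the frontmatter slice is copied to the output byte for byte.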

---

## 6. Pretty Printer Design

### 6.1 Writing Strategy

Use a `Formatter` struct:

```
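struct Formatter {
    /// Formatted Markdown accumulated so far.
    output: String,
    /// Current nesting depth, used to compute indentation.
    indent_level: usize,
    /// Maximum line width for wrapped text (default 80).
    line_width: usize,
}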
```

Provides helpers:

- `write_line`
- `write_wrapped`
- `write_indent`
- `push_list_marker`
- `push_blockquote_prefix`
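As a rough illustration, two of these helpers might look like the following. The bodies are assumptions: the plan only names the helpers, so the constructor `new`, the two-space indent unit (taken from the list rules), and the choice to leave blank lines unindented are all hypothetical.

```rust
struct Formatter {
    output: String,
    indent_level: usize,
    line_width: usize,
}

impl Formatter {
    /// Hypothetical constructor; the plan does not specify one.
    fn new(line_width: usize) -> Self {
        Formatter { output: String::new(), indent_level: 0, line_width }
    }

    /// Emit two spaces per nesting level.
    fn write_indent(&mut self) {
        for _ in 0..self.indent_level {
            self.output.push_str("  ");
        }
    }

    /// Write one indented line; blank lines are written without indentation
    /// so the output never contains trailing whitespace.
    fn write_line(&mut self, text: &str) {
        if !text.is_empty() {
            self.write_indent();
            self.output.push_str(text);
        }
        self.output.push('\n');
    }
}
```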

### 6.2 Word Wrapping Algorithm

Use a simple greedy wrapper:

```
for each token:
    if adding token exceeds line_width:
        line break
    append token
```

Special rules:

- Blockquote prefixes are considered part of line width.
- Leading indentation for list items considered part of width.
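The greedy wrapper and its width rules could be sketched in Rust as follows. `wrap_greedy` is a hypothetical name, width is measured in `char`s rather than display columns (an assumption that breaks for wide or combining characters), and a word longer than the available width is kept on its own line instead of being split.

```rust
/// Greedy word wrap. `prefix` (blockquote markers, list indentation) is
/// emitted on every line and counted against `line_width`.
fn wrap_greedy(text: &str, prefix: &str, line_width: usize) -> String {
    let prefix_width = prefix.chars().count();
    let mut lines: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut current_width = prefix_width;

    for word in text.split_whitespace() {
        let word_width = word.chars().count();
        // +1 accounts for the space separating this word from the previous one.
        if !current.is_empty() && current_width + 1 + word_width > line_width {
            lines.push(format!("{prefix}{current}"));
            current.clear();
            current_width = prefix_width;
        }
        if !current.is_empty() {
            current.push(' ');
            current_width += 1;
        }
        current.push_str(word);
        current_width += word_width;
    }
    if !current.is_empty() {
        lines.push(format!("{prefix}{current}"));
    }
    lines.join("\n")
}
```

For instance, `wrap_greedy("aaa bbb ccc", "> ", 9)` produces `> aaa bbb` followed by `> ccc`: the two-character prefix leaves room for seven characters of text per line.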

### 6.3 Handling Context

Maintain a stack:

- `Context::Paragraph`
- `Context::List(Ordered|Unordered, depth)`
- `Context::Blockquote(depth)`
- `Context::CodeBlock(fence_language)`
- `Context::HtmlBlock`

Push/pop as events are consumed.
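A minimal sketch of such a stack, mirroring the variants listed above. The wrapper type `ContextStack` and the helpers `can_reflow` and `blockquote_depth` are hypothetical additions that show how the renderer might query the stack.

```rust
#[derive(Debug, Clone, PartialEq)]
enum ListKind {
    Ordered,
    Unordered,
}

#[derive(Debug, Clone, PartialEq)]
enum Context {
    Paragraph,
    List(ListKind, usize),
    Blockquote(usize),
    CodeBlock(Option<String>), // fence language, if any
    HtmlBlock,
}

#[derive(Default)]
struct ContextStack {
    stack: Vec<Context>,
}

impl ContextStack {
    fn push(&mut self, ctx: Context) {
        self.stack.push(ctx);
    }

    fn pop(&mut self) -> Option<Context> {
        self.stack.pop()
    }

    /// Text may be reflowed only when no enclosing context is opaque.
    fn can_reflow(&self) -> bool {
        !self
            .stack
            .iter()
            .any(|c| matches!(c, Context::CodeBlock(_) | Context::HtmlBlock))
    }

    /// Depth recorded by the innermost blockquote, or 0 outside any quote.
    fn blockquote_depth(&self) -> usize {
        self.stack
            .iter()
            .rev()
            .find_map(|c| match c {
                Context::Blockquote(depth) => Some(*depth),
                _ => None,
            })
            .unwrap_or(0)
    }
}
```

A start event would call `push`, the matching end event would call `pop`, and the wrapper would consult `can_reflow` before touching any text.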

---

## 7. CLI Design

### 7.1 CLI Requirements

Command:

```
mdfmt [options] <path>
```

Options:

- `--write` (in-place)
- `--check` (verify formatted)
- `--stdin`
- `--width <number>`
- `--version`
- `--help`

Exit codes:

- `0` = formatted or unchanged
- `1` = would change (in check mode)
- `2` = error
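The exit-code table above maps naturally onto a small enum. In this sketch the names `Outcome`, `classify`, and `exit_status` are hypothetical; only `std::process::ExitCode` is a real standard-library type.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Unchanged,
    Formatted,
    WouldChange,
    Error,
}

/// Decide the outcome of a successful formatting run.
/// In check mode nothing is written, so a difference is reported instead.
fn classify(original: &str, formatted: &str, check: bool) -> Outcome {
    if original == formatted {
        Outcome::Unchanged
    } else if check {
        Outcome::WouldChange
    } else {
        Outcome::Formatted
    }
}

/// Numeric exit status as listed in the CLI requirements.
fn exit_status(outcome: Outcome) -> u8 {
    match outcome {
        Outcome::Unchanged | Outcome::Formatted => 0,
        Outcome::WouldChange => 1,
        Outcome::Error => 2,
    }
}

/// Convert to the type that `fn main() -> ExitCode` can return.
fn to_exit_code(outcome: Outcome) -> std::process::ExitCode {
    std::process::ExitCode::from(exit_status(outcome))
}
```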

### 7.2 Ability to act as formatter for Biome

Support stdin/stdout with deterministic output so Biome can shell out to it.

---

## 8. Testing Strategy

### 8.1 Snapshot Tests

Use `insta` for text snapshots:

- Paragraph wrapping cases
- Nested lists
- Blockquotes
- Mixed inline text
- Heavy edge-case markdown

### 8.2 Round Trip Safety Tests

Ensure formatting is idempotent, so that a second pass changes nothing:

```
format(format(input)) == format(input)
```
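The property above can be checked with a small generic helper. This is a sketch: `check_idempotent` is a hypothetical name, and `toy_format` is a stand-in that only trims trailing whitespace, not the real formatter. The check covers idempotence only; confirming that no semantic change occurred would need a separate comparison, for example of the parsed event streams before and after formatting.

```rust
/// Returns `Err` with a description if `format_fn` is not idempotent on `input`.
fn check_idempotent<F>(format_fn: F, input: &str) -> Result<(), String>
where
    F: Fn(&str) -> String,
{
    let once = format_fn(input);
    let twice = format_fn(&once);
    if once == twice {
        Ok(())
    } else {
        Err(format!(
            "not idempotent:\nfirst pass:  {once:?}\nsecond pass: {twice:?}"
        ))
    }
}

/// Stand-in formatter for the example: trims trailing whitespace per line.
fn toy_format(input: &str) -> String {
    input
        .lines()
        .map(str::trim_end)
        .collect::<Vec<_>>()
        .join("\n")
}
```

In a test suite, the same helper could be run over every snapshot input so that idempotence failures surface alongside snapshot diffs.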

### 8.3 GFM Edge Tests

- Tables (preserve table structure but do not format)
- Autolinks
- Strikethrough

---

## 9. Performance Targets

- Format a 100 KB Markdown file in under **5 ms**
- Format a 1 MB Markdown file in under **50–100 ms**
- Keep the memory footprint under **5 MB**
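A rough way to check the time budgets using only the standard library is sketched below. `time_best_of` and `meets_budget` are hypothetical helpers; taking the best of several runs is an assumption made to reduce noise, and a dedicated benchmarking harness would give more reliable numbers.

```rust
use std::time::{Duration, Instant};

/// Run `f` `runs` times and return the fastest wall-clock duration.
/// With `runs == 0` this returns `Duration::MAX`.
fn time_best_of<F: FnMut()>(runs: u32, mut f: F) -> Duration {
    let mut best = Duration::MAX;
    for _ in 0..runs {
        let start = Instant::now();
        f();
        best = best.min(start.elapsed());
    }
    best
}

/// True if `elapsed` is strictly under the budget in milliseconds.
fn meets_budget(elapsed: Duration, budget_ms: u64) -> bool {
    elapsed < Duration::from_millis(budget_ms)
}
```

A benchmark for the first target would time the formatter on a 100 KB input and assert `meets_budget(elapsed, 5)`.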

Benchmark against:

- Prettier
- mdformat
- dprint

---

## 10. Future Roadmap (Post-MVP)

- Optional MDX support
- Format tables (alignment)
- Format inline HTML
- Embedded code block formatting via plugins
- Config system
- Biome-internal integration with WASM

---

## 11. Deliverables Summary

- Rust crate (`mdfmt`)
- CLI binary
- Core formatter module
- Unit tests + snapshot tests
- Benchmarks
