# Markdown-to-Text Research

Date: 2026-04-23

This note compares `markdown-reader`'s current markdown-to-text pipeline with a few external projects that solve related terminal-rendering problems, then turns that into concrete suggestions for this repo.

## Scope

I focused on:

- Markdown parsing strategy
- Width-aware text wrapping
- Table layout
- Inline styling preservation
- Source-position handling
- Whether the observed ideas are broadly useful here, rather than only fixing one or two current bugs

## Current `markdown-reader` shape

Observed locally:

- Parsing is event-driven with `pulldown-cmark` in [src/markdown/renderer.rs](/Users/leboiko/Documents/temp/temp2/temp3/markdown-reader/src/markdown/renderer.rs).
- Rendering already targets an intermediate block IR: `DocBlock::{Text, Mermaid, Table}` in [src/markdown/mod.rs](/Users/leboiko/Documents/temp/temp2/temp3/markdown-reader/src/markdown/mod.rs).
- Source-line mapping is a first-class concern and is threaded through text, tables, and mermaid blocks.
- Width handling is split across layers:
  - text uses ratatui `Paragraph::wrap`
  - tables are laid out explicitly in [src/ui/table_render.rs](/Users/leboiko/Documents/temp/temp2/temp3/markdown-reader/src/ui/table_render.rs)
  - visual row mapping is reproduced in [src/ui/markdown_view/visual_rows.rs](/Users/leboiko/Documents/temp/temp2/temp3/markdown-reader/src/ui/markdown_view/visual_rows.rs)

Inference:

- The current architecture is already better than a naive "markdown straight to terminal string" pipeline.
- The main risk area is not lack of features. It is duplicated layout logic across parsing, block rendering, ratatui wrapping, and source-line mapping.

## External repos and what matters

### 1. `pulldown-cmark`

Source:

- https://github.com/pulldown-cmark/pulldown-cmark

Observed:

- It explicitly supports event streams with source offsets: `Parser::into_offset_iter()` yields each event together with its byte range in the source.
- Its design favors a pull-parser pipeline instead of forcing an AST.

Why it matters here:

- Your current renderer is using one of the strongest features of `pulldown-cmark`: offset-aware event parsing.
- That part should probably stay unless you decide you need a richer structural representation than events can comfortably provide.
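
The offsets that come with each event are byte positions, so source-line mapping needs a byte-to-line lookup somewhere. A minimal std-only sketch of that lookup follows; `LineIndex` and its methods are hypothetical names for illustration, not types from this repo or from `pulldown-cmark`.

```rust
use std::ops::Range;

/// Hypothetical helper: maps byte offsets in the markdown source to
/// 0-based line numbers.
pub struct LineIndex {
    /// Byte offset at which each line starts; the first entry is always 0.
    line_starts: Vec<usize>,
}

impl LineIndex {
    /// Scan the source once and record where every line begins.
    pub fn new(source: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, b) in source.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        Self { line_starts }
    }

    /// 0-based line containing `offset`. Offsets past the end of the source
    /// map to the last recorded line.
    pub fn line_of(&self, offset: usize) -> usize {
        match self.line_starts.binary_search(&offset) {
            Ok(line) => line,      // offset is exactly a line start
            Err(next) => next - 1, // offset falls inside the previous line
        }
    }

    /// Inclusive (first, last) line span for a byte range, such as the
    /// range paired with a parser event.
    pub fn line_span(&self, range: Range<usize>) -> (usize, usize) {
        let first = self.line_of(range.start);
        // The range end is exclusive, so look at the last byte it covers.
        let last = self.line_of(range.end.saturating_sub(1).max(range.start));
        (first, last)
    }
}
```

Building the index once per document keeps each lookup at a binary search, which matters if every block and table row carries a source span.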

### 2. `termimad`

Sources:

- https://github.com/Canop/termimad
- inspected locally in `/tmp/markdown-text-survey/termimad`

Observed:

- It has a dedicated wrapping layer (`src/fit/wrap.rs`) that wraps styled composites while preserving list and quote prefixes.
- It treats tables as a separate normalization/layout pass (`src/tbl.rs`):
  - discover table runs
  - measure columns
  - fit widths
  - wrap oversized cells into extra physical rows
  - then apply alignment and borders

Why it matters here:

- This is the clearest example of a reusable algorithmic idea for your repo.
- The important part is not termimad's parser. It is the table pipeline and the "wrap styled content with prefix-aware continuation lines" approach.
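
The "prefix-aware continuation lines" idea can be sketched without any of termimad's types. The function below is an illustration written for this note, not termimad's implementation: `wrap_with_prefix` is a hypothetical name, it wraps plain text rather than styled composites, and it assumes one `char` occupies one terminal column.

```rust
/// Assumption for this sketch: one `char` equals one terminal column.
/// Real code would use a display-width function instead.
fn cols(s: &str) -> usize {
    s.chars().count()
}

/// Greedy word wrap where the first line starts with `first_prefix`
/// (for example "- " or "> ") and every continuation line starts with
/// `cont_prefix` (for example "  " or "> ").
/// A word wider than the available space is left on its own line and
/// overflows; it is not hard-split.
pub fn wrap_with_prefix(
    text: &str,
    first_prefix: &str,
    cont_prefix: &str,
    width: usize,
) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::from(first_prefix);
    let mut has_word = false;

    for word in text.split_whitespace() {
        // Columns needed if this word joins the current line.
        let needed = cols(&current) + usize::from(has_word) + cols(word);
        if has_word && needed > width {
            // Close the current line and start a continuation line.
            lines.push(std::mem::replace(&mut current, String::from(cont_prefix)));
            has_word = false;
        }
        if has_word {
            current.push(' ');
        }
        current.push_str(word);
        has_word = true;
    }
    if has_word || lines.is_empty() {
        lines.push(current);
    }
    lines
}
```

For a list item, `wrap_with_prefix("alpha beta gamma delta", "- ", "  ", 12)` keeps the marker on the first row and indents the rest; for a block quote, passing `"> "` for both prefixes repeats the quote bar on every row.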

### 3. `presenterm`

Source:

- https://github.com/mfontanini/presenterm

Observed:

- It parses markdown into an explicit AST-backed domain model using `comrak` in `src/markdown/parse.rs`.
- It carries source positions through the parse layer.
- It uses `WeightedText` / `WeightedLine` in `src/markdown/text.rs` and `src/render/text.rs` to split styled text by display width while preserving style runs.
- Its table rendering is simple and fixed-width for slide use, which is less directly reusable than its text-splitting model.

Why it matters here:

- The strongest idea here is precomputing width metadata for styled text chunks instead of recomputing widths during multiple later passes.
- The AST-first parser is also relevant if your renderer keeps accumulating state-machine complexity around nested lists, tables, footnotes, and block quotes.

### 4. `glamour` / `glow`

Sources:

- https://github.com/charmbracelet/glamour
- https://github.com/charmbracelet/glow

Observed:

- `glow` delegates markdown rendering to `glamour`.
- `glamour` builds rendering around block elements and style-aware render buffers rather than flattening too early.
- It exposes independent controls for word wrap, table wrap, and inline-table-link behavior.
- The project emphasizes a pure renderer shape: same input + same options => same output.

Why it matters here:

- The useful idea is not the Go implementation itself. It is the separation between:
  - structural rendering
  - style policy
  - terminal/environment policy
- That separation tends to make wrapping and testing easier.
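
To make that three-way split concrete, here is a small Rust sketch of a pure render function. Every name in it (`Block`, `StylePolicy`, `TermPolicy`, `render`) is hypothetical and chosen for this note; it does not mirror glamour's API, and it assumes one `char` per terminal column.

```rust
/// Structural layer: what the content is.
pub enum Block {
    Heading(String),
    Paragraph(String),
}

/// Style policy: how each kind of block is decorated.
pub struct StylePolicy {
    pub heading_prefix: &'static str,
    pub paragraph_indent: usize,
}

/// Terminal/environment policy: what the output device allows.
pub struct TermPolicy {
    pub width: usize,
    pub color: bool,
}

/// Pure renderer: output depends only on the three arguments, so the same
/// input and the same options always give the same rows.
pub fn render(blocks: &[Block], style: &StylePolicy, term: &TermPolicy) -> Vec<String> {
    blocks
        .iter()
        .map(|block| {
            // Style policy decides the decoration.
            let line = match block {
                Block::Heading(t) => format!("{}{}", style.heading_prefix, t),
                Block::Paragraph(t) => format!("{}{}", " ".repeat(style.paragraph_indent), t),
            };
            // Environment policy clips to the viewport
            // (assumption: one char per column).
            let clipped: String = line.chars().take(term.width).collect();
            // Environment policy also decides whether escapes are emitted.
            match (block, term.color) {
                (Block::Heading(_), true) => format!("\x1b[1m{clipped}\x1b[0m"),
                _ => clipped,
            }
        })
        .collect()
}
```

Because nothing is read from the environment inside `render`, a test can pin the width and color settings and compare rows directly.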

### 5. `mdcat` / `pulldown-cmark-mdcat`

Sources:

- https://github.com/swsnr/mdcat
- https://docs.rs/pulldown-cmark-mdcat/latest/pulldown_cmark_mdcat/
- https://docs.rs/crate/mdcat/2.7.1

Observed:

- `mdcat` split its core renderer into a library (`pulldown-cmark-mdcat`).
- The docs are explicit that table support is limited: inline markup and text wrapping in table cells are not fully supported.

Why it matters here:

- This is a useful warning sign. Even mature terminal markdown renderers often stop short of table-cell wrapping because it is a separate layout problem.
- That matches what I saw elsewhere: broad correctness usually requires a dedicated table engine, not a paragraph wrapper plus borders.

### 6. `pandoc`

Sources:

- https://github.com/jgm/pandoc
- https://pandoc.org/MANUAL.html

Observed:

- Pandoc uses the classic reader -> native AST -> writer architecture.
- Its docs call out width-sensitive plain-text table formatting and cell wrapping behavior.

Why it matters here:

- The main lesson is architectural, not implementation-level: plain-text output gets much easier once the renderer consumes a normalized document model instead of raw parser events.

### 7. `markdown-rs`

Source:

- https://github.com/wooorm/markdown-rs

Observed:

- It exposes a positional AST and is oriented toward more complex markdown transforms than event-stream-only renderers.

Why it matters here:

- It is a plausible alternative if you ever outgrow `pulldown-cmark` events for structural reasons.
- I would not switch just to switch, but it is a credible option if you want AST richness without leaving Rust-native parsing.

## Patterns that show up repeatedly

Across the repos above, the broader patterns are:

1. Strong renderers do not flatten too early.
2. Text wrapping and table layout are treated as different problems.
3. Width is usually computed on styled chunks, not on already-rendered terminal lines.
4. Source positions are easiest to preserve when they are attached before the last rendering pass.
5. Table-cell wrapping is where many implementations either simplify heavily or stop.

## Suggestions for `markdown-reader`

These are ordered by value relative to implementation cost.

### 1. Keep `DocBlock`, but add one more normalization layer inside `Text`

Recommendation:

- Keep the existing top-level IR (`Text`, `Table`, `Mermaid`).
- Inside text rendering, stop relying so heavily on raw `Line<Span>` plus ratatui wrapping as the main source of truth.
- Introduce a small width-aware internal text model for paragraphs/list items/block quotes before final ratatui conversion.

Why:

- Right now visual row math, cursor mapping, highlighting, and rendering all need to agree with ratatui wrapping.
- That creates cross-module coupling and repeated width computation.
- A normalized text model would make wrapping deterministic in your own code and reduce "mirror ratatui exactly" maintenance.

Broadness:

- This is a general renderer-quality improvement, not a mermaid-specific fix.

### 2. Borrow the `WeightedText` idea, not necessarily the whole parser stack

Recommendation:

- Add a local width cache for styled inline chunks:
  - original text
  - style
  - display width
  - optional char/grapheme split points

Why:

- You repeatedly need width for wrapping, cursor mapping, truncation, and visual-row calculations.
- `presenterm`'s approach is a good model for avoiding recomputation and preserving style runs during splits.

Broadness:

- Helps paragraphs, headings, block quotes, lists, code spans, table cells, and search highlighting.
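
A minimal sketch of such a width cache, loosely inspired by the `WeightedText` idea but not copied from `presenterm`: `WidthChunk`, `ChunkStyle`, and `split_at_cols` are hypothetical names, the style enum stands in for a real ratatui `Style`, and width is approximated as one column per `char`.

```rust
/// Hypothetical style tag; the real renderer would carry a ratatui `Style`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ChunkStyle {
    Plain,
    Bold,
    Code,
}

/// A styled run of text whose display width is computed once, at creation.
#[derive(Clone, Debug, PartialEq)]
pub struct WidthChunk {
    pub text: String,
    pub style: ChunkStyle,
    pub width: usize,
}

impl WidthChunk {
    pub fn new(text: &str, style: ChunkStyle) -> Self {
        // Assumption: one `char` equals one column. A real implementation
        // would measure display width (wide CJK, zero-width marks) here.
        Self {
            text: text.to_string(),
            style,
            width: text.chars().count(),
        }
    }

    /// Split so that the left part is at most `max_cols` wide.
    /// Both halves keep the original style, so no style run is lost.
    pub fn split_at_cols(&self, max_cols: usize) -> (WidthChunk, Option<WidthChunk>) {
        if self.width <= max_cols {
            return (self.clone(), None);
        }
        // Find the byte index of the first char that no longer fits.
        let byte_idx = self
            .text
            .char_indices()
            .nth(max_cols)
            .map(|(i, _)| i)
            .unwrap_or(self.text.len());
        let (left, right) = self.text.split_at(byte_idx);
        (
            WidthChunk::new(left, self.style),
            Some(WidthChunk::new(right, self.style)),
        )
    }
}

/// Width of a whole line, summed from cached values without re-measuring text.
pub fn line_width(chunks: &[WidthChunk]) -> usize {
    chunks.iter().map(|c| c.width).sum()
}
```

Wrapping, truncation, and visual-row counting can then all read `width` instead of re-measuring the string on every pass.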

### 3. Upgrade table layout from "truncate when tight" to "row expansion when useful"

Recommendation:

- Keep the current fair-share logic for compact inline rendering.
- Add a second layout mode that can wrap cell content into multiple physical table rows instead of only truncating.

Why:

- This is the most broadly useful table improvement I found in the survey.
- `termimad` shows a practical model:
  - compute column widths
  - split oversized cells
  - materialize extra row fragments
  - apply alignment after splitting

Where to apply it:

- Expanded table view first.
- Inline view later, if desired.

Broadness:

- This is a general markdown-table fix, not a repo-specific optimization.
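
The four steps listed under "Why" can be sketched in plain Rust. This is an illustration under stated assumptions, not termimad's code and not the repo's existing fair-share logic: `fit_widths`, `wrap_cell`, and `expand_row` are hypothetical names, cells are plain strings rather than `CellSpans`, alignment is left-only, and one `char` counts as one column.

```rust
/// Assumption for this sketch: one `char` equals one terminal column.
fn cols(s: &str) -> usize {
    s.chars().count()
}

/// Shrink the widest column one column at a time until the total fits
/// `budget`. Hypothetical policy; no column shrinks below `min_col`.
pub fn fit_widths(natural: &[usize], budget: usize, min_col: usize) -> Vec<usize> {
    let mut widths = natural.to_vec();
    while widths.iter().sum::<usize>() > budget {
        let (idx, &max) = widths.iter().enumerate().max_by_key(|(_, w)| **w).unwrap();
        if max <= min_col {
            break; // cannot shrink further; caller must handle the overflow
        }
        widths[idx] -= 1;
    }
    widths
}

/// Break one cell into lines of at most `width` columns.
/// Words wider than the column are hard-split.
pub fn wrap_cell(text: &str, width: usize) -> Vec<String> {
    let width = width.max(1);
    let mut lines: Vec<String> = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        let mut word: Vec<char> = word.chars().collect();
        loop {
            let used = cols(&current);
            let sep = usize::from(used > 0);
            if used + sep + word.len() <= width {
                if sep == 1 {
                    current.push(' ');
                }
                current.extend(word.iter());
                break;
            }
            if used > 0 {
                // Flush the current line and retry the word on a fresh one.
                lines.push(std::mem::take(&mut current));
                continue;
            }
            // The word alone is wider than the column: emit `width` chars.
            let rest = word.split_off(width);
            lines.push(word.iter().collect());
            word = rest;
        }
    }
    if !current.is_empty() || lines.is_empty() {
        lines.push(current);
    }
    lines
}

/// Expand one logical row into physical rows. Alignment (left, here) is
/// applied only after splitting, and short cells are padded with blanks.
pub fn expand_row(cells: &[&str], widths: &[usize]) -> Vec<Vec<String>> {
    let wrapped: Vec<Vec<String>> = cells
        .iter()
        .zip(widths)
        .map(|(c, w)| wrap_cell(c, *w))
        .collect();
    let height = wrapped.iter().map(Vec::len).max().unwrap_or(0);
    (0..height)
        .map(|r| {
            wrapped
                .iter()
                .zip(widths)
                .map(|(lines, w)| {
                    let text = lines.get(r).map(String::as_str).unwrap_or("");
                    format!("{:<w$}", text, w = *w)
                })
                .collect::<Vec<String>>()
        })
        .collect()
}
```

Every physical row produced from one logical row shares that row's source line, which keeps source-line mapping stable even when the table grows taller.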

### 4. Centralize wrapping rules instead of partially delegating to ratatui

Recommendation:

- Choose one source of truth for wrapping behavior.
- Either:
  - fully own wrapping in your renderer, or
  - isolate a tiny compatibility layer that computes visual rows from the exact same splitting algorithm used for drawing

Why:

- The current setup requires [src/ui/markdown_view/visual_rows.rs](/Users/leboiko/Documents/temp/temp2/temp3/markdown-reader/src/ui/markdown_view/visual_rows.rs) to predict what ratatui `Paragraph::wrap` will do.
- That is workable, but fragile whenever styling, Unicode width handling, or prefix behavior changes.

Broadness:

- This improves scrolling, link hit-testing, highlights, and edit-mode source mapping across the whole viewer.
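
One way to get a single source of truth is to have the wrapping step return byte ranges into the source text, and let drawing, row counting, and offset lookup all read those ranges. The sketch below assumes that design; `wrap_ranges`, `visual_row_count`, and `row_of_offset` are hypothetical names, it does not reproduce ratatui's `Paragraph::wrap` behavior, and it counts one column per `char`.

```rust
use std::ops::Range;

/// Greedy wrap that returns one byte range into `text` per visual row.
/// Assumption: one `char` equals one column. A word wider than `width`
/// gets its own row and overflows.
pub fn wrap_ranges(text: &str, width: usize) -> Vec<Range<usize>> {
    let mut rows: Vec<Range<usize>> = Vec::new();
    let mut pos = 0;
    for word in text.split_whitespace() {
        // Only whitespace lies between `pos` and the word, so the first
        // match from `pos` is the word itself.
        let start = pos + text[pos..].find(word).expect("word comes from text");
        let end = start + word.len();
        pos = end;

        // Does the current row still fit if it is extended to this word?
        let fits = rows
            .last()
            .map_or(false, |row| text[row.start..end].chars().count() <= width);
        if fits {
            rows.last_mut().expect("fits implies a row exists").end = end;
        } else {
            rows.push(start..end);
        }
    }
    rows
}

/// Row counting is derived from the same ranges that drawing uses.
pub fn visual_row_count(text: &str, width: usize) -> usize {
    wrap_ranges(text, width).len()
}

/// Visual row for a byte offset: the first row that ends at or after it.
/// Whitespace swallowed at a wrap point is attributed to the following row.
pub fn row_of_offset(rows: &[Range<usize>], offset: usize) -> Option<usize> {
    rows.iter().position(|r| offset <= r.end)
}
```

Drawing then slices `&text[row.clone()]` for each row, so the renderer and the cursor mapping cannot disagree about where a line breaks.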

### 5. Be conservative about switching parsers

Recommendation:

- Do not replace `pulldown-cmark` immediately.
- Reassess only if one of these starts to hold:
  - nested structure bugs keep growing
  - parser-state code becomes harder to reason about than a tree walk
  - you want richer block transformations before render

Why:

- Your current use of `pulldown-cmark` offsets is a real advantage.
- The migration cost to `comrak` or `markdown-rs` is justified only if structural complexity, not just layout complexity, becomes the main bottleneck.

### 6. Add width-regression tests the same way termimad does

Recommendation:

- Add parameterized tests that render the same markdown at many widths.
- Assert invariants like:
  - no overflow
  - stable source-line mapping
  - list continuation alignment
  - table borders remain valid
  - no style-span loss after wrapping/truncation

Why:

- The best surveyed projects test across width ranges, not only one golden width.
- Width bugs usually come from the layout algorithm rather than from one particular input, so sweeping widths catches regressions that a single golden width would miss.
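
A width sweep can be written as a small harness that takes any renderer as a closure. The sketch below is hypothetical: `sweep` is not an existing helper in this repo, the bundled `render` is a stand-in greedy wrapper used only to exercise the harness, only two of the listed invariants are checked (no overflow, no lost content), and one `char` is assumed to be one column.

```rust
use std::ops::RangeInclusive;

/// Assumption for this sketch: one `char` equals one terminal column.
fn cols(s: &str) -> usize {
    s.chars().count()
}

/// Stand-in renderer for this sketch only; a real test would call the
/// repo's own rendering entry point instead.
pub fn render(text: &str, width: usize) -> Vec<String> {
    let mut rows: Vec<String> = Vec::new();
    for word in text.split_whitespace() {
        let fits = rows
            .last()
            .map_or(false, |row| cols(row) + 1 + cols(word) <= width);
        if fits {
            let row = rows.last_mut().expect("fits implies a row exists");
            row.push(' ');
            row.push_str(word);
        } else {
            rows.push(word.to_string());
        }
    }
    rows
}

/// Render `input` at every width in `widths` and report the first
/// violated invariant, if any.
pub fn sweep<F>(render: F, input: &str, widths: RangeInclusive<usize>) -> Result<(), String>
where
    F: Fn(&str, usize) -> Vec<String>,
{
    let longest_word = input.split_whitespace().map(cols).max().unwrap_or(0);
    let expected: Vec<&str> = input.split_whitespace().collect();
    for width in widths {
        let rows = render(input, width);
        // Invariant: no row overflows, except for a single word that is
        // itself wider than the viewport.
        for (i, row) in rows.iter().enumerate() {
            if cols(row) > width.max(longest_word) {
                return Err(format!("width {width}: row {i} overflows"));
            }
        }
        // Invariant: wrapping neither drops nor reorders words.
        let actual: Vec<&str> = rows.iter().flat_map(|r| r.split_whitespace()).collect();
        if actual != expected {
            return Err(format!("width {width}: content changed"));
        }
    }
    Ok(())
}
```

The remaining invariants from the list (source-line mapping, list continuation alignment, table borders, style spans) would each become one more check inside the same loop.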

## Concrete adoption plan

### Phase 1: Low-risk improvements

- Introduce a styled width-cache type for inline chunks.
- Use it in visual-row counting and text truncation.
- Add width-sweep tests for paragraphs, lists, block quotes, and tables.

### Phase 2: Better table engine

- Add a wrapped-cell table layout path for expanded tables.
- Reuse existing `CellSpans` and alignments.
- Materialize physical row fragments explicitly instead of asking ratatui to wrap table text.

### Phase 3: Text normalization

- Introduce an internal wrapped text representation for non-table markdown blocks.
- Convert that to ratatui `Text` only at the end.

### Phase 4: Revisit parser choice only if needed

- Prototype `comrak` or `markdown-rs` only if renderer state complexity keeps growing despite the layout cleanup.

## What I would not do

- I would not replace the whole renderer with another project.
- I would not switch parsers just because ASTs look cleaner on paper.
- I would not optimize around mermaid-specific cases when the underlying problem is generic label/text layout.
- I would not push wrapped tables into the main inline path first; expanded view is the safer proving ground.

## Bottom line

The biggest reusable ideas are:

1. A width-cached styled text model for splitting and mapping.
2. A dedicated table layout pass that can wrap cells into extra physical rows.
3. Less dependence on ratatui wrapping as hidden renderer behavior.

If I had to pick one direction with the best payoff, it would be:

- keep `pulldown-cmark`
- keep `DocBlock`
- add a stronger local layout model for text and tables

That is the most broadly useful path I found, and it aligns with how the stronger terminal markdown renderers are structured.