# Design: `zim pdf` — Markdown Tree → PDF
## Intent
Add a `zim pdf` subcommand that walks a directory (the current one by default),
gathers every `.md` file, and produces a single typeset PDF with a table of contents.
Primary use case: turning a zim audio project (sidecar `.md` files plus any
`README.md` / liner-notes) into a printable artifact. Secondary: any tree of
markdown docs.
The whole pipeline runs inside one `zim` invocation. The user never executes a
separate prep script — text munging, LaTeX assembly, and the LaTeX call all
happen behind one command.
## Constraints
- **One command, no helper scripts.** All markdown collection, frontmatter
stripping, and LaTeX assembly are done in Rust inside the `pdf` handler. The only
external call is the LaTeX engine itself.
- **Page breaks at directory boundaries only.** Files inside the same directory
flow continuously. Each new directory starts on a fresh page.
- **Stable, deterministic ordering** so re-runs produce diff-friendly output:
directories sorted alphabetically (depth-first), files within a directory
sorted alphabetically, with `README.md` (if present) hoisted to the top of
its directory.
- **Assumed installed:** MacTeX (or any TeX Live) providing `xelatex`. If not
found on `PATH`, fail with a clear "install MacTeX" message and exit non-zero.
- **Out of scope (for v1):** custom themes, cover-page images, watermarking,
Windows/Linux-specific install hints, embedding audio waveform renders,
parallel directory processing.
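The ordering constraint can be sketched with the standard library alone. This is a minimal illustration, not the planned implementation: `compare_files` and `order_entries` are hypothetical names, and the sketch assumes the walk has already produced paths relative to the root and that the README match is an exact, case-sensitive `README.md`. Comparing `Path` values rather than strings matters here, because `Path` ordering goes component by component, so `mixes/v2` sorts directly after `mixes` instead of after a sibling such as `mixes-old`.

```rust
use std::cmp::Ordering;
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Orders two files from the same directory: `README.md` first,
/// then alphabetical (bytewise) by file name.
fn compare_files(a: &Path, b: &Path) -> Ordering {
    let is_readme = |p: &Path| p.file_name() == Some(OsStr::new("README.md"));
    // `false` sorts before `true`, so negate to put the README first.
    (!is_readme(a), a.file_name()).cmp(&(!is_readme(b), b.file_name()))
}

/// Sorts relative paths by parent directory (component-wise, which gives a
/// depth-first order), then by `compare_files` within each directory.
/// Root-level files have an empty parent and therefore come first.
fn order_entries(mut files: Vec<PathBuf>) -> Vec<PathBuf> {
    files.sort_by(|a, b| {
        a.parent()
            .cmp(&b.parent())
            .then_with(|| compare_files(a, b))
    });
    files
}
```

Given `mixes/b.md`, `README.md`, `mixes/README.md`, and `notes.md`, this yields `README.md`, `notes.md`, `mixes/README.md`, `mixes/b.md`.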
## Approach
Pure-Rust pipeline: parse markdown with `pulldown-cmark`, emit LaTeX from a
small visitor, then shell out to `xelatex` once for compilation. Only external
dependency for the user is MacTeX. (Pandoc was considered and rejected — it
would add a second install.)
### Hierarchy mapping
The user's mental model drives the structure:
- **Directory → `\section`.** Named after the directory's `README.md` H1 if
present, else the directory name. The README's *body* (with its own H1
consumed as the section title) flows directly under the section heading —
this is where concept/history prose lives.
- **Sidecar / non-README `.md` → `\subsection`.** Title pulled from YAML
frontmatter `title:` if present, else the file stem. The sidecar body flows
under the subsection heading, with any in-body headings demoted by two
levels so they don't compete with the subsection.
- **Nested subdirectories** are flattened into their own top-level `\section`s
with path-style names (e.g. `mixes/v2`). Avoids subsubsection sprawl and
keeps the TOC two levels deep.
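A rough sketch of the two mapping rules for sidecars follows. The function names are hypothetical, and two details are assumptions rather than decisions recorded here: an empty `title:` is treated as absent, and headings that would fall below `\subparagraph` after demotion are clamped to it.

```rust
use std::path::Path;

/// Subsection title: the frontmatter `title:` if present and non-empty
/// (assumption), else the file stem.
fn subsection_title(frontmatter_title: Option<&str>, file: &Path) -> String {
    frontmatter_title
        .map(str::trim)
        .filter(|t| !t.is_empty())
        .map(str::to_owned)
        .or_else(|| file.file_stem().map(|s| s.to_string_lossy().into_owned()))
        .unwrap_or_default()
}

/// Demotes a markdown heading level (1..=6) by two and maps it onto the
/// `article` class sectioning commands. Levels past `\subparagraph` are
/// clamped (assumption).
fn demoted_heading_command(markdown_level: u8) -> &'static str {
    match markdown_level.saturating_add(2) {
        0..=3 => "\\subsubsection",
        4 => "\\paragraph",
        _ => "\\subparagraph",
    }
}
```

With this mapping, a sidecar's `#` heading becomes `\subsubsection`, one level below the `\subsection` that carries the file's title.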
### Pipeline
1. **Walk** the tree single-threaded for deterministic ordering. Honor
`.zimignore`. Skip hidden dirs (e.g. `.git`) and build output such as `target`.
Group entries by parent directory; sort directories alphabetically (depth-first),
files alphabetically with `README.md` pulled out as the section's lead content.
2. **Per-file prep:**
- Strip YAML frontmatter (`---`-delimited block at top); capture `title`
for the section/subsection name.
- Parse with `pulldown-cmark`.
- Emit LaTeX via a visitor: headings (demoted), lists → `itemize` /
`enumerate`, code → `Verbatim` (fancyvrb), inline code → `\texttt`, links
→ `\href` from `hyperref`, images → `\includegraphics` if the path
resolves relative to the source file, else alt text in italic.
3. **Document assembly** (Rust, written to a temp dir):
```
\documentclass[11pt]{article}
\usepackage{hyperref, graphicx, fancyvrb, geometry, parskip}
\title{<--title or top-level README H1 or cwd basename>}
\author{<--author or config.default_artist>}
\begin{document}
\maketitle
\tableofcontents
<root dir's files first, no clearpage — they sit under the TOC>
<for each subdirectory, in order:>
\clearpage
\section{<README H1 or dir name>}
<README body, if any>
<for each non-README .md:>
\subsection{<frontmatter title or file stem>}
<converted body>
\end{document}
```
TOC populated automatically from `\section`/`\subsection`.
4. **Compile:** `xelatex -interaction=nonstopmode -halt-on-error
-output-directory=<tmp> doc.tex`, run twice (second pass resolves TOC page
numbers). On failure, surface the engine's output: TeX reports errors on
stdout and in `doc.log`, not on stderr.
5. **Deliver:** copy `doc.pdf` to `--output` (default `<basename(PATH)>.pdf`).
Remove the tempdir afterwards; with `--keep-tex`, first copy `doc.tex` next to the PDF.
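The frontmatter step in item 2 could look roughly like the following. It is a deliberately small sketch, not a YAML parser: `split_frontmatter` is a hypothetical name, only a top-level, single-line `title:` scalar is recognised, and a block with no closing `---` is assumed to be ordinary body text.

```rust
/// Splits a leading `---`-delimited block off `source`.
/// Returns the captured `title` (if any) and the body without frontmatter.
fn split_frontmatter(source: &str) -> (Option<String>, &str) {
    let mut lines = source.split_inclusive('\n');
    // Frontmatter only counts when the very first line is `---`.
    let mut offset = match lines.next() {
        Some(first) if first.trim_end() == "---" => first.len(),
        _ => return (None, source),
    };
    let mut title = None;
    for line in lines {
        offset += line.len();
        let trimmed = line.trim_end();
        if trimmed == "---" {
            // Closing delimiter: everything after it is the markdown body.
            return (title, &source[offset..]);
        }
        if let Some(value) = trimmed.strip_prefix("title:") {
            // Strip surrounding whitespace and optional quotes.
            let value = value.trim().trim_matches(|c| c == '"' || c == '\'');
            if !value.is_empty() {
                title = Some(value.to_owned());
            }
        }
    }
    // No closing delimiter (assumption): treat the whole file as body.
    (None, source)
}
```

The returned body is what would then be handed to the markdown parser, so frontmatter never reaches the LaTeX emitter.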
### CLI surface
```
zim pdf [PATH] # default: "."
--output, -o <FILE> # default: <basename(PATH)>.pdf
--title <STR> # default: top-level README H1, else directory basename
--author <STR> # default: config.default_artist
--keep-tex # leave the .tex file next to the PDF
```
Wired in `src/main.rs` alongside other subcommands; handler at
`src/cli/pdf.rs`. New module `src/pdf/` holds the walk + LaTeX emitter so the
handler file stays under the ~200-line guideline.
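One wrinkle in the `--output` default is that `.` has no basename of its own. A possible way to resolve it is sketched below; `default_output` is a hypothetical helper, and the `output` fallback for paths without any name (such as `/`) is an assumption.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Derives `<basename(PATH)>.pdf` for the `--output` default.
fn default_output(path: &Path) -> PathBuf {
    // `.` and `..` have no file name, so resolve them to an absolute path first.
    let resolved = if path.file_name().is_some() {
        path.to_path_buf()
    } else {
        path.canonicalize().unwrap_or_else(|_| path.to_path_buf())
    };
    let mut name: OsString = resolved
        .file_name()
        .map(|n| n.to_os_string())
        // Hypothetical fallback for nameless paths such as `/`.
        .unwrap_or_else(|| OsString::from("output"));
    // Append rather than replace the extension, so `mixes.v2` keeps its suffix.
    name.push(".pdf");
    PathBuf::from(name)
}
```

For example, `zim pdf albums/mixes` would default to `mixes.pdf` in the working directory.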
## Domain Events
- **Consumes:** `.md` files on disk, `.zimignore` rules, `~/.config/zim/config.toml`
(for default author).
- **Produces:** one `.pdf` artifact at `--output`. Optionally one `.tex` if
`--keep-tex`. Stdout: progress spinner ("Walking…", "Rendering…",
"Compiling LaTeX (pass 1/2)…"), then final `Wrote <path> (<n> files,
<m> directories)`.
- **What must follow:** nothing. This is a pure read→write artifact; it does not
mutate sidecars, the index, or config. No event is published for other
commands to react to.
## Checkpoints
1. `zim pdf` in an empty tree errors clearly ("no .md files found").
2. `zim pdf` on a `mixes/` directory containing one `README.md` plus several
track sidecars produces: one section titled from the README's H1, the
README's prose immediately below it, then one subsection per track in
alphabetical order — all on one continuous page block, no internal page
breaks.
3. `zim pdf` on a multi-directory project (`master/`, `mixes/`, `bounces/`)
produces one section per directory, each starting on a fresh page; the
top-level README sits directly under the TOC with no leading clearpage.
4. Sidecar YAML frontmatter does not appear in rendered text; sidecar
`title:` fields drive subsection names.
5. In-body headings inside a sidecar render below their `\subsection` (i.e.
demoted), not at the same visual level.
6. Re-running on the same tree produces a diff-clean PDF — walk order is
deterministic.
7. With MacTeX absent, the error names the missing binary and points to the
install path; exit code is non-zero.
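The missing-engine check in checkpoint 7 can be done without spawning a process. The sketch below is one possible shape: the function names and the error wording are hypothetical, the lookup only tests that a file named `xelatex` exists in some `PATH` entry (it does not check the executable bit), and turning the error into a non-zero exit code is left to the caller.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Returns the first entry of `path_var` that contains a file named `binary`.
fn find_on_path(binary: &str, path_var: &OsStr) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(binary))
        .find(|candidate| candidate.is_file())
}

/// Maps a failed lookup to a user-facing message naming the missing binary.
fn require_xelatex(path_var: &OsStr) -> Result<PathBuf, String> {
    find_on_path("xelatex", path_var).ok_or_else(|| {
        "xelatex not found on PATH; install MacTeX (or another TeX Live distribution)"
            .to_owned()
    })
}
```

A caller would typically pass `std::env::var_os("PATH")` here; taking the value as a parameter keeps the check testable.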