# 40 — Concordance view
`Ctrl+B Shift+L` opens a project-wide concordance:
every distinct word in your manuscript, ranked by
occurrence count, with up to three keyword-in-context
(KWIC) samples per row.
```
┌─ Concordance — project-wide ─────────────────────────┐
│ 1,247 distinct · 19,432 tokens · 84 paragraphs │
│ filter: walk│ sort: count (3 shown) │
│ # word count variants │
│ 1 walked 18 (walk, walking)
│ ▶ 2 walking 11 │
│ 3 walk 7 │
│ │
│ samples for "walked" (18× total) │
│ aerin/chapter-1:l3 …Anna «walked» away into… │
│ aerin/chapter-2:l8 The boy «walked» a hundred… │
│ khaal/chapter-1:l1 He «walked» as if the stones… │
│ │
│ ↑↓ navigate · type to filter · Ctrl+S sort · Esc │
└──────────────────────────────────────────────────────┘
```
## What the pipeline does for you
When you open the modal, inkhaven:
1. Walks every Paragraph node in the project's
hierarchy.
2. Strips the leading typst heading line so the
title doesn't inflate counts.
3. Tokenises with `unicode-segmentation`'s UAX-#29
word segmenter — Cyrillic / Latin / Greek /
Devanagari boundaries handled identically.
4. Drops single-character tokens, pure-digit runs,
and stop-words. (Stop-words reuse the
`editor.style_warnings.repeated_phrases.<lang>_stop_words`
list — same multilingual setup as the repeated-
phrase detector, no second list to tune.)
5. Stems each surviving token with the project's
Snowball algorithm. `walk`/`walked`/`walking`
all key on a single stem; Russian `сказал`/
`сказала`/`сказали` collapse the same way.
Disable via
`editor.style_warnings.repeated_phrases.use_stemming
= false`.
6. Groups by stem-key, counts, and captures the
first three encountered occurrences as KWIC
samples (32-char half-width, `«match»` wrapping
for visual contrast in monospace output).
7. Sorts by count descending; ties broken by
headword ascending.
## Interactions
Inside the modal:
| Any printable character | Append to filter (substring match on headword + variants) |
| `Backspace` / `Delete` | Edit the filter buffer |
| `Ctrl+S` | Toggle sort: count ↔ alphabetical |
| `↑` `↓` / `PgUp` / `PgDn` | Navigate |
| `Home` / `End` | Jump to first / last visible row |
| `Esc` | Close |
Plain `s` types into the filter — it doesn't toggle
sort. Use `Ctrl+S` so headwords starting with `s`
stay typeable.
## Filter semantics
The filter substring-matches against the headword
**and** every kept variant. So typing `walk`
surfaces the entry whose headword is `walked` (the
most-common surface form for that stem) but whose
variants list includes `walking` and `walk`.
Type more characters to narrow further — `walke`
narrows the same entry down further; `walker` would
land on a different stem entirely.
## Memory + performance
- Max 3 samples + 5 variants per stem, so even a
Tolstoy-scale corpus stays bounded.
- Build completes well under a second on a 100k-word
manuscript; the work runs synchronously on the UI
thread.
- Filter + sort changes are instant after the
initial build — `visible` is a cached `Vec<usize>`
into `data.entries`.
## Use cases
- **Auditing your prose's lexical fingerprint**.
The top of the list is your *real* voice — not
the voice you imagine. Surprises are common.
- **Spotting overworked vocabulary**. When `glanced`
shows up 47 times across the book, it's time to
swap in `peered` / `looked up` / `caught sight of`.
- **Locating a character or place mention**. Type
the name in the filter; KWIC samples show every
place it appears with breadcrumb-style slug paths
(`aerin/chapter-2:l8`) — faster than full-text
search if you already know the word.
## See also
- [`03-the-editor.md`](03-the-editor.md) — inline
filter-word + repeated-phrase overlays (related
style-checking layer that's always-on while you
edit).
- [`16-similar-paragraphs.md`](16-similar-paragraphs.md)
— vector-similarity picker (semantic version of
"find me where I wrote about X").
## 1.2.11 additions
- **Enter jumps to the source paragraph.** Pressing
Enter on a concordance row closes the modal and
opens the source paragraph at the first sample's
editor line. The heading offset is computed
against the live editor body so the cursor lands
on the right textarea row (the index is built
over heading-stripped bodies; the editor shows the
raw paragraph with the `= title` line).
- **System books excluded from the corpus.** System
books (Prompts, Characters, Places, Lore, Help,
Notes, Artefacts, Typst, Scripts) are skipped at
index-build time. Two reasons: their content is
metadata, not prose — counting them dilutes the
lexical signal; and their bodies aren't always
reachable through the on-disk path (prompts-editor
saves to bdslib only), so an Enter-jump to one
would fail. Filtering them out at index time
eliminates the broken nav case.
## 1.2.12 additions
- **`inkhaven export-concordance` CLI.** Same
data the `Ctrl+B Shift+L` modal shows, written
to disk for use in spreadsheets / analysis
pipelines. Two formats:
```
inkhaven export-concordance --output stems.csv
inkhaven export-concordance --format json --output stems.json
```
CSV is one row per stem with
`headword,stem,count,variants,sample_paths`
columns. JSON is the structured form including
KWIC snippets, line numbers, full variants list,
project-wide totals. Optional `--min-count
<N>` flag drops long-tail single-occurrence stems
below the threshold from the export — useful for
filtering noise on big manuscripts. Same
multilingual Snowball-stemmed plumbing as the
modal; same system-book exclusion.