inkhaven 1.2.21

Inkhaven — TUI literary work editor for Typst books
# 50 — Bulk-importing a dictionary from CSV

Tutorial 49 covers the Language book end-to-end.
This one zooms in on `inkhaven language add-word
--import` — the bulk-loader for dictionaries
prepared in a spreadsheet, generated from
another tool, or pre-validated in CI.

A sample CSV ships alongside this tutorial:
[`49-language-book-tira-starter.csv`](49-language-book-tira-starter.csv).
Copy it and import it into a scaffolded Tira
sub-book; the rest of this tutorial walks
through what happens.

## When CSV import is the right tool

Use CSV import when:

- You're seeding a project with vocabulary
  pre-built in a spreadsheet (Excel, Numbers,
  Google Sheets, LibreOffice).
- You're generating vocabulary from a script
  (corpus extraction, LLM-generated wordlists,
  conversion from another conlang format).
- You want CI-style "dictionary as source of
  truth" — the .csv lives in version control;
  `--new` re-imports it on every regen.
- You're seeding 20+ words and the per-entry TUI
  `+` chord becomes tedious.

Skip CSV import when:

- You're adding 1-2 words mid-writing — use the
  TUI `+` chord; you'll be in the editor anyway
  to fill the entry's prose notes.
- You need full per-entry HJSON fidelity
  (custom inflection paradigm names, deeply
  nested examples).  The CSV's `inflection`
  column handles flat key=value pairs; richer
  structures need hand-editing post-import.

## The CSV format

Header row drives column mapping.  Column names
are case-insensitive and order-independent.

| Column | Required? | Format |
|-|-|-|
| `word` | yes | invented-language word (becomes the entry slug + lemma) |
| `type` | yes | part of speech (free-form string) |
| `translation` | yes | working-language gloss |
| `example` | no | canonical sample sentence |
| `pronunciation` | no | IPA (`/.../` phonemic, `[...]` phonetic) |
| `etymology` | no | derivation note (plain text) |
| `related` | no | `;`-separated word slugs |
| `inflection` | no | `;`-separated `key=value` paradigm pairs |
| `examples` | no | `|`-separated additional sentences |
| `register` | no | formal / informal / literary / archaic / sacred |
| `era` | no | when the word entered the language |
| `notes` | no | freeform usage notes |

**Quoting.**  RFC 4180-style.  Wrap cells in
`"..."` when they contain commas, quotes, or
newlines.  Double an embedded quote: `""`.
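
When the CSV is generated from a script, a
standard CSV library can apply these quoting
rules for you.  A minimal sketch using Python's
`csv` module; the column subset and the `notes`
text are illustrative, not taken from the bundled
sample:

```python
import csv
import io

# Column subset chosen for illustration; names follow the table above.
FIELDS = ["word", "type", "translation", "example", "notes"]

rows = [
    {
        "word": "atal",
        "type": "noun",
        "translation": "river",
        "example": "Atal nan ta-mi sora.",
        # Made-up note with an embedded comma and quotes, so the
        # writer has to wrap the cell and double the quotes.
        "notes": 'Glossed "river", never "stream, brook"',
    },
]


def to_csv(rows):
    """Serialise rows to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


csv_text = to_csv(rows)
print(csv_text)
```

Only the `notes` cell is wrapped in quotes here;
cells without commas, quotes, or newlines are
written bare.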

**Why `;` and `|`** for complex fields:
- `inflection` and `related` use `;` because
  paradigm values and word slugs almost never
  contain semicolons.
- `examples` uses `|` because example sentences
  frequently contain commas, and `|` is rarely
  punctuation inside a sentence.
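
To make the sub-separators concrete, here is a
rough Python sketch of how such cells could be
split.  It is not Inkhaven's parser: the helper
names are hypothetical, and trimming whitespace
and ignoring parts without `=` are assumptions.

```python
def split_list(cell, sep):
    """Split a cell on `sep`, trimming whitespace and dropping empty parts."""
    return [part.strip() for part in cell.split(sep) if part.strip()]


def parse_inflection(cell):
    """Turn 'k1=v1;k2=v2' into a dict (parts without '=' are ignored here)."""
    pairs = {}
    for part in split_list(cell, ";"):
        key, eq, value = part.partition("=")
        if eq:
            pairs[key.strip()] = value.strip()
    return pairs


# Cell values matching the `atal` entry shown later in this tutorial.
row = {
    "related": "atale;atatal",
    "inflection": "nominative=atal;genitive=atale;plural=atatal",
    "examples": "Atal sora-mi.|Pelele atal-e.",
}

related = split_list(row["related"], ";")
inflection = parse_inflection(row["inflection"])
examples = split_list(row["examples"], "|")
```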

**Skip rules:**
- Row with empty `word` cell → silent skip
  (useful for visual grouping rows).
- Row where `word` starts with `#` → comment,
  skipped.
- Duplicate `word` (already in the dictionary
  before this import) → skipped with warning;
  makes re-imports idempotent.
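
The three rules can be read as a small decision
function.  The sketch below is illustrative only:
`classify_row` is a hypothetical name, and both
the whitespace trimming and the exact-match
duplicate test are assumptions about behaviour
this tutorial does not spell out.

```python
def classify_row(word_cell, existing_words):
    """Decide what happens to one CSV row, based on its `word` cell."""
    word = word_cell.strip()
    if not word:
        return "skip-empty"      # silent skip (visual grouping row)
    if word.startswith("#"):
        return "skip-comment"    # comment row
    if word in existing_words:
        return "skip-duplicate"  # already present before this import
    return "import"


existing = {"atal", "sora"}
decisions = [
    classify_row(cell, existing)
    for cell in ["", "# ritual vocabulary", "atal", "mora"]
]
```

Here `decisions` ends up as one label per row, in
order: empty, comment, duplicate, import.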

## Walk-through with the Tira starter CSV

The bundled sample has 12 rows: 7 core nouns /
verbs / particles, one `#` comment row, and 4
ritual vocabulary words from the river-cult
register.

### Step 1 — scaffold Tira

```
$ inkhaven language init Tira
created language book `Tira` at language/tira
  · Meta
  · Dictionary
  · Grammar
  · Phonology
  · Sample texts
```

### Step 2 — populate Meta/overview.alphabet

Open `Language/Tira/Meta/overview` and set the
alphabet field to match the letters your
dictionary uses.  For the sample CSV:

```hjson
{
  // ...
  alphabet: ["A", "E", "I", "K", "L", "M", "N", "O",
             "P", "R", "S", "T", "U"]
  // ...
}
```

These are the letters the words in the sample
CSV use.  If you skip this step, the import
pre-flight checks against the default alphabet
(`["A", ..., "Z"]`), which allows every word in
the sample, so nothing is flagged.  Declaring a
tight alphabet is still good discipline: it lets
the pre-flight catch stray characters.

### Step 3 — (optional) declare a phonology

To exercise the phoneme-inventory validation,
add a Phonology rule via the TUI (`+` under
`Language/Tira/Phonology` → type
`consonant-inventory`):

```hjson
{
  rule_id: "consonant-inventory"
  category: "consonants"
  rule: '''
    Tira has 8 consonants:
      Stops:     p k t
      Nasals:    m n
      Fricative: s
      Liquids:   l r
  '''
  phonemes: ["p", "k", "t", "m", "n", "s", "l", "r"]
}
```

Without the `phonemes` array populated, phonology
validation skips silently (alphabet validation
still runs).  A future release will let `language
doctor` cross-check this against the actual
dictionary for completeness.

### Step 4 — import the CSV

```
$ inkhaven language add-word Tira \
    --import Documentation/Tutorials/49-language-book-tira-starter.csv
imported `atal` → Tira/Dictionary/A
imported `sora` → Tira/Dictionary/S
imported `mi`   → Tira/Dictionary/M
imported `nan`  → Tira/Dictionary/N
imported `ta`   → Tira/Dictionary/T
imported `peli` → Tira/Dictionary/P
imported `kima` → Tira/Dictionary/K
imported `mora` → Tira/Dictionary/M
imported `samu` → Tira/Dictionary/S
imported `lo`   → Tira/Dictionary/L
imported `sa`   → Tira/Dictionary/S

Import summary for `Tira`
  imported:        11
  skipped (#):     1
```

The comment row (`# the words below are
vocabulary for the river-cult's ritual
vocabulary`) was skipped because it starts with
`#`.
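
Judging by the output above, each entry is filed
under a bucket named after the word's first
letter.  A small sketch of that grouping,
assuming the bucket is simply the upper-cased
first character (`bucket_for` is a hypothetical
helper, not an Inkhaven function):

```python
from collections import defaultdict


def bucket_for(word):
    """Bucket name, assumed to be the upper-cased first character."""
    return word[0].upper()


# The 11 words from the import log above, in import order.
words = ["atal", "sora", "mi", "nan", "ta", "peli",
         "kima", "mora", "samu", "lo", "sa"]

buckets = defaultdict(list)
for word in words:
    buckets[bucket_for(word)].append(word)

for name in sorted(buckets):
    print(name, buckets[name])
```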

### Step 5 — verify

```
$ inkhaven language list
  name      words  grammar  phonology  samples
  ------------------------------------------------
  Tira         11        0          1        0

$ inkhaven language doctor Tira
Language doctor — `Tira`
...
Chapters
  Dictionary     : 11 parseable entries
  Grammar        : 0 rules
  Phonology      : 1 rules
  Sample texts   : 0 samples

Dictionary coverage
  with example   : 7/11 (63%)
  with paradigm  : 7/11 (63%)
  missing example: 4
  missing paradigm: 4 (overlay won't catch inflected forms)
...
```

Open any imported entry (e.g.
`Language/Tira/Dictionary/A/atal`) — it renders
as syntax-highlighted HJSON with every field the
CSV provided, no commented-out stubs (the import
path uses a compact body builder distinct from
the interactive seed template):

```hjson
{
  word:         "atal"
  type:         "noun"
  translation:  "river"
  example:      "Atal nan ta-mi sora."
  examples: [
    "Atal sora-mi."
    "Pelele atal-e."
  ]
  pronunciation: "/ˈa.tal/"
  etymology:    "from proto-Tira *a-tal 'flowing water'"
  related:      ["atale", "atatal"]
  inflection: {
    genitive: "atale"
    nominative: "atal"
    plural: "atatal"
  }
  register:     "formal"
  notes:        "Central worldbuilding word — the river-cult takes its name from this root"
}
```

Add the remaining fields (`era`, `frequency`,
`notes` on the entries that didn't have them) by
opening each paragraph and editing the HJSON.

## --new — wipe-and-replace semantics

The default import is **additive**: existing
entries are kept (duplicates skipped); new rows
are added.  Useful for incremental updates.

Pass `--new` to make the import **wipe-and-
replace**: every existing paragraph + bucket
subchapter under `Dictionary` is deleted before
the CSV is read.  The Dictionary chapter itself
is preserved.

```
$ inkhaven language add-word Tira --import tira-v2.csv --new
--new: wiped 11 existing entries across 8 buckets from `Tira/Dictionary`
imported `atal` → Tira/Dictionary/A
...
Import summary for `Tira`
  imported:        15
```

Use `--new` when:
- The CSV is the source of truth (version-
  controlled; re-imported on every regen).
- You want to drop typos or schema-evolution
  artefacts from earlier import passes.
- You're scripting "regenerate the dictionary
  from upstream" in CI.

**Validation ordering:** pre-flight runs BEFORE
the wipe, so a bad CSV doesn't destroy your
existing dictionary then fail to populate the
replacement.  Belt-and-braces.

## Pre-flight validation

Before any writes — including the `--new` wipe
— the import pre-flight pass walks every CSV
row and validates each `word` against:

1. **The alphabet** — every non-whitespace,
   non-punctuation character of every word must
   appear in some entry of
   `Meta/overview.alphabet`.  Skipped when the
   alphabet list is empty.

2. **The phoneme inventories** — the union of
   every Phonology rule's `phonemes` field.
   Skipped when no Phonology rule declares
   `phonemes`.

If any word fails validation, the entire import
is aborted with a per-violation report:

```
$ inkhaven language add-word Tira --import bad.csv
Pre-flight validation failed — 2 violation(s) found:

  · row 3: `xeno` contains `x` not in Meta/overview.alphabet
  · row 5: `zara` contains `z` not in Meta/overview.alphabet

Fix by either:
  · updating Meta/overview.alphabet to include the missing characters, OR
  · updating a Phonology rule's `phonemes` list to include them, OR
  · correcting the CSV, OR
  · re-running with --force to bypass validation.

Error: import aborted — 2 alphabet/phonology violation(s)
```
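
The alphabet half of the check can be sketched
in a few lines of Python.  This is an
illustration of the rule as described above, not
Inkhaven's code: `alphabet_violations` is a
hypothetical name, and the case-insensitive
comparison is an assumption (the tutorial
declares an upper-case alphabet and imports
lower-case words).  The phoneme check is left
out because how words map to phonemes isn't
specified here.

```python
import unicodedata


def alphabet_violations(rows, alphabet):
    """Return (row, word, char) for each character the alphabet lacks.

    `rows` is a list of (row_number, word) pairs.
    """
    if not alphabet:  # empty alphabet list: the check is skipped
        return []
    # A character passes if it appears in some alphabet entry.
    allowed = set("".join(alphabet).casefold())
    violations = []
    for row, word in rows:
        for ch in word:
            # Whitespace and punctuation are exempt from the check.
            if ch.isspace() or unicodedata.category(ch).startswith("P"):
                continue
            if ch.casefold() not in allowed:
                violations.append((row, word, ch))
    return violations


alphabet = ["A", "E", "I", "K", "L", "M", "N", "O",
            "P", "R", "S", "T", "U"]
found = alphabet_violations([(3, "xeno"), (5, "zara")], alphabet)
for row, word, ch in found:
    print(f"row {row}: `{word}` contains `{ch}` not in the alphabet")
```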

**Why hard-stop rather than warn:**

A partial import would leave the dictionary in
a confused state (some valid entries from the
CSV imported, the rest missing); the gap
analysis in `language doctor` would then
misreport coverage.  Hard-stop keeps the
dictionary in a known-good state.

**Bypass with `--force`** when:
- You're intentionally importing words that
  exceed the current Meta/overview declaration
  (e.g. you're seeding the alphabet FROM the
  CSV — the alphabet check would refuse,
  defeating the purpose).
- You're importing loanwords that use phonemes
  outside the native inventory.  (The sample's
  `samu`, a loan from `Atal-Kele`, stays within
  Tira's consonant set and needs no `--force`;
  a language with truly alien borrowings would
  need it.)
- You know the validation is wrong (typo in
  alphabet; phoneme inventory deliberately
  incomplete during the language's design
  phase).

```
$ inkhaven language add-word Tira --import loanwords.csv --force
imported `xena` → Tira/Dictionary/X    # 'x' not in alphabet, but --force overrode
...
```

## Combining flags

The three flags compose:

| Flags | Behaviour |
|-|-|
| `--import <path>` | Validate; if clean, additive import (skip duplicates) |
| `--import <path> --new` | Validate; if clean, wipe Dictionary then import |
| `--import <path> --force` | Skip validation; additive import |
| `--import <path> --new --force` | Skip validation; wipe Dictionary then import |

`--force` always implies skipping the pre-flight
check; `--new` always implies the wipe step.
They're orthogonal.
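
The ordering this tutorial describes (pre-flight
before the wipe, wipe before the import) can be
summarised as a tiny planner.  `plan_import` is
a hypothetical function written for illustration
and only mirrors the table above:

```python
def plan_import(new=False, force=False):
    """List the phases an import runs, in order, for a flag combination."""
    steps = []
    if not force:
        steps.append("validate")  # pre-flight; aborts before any writes
    if new:
        steps.append("wipe")      # runs only after validation passed
    steps.append("import")
    return steps


for new in (False, True):
    for force in (False, True):
        print(f"new={new} force={force}:", plan_import(new, force))
```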

## CSV authoring tips

**Build in a spreadsheet first.**  Excel,
Numbers, Google Sheets, and LibreOffice can all
export UTF-8 CSV (in Excel, pick the "CSV UTF-8"
save format rather than plain "CSV").  Name the
columns exactly as in the schema (case doesn't
matter), then export.  Reshaping after export is
harder than reshaping in the spreadsheet.

**Reserve `|` and `;` characters carefully.**
The CSV cell parser treats them as
sub-separators within the `examples` (pipe)
and `inflection` / `related` (semicolon)
columns.  If a sentence or value genuinely
contains one, rephrase it in the CSV or
hand-edit the entry's HJSON after import.

**Comment liberally.**  `#`-prefixed rows are
free.  Use them to group vocabulary by domain
(`# kinship terms`, `# colours`, `# verbs of
motion`) — the dictionary stays organised AND
the spreadsheet stays readable.

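**Re-imports are idempotent.**  Add rows to the
CSV and re-import: existing entries are skipped
as duplicates, new ones are added.  This is the
"adding more entries over time" workflow.
Because duplicates are skipped, changes to a row
whose `word` already exists are not applied; edit
that entry's HJSON directly, or re-import with
`--new`.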

**For wipe-and-replace workflows, version-
control the CSV.**  Commit `tira-dict.csv`
alongside the manuscript; CI step regenerates
the dictionary on every push:

```yaml
# .github/workflows/dict.yml
- name: regenerate Tira dictionary
  run: |
    inkhaven language add-word Tira \
      --import tira-dict.csv --new
- name: verify health
  run: |
    inkhaven language doctor Tira --json | \
      jq -e '.coverage.with_paradigm_pct >= 80'
```

## Round-tripping with export

Phase D ships `inkhaven language export <lang>
--format <fmt>` with three formats: `json`,
`anki`, and `dictionary-twocol`.  None of them
currently round-trips back to the import CSV
format, but the `json` export is structurally
complete:

```
$ inkhaven language export Tira --format json | \
    jq '.dictionary[] | [.word, .type, .translation] | @csv' -r
"atal","noun","river"
"kima","adjective","green"
...
```

Future candidate: `--format csv` that emits
import-compatible CSV so the round-trip is
fully closed.  For now, this `jq` recipe
covers the common case.
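
The `jq` recipe above emits no header row, which
the importer needs for column mapping.  A Python
sketch that adds one is shown below.  It assumes
the export has the shape the `jq` recipe implies
(a top-level `dictionary` array whose items carry
`word`, `type`, and `translation`); optional
fields are left out because their JSON layout
isn't documented here.

```python
import csv
import io
import json

REQUIRED = ["word", "type", "translation"]


def export_json_to_csv(json_text):
    """Convert the (assumed) JSON export shape into header + rows CSV."""
    data = json.loads(json_text)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=REQUIRED, lineterminator="\n")
    writer.writeheader()
    for entry in data["dictionary"]:
        # Missing keys become empty cells rather than raising.
        writer.writerow({name: entry.get(name, "") for name in REQUIRED})
    return buf.getvalue()


# Hand-written stand-in for `inkhaven language export Tira --format json`.
sample = json.dumps({
    "dictionary": [
        {"word": "atal", "type": "noun", "translation": "river"},
        {"word": "kima", "type": "adjective", "translation": "green"},
    ]
})

csv_out = export_json_to_csv(sample)
print(csv_out)
```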

## Cheat sheet

| Action | Command |
|-|-|
| Additive import | `inkhaven language add-word <lang> --import <path.csv>` |
| Wipe-and-replace | `inkhaven language add-word <lang> --import <path.csv> --new` |
| Skip alphabet / phonology validation | `... --force` |
| Comment a row | put `#` in the `word` cell |
| Skip a row | leave the `word` cell empty |
| Inflection format | `nominative=atal;genitive=atale;plural=atatal` |
| Multiple examples | `Sentence one.\|Sentence two.\|Sentence three.` |
| Related entries | `atale;atatal;pelele` |
| Re-import idempotency | safe — existing entries skipped as duplicates |