# 49 — Language book end-to-end
The Language book is inkhaven's invented-language
workbench: dictionary, grammar, phonology, sample
texts, and an AI translation flow that round-trips
between your manuscript's working language and a
conlang you defined. This tutorial walks the full
authoring loop from empty project to translated
prose, with a complete worked example you can type
in to see every chapter exercised end-to-end.
The feature ships in 1.2.13 (Phases A through
D.1). Every chord and command below works on that
release.
## Why this exists
Authors building secondary-world novels,
roleplaying-game settings, or worldbuilding
journals routinely need invented vocabulary that
stays self-consistent across the manuscript.
Before 1.2.13, the options were:
- Keep a parallel `.txt` glossary the editor
doesn't know about.
- Use the existing Artefacts / Places books for
individual words, losing the dictionary
structure.
- Use external tools (PolyGlot, ConWorkShop)
that don't talk to your manuscript.
The Language book treats invented languages as
first-class project content — same store, same
backup, same search, same AI integration. Your
manuscript and your dictionary live next to each
other; the AI translator reads the dictionary +
grammar + phonology + sample texts as
authoritative when translating into or out of
the language.
## The five-chapter shape
Every language sub-book carries the same five
chapters. Order matches the order you'll fill
them in:
```
Language
└── <YourLanguage>
    ├── Meta                   ← language metadata
    │   └── overview           ← single HJSON paragraph
    ├── Dictionary             ← words go here
    │   ├── A                  ← alphabet bucket (auto-created)
    │   │   └── aiya           ← one paragraph per word
    │   └── B
    │       └── bara
    ├── Grammar                ← rules the AI translator consumes
    │   ├── noun-cases
    │   └── verb-tense
    ├── Phonology              ← sound rules
    │   ├── syllable-template
    │   └── vowel-harmony
    └── Sample texts           ← few-shot anchors
        ├── greeting
        └── short-poem
```
Why split this way:
- **Meta/overview** carries the global facts
(alphabet, word order, morphological type). Read
ONCE per translation prompt.
- **Dictionary entries** are RAG-filtered into the
translation prompt: only entries whose
`translation` appears in the source text get
bundled, which keeps the prompt focused even with
hundreds of entries.
- **Grammar rules** are also RAG-filtered, via each
rule's `applies_when` field.
- **Phonology rules** are NOT bundled into every
translation prompt (they'd bloat it). They are used
by the future phonotactic generator and available
to the LLM on demand.
- **Sample texts** are included as register
anchors in every translation, up to 3 of them.
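
For the INTO direction, the dictionary filter can be sketched in a few lines of Python. This is a minimal illustration, not inkhaven's code: it assumes entries are plain dicts with `word` and `translation` keys and that matching is a case-insensitive substring test, as the Common pitfalls section describes. `select_entries` is a hypothetical name.

```python
def select_entries(entries: list[dict], source: str) -> list[dict]:
    """Keep the entries whose `translation` occurs in the source text.

    Matching is a case-insensitive substring test, so a multi-word
    translation must appear verbatim in the source.
    """
    haystack = source.lower()
    return [
        e for e in entries
        # An empty gloss would match every source, so skip it.
        if e.get("translation") and e["translation"].lower() in haystack
    ]


dictionary = [
    {"word": "atal", "translation": "river"},
    {"word": "sora", "translation": "flow"},
    {"word": "peli", "translation": "mountain"},
]
picked = select_entries(dictionary, "The river flows past the town.")
print([e["word"] for e in picked])  # ['atal', 'sora']
```

Because the test is a substring match, `flow` also catches `flows`, but a very short gloss can equally match inside an unrelated longer word.
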
## The worked example: Tira
Throughout the rest of this tutorial we'll build
a small original conlang called **Tira**. Tira
is small enough to fit in a tutorial but large
enough to exercise every feature: it has 8
consonants + 5 vowels, a CV(C) syllable, two
grammatical cases, and plural marked by reduplication
of the first syllable.
Why a made-up conlang instead of Quenya or
Klingon: those are licensed properties with
existing dictionaries, and the point of this
tutorial is the **process**, not the
**vocabulary**. Substitute your own
secondary-world language as you read.
## Step 1 — scaffold the language
Two equally valid entry points:
**From the TUI** — open the project, focus the
tree pane (`F8`), navigate to the `Language`
system book (or any node already inside it), and
press **`b`** (Add Book). Status bar prompts
`new language — type a name, Enter to scaffold;
Esc to cancel`. Type `Tira`, hit Enter.
Confirmation: ``added language `Tira` — 5 chapters
scaffolded; edit Meta/overview to set the
alphabet``.
(Pressing `b` from anywhere else in the tree
still slots a new top-level user book above the
system block — the Language scaffold path only
fires when the cursor is on or inside the
Language system book.)
**From the shell** —
```
$ inkhaven language init Tira
created language book `Tira` at language/tira
· Meta
· Dictionary
· Grammar
· Phonology
· Sample texts
```
Either path produces the same scaffold. Pick
whichever is faster from where your cursor
already is.
## Step 2 — populate `Meta/overview`
Tree pane → navigate to `Language/Tira/Meta/overview`
→ Enter to open in the editor. The body is
already a fully-commented HJSON template with every
field stubbed in. Edit in place; the paragraph's
content type is `[hjson]` so syntax highlighting
shows you the structure.
For Tira, fill the fields like this:
```hjson
{
  // ─────────────────────────────
  // IDENTITY
  // ─────────────────────────────
  name: "Tira"
  family: "Standalone constructed"
  language_kind: constructed
  iso_code: ""
  // ─────────────────────────────
  // ORTHOGRAPHY
  // ─────────────────────────────
  alphabet: ["A", "E", "I", "K", "L", "M", "N", "O",
             "P", "R", "S", "T", "U"]
  reading_direction: ltr
  script: "Latin"
  // ─────────────────────────────
  // LINGUISTIC SHAPE
  // ─────────────────────────────
  word_order: "SOV"
  morphology: "agglutinative"
  tonal: false
  has_cases: true
  has_gender: false
  // ─────────────────────────────
  // RUNTIME / TOOLING
  // ─────────────────────────────
  stemmer: ""   // no off-the-shelf stemmer applies
  example_corpus_ref: ""
  notes: "Tira is spoken by the river-cult of Atal in the northern valleys. Formal register only; no informal/casual forms in the manuscript."
}
```
Save with `F4` (the standard inkhaven save chord).
A few notes:
- `alphabet` only carries the letters Tira actually
uses — `B`, `C`, `D`, etc. are dropped. This drives
the Dictionary's bucket auto-creation in the next
step.
- `word_order`, `morphology`, `has_cases`,
`has_gender` are the quick-reference summary the
AI translator reads BEFORE composing translation
prompts. Tight values here mean better
translations.
- `notes` is freeform — the LLM doesn't read it,
but you do.
## Step 3 — add dictionary entries
Three ways to add a word. Pick whichever fits
your authoring rhythm:
### 3a. The TUI quick path
Tree cursor anywhere under
`Language/Tira/Dictionary` (the chapter itself or
an existing bucket subchapter), press **`+`** (Add
Paragraph). Type the word (`atal`), hit Enter.
The commit handler:
1. Walks the parent chain to identify Tira as the
target language.
2. Derives the alphabet bucket (`A`) from the
word's first character — consulting
`Meta/overview.alphabet` first, falling back to
first-char uppercase.
3. Auto-creates the `A` subchapter under
`Dictionary` if it doesn't exist yet.
4. Creates the entry paragraph under `A`.
5. Seeds the body with the full commented HJSON
template, with `word: "atal"` pre-filled.
Status: ``added `atal` to Tira/Dictionary/A — open
the paragraph to fill POS / translation``.
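
The bucket choice in step 2 can be sketched as below. The tutorial only says the alphabet is consulted first, with a first-character fallback; the longest-prefix rule for multi-letter alphabet entries (digraphs such as a hypothetical `Th`) is an assumption added for illustration, and `bucket_for` is a hypothetical helper.

```python
def bucket_for(word: str, alphabet: list[str]) -> str:
    """Pick the Dictionary bucket for a new word.

    Prefer the longest alphabet entry that prefixes the word
    (case-insensitively), so an assumed digraph "Th" would beat "T";
    otherwise fall back to the uppercased first character.
    """
    lowered = word.lower()
    matches = [letter for letter in alphabet if lowered.startswith(letter.lower())]
    if matches:
        return max(matches, key=len)
    return word[:1].upper()


tira_alphabet = ["A", "E", "I", "K", "L", "M", "N", "O",
                 "P", "R", "S", "T", "U"]
print(bucket_for("atal", tira_alphabet))  # A
print(bucket_for("bara", tira_alphabet))  # B (not in the alphabet: fallback)
```
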
Open the new paragraph and fill in:
```hjson
{
  // CORE
  word: "atal"
  type: "noun"
  translation: "river"
  example: "Atal nan ta-mi sora."
  // OPTIONAL — uncomment the fields you need
  // examples: []
  pronunciation: "/ˈa.tal/"
  etymology: "from proto-Tira *a-tal 'flowing water'"
  // related: []
  inflection: {
    nominative: "atal"
    genitive: "atale"
    plural: "atatal"   // first-syllable reduplication
  }
  // register: ""
  // era: ""
  // frequency: 0
  notes: "Central worldbuilding word; the river-cult takes its name from this root."
}
```
Save with `F4`. The lemma and its paradigm forms
(`atal`, `atale`, `atatal`) all go into the lexicon
overlay: when your manuscript prose contains any of
them, it lights up in italic mauve-teal.
### 3b. The shell bulk path
Better for adding many words at once or scripting
from a CSV:
```
$ inkhaven language add-word Tira atal \
--type noun \
--translation river \
--example "Atal nan ta-mi sora."
created subchapter `A`
added `atal` to `Tira/Dictionary/A` (noun · river)
$ inkhaven language add-word Tira sora \
--type verb \
--translation flow \
--example "Atal sora-mi."
added `sora` to `Tira/Dictionary/S` (verb · flow)
$ inkhaven language add-word Tira mi \
--type particle \
--translation "(present tense marker)"
added `mi` to `Tira/Dictionary/M` (particle · (present tense marker))
```
Open each paragraph in the editor afterward to
fill the optional fields (paradigms, etymology,
etc.) that the shell command doesn't accept.
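
When the word list lives in a script, the shell path is easy to drive programmatically. A minimal Python sketch, assuming nothing beyond the three flags shown above: `add_word_argv` (a hypothetical helper) only builds the argument lists and prints them shell-quoted; handing each list to `subprocess.run` would execute them.

```python
import shlex


def add_word_argv(lang: str, word: str, pos: str, translation: str,
                  example: str = "") -> list[str]:
    """Build the argv for one `inkhaven language add-word` call.

    Only the flags shown in this tutorial are used: --type,
    --translation and, optionally, --example.
    """
    argv = ["inkhaven", "language", "add-word", lang, word,
            "--type", pos, "--translation", translation]
    if example:
        argv += ["--example", example]
    return argv


words = [
    ("atal", "noun", "river", "Atal nan ta-mi sora."),
    ("sora", "verb", "flow", "Atal sora-mi."),
    ("mi", "particle", "(present tense marker)"),
]
for fields in words:
    # Print a shell-quoted command line; to run it instead, pass the list
    # to subprocess.run(argv, check=True).
    print(shlex.join(add_word_argv("Tira", *fields)))
```

For anything beyond a handful of words, the CSV import below is the more direct route.
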
### 3c. The bulk-import path — CSV dictionary
The fastest path when you've prepared vocabulary
in a spreadsheet or generated it from another
tool: bulk-import a CSV. This section summarises
the surface; **[Tutorial 50 — Bulk-importing a
dictionary from CSV](50-dictionary-csv-import.md)**
covers the full workflow including `--new`
wipe-and-replace, pre-flight alphabet + phonology
validation, `--force` bypass, CI patterns, and a
ready-to-copy sample CSV
([`49-language-book-tira-starter.csv`](49-language-book-tira-starter.csv)).
```
$ inkhaven language add-word Tira --import tira-starter.csv
imported `atal` → Tira/Dictionary/A
imported `sora` → Tira/Dictionary/S
imported `mi` → Tira/Dictionary/M
imported `nan` → Tira/Dictionary/N
imported `ta` → Tira/Dictionary/T
imported `peli` → Tira/Dictionary/P
imported `kima` → Tira/Dictionary/K
Import summary for `Tira`
imported: 7
```
**CSV format.** Header row maps column names to
row positions, so columns can appear in **any
order** and **any subset** (only the required
ones must be present).
| Column | Required | Meaning |
|---|---|---|
| `word` | yes | invented-language word (becomes the entry slug + lemma) |
| `type` | yes | part of speech (free-form string) |
| `translation` | yes | working-language gloss |
| `example` | no | canonical sample sentence |
| `pronunciation` | no | IPA (`/.../` for phonemic, `[...]` for phonetic) |
| `etymology` | no | derivation note (plain text) |
| `related` | no | `;`-separated word slugs |
| `inflection` | no | `;`-separated `key=value` paradigm pairs |
| `examples` | no | `\|`-separated additional sentences |
| `register` | no | formal / informal / literary / archaic / sacred |
| `era` | no | when the word entered the language |
| `notes` | no | freeform usage notes |
**Quoting.** RFC 4180-style — wrap a cell in
`"..."` if it contains commas, quotes, or
newlines. Double an embedded quote: `""`.
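
If you generate the CSV from a script rather than a spreadsheet, Python's standard `csv` module applies this quoting for you. A short sketch (the `notes` text is invented for illustration):

```python
import csv
import io

buf = io.StringIO()
# csv.writer defaults to minimal quoting: a cell is wrapped in "..." only if
# it contains the delimiter, a quote, or a line break, and embedded quotes
# are doubled. lineterminator="\n" keeps Unix line endings.
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["word", "type", "translation", "notes"])
writer.writerow(["atal", "noun", "river", 'root of the cult name, "Atal"'])
print(buf.getvalue(), end="")
# word,type,translation,notes
# atal,noun,river,"root of the cult name, ""Atal"""
```
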
**Skip rules:**
- Row with empty `word` cell → skipped silently
(lets you leave blank rows for visual
grouping).
- Row where `word` starts with `#` → comment,
skipped.
- Duplicate `word` (already in the dictionary)
→ skipped with ``row N: `X` already exists``
warning. This makes re-imports idempotent.
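
The header mapping, field splitting, and skip rules above can be sketched with Python's `csv.DictReader`. This illustrates the documented behaviour and is not inkhaven's importer: `parse_dictionary_csv` is a hypothetical name, `existing` stands in for the words already in the dictionary, the warning's row number is assumed to be the physical line number (header = line 1), and flagging a word repeated within the same file is an extra assumption.

```python
import csv
import io

REQUIRED = ("word", "type", "translation")


def parse_dictionary_csv(text: str, existing: set[str]) -> tuple[list[dict], list[str]]:
    """Apply the header mapping and skip rules; return (entries, warnings)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing required column(s): " + ", ".join(missing))

    entries: list[dict] = []
    warnings: list[str] = []
    seen = set(existing)  # also catches repeats within the file (assumption)
    for row in reader:
        word = (row.get("word") or "").strip()
        if not word or word.startswith("#"):
            continue  # blank spacer row or comment row: skipped silently
        if word in seen:
            # line_num is the physical line just read (header = line 1).
            warnings.append(f"row {reader.line_num}: `{word}` already exists")
            continue
        seen.add(word)
        # Keep only named, non-empty cells; column order does not matter.
        entry = {k: v for k, v in row.items() if k and v}
        if "inflection" in entry:  # `;`-separated key=value pairs
            pairs = [p.split("=", 1) for p in entry["inflection"].split(";") if "=" in p]
            entry["inflection"] = {k.strip(): v.strip() for k, v in pairs}
        if "examples" in entry:  # `|`-separated extra sentences
            entry["examples"] = [s.strip() for s in entry["examples"].split("|") if s.strip()]
        entries.append(entry)
    return entries, warnings
```
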
**Worked Tira starter CSV** (`tira-starter.csv`):
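```csv
word,type,translation,example,pronunciation,inflection,examples,etymology,notes
```
## Step 4 — define grammar rules
Tree cursor on `Language/Tira/Grammar` → `+` →
type a rule_id (for example `plural-reduplication`)
→ Enter, then fill in the rule body:
```hjson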
{
  rule_id: "plural-reduplication"
  title: "Plural — first-syllable reduplication"
  category: morphology
  rule: '''
    Plural is marked by reduplicating the noun's
    first syllable. The first syllable is the
    longest CV or CVC sequence at the start of
    the word.
      atal → atatal   (river → rivers)
      peli → pepeli   (mountain → mountains)
      kima → kikima   (green → greens)
    '''
  examples: [
    { source: "rivers", target: "atatal", gloss: "river.PL" }
    { source: "mountains", target: "pepeli", gloss: "mountain.PL" }
  ]
  applies_when: "the source sentence contains a plural noun"
  depends_on: []
  productivity: "core"
  notes: ""
}
```
And a third for verb tense:
```hjson
{
  rule_id: "verb-tense-particles"
  title: "Verb tense via particles"
  category: syntax
  rule: '''
    Tira verbs don't inflect for tense. Tense is
    marked by a particle placed immediately AFTER
    the verb:
      -mi   present
      -lo   past
      -sa   future
    Tira is SOV, so the order is:
      SUBJECT OBJECT VERB-TENSE
      atal nan ta-mi sora.   (the river flows you)
      actually: river you-OBJ flow-PRES
      → "you make the river flow", roughly.
    '''
  examples: [
    { source: "the river flows", target: "atal sora-mi", gloss: "river flow-PRES" }
    { source: "the river flowed", target: "atal sora-lo", gloss: "river flow-PAST" }
  ]
  applies_when: "the source sentence contains a verb"
  depends_on: []
  productivity: "core"
  notes: ""
}
```
The `rule` field is a multi-line HJSON string —
use `'''` to open and close. Indent doesn't
matter to the parser; readability is the goal.
The `examples` array is **few-shot data** —
during translation, the LLM sees these as worked
examples of the rule applied. More examples →
better translations.
## Step 5 — define phonology rules
Tree cursor on `Language/Tira/Phonology` → `+` →
type `syllable-template` → Enter.
```hjson
{
  rule_id: "syllable-template"
  title: "Syllable template — CV(C)"
  category: phonotactics
  rule: '''
    Tira syllables follow the template CV(C):
      ONSET:   exactly one consonant (no clusters)
      NUCLEUS: exactly one vowel (no diphthongs)
      CODA:    zero or one consonant, only from
               {l, n, r, s}
    Examples:
      a-tal     (V.CVC — onset can be null word-initial)
      pe-li     (CV.CV)
      ki-ma     (CV.CV)
      a-ta-tal  (V.CV.CVC — reduplicated plural)
    '''
  examples: [
    { input: "atal", output: "/ˈa.tal/", gloss: "river" }
    { input: "peli", output: "/ˈpe.li/", gloss: "mountain" }
  ]
  exceptions: []
  register: ""
  notes: "Word-initial vowels are allowed (null onset)."
}
```
And the consonant inventory:
```hjson
{
  rule_id: "consonant-inventory"
  title: "Consonant inventory"
  category: consonants
  rule: '''
    Tira has 8 consonants:
      Stops:       p t k
      Nasals:      m n
      Fricatives:  s
      Liquids:     l r
      Glides:      (none)
    Voiced stops (/b/, /d/, /g/) do NOT appear in
    native vocabulary; loan words may be respelled
    with the voiceless equivalents.
    '''
  examples: []
  exceptions: []
  notes: "Symbol inventory matches IPA except where noted."
}
```
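
Because the syllable template and consonant inventory are stated precisely, a word can be checked against them mechanically. The sketch below is not inkhaven's phonotactic generator (listed as future work at the end of this tutorial): `syllabify` is a hypothetical helper that hard-codes Tira's inventory from the two rules above and assumes a consonant between two vowels belongs to the following syllable's onset, which reproduces the `a-tal`, `pe-li`, and `a-ta-tal` splits.

```python
from typing import Optional

VOWELS = set("aeiou")
CONSONANTS = set("ptkmnslr")  # the consonant inventory above
CODAS = set("lnrs")           # consonants allowed in coda position


def syllabify(word: str) -> Optional[list[str]]:
    """Split a word into CV(C) syllables, or return None if it breaks the rules."""
    w = word.lower()
    syllables: list[str] = []
    i = 0
    while i < len(w):
        start = i
        # ONSET: exactly one consonant; a null onset only word-initially.
        if w[i] in CONSONANTS:
            i += 1
        elif not (start == 0 and w[i] in VOWELS):
            return None
        # NUCLEUS: exactly one vowel.
        if i >= len(w) or w[i] not in VOWELS:
            return None
        i += 1
        # CODA: l/n/r/s, taken only when it cannot start the next syllable,
        # i.e. at the end of the word or before another consonant.
        if i < len(w) and w[i] in CODAS and (i + 1 == len(w) or w[i + 1] not in VOWELS):
            i += 1
        syllables.append(w[start:i])
    return syllables


for w in ["atal", "peli", "atatal", "sorn", "bara"]:
    print(w, syllabify(w))
```

`sorn` fails because its final `n` has no vowel to attach to, and `bara` fails because `b` is outside the inventory.
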
## Step 6 — add sample texts
Tree cursor on `Language/Tira/Sample texts` → `+`
→ type a title (`river-greeting`) → Enter.
Sample-text paragraphs are NOT seeded with a
template — they're free-form prose. Write a
short text in Tira with a gloss on the next line:
```
Atal nan ta-mi sora.
"The river flows for you." (literally: river you-OBJ flow-PRES)
```
Add 2-3 more so the translation prompt has
register variety:
```
Pepeli kima-mi.
"The mountains are green." (mountain.PL green-PRES)
Atatale sora-lo.
"It flowed of the rivers." (river.PL.GEN flow-PAST)
```
The translation prompt envelope picks up to 3
sample texts as register anchors.
## Step 7 — see the overlay in your manuscript
Open any user-book paragraph and write prose that
includes a Tira word. For example, in a chapter:
> The traveller knelt beside the **atal** and
> washed the dust of the road from her hands.
> Above her, the **pepeli** caught the last light
> of the setting sun.
`atal` and `pepeli` light up in italic
`theme.language_word_fg` (default mauve-teal
`#b4a8e1`). Move the cursor onto `atal` and the
editor footer chip reads:
```
[atal · noun · river]
```
— the lemma, part of speech, and translation lifted
live from the entry's HJSON. Move off → chip
disappears, goal gauge (if any) comes back.
The overlay catches paradigm forms too: `atale`
(genitive) and `atatal` (plural) light up the same
way because they're listed in the entry's
`inflection` field. This closes the "Snowball gap"
for invented languages: no off-the-shelf stemmer
(Snowball or otherwise) knows Tira, but the
inflection paradigm tells the lexicon walker
which forms to recognise.
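
The overlay's handling of paradigm forms can be sketched as a lookup table from every surface form to its lemma. This assumes letter-run tokenisation and case-insensitive matching, neither of which the tutorial specifies; `build_lexicon` and `find_conlang_words` are hypothetical names, and the sample entries are trimmed to their `inflection` maps.

```python
import re


def build_lexicon(entries: list[dict]) -> dict[str, str]:
    """Map every surface form (the lemma plus its inflection values) to its lemma."""
    lexicon: dict[str, str] = {}
    for entry in entries:
        lemma = entry["word"]
        lexicon[lemma.lower()] = lemma
        for form in entry.get("inflection", {}).values():
            lexicon[form.lower()] = lemma
    return lexicon


def find_conlang_words(prose: str, lexicon: dict[str, str]) -> list[tuple[str, str]]:
    """Return (surface form, lemma) pairs for tokens found in the lexicon."""
    hits = []
    # Runs of letters only, so "sora-mi" yields "sora" and "mi".
    for token in re.findall(r"[^\W\d_]+", prose):
        lemma = lexicon.get(token.lower())
        if lemma:
            hits.append((token, lemma))
    return hits


entries = [
    {"word": "atal", "inflection": {"genitive": "atale", "plural": "atatal"}},
    {"word": "peli", "inflection": {"plural": "pepeli"}},
]
lexicon = build_lexicon(entries)
print(find_conlang_words("She knelt beside the atal; above her, the pepeli.", lexicon))
# [('atal', 'atal'), ('pepeli', 'peli')]
```
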
## Step 8 — translate INTO Tira
Cursor in a user-book paragraph (the source), press
**`Ctrl+B Q`**.
Single-language project → translation kicks off
directly. Multi-language project → picker pops
showing every defined language; use `↑↓` + `Enter`
or just type the first letter (`t` for Tira) to
jump-and-commit.
The composer assembles the prompt envelope:
1. **System prompt** — explains the LLM's role as
a translator between the working language and
Tira.
2. **Meta/overview** — Tira's identity, alphabet,
word_order, morphology.
3. **Grammar rules** — all three you wrote
(`noun-cases`, `plural-reduplication`,
`verb-tense-particles`).
4. **Phonology rules** — `syllable-template`,
`consonant-inventory`.
5. **Dictionary** — RAG-filtered to entries whose
`translation` appears in the source paragraph.
6. **Sample texts** — first 3 paragraphs from
`Sample texts`.
7. **Source paragraph** — the actual prose.
The envelope size at this scale runs about 3-5K
tokens — comfortably within any modern model's
window.
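
The seven-part envelope can be sketched as plain concatenation. Only the ordering and the three-sample cap come from this tutorial; the `## NAME` section labels, the separators, and `build_envelope` itself are hypothetical.

```python
def build_envelope(system: str, overview: str, grammar: list[str],
                   phonology: list[str], dictionary: list[str],
                   samples: list[str], source: str, max_samples: int = 3) -> str:
    """Join the prompt sections in the order listed above; empty ones are dropped."""
    sections = [
        ("SYSTEM", system),
        ("META", overview),
        ("GRAMMAR", "\n\n".join(grammar)),
        ("PHONOLOGY", "\n\n".join(phonology)),
        ("DICTIONARY", "\n".join(dictionary)),            # already RAG-filtered
        ("SAMPLES", "\n\n".join(samples[:max_samples])),  # register anchors
        ("SOURCE", source),
    ]
    return "\n\n".join(f"## {name}\n{body}" for name, body in sections if body)


envelope = build_envelope(
    system="Translate from the working language into Tira.",
    overview='name: "Tira", word_order: "SOV"',
    grammar=["verb-tense-particles: the tense particle follows the verb"],
    phonology=[],
    dictionary=["atal (noun): river", "sora (verb): flow"],
    samples=["Atal nan ta-mi sora.", "Pepeli kima-mi.", "Atatale sora-lo.", "extra"],
    source="The river flows.",
)
print(envelope)
```
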
The AI pane title shows `translate[on]` in italic
mauve-teal while the stream is in flight, so you
know the `I` apply chord will use translation
extraction.
The LLM responds with:
```
<<<TRANSLATION>>>
Atal sora-mi nane ta. Pepeli kima-mi, sora-lo ka-mi sora.
<<<END>>>
Per-token gloss:
| river | river.NOM | atal |
| flows | flow.PRES | sora-mi |
| for you | you.OBJ | nane ta |
| mountains | mountain.PL | pepeli |
| are green | green.PRES | kima-mi |
Applied rules:
- noun-cases (atal.NOM, nane.GEN→OBJ via context)
- plural-reduplication (pepeli)
- verb-tense-particles (sora-mi, kima-mi)
Confidence flags:
- "for you" — Tira doesn't distinguish dative from
accusative; rendered with OBJ particle `ta`.
Suggest adding entry `ta: dative/object marker`
if not already.
```
Press **`I`** in the AI pane. The Insert chord
lifts ONLY the `<<<TRANSLATION>>>` block at your
cursor — the gloss table + applied-rules list +
confidence flags stay in the AI pane for your
reference but don't pollute the manuscript.
If the LLM forgot the markers, a second `I` press
falls back to inserting the full body verbatim.
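
The extraction behind the `I` chord can be sketched with a non-greedy regular expression. A minimal sketch, assuming the markers appear exactly as in the reply above; `extract_translation` is a hypothetical helper, and its `None` result stands for the no-markers case that the second `I` press handles by inserting the full body.

```python
import re
from typing import Optional

# Non-greedy match between the markers; DOTALL lets "." cross line breaks.
_TRANSLATION = re.compile(r"<<<TRANSLATION>>>\s*(.*?)\s*<<<END>>>", re.DOTALL)


def extract_translation(reply: str) -> Optional[str]:
    """Return the text between the markers, or None if they are missing."""
    match = _TRANSLATION.search(reply)
    return match.group(1) if match else None


reply = """<<<TRANSLATION>>>
Atal sora-mi.
<<<END>>>
Per-token gloss:
| river | river.NOM | atal |
"""
text = extract_translation(reply)
# None models the no-markers case, where the fallback inserts the whole reply.
print(text if text is not None else reply)  # Atal sora-mi.
```
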
## Step 9 — reverse-translate
**`Ctrl+B Shift+Q`** translates FROM Tira back to
the working language. Same envelope shape, flipped
direction labels.
The natural roundtrip workflow:
1. Cursor on an English paragraph → `Ctrl+B Q` →
land the Tira translation in the next paragraph
via `I`.
2. Cursor on the Tira paragraph you just landed →
`Ctrl+B Shift+Q` → AI pane shows the
back-translation.
3. Compare against the original.
When the back-translation drifts beyond register
(e.g., "the river flows" → "the river of regal
pomp"), the grammar rules or dictionary entries
have an inconsistency the manuscript will
eventually trip over. This is the in-TUI version
of what'll eventually be the headless `inkhaven
language test` corpus-driven drift detector.
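
Until a headless drift detector exists, a crude lexical check can flag roundtrips worth reading closely. The sketch below is an assumption-laden stand-in, not the planned `inkhaven language test` behaviour: it scores drift as one minus the Jaccard similarity of the two word sets, so it misses paraphrase and synonyms. `drift_score` is a hypothetical name, and any threshold is yours to pick.

```python
import re


def word_set(text: str) -> set[str]:
    """Lowercased runs of letters, ignoring punctuation and digits."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def drift_score(original: str, back_translation: str) -> float:
    """1 minus the Jaccard similarity of the two word sets (0.0 = same words)."""
    a, b = word_set(original), word_set(back_translation)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)


print(drift_score("the river flows", "The river flows."))                   # 0.0
print(round(drift_score("the river flows", "the river of regal pomp"), 2))  # 0.67
```
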
## Step 10 — health check
```
$ inkhaven language doctor Tira
Language doctor — `Tira`
name : Tira
kind : constructed
family : Standalone constructed
alphabet : 13 entries
direction : ltr
Chapters
Dictionary : 7 parseable entries
Grammar : 3 rules
Phonology : 2 rules
Sample texts : 3 samples
Dictionary coverage
with example : 7/7 (100%)
with paradigm : 3/7 (42%)
missing paradigm: 4 (overlay won't catch inflected forms)
Manuscript gap analysis
unique words (≥2 chars) in manuscript prose: 412
covered by dictionary: 2/412 (0%)
uncovered words (sample, max 15):
· above
· and
· beside
· …
```
The gap analysis is honest about coverage —
Tira covers two manuscript words because Tira is
a new language sparsely used in prose. The numbers
go up as you add vocabulary and weave Tira into
more passages.
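
The gap-analysis arithmetic can be sketched as follows. The tokenisation (lowercased runs of letters) and the integer truncation are assumptions; truncation is at least consistent with the `3/7 (42%)` and `2/412 (0%)` figures above. `coverage` is a hypothetical name.

```python
import re


def coverage(prose: str, surface_forms: set[str]) -> tuple[int, int, int]:
    """Return (covered, total, percent) over unique words of 2+ letters."""
    words = {w for w in re.findall(r"[^\W\d_]+", prose.lower()) if len(w) >= 2}
    covered = len(words & surface_forms)
    total = len(words)
    # Integer division truncates: 3/7 -> 42, 2/412 -> 0.
    percent = covered * 100 // total if total else 0
    return covered, total, percent


print(coverage("The traveller knelt beside the atal. Above her, the pepeli.",
               {"atal", "pepeli"}))  # (2, 8, 25)
```
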
For CI / shell scripting, pass `--json`:
```
$ inkhaven language doctor Tira --json
```
Use this to gate releases: e.g., refuse to merge a
PR that drops paradigm coverage below 80%.
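
A release gate along those lines can be sketched in Python. The JSON field names (`dictionary`, `entries`, `with_paradigm`) are placeholders, since this tutorial does not show the `--json` schema; check them against your build's output. `gate`, `paradigm_coverage`, and the `check_paradigms.py` filename are hypothetical.

```python
import json


def paradigm_coverage(report: dict) -> float:
    """Percentage of dictionary entries that carry an inflection paradigm.

    The key names are placeholders for whatever your build of
    `inkhaven language doctor --json` actually emits.
    """
    stats = report["dictionary"]
    total = stats["entries"]
    return 100.0 * stats["with_paradigm"] / total if total else 0.0


def gate(report: dict, threshold: float = 80.0) -> int:
    """Return a process exit code: 0 if coverage meets the threshold, else 1."""
    pct = paradigm_coverage(report)
    if pct < threshold:
        print(f"paradigm coverage {pct:.0f}% is below {threshold:.0f}%")
        return 1
    return 0


# In CI: sys.exit(gate(json.load(sys.stdin))), fed by
#   inkhaven language doctor Tira --json | python check_paradigms.py
sample = json.loads('{"dictionary": {"entries": 7, "with_paradigm": 3}}')
print(gate(sample))
```
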
## Step 11 — list and export
`inkhaven language list` summarises every defined
language at a glance:
```
$ inkhaven language list
name words grammar phonology samples
------------------------------------------------
Tira 7 3 2 3
```
When you're ready to publish or share:
```
# Two-column printable Typst dictionary
$ inkhaven language export Tira \
--format dictionary-twocol \
--output dist/tira-dict.typ
# Anki / SuperMemo / Mochi flashcard deck
$ inkhaven language export Tira \
--format anki \
--output dist/tira.csv
# Full structured JSON for downstream tooling
$ inkhaven language export Tira --format json > dist/tira.json
```
The Typst output renders entries grouped under
alphabet headers (`— A —`, `— B —`, …), each entry
formatted as bold headword + italic POS +
translation + indented example + small-font
paradigm line. Compile with `typst compile
dist/tira-dict.typ` to get a printable PDF.
## Authoring rhythm — what to do in what order
The fastest path from "I have an idea for a
language" to "the AI is translating my prose into
it":
1. **Sketch the linguistic shape first.**
Open `Meta/overview` and fill `word_order`,
`morphology`, `has_cases`, `has_gender`,
`alphabet`. This takes 5 minutes and sets the
constraints the rest of the workflow operates
within.
2. **Write 2-3 grammar rules.** The minimum
useful set: a case system (or word-order rule),
a tense system (or aspect rule), a number-marking
rule. Without these, the AI translator has to
guess at the grammar.
3. **Seed 10-20 dictionary entries** for words
your manuscript actually uses. Don't try to
pre-populate every conceivable word — Tira
grew its vocabulary as the manuscript needed
it. The `doctor` gap analysis tells you which
prose words are uncovered.
4. **Write 3-5 sample texts.** These are the
register anchors the LLM uses to pitch its
output. Keep them in the register you want the
manuscript translations to match.
5. **Translate, review, adjust.** Most issues
surface during translation:
- LLM produces an off-register translation →
adjust sample texts.
- LLM mis-applies a grammar rule → make
`applies_when` tighter or add more
`examples`.
- LLM asks for a missing word → `add-word` the
entry it suggests.
6. **Roundtrip-test periodically.** `Ctrl+B Q`
→ `Ctrl+B Shift+Q` → compare to original.
Drift = rule inconsistency = manuscript bug
waiting to happen.
## Cheat sheet
| Task | How |
|---|---|
| Scaffold a new language (TUI) | Tree (`F8`) → cursor on `Language` → `b` |
| Scaffold a new language (shell) | `inkhaven language init <name>` |
| Add a dictionary entry (TUI) | Cursor under `<lang>/Dictionary` → `+` → type word → Enter |
| Add a dictionary entry (shell) | `inkhaven language add-word <lang> <word> --type <pos> --translation <text>` |
| Bulk-import a dictionary (CSV) | `inkhaven language add-word <lang> --import <path.csv>` |
| Remove a dictionary entry | `inkhaven language remove-word <lang> <word>` |
| Add a grammar / phonology rule (TUI) | Cursor under `<lang>/Grammar` or `<lang>/Phonology` → `+` → type rule_id → Enter |
| Translate INTO the language | `Ctrl+B Q` in editor |
| Translate FROM the language | `Ctrl+B Shift+Q` in editor |
| Insert translation at cursor | `I` in AI pane (lifts only the `<<<TRANSLATION>>>` block) |
| Health report (text) | `inkhaven language doctor <lang>` |
| Health report (JSON, CI-friendly) | `inkhaven language doctor <lang> --json` |
| List defined languages | `inkhaven language list` |
| Export | `inkhaven language export <lang> --format <fmt> --output <path>` |
## Common pitfalls
- **"My entry shows `= aag\n\n` instead of the
template."** You scaffolded on a pre-`90e51d7`
build; the seed body wrote to bdslib but not to
disk. Delete the entry via `remove-word` and
re-create it; new entries write through to disk
correctly.
- **"The lexicon overlay doesn't light up inflected
forms."** The entry's `inflection: {...}` map
is empty or missing. Fill the paradigm — every
value in that map gets added to the lexicon as
an extra surface form.
- **"The translation is grammatically wrong."**
Check the rule's `applies_when` — if it's too
vague, the rule fires when it shouldn't. Or
add more `examples` so the LLM has more
worked-pattern data.
- **"`Ctrl+B Q` shows zero matching dictionary
entries."** The RAG filter matches the entry's
`translation` field against words in the source.
If your entries' translations are multi-word
phrases, the source has to contain the entire
phrase verbatim (case-insensitive substring).
- **"Pressing `b` from anywhere creates a top-level
book, not a language sub-book."** The Language-
scaffold path only fires when the cursor is on
or inside the Language system book. Move the
cursor onto `Language` first.
## What's not in 1.2.13
Follow-up candidates (the §12 / §13 / §14 parts of
`Documentation/PROPOSALS/LANGUAGE_BOOK.md` that
didn't ship):
- `--format grammar` and `--format phrasebook`
exports — need rule HJSON schema design (the
current template is the right shape; the
exporter doesn't yet parse it).
- `inkhaven language test <name>` headless
roundtrip drift CLI.
- `inkhaven language translate` headless
translation CLI.
- `Ctrl+B Shift+R` reverse-lookup picker
("find the entry whose translation is `X`").
- `Ctrl+B Shift+W` word-of-the-day floating
card in the manuscript editor + phonotactic
generator in the Language book.
- Card renderers for Dictionary / Grammar /
Phonology paragraphs viewed inside the Language
book (the §7 / §10 visualisations from the
proposal).
The plumbing for all of these is in place; they're
chord / render work, not data-model work.