Module cleanup

Source

Structs§

RewritePrompt: The constrained-rewrite prompt for the LLM formatter (consumed by the Candle façade in T7). system is hard restraint that holds at every level; the per-level rule only widens which edits are permitted. Restraint is the wording, so it lives here in the pure core, not in the inference façade.

Enums§

Level: Cleanup intensity. Plan 3 wires this into the LLM rewrite; deterministic-Light is the instant, always-present layer.

Functions§

apply_backtrack: Remove a self-correction: when a backtrack trigger appears AS A WHOLE PHRASE, drop the words immediately preceding it (the spec’s >3-word-reduction guard: only fire when at least 3 words precede the trigger, so we don’t nuke a short true clause). Word-bounded so it never deletes content words it matched inside.
apply_spoken_commands: Apply spoken formatting commands deterministically. Padding the input with spaces lets a command at the phrase start or end match too (the replacements are space-delimited). Note: back-to-back identical commands (“new line new line”) collapse to one — an accepted Plan-1 edge case.
decapitalize_continuation: Lowercase the first letter of text when it CONTINUES the previous block — the previous block didn’t end a sentence (no terminal .!?) AND the first word is an allow-listed continuation word. Whisper cases each segment as a fresh sentence; this undoes the spurious mid-sentence capital when a sentence spans a pause. Conservative by construction (only the allow-list; never a proper noun).
deterministic_light: Deterministic “Light”: capitalize sentence starts, ensure terminal punctuation, strip leading fillers. Always guard-safe by construction.
format_revise: Format a pass-2 Whisper revise. Whisper already cased + punctuated, so this does NOT re-capitalize sentence starts or force terminal punctuation (that re-creates the per-segment mid-sentence capital). It applies only the spoken-command and scratch that backtrack features and the continuation de-capitalizer.
guard_accepts: The moat: accept a rewrite only if it preserves every content word from the input, in order, adding/removing nothing but allowed fillers. Guards harm (a substituted/dropped meaning word) rather than edit volume.
paragraphize: Group a long monologue into readable paragraphs (the High level). Existing blank lines (a spoken “new paragraph”) are preserved as hard breaks; within each run, a new paragraph starts at a sentence that opens with a discourse marker once the current paragraph already holds MIN_SENTENCES_PER_PARA sentences, or unconditionally once it reaches MAX_SENTENCES_PER_PARA — so breaks fall at thought shifts, never after every sentence, and no paragraph runs on forever.
parse_level: Parse a config string into a Level (defaults to Light — the safe, restrained default — on anything unrecognized).
rewrite_prompt: Build the per-level rewrite prompt for T7’s Candle façade. The Light rule keeps filler removal to LEADING disfluencies only — mid-sentence you know/i mean removal is deliberately NOT requested, because the content-word guard would accept such drops (see the FILLERS note). T7 must preserve this restriction.
shape_entry: Apply whole-entry shaping for a level (called at session end on the joined clean text). High paragraphizes; lower levels pass through (per-phrase Light already ran).
strip_sound_tags: Remove Whisper’s non-speech tags. [...] spans go only when a known tag or an all-caps event shape ([BLANK_AUDIO]); (...) spans go only when every inner word is a sound word. Runs in the pre-layer (before the content-word guard), so nothing it removes ever reaches the guard as a content-word change.

Module cleanup

Module cleanup Copy item path

Structs§

Enums§

Functions§

Module cleanup