Module translator

Translator — cross-vocab token-stream pipe.

Take Agent A’s token IDs in vocab V_A, produce Agent B’s token IDs in vocab V_B, with no text ever leaving the process. Internally:

    ids_A → Detokenizer(V_A) → utf8 → BPETokenizer(V_B) → ids_B

The text intermediate is purely local; agent-to-agent traffic still carries only token IDs on the wire. This module mirrors the TS Translator class from @codecai/web and the Python Translator from codecai, and follows the same word-boundary buffering rules.
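To make the pipeline concrete, here is a minimal sketch of the ids_A → utf8 → ids_B round trip. It is not this crate's implementation: `detokenize`, `tokenize`, and `translate` are hypothetical helpers, the vocabularies are plain string slices, and a greedy longest-match tokenizer stands in for a real BPE tokenizer.

```rust
/// Render token IDs to UTF-8 text by concatenating each ID's vocab entry.
/// Returns None if an ID is out of range. (Hypothetical helper.)
fn detokenize(vocab: &[&str], ids: &[u32]) -> Option<String> {
    let mut out = String::new();
    for &id in ids {
        out.push_str(vocab.get(id as usize)?);
    }
    Some(out)
}

/// Toy stand-in for BPE: greedy longest-match against the vocab.
/// Returns None if no vocab entry matches at some position.
fn tokenize(vocab: &[&str], text: &str) -> Option<Vec<u32>> {
    let mut ids = Vec::new();
    let mut rest = text;
    while !rest.is_empty() {
        let (id, tok) = vocab
            .iter()
            .enumerate()
            .filter(|(_, t)| !t.is_empty() && rest.starts_with(**t))
            .max_by_key(|(_, t)| t.len())?;
        ids.push(id as u32);
        // `rest` starts with `tok`, so this is always a char boundary.
        rest = &rest[tok.len()..];
    }
    Some(ids)
}

/// ids_A -> text -> ids_B; the intermediate String never leaves this function.
fn translate(vocab_a: &[&str], vocab_b: &[&str], ids_a: &[u32]) -> Option<Vec<u32>> {
    let text = detokenize(vocab_a, ids_a)?;
    tokenize(vocab_b, &text)
}
```

With V_A = ["he", "llo", " wor", "ld"] and a V_B that contains "hello" and " world" as whole entries, four V_A tokens collapse into two V_B tokens. The token count is not preserved across vocabularies, which is why the pipeline goes through text rather than mapping IDs one to one.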

Streaming caveat: BPE merges depend on context, so tokenizing a partial word mid-stream can produce different IDs than tokenizing the complete word. The Translator therefore buffers text until a safe boundary (whitespace) before flushing it through BPE. Pass partial=true for every chunk except the last, and pass partial=false (or call Translator::finish) on the last chunk so the buffer drains.
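The buffering rule can be sketched as follows. This is an illustration under stated assumptions, not the crate's actual code: `BoundaryBuffer` is a hypothetical type, and the choice to cut just before the last whitespace character (so the whitespace stays attached to the following word) is an assumption about where the safe boundary sits.

```rust
/// Hypothetical text buffer that only releases text up to a whitespace boundary.
struct BoundaryBuffer {
    pending: String,
}

impl BoundaryBuffer {
    fn new() -> Self {
        Self { pending: String::new() }
    }

    /// Append a chunk and return the text that is safe to tokenize now.
    /// With partial = true, everything from the last whitespace onward is held back.
    /// With partial = false, the whole buffer drains.
    fn push(&mut self, chunk: &str, partial: bool) -> String {
        self.pending.push_str(chunk);
        if !partial {
            return std::mem::take(&mut self.pending);
        }
        match self.pending.rfind(char::is_whitespace) {
            Some(cut) => {
                // Keep the tail (last whitespace + unfinished word), release the head.
                let tail = self.pending.split_off(cut);
                std::mem::replace(&mut self.pending, tail)
            }
            // No boundary seen yet: keep buffering.
            None => String::new(),
        }
    }

    /// Drain whatever is still buffered.
    fn finish(&mut self) -> String {
        std::mem::take(&mut self.pending)
    }
}
```

Feeding the chunks "hel", "lo wor", "ld" with the last one marked partial=false releases "", "hello", and " world" in turn, so the tokenizer never sees the fragment "hel" or " wor" on its own.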

Structs

Translator: Cross-vocab agent-handoff pipe.

Functions

static_translation_table: Build a static V_A → V_B[] translation table by rendering each V_A vocab entry to text and re-tokenizing through V_B.
translate_one_shot: One-shot translator for non-streaming uses where all IDs are in hand.
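As a rough illustration of the static-table idea described for static_translation_table, the sketch below renders each V_A entry to text and re-tokenizes it through a caller-supplied V_B tokenizer. `build_table` and `apply_table` are hypothetical names, and the signatures are assumptions, not the crate's API.

```rust
/// Build a V_A -> V_B[] table: entry i holds the V_B IDs for V_A token i's text.
/// `tokenize_b` is any function that maps text to V_B IDs (assumed, caller-supplied).
fn build_table<F>(vocab_a: &[&str], tokenize_b: F) -> Vec<Vec<u32>>
where
    F: Fn(&str) -> Vec<u32>,
{
    vocab_a.iter().map(|&entry| tokenize_b(entry)).collect()
}

/// Translate by table lookup alone. Returns None if an ID is out of range.
fn apply_table(table: &[Vec<u32>], ids_a: &[u32]) -> Option<Vec<u32>> {
    let mut out = Vec::new();
    for &id in ids_a {
        out.extend_from_slice(table.get(id as usize)?);
    }
    Some(out)
}
```

A table like this translates each V_A token in isolation, so it cannot see merges that span token boundaries; its output may differ from what the buffered text pipeline produces for the same IDs.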