
Crate inputx_dict_format

inputx-dict-format — IDFv1 binary dict format for IME engines.

Probability-native: priors are stored as Q4 fixed-point log probabilities (per inputx_scoring). The crate provides a zero-copy mmap reader and a deterministic writer, and the same binary layout is shared by the pinyin, wubi and Japanese engines and by future Korean and Vietnamese engines.
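Q4 fixed point keeps 4 fractional bits, so the stored i16 is the log prior multiplied by 16. That gives a resolution of 1/16 and a representable range of -2048.0 to about 2047.94. A minimal sketch of the conversion, assuming natural-log priors (the log base used by inputx_scoring is an assumption here); `to_q4` and `from_q4` are hypothetical helpers, not part of this crate's API:

```rust
/// Q4 fixed point: 4 fractional bits, i.e. a scale factor of 16.
const Q4_SCALE: f64 = 16.0;

/// Convert a log probability (assumed natural log) to i16 Q4,
/// rounding to nearest and saturating at the i16 range.
fn to_q4(log_p: f64) -> i16 {
    let scaled = (log_p * Q4_SCALE).round();
    scaled.clamp(i16::MIN as f64, i16::MAX as f64) as i16
}

/// Convert an i16 Q4 value back to a log probability.
fn from_q4(q: i16) -> f64 {
    q as f64 / Q4_SCALE
}

fn main() {
    let p: f64 = 0.01;
    let q = to_q4(p.ln());
    // ln(0.01) is about -4.6052, stored as a multiple of 1/16.
    println!("ln(p) = {:.4} -> Q4 {} -> {:.4}", p.ln(), q, from_q4(q));
}
```

Rounding to the nearest 1/16 bounds the quantization error at 1/32 per prior, and saturation keeps a zero probability (log of negative infinity) at `i16::MIN` instead of wrapping.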

Architecture (per .claude/PLAN-dict-format-IDFv1.md)

+---------+---------------+--------------+---------------+----------------+
| Header  | String pool   | Entry table  | FST code idx  | FST word idx   |
| 64 B    | varlen, pad8  | N × 16 B     | varlen        | varlen         |
+---------+---------------+--------------+---------------+----------------+
                                         | Bigram block (optional)        |
                                         | Embedding block (optional)     |
                                         | Padding to 8-byte EOF          |
                                         +--------------------------------+
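The fixed-size parts of the layout make the first section offsets simple arithmetic. A minimal sketch, assuming the sections are written back to back in diagram order and that only the string pool carries the pad-to-8 rule before the entry table; a real reader would take the offsets from the header instead. `pad8` and `section_offsets` are hypothetical names, and the two constants just restate the diagram's sizes (the crate exports its own HEADER_SIZE and ENTRY_SIZE):

```rust
/// Header size from the layout diagram.
const HEADER_SIZE: usize = 64;
/// Per-entry size from the layout diagram.
const ENTRY_SIZE: usize = 16;

/// Round `n` up to the next multiple of 8.
fn pad8(n: usize) -> usize {
    (n + 7) & !7
}

/// Start offsets of the string pool, entry table and FST code index,
/// assuming back-to-back sections with the string pool padded to 8 bytes.
fn section_offsets(pool_len: usize, n_entries: usize) -> (usize, usize, usize) {
    let string_pool = HEADER_SIZE;
    let entry_table = string_pool + pad8(pool_len);
    let fst_code_idx = entry_table + n_entries * ENTRY_SIZE;
    (string_pool, entry_table, fst_code_idx)
}

fn main() {
    let (pool, entries, fst) = section_offsets(1_003, 200);
    println!("string pool @ {pool}, entry table @ {entries}, FST code idx @ {fst}");
}
```

With a 1003-byte pool and 200 entries, the pool is padded to 1008 bytes, so the entry table starts at 1072 and the FST code index at 1072 + 200 × 16 = 4272. Because 64 and 16 are both multiples of 8, every entry in the table stays 8-byte aligned.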
  • Header: magic b"IDFv", format_version (currently 1), section offsets, sha256 of payload.
  • String pool: deduplicated UTF-8 with byte offsets.
  • Entry table: fixed 16 B per entry; carries word_offset (u24), code_offset (u24), log_prior (i16 Q4), match_type (u8), flags (u8), bigram_offset (u32, 0 if absent), embedding_offset (u32, 0 if absent).
  • FST code index: an inputx_fsa::Fsa mapping code bytes → entry_index of the first matching entry; further readings of the same code follow it as a contiguous run in the entry table.
  • FST word index: the reverse mapping, word → entry_index, used for L0 / blacklist joins.
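The entry-table bullet lists field widths that add up to 18 bytes (3 + 3 + 2 + 1 + 1 + 4 + 4), not the 16 B given for each entry, so the actual codec must pack these fields differently than a plain field-by-field layout. A minimal round-trip sketch that packs the fields in the listed order as 18 little-endian bytes, to show the u24 handling; `EntrySketch`, `encode` and `decode` are hypothetical and do not reproduce the real `codec` layout:

```rust
/// Sum of the listed field widths: 3 + 3 + 2 + 1 + 1 + 4 + 4.
const SKETCH_ENTRY_LEN: usize = 18;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct EntrySketch {
    word_offset: u32,      // only the low 24 bits are stored
    code_offset: u32,      // only the low 24 bits are stored
    log_prior: i16,        // Q4 fixed-point log prior
    match_type: u8,
    flags: u8,
    bigram_offset: u32,    // 0 if absent
    embedding_offset: u32, // 0 if absent
}

/// Append the low 24 bits of `v` little-endian; reject values that do not fit.
fn put_u24(out: &mut Vec<u8>, v: u32) -> Result<(), String> {
    if v > 0x00FF_FFFF {
        return Err(format!("{v} does not fit in u24"));
    }
    out.extend_from_slice(&v.to_le_bytes()[..3]);
    Ok(())
}

/// Read a little-endian u24 from the first 3 bytes of `b`.
fn get_u24(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], 0])
}

fn encode(e: &EntrySketch) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(SKETCH_ENTRY_LEN);
    put_u24(&mut out, e.word_offset)?;
    put_u24(&mut out, e.code_offset)?;
    out.extend_from_slice(&e.log_prior.to_le_bytes());
    out.push(e.match_type);
    out.push(e.flags);
    out.extend_from_slice(&e.bigram_offset.to_le_bytes());
    out.extend_from_slice(&e.embedding_offset.to_le_bytes());
    Ok(out)
}

fn decode(b: &[u8]) -> Option<EntrySketch> {
    if b.len() < SKETCH_ENTRY_LEN {
        return None;
    }
    Some(EntrySketch {
        word_offset: get_u24(&b[0..3]),
        code_offset: get_u24(&b[3..6]),
        log_prior: i16::from_le_bytes([b[6], b[7]]),
        match_type: b[8],
        flags: b[9],
        bigram_offset: u32::from_le_bytes([b[10], b[11], b[12], b[13]]),
        embedding_offset: u32::from_le_bytes([b[14], b[15], b[16], b[17]]),
    })
}

fn main() {
    let e = EntrySketch { word_offset: 0x01_0203, code_offset: 42, log_prior: -74,
        match_type: 0, flags: 0, bigram_offset: 0, embedding_offset: 0 };
    let bytes = encode(&e).expect("offsets fit in u24");
    assert_eq!(decode(&bytes), Some(e));
    println!("{:02x?}", bytes);
}
```

The u24 offsets cap the string pool at 16 MiB of addressable bytes, which is why `put_u24` checks the range instead of silently truncating.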

Reader / writer

Re-exports

pub use codec::EngineKind;
pub use codec::EntryFlags;
pub use codec::Header;
pub use codec::Version;
pub use codec::MAGIC;
pub use codec::HEADER_SIZE;
pub use codec::ENTRY_SIZE;
pub use reader::Entry;
pub use reader::IdfReader;
pub use writer::IdfBuilder;

Modules

codec
IDFv1 binary layout — header / entry / flag types + LE codec helpers.
reader
IdfReader — mmap zero-copy reader for IDFv1 files.
writer
IdfBuilder — deterministic writer for IDFv1 files.
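A deterministic writer and a "first hit, then a run" code index fit together naturally: if the builder sorts entries by a total order that does not depend on input order, each code's readings end up contiguous and the index only needs the first position. A minimal sketch, assuming entries are sorted by code bytes, then by descending prior, then by word. A `BTreeMap` stands in for inputx_fsa::Fsa, and `Candidate`, `sort_entries`, `build_code_index` and `lookup` are hypothetical names, not the IdfBuilder / IdfReader API:

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory record before serialization.
#[derive(Debug, Clone)]
struct Candidate {
    code: String,
    word: String,
    log_prior: i16, // Q4
}

/// Total order: code bytes, then higher prior first, then word.
/// Readings of one code become a contiguous run, independent of input order.
fn sort_entries(mut v: Vec<Candidate>) -> Vec<Candidate> {
    v.sort_by(|a, b| {
        a.code.as_bytes().cmp(b.code.as_bytes())
            .then(b.log_prior.cmp(&a.log_prior))
            .then(a.word.cmp(&b.word))
    });
    v
}

/// Stand-in for the FST code index: code -> index of the first entry in its run.
fn build_code_index(entries: &[Candidate]) -> BTreeMap<&str, usize> {
    let mut idx = BTreeMap::new();
    for (i, e) in entries.iter().enumerate() {
        idx.entry(e.code.as_str()).or_insert(i);
    }
    idx
}

/// Jump to the first hit, then walk the run while the code still matches.
fn lookup<'a>(entries: &'a [Candidate], idx: &BTreeMap<&str, usize>, code: &str) -> Vec<&'a str> {
    match idx.get(code) {
        None => Vec::new(),
        Some(&first) => entries[first..]
            .iter()
            .take_while(|e| e.code == code)
            .map(|e| e.word.as_str())
            .collect(),
    }
}

fn main() {
    let raw = vec![
        Candidate { code: "nihao".into(), word: "你好".into(), log_prior: -30 },
        Candidate { code: "ni".into(), word: "泥".into(), log_prior: -40 },
        Candidate { code: "ni".into(), word: "你".into(), log_prior: -20 },
        Candidate { code: "ni".into(), word: "尼".into(), log_prior: -40 },
    ];
    let entries = sort_entries(raw);
    let idx = build_code_index(&entries);
    println!("{:?}", lookup(&entries, &idx, "ni"));
}
```

Breaking ties on the word keeps equal-prior readings in a fixed order, so two builds from the same entries in any order produce identical bytes.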