IDFv1 binary layout — header / entry / flag types + LE codec helpers.
Wire layout per .claude/PLAN-dict-format-IDFv1.md. Little-endian
throughout (x86_64 + arm64 are both LE; cross-platform safe).
Alignment-strict: section boundaries are 8-byte aligned; Entry is
exactly 16 bytes packed.
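
The two layout rules above (little-endian integers, 8-byte-aligned section boundaries) can be sketched with standard-library calls only. The helper names `align8`, `read_u32_le`, and `write_u32_le` are hypothetical and are not taken from this module; its real codec helpers may have different names and signatures.

```rust
/// Round `offset` up to the next multiple of 8, the section alignment
/// stated in the module docs. Returns None if the addition would overflow.
/// (Hypothetical helper, not part of the documented API.)
fn align8(offset: usize) -> Option<usize> {
    offset.checked_add(7).map(|v| v & !7usize)
}

/// Read a little-endian u32 at `offset`, or None if the buffer is too short.
fn read_u32_le(buf: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let src = buf.get(offset..end)?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(src);
    Some(u32::from_le_bytes(raw))
}

/// Append a u32 in little-endian byte order.
fn write_u32_le(out: &mut Vec<u8>, value: u32) {
    out.extend_from_slice(&value.to_le_bytes());
}
```

Using `from_le_bytes` / `to_le_bytes` rather than pointer casts keeps the codec correct on any host byte order, even though the supported targets are all little-endian.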
Structs§
- EntryFlags
  Per-entry flag bits. Bit assignments are stable across IDFv1.
- EntryRecord
  Per-entry record (16 bytes packed). word_offset and code_offset are u24 (3 bytes); they point into the string pool. log_prior is signed Q4 fixed-point (one log unit per 16 integer steps, per inputx_scoring::Q4). raw_freq is the original pre-quantization corpus frequency (added v1.4.7 sub-phase A4 step 1); it lets the cement side rebuild a lossless tiebreaker when two entries land in the same Q4 log_prior bucket (e.g. 乎/护 for code hu: both quantize to Q4=170, and raw_freq distinguishes them).
- Header
  Mirrors the on-disk 64-byte layout exactly. All integer fields are little-endian on disk; the public struct holds host-byte-order values (decoded on parse, encoded on to_bytes).
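
As an illustration of the EntryRecord field encodings, here is a minimal sketch of a u24 little-endian codec, Q4 quantization, and a raw_freq tiebreaker. Several things are assumptions, not facts from this module: all function names are hypothetical, log_prior is taken to fit in an i16, raw_freq is taken to be a u32, and "higher log_prior ranks first" is an assumed ordering.

```rust
use std::cmp::Ordering;

/// Largest value a 3-byte offset can hold.
const U24_MAX: u32 = (1 << 24) - 1;

/// Encode an offset as 3 little-endian bytes; None if it does not fit.
fn encode_u24(value: u32) -> Option<[u8; 3]> {
    if value > U24_MAX {
        return None;
    }
    let b = value.to_le_bytes();
    Some([b[0], b[1], b[2]])
}

/// Decode 3 little-endian bytes back into a u32.
fn decode_u24(raw: [u8; 3]) -> u32 {
    u32::from_le_bytes([raw[0], raw[1], raw[2], 0])
}

/// Quantize a log-domain value to Q4: 16 integer steps per log unit.
/// The i16 width is an assumption.
fn to_q4(log_value: f32) -> i16 {
    (log_value * 16.0).round() as i16
}

/// Convert a Q4 value back to log units.
fn from_q4(q: i16) -> f32 {
    f32::from(q) / 16.0
}

/// Order (log_prior, raw_freq) pairs best-first. Assumes a higher
/// log_prior is better; raw_freq only matters inside one Q4 bucket.
fn rank(a: (i16, u32), b: (i16, u32)) -> Ordering {
    b.0.cmp(&a.0).then(b.1.cmp(&a.1))
}
```

Two nearby log values such as 10.61 and 10.64 both quantize to 170 here, which is the kind of collision the raw_freq field is described as resolving.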
Enums§
- EngineKind
  Which engine the dict serves. Stable u8 across versions so dispatch code can match without translation.
- Version
  Format version. v1 = the layout described in .claude/PLAN-dict-format-IDFv1.md. Future v2+ will live alongside via the format_version header byte; v1 readers MUST reject unknown versions with a clear error.
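
The "reject unknown versions with a clear error" rule might look like the sketch below. The local `Version` enum, the `parse_version` name, the `String` error type, and the assumption that v1 is stored as the byte value 1 are all illustrative; the real enum's discriminants and error type are not shown on this page.

```rust
/// Local stand-in for the documented Version enum (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Version {
    V1,
}

/// Map the format_version header byte to a known version.
/// Assumes v1 is encoded as 1; anything else is rejected with a message
/// that names the offending byte.
fn parse_version(byte: u8) -> Result<Version, String> {
    match byte {
        1 => Ok(Version::V1),
        other => Err(format!(
            "unsupported IDF format_version {} (this reader only understands v1)",
            other
        )),
    }
}
```

Failing closed on unknown bytes means a v1 reader never misinterprets a future layout as v1 data.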
Constants§
- ENTRY_SIZE
  Fixed per-entry size in bytes.
- FULL_HEADER_SIZE
  Total on-disk header region (header + sha256 area). Sections begin here.
- HEADER_SIZE
  Fixed size in bytes of the header proper (64). Sections begin after the sha256 region that follows it, at FULL_HEADER_SIZE.
- MAGIC
  Magic bytes at file offset 0. ASCII "IDFv".
- SHA256_SIZE
  Reserved byte length for the sha256 region immediately after the 64-byte header proper. The full on-disk header region is HEADER_SIZE + SHA256_SIZE = 96 bytes; sections begin at offset 96.
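
To make the arithmetic between these constants concrete, the sketch below redeclares them locally with the values stated above (SHA256_SIZE = 32 follows from 96 - 64) and checks a file prefix. The constant types, the `check_prefix` and `entry_table_len` helpers, and the error strings are assumptions for illustration only.

```rust
// Local copies of the documented values; the real constants' types may differ.
const MAGIC: [u8; 4] = *b"IDFv";
const HEADER_SIZE: usize = 64;
const SHA256_SIZE: usize = 32;
const FULL_HEADER_SIZE: usize = HEADER_SIZE + SHA256_SIZE; // 96
const ENTRY_SIZE: usize = 16;

/// Validate the magic and the minimum length, then return the bytes
/// after the 96-byte header region, where sections begin.
/// (Hypothetical helper, not part of the documented API.)
fn check_prefix(file: &[u8]) -> Result<&[u8], &'static str> {
    if file.len() < FULL_HEADER_SIZE {
        return Err("file shorter than the 96-byte header region");
    }
    if file[..MAGIC.len()] != MAGIC {
        return Err("bad magic: expected ASCII \"IDFv\" at offset 0");
    }
    Ok(&file[FULL_HEADER_SIZE..])
}

/// Byte length of a table holding `count` fixed-size entries,
/// or None on overflow.
fn entry_table_len(count: usize) -> Option<usize> {
    count.checked_mul(ENTRY_SIZE)
}
```

Because ENTRY_SIZE is 16, an entry table that starts on an 8-byte boundary also ends on one.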
Functions§
- decode_match_type
  Decode EntryRecord::match_type back to inputx_scoring::MatchType. Inline payload fields are zeroed; callers attach runtime values.
- encode_match_type
  Encode an inputx_scoring::MatchType into the single u8 stored in EntryRecord::match_type. Round-trippable via decode_match_type. Inline payload (proximity / fuzzy cost / bigram_links) is lost on encode: the writer uses Exact for entries that have a fixed dict-baseline classification; runtime paths attach the inline payload based on how the buffer matched.
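
The payload-dropping round trip described for these two functions can be sketched as follows. The `MatchType` enum here is a hypothetical stand-in: the real inputx_scoring::MatchType variants, payload types, and tag values are not shown on this page, and `encode_tag` / `decode_tag` are illustrative names rather than the module's signatures.

```rust
/// Hypothetical stand-in for inputx_scoring::MatchType.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MatchType {
    Exact,
    Proximity { distance: u8 },
    Fuzzy { cost: u16 },
}

/// Keep only the variant; the inline payload is dropped.
/// Tag values 0..=2 are assumptions.
fn encode_tag(mt: MatchType) -> u8 {
    match mt {
        MatchType::Exact => 0,
        MatchType::Proximity { .. } => 1,
        MatchType::Fuzzy { .. } => 2,
    }
}

/// Rebuild the variant with zeroed payload fields; callers are expected
/// to fill in runtime values. Unknown tags yield None.
fn decode_tag(tag: u8) -> Option<MatchType> {
    match tag {
        0 => Some(MatchType::Exact),
        1 => Some(MatchType::Proximity { distance: 0 }),
        2 => Some(MatchType::Fuzzy { cost: 0 }),
        _ => None,
    }
}
```

In this sketch the tag round-trips exactly (`encode_tag(decode_tag(t))` returns `t` for every known tag), while a value such as `Fuzzy { cost: 7 }` comes back as `Fuzzy { cost: 0 }`.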