Layer taxonomy for dictionary entries.
Every (code, word) pair belongs to exactly one Layer, determined at build
time. The layer serves two purposes:
- Coarse ordering: LAYER_BASE gives each layer a numeric base weight, so a level-1 short code (一级简码) always outranks any level-2 short code (二级简码), and so on, regardless of per-entry frequency.
- User-tunable preference: WubiDict::set_layer_pref lets the host multiply a layer's contribution at lookup time without touching the data.
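The interaction between the base weight and the preference multiplier can be sketched as follows. This is a minimal sketch under loud assumptions: the scoring rule (base + freq) * pref, the base weights 1000/2000/3000, and the three-variant Layer (with a hypothetical Jianma2 standing in for the level-2 short-code layer) are illustrative, not the module's actual values or the WubiDict API.

```rust
// Hypothetical sketch: only Auto (lowest) and Jianma1 (highest) are named in the
// module docs; Jianma2 stands in for the level-2 short-code layer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Layer {
    Auto = 0,
    Jianma2 = 1,
    Jianma1 = 2,
}

// Assumed base weights, spaced 1000 apart; in-layer frequency assumed below 100.
const LAYER_BASE: [f64; 3] = [1000.0, 2000.0, 3000.0];

/// Hypothetical scoring rule: the layer's base weight plus the in-layer
/// frequency, scaled by the host's per-layer preference multiplier.
fn score(layer: Layer, freq: f64, prefs: &[f64; 3]) -> f64 {
    let i = layer as usize;
    (LAYER_BASE[i] + freq) * prefs[i]
}

/// Sort candidates from highest to lowest score and return their words.
fn rank(mut cands: Vec<(&'static str, Layer, f64)>, prefs: &[f64; 3]) -> Vec<&'static str> {
    cands.sort_by(|a, b| score(b.1, b.2, prefs).total_cmp(&score(a.1, a.2, prefs)));
    cands.into_iter().map(|c| c.0).collect()
}

fn main() {
    let cands = vec![
        ("a", Layer::Auto, 99.0),
        ("b", Layer::Jianma2, 0.0),
        ("c", Layer::Jianma1, 0.0),
    ];
    // Neutral prefs: layer order wins even though "a" has the highest frequency.
    println!("{:?}", rank(cands.clone(), &[1.0, 1.0, 1.0])); // ["c", "b", "a"]
    // A small enough multiplier on Jianma1 pushes it below Jianma2.
    println!("{:?}", rank(cands, &[1.0, 1.0, 0.5])); // ["b", "c", "a"]
}
```

Frequency alone never crosses a layer boundary here because the assumed frequency range is far smaller than the gap between base weights; only the multiplier can reorder layers.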
FST values pack (layer << 56) | freq_score, so the runtime can read
both fields in a single pass over the stream. freq_score is currently always 0
(a placeholder until the corpus pipeline lands); once populated, it will hold the
corpus-derived frequency normalized within the entry's layer.
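The bit layout can be illustrated with raw integers. This is a sketch assuming an 8-bit layer byte in the top bits and a 56-bit frequency field; the names FREQ_BITS, FREQ_MASK, pack_raw, and split are hypothetical helpers, and the layer byte value 2 is arbitrary.

```rust
/// Number of low bits reserved for freq_score (64 minus an 8-bit layer byte).
const FREQ_BITS: u32 = 56;
/// Mask selecting the low 56 bits.
const FREQ_MASK: u64 = (1u64 << FREQ_BITS) - 1;

/// Pack a raw layer byte and a frequency score into one u64.
fn pack_raw(layer: u8, freq_score: u64) -> u64 {
    ((layer as u64) << FREQ_BITS) | (freq_score & FREQ_MASK)
}

/// Read both fields back out of one value: the top byte is the layer,
/// the low 56 bits are the frequency.
fn split(v: u64) -> (u8, u64) {
    ((v >> FREQ_BITS) as u8, v & FREQ_MASK)
}

fn main() {
    // While freq_score is the 0 placeholder, the value is just layer << 56.
    let v = pack_raw(2, 0);
    println!("{v:#018x}"); // 0x0200000000000000
    println!("{:?}", split(pack_raw(2, 12_345))); // (2, 12345)
}
```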
Enums
- Layer: Discriminants ascend with priority: Auto = 0 is the lowest and Jianma1 is the highest. This makes the packed FST value (layer << 56) | freq compare correctly under raw u64 ordering (a higher u64 means a higher priority), so the build-time merge step can keep the larger value on collision without special-casing.
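Because priority lives in the top byte, a plain max over u64 values resolves collisions. The sketch below assumes a three-variant Layer (Jianma2 is hypothetical) and a hypothetical merge function over a HashMap, used purely for illustration; the real build step and its data structures may differ. The code/word pair is also only illustrative.

```rust
use std::collections::HashMap;

// Hypothetical three-layer enum; declaration order matches discriminant order,
// so the derived Ord agrees with the numeric priority.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
#[repr(u8)]
enum Layer {
    Auto = 0,
    Jianma2 = 1,
    Jianma1 = 2,
}

/// Pack as the docs describe: layer in the top byte, freq_score in the low 56 bits.
fn pack(layer: Layer, freq_score: u64) -> u64 {
    ((layer as u64) << 56) | (freq_score & ((1u64 << 56) - 1))
}

/// Hypothetical merge step: when a (code, word) pair appears more than once,
/// keep the larger packed value. No per-layer special cases are needed.
fn merge(
    entries: &[(&'static str, &'static str, u64)],
) -> HashMap<(&'static str, &'static str), u64> {
    let mut out: HashMap<(&'static str, &'static str), u64> = HashMap::new();
    for &(code, word, value) in entries {
        let slot = out.entry((code, word)).or_insert(value);
        if value > *slot {
            *slot = value;
        }
    }
    out
}

fn main() {
    // The same pair emitted by two layers: a high-frequency Auto entry
    // and a Jianma1 entry whose freq_score is still the 0 placeholder.
    let entries = [
        ("g", "一", pack(Layer::Auto, 500)),
        ("g", "一", pack(Layer::Jianma1, 0)),
    ];
    let merged = merge(&entries);
    println!("{}", merged[&("g", "一")] >> 56); // 2, i.e. Jianma1 wins
    // Any Auto value, even with a maxed-out freq_score, sorts below Jianma2.
    println!("{}", pack(Layer::Auto, (1u64 << 56) - 1) < pack(Layer::Jianma2, 0)); // true
}
```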
Constants
- DEFAULT_LAYER_PREFS: Default layer_prefs, indexed by Layer as usize. Auto is dampened to 0.7 so extension characters don't pollute the top of common 4-letter codes; every other layer is 1.0.
- LAYER_BASE: Per-layer base weight, indexed by Layer as usize (ascending). Values are spaced so that no in-layer frequency score (capped well below the gap) can reorder layers, but a sufficiently strong layer_pref multiplier can.
- LAYER_COUNT: Total number of layers. Serves as the array length for LAYER_BASE, DEFAULT_LAYER_PREFS, and any per-layer table the host might keep.
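The per-layer tables and the spacing property can be checked mechanically. In this sketch only the 0.7 for Auto and 1.0 for the other layers come from the docs; the layer count of 3, the base weights, the MAX_FREQ cap, and the layers_never_reorder helper are hypothetical.

```rust
// Hypothetical three-layer enum (Jianma2 is an assumed name).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Layer {
    Auto = 0,
    Jianma2 = 1,
    Jianma1 = 2,
}

// Hypothetical layer count for this sketch; the real LAYER_COUNT may differ.
const LAYER_COUNT: usize = 3;

// Hypothetical base weights, ascending and spaced 1000 apart.
const LAYER_BASE: [f64; LAYER_COUNT] = [1000.0, 2000.0, 3000.0];

// Values as documented: Auto is dampened to 0.7, every other layer is 1.0.
const DEFAULT_LAYER_PREFS: [f64; LAYER_COUNT] = [0.7, 1.0, 1.0];

// Assumed cap on the in-layer frequency score, "well below the gap".
const MAX_FREQ: f64 = 100.0;

/// True if, with neutral prefs, no frequency in [0, MAX_FREQ] can lift an
/// entry from one layer past the base weight of the next layer up.
fn layers_never_reorder() -> bool {
    LAYER_BASE.windows(2).all(|w| w[0] + MAX_FREQ < w[1])
}

fn main() {
    // Tables are indexed by the enum discriminant.
    let pref = DEFAULT_LAYER_PREFS[Layer::Auto as usize];
    println!("Auto pref = {pref}"); // Auto pref = 0.7
    println!("{}", layers_never_reorder()); // true
}
```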
Functions
- pack: Pack (layer, freq_score) into a single u64 FST value. freq_score must fit in 56 bits; higher bits are silently truncated.
- unpack: Reverse of pack. Unknown layer bytes fall back to Layer::Auto (lowest priority), which is preferable to panicking on a corrupt FST.
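A round trip through both functions, including the truncation and fallback behavior, might look like this. It is a sketch, not the module's code: the signatures, the Layer::from_u8 helper, and the Jianma2 variant are assumptions made for illustration.

```rust
// Hypothetical three-layer enum (Jianma2 is an assumed name).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum Layer {
    Auto = 0,
    Jianma2 = 1,
    Jianma1 = 2,
}

impl Layer {
    /// Hypothetical helper: map a raw layer byte back to a variant, None if unknown.
    fn from_u8(b: u8) -> Option<Layer> {
        match b {
            0 => Some(Layer::Auto),
            1 => Some(Layer::Jianma2),
            2 => Some(Layer::Jianma1),
            _ => None,
        }
    }
}

const FREQ_MASK: u64 = (1u64 << 56) - 1;

fn pack(layer: Layer, freq_score: u64) -> u64 {
    // Bits of freq_score at or above bit 56 are dropped ("silently truncated").
    ((layer as u64) << 56) | (freq_score & FREQ_MASK)
}

fn unpack(v: u64) -> (Layer, u64) {
    // Unknown layer bytes (e.g. from a corrupt FST) fall back to Auto instead of panicking.
    let layer = Layer::from_u8((v >> 56) as u8).unwrap_or(Layer::Auto);
    (layer, v & FREQ_MASK)
}

fn main() {
    println!("{:?}", unpack(pack(Layer::Jianma1, 42))); // (Jianma1, 42)
    println!("{:?}", unpack(0xFF00_0000_0000_0007)); // (Auto, 7)
    println!("{:?}", unpack(pack(Layer::Jianma2, 1u64 << 56))); // (Jianma2, 0)
}
```

Falling back to the lowest-priority layer means a corrupt entry can still be returned, but it sinks to the bottom of the candidate list rather than aborting the lookup.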