Module layer


Layer taxonomy for dictionary entries.

Every (code, word) belongs to exactly one Layer determined at build time. The layer feeds two purposes:

  1. Coarse ordering: LAYER_BASE gives each layer a numeric base weight so 一级简码 (first-level short codes) always outrank any 二级简码 (second-level short codes), and so on, regardless of per-entry frequency.
  2. User-tunable preference: WubiDict::set_layer_pref lets the host multiply a layer’s contribution at lookup time without touching data.

FST values pack (layer << 56) | freq_score so the runtime can read both in one stream pass. freq_score is currently always 0 (a placeholder until the corpus pipeline lands); once populated, it will hold the corpus-derived frequency, normalized within the layer.

Enums§

Layer
Discriminants ascend with priority: Auto = 0 is lowest, Jianma1 is highest. This makes the FST’s packed (layer << 56) | freq_score value compare correctly under raw u64 ordering (higher u64 = higher priority), so the build-time merge step can keep the larger value on collision without special-casing.
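The collision rule described above can be sketched as follows. This is a minimal illustration, not the crate's build code: `pack_raw`, `merge_keep_max`, the string keys, and the layer numbers used with them are all hypothetical. It only shows why keeping the larger u64 is enough when the layer sits in the top byte.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: layer discriminant in the top 8 bits,
/// frequency score in the low 56 bits.
fn pack_raw(layer: u8, freq_score: u64) -> u64 {
    ((layer as u64) << 56) | (freq_score & ((1u64 << 56) - 1))
}

/// Hypothetical build-time merge: when the same key appears twice,
/// keep the larger packed value. Because the layer occupies the most
/// significant byte, a higher layer wins regardless of frequency.
fn merge_keep_max(entries: &[(&str, u64)]) -> BTreeMap<String, u64> {
    let mut merged = BTreeMap::new();
    for &(key, value) in entries {
        merged
            .entry(key.to_string())
            .and_modify(|old: &mut u64| *old = (*old).max(value))
            .or_insert(value);
    }
    merged
}
```

With this layout, an entry in layer 3 with a frequency of 0 still compares greater than an entry in layer 0 with any frequency that fits in 56 bits, so a plain `max` needs no per-layer branching.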

Constants§

DEFAULT_LAYER_PREFS
Default layer_prefs, indexed by Layer as usize. Auto is dampened to 0.7 so extension characters don’t pollute the top of common 4-letter codes; everything else is 1.0.
LAYER_BASE
Per-layer base weight, indexed by Layer as usize (ascending). Values are spaced so that any in-layer frequency score (capped well below the gap) cannot reorder layers, but a sufficient layer_pref multiplier can.
LAYER_COUNT
Total number of layers. Acts as the array length for LAYER_BASE, DEFAULT_LAYER_PREFS, and any per-layer table the host might keep.
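One way the three constants could interact at lookup time is sketched below. The formula `(base + freq_score) * pref`, the layer count of 4, and the base values are assumptions made for illustration; only the 0.7 / 1.0 defaults come from the description above, and the crate's actual scoring may differ.

```rust
// Hypothetical values: the real layer count and base weights are not
// stated in this page.
const LAYER_COUNT: usize = 4;
const LAYER_BASE: [f64; LAYER_COUNT] = [1_000.0, 2_000.0, 3_000.0, 4_000.0];

// Index 0 stands in for Layer::Auto, dampened to 0.7; all others are 1.0.
const DEFAULT_LAYER_PREFS: [f64; LAYER_COUNT] = [0.7, 1.0, 1.0, 1.0];

/// Assumed scoring rule: the layer's base weight plus the in-layer
/// frequency score, scaled by the host-tunable preference for that layer.
/// `freq_score` is assumed to stay below the 1_000.0 gap between bases.
fn score(layer: usize, freq_score: f64, prefs: &[f64; LAYER_COUNT]) -> f64 {
    (LAYER_BASE[layer] + freq_score) * prefs[layer]
}
```

Under these assumed numbers, a frequency score capped below the gap cannot lift layer 2 above layer 3 with default preferences, while raising the layer-2 preference to 1.5 can.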

Functions§

pack
Pack (layer, freq_score) into a single u64 FST value. freq_score must fit in 56 bits; higher bits are silently truncated.
unpack
Reverse of pack. Unknown layer bytes fall back to Layer::Auto (lowest priority) — preferable to panicking on a corrupt FST.
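The behaviour described for pack and unpack can be sketched as below. The enum here is a simplified stand-in: only Auto = 0 (lowest) and Jianma1 (highest) are taken from this page, while `Full`, `Jianma2`, the discriminant values above 0, and `FREQ_MASK` are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Layer {
    Auto = 0,    // lowest priority (from the docs)
    Full = 1,    // hypothetical variant
    Jianma2 = 2, // hypothetical discriminant
    Jianma1 = 3, // highest priority; the value 3 is an assumption
}

/// Low 56 bits, reserved for the frequency score.
const FREQ_MASK: u64 = (1u64 << 56) - 1;

/// Pack (layer, freq_score) into one u64; bits of `freq_score`
/// above bit 55 are silently dropped.
fn pack(layer: Layer, freq_score: u64) -> u64 {
    ((layer as u64) << 56) | (freq_score & FREQ_MASK)
}

/// Reverse of `pack`. An unrecognized layer byte maps to `Layer::Auto`
/// rather than panicking.
fn unpack(value: u64) -> (Layer, u64) {
    let layer = match (value >> 56) as u8 {
        1 => Layer::Full,
        2 => Layer::Jianma2,
        3 => Layer::Jianma1,
        _ => Layer::Auto,
    };
    (layer, value & FREQ_MASK)
}
```

Masking inside `pack` keeps an oversized frequency from spilling into the layer byte, which is one reading of "silently truncated"; the fallback arm in `unpack` mirrors the documented choice to degrade to the lowest-priority layer on a corrupt FST.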