pub struct Dict { /* private fields */ }
Double-Array Trie dictionary.
Provides O(k) lookup where k is the number of bytes in the query.
The memory layout is two Vec<i32> arrays (base and check), which keeps lookups cache-friendly.
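To illustrate why a base/check layout gives one array probe per query byte, here is a miniature double-array trie. It is a sketch under stated assumptions, not kham_core's actual layout: the type TinyDat, the end-of-word code 0, the byte-to-code mapping b + 1, and the naive first-fit base search are all hypothetical choices made for this example.

```rust
/// Hypothetical miniature double-array trie (NOT kham_core's real layout).
/// Byte `b` uses transition code `b + 1`; code 0 marks end-of-word.
struct TinyDat {
    base: Vec<i32>,
    check: Vec<i32>, // check[t] = parent state of slot t, or -1 if free
}

const TERM: usize = 0;

impl TinyDat {
    fn build(words: &[&str]) -> TinyDat {
        let mut sorted: Vec<&[u8]> = words.iter().map(|w| w.as_bytes()).collect();
        sorted.sort();
        sorted.dedup();
        // State 0 is the root; it is never used as a child slot.
        let mut dat = TinyDat { base: vec![0], check: vec![-1] };
        dat.add_children(0, &sorted, 0);
        dat
    }

    /// `words` is sorted and every entry shares its first `depth` bytes.
    fn add_children(&mut self, state: usize, words: &[&[u8]], depth: usize) {
        // Group the words by their transition code at `depth`: (code, start, end).
        let mut groups: Vec<(usize, usize, usize)> = Vec::new();
        for (i, w) in words.iter().enumerate() {
            let code = if w.len() == depth { TERM } else { w[depth] as usize + 1 };
            if let Some(g) = groups.last_mut() {
                if g.0 == code {
                    g.2 = i + 1;
                    continue;
                }
            }
            groups.push((code, i, i + 1));
        }
        if groups.is_empty() {
            return;
        }
        // First-fit search for a base offset whose child slots are all free.
        let mut b = 1usize;
        while !groups
            .iter()
            .all(|&(c, _, _)| b + c >= self.check.len() || self.check[b + c] == -1)
        {
            b += 1;
        }
        let top = b + groups[groups.len() - 1].0;
        if top >= self.check.len() {
            self.base.resize(top + 1, 0);
            self.check.resize(top + 1, -1);
        }
        self.base[state] = b as i32;
        // Claim every child slot before recursing so descendants cannot take them.
        for &(c, _, _) in &groups {
            self.check[b + c] = state as i32;
        }
        for &(c, start, end) in &groups {
            if c != TERM {
                self.add_children(b + c, &words[start..end], depth + 1);
            }
        }
    }

    /// One transition: a single index computation plus one check probe.
    fn step(&self, state: usize, code: usize) -> Option<usize> {
        let t = self.base[state] as usize + code;
        if t < self.check.len() && self.check[t] == state as i32 {
            Some(t)
        } else {
            None
        }
    }

    fn contains(&self, word: &str) -> bool {
        let mut state = 0usize;
        for &b in word.as_bytes() {
            match self.step(state, b as usize + 1) {
                Some(next) => state = next,
                None => return false,
            }
        }
        self.step(state, TERM).is_some()
    }
}
```

With this sketch, TinyDat::build(&["กิน", "กินข้าว"]) yields a structure whose contains walks exactly word.len() transitions plus one end-of-word probe, which is where the O(k) bound comes from.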
Implementations§
impl Dict
pub fn from_word_list(text: &str) -> Self
Build a Dict from a newline-separated word list.
Lines starting with # are treated as comments and skipped.
Blank lines are skipped. Words are stored as raw UTF-8 bytes.
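The filtering rules above can be sketched as a standalone function. parse_word_list is a hypothetical helper, not part of kham_core, and treating whitespace-only lines as blank is an assumption; the real parser may handle surrounding whitespace differently.

```rust
/// Hypothetical helper illustrating the word-list rules; not kham_core's code.
fn parse_word_list(text: &str) -> Vec<&str> {
    text.lines()
        // Assumption: a line containing only whitespace counts as blank.
        .filter(|line| !line.trim().is_empty())
        // Lines starting with '#' are comments.
        .filter(|line| !line.starts_with('#'))
        .collect()
}
```

For example, parse_word_list("# food\nกิน\n\nข้าว\n") keeps only the two word lines, in their original order.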
§Example
use kham_core::dict::Dict;
let dict = Dict::from_word_list("กิน\nข้าว\nปลา\n");
assert!(dict.contains("กิน"));
assert!(dict.contains("ข้าว"));
assert!(!dict.contains("xyz"));
pub fn from_bytes(data: &[u8]) -> Self
Deserialise a Dict from a pre-compiled DARTS binary blob.
The binary blob is normally produced by build.rs at compile time and
embedded via include_bytes!. Loading from bytes bypasses the full
trie-construction pipeline, making it the fastest way to obtain a
ready-to-use dictionary.
§Binary Format
The blob begins with a fixed 16-byte header followed by the raw array data. All multi-byte integers are little-endian.
Offset           Size         Field      Description
───────────────  ───────────  ─────────  ─────────────────────────────────────────
0                4            magic      ASCII b"KDAM"; identifies the file type
4                1            version    Format version; currently 0x01
5                3            reserved   Must be zero; reserved for future flags
8                4            base_len   Number of i32 elements in the base array
12               4            check_len  Number of i32 elements in the check array
16               base_len*4   base[]     Little-endian i32 values
16 + base_len*4  check_len*4  check[]    Little-endian i32 values
§Performance
O(S) where S is the total byte length of data. The function
performs one pass of chunks_exact(4).map(i32::from_le_bytes) over
each array, allocating exactly base_len + check_len i32 values.
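The single decoding pass can be sketched as follows. decode_i32_le is a hypothetical name used only for illustration. Because chunks_exact yields slices rather than fixed-size arrays, the sketch rebuilds a [u8; 4] for each chunk before calling i32::from_le_bytes.

```rust
/// Hypothetical sketch of the little-endian decoding pass; not kham_core's code.
fn decode_i32_le(bytes: &[u8]) -> Vec<i32> {
    bytes
        .chunks_exact(4) // trailing bytes that do not fill a chunk are ignored
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

Each input byte is read once, which matches the O(S) bound stated above.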
Compare with from_word_list, which is O(W × K), where W is the number of words and K is the average word length in bytes (that is, proportional to total word bytes), due to trie construction and base-offset search.
| Method | Complexity | Allocation | Use when |
|---|---|---|---|
| from_bytes | O(S) | 2 × Vec<i32> | loading pre-built binary blob |
| from_word_list | O(W × K) | trie + DARTS | building from a raw word list |
§Errors / Panics
This function panics — rather than returning a Result — because
failures indicate a corrupted or stale build artifact, not a runtime
condition the caller can meaningfully recover from. A clean
cargo build always regenerates a valid dict.bin.
| Condition | Panic message |
|---|---|
| data.len() < 16 | "dict.bin too short" |
| First 4 bytes ≠ b"KDAM" | "dict.bin: bad magic" |
| Byte 4 ≠ 0x01 | "dict.bin: unsupported version" |
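A minimal sketch of the header checks listed in the table, in the same order. read_header is a hypothetical function, not the crate's implementation. Reading base_len and check_len as unsigned 32-bit values is an assumption (the format description gives only their size), and the sketch does not verify that the reserved bytes are zero.

```rust
const HEADER_LEN: usize = 16;
const VERSION: u8 = 0x01;

/// Hypothetical header validation; returns (base_len, check_len).
/// Panics with the messages listed in the table above.
fn read_header(data: &[u8]) -> (usize, usize) {
    assert!(data.len() >= HEADER_LEN, "dict.bin too short");
    assert!(data.starts_with(b"KDAM"), "dict.bin: bad magic");
    assert!(data[4] == VERSION, "dict.bin: unsupported version");
    // Assumption: the two length fields are unsigned little-endian u32.
    let base_len = u32::from_le_bytes([data[8], data[9], data[10], data[11]]) as usize;
    let check_len = u32::from_le_bytes([data[12], data[13], data[14], data[15]]) as usize;
    (base_len, check_len)
}
```

A caller would then expect the base array to start at offset 16 and the check array at offset 16 + base_len * 4, as described in the binary format section.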
§Example
use kham_core::dict::Dict;
// Typically you embed a pre-built blob with include_bytes! and pass it to from_bytes.
// builtin_dict() does this for you with the compile-time-embedded blob.
let dict = kham_core::dict::builtin_dict();
assert!(dict.contains("กิน"));
assert!(dict.contains("ธนาคาร"));
For the common case of the built-in word list, prefer builtin_dict(),
which calls this function with the compile-time-embedded blob.
pub fn contains(&self, word: &str) -> bool
Returns true if word is present in the dictionary.
Lookup is O(n) where n is word.len() in bytes.
§Example
use kham_core::dict::Dict;
let dict = Dict::from_word_list("สวัสดี\nโลก\n");
assert!(dict.contains("สวัสดี"));
assert!(dict.contains("โลก"));
assert!(!dict.contains("สวัส")); // prefix only — not a word
assert!(!dict.contains(""));
pub fn prefixes<'t>(&self, text: &'t str) -> Vec<&'t str>
Returns all prefixes of text (substrings anchored at byte 0) that are present
in the dictionary, ordered longest first.
Every returned slice ends on a UTF-8 character boundary, which is guaranteed whenever the dictionary contains only valid UTF-8 words.
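The contract can be illustrated with a naive reference version that does not use a trie. prefixes_naive is a hypothetical function written for this illustration: it tests every character-boundary prefix against a plain word slice, so it is far slower than a trie walk but returns results in the same longest-first order.

```rust
/// Hypothetical reference implementation of the prefix contract; not kham_core's code.
fn prefixes_naive<'t>(words: &[&str], text: &'t str) -> Vec<&'t str> {
    let mut found: Vec<&'t str> = text
        .char_indices()
        // Candidate prefix ending right after each character.
        .map(|(i, ch)| &text[..i + ch.len_utf8()])
        .filter(|p| words.iter().any(|w| *w == *p))
        .collect();
    // Candidates were generated shortest first; flip to longest first.
    found.reverse();
    found
}
```

A double-array trie reaches the same result in a single walk over text, recording a match each time it passes through an end-of-word state.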
§Example
use kham_core::dict::Dict;
let dict = Dict::from_word_list("กิน\nกิน\nกิน\nกินข้าว\n");
let p = dict.prefixes("กินข้าวกับปลา");
// Longest prefix is "กินข้าว", then "กิน"
assert_eq!(p[0], "กินข้าว");
assert_eq!(p[1], "กิน");
pub fn state_count(&self) -> usize
Returns the number of allocated DARTS states. Intended for informational and testing use.