Struct Dict 

pub struct Dict { /* private fields */ }

Double-Array Trie dictionary.

Provides O(k) lookup, where k is the number of bytes in the query. The trie is stored as two Vec<i32> arrays (base and check), a flat layout that is cache-friendly.
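
As a rough illustration of how a base/check pair can answer a lookup in one step per byte, here is a minimal double-array sketch. It is not this crate's implementation: the `DoubleArray` type, the terminator code `END`, the `byte + 1` code mapping, and the first-fit placement are all assumptions made for the example.

```rust
use std::collections::BTreeMap;

const FREE: i32 = -1; // marks an unused slot in `check`
const END: usize = 0; // assumed terminator code marking "a word ends here"

/// Hypothetical mapping from a byte to a transition code (0 is reserved for END).
fn code(b: u8) -> usize {
    b as usize + 1
}

/// Hypothetical double array: the child of state `s` on code `c` lives at
/// `t = base[s] + c`, and the transition is valid only if `check[t] == s`.
struct DoubleArray {
    base: Vec<i32>,
    check: Vec<i32>,
}

impl DoubleArray {
    fn build(words: &[&str]) -> Self {
        // Step 1: an ordinary trie; nodes[i] maps a code to a child node index.
        let mut nodes: Vec<BTreeMap<usize, usize>> = vec![BTreeMap::new()];
        for w in words {
            let mut n = 0;
            for c in w.bytes().map(code).chain(std::iter::once(END)) {
                let found = nodes[n].get(&c).copied();
                n = match found {
                    Some(next) => next,
                    None => {
                        nodes.push(BTreeMap::new());
                        let next = nodes.len() - 1;
                        nodes[n].insert(c, next);
                        next
                    }
                };
            }
        }
        // Step 2: place each node's children with a first-fit base search.
        // Slot 0 is the root; it has no parent, so its check entry stays FREE.
        let mut da = DoubleArray { base: vec![0], check: vec![FREE] };
        let mut stack = vec![(0usize, 0usize)]; // (trie node, array state)
        while let Some((node, state)) = stack.pop() {
            let codes: Vec<usize> = nodes[node].keys().copied().collect();
            if codes.is_empty() {
                continue;
            }
            let mut b = 1; // bases start at 1 so no child ever lands on slot 0
            while !codes.iter().all(|&c| da.is_free(b + c)) {
                b += 1;
            }
            da.base[state] = b as i32;
            for (&c, &child) in &nodes[node] {
                let t = b + c;
                da.reserve(t);
                da.check[t] = state as i32;
                stack.push((child, t));
            }
        }
        da
    }

    fn is_free(&self, i: usize) -> bool {
        self.check.get(i).map_or(true, |&p| p == FREE)
    }

    fn reserve(&mut self, i: usize) {
        if i >= self.check.len() {
            self.base.resize(i + 1, 0);
            self.check.resize(i + 1, FREE);
        }
    }

    /// One transition: two array reads and one comparison.
    fn step(&self, s: usize, c: usize) -> Option<usize> {
        let t = self.base[s] as usize + c;
        (self.check.get(t) == Some(&(s as i32))).then_some(t)
    }

    /// One `step` per byte of `word`, plus one for the terminator.
    fn contains(&self, word: &str) -> bool {
        let mut s = 0;
        for b in word.bytes() {
            match self.step(s, code(b)) {
                Some(t) => s = t,
                None => return false,
            }
        }
        self.step(s, END).is_some()
    }
}

fn main() {
    let da = DoubleArray::build(&["กิน", "กินข้าว", "ปลา"]);
    assert!(da.contains("กิน"));
    assert!(da.contains("กินข้าว"));
    assert!(!da.contains("กินข้า")); // prefix only
    assert!(!da.contains(""));
}
```

Each byte of the query costs a constant number of array accesses, which is where the O(k) bound comes from.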

Implementations§

impl Dict

pub fn from_word_list(text: &str) -> Self

Build a Dict from a newline-separated word list.

Lines starting with # are treated as comments and skipped. Blank lines are skipped. Words are stored as raw UTF-8 bytes.
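
The filtering rules above can be sketched in a few lines. `word_list_entries` is a hypothetical helper, not part of this crate, and it assumes lines are taken verbatim (no whitespace trimming), which the description does not specify.

```rust
/// Hypothetical helper: the entries a word list would contribute,
/// assuming no trimming of surrounding whitespace.
fn word_list_entries(text: &str) -> Vec<&str> {
    text.lines()
        .filter(|line| !line.is_empty()) // skip blank lines
        .filter(|line| !line.starts_with('#')) // skip comment lines
        .collect()
}

fn main() {
    let text = "# food words\nกิน\n\nข้าว\nปลา\n";
    let words = word_list_entries(text);
    assert_eq!(words, ["กิน", "ข้าว", "ปลา"]);
    // Words are raw UTF-8 bytes: "กิน" is 3 code points but 9 bytes.
    assert_eq!(words[0].len(), 9);
}
```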

§Example
use kham_core::dict::Dict;

let dict = Dict::from_word_list("กิน\nข้าว\nปลา\n");
assert!(dict.contains("กิน"));
assert!(dict.contains("ข้าว"));
assert!(!dict.contains("xyz"));

pub fn from_bytes(data: &[u8]) -> Self

Deserialise a Dict from a pre-compiled DARTS binary blob.

The binary blob is normally produced by build.rs at compile time and embedded via include_bytes!. Loading from bytes bypasses the full trie-construction pipeline, making it the fastest way to obtain a ready-to-use dictionary.

§Binary Format

The blob begins with a fixed 16-byte header followed by the raw array data. All multi-byte integers are little-endian.

Offset             Size           Field      Description
─────────────────  ─────────────  ─────────  ──────────────────────────────────────────
0                  4              magic      ASCII b"KDAM"; identifies the file type
4                  1              version    Format version; currently 0x01
5                  3              reserved   Must be zero; reserved for future flags
8                  4              base_len   Number of i32 elements in the base array
12                 4              check_len  Number of i32 elements in the check array
16                 base_len × 4   base[]     Little-endian i32 values
16 + base_len × 4  check_len × 4  check[]    Little-endian i32 values
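
To make the offsets concrete, the following sketch writes a blob in the layout described above. `encode_blob` is a hypothetical function, not the crate's build.rs code, and it assumes the two length fields are unsigned 32-bit integers (the table only gives their size).

```rust
/// Hypothetical encoder for the header and array layout described above.
/// Assumes base_len and check_len are stored as u32.
fn encode_blob(base: &[i32], check: &[i32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(16 + 4 * (base.len() + check.len()));
    out.extend_from_slice(b"KDAM"); // offset 0: magic
    out.push(0x01); // offset 4: version
    out.extend_from_slice(&[0, 0, 0]); // offset 5: reserved
    out.extend_from_slice(&(base.len() as u32).to_le_bytes()); // offset 8: base_len
    out.extend_from_slice(&(check.len() as u32).to_le_bytes()); // offset 12: check_len
    for v in base.iter().chain(check) {
        out.extend_from_slice(&v.to_le_bytes()); // offset 16 onward: base[], then check[]
    }
    out
}

fn main() {
    let blob = encode_blob(&[1, -2, 3], &[0, 7]);
    assert_eq!(blob.len(), 16 + 3 * 4 + 2 * 4);
    assert!(blob.starts_with(b"KDAM"));
    // base[1] == -2 sits at offset 16 + 4.
    assert_eq!(&blob[20..24], &[0xFE, 0xFF, 0xFF, 0xFF]);
    // check[] starts at 16 + base_len * 4 = 28.
    assert_eq!(&blob[32..36], &7i32.to_le_bytes());
}
```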
§Performance

Runs in O(S) time, where S is the total byte length of data. The function makes one pass over each array, decoding 4-byte chunks with chunks_exact(4) and i32::from_le_bytes, and allocates exactly base_len + check_len i32 values. By comparison, from_word_list is O(W × K), that is, proportional to the total number of word bytes (W words of average length K bytes), because it must construct the trie and search for base offsets.

Method          Complexity  Allocation    Use when
──────────────  ──────────  ────────────  ─────────────────────────────
from_bytes      O(S)        2 × Vec<i32>  loading pre-built binary blob
from_word_list  O(W × K)    trie + DARTS  building from a raw word list
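
The single-pass decode can be sketched as follows. `decode_arrays` is a hypothetical function written only from the format description; it assumes u32 length fields and skips the header validation that from_bytes performs, so it will panic on truncated input.

```rust
/// Hypothetical decoder: returns (base, check) from a blob in the documented
/// layout. Skips magic/version checks; assumes u32 length fields.
fn decode_arrays(data: &[u8]) -> (Vec<i32>, Vec<i32>) {
    fn read_len(data: &[u8], at: usize) -> usize {
        u32::from_le_bytes([data[at], data[at + 1], data[at + 2], data[at + 3]]) as usize
    }
    fn to_i32s(bytes: &[u8]) -> Vec<i32> {
        // One pass, one allocation of exactly bytes.len() / 4 values.
        bytes
            .chunks_exact(4)
            .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect()
    }
    let base_len = read_len(data, 8);
    let check_len = read_len(data, 12);
    let split = 16 + base_len * 4; // where check[] begins
    (
        to_i32s(&data[16..split]),
        to_i32s(&data[split..split + check_len * 4]),
    )
}

fn main() {
    // Hand-built blob: header, then base = [5, -1], check = [9].
    let mut blob = b"KDAM\x01\x00\x00\x00".to_vec();
    blob.extend_from_slice(&2u32.to_le_bytes());
    blob.extend_from_slice(&1u32.to_le_bytes());
    for v in [5i32, -1, 9] {
        blob.extend_from_slice(&v.to_le_bytes());
    }
    let (base, check) = decode_arrays(&blob);
    assert_eq!(base, [5, -1]);
    assert_eq!(check, [9]);
}
```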
§Errors / Panics

This function panics — rather than returning a Result — because failures indicate a corrupted or stale build artifact, not a runtime condition the caller can meaningfully recover from. A clean cargo build always regenerates a valid dict.bin.

Condition                Panic message
───────────────────────  ───────────────────────────────
data.len() < 16          "dict.bin too short"
First 4 bytes ≠ b"KDAM"  "dict.bin: bad magic"
Byte 4 ≠ 0x01            "dict.bin: unsupported version"
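
A caller that loads blobs from an untrusted source, rather than from the build script, might want to check the header before calling from_bytes. The sketch below mirrors the three conditions in the table; `check_header` is a hypothetical helper, not part of this crate, and it does not verify that the arrays fit within data.

```rust
/// Hypothetical pre-check mirroring the documented panic conditions.
/// Does not validate base_len/check_len against the blob length.
fn check_header(data: &[u8]) -> Result<(), &'static str> {
    if data.len() < 16 {
        return Err("dict.bin too short");
    }
    if !data.starts_with(b"KDAM") {
        return Err("dict.bin: bad magic");
    }
    if data[4] != 0x01 {
        return Err("dict.bin: unsupported version");
    }
    Ok(())
}

fn main() {
    let mut blob = b"KDAM\x01\x00\x00\x00".to_vec();
    blob.extend_from_slice(&[0u8; 8]); // base_len = 0, check_len = 0
    assert_eq!(check_header(&blob), Ok(()));
    assert_eq!(check_header(b"KDAM"), Err("dict.bin too short"));

    blob[4] = 0x02;
    assert_eq!(check_header(&blob), Err("dict.bin: unsupported version"));
    blob[0] = b'X';
    assert_eq!(check_header(&blob), Err("dict.bin: bad magic"));
}
```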
§Example
use kham_core::dict::builtin_dict;

// A custom blob is typically embedded with include_bytes! and passed to
// Dict::from_bytes. This example instead uses builtin_dict(), which calls
// from_bytes on the blob embedded at compile time.
let dict = builtin_dict();
assert!(dict.contains("กิน"));
assert!(dict.contains("ธนาคาร"));

For the common case of the built-in word list, prefer builtin_dict(), which calls this function with the compile-time-embedded blob.


pub fn contains(&self, word: &str) -> bool

Returns true if word is present in the dictionary.

Lookup is O(k), where k is word.len(), the length of word in bytes.

§Example
use kham_core::dict::Dict;

let dict = Dict::from_word_list("สวัสดี\nโลก\n");
assert!(dict.contains("สวัสดี"));
assert!(dict.contains("โลก"));
assert!(!dict.contains("สวัส")); // prefix only — not a word
assert!(!dict.contains(""));

pub fn prefixes<'t>(&self, text: &'t str) -> Vec<&'t str>

Returns all prefixes of text (substrings starting at byte 0) that are present in the dictionary, ordered longest first.

Every returned slice ends on a UTF-8 character boundary, which is guaranteed as long as the dictionary contains only valid UTF-8 words.

§Example
use kham_core::dict::Dict;

let dict = Dict::from_word_list("กิน\nกินข้าว\n");
let p = dict.prefixes("กินข้าวกับปลา");
// Longest prefix is "กินข้าว", then "กิน"
assert_eq!(p[0], "กินข้าว");
assert_eq!(p[1], "กิน");
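
The contract of prefixes (anchored at byte 0, longest first, slices ending on character boundaries) can be modelled with a plain set. `prefixes_naive` is a hypothetical stand-in that uses a HashSet instead of the trie, so it re-hashes every candidate prefix, whereas a trie walk can report all matches in a single pass over the bytes.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the prefix query, backed by a HashSet.
fn prefixes_naive<'t>(words: &HashSet<&str>, text: &'t str) -> Vec<&'t str> {
    // Candidate end offsets: the byte position just past each character.
    let mut ends: Vec<usize> = text
        .char_indices()
        .map(|(i, c)| i + c.len_utf8())
        .collect();
    ends.reverse(); // longest candidate first
    ends.into_iter()
        .map(|end| &text[..end]) // always a valid slice: `end` is a char boundary
        .filter(|p| words.contains(*p))
        .collect()
}

fn main() {
    let words: HashSet<&str> = ["กิน", "กินข้าว"].iter().copied().collect();
    let p = prefixes_naive(&words, "กินข้าวกับปลา");
    assert_eq!(p, ["กินข้าว", "กิน"]);
    assert!(prefixes_naive(&words, "ปลา").is_empty());
}
```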

pub fn state_count(&self) -> usize

Returns the number of allocated DARTS states. Intended for diagnostics and testing.

Auto Trait Implementations§

impl Freeze for Dict

impl RefUnwindSafe for Dict

impl Send for Dict

impl Sync for Dict

impl Unpin for Dict

impl UnsafeUnpin for Dict

impl UnwindSafe for Dict

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.