pub struct IdfReader<B> { /* private fields */ }
IDFv1 reader. Holds the file’s byte region and the parsed header.
entry_index lookups are constant-time arithmetic; FST code/word
scans use inputx_fsa directly over the index sections.
Implementations
impl<B: AsRef<[u8]>> IdfReader<B>
pub fn from_bytes(bytes: B) -> Result<Self, OpenError>
Builds a reader from an in-memory byte region. Validates the header and (in std builds) the SHA-256 of the payload.
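The B: AsRef<[u8]> bound means the same reader type can sit on top of an owned buffer or a borrowed slice. The following is a minimal sketch of that pattern only. SketchReader, SketchOpenError, and the SKCH magic are hypothetical stand-ins, not the real IDFv1 header check, and the SHA-256 step is left out.

```rust
/// Hypothetical error type for this sketch; not the crate's OpenError.
#[derive(Debug, PartialEq)]
pub enum SketchOpenError {
    TooShort,
    BadMagic,
}

/// Hypothetical reader that is generic over its byte storage.
pub struct SketchReader<B> {
    bytes: B,
}

/// Assumed magic value, for illustration only.
const SKETCH_MAGIC: &[u8; 4] = b"SKCH";

impl<B: AsRef<[u8]>> SketchReader<B> {
    /// Validate once up front so later accessors can assume a well-formed region.
    pub fn from_bytes(bytes: B) -> Result<Self, SketchOpenError> {
        let view = bytes.as_ref();
        if view.len() < SKETCH_MAGIC.len() {
            return Err(SketchOpenError::TooShort);
        }
        if !view.starts_with(SKETCH_MAGIC) {
            return Err(SketchOpenError::BadMagic);
        }
        Ok(Self { bytes })
    }

    /// Everything after the assumed magic.
    pub fn payload(&self) -> &[u8] {
        &self.bytes.as_ref()[SKETCH_MAGIC.len()..]
    }
}
```

Under these assumptions, both SketchReader::from_bytes(vec) and SketchReader::from_bytes(&slice[..]) compile against the same impl, which is the property the generic parameter on IdfReader appears to be after.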
pub fn header(&self) -> &Header
pub fn version(&self) -> Version
pub fn engine_kind(&self) -> EngineKind
pub fn entry_count(&self) -> u32
pub fn sha256(&self) -> [u8; 32]
pub fn entries(&self) -> impl Iterator<Item = Entry<'_>> + '_
All entries in entry-table order. O(n) — used by prefix_top_k
fallback and by tests; production hot paths should go through
the FST code / word indexes.
pub fn entry_at(&self, index: u32) -> Option<Entry<'_>>
Entry at the given index. index must be < entry_count.
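As a rough illustration of what a constant-time indexed lookup can look like, the sketch below assumes a table of fixed-width records. RECORD_SIZE and record_at are hypothetical; the actual IDFv1 entry layout is not described on this page.

```rust
/// Assumed record width in bytes; the real entry layout may differ.
const RECORD_SIZE: usize = 8;

/// Return the fixed-width record at `index`, or `None` when the index is
/// out of range. The offset is plain arithmetic, so the cost does not
/// depend on the number of records.
pub fn record_at(table: &[u8], index: u32) -> Option<&[u8]> {
    let start = (index as usize).checked_mul(RECORD_SIZE)?;
    let end = start.checked_add(RECORD_SIZE)?;
    // `get` performs the bounds check and yields None past the end.
    table.get(start..end)
}
```

Returning Option keeps the out-of-range case explicit instead of panicking, which matches the shape of the entry_at signature.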
pub fn lookup<'a>(&'a self, code: &[u8]) -> Vec<Entry<'a>>
Looks up all entries whose code exactly matches the queried
bytes. When the file carries an FST code index (v1.4.6 sub-phase
C1 onwards), the lookup goes through the FST (O(|code|) instead
of O(entry_count)) and then walks the contiguous multi-reading
run in the entry table. Falls back to a linear scan for
v1.4.3-era files that ship with an empty FST section.
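The two paths described above can be sketched as follows. This is not the crate's implementation: a BTreeMap stands in for the FST code index, SketchEntry is a hypothetical entry type, and the sketch assumes that entries sharing a code are stored contiguously in the table.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an entry; field names are assumptions.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SketchEntry<'a> {
    pub code: &'a [u8],
    pub word: &'a str,
}

/// Map each code to the index of its first entry (stand-in for the FST).
pub fn build_first_index<'a>(table: &[SketchEntry<'a>]) -> BTreeMap<&'a [u8], usize> {
    let mut index = BTreeMap::new();
    for (i, e) in table.iter().enumerate() {
        index.entry(e.code).or_insert(i);
    }
    index
}

/// Exact-match lookup: indexed path when the index is populated,
/// linear scan when it is empty.
pub fn lookup_run<'a>(
    table: &[SketchEntry<'a>],
    index: &BTreeMap<&'a [u8], usize>,
    code: &[u8],
) -> Vec<SketchEntry<'a>> {
    if index.is_empty() {
        // Fallback: inspect every entry in the table.
        return table.iter().copied().filter(|e| e.code == code).collect();
    }
    match index.get(code) {
        // Walk the contiguous run of entries that share this code.
        Some(&first) => table[first..]
            .iter()
            .copied()
            .take_while(|e| e.code == code)
            .collect(),
        None => Vec::new(),
    }
}
```

The indexed path touches only the matching run, while the fallback inspects every entry, which is the cost difference the documentation describes.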
pub fn find_by_word<'a>(&'a self, word: &str) -> Vec<Entry<'a>>
Reverse lookup by word. Same caveats as lookup.
pub fn prefix_top_k_fst<'a>(&'a self, prefix: &[u8], k: usize) -> Vec<Entry<'a>>
Top-k entries whose code starts with the prefix, ordered by
descending log_prior. When the FST code index is populated, walks
only the prefix subtree (O(matching codes) instead of
O(entry_count)). Falls back to a linear scan for v1.4.3-era files.
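For reference, the linear-scan behaviour can be approximated as filter, sort, truncate. ScoredEntry and prefix_top_k_linear below are hypothetical names, and the f32 type for log_prior is an assumption.

```rust
/// Hypothetical stand-in for an entry with a log prior.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ScoredEntry<'a> {
    pub code: &'a [u8],
    pub word: &'a str,
    pub log_prior: f32, // assumed type
}

/// Linear-scan top-k: keep entries whose code starts with `prefix`,
/// order them by descending log_prior, and return at most `k`.
pub fn prefix_top_k_linear<'a>(
    table: &[ScoredEntry<'a>],
    prefix: &[u8],
    k: usize,
) -> Vec<ScoredEntry<'a>> {
    let mut hits: Vec<ScoredEntry<'a>> = table
        .iter()
        .copied()
        .filter(|e| e.code.starts_with(prefix))
        .collect();
    // total_cmp gives a total order over f32; the stable sort keeps
    // table order among entries with equal log_prior.
    hits.sort_by(|a, b| b.log_prior.total_cmp(&a.log_prior));
    hits.truncate(k);
    hits
}
```

This version always visits the whole table, which is why an FST-restricted walk over the prefix subtree is preferable when the index is present.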
pub fn prefix_for_each_entry<'a, F: FnMut(Entry<'a>)>(
    &'a self,
    prefix: &[u8],
    visit: F,
)
Streaming visit of all entries whose code starts with prefix,
FST-indexed (O(matching codes) instead of O(entry_count)). The
callback receives each Entry by value (Entry is Copy), so callers
can build their own ranking, top-k, or cement-business-rule
re-score over the result stream without paying for an interim
Vec allocation or accepting a fixed sort policy. Visit order is
the FST’s prefix-walk order (ascending by code), with the
multi-reading entries of each code in entry-table order.
Cement-level use case: the composite pinyin adapter’s
push_prefix_top_k and single_letter_cache need the raw frequency
(via estimated_freq_from_log_prior), a word-length bias, and a
proximity factor applied to each entry before ranking. A fixed
prefix_top_k_fst sort by descending log_prior would either drop
would-be winners or force the cement to scan the full entry
table to recover what the FST already knows.
Falls back to a linear scan + filter on v1.4.3-era files without a populated FST section.
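A caller-side re-score over the visit stream might look like the sketch below. Everything here is a stand-in: StreamEntry and for_each_with_prefix imitate the entry type and the visitor with a plain slice filter, and custom_score is an invented rule (log prior minus a per-character penalty) that only illustrates why a caller may rank differently from log_prior alone. Scores are assumed never to be NaN.

```rust
/// Hypothetical stand-in for an entry; field names are assumptions.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct StreamEntry<'a> {
    pub code: &'a [u8],
    pub word: &'a str,
    pub log_prior: f32,
}

/// Stand-in for a prefix visitor: here a simple filter over a slice.
pub fn for_each_with_prefix<'a, F: FnMut(StreamEntry<'a>)>(
    table: &[StreamEntry<'a>],
    prefix: &[u8],
    mut visit: F,
) {
    for e in table.iter().copied().filter(|e| e.code.starts_with(prefix)) {
        visit(e);
    }
}

/// Invented caller-side score: penalise longer words.
fn custom_score(e: &StreamEntry<'_>) -> f32 {
    e.log_prior - 0.5 * e.word.chars().count() as f32
}

/// Keep only the best `k` entries while streaming, so memory stays
/// bounded by `k` rather than by the number of matches.
pub fn top_k_custom<'a>(
    table: &[StreamEntry<'a>],
    prefix: &[u8],
    k: usize,
) -> Vec<StreamEntry<'a>> {
    let mut best: Vec<(f32, StreamEntry<'a>)> = Vec::new();
    for_each_with_prefix(table, prefix, |e| {
        let score = custom_score(&e);
        // `best` is kept in descending score order; ties keep visit order.
        let pos = best.partition_point(|(s, _)| *s >= score);
        if pos < k {
            best.insert(pos, (score, e));
            best.truncate(k);
        }
    });
    best.into_iter().map(|(_, e)| e).collect()
}
```

With this shape, the ranking policy lives entirely in the caller, and swapping custom_score for a different rule does not require another pass over the table.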