Struct IdfReader 

pub struct IdfReader<B>
where B: AsRef<[u8]>,
{ /* private fields */ }

IDFv1 reader. Holds the file’s byte region and the parsed header. entry_index lookups are constant-time arithmetic; FST code/word scans use inputx_fsa directly over the index sections.
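
The constant-time claim can be illustrated with a small sketch. It assumes a fixed-width record table at a known offset; `ToyReader`, `RECORD_SIZE`, `table_offset`, and `toy_region` are hypothetical names for illustration, since the actual IDFv1 layout is not described on this page.

```rust
/// Hypothetical fixed-width record size in bytes (illustration only).
const RECORD_SIZE: usize = 8;

/// Toy reader that is generic over its byte storage, like `IdfReader<B>`.
struct ToyReader<B: AsRef<[u8]>> {
    bytes: B,
    table_offset: usize, // where the hypothetical entry table starts
    entry_count: u32,
}

impl<B: AsRef<[u8]>> ToyReader<B> {
    /// O(1): the record position is computed, not searched for.
    fn record_at(&self, index: u32) -> Option<&[u8]> {
        if index >= self.entry_count {
            return None;
        }
        let start = self.table_offset + index as usize * RECORD_SIZE;
        // `get` returns None instead of panicking if the region is truncated.
        self.bytes.as_ref().get(start..start + RECORD_SIZE)
    }
}

/// Builds a toy region: a 4-byte preamble followed by three 8-byte records.
fn toy_region() -> Vec<u8> {
    let mut v = vec![0xAA; 4];
    for i in 0u8..3 {
        v.extend_from_slice(&[i; RECORD_SIZE]);
    }
    v
}
```

Because the storage is only required to implement `AsRef<[u8]>`, the same arithmetic works over a `Vec<u8>`, a borrowed slice, or a memory map.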

Implementations§

impl<B: AsRef<[u8]>> IdfReader<B>

pub fn from_bytes(bytes: B) -> Result<Self, OpenError>

Build a reader from an in-memory byte region. Validates the header and (in std builds) the sha256 of payload.
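
A minimal sketch of the validate-then-construct pattern follows. It is not the crate's implementation: `ToyReader`, `ToyOpenError`, and the `TOY1` magic are hypothetical, and the sha256 check is left out because the standard library has no SHA-256.

```rust
/// Hypothetical error type; the real `OpenError` variants are not listed here.
#[derive(Debug, PartialEq)]
enum ToyOpenError {
    TooShort,
    BadMagic,
}

/// Hypothetical 4-byte magic, for illustration only.
const MAGIC: &[u8; 4] = b"TOY1";

struct ToyReader<B: AsRef<[u8]>> {
    bytes: B,
}

impl<B: AsRef<[u8]>> ToyReader<B> {
    /// Validates before constructing, so a reader always wraps a checked region.
    fn from_bytes(bytes: B) -> Result<Self, ToyOpenError> {
        let region = bytes.as_ref();
        if region.len() < MAGIC.len() {
            return Err(ToyOpenError::TooShort);
        }
        if !region.starts_with(MAGIC) {
            return Err(ToyOpenError::BadMagic);
        }
        Ok(ToyReader { bytes })
    }

    /// Everything after the magic.
    fn payload(&self) -> &[u8] {
        &self.bytes.as_ref()[MAGIC.len()..]
    }
}
```

The generic parameter lets the same constructor accept owned storage such as `Vec<u8>` or a borrowed `&[u8]`.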

pub fn header(&self) -> &Header

pub fn version(&self) -> Version

pub fn engine_kind(&self) -> EngineKind

pub fn entry_count(&self) -> u32

pub fn sha256(&self) -> [u8; 32]

pub fn entries(&self) -> impl Iterator<Item = Entry<'_>> + '_

All entries in entry-table order. O(n) — used by prefix_top_k fallback and by tests; production hot paths should go through the FST code / word indexes.

pub fn entry_at(&self, index: u32) -> Option<Entry<'_>>

Entry at the given index. index must be < entry_count.

pub fn lookup<'a>(&'a self, code: &[u8]) -> Vec<Entry<'a>>

Looks up all entries whose code exactly matches the queried bytes. When the file carries an FST code index (v1.4.6 sub-phase C1 onwards), the lookup goes through the FST (O(|code|) instead of O(entry_count)) and then walks the contiguous multi-reading run in the entry table. Falls back to a linear scan for v1.4.3-era files that ship with an empty FST section.
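
The two paths can be sketched side by side. This is an illustration under stated assumptions, not the crate's code: `ToyEntry` is a hypothetical entry shape, the table is assumed sorted by code, and a binary search (`partition_point`) stands in for the FST when locating the first entry of a run.

```rust
/// Hypothetical entry shape; the real `Entry<'_>` fields are not shown here.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ToyEntry<'a> {
    code: &'a [u8],
    word: &'a str,
}

/// Indexed path: locate the first entry for `code`, then walk the contiguous run.
/// Binary search stands in for the FST; `table` must be sorted by code.
fn lookup_indexed<'a>(table: &[ToyEntry<'a>], code: &[u8]) -> Vec<ToyEntry<'a>> {
    let first = table.partition_point(|e| e.code < code);
    table[first..]
        .iter()
        .take_while(|e| e.code == code)
        .copied()
        .collect()
}

/// Fallback path: linear scan over every entry, O(entry_count).
fn lookup_linear<'a>(table: &[ToyEntry<'a>], code: &[u8]) -> Vec<ToyEntry<'a>> {
    table.iter().filter(|e| e.code == code).copied().collect()
}

/// Small table sorted by code, with two readings for "ni".
fn toy_table() -> Vec<ToyEntry<'static>> {
    vec![
        ToyEntry { code: b"ni", word: "你" },
        ToyEntry { code: b"ni", word: "尼" },
        ToyEntry { code: b"nihao", word: "你好" },
        ToyEntry { code: b"wo", word: "我" },
    ]
}
```

Both functions return the same entries; only the cost of finding the start of the run differs.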

pub fn find_by_word<'a>(&'a self, word: &str) -> Vec<Entry<'a>>

Reverse lookup by word. Same caveats as lookup.

pub fn prefix_top_k_fst<'a>(&'a self, prefix: &[u8], k: usize) -> Vec<Entry<'a>>

Top-k entries whose code starts with the prefix, ordered by log_prior desc. When the FST code index is populated, walks only the prefix subtree (O(matching codes) instead of O(entry_count)). Falls back to linear scan for v1.4.3-era files.

pub fn prefix_for_each_entry<'a, F: FnMut(Entry<'a>)>( &'a self, prefix: &[u8], visit: F, )

Streaming visit of all entries whose code starts with prefix, FST-indexed (O(matching codes) instead of O(entry_count)). The callback receives each Entry by value (Copy), so callers can build their own ranking / top-k / cement-business-rule re-score over the result stream without paying for an interim Vec allocation or a fixed sort policy. Visit order is the FST’s prefix-walk order (code-asc), with per-code multi-reading entries in entry-table order.

Cement-level use case: the composite pinyin adapter’s push_prefix_top_k and single_letter_cache need raw freq (via estimated_freq_from_log_prior) + word-length bias + proximity factor applied per entry before ranking — a fixed prefix_top_k_fst sort by log_prior desc would either drop would-be winners or force the cement to scan the full entry table to recover what FST already knows.

Falls back to a linear scan + filter on v1.4.3-era files without a populated FST section.
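
To show why a streaming visitor helps, here is a hedged sketch in which the caller applies its own score per entry and keeps only the running best, with no intermediate Vec. `ToyEntry`, its `f32` `log_prior`, the free function `prefix_for_each`, and the 0.5-per-character bonus are all assumptions made for illustration; a plain filter stands in for the FST prefix walk.

```rust
/// Hypothetical entry shape; the field types are assumptions.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ToyEntry<'a> {
    code: &'a [u8],
    word: &'a str,
    log_prior: f32,
}

/// Visits every entry whose code starts with `prefix`, in table order.
fn prefix_for_each<'a, F: FnMut(ToyEntry<'a>)>(
    table: &[ToyEntry<'a>],
    prefix: &[u8],
    mut visit: F,
) {
    for e in table.iter().filter(|e| e.code.starts_with(prefix)) {
        visit(*e);
    }
}

/// Caller-side policy (hypothetical): log_prior plus a bonus per word character.
fn best_by_custom_score<'a>(table: &[ToyEntry<'a>], prefix: &[u8]) -> Option<ToyEntry<'a>> {
    let mut best: Option<(f32, ToyEntry<'a>)> = None;
    prefix_for_each(table, prefix, |e| {
        let score = e.log_prior + 0.5 * e.word.chars().count() as f32;
        if best.map_or(true, |(s, _)| score > s) {
            best = Some((score, e));
        }
    });
    best.map(|(_, e)| e)
}

fn toy_table() -> Vec<ToyEntry<'static>> {
    vec![
        ToyEntry { code: b"ni", word: "你", log_prior: -1.0 },
        ToyEntry { code: b"ni", word: "尼", log_prior: -3.0 },
        ToyEntry { code: b"nihao", word: "你好", log_prior: -1.2 },
        ToyEntry { code: b"wo", word: "我", log_prior: -0.5 },
    ]
}
```

With this toy data, ranking by log_prior alone would put 你 first for the prefix "ni", while the length-biased score prefers 你好. A fixed sort policy inside the reader could not express that difference.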

pub fn prefix_top_k<'a>(&'a self, prefix: &[u8], k: usize) -> Vec<Entry<'a>>

Original linear-scan top-k, kept as the v1.4.3 fallback. Results are ordered by log_prior descending. Implemented as a linear scan feeding a top-k heap (BinaryHeap in std builds; sorted insert otherwise).
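
The linear scan plus bounded heap technique can be sketched with the standard library alone. This is a generic illustration, not the crate's code: it assumes log_prior is an `f32` and works on a bare slice of scores; `Score` and `top_k_indices` are hypothetical names.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// Total-order wrapper so an f32 score can live in a BinaryHeap.
#[derive(Clone, Copy)]
struct Score(f32);

impl PartialEq for Score {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Score {}
impl PartialOrd for Score {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Score {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.total_cmp(&other.0)
    }
}

/// Returns the indices of the `k` highest scores, best first.
fn top_k_indices(log_priors: &[f32], k: usize) -> Vec<usize> {
    if k == 0 {
        return Vec::new();
    }
    // Min-heap (via Reverse) holding the k best items seen so far.
    let cap = k.min(log_priors.len()) + 1;
    let mut heap: BinaryHeap<Reverse<(Score, usize)>> = BinaryHeap::with_capacity(cap);
    for (i, &lp) in log_priors.iter().enumerate() {
        heap.push(Reverse((Score(lp), i)));
        if heap.len() > k {
            heap.pop(); // evict the current worst
        }
    }
    // Ascending order of Reverse<T> is descending order of T.
    heap.into_sorted_vec()
        .into_iter()
        .map(|Reverse((_, i))| i)
        .collect()
}
```

The heap never holds more than k + 1 items, so the scan costs O(n log k) time and O(k) extra memory regardless of how many entries match.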

impl IdfReader<Mmap>

pub fn open<P: AsRef<Path>>(path: P) -> Result<Self, OpenError>

Open an IDFv1 file at path via mmap. Zero-copy: lookups borrow directly from the mmap region. The reader holds the mmap alive; drop the reader to drop the mapping.

Auto Trait Implementations§

impl<B> Freeze for IdfReader<B>
where B: Freeze,

impl<B> RefUnwindSafe for IdfReader<B>
where B: RefUnwindSafe,

impl<B> Send for IdfReader<B>
where B: Send,

impl<B> Sync for IdfReader<B>
where B: Sync,

impl<B> Unpin for IdfReader<B>
where B: Unpin,

impl<B> UnsafeUnpin for IdfReader<B>
where B: UnsafeUnpin,

impl<B> UnwindSafe for IdfReader<B>
where B: UnwindSafe,

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Same for T

type Output = T

Should always be Self

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.