pub struct IdfStats {
    pub doc_count: u64,
    pub total_tokens: u64,
    pub term_df: HashMap<String, u64>,
}
Corpus-level IDF statistics accumulated from all ingested documents.
Serialized via bincode + zstd and stored as a file alongside the Iceberg metadata.
Fields
doc_count: u64
Number of documents (rows) seen.
total_tokens: u64
Sum of all document token lengths (used for avg_doc_len).
term_df: HashMap<String, u64>
Document frequency: number of documents containing each term.
Implementations
impl IdfStats
pub fn avg_doc_len(&self) -> f32
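No description is given for avg_doc_len, but the total_tokens field notes that it feeds this value. A minimal sketch of what it presumably computes, written as a hypothetical free function rather than the crate's method; the zero-document fallback of 0.0 is an assumption:

```rust
/// Hypothetical stand-in for `IdfStats::avg_doc_len`, taking the two
/// documented fields as arguments. Not the crate's implementation.
pub fn avg_doc_len(total_tokens: u64, doc_count: u64) -> f32 {
    if doc_count == 0 {
        // Assumed fallback for an empty corpus; the real method may differ.
        return 0.0;
    }
    // Mean document length in tokens.
    total_tokens as f32 / doc_count as f32
}
```

In BM25 scoring this average is the denominator of the length-normalization term, which is why the corpus statistics track total_tokens alongside doc_count.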
pub fn idf(&self, term: &str) -> f32
BM25+ IDF. The result is always positive, avoiding the negative values that the classic BM25 IDF produces for terms appearing in more than 50% of documents.
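The exact formula is not shown on this page. As an illustration only, the sketch below contrasts the classic BM25 IDF with one common always-positive variant that adds 1 inside the logarithm. Both function names are hypothetical, and the crate may use a different smoothing:

```rust
/// Classic BM25 IDF: ln((N - df + 0.5) / (df + 0.5)).
/// Negative whenever df > N / 2.
pub fn idf_classic(n_docs: u64, df: u64) -> f32 {
    let (n, df) = (n_docs as f32, df as f32);
    ((n - df + 0.5) / (df + 0.5)).ln()
}

/// Assumed always-positive variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
/// The ratio is positive for df <= N, so the log argument exceeds 1.
pub fn idf_positive(n_docs: u64, df: u64) -> f32 {
    let (n, df) = (n_docs as f32, df as f32);
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}
```

With 10 documents and a term found in 8 of them, idf_classic is negative while idf_positive stays above zero, and rarer terms still score higher than common ones.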
pub fn merge_batch(&mut self, texts: &[&str])
Merge DF counts from a new batch of text documents into this stats object.
Each &str is one document. Prunes to MAX_VOCAB by dropping lowest-DF
terms after merge (keeps highest-DF terms which are most useful for BM25 normalization).
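The merge-then-prune behaviour described above can be sketched as follows. This is not the crate's code: IdfStatsSketch is a hypothetical stand-in with the same field names, the tokenizer (lowercase plus whitespace split) is assumed, and max_vocab is passed as a parameter because the value of MAX_VOCAB is not shown here:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the documented struct (same field names).
#[derive(Default)]
pub struct IdfStatsSketch {
    pub doc_count: u64,
    pub total_tokens: u64,
    pub term_df: HashMap<String, u64>,
}

impl IdfStatsSketch {
    /// `max_vocab` stands in for the crate's MAX_VOCAB constant.
    pub fn merge_batch(&mut self, texts: &[&str], max_vocab: usize) {
        for text in texts {
            // Assumed tokenizer: lowercase and split on whitespace.
            let tokens: Vec<String> = text
                .split_whitespace()
                .map(|t| t.to_lowercase())
                .collect();
            self.doc_count += 1;
            self.total_tokens += tokens.len() as u64;

            // Document frequency counts each term once per document.
            let unique: HashSet<String> = tokens.into_iter().collect();
            for term in unique {
                *self.term_df.entry(term).or_insert(0) += 1;
            }
        }

        // Prune: keep only the `max_vocab` terms with the highest DF.
        if self.term_df.len() > max_vocab {
            let mut entries: Vec<(String, u64)> = self.term_df.drain().collect();
            // Sort by DF descending; break ties by term so the result is deterministic.
            entries.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
            entries.truncate(max_vocab);
            self.term_df = entries.into_iter().collect();
        }
    }
}
```

Note that pruning only affects term_df; doc_count and total_tokens keep counting every document, so avg_doc_len is unaffected by the vocabulary cap in this sketch.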
pub fn to_bytes(&self) -> AilakeResult<Vec<u8>>
Serialize to zstd-compressed bincode bytes.
pub fn from_bytes(bytes: &[u8]) -> AilakeResult<Self>
Deserialize from zstd-compressed bincode bytes.
Trait Implementations
impl<'de> Deserialize<'de> for IdfStats
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
Deserialize this value from the given Serde deserializer.
Auto Trait Implementations
impl Freeze for IdfStats
impl RefUnwindSafe for IdfStats
impl Send for IdfStats
impl Sync for IdfStats
impl Unpin for IdfStats
impl UnsafeUnpin for IdfStats
impl UnwindSafe for IdfStats
Blanket Implementations
impl<T> Allocation for T
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> DeserializeOwned for T
where
    T: for<'de> Deserialize<'de>,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.