pub struct Histogram {
    pub buckets: Vec<HistogramBucket>,
    pub total_count: i64,
    pub distinct_count: usize,
    pub null_count: i64,
}
A histogram representing the distribution of values in a column.
Fields
buckets: Vec<HistogramBucket>
The buckets in the histogram, ordered by frequency (descending).
total_count: i64
Total number of values (including nulls, if present).
distinct_count: usize
Number of distinct values.
null_count: i64
Number of null values.
Implementations
impl Histogram
pub fn new( buckets: Vec<HistogramBucket>, total_count: i64, null_count: i64, ) -> Self
Creates a new histogram from buckets.
pub fn most_common_ratio(&self) -> f64
Returns the ratio of the most common value.
pub fn least_common_ratio(&self) -> f64
Returns the ratio of the least common value.
pub fn bucket_count(&self) -> usize
Returns the number of buckets (distinct values).
pub fn top_n(&self, n: usize) -> Vec<(&str, f64)>
Returns the top N most common values and their ratios.
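As a rough illustration of what this method computes, here is a minimal standalone sketch. It does not use the crate's types: the `counts` slice of `(value, count)` pairs, the `total` parameter, and the function name `top_n_sketch` are hypothetical stand-ins, and dividing each count by `total` is an assumption about how ratios are derived.

```rust
/// Hypothetical sketch: returns the `n` most frequent values with their
/// share of `total`. Not the crate's implementation.
fn top_n_sketch<'a>(counts: &'a [(String, i64)], total: i64, n: usize) -> Vec<(&'a str, f64)> {
    // Assumption: an empty or invalid total yields no results.
    if total <= 0 {
        return Vec::new();
    }
    // Sort references by count, highest first, without copying the strings.
    let mut sorted: Vec<&(String, i64)> = counts.iter().collect();
    sorted.sort_by(|a, b| b.1.cmp(&a.1));
    sorted
        .into_iter()
        .take(n)
        .map(|(value, count)| (value.as_str(), *count as f64 / total as f64))
        .collect()
}
```

The returned string slices borrow from the input, which mirrors the `Vec<(&str, f64)>` return type in the signature above.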
pub fn is_roughly_uniform(&self, threshold: f64) -> bool
Checks if the distribution is roughly uniform (all values have similar frequencies).
A distribution is considered roughly uniform if the ratio between the most common and least common values is less than the threshold (default 1.5).
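The check described above can be sketched as a max/min comparison. This is a standalone illustration under stated assumptions, not the crate's code: `ratios` is a hypothetical slice holding each distinct value's share, and returning `false` for empty input or a zero minimum is a choice made here.

```rust
/// Hypothetical sketch of the uniformity test: true when the largest
/// ratio divided by the smallest ratio is below `threshold`.
fn is_roughly_uniform_sketch(ratios: &[f64], threshold: f64) -> bool {
    let max = ratios.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let min = ratios.iter().copied().fold(f64::INFINITY, f64::min);
    // Assumption: no data, or a value with zero frequency, is not uniform.
    if ratios.is_empty() || min <= 0.0 {
        return false;
    }
    max / min < threshold
}
```

With a threshold of 1.5, ratios such as 0.26, 0.25, 0.25, 0.24 pass the test, while 0.7, 0.2, 0.1 do not.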
pub fn get_value_ratio(&self, value: &str) -> Option<f64>
Gets the ratio for a specific value, if it exists in the histogram.
pub fn entropy(&self) -> f64
Returns the entropy of the distribution.
Higher entropy indicates more uniform distribution.
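To make the entropy claim concrete, the following is a minimal sketch of Shannon entropy over value ratios. The documentation does not state which formula or logarithm base the method uses, so base 2 (bits) is an assumption here, as are the `ratios` input and the name `shannon_entropy_sketch`.

```rust
/// Hypothetical sketch: Shannon entropy in bits, H = -sum(p * log2(p)),
/// over the non-zero ratios. The crate may use a different base.
fn shannon_entropy_sketch(ratios: &[f64]) -> f64 {
    ratios
        .iter()
        .filter(|&&p| p > 0.0) // skip zero ratios, which contribute nothing
        .map(|&p| -p * p.log2())
        .sum()
}
```

Under this definition, four equally likely values give 2 bits, a single value gives 0, and any skewed distribution over four values falls in between.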
pub fn follows_power_law(&self, top_n: usize, threshold: f64) -> bool
Checks if the distribution follows a power law (few values dominate).
Returns true if the top_n most common values together account for more than threshold of the distribution.
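The dominance test can be sketched as a cumulative share of the largest ratios. This is an illustration only: `ratios` and the name `top_share_exceeds` are hypothetical, and `threshold` is assumed to be a fraction between 0 and 1.

```rust
/// Hypothetical sketch: true when the `top_n` largest ratios sum to more
/// than `threshold`.
fn top_share_exceeds(ratios: &[f64], top_n: usize, threshold: f64) -> bool {
    // Sort a copy in descending order so the input order does not matter.
    let mut sorted = ratios.to_vec();
    sorted.sort_by(|a, b| b.total_cmp(a));
    let share: f64 = sorted.iter().take(top_n).sum();
    share > threshold
}
```

Since the buckets field is documented as ordered by descending frequency, the sort is only a defensive step in this sketch.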
pub fn null_ratio(&self) -> f64
Returns the null ratio in the data.
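A plausible reading, given that total_count is documented as including nulls, is null_count divided by total_count. The sketch below assumes exactly that and assumes a ratio of 0.0 when there are no values. Neither behavior is confirmed by the documentation, and `null_ratio_sketch` is a hypothetical name.

```rust
/// Hypothetical sketch: share of nulls among all values.
fn null_ratio_sketch(null_count: i64, total_count: i64) -> f64 {
    // Assumption: avoid dividing by zero by reporting no nulls.
    if total_count <= 0 {
        return 0.0;
    }
    null_count as f64 / total_count as f64
}
```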
Trait Implementations
Auto Trait Implementations
impl Freeze for Histogram
impl RefUnwindSafe for Histogram
impl Send for Histogram
impl Sync for Histogram
impl Unpin for Histogram
impl UnwindSafe for Histogram
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.