pub struct EntropyAnalyzer { /* private fields */ }
Analyzer that computes Shannon entropy and related information theory metrics.
Entropy measures the average information content, or uncertainty, in a dataset. Higher entropy indicates more randomness and diversity (values spread evenly across many categories), while lower entropy indicates more predictability (values concentrated in a few categories).
§Metrics Computed
- Shannon Entropy: -Σ(p_i * log2(p_i)) where p_i is the probability of each value
- Normalized Entropy: Entropy divided by log2(n) where n is the number of unique values
- Gini Impurity: 1 - Σ(p_i²), another measure of diversity
- Effective Number of Values: 2^entropy, interpretable as the effective cardinality
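The four formulas above can be sketched in plain Rust from a table of value counts. This is only an illustration under stated assumptions, not the crate's implementation: `EntropyMetrics` and `entropy_metrics` are hypothetical names, and returning 0 for normalized entropy when there is a single unique value (where log2(n) = 0) is a choice made here, not documented behavior.

```rust
use std::collections::HashMap;

/// Hypothetical container for the four metrics listed above.
#[derive(Debug)]
struct EntropyMetrics {
    entropy: f64,
    normalized_entropy: f64,
    gini_impurity: f64,
    effective_values: f64,
}

/// Hypothetical helper (not part of term_guard): computes the metrics
/// from per-value occurrence counts. Returns `None` for empty input.
fn entropy_metrics(counts: &HashMap<String, u64>) -> Option<EntropyMetrics> {
    let total: u64 = counts.values().sum();
    if total == 0 {
        return None;
    }
    let total = total as f64;

    let mut entropy = 0.0_f64;
    let mut sum_sq = 0.0_f64;
    let mut unique = 0_usize;
    for &count in counts.values() {
        if count == 0 {
            continue; // a zero count contributes nothing to any metric
        }
        unique += 1;
        let p = count as f64 / total;
        entropy -= p * p.log2(); // Shannon entropy: -Σ(p_i * log2(p_i))
        sum_sq += p * p; // accumulates Σ(p_i²) for Gini impurity
    }

    // Assumption: with one unique value log2(n) is 0, so report 0.
    let normalized_entropy = if unique > 1 {
        entropy / (unique as f64).log2()
    } else {
        0.0
    };

    Some(EntropyMetrics {
        entropy,
        normalized_entropy,
        gini_impurity: 1.0 - sum_sq,
        effective_values: entropy.exp2(), // 2^entropy
    })
}

fn main() {
    let mut counts = HashMap::new();
    counts.insert("a".to_string(), 2);
    counts.insert("b".to_string(), 1);
    counts.insert("c".to_string(), 1);
    let metrics = entropy_metrics(&counts).expect("counts are non-empty");
    println!("{metrics:?}");
}
```

For counts of 2, 1, and 1 the probabilities are 0.5, 0.25, and 0.25, which gives an entropy of 1.5 bits, a normalized entropy of about 0.946, a Gini impurity of 0.625, and roughly 2.83 effective values.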
§Example
use term_guard::analyzers::advanced::EntropyAnalyzer;
use datafusion::prelude::*;
// The `Analyzer` trait and `MetricValue` must also be in scope, and the
// `.await?` calls below require an async function that returns a `Result`.
let ctx = SessionContext::new();
// Register your data table
let analyzer = EntropyAnalyzer::new("category");
let state = analyzer.compute_state_from_data(&ctx).await?;
let metric = analyzer.compute_metric_from_state(&state)?;
if let MetricValue::Map(metrics) = metric {
    println!("Category entropy: {:?} bits", metrics.get("entropy"));
    println!("Normalized entropy: {:?}", metrics.get("normalized_entropy"));
    println!("Effective categories: {:?}", metrics.get("effective_values"));
}
Implementations§
impl EntropyAnalyzer
pub fn new(column: impl Into<String>) -> Self
Creates a new entropy analyzer for the specified column.
pub fn with_max_unique_values(
    column: impl Into<String>,
    max_unique_values: usize,
) -> Self
Creates a new entropy analyzer with a custom maximum unique values limit.
pub fn max_unique_values(&self) -> usize
Returns the maximum unique values limit.
Trait Implementations§
impl Analyzer for EntropyAnalyzer
type State = EntropyState
The state type for incremental computation.
type Metric = MetricValue
The metric type produced by this analyzer.
fn compute_state_from_data<'life0, 'life1, 'async_trait>(
    &'life0 self,
    ctx: &'life1 SessionContext,
) -> Pin<Box<dyn Future<Output = AnalyzerResult<Self::State>> + Send + 'async_trait>>
where
    Self: 'async_trait,
    'life0: 'async_trait,
    'life1: 'async_trait,
Computes the state from the input data.
fn compute_metric_from_state(
    &self,
    state: &Self::State,
) -> AnalyzerResult<Self::Metric>
Computes the final metric from the accumulated state.
fn description(&self) -> &str
Returns a description of what this analyzer computes.
fn columns(&self) -> Vec<&str>
Returns the column(s) this analyzer operates on, if any.
fn merge_states(&self, states: Vec<Self::State>) -> AnalyzerResult<Self::State>
Merges multiple states into a single state.
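The fields of `EntropyState` are not shown on this page, so the following is only a sketch of how a mergeable state for entropy could work, assuming the state tracks per-value counts. `CountState` and `merge_count_states` are hypothetical names, not the crate's API, and the real method returns an `AnalyzerResult` rather than a bare state.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an entropy state: occurrence counts per value.
#[derive(Debug, Default, Clone, PartialEq)]
struct CountState {
    counts: HashMap<String, u64>,
}

/// Merges partial states by summing the counts of matching values.
/// Because counts are additive, merging per-partition states gives the
/// same counts as a single pass over all of the data.
fn merge_count_states(states: Vec<CountState>) -> CountState {
    let mut merged = CountState::default();
    for state in states {
        for (value, count) in state.counts {
            *merged.counts.entry(value).or_insert(0) += count;
        }
    }
    merged
}

fn main() {
    let a = CountState {
        counts: HashMap::from([("x".to_string(), 3), ("y".to_string(), 1)]),
    };
    let b = CountState {
        counts: HashMap::from([("y".to_string(), 2), ("z".to_string(), 4)]),
    };
    let merged = merge_count_states(vec![a, b]);
    println!("{:?}", merged.counts.get("y")); // prints Some(3)
}
```

Keeping counts rather than a finished entropy value is what makes this kind of merge possible: entropy values from two partitions cannot be combined directly, but their counts can be summed and the metric recomputed afterwards.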
fn metric_key(&self) -> String
Returns the metric key for storing results.
fn is_combinable(&self) -> bool
Indicates whether this analyzer can be combined with others.
impl Clone for EntropyAnalyzer
fn clone(&self) -> EntropyAnalyzer
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations§
impl Freeze for EntropyAnalyzer
impl RefUnwindSafe for EntropyAnalyzer
impl Send for EntropyAnalyzer
impl Sync for EntropyAnalyzer
impl Unpin for EntropyAnalyzer
impl UnwindSafe for EntropyAnalyzer
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.