Structs§
- AnalysisContext
- AnalysisResults
- CategoricalStatistics
- ColumnStatistics
- ColumnStats
- ComputeOptions
- CorrelationMatrix
- CorrelationPair
- DistributionAnalysis
- DistributionCharacteristics
- DistributionInfo
- NumericStatistics
- OutlierAnalysis
- OutlierRow
- PercentileBreakdown
Enums§
Constants§
- SAMPLING_THRESHOLD - Default sampling threshold: datasets of this size or larger are sampled. Used as the fallback when sample_size is None. The app uses the value from its config instead.
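
The fallback rule described for SAMPLING_THRESHOLD can be sketched as below. This is one plausible reading, not the crate's actual code: the constant's real value is not shown in this listing, so ASSUMED_SAMPLING_THRESHOLD and its value of 10_000 are placeholders, and effective_threshold and should_sample are hypothetical helper names.

```rust
/// Hypothetical stand-in for SAMPLING_THRESHOLD; the real value is not
/// shown in this listing, so 10_000 is only a placeholder.
const ASSUMED_SAMPLING_THRESHOLD: usize = 10_000;

/// Returns the threshold to apply: the caller's `sample_size` when it is
/// set, otherwise the default constant (the "fallback when None" rule).
fn effective_threshold(sample_size: Option<usize>) -> usize {
    sample_size.unwrap_or(ASSUMED_SAMPLING_THRESHOLD)
}

/// Datasets at or above the threshold are sampled. The comparison is
/// inclusive, matching "datasets >= this size are sampled".
fn should_sample(row_count: usize, sample_size: Option<usize>) -> bool {
    row_count >= effective_threshold(sample_size)
}
```

Under these assumptions, should_sample(9_999, None) is false and should_sample(10_000, None) is true, because the boundary is inclusive.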
Functions§
- analysis_results_from_describe - Builds describe-only AnalysisResults from a list of column statistics.
- calculate_fit_quality - Calculates fit quality (p-value) for a given distribution type.
- calculate_theoretical_bin_probabilities - Calculates probabilities for each bin defined by bin_boundaries.
- calculate_theoretical_probability_in_interval - Calculates the probability that a value falls in [lower, upper] for the given distribution.
- collect_lazy - Collects a LazyFrame into a DataFrame.
- compute_correlation_matrix - Computes pairwise Pearson correlation matrix for all numeric columns.
- compute_correlation_pair - Computes correlation statistics for a pair of columns.
- compute_correlation_statistics - Computes correlation matrix if not already present in results.
- compute_describe_column - Computes describe statistics for a single column of an already-collected DataFrame.
- compute_describe_from_lazy - Computes describe statistics from a LazyFrame without materializing all rows. When sampling is disabled, runs a single aggregation collect (like Polars describe) for similar performance. When sampling is enabled, samples then runs describe on the sample.
- compute_describe_single_aggregation - Computes describe statistics in a single aggregation pass over the DataFrame. Uses one collect() with aggregated expressions for all columns (count, null_count, mean, std, min, percentiles, max).
- compute_statistics - Computes statistics for a LazyFrame with default options.
- compute_statistics_with_options - Computes comprehensive statistics for a LazyFrame.
- sample_dataframe - Samples a LazyFrame for analysis when row count exceeds threshold. Used by chunked describe.
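
To illustrate what a pairwise Pearson correlation matrix involves, here is a minimal sketch over plain f64 columns. It does not use the crate's types or Polars: the pearson and correlation_matrix functions and the Vec<Vec<Option<f64>>> layout are assumptions made for this example, and the real compute_correlation_matrix signature is not shown in this listing.

```rust
/// Pearson correlation of two equal-length slices.
/// Returns None when lengths differ, there are fewer than two points,
/// or either column has zero variance (the coefficient is undefined).
fn pearson(x: &[f64], y: &[f64]) -> Option<f64> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;

    // Accumulate the co-deviation and the two sums of squared deviations.
    let mut sxy = 0.0_f64;
    let mut sxx = 0.0_f64;
    let mut syy = 0.0_f64;
    for (a, b) in x.iter().zip(y.iter()) {
        let dx = a - mean_x;
        let dy = b - mean_y;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if sxx == 0.0 || syy == 0.0 {
        return None;
    }
    Some(sxy / (sxx * syy).sqrt())
}

/// Symmetric matrix of pairwise correlations; entry [i][j] holds the
/// correlation between column i and column j.
fn correlation_matrix(columns: &[Vec<f64>]) -> Vec<Vec<Option<f64>>> {
    let k = columns.len();
    let mut matrix = vec![vec![None; k]; k];
    for i in 0..k {
        // Only the upper triangle is computed; the result is mirrored.
        for j in i..k {
            let r = pearson(&columns[i], &columns[j]);
            matrix[i][j] = r;
            matrix[j][i] = r;
        }
    }
    matrix
}
```

Computing only the upper triangle and mirroring it halves the work, since the Pearson coefficient is symmetric in its two arguments.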
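
The interval and bin probability functions listed above can be understood through a cumulative distribution function (CDF): for a continuous distribution, the probability of [lower, upper] is CDF(upper) - CDF(lower), and each bin is the interval between two adjacent boundaries. The sketch below assumes an exponential distribution only because its CDF has a closed form in std Rust; which distributions the crate supports is not stated in this listing, and the function names and the rate parameter are hypothetical.

```rust
/// CDF of an exponential distribution with the given rate (assumed
/// distribution, chosen for its closed form): 1 - exp(-rate * x) for x > 0.
fn exponential_cdf(x: f64, rate: f64) -> f64 {
    if x <= 0.0 {
        0.0
    } else {
        1.0 - (-rate * x).exp()
    }
}

/// Probability that a value falls in [lower, upper], computed as the
/// difference of the CDF at the two endpoints. Clamped at zero so a
/// reversed interval does not yield a negative probability.
fn probability_in_interval(lower: f64, upper: f64, rate: f64) -> f64 {
    (exponential_cdf(upper, rate) - exponential_cdf(lower, rate)).max(0.0)
}

/// One probability per bin, where bin i spans
/// [bin_boundaries[i], bin_boundaries[i + 1]].
/// n boundaries therefore produce n - 1 bins.
fn bin_probabilities(bin_boundaries: &[f64], rate: f64) -> Vec<f64> {
    bin_boundaries
        .windows(2)
        .map(|w| probability_in_interval(w[0], w[1], rate))
        .collect()
}
```

Theoretical bin probabilities like these are typically compared against observed bin counts when judging how well a distribution fits, which is presumably how they relate to calculate_fit_quality.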