pub struct TableStatistics {
pub row_count: usize,
pub columns: HashMap<String, ColumnStatistics>,
pub last_updated: SystemTime,
pub is_stale: bool,
pub sample_metadata: Option<SampleMetadata>,
pub avg_row_bytes: Option<f64>,
}
Statistics for an entire table
Fields

row_count: usize
Total number of rows

columns: HashMap<String, ColumnStatistics>
Per-column statistics

last_updated: SystemTime
Timestamp when stats were last updated

is_stale: bool
Whether stats are stale (need recomputation)

sample_metadata: Option<SampleMetadata>
Sampling metadata (Phase 5.2). None if no sampling was used (small table).

avg_row_bytes: Option<f64>
Average row size in bytes (computed from sampled data)
This provides actual row size measurements that account for:
- Real string/varchar fill ratios (not heuristic estimates)
- Actual NULL prevalence
- True BLOB/CLOB sizes
Used by DML cost estimation to scale WAL write costs. None if statistics were estimated from schema (no actual data sampled).
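For illustration, a cost model might scale a per-row WAL write cost by avg_row_bytes, falling back to a fixed guess when the statistics were schema-estimated. A minimal sketch; the WAL_BYTE_COST weight and the 128-byte fallback are assumptions for illustration, not values from this crate:

fn wal_write_cost(stats: &TableStatistics, rows_written: usize) -> f64 {
    // Hypothetical per-byte WAL cost weight (assumption, for illustration).
    const WAL_BYTE_COST: f64 = 0.001;
    // Prefer the measured average row size; fall back to an assumed
    // 128 bytes when stats were estimated from schema (avg_row_bytes is None).
    let row_bytes = stats.avg_row_bytes.unwrap_or(128.0);
    rows_written as f64 * row_bytes * WAL_BYTE_COST
}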
Implementations
impl TableStatistics

pub fn estimate_from_schema(row_count: usize, schema: &TableSchema) -> Self
Create estimated statistics with basic column estimates
This method provides reasonable defaults for column statistics without requiring a full ANALYZE scan. It uses data type information to generate basic statistics using conservative heuristics.
Heuristics Used
- Boolean columns: n_distinct = 2
- Integer/Smallint/Bigint/Unsigned columns: n_distinct = sqrt(row_count) (conservative)
- Float/Real/DoublePrecision columns: n_distinct = sqrt(row_count) to 100 (high cardinality)
- Varchar/Character/Name columns: n_distinct = row_count * 0.5 (assume moderate uniqueness)
- Date/Timestamp/Time columns: n_distinct = row_count * 0.8 (high cardinality)
- Numeric/Decimal columns: n_distinct = sqrt(row_count) (moderate)
- Nullable columns: null_count ≈ row_count * 0.01 (1% estimated nulls)
- Non-nullable columns: null_count = 0
- All columns: is_stale = true (clearly marked as estimates)
Arguments
- row_count - Total number of rows in the table
- schema - Table schema with column definitions
Example
// Assume `schema` is a TableSchema with a Boolean, an Integer, and a Varchar column.
let stats = TableStatistics::estimate_from_schema(5000, &schema);
// Boolean col: n_distinct = 2
// Integer col: n_distinct = sqrt(5000) ≈ 70
// Varchar col: n_distinct = 2500
// All columns: is_stale = true
pub fn compute(rows: &[Row], schema: &TableSchema) -> Self
Compute statistics by scanning the table
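A usage sketch, assuming rows: Vec<Row> and schema: TableSchema are already in scope:

// Full-scan statistics: every row is examined.
let stats = TableStatistics::compute(&rows, &schema);
assert_eq!(stats.row_count, rows.len());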
pub fn compute_with_config(
    rows: &[Row],
    schema: &TableSchema,
    sampling_config: Option<SamplingConfig>,
    enable_histograms: bool,
    histogram_buckets: usize,
    bucket_strategy: BucketStrategy,
) -> Self
Compute statistics with sampling (Phase 5.2) and histogram support (Phase 5.1)
Arguments
- rows - All table rows
- schema - Table schema
- sampling_config - Optional sampling configuration (None = adaptive)
- enable_histograms - Whether to build histograms
- histogram_buckets - Number of histogram buckets
- bucket_strategy - Histogram bucketing strategy
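A hedged example call; BucketStrategy::EquiDepth is an assumed variant name for illustration and may differ in this crate:

let stats = TableStatistics::compute_with_config(
    &rows,
    &schema,
    None,                      // None = adaptive sampling
    true,                      // build histograms
    64,                        // histogram bucket count (arbitrary choice)
    BucketStrategy::EquiDepth, // assumed variant name
);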
pub fn compute_sampled(rows: &[Row], schema: &TableSchema) -> Self
Compute statistics using adaptive sampling (Phase 5.2 convenience method)
This automatically:
- Uses full scan for small tables (< 1000 rows)
- Uses 10% sample for medium tables (1K-100K rows)
- Uses fixed 10K sample for large tables (> 100K rows)
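A minimal sketch of the sample-size selection these tiers imply; the real implementation may differ in details:

fn adaptive_sample_size(row_count: usize) -> usize {
    if row_count < 1_000 {
        row_count        // small table: full scan
    } else if row_count <= 100_000 {
        row_count / 10   // medium table: 10% sample
    } else {
        10_000           // large table: fixed 10K sample
    }
}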
pub fn compute_full_featured(rows: &[Row], schema: &TableSchema) -> Self
Compute statistics with both sampling and histograms enabled
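For comparison with compute_sampled above, a brief usage sketch:

// Adaptive sampling only.
let quick = TableStatistics::compute_sampled(&rows, &schema);
// Adaptive sampling plus histograms.
let rich = TableStatistics::compute_full_featured(&rows, &schema);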
pub fn estimate_from_row_count(row_count: usize) -> Self
Create estimated statistics from table metadata without full ANALYZE
This provides a fallback for cost estimation when detailed statistics aren’t available (i.e., ANALYZE hasn’t been run). It uses the table’s row count and provides conservative defaults for other fields.
§Use Cases
- DML cost estimation when ANALYZE hasn’t been run
- Quick cost comparisons before detailed statistics are available
§Limitations
- No per-column statistics (empty columns map)
- No histogram data
- Marked as stale to indicate these are estimates
Example
let table_stats = table.get_statistics()
.cloned()
.unwrap_or_else(|| TableStatistics::estimate_from_row_count(table.row_count()));
pub fn mark_stale(&mut self)
Mark statistics as stale after significant data changes
pub fn needs_refresh(&self) -> bool
Check if statistics should be recomputed
Returns true if stats are marked stale or too old
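Together with mark_stale, this supports a lazy refresh pattern; a sketch assuming stats is mutable and rows/schema are available:

// After a large batch of DML:
stats.mark_stale();
// Later, before using the stats for planning:
if stats.needs_refresh() {
    stats = TableStatistics::compute_sampled(&rows, &schema);
}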
Trait Implementations
impl Clone for TableStatistics

fn clone(&self) -> TableStatistics

fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.