Struct TableStatistics

pub struct TableStatistics {
    pub row_count: usize,
    pub columns: HashMap<String, ColumnStatistics>,
    pub last_updated: SystemTime,
    pub is_stale: bool,
    pub sample_metadata: Option<SampleMetadata>,
    pub avg_row_bytes: Option<f64>,
}

Statistics for an entire table

Fields§

§row_count: usize

Total number of rows

§columns: HashMap<String, ColumnStatistics>

Per-column statistics

§last_updated: SystemTime

Timestamp when stats were last updated

§is_stale: bool

Whether stats are stale (need recomputation)

§sample_metadata: Option<SampleMetadata>

Sampling metadata (Phase 5.2). None if no sampling was used (small table).

§avg_row_bytes: Option<f64>

Average row size in bytes (computed from sampled data)

This provides actual row size measurements that account for:

  • Real string/varchar fill ratios (not heuristic estimates)
  • Actual NULL prevalence
  • True BLOB/CLOB sizes

Used by DML cost estimation to scale WAL write costs. None if statistics were estimated from schema (no actual data sampled).
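
As a rough illustration of how a cost model might consume avg_row_bytes, the sketch below scales a WAL write cost by the measured row width and falls back to a fixed guess when the value is None. The function name, the linear cost formula, and the 100-byte fallback are all assumptions made for this example; none of them are taken from the crate.

```rust
/// Hypothetical fallback row width, used when no sampled measurement exists.
const ASSUMED_ROW_BYTES: f64 = 100.0;

/// Hypothetical cost model: WAL cost grows linearly with the bytes written.
///
/// `avg_row_bytes` mirrors `TableStatistics::avg_row_bytes`; `cost_per_byte`
/// is an illustrative tuning constant, not a parameter of the real planner.
fn estimate_wal_write_cost(
    rows_written: usize,
    avg_row_bytes: Option<f64>,
    cost_per_byte: f64,
) -> f64 {
    // Prefer the measured average; fall back to the assumed width when the
    // statistics were estimated from the schema and no data was sampled.
    let row_bytes = avg_row_bytes.unwrap_or(ASSUMED_ROW_BYTES);
    rows_written as f64 * row_bytes * cost_per_byte
}
```

With a measured average of 50 bytes, writing 10 rows at 0.5 cost units per byte yields 250.0 under this model; with None, the same call would use the 100-byte fallback instead.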

Implementations§

impl TableStatistics

pub fn estimate_from_schema(row_count: usize, schema: &TableSchema) -> Self

Create estimated statistics with basic column estimates

This method provides reasonable defaults for column statistics without requiring a full ANALYZE scan. It derives basic statistics from each column’s data type using conservative heuristics.

§Heuristics Used
  • Boolean columns: n_distinct = 2
  • Integer/Smallint/Bigint/Unsigned columns: n_distinct = sqrt(row_count) (conservative)
  • Float/Real/DoublePrecision columns: n_distinct = sqrt(row_count) to 100 (high cardinality)
  • Varchar/Character/Name columns: n_distinct = row_count * 0.5 (assume moderate uniqueness)
  • Date/Timestamp/Time columns: n_distinct = row_count * 0.8 (high cardinality)
  • Numeric/Decimal columns: n_distinct = sqrt(row_count) (moderate)
  • Nullable columns: null_count ≈ row_count * 0.01 (1% estimated nulls)
  • Non-nullable columns: null_count = 0
  • All columns: is_stale = true (clearly marked as estimates)
§Arguments
  • row_count - Total number of rows in the table
  • schema - Table schema with column definitions
§Example
let stats = TableStatistics::estimate_from_schema(5000, &schema);
// Boolean col: n_distinct = 2
// Integer col: n_distinct = sqrt(5000) ≈ 70
// Varchar col: n_distinct = 2500
// All columns: is_stale = true
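
The heuristics listed above can be sketched as a pair of small functions. This is only an illustration: ColumnKind is a simplified stand-in for the crate's real column type enum, the function names are hypothetical, and truncating toward zero is an assumption chosen to match the "≈ 70" in the example. Float columns are left out because the documented rule ("sqrt(row_count) to 100") does not pin down a single formula.

```rust
/// Simplified column categories; the crate's real data type enum is not
/// shown on this page, so these variants are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ColumnKind {
    Boolean,
    Integer,  // Integer/Smallint/Bigint/Unsigned
    Text,     // Varchar/Character/Name
    Temporal, // Date/Timestamp/Time
    Decimal,  // Numeric/Decimal
}

/// Estimate the number of distinct values from the column kind alone.
fn estimate_n_distinct(kind: ColumnKind, row_count: usize) -> usize {
    match kind {
        ColumnKind::Boolean => 2,
        // sqrt(row_count), truncated toward zero (rounding is an assumption).
        ColumnKind::Integer | ColumnKind::Decimal => (row_count as f64).sqrt() as usize,
        // row_count * 0.5, computed in integer arithmetic.
        ColumnKind::Text => row_count / 2,
        // row_count * 0.8, computed in integer arithmetic.
        ColumnKind::Temporal => row_count * 4 / 5,
    }
}

/// Estimate the NULL count: about 1% of rows for nullable columns, else 0.
fn estimate_null_count(nullable: bool, row_count: usize) -> usize {
    if nullable {
        row_count / 100
    } else {
        0
    }
}
```

For a 5000-row table this reproduces the figures in the example: 2 for a Boolean column, 70 for an Integer column, and 2500 for a Varchar column.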

pub fn compute(rows: &[Row], schema: &TableSchema) -> Self

Compute statistics by scanning the table

pub fn compute_with_config(
    rows: &[Row],
    schema: &TableSchema,
    sampling_config: Option<SamplingConfig>,
    enable_histograms: bool,
    histogram_buckets: usize,
    bucket_strategy: BucketStrategy,
) -> Self

Compute statistics with sampling (Phase 5.2) and histogram support (Phase 5.1)

§Arguments
  • rows - All table rows
  • schema - Table schema
  • sampling_config - Optional sampling configuration (None = adaptive)
  • enable_histograms - Whether to build histograms
  • histogram_buckets - Number of histogram buckets
  • bucket_strategy - Histogram bucketing strategy

pub fn compute_sampled(rows: &[Row], schema: &TableSchema) -> Self

Compute statistics using adaptive sampling (Phase 5.2 convenience method)

This automatically:

  • Uses full scan for small tables (< 1000 rows)
  • Uses 10% sample for medium tables (1K-100K rows)
  • Uses fixed 10K sample for large tables (> 100K rows)

This method computes statistics with both sampling and histograms enabled.
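
The three size tiers described for this method can be sketched as a single function that maps a row count to a sample size. The function name is hypothetical, and the treatment of the exact boundaries (1000 and 100,000 rows) is an assumption, since the documented ranges do not say which tier the endpoints belong to.

```rust
/// Pick a sample size from the table's row count.
///
/// Tiers follow the documented behavior; boundary handling is assumed.
fn adaptive_sample_size(row_count: usize) -> usize {
    if row_count < 1_000 {
        // Small table: scan every row.
        row_count
    } else if row_count <= 100_000 {
        // Medium table: 10% sample.
        row_count / 10
    } else {
        // Large table: fixed 10K-row sample.
        10_000
    }
}
```

Under these assumptions the sample size is continuous at the upper boundary: a 100,000-row table yields a 10,000-row sample from either the 10% rule or the fixed cap.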

pub fn estimate_from_row_count(row_count: usize) -> Self

Create estimated statistics from table metadata without full ANALYZE

This provides a fallback for cost estimation when detailed statistics aren’t available (i.e., ANALYZE hasn’t been run). It uses the table’s row count and provides conservative defaults for other fields.

§Use Cases
  • DML cost estimation when ANALYZE hasn’t been run
  • Quick cost comparisons before detailed statistics are available
§Limitations
  • No per-column statistics (empty columns map)
  • No histogram data
  • Marked as stale to indicate these are estimates
§Example
let table_stats = table.get_statistics()
    .cloned()
    .unwrap_or_else(|| TableStatistics::estimate_from_row_count(table.row_count()));
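
To make the limitations concrete, here is a minimal sketch of what such a fallback value might contain. TableStatsSketch is a trimmed, hypothetical mirror of TableStatistics (sample_metadata is omitted), ColumnStatistics is a placeholder type, and setting last_updated to the current time and avg_row_bytes to None are assumptions rather than documented behavior.

```rust
use std::collections::HashMap;
use std::time::SystemTime;

/// Placeholder standing in for the crate's real `ColumnStatistics`.
#[derive(Clone, Debug)]
struct ColumnStatistics;

/// Trimmed, hypothetical mirror of `TableStatistics` for illustration.
#[derive(Clone, Debug)]
struct TableStatsSketch {
    row_count: usize,
    columns: HashMap<String, ColumnStatistics>,
    last_updated: SystemTime,
    is_stale: bool,
    avg_row_bytes: Option<f64>,
}

/// Build fallback statistics from a row count alone.
fn estimate_from_row_count_sketch(row_count: usize) -> TableStatsSketch {
    TableStatsSketch {
        row_count,
        // No per-column statistics are available without ANALYZE.
        columns: HashMap::new(),
        // Assumption: the estimate is stamped with the current time.
        last_updated: SystemTime::now(),
        // Marked stale so callers can tell these are estimates.
        is_stale: true,
        // Assumption: no data was sampled, so no measured row size exists.
        avg_row_bytes: None,
    }
}
```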

pub fn mark_stale(&mut self)

Mark statistics as stale after significant data changes

pub fn needs_refresh(&self) -> bool

Check if statistics should be recomputed

Returns true if stats are marked stale or too old
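
One way the stale-or-too-old check could work is sketched below. The StatsFreshness type, the one-hour MAX_STATS_AGE threshold, and the explicit now parameter (added so the check is deterministic) are all assumptions for illustration; the page does not document the real age limit.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical maximum age; the real threshold is not documented here.
const MAX_STATS_AGE: Duration = Duration::from_secs(60 * 60);

/// Minimal stand-in holding only the fields the freshness check needs.
struct StatsFreshness {
    last_updated: SystemTime,
    is_stale: bool,
}

impl StatsFreshness {
    /// Flag the statistics as stale, e.g. after significant data changes.
    fn mark_stale(&mut self) {
        self.is_stale = true;
    }

    /// True if the stats are flagged stale or older than `MAX_STATS_AGE`.
    fn needs_refresh_at(&self, now: SystemTime) -> bool {
        let too_old = now
            .duration_since(self.last_updated)
            .map(|age| age > MAX_STATS_AGE)
            // `duration_since` errs if `last_updated` is later than `now`
            // (clock moved backwards); treat that case as not too old.
            .unwrap_or(false);
        self.is_stale || too_old
    }
}
```

Passing the current time in explicitly keeps the sketch easy to test; a production method with the documented signature would presumably read the clock itself.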

Trait Implementations§

impl Clone for TableStatistics

fn clone(&self) -> TableStatistics

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for TableStatistics

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.