
Module analyze



ANALYZE TABLE: collect column statistics from one or more record batches.

Computes the column statistics the cost-based optimizer (CBO) needs:

  • row_count
  • null_count per column
  • min_value / max_value (stringified for cross-type safety)
  • distinct_count (NDV, number of distinct values) per column: exact while the count stays at or below EXACT_NDV_CAP, otherwise a HyperLogLog-style approximation
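As a rough illustration of the four statistics above, here is a minimal, dependency-free sketch for a single column. It assumes a column can be modelled as a slice of `Option<T>` (`None` standing in for NULL) rather than an Arrow array; `ColumnStats` and `column_stats` are hypothetical stand-ins, not this module's `ColumnStatistics` type, and NULLs are assumed to be excluded from `distinct_count`. Extrema are compared in their native type and only stringified at the end, which is one way to read "stringified for cross-type safety": every column, whatever its type, yields min/max of the same `String` shape.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical per-column statistics (not the crate's ColumnStatistics).
/// min/max are strings so columns of any type share one shape.
#[derive(Debug, Clone, PartialEq)]
struct ColumnStats {
    row_count: usize,
    null_count: usize,
    min_value: Option<String>,
    max_value: Option<String>,
    distinct_count: usize,
}

/// Compute statistics for one column, modelled as nullable values
/// (`None` = NULL) instead of an Arrow array.
fn column_stats<T>(values: &[Option<T>]) -> ColumnStats
where
    T: Ord + Hash + ToString,
{
    let mut null_count = 0;
    let mut min: Option<&T> = None;
    let mut max: Option<&T> = None;
    // Exact distinct set; NULLs are deliberately not inserted.
    let mut distinct: HashSet<&T> = HashSet::new();

    for v in values {
        match v {
            None => null_count += 1,
            Some(x) => {
                // Compare typed values so 9 < 10 holds even though "10" < "9" as strings.
                if min.map_or(true, |m| x < m) {
                    min = Some(x);
                }
                if max.map_or(true, |m| x > m) {
                    max = Some(x);
                }
                distinct.insert(x);
            }
        }
    }

    ColumnStats {
        row_count: values.len(),
        null_count,
        // Stringify only after the typed comparison is done.
        min_value: min.map(|m| m.to_string()),
        max_value: max.map(|m| m.to_string()),
        distinct_count: distinct.len(),
    }
}

fn main() {
    let col = vec![Some(10), None, Some(9), Some(10), None];
    println!("{:?}", column_stats(&col));
}
```

For `[Some(10), None, Some(9), Some(10), None]` this yields 5 rows, 2 nulls, min `"9"`, max `"10"` and 2 distinct values; comparing the strings instead would have made `"10"` the minimum.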

The driver calls analyze_batch for a single RecordBatch, or analyze_record_batches to aggregate statistics across many batches (e.g. every file behind a table). The result is a ColumnStatistics value, ready to attach to TableMetadata via with_stats.
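Aggregating over many batches cannot simply combine each batch's finished statistics: stringified extrema compare lexicographically, and per-batch distinct counts double-count values that appear in more than one batch. The sketch below assumes a single `i64` column and keeps typed state that is folded batch by batch and stringified only at the end. `ColumnAccumulator` and `accumulate_batches` are hypothetical names, not this module's API, and the exact `HashSet` is a simplification; past the NDV cap a mergeable sketch would take its place.

```rust
use std::collections::HashSet;

/// (row_count, null_count, min, max, distinct_count)
type Summary = (usize, usize, Option<String>, Option<String>, usize);

/// Hypothetical running state for one i64 column across batches.
#[derive(Default)]
struct ColumnAccumulator {
    row_count: usize,
    null_count: usize,
    min: Option<i64>,
    max: Option<i64>,
    distinct: HashSet<i64>,
}

impl ColumnAccumulator {
    /// Fold one batch (None = NULL) into the running typed state.
    fn update(&mut self, batch: &[Option<i64>]) {
        self.row_count += batch.len();
        for v in batch {
            match *v {
                None => self.null_count += 1,
                Some(x) => {
                    self.min = Some(self.min.map_or(x, |m| m.min(x)));
                    self.max = Some(self.max.map_or(x, |m| m.max(x)));
                    // A shared set means a value seen in two batches counts once.
                    self.distinct.insert(x);
                }
            }
        }
    }

    /// Produce the final summary, stringifying extrema only now.
    fn finish(&self) -> Summary {
        (
            self.row_count,
            self.null_count,
            self.min.map(|m| m.to_string()),
            self.max.map(|m| m.to_string()),
            self.distinct.len(),
        )
    }
}

/// Fold any iterator of batches into one summary.
fn accumulate_batches<'a, I>(batches: I) -> Summary
where
    I: IntoIterator<Item = &'a [Option<i64>]>,
{
    let mut acc = ColumnAccumulator::default();
    for b in batches {
        acc.update(b);
    }
    acc.finish()
}

fn main() {
    let b1: Vec<Option<i64>> = vec![Some(1), Some(2), None];
    let b2: Vec<Option<i64>> = vec![Some(2), Some(10)];
    println!("{:?}", accumulate_batches([b1.as_slice(), b2.as_slice()]));
}
```

With batches `[1, 2, NULL]` and `[2, 10]`, the accumulator reports max `"10"` and 3 distinct values, whereas taking the larger of the finished per-batch strings would pick `"2"` and summing per-batch NDVs would give 4.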

Constants

EXACT_NDV_CAP
NDV cap: above this many distinct values, distinct_count drops from an exact count to a HyperLogLog-style estimate.
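The exact-then-approximate switch described for EXACT_NDV_CAP can be sketched as a counter that keeps an exact set of 64-bit value hashes until it grows past the cap, then replays those hashes into a HyperLogLog sketch and continues there. Everything here is an assumption for illustration: the cap value of 4096, the precision `P = 12`, the use of `std`'s `DefaultHasher`, and the plain HyperLogLog estimator with linear-counting small-range correction. The module's own "HyperLogLog-style" estimator may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Assumed value for illustration; the module's real EXACT_NDV_CAP may differ.
const EXACT_NDV_CAP: usize = 4096;
/// Assumed HyperLogLog precision: 2^P one-byte registers.
const P: u32 = 12;
const M: usize = 1 << P;

/// Hash to 64 bits; all DefaultHasher::new() instances agree within one build.
fn hash64<T: Hash>(v: &T) -> u64 {
    let mut h = DefaultHasher::new();
    v.hash(&mut h);
    h.finish()
}

/// Record one hash in the HyperLogLog registers.
fn hll_add(regs: &mut [u8], h: u64) {
    let idx = (h >> (64 - P)) as usize; // top P bits choose a register
    let rest = h << P; // remaining 64 - P bits, left-aligned
    // Rank = 1-based position of the leftmost 1 bit, capped at 64 - P + 1.
    let rank = (rest.leading_zeros().min(64 - P) + 1) as u8;
    if rank > regs[idx] {
        regs[idx] = rank;
    }
}

/// Standard HyperLogLog estimate, with linear counting for small cardinalities.
fn hll_estimate(regs: &[u8]) -> f64 {
    let m = regs.len() as f64;
    let alpha = 0.7213 / (1.0 + 1.079 / m);
    let sum: f64 = regs.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
    let raw = alpha * m * m / sum;
    let zeros = regs.iter().filter(|&&r| r == 0).count();
    if raw <= 2.5 * m && zeros > 0 {
        m * (m / zeros as f64).ln()
    } else {
        raw
    }
}

/// Distinct-value counter: exact up to the cap, then a HyperLogLog sketch.
enum NdvCounter {
    Exact(HashSet<u64>), // stores hashes; 64-bit collisions are negligible here
    Sketch(Vec<u8>),
}

impl NdvCounter {
    fn new() -> Self {
        NdvCounter::Exact(HashSet::new())
    }

    fn insert<T: Hash>(&mut self, v: &T) {
        let h = hash64(v);
        match self {
            NdvCounter::Exact(set) => {
                set.insert(h);
                if set.len() > EXACT_NDV_CAP {
                    // Cap exceeded: replay every hash seen so far into a sketch.
                    let mut regs = vec![0u8; M];
                    for &x in set.iter() {
                        hll_add(&mut regs, x);
                    }
                    *self = NdvCounter::Sketch(regs);
                }
            }
            NdvCounter::Sketch(regs) => hll_add(regs, h),
        }
    }

    fn is_exact(&self) -> bool {
        matches!(self, NdvCounter::Exact(_))
    }

    fn estimate(&self) -> usize {
        match self {
            NdvCounter::Exact(set) => set.len(),
            NdvCounter::Sketch(regs) => hll_estimate(regs).round() as usize,
        }
    }
}

fn main() {
    let mut small = NdvCounter::new();
    for v in 0..100u64 {
        small.insert(&v);
        small.insert(&v); // duplicates do not change the count
    }
    println!("small: exact={} ndv={}", small.is_exact(), small.estimate());

    let mut large = NdvCounter::new();
    for v in 0..100_000u64 {
        large.insert(&v);
    }
    println!("large: exact={} ndv~{}", large.is_exact(), large.estimate());
}
```

Inputs with few distinct values get an exact answer; past the cap, memory stays fixed at 2^12 one-byte registers and the standard error is roughly 1.04/sqrt(4096), about 1.6%.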

Functions

analyze_batch
Compute column statistics from a single RecordBatch.
analyze_batch_per_column
Compute per-column statistics for every column in batch.
analyze_record_batches
Compute column statistics from an iterator of RecordBatches.