ANALYZE TABLE — collect column statistics from a batch.
Computes the column statistics the cost-based optimizer (CBO) needs:
- row_count
- null_count per column
- min_value / max_value (stringified for cross-type safety)
- distinct_count per column (HyperLogLog-style approximation, or exact when the input is small)
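As a rough illustration of the statistics listed above, here is a minimal sketch for a single nullable integer column. It is not this module's implementation: `ColumnStatsSketch` and `analyze_i64_column` are hypothetical names, the column is modelled as a slice of `Option<i64>` rather than a `RecordBatch`, and the distinct count is always exact and assumed to ignore nulls.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one column's statistics; the crate's real
/// `ColumnStatistics` type may be shaped differently.
#[derive(Debug, PartialEq)]
pub struct ColumnStatsSketch {
    pub row_count: usize,
    pub null_count: usize,
    pub min_value: Option<String>,
    pub max_value: Option<String>,
    pub distinct_count: usize,
}

/// Compute statistics for a nullable integer column modelled as a slice of
/// `Option<i64>` (an assumption; the real code operates on a `RecordBatch`).
pub fn analyze_i64_column(values: &[Option<i64>]) -> ColumnStatsSketch {
    let mut null_count = 0;
    let mut min: Option<i64> = None;
    let mut max: Option<i64> = None;
    let mut seen = HashSet::new();

    for v in values {
        match v {
            None => null_count += 1,
            Some(x) => {
                // Compare in the native type; stringify only at the end.
                min = Some(min.map_or(*x, |m| m.min(*x)));
                max = Some(max.map_or(*x, |m| m.max(*x)));
                seen.insert(*x);
            }
        }
    }

    ColumnStatsSketch {
        row_count: values.len(),
        null_count,
        min_value: min.map(|m| m.to_string()),
        max_value: max.map(|m| m.to_string()),
        distinct_count: seen.len(),
    }
}
```

The sketch compares values in their native type and converts to strings last, because comparing the strings themselves would order `"10"` before `"9"`.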
The driver calls analyze_batch over a single RecordBatch or analyze_record_batches for an aggregate over many batches (e.g. every file behind a table). The result is a ColumnStatistics ready to attach to TableMetadata via with_stats.
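Aggregating over many batches means combining per-batch results, and not every statistic combines the same way. The sketch below shows one plausible approach under stated assumptions: `PartialStats`, `partial_from`, and `merge` are hypothetical names, a batch is modelled as a slice of `Option<i64>`, and distinct values are tracked in an exact set. Whether analyze_record_batches works this way is not stated in these docs.

```rust
use std::collections::HashSet;

/// Hypothetical per-batch partial statistics for one i64 column.
#[derive(Debug, Clone, Default)]
pub struct PartialStats {
    pub row_count: usize,
    pub null_count: usize,
    pub min: Option<i64>,
    pub max: Option<i64>,
    pub distinct: HashSet<i64>,
}

/// Build partial statistics from one "batch" (modelled as a slice).
pub fn partial_from(values: &[Option<i64>]) -> PartialStats {
    let mut s = PartialStats::default();
    s.row_count = values.len();
    for v in values {
        match v {
            None => s.null_count += 1,
            Some(x) => {
                s.min = Some(s.min.map_or(*x, |m| m.min(*x)));
                s.max = Some(s.max.map_or(*x, |m| m.max(*x)));
                s.distinct.insert(*x);
            }
        }
    }
    s
}

/// Combine two partial results into one.
pub fn merge(mut a: PartialStats, b: PartialStats) -> PartialStats {
    // Counts are additive.
    a.row_count += b.row_count;
    a.null_count += b.null_count;
    // Extremes combine by taking the smaller min and the larger max.
    a.min = match (a.min, b.min) {
        (Some(x), Some(y)) => Some(x.min(y)),
        (x, None) => x,
        (None, y) => y,
    };
    a.max = match (a.max, b.max) {
        (Some(x), Some(y)) => Some(x.max(y)),
        (x, None) => x,
        (None, y) => y,
    };
    // Distinct counts are NOT additive: the same value may appear in both
    // batches, so the underlying sets (or sketches) must be unioned.
    a.distinct.extend(b.distinct);
    a
}
```

With these helpers, a whole table could be folded as `batches.iter().map(|b| partial_from(b)).reduce(merge)`. The key point is the last step of `merge`: summing per-batch distinct counts would overcount any value that occurs in more than one batch.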
Constants
- EXACT_NDV_CAP - Approximate NDV cap above which we drop to a HyperLogLog-style estimate.
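The exact-then-approximate switch that such a cap implies can be sketched as follows. Everything here is an assumption made for illustration: `SKETCH_EXACT_CAP` is an arbitrary value (the real value of EXACT_NDV_CAP is not shown in these docs), `Hll` is a textbook HyperLogLog with 1024 registers, and `distinct_count` is a hypothetical helper, not a function of this module.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Illustrative cap only; the real `EXACT_NDV_CAP` value is not shown in the docs.
const SKETCH_EXACT_CAP: usize = 1_000;
/// 2^10 = 1024 registers, roughly 3% standard error.
const P: u32 = 10;
const M: usize = 1 << P;

fn hash64<T: Hash>(v: &T) -> u64 {
    let mut h = DefaultHasher::new();
    v.hash(&mut h);
    h.finish()
}

/// Minimal HyperLogLog: one u8 register per bucket.
struct Hll {
    registers: Vec<u8>,
}

impl Hll {
    fn new() -> Self {
        Hll { registers: vec![0; M] }
    }

    fn add(&mut self, hash: u64) {
        let idx = (hash >> (64 - P)) as usize; // top P bits pick the register
        let rest = hash << P; // remaining bits, left-aligned
        let rank = (rest.leading_zeros() + 1).min(64 - P + 1) as u8;
        if rank > self.registers[idx] {
            self.registers[idx] = rank;
        }
    }

    fn estimate(&self) -> f64 {
        let m = M as f64;
        let alpha = 0.7213 / (1.0 + 1.079 / m);
        let sum: f64 = self.registers.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        let raw = alpha * m * m / sum;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            m * (m / zeros as f64).ln() // linear counting for small cardinalities
        } else {
            raw
        }
    }
}

/// Returns `(count, is_exact)`: exact while the number of distinct values
/// stays at or below the cap, a HyperLogLog estimate once it is exceeded.
pub fn distinct_count<T: Hash + Eq>(values: &[T]) -> (u64, bool) {
    let mut exact: Option<HashSet<&T>> = Some(HashSet::new());
    let mut hll = Hll::new();
    for v in values {
        hll.add(hash64(v));
        let mut overflow = false;
        if let Some(set) = exact.as_mut() {
            set.insert(v);
            overflow = set.len() > SKETCH_EXACT_CAP;
        }
        if overflow {
            exact = None; // too many distinct values: stop tracking exactly
        }
    }
    match exact {
        Some(set) => (set.len() as u64, true),
        None => (hll.estimate().round() as u64, false),
    }
}
```

Feeding the sketch from the first value onward, rather than only after the cap is hit, avoids having to replay the exact set into the estimator at the moment of the switch. The cost is a fixed 1 KiB of registers even for small inputs.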
Functions
- analyze_batch - Compute column statistics from a single RecordBatch.
- analyze_batch_per_column - Compute per-column statistics for every column in batch.
- analyze_record_batches - Compute column statistics from an iterator of RecordBatches.