1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
/*-------------------------------------------------------------------------
*
* pg_statistic.h
* definition of the "statistics" system catalog (pg_statistic)
*
*
* Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* src/include/catalog/pg_statistic.h
*
* NOTES
* The Catalog.pm module reads this file and derives schema
* information.
*
*-------------------------------------------------------------------------
*/
/* ----------------
* pg_statistic definition. cpp turns this into
* typedef struct FormData_pg_statistic
* ----------------
*/
FormData_pg_statistic;
/* ----------------
* Form_pg_statistic corresponds to a pointer to a tuple with
* the format of pg_statistic relation.
* ----------------
*/
typedef FormData_pg_statistic *Form_pg_statistic;
/*
* Several statistical slot "kinds" are defined by core PostgreSQL, as
* documented below. Also, custom data types can define their own "kind"
* codes by mutual agreement between a custom typanalyze routine and the
* selectivity estimation functions of the type's operators.
*
* Code reading the pg_statistic relation should not assume that a particular
* data "kind" will appear in any particular slot. Instead, search the
* stakind fields to see if the desired data is available. (The standard
* function get_attstatsslot() may be used for this.)
*/
/*
* The present allocation of "kind" codes is:
*
* 1-99: reserved for assignment by the core PostgreSQL project
* (values in this range will be documented in this file)
* 100-199: reserved for assignment by the PostGIS project
* (values to be documented in PostGIS documentation)
* 200-299: reserved for assignment by the ESRI ST_Geometry project
* (values to be documented in ESRI ST_Geometry documentation)
* 300-9999: reserved for future public assignments
*
* For private use you may choose a "kind" code at random in the range
* 10000-30000. However, for code that is to be widely disseminated it is
* better to obtain a publicly defined "kind" code by request from the
* PostgreSQL Global Development Group.
*/
/*
* In a "most common values" slot, staop is the OID of the "=" operator
* used to decide whether values are the same or not, and stacoll is the
* collation used (same as column's collation). stavalues contains
* the K most common non-null values appearing in the column, and stanumbers
* contains their frequencies (fractions of total row count). The values
* shall be ordered in decreasing frequency. Note that since the arrays are
* variable-size, K may be chosen by the statistics collector. Values should
* not appear in MCV unless they have been observed to occur more than once;
* a unique column will have no MCV slot.
*/
/*
* A "histogram" slot describes the distribution of scalar data. staop is
* the OID of the "<" operator that describes the sort ordering, and stacoll
* is the relevant collation. (In theory more than one histogram could appear,
* if a datatype has more than one useful sort operator or we care about more
* than one collation. Currently the collation will always be that of the
* underlying column.) stavalues contains M (>=2) non-null values that
* divide the non-null column data values into M-1 bins of approximately equal
* population. The first stavalues item is the MIN and the last is the MAX.
* stanumbers is not used and should be NULL. IMPORTANT POINT: if an MCV
* slot is also provided, then the histogram describes the data distribution
* *after removing the values listed in MCV* (thus, it's a "compressed
* histogram" in the technical parlance). This allows a more accurate
* representation of the distribution of a column with some very-common
* values. In a column with only a few distinct values, it's possible that
* the MCV list describes the entire data population; in this case the
* histogram reduces to empty and should be omitted.
*/
/*
* A "correlation" slot describes the correlation between the physical order
* of table tuples and the ordering of data values of this column, as seen
* by the "<" operator identified by staop with the collation identified by
* stacoll. (As with the histogram, more than one entry could theoretically
* appear.) stavalues is not used and should be NULL. stanumbers contains
* a single entry, the correlation coefficient between the sequence of data
* values and the sequence of their actual tuple positions. The coefficient
* ranges from +1 to -1.
*/
/*
* A "most common elements" slot is similar to a "most common values" slot,
* except that it stores the most common non-null *elements* of the column
* values. This is useful when the column datatype is an array or some other
* type with identifiable elements (for instance, tsvector). staop contains
* the equality operator appropriate to the element type, and stacoll
* contains the collation to use with it. stavalues contains
* the most common element values, and stanumbers their frequencies. Unlike
* MCV slots, frequencies are measured as the fraction of non-null rows the
* element value appears in, not the frequency of all rows. Also unlike
* MCV slots, the values are sorted into the element type's default order
* (to support binary search for a particular value). Since this puts the
* minimum and maximum frequencies at unpredictable spots in stanumbers,
* there are two extra members of stanumbers, holding copies of the minimum
* and maximum frequencies. Optionally, there can be a third extra member,
* which holds the frequency of null elements (expressed in the same terms:
* the fraction of non-null rows that contain at least one null element). If
* this member is omitted, the column is presumed to contain no null elements.
*
* Note: in current usage for tsvector columns, the stavalues elements are of
* type text, even though their representation within tsvector is not
* exactly text.
*/
/*
* A "distinct elements count histogram" slot describes the distribution of
* the number of distinct element values present in each row of an array-type
* column. Only non-null rows are considered, and only non-null elements.
* staop contains the equality operator appropriate to the element type,
* and stacoll contains the collation to use with it.
* stavalues is not used and should be NULL. The last member of stanumbers is
* the average count of distinct element values over all non-null rows. The
* preceding M (>=2) members form a histogram that divides the population of
* distinct-elements counts into M-1 bins of approximately equal population.
* The first of these is the minimum observed count, and the last the maximum.
*/
/*
* A "length histogram" slot describes the distribution of range lengths in
* rows of a range-type column. stanumbers contains a single entry, the
* fraction of empty ranges. stavalues is a histogram of non-empty lengths, in
* a format similar to STATISTIC_KIND_HISTOGRAM: it contains M (>=2) range
* values that divide the column data values into M-1 bins of approximately
* equal population. The lengths are stored as float8s, as measured by the
* range type's subdiff function. Only non-null rows are considered.
*/
/*
* A "bounds histogram" slot is similar to STATISTIC_KIND_HISTOGRAM, but for
* a range-type column. stavalues contains M (>=2) range values that divide
* the column data values into M-1 bins of approximately equal population.
* Unlike a regular scalar histogram, this is actually two histograms combined
* into a single array, with the lower bounds of each value forming a
* histogram of lower bounds, and the upper bounds a histogram of upper
* bounds. Only non-NULL, non-empty ranges are included.
*/
/* EXPOSE_TO_CLIENT_CODE */
/* PG_STATISTIC_H */