1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
//! Group-by aggregation: collapse the subject dimension by aggregating
//! measurements across subjects that share one or more quality values.
//!
//! Group-by is the **subject-dimension** analog of the interval module's
//! **time-dimension** reduction:
//!
//! - `interval` collapses `(subject × time)` → `(subject × bucket)`
//! - `group` collapses `(subject × time)` → `(group × time)` where
//! group is defined by one or more qualities (parish, region, …)
//!
//! They compose cleanly. When both are set on a subset request, the
//! pipeline applies interval first, then group — the statistical
//! interpretation is "aggregate over time per subject, then aggregate
//! across subjects per group".
//!
//! # Module layout
//!
//! - [`planner`] — pure decision logic: given a measurement + request
//! aggregation overrides, produce a [`GroupAggregationPlan`]. Zero
//! polars work, fully unit-testable.
//! - [`aggregate`] — imperative DataFrame side: join qualities, apply
//! missing-value policy, group by qualities + time, emit stats.
//!
//! # Fairness through N
//!
//! Every group-aggregated cell carries `n_subjects_contributing` —
//! the count of subjects in the group that had a non-null value in
//! this `(group, time, measurement)` cell. A group spanning 12 stations
//! that's missing data in one month will show `n_subjects_contributing`
//! decline for that month — making the basis of the aggregate visible
//! for analytics and UI tooltips.
use HashMap;
use ;
use crate::;
pub use ;
pub use ;
// ============================================================================
// GroupBy — request-side specification
// ============================================================================
/// Group subjects by one or more qualities and aggregate measurements
/// across group members.
///
/// The output's subject column carries the group label (concatenation
/// of the grouping qualities' values). Individual subjects disappear
/// into their group; their quality values and the per-group N can be
/// recovered from [`GroupStats`].
/// How to handle subjects whose grouping-quality values are null.