pub struct RowSamplingMeasure { /* private fields */ }
A per-row sampling measure over n rows, normalized to sum to 1.
Built from a RowMetric via RowSamplingMeasure::from_metric. The weights are a
proper probability measure (non-negative, finite, summing to 1) used for
discovery/seeding oversampling only — see the module docs for the
invariant that it touches no loss / gradient / criterion.
Implementations
impl RowSamplingMeasure
pub fn from_metric(metric: &RowMetric) -> Self
Build the enrichment measure from a RowMetric.
The per-row liveness is the Fisher mass tr(M_n) read from the metric’s
validated PSD blocks. The result is normalized to a proper sampling
measure. Degrades to the uniform measure (every row 1/n) when the
metric is Euclidean, carries no usable mass (all rows ≤ 0), or yields any
non-finite mass — never an error, mirroring RowMetric’s
magic-by-default discipline.
This function reads only the metric’s geometry; it writes nothing into the metric, the loss, the gradient, or any criterion.
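As a rough illustration of the "Fisher mass tr(M_n)" step, the sketch below computes one trace per row from a list of square blocks. The function name `fisher_masses` and the row-major `Vec<f64>` block layout are assumptions made for this example; the crate's actual RowMetric storage is not shown on this page.

```rust
/// Per-row mass as the trace of that row's d×d block.
/// Assumption: each block is stored row-major in a flat Vec of length d*d.
fn fisher_masses(blocks: &[Vec<f64>], d: usize) -> Vec<f64> {
    blocks
        .iter()
        .map(|m| {
            assert_eq!(m.len(), d * d, "each block must be d×d");
            // The trace is the sum of the diagonal entries m[k][k].
            (0..d).map(|k| m[k * d + k]).sum::<f64>()
        })
        .collect()
}
```

The resulting masses would then go through the normalization and uniform-fallback rules described above; like from_metric, the sketch only reads its input.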
pub fn uniform(n: usize) -> Self
The uniform measure over n rows: every row has weight 1 / n. This is both the
graceful fallback and the explicit “no behavioral harvest” measure.
pub fn from_masses(
    metric_provenance: MetricProvenance,
    masses: Vec<f64>,
) -> Self
Construct from raw per-row masses, normalizing to a proper measure. Falls back to uniform if the masses carry no usable signal.
Crate-visible so the two-tier harvest (gam_inference::harvest)
can lift designed-subsample Fisher masses to a full-corpus measure
through the same validation/normalization path.
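The validation/normalization path can be sketched as a standalone function. This is a minimal sketch, not the crate's code: `normalize_masses` is a hypothetical name, the boolean return stands in for the provenance, and clamping negative entries to zero is an assumption (the documentation only specifies the fallback for all-nonpositive or non-finite masses).

```rust
/// Returns (weights, used_masses). `used_masses` is false when the
/// uniform fallback was taken.
fn normalize_masses(masses: &[f64]) -> (Vec<f64>, bool) {
    let n = masses.len();
    if n == 0 {
        return (Vec::new(), false);
    }
    let uniform = || (vec![1.0 / n as f64; n], false);
    // Any non-finite mass degrades to uniform instead of raising an error.
    if masses.iter().any(|m| !m.is_finite()) {
        return uniform();
    }
    // Assumption: negative entries carry no mass and are clamped to zero.
    let clamped: Vec<f64> = masses.iter().map(|m| m.max(0.0)).collect();
    let total: f64 = clamped.iter().sum();
    // No usable mass (all rows <= 0) or an overflowed sum: uniform fallback.
    if !(total > 0.0) || !total.is_finite() {
        return uniform();
    }
    (clamped.iter().map(|m| m / total).collect(), true)
}
```

For example, masses `[1.0, 3.0]` normalize to `[0.25, 0.75]`, while `[0.0, -1.0]` falls back to `[0.5, 0.5]`.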
pub fn weights(&self) -> &[f64]
The normalized per-row sampling weights (Σ == 1). Read-only; this is a
sampling measure, never a loss weight.
pub fn provenance(&self) -> MeasureProvenance
The measure’s provenance — Uniform (graceful fallback / no harvest) or
FisherMass (real behavioral enrichment).
pub fn is_enriched(&self) -> bool
Whether this measure actually enriches (is non-uniform Fisher-mass).
false for the uniform fallback.
pub fn enrichment_order(&self, count: usize, seed: u64) -> Vec<usize>
Deterministic systematic-resampling enrichment ordering.
Returns a length-count vector of row indices drawn ∝ weights, using
low-variance systematic resampling with a fixed, index-derived jitter —
there is no clock randomness; the same (measure, count, seed)
always yields the same ordering. Behaviorally-live rows therefore appear
with multiplicity proportional to their Fisher mass, so a rare-but-live
feature’s rows are oversampled relative to uniform.
Systematic resampling places count equally spaced pointers
(j + u) / count, j = 0..count, against the cumulative weight CDF and
emits the row each pointer lands in. The single offset u ∈ [0, 1) is a
splitmix64-hash of seed (deterministic), giving an unbiased draw
whose per-row expected count is count · weights[row] while guaranteeing
every weight-≥ 1/count row appears at least once (the recall property
the rare-feature control asserts).
The uniform fallback reproduces an even, deterministic round-robin over all rows — i.e. plain attention to every row, today’s behavior.
This ordering is consumed only by a discovery/seeding pass. The rows it names carry their ordinary, unmodified per-row objective.
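The pointer construction described above can be sketched in a few lines. This is an illustrative stand-in under stated assumptions, not the crate's implementation: `splitmix64` is the standard public mixing function, but how the crate derives u from seed (here: the top 53 bits of a single splitmix64 step) and the names `unit_offset` and `systematic_order` are assumptions.

```rust
/// One splitmix64 step: a fixed, well-known 64-bit mixing function.
fn splitmix64(seed: u64) -> u64 {
    let mut z = seed.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Map the top 53 bits of the hash to an offset u in [0, 1).
fn unit_offset(seed: u64) -> f64 {
    (splitmix64(seed) >> 11) as f64 / (1u64 << 53) as f64
}

/// Systematic resampling: `count` equally spaced pointers against the CDF.
/// `weights` is assumed to be normalized (non-negative, summing to 1).
fn systematic_order(weights: &[f64], count: usize, seed: u64) -> Vec<usize> {
    if weights.is_empty() || count == 0 {
        return Vec::new();
    }
    let u = unit_offset(seed);
    let mut order = Vec::with_capacity(count);
    let mut row = 0;
    let mut cdf = weights[0];
    for j in 0..count {
        let pointer = (j as f64 + u) / count as f64;
        // Advance to the row whose CDF interval contains the pointer.
        while pointer >= cdf && row + 1 < weights.len() {
            row += 1;
            cdf += weights[row];
        }
        order.push(row);
    }
    order
}
```

With weights `[0.5, 0.25, 0.25]` and count 4, the four pointers fall in the quarters of [0, 1), so the order is `[0, 0, 1, 2]` whatever the offset: each row appears count · weight times, and every row with weight at least 1/count is hit.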
pub fn expected_representation(&self, count: usize) -> Vec<f64>
Expected number of times each row is drawn in a count-sized enrichment
batch: count · weights[row]. A diagnostic for the discovery-recall
control — it lets a test assert that a rare-but-live feature’s rows have
markedly higher expected representation under enrichment than under
uniform, with no sampling noise.
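A small worked comparison, written as a free function rather than the method itself (the name `expected_counts` and the example weights are made up for illustration):

```rust
/// Expected draws per row in a `count`-sized batch: count * weights[row].
fn expected_counts(weights: &[f64], count: usize) -> Vec<f64> {
    weights.iter().map(|w| count as f64 * w).collect()
}

/// Hypothetical 10-row corpus in which row 0 carries most of the Fisher mass.
fn example_enriched_weights() -> Vec<f64> {
    let mut w = vec![0.05; 10];
    w[0] = 0.55; // 0.55 + 9 * 0.05 = 1.0
    w
}
```

With count = 20, row 0 has an expected representation of about 11 under these enriched weights, against 2 under the uniform measure (20 · 1/10). A test can compare the two numbers directly, with no sampling involved.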
pub fn designed_subsample(&self, budget: usize, seed: u64) -> DesignedRowSample
Draw a designed subsample with honest inclusion weights — the frontier estimator of #987 (mechanizing the #973 subsample-honesty contract for measure-driven designs).
This is a different animal from Self::enrichment_order, and the
distinction is load-bearing:
- Enrichment orders rows for discovery/seeding attention; each visited row keeps its ordinary, unweighted per-row objective. The measure never touches the loss.
- A designed subsample replaces the full corpus as what the fit sums over. That is only sound if every selected row’s loss term is multiplied by 1 / π_i (π_i being its inclusion probability), so that the subsampled criterion is unbiased for the full-corpus criterion: E[Σ_{i ∈ S} ℓ_i / π_i] = Σ_i ℓ_i. The returned DesignedRowSample carries exactly those weights; the caller folds them into the likelihood as row weights. These are sampling-design corrections: they are not a Fisher reweighting of residuals (the #980 failure mode), and under the uniform measure they degrade to the constant n / budget, the plain Horvitz–Thompson scale-up.
Design: inclusion probabilities are water-filled as
π_i = min(1, τ · w'_i) with τ solved so Σ π_i = budget, where
w' is the measure defensively mixed with
[DESIGNED_SAMPLE_UNIFORM_MIX] of uniform — the standard
defensive-mixture guard that keeps every row’s π_i > 0 (no row’s loss
is unreachable, so the estimator stays unbiased) and bounds the largest
weight. Selection is Madow systematic sampling against the cumulative
π with a single deterministic splitmix64-derived offset — no clock
randomness; the same (measure, budget, seed) always yields the same
sample. Rows are returned in ascending order (stream-friendly).
budget ≥ n returns every row with weight 1.0 — the exact full pass,
bit-for-bit today’s behavior, so a driver can call this unconditionally
and let the budget decide.
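The design can be sketched as two small functions: water-filling for the inclusion probabilities, then Madow systematic selection with 1 / π_i weights. Everything here is a hedged reconstruction from the description above. The names, the mixture form w' = (1 − mix)·w + mix/n, and the `uniform_mix` parameter (a stand-in for DESIGNED_SAMPLE_UNIFORM_MIX, whose value is not given on this page) are assumptions.

```rust
/// Water-filled inclusion probabilities pi_i = min(1, tau * w'_i), sum = budget.
/// Assumption: `uniform_mix` > 0 (or all weights > 0), so no pi_i is zero.
fn inclusion_probabilities(weights: &[f64], budget: usize, uniform_mix: f64) -> Vec<f64> {
    let n = weights.len();
    if budget >= n {
        return vec![1.0; n]; // full pass: every row kept with probability 1
    }
    // Assumed defensive-mixture form: w' = (1 - mix) * w + mix / n.
    let mixed: Vec<f64> = weights
        .iter()
        .map(|w| (1.0 - uniform_mix) * w + uniform_mix / n as f64)
        .collect();
    let mut capped = vec![false; n];
    loop {
        // Re-solve tau over the rows that are not yet capped at 1.
        let k = capped.iter().filter(|&&c| c).count();
        let free: f64 = (0..n).filter(|&i| !capped[i]).map(|i| mixed[i]).sum();
        let tau = (budget - k) as f64 / free;
        let mut changed = false;
        for i in 0..n {
            if !capped[i] && tau * mixed[i] >= 1.0 {
                capped[i] = true;
                changed = true;
            }
        }
        if !changed {
            return (0..n)
                .map(|i| if capped[i] { 1.0 } else { tau * mixed[i] })
                .collect();
        }
    }
}

/// Madow systematic sampling with a single offset u in [0, 1).
/// Returns (row, weight) pairs in ascending row order, weight = 1 / pi_i.
fn madow_sample(pi: &[f64], u: f64) -> Vec<(usize, f64)> {
    let mut sample = Vec::new();
    let mut cumulative = 0.0;
    let mut pointer = u;
    for (i, &p) in pi.iter().enumerate() {
        cumulative += p;
        // pi_i <= 1, so each row's interval holds at most one pointer.
        if pointer < cumulative {
            sample.push((i, 1.0 / p));
            pointer += 1.0;
        }
    }
    sample
}
```

Under a uniform measure with n = 4 and budget = 2, every π_i is 0.5 and each selected row carries weight 2 = n / budget. With budget ≥ n, every π_i is 1 and the sample is the full corpus with weight 1.0.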
pub fn designed_subsample_certified<'a, I>(
    &self,
    row_factors: I,
    target_eps: f64,
    leverage: &[f64],
    kappa_hat: f64,
    chart_radius: f64,
    budget: usize,
) -> Result<CertifiedRowSample, String>
Draw a certified designed subsample within a target eps of the full
corpus on BOTH evidence halves (#1012).
Unlike Self::designed_subsample — whose Horvitz–Thompson design is
unbiased only in expectation — this is the deterministic CERTIFIED mode:
- spectral half (½log|H|): deterministic Batson–Spielman–Srivastava selection of O(dim/eps²) weighted rows from the per-row factors R_i (H_i = R_iᵀR_i), giving (1−eps)H ⪯ H_C ⪯ (1+eps)H and hence |log|H_C| − log|H|| ≤ dim·log((1+eps)/(1−eps));
- likelihood half (L): the sensitivity bounds σ_i ≤ leverage_i·(1 + κ̂·chart_radius) on the documented chart ball, greedily selected against the row budget; the residual sensitivity mass is the additive eps_likelihood·L the certificate carries.
The two selections are unioned (a row certified for either half is kept),
the rows carry their deterministic BSS / sensitivity weights, and the
CoresetCertificate rides the result so a race consumer can gate the
transfer with CoresetCertificate::race_transfer_margin — the SAME
margin seam the enclosure path (#1011) declares. Below that margin the
consumer must grow the coreset, never silently decide.
row_factors is the per-row factor list aligned with this measure’s rows;
leverage, kappa_hat, chart_radius are the sensitivity inputs (the
#1007 SVD-anchor leverage and the #1008 curvature slack). budget caps
the likelihood-half greedy selection.
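The two quantities named above can be made concrete with a small numeric sketch. It does not perform BSS selection and is not the crate's certificate logic. `logdet_gap_bound` just evaluates the stated bound, and `greedy_by_sensitivity` is one plausible reading of "greedily selected against the row budget": keep the `budget` rows with the largest bounds and report the unselected share of the total bound mass. Both names and that residual definition are assumptions.

```rust
/// The spectral-half bound dim * log((1 + eps) / (1 - eps)), for 0 < eps < 1.
fn logdet_gap_bound(dim: usize, eps: f64) -> f64 {
    assert!(eps > 0.0 && eps < 1.0, "eps must lie in (0, 1)");
    dim as f64 * ((1.0 + eps) / (1.0 - eps)).ln()
}

/// Per-row bounds leverage_i * (1 + kappa_hat * chart_radius), then keep the
/// `budget` largest. Returns (kept rows, ascending; residual share of mass).
fn greedy_by_sensitivity(
    leverage: &[f64],
    kappa_hat: f64,
    chart_radius: f64,
    budget: usize,
) -> (Vec<usize>, f64) {
    let bounds: Vec<f64> = leverage
        .iter()
        .map(|l| l * (1.0 + kappa_hat * chart_radius))
        .collect();
    let mut order: Vec<usize> = (0..bounds.len()).collect();
    // Stable sort, largest bound first; ties keep ascending row order.
    order.sort_by(|&a, &b| bounds[b].total_cmp(&bounds[a]));
    let keep = budget.min(order.len());
    let total: f64 = bounds.iter().sum();
    let residual: f64 = order[keep..].iter().map(|&i| bounds[i]).sum();
    let mut rows = order[..keep].to_vec();
    rows.sort_unstable();
    (rows, if total > 0.0 { residual / total } else { 0.0 })
}
```

The spectral bound follows because the sandwich confines every eigenvalue of H⁻¹ᐟ²·H_C·H⁻¹ᐟ² to [1−eps, 1+eps], so the log-determinant gap is a sum of dim logarithms, each at most log((1+eps)/(1−eps)) in absolute value. For dim = 3 and eps = 0.1 the bound is about 0.602.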
Trait Implementations
impl Clone for RowSamplingMeasure
fn clone(&self) -> RowSamplingMeasure
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations
impl Freeze for RowSamplingMeasure
impl RefUnwindSafe for RowSamplingMeasure
impl Send for RowSamplingMeasure
impl Sync for RowSamplingMeasure
impl Unpin for RowSamplingMeasure
impl UnsafeUnpin for RowSamplingMeasure
impl UnwindSafe for RowSamplingMeasure
Blanket Implementations
impl<T> Allocation for T
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> DistributionExt for T
where
    T: ?Sized,
impl<T, U> Imply<T> for U
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<T> Read<Exclusive, BecauseExclusive> for T
where
    T: ?Sized,
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.