Struct RowSamplingMeasure 

pub struct RowSamplingMeasure { /* private fields */ }

A per-row sampling measure over n rows, normalized to sum to 1.

Built from a RowMetric via RowSamplingMeasure::from_metric. The weights are a proper probability measure (non-negative, finite, summing to 1) used for discovery/seeding oversampling only — see the module docs for the invariant that it touches no loss / gradient / criterion.

Implementations§

impl RowSamplingMeasure

pub fn from_metric(metric: &RowMetric) -> Self

Build the enrichment measure from a RowMetric.

The per-row liveness is the Fisher mass tr(M_n) read from the metric’s validated PSD blocks. The result is normalized to a proper sampling measure. Degrades to the uniform measure (every row 1/n) when the metric is Euclidean, carries no usable mass (all rows ≤ 0), or yields any non-finite mass — never an error, mirroring RowMetric’s magic-by-default discipline.

This function reads only the metric’s geometry; it writes nothing into the metric, the loss, the gradient, or any criterion.
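
The normalize-or-fall-back rule described above can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the crate's implementation: `normalize_masses` and the `Provenance` enum are hypothetical names, and clamping a stray negative mass to zero (when other rows are positive) is an assumption the documentation does not confirm.

```rust
/// Hypothetical stand-in for the provenance tag carried by the measure.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provenance {
    Uniform,
    FisherMass,
}

/// Normalize raw per-row masses to weights that sum to 1.
/// Falls back to the uniform measure (never an error) when any mass is
/// non-finite or when no row carries positive mass.
pub fn normalize_masses(masses: &[f64]) -> (Vec<f64>, Provenance) {
    let n = masses.len();
    let uniform = || (vec![1.0 / n as f64; n], Provenance::Uniform);

    // Any NaN or infinity poisons the whole measure: degrade to uniform.
    if masses.iter().any(|m| !m.is_finite()) {
        return uniform();
    }
    // Assumption: negative masses are treated as zero mass.
    let clamped: Vec<f64> = masses.iter().map(|m| m.max(0.0)).collect();
    let total: f64 = clamped.iter().sum();
    // "No usable mass": all rows <= 0, or the sum overflowed.
    if !(total > 0.0) || !total.is_finite() {
        return uniform();
    }
    let weights = clamped.iter().map(|m| m / total).collect();
    (weights, Provenance::FisherMass)
}
```

With masses `[1.0, 3.0]` the sketch returns weights `[0.25, 0.75]` tagged `FisherMass`; with `[0.0, -1.0]` or any NaN entry it returns the uniform weights tagged `Uniform`.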

pub fn uniform(n: usize) -> Self

The uniform measure over n rows: every row has weight 1 / n. It is both the graceful fallback and the explicit “no behavioral harvest” measure.

pub fn from_masses(metric_provenance: MetricProvenance, masses: Vec<f64>) -> Self

Construct from raw per-row masses, normalizing to a proper measure. Falls back to uniform if the masses carry no usable signal.

Crate-visible so the two-tier harvest (gam_inference::harvest) can lift designed-subsample Fisher masses to a full-corpus measure through the same validation/normalization path.

pub fn weights(&self) -> &[f64]

The normalized per-row sampling weights (Σ == 1). Read-only; this is a sampling measure, never a loss weight.

pub fn provenance(&self) -> MeasureProvenance

The measure’s provenance — Uniform (graceful fallback / no harvest) or FisherMass (real behavioral enrichment).

pub fn n_rows(&self) -> usize

Number of rows the measure is defined over.

pub fn is_enriched(&self) -> bool

Whether this measure actually enriches (is non-uniform Fisher-mass). false for the uniform fallback.

pub fn enrichment_order(&self, count: usize, seed: u64) -> Vec<usize>

Deterministic systematic-resampling enrichment ordering.

Returns a length-count vector of row indices drawn ∝ weights, using low-variance systematic resampling with a single deterministic, seed-derived offset. There is no clock randomness: the same (measure, count, seed) always yields the same ordering. Behaviorally live rows therefore appear with multiplicity proportional to their Fisher mass, so a rare-but-live feature’s rows are oversampled relative to uniform.

Systematic resampling places count equally spaced pointers (j + u) / count, j = 0..count, against the cumulative weight CDF and emits the row each pointer lands in. The single offset u ∈ [0, 1) is a splitmix64 hash of seed (deterministic). The draw is unbiased, with per-row expected count count · weights[row], and every row with weight ≥ 1/count is guaranteed to appear at least once (the recall property the rare-feature control asserts).

The uniform fallback reproduces an even, deterministic round-robin over all rows — i.e. plain attention to every row, today’s behavior.

This ordering is consumed only by a discovery/seeding pass. The rows it names carry their ordinary, unmodified per-row objective.
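
The pointer-against-CDF walk described above can be sketched in a few lines. This is an illustrative standalone version, not the method's actual body: `systematic_order` and `unit_offset` are hypothetical names, and deriving u from the top 53 bits of one splitmix64 step is an assumption about how the hash is turned into an offset.

```rust
/// One step of the well-known splitmix64 mixing function.
fn splitmix64(seed: u64) -> u64 {
    let mut z = seed.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Map a seed to a deterministic offset u in [0, 1).
/// Assumption: the top 53 bits of the hash are used.
fn unit_offset(seed: u64) -> f64 {
    (splitmix64(seed) >> 11) as f64 / (1u64 << 53) as f64
}

/// Systematic resampling: `count` equally spaced pointers (j + u) / count
/// are walked against the cumulative weights; each pointer emits the row
/// whose CDF interval it lands in. `weights` is assumed to sum to 1.
pub fn systematic_order(weights: &[f64], count: usize, seed: u64) -> Vec<usize> {
    if weights.is_empty() || count == 0 {
        return Vec::new();
    }
    let u = unit_offset(seed);
    let mut order = Vec::with_capacity(count);
    let mut row = 0usize;
    let mut cdf = weights[0];
    for j in 0..count {
        let pointer = (j as f64 + u) / count as f64;
        // Advance to the first row whose cumulative weight exceeds the pointer.
        while pointer >= cdf && row + 1 < weights.len() {
            row += 1;
            cdf += weights[row];
        }
        order.push(row);
    }
    order
}
```

Because the pointers are monotone, the sketch makes a single pass over the rows and emits indices in non-decreasing order. For weights `[0.5, 0.25, 0.125, 0.125]` and count 8, each row's multiplicity equals count · weight (4, 2, 1, 1) regardless of the seed, since every product is an integer.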

pub fn expected_representation(&self, count: usize) -> Vec<f64>

Expected number of times each row is drawn in a count-sized enrichment batch: count · weights[row]. A diagnostic for the discovery-recall control — it lets a test assert that a rare-but-live feature’s rows have markedly higher expected representation under enrichment than under uniform, with no sampling noise.
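
As a small worked illustration of this diagnostic, the sketch below computes count · weights[row] and the ratio of each weight to the uniform weight 1/n. Both `expected_counts` and `lift_over_uniform` are hypothetical free functions written for this example; they are not part of the documented API.

```rust
/// Expected multiplicity of each row in a `count`-sized batch: count * w_i.
pub fn expected_counts(weights: &[f64], count: usize) -> Vec<f64> {
    weights.iter().map(|w| count as f64 * w).collect()
}

/// Ratio of each row's expected representation under the measure to its
/// expected representation under uniform (which is count / n), i.e. n * w_i.
pub fn lift_over_uniform(weights: &[f64]) -> Vec<f64> {
    let n = weights.len() as f64;
    weights.iter().map(|w| w * n).collect()
}
```

For weights `[0.5, 0.25, 0.125, 0.125]` and count 8, the expected counts are `[4, 2, 1, 1]`, and the first row's lift over uniform is 2. A test can compare such values directly, with no sampling noise.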

pub fn designed_subsample(&self, budget: usize, seed: u64) -> DesignedRowSample

Draw a designed subsample with honest inclusion weights — the frontier estimator of #987 (mechanizing the #973 subsample-honesty contract for measure-driven designs).

This is a different animal from Self::enrichment_order, and the distinction is load-bearing:

  • Enrichment orders rows for discovery/seeding attention; each visited row keeps its ordinary, unweighted per-row objective. The measure never touches the loss.
  • A designed subsample replaces the full corpus as what the fit sums over. That is only sound if every selected row’s loss term is multiplied by 1 / π_i (its inclusion probability), so that the subsampled criterion is unbiased for the full-corpus criterion: E[Σ_{i ∈ S} ℓ_i / π_i] = Σ_i ℓ_i. The returned DesignedRowSample carries exactly those weights; the caller folds them into the likelihood as row weights. These are sampling-design corrections — they are not a Fisher reweighting of residuals (the #980 failure mode), and under the uniform measure they degrade to the constant n / budget, the plain Horvitz–Thompson scale-up.

Design: inclusion probabilities are water-filled as π_i = min(1, τ · w'_i) with τ solved so that Σ π_i = budget, where w' is the measure defensively mixed with DESIGNED_SAMPLE_UNIFORM_MIX of uniform. This is the standard defensive-mixture guard: it keeps every row’s π_i > 0 (no row’s loss is unreachable, so the estimator stays unbiased) and bounds the largest weight. Selection is Madow systematic sampling against the cumulative π with a single deterministic splitmix64-derived offset. There is no clock randomness: the same (measure, budget, seed) always yields the same sample. Rows are returned in ascending order (stream-friendly).

budget ≥ n returns every row with weight 1.0 — the exact full pass, bit-for-bit today’s behavior, so a driver can call this unconditionally and let the budget decide.
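
The design above (defensive mix, water-filling, Madow selection, 1 / π_i weights) can be sketched as follows. This is a standalone illustration under assumptions, not the crate's code: the function names are hypothetical, the mix fraction is passed as a `uniform_mix` parameter because the value of DESIGNED_SAMPLE_UNIFORM_MIX is not stated here, and the sample is returned as plain `(row, weight)` pairs rather than a DesignedRowSample.

```rust
/// One step of the well-known splitmix64 mixing function.
fn splitmix64(seed: u64) -> u64 {
    let mut z = seed.wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Water-filled inclusion probabilities pi_i = min(1, tau * w'_i) with
/// sum(pi) = budget. Assumes 0 < uniform_mix <= 1 and budget < weights.len().
pub fn inclusion_probabilities(weights: &[f64], budget: usize, uniform_mix: f64) -> Vec<f64> {
    let n = weights.len();
    // Defensive mixture: every row keeps strictly positive mass.
    let mixed: Vec<f64> = weights
        .iter()
        .map(|w| (1.0 - uniform_mix) * w + uniform_mix / n as f64)
        .collect();
    let mut capped = vec![false; n];
    loop {
        let n_capped = capped.iter().filter(|&&c| c).count();
        let free_mass: f64 = (0..n).filter(|&i| !capped[i]).map(|i| mixed[i]).sum();
        // Solve tau so the uncapped rows absorb the remaining budget.
        let tau = budget.saturating_sub(n_capped) as f64 / free_mass;
        let mut changed = false;
        for i in 0..n {
            if !capped[i] && tau * mixed[i] >= 1.0 {
                capped[i] = true; // this row is taken with certainty
                changed = true;
            }
        }
        if !changed {
            return (0..n)
                .map(|i| if capped[i] { 1.0 } else { tau * mixed[i] })
                .collect();
        }
    }
}

/// Madow systematic sampling: pointers u, u + 1, u + 2, ... are walked
/// against the cumulative pi; a row is selected when a pointer falls in its
/// interval. Returns (row, 1 / pi_row) pairs in ascending row order.
pub fn designed_subsample(
    weights: &[f64],
    budget: usize,
    uniform_mix: f64,
    seed: u64,
) -> Vec<(usize, f64)> {
    let n = weights.len();
    if budget >= n {
        // Exact full pass: every row, weight 1.0.
        return (0..n).map(|i| (i, 1.0)).collect();
    }
    let pi = inclusion_probabilities(weights, budget, uniform_mix);
    let mut pointer = (splitmix64(seed) >> 11) as f64 / (1u64 << 53) as f64;
    let mut cumulative = 0.0;
    let mut sample = Vec::with_capacity(budget);
    for (i, &p) in pi.iter().enumerate() {
        cumulative += p;
        if pointer < cumulative {
            sample.push((i, 1.0 / p)); // Horvitz-Thompson weight
            pointer += 1.0;
        }
    }
    sample
}
```

Since each π_i is at most 1, a row's interval can contain at most one pointer, so no row is selected twice. Under a uniform measure with n = 4 and budget 2, every π_i is 0.5 and every returned weight is 2, which is the n / budget scale-up described above.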

pub fn designed_subsample_certified<'a, I>(
    &self,
    row_factors: I,
    target_eps: f64,
    leverage: &[f64],
    kappa_hat: f64,
    chart_radius: f64,
    budget: usize,
) -> Result<CertifiedRowSample, String>
where
    I: IntoIterator<Item = ArrayView2<'a, f64>>,

Draw a certified designed subsample within a target eps of the full corpus on BOTH evidence halves (#1012).

Unlike Self::designed_subsample — whose Horvitz–Thompson design is unbiased only in expectation — this is the deterministic CERTIFIED mode:

  • spectral half (½log|H|): deterministic Batson–Spielman–Srivastava selection of O(dim/eps²) weighted rows from the per-row factors R_i (H_i = R_iᵀR_i), giving (1−eps)H ⪯ H_C ⪯ (1+eps)H and hence |log|H_C| − log|H|| ≤ dim·log((1+eps)/(1−eps));
  • likelihood half (L): the sensitivity bounds σ_i ≤ leverage_i·(1 + κ̂·chart_radius) on the documented chart ball, greedily selected against the row budget; the residual sensitivity mass is the additive eps_likelihood·L the certificate carries.

The two selections are unioned (a row certified for either half is kept), the rows carry their deterministic BSS / sensitivity weights, and the CoresetCertificate rides the result so a race consumer can gate the transfer with CoresetCertificate::race_transfer_margin — the SAME margin seam the enclosure path (#1011) declares. Below that margin the consumer must grow the coreset, never silently decide.

row_factors is the per-row factor list aligned with this measure’s rows; leverage, kappa_hat, chart_radius are the sensitivity inputs (the #1007 SVD-anchor leverage and the #1008 curvature slack). budget caps the likelihood-half greedy selection.
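
The two bounds quoted above can be made concrete with a small numeric sketch. It does not implement Batson–Spielman–Srivastava selection or the crate's certificate. It only evaluates the stated log-determinant gap bound and the per-row sensitivity upper bounds, then does a plain top-`budget` greedy pick. All function names are hypothetical, and reading the "residual sensitivity mass" as the sum of bounds over unselected rows is an assumption.

```rust
/// Upper bound on |log|H_C| - log|H|| implied by
/// (1 - eps) H <= H_C <= (1 + eps) H in the Loewner order:
/// dim * log((1 + eps) / (1 - eps)).
pub fn logdet_gap_bound(dim: usize, eps: f64) -> f64 {
    assert!((0.0..1.0).contains(&eps), "eps must lie in [0, 1)");
    dim as f64 * ((1.0 + eps) / (1.0 - eps)).ln()
}

/// Per-row sensitivity upper bounds leverage_i * (1 + kappa_hat * chart_radius).
pub fn sensitivity_bounds(leverage: &[f64], kappa_hat: f64, chart_radius: f64) -> Vec<f64> {
    let slack = 1.0 + kappa_hat * chart_radius;
    leverage.iter().map(|l| l * slack).collect()
}

/// Greedy selection: keep the `budget` rows with the largest bounds (ties
/// broken by row index). Returns the kept rows in ascending order and the
/// summed bound of the rows left out (assumed meaning of "residual mass").
pub fn greedy_by_sensitivity(bounds: &[f64], budget: usize) -> (Vec<usize>, f64) {
    let mut order: Vec<usize> = (0..bounds.len()).collect();
    order.sort_by(|&a, &b| bounds[b].total_cmp(&bounds[a]).then(a.cmp(&b)));
    let keep = budget.min(order.len());
    let residual: f64 = order[keep..].iter().map(|&i| bounds[i]).sum();
    let mut selected = order[..keep].to_vec();
    selected.sort_unstable();
    (selected, residual)
}
```

For dim = 10 and eps = 0.1 the gap bound is 10 · log(1.1 / 0.9) ≈ 2.0067, which gives a feel for how small eps must be before the spectral half is tight enough to pass a margin check.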

Trait Implementations§

impl Clone for RowSamplingMeasure

fn clone(&self) -> RowSamplingMeasure

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for RowSamplingMeasure

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> ByRef<T> for T

Source§

fn by_ref(&self) -> &T

Source§

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.
Source§

impl<T> DistributionExt for T
where T: ?Sized,

Source§

fn rand<T>(&self, rng: &mut (impl Rng + ?Sized)) -> T
where Self: Distribution<T>,

Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Imply<T> for U
where T: ?Sized, U: ?Sized,

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.
Source§

impl<T> Read<Exclusive, BecauseExclusive> for T
where T: ?Sized,

Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<SS, SP> SupersetOf<SS> for SP
where SS: SubsetOf<SP>,

Source§

fn to_subset(&self) -> Option<SS>

The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
Source§

fn is_in_subset(&self) -> bool

Checks if self is actually part of its subset T (and can be converted to it).
Source§

fn to_subset_unchecked(&self) -> SS

Use with care! Same as self.to_subset but without any property checks. Always succeeds.
Source§

fn from_subset(element: &SS) -> SP

The inclusion map: converts self to the equivalent element of its superset.
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V