pub struct WeightedDataLoader<D: Dataset> { /* private fields */ }
A data loader that samples with per-sample weights.
Unlike DataLoader which samples uniformly,
WeightedDataLoader samples proportional to the provided weights.
This is useful for:
- Importance sampling in imbalanced datasets
- CITL reweighting (--reweight 1.5 for compiler-verified labels)
- Curriculum learning with difficulty-based sampling
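To make "samples proportional to the provided weights" concrete, here is a minimal standalone sketch of weighted sampling with replacement, using cumulative weights and a binary search. It does not use alimentar and is not meant to describe the crate's internals: `XorShift64` and `weighted_indices` are hypothetical names introduced for illustration, and sampling with replacement is an assumption.

```rust
/// Tiny deterministic PRNG (xorshift64) so the sketch needs no external crates.
/// Hypothetical helper, not part of alimentar.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    /// Returns a float in [0, 1).
    fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        // Use the top 53 bits to build a float in [0, 1).
        (x >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Draws `n` indices with replacement; index `i` is chosen with probability
/// `weights[i] / sum(weights)`. Assumes non-negative weights with a positive sum.
fn weighted_indices(weights: &[f64], n: usize, seed: u64) -> Vec<usize> {
    assert!(!weights.is_empty(), "weights must not be empty");

    // Running totals: cumulative[i] = weights[0] + ... + weights[i].
    let mut total = 0.0_f64;
    let cumulative: Vec<f64> = weights
        .iter()
        .map(|w| {
            total += *w;
            total
        })
        .collect();

    let mut rng = XorShift64::new(seed);
    (0..n)
        .map(|_| {
            let target = rng.next_f64() * total;
            // First index whose cumulative weight exceeds the target;
            // the clamp guards against floating-point rounding at the top end.
            cumulative
                .partition_point(|&c| c <= target)
                .min(weights.len() - 1)
        })
        .collect()
}
```

Calling `weighted_indices(&[0.0, 1.0, 3.0], 10_000, 42)` never yields index 0, and index 2 should appear roughly three times as often as index 1.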
§Example
use alimentar::{ArrowDataset, Dataset, WeightedDataLoader};
let dataset = ArrowDataset::from_parquet("data.parquet").unwrap();
let weights = vec![1.0; dataset.len()]; // Uniform weights
let loader = WeightedDataLoader::new(dataset, weights)
    .unwrap()
    .batch_size(32)
    .seed(42);
for batch in loader {
    println!("Batch with {} rows", batch.num_rows());
}
Implementations§
impl<D: Dataset> WeightedDataLoader<D>
pub fn with_reweight(dataset: D, reweight: f32) -> Result<Self>
Creates a weighted loader with a uniform reweight factor.
Multiplies all weights by the given factor. Useful for CITL’s --reweight 1.5, which boosts compiler-verified samples.
§Arguments
- dataset - The dataset to sample from
- reweight - Factor to multiply all weights by
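As a small arithmetic aside on "multiplies all weights by the given factor", the sketch below scales a weight vector uniformly and then normalizes it into sampling probabilities. It is independent of alimentar: `normalize` and `scale_all` are hypothetical helpers, and whether the crate normalizes weights this way is an assumption.

```rust
/// Normalizes non-negative weights into probabilities that sum to 1.
/// Hypothetical helper; assumes the sum of weights is positive.
fn normalize(weights: &[f32]) -> Vec<f32> {
    let total: f32 = weights.iter().sum();
    weights.iter().map(|w| w / total).collect()
}

/// Hypothetical helper: multiplies every weight by the same factor.
fn scale_all(weights: &[f32], factor: f32) -> Vec<f32> {
    weights.iter().map(|w| w * factor).collect()
}
```

Within a single weight vector, `normalize(&scale_all(&w, 1.5))` matches `normalize(&w)` up to rounding, so a uniform factor changes relative sampling frequency only in comparison with weights that were not scaled by it.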
pub fn batch_size(self, size: usize) -> Self
Sets the batch size.
pub fn num_samples(self, n: usize) -> Self
Sets the total number of samples per epoch.
By default, each epoch draws len() samples (the dataset length). Set this higher to oversample
or lower to undersample the dataset.
pub fn drop_last(self, drop_last: bool) -> Self
Sets whether to drop the last incomplete batch.
pub fn get_batch_size(&self) -> usize
Returns the configured batch size.
pub fn get_num_samples(&self) -> usize
Returns the number of samples per epoch.
pub fn num_batches(&self) -> usize
Returns the number of batches that will be yielded.
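The relationship between num_samples, batch_size, and drop_last can be sketched as simple integer arithmetic. This is an illustration under the conventional meaning of these settings (a trailing partial batch is kept unless drop_last is set), not a statement of how the crate computes it; `expected_num_batches` is a hypothetical name.

```rust
/// Expected batch count for `num_samples` items split into batches of `batch_size`.
/// Assumption: with `drop_last == false` a trailing partial batch is still yielded;
/// with `drop_last == true` it is discarded.
fn expected_num_batches(num_samples: usize, batch_size: usize, drop_last: bool) -> usize {
    assert!(batch_size > 0, "batch_size must be positive");
    let full = num_samples / batch_size;
    let has_partial = num_samples % batch_size != 0;
    if drop_last || !has_partial {
        full
    } else {
        full + 1
    }
}
```

For example, 100 samples with a batch size of 32 give three full batches plus one partial batch of 4, so the count is 4 without drop_last and 3 with it.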
Trait Implementations§
impl<D: Dataset> IntoIterator for WeightedDataLoader<D>
Available on crate feature shuffle only.
Auto Trait Implementations§
impl<D> Freeze for WeightedDataLoader<D>
impl<D> RefUnwindSafe for WeightedDataLoader<D> where
    D: RefUnwindSafe,
impl<D> Send for WeightedDataLoader<D>
impl<D> Sync for WeightedDataLoader<D>
impl<D> Unpin for WeightedDataLoader<D>
impl<D> UnsafeUnpin for WeightedDataLoader<D>
impl<D> UnwindSafe for WeightedDataLoader<D> where
    D: RefUnwindSafe,
Blanket Implementations§
impl<T> BorrowMut<T> for T where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
Creates a shared type from an unshared type.