pub struct DataLoader<D: Dataset> { /* private fields */ }
A data loader that provides batched iteration over a dataset.
The DataLoader wraps a dataset and provides:
- Configurable batch sizes
- Optional shuffling with reproducible seeds
- Option to drop incomplete final batches
§Example
use alimentar::{ArrowDataset, DataLoader};
let dataset = ArrowDataset::from_parquet("data.parquet").unwrap();
let loader = DataLoader::new(dataset)
    .batch_size(32)
    .shuffle(true)
    .seed(42);
for batch in loader {
    println!("Processing batch with {} rows", batch.num_rows());
}
Implementations§
impl<D: Dataset> DataLoader<D>
pub fn new(dataset: D) -> Self
Creates a new DataLoader wrapping the given dataset.
Default configuration:
- batch_size: 1
- shuffle: false
- drop_last: false
- seed: None (random)
pub fn batch_size(self, size: usize) -> Self
Sets the batch size.
Each iteration will yield a RecordBatch with at most this many rows.
The final batch may have fewer rows unless drop_last is set.
#[requires(true)]
#[ensures(result.batch_size >= 1)]
#[ensures(size >= 1 ==> result.batch_size == size)]
#[ensures(size == 0 ==> result.batch_size == 1)]
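The contract attributes on batch_size imply that a requested size of 0 becomes 1 and that any other value is kept unchanged. A minimal sketch of that rule, using a hypothetical free function `effective_batch_size` that is not part of alimentar and only illustrates the documented postconditions:

```rust
/// Hypothetical helper (not an alimentar API) mirroring the documented
/// contract: the result is always >= 1, and a non-zero request is kept as-is.
fn effective_batch_size(requested: usize) -> usize {
    // `max(1)` maps 0 to 1 and leaves every other value untouched.
    requested.max(1)
}
```

Under this reading, `DataLoader::new(dataset).batch_size(0)` would behave like a loader with batch size 1 rather than failing.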
pub fn shuffle(self, shuffle: bool) -> Self
Enables or disables shuffling.
When enabled, the row order is randomized before each epoch.
Requires the shuffle feature.
pub fn drop_last(self, drop_last: bool) -> Self
Sets whether to drop the last incomplete batch.
When true, if the dataset size is not evenly divisible by the batch size, the final partial batch is skipped.
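To make the effect of drop_last concrete, the sketch below splits a row count into consecutive row ranges. `batch_ranges` is a hypothetical helper written for illustration. It assumes batches are taken in order from the start of the dataset and does not reflect how the crate slices data internally.

```rust
use std::ops::Range;

/// Hypothetical helper (not an alimentar API): splits `num_rows` rows into
/// consecutive ranges of at most `batch_size` rows. With `drop_last` set,
/// a short final range is skipped.
fn batch_ranges(num_rows: usize, batch_size: usize, drop_last: bool) -> Vec<Range<usize>> {
    let batch_size = batch_size.max(1); // guard against a zero batch size
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < num_rows {
        // Clamp the end of the range to the dataset size.
        let end = start.saturating_add(batch_size).min(num_rows);
        let is_partial = end - start < batch_size;
        if !(drop_last && is_partial) {
            ranges.push(start..end);
        }
        start = end;
    }
    ranges
}
```

For 10 rows and a batch size of 4, this yields rows 0..4, 4..8, and 8..10 without drop_last, and only 0..4 and 4..8 with it.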
pub fn seed(self, seed: u64) -> Self
Sets the random seed for shuffling.
Setting a seed makes shuffling deterministic and reproducible.
Requires the shuffle feature.
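The idea behind a seeded shuffle can be sketched without any crates. The generator (SplitMix64) and the Fisher-Yates loop below are assumptions chosen for illustration. The documentation does not say which random number generator or shuffling algorithm alimentar uses, and `shuffled_indices` is a hypothetical name.

```rust
/// Minimal SplitMix64 generator, used only so the sketch needs no crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical helper (not an alimentar API): returns the row indices
/// `0..num_rows` in an order determined entirely by `seed`.
fn shuffled_indices(num_rows: usize, seed: u64) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..num_rows).collect();
    let mut rng = SplitMix64(seed);
    // Fisher-Yates: walk from the back, swapping each slot with a randomly
    // chosen slot at or before it.
    for i in (1..num_rows).rev() {
        // The modulo adds a slight bias, which is acceptable for a sketch.
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        indices.swap(i, j);
    }
    indices
}
```

Calling `shuffled_indices(100, 42)` twice returns the same permutation, which is the property that makes a seeded run reproducible.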
pub fn get_batch_size(&self) -> usize
Returns the configured batch size.
pub fn is_shuffle(&self) -> bool
Returns whether shuffling is enabled.
pub fn is_drop_last(&self) -> bool
Returns whether drop_last is enabled.
pub fn num_batches(&self) -> usize
Returns the number of batches that will be yielded.
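Given the batch_size and drop_last semantics described for the builder methods, the batch count plausibly follows the usual floor or ceiling division. The sketch below is an assumption about that arithmetic, not the crate's source, and `expected_num_batches` is a hypothetical name.

```rust
/// Hypothetical helper (not an alimentar API): number of batches produced
/// for `num_rows` rows under the documented batch_size/drop_last semantics.
fn expected_num_batches(num_rows: usize, batch_size: usize, drop_last: bool) -> usize {
    let batch_size = batch_size.max(1); // a size of 0 is treated as 1
    let full_batches = num_rows / batch_size;
    let has_partial = num_rows % batch_size != 0;
    if drop_last {
        // Only complete batches are counted.
        full_batches
    } else {
        // A trailing partial batch counts as one extra batch.
        full_batches + usize::from(has_partial)
    }
}
```

For example, 10 rows with a batch size of 4 give 3 batches by default and 2 with drop_last enabled.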
Trait Implementations§
impl<D: Dataset> IntoIterator for DataLoader<D>
Auto Trait Implementations§
impl<D> Freeze for DataLoader<D>
impl<D> RefUnwindSafe for DataLoader<D>
where
    D: RefUnwindSafe,
impl<D> Send for DataLoader<D>
impl<D> Sync for DataLoader<D>
impl<D> Unpin for DataLoader<D>
impl<D> UnsafeUnpin for DataLoader<D>
impl<D> UnwindSafe for DataLoader<D>
where
    D: RefUnwindSafe,
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.