Struct DataLoader

pub struct DataLoader<D: Dataset> { /* private fields */ }

A data loader that provides batched iteration over a dataset.

The DataLoader wraps a dataset and provides:

  • Configurable batch sizes
  • Optional shuffling with reproducible seeds
  • Option to drop incomplete final batches

Example

use alimentar::{ArrowDataset, DataLoader};

let dataset = ArrowDataset::from_parquet("data.parquet").unwrap();
let loader = DataLoader::new(dataset)
    .batch_size(32)
    .shuffle(true)
    .seed(42);

for batch in loader {
    println!("Processing batch with {} rows", batch.num_rows());
}

Implementations

impl<D: Dataset> DataLoader<D>

pub fn new(dataset: D) -> Self

Creates a new DataLoader wrapping the given dataset.

Default configuration:

  • batch_size: 1
  • shuffle: false
  • drop_last: false
  • seed: None (random)

pub fn batch_size(self, size: usize) -> Self

Sets the batch size.

Each iteration will yield a RecordBatch with at most this many rows. The final batch may have fewer rows unless drop_last is set.

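Contracts:

  • #[requires(true)]
  • #[ensures(result.batch_size >= 1)]
  • #[ensures(size >= 1 ==> result.batch_size == size)]
  • #[ensures(size == 0 ==> result.batch_size == 1)]

Any size is accepted. A nonzero size is stored unchanged, and a size of 0 is treated as 1, so the batch size is never 0.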
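The defaults listed under new and the batch_size contract can be modeled with a small consuming builder. This is a sketch only: DataLoader's fields are private, so LoaderConfig and its field names are hypothetical, and the real type also owns the wrapped dataset.

```rust
/// Hypothetical stand-in for DataLoader's (private) configuration.
#[derive(Debug, Clone, PartialEq)]
pub struct LoaderConfig {
    pub batch_size: usize,
    pub shuffle: bool,
    pub drop_last: bool,
    pub seed: Option<u64>,
}

impl Default for LoaderConfig {
    /// Mirrors the defaults documented for DataLoader::new.
    fn default() -> Self {
        LoaderConfig { batch_size: 1, shuffle: false, drop_last: false, seed: None }
    }
}

impl LoaderConfig {
    pub fn new() -> Self {
        Self::default()
    }

    /// Consuming setter in the same `self -> Self` style as DataLoader::batch_size.
    /// A size of 0 is clamped to 1, matching the #[ensures] clauses.
    pub fn batch_size(mut self, size: usize) -> Self {
        self.batch_size = size.max(1);
        self
    }

    pub fn shuffle(mut self, shuffle: bool) -> Self {
        self.shuffle = shuffle;
        self
    }

    pub fn drop_last(mut self, drop_last: bool) -> Self {
        self.drop_last = drop_last;
        self
    }

    pub fn seed(mut self, seed: u64) -> Self {
        self.seed = Some(seed);
        self
    }
}

fn main() {
    // Defaults as listed under DataLoader::new.
    let d = LoaderConfig::new();
    assert_eq!((d.batch_size, d.shuffle, d.drop_last, d.seed), (1, false, false, None));

    // Check the batch_size postconditions for a few inputs.
    for size in [0usize, 1, 7, 32] {
        let got = LoaderConfig::new().batch_size(size).batch_size;
        assert!(got >= 1);
        assert_eq!(got, if size == 0 { 1 } else { size });
    }

    // Chained configuration, as in the struct-level example.
    let cfg = LoaderConfig::new().batch_size(32).shuffle(true).seed(42).drop_last(true);
    println!("{cfg:?}");
}
```

Taking self by value and returning Self is what lets the calls chain directly off new, as in the example at the top of this page.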


pub fn shuffle(self, shuffle: bool) -> Self

Enables or disables shuffling.

When enabled, the row order is randomized before each epoch. Requires the shuffle crate feature.


pub fn drop_last(self, drop_last: bool) -> Self

Sets whether to drop the last incomplete batch.

When true, if the dataset size is not evenly divisible by the batch size, the final partial batch is skipped.
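To make the batching and drop_last rules concrete, the sketch below computes the row ranges a single pass would cover. batch_ranges is a hypothetical helper, not part of alimentar; it ignores shuffling and works on row indices rather than RecordBatch values.

```rust
use std::ops::Range;

/// Hypothetical helper: splits `num_rows` row indices into consecutive batch
/// ranges of at most `batch_size` rows, optionally dropping a short final batch.
fn batch_ranges(num_rows: usize, batch_size: usize, drop_last: bool) -> Vec<Range<usize>> {
    let batch_size = batch_size.max(1); // mirror the batch_size >= 1 contract
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < num_rows {
        let end = (start + batch_size).min(num_rows);
        // Only the final batch can be shorter than batch_size.
        if drop_last && end - start < batch_size {
            break;
        }
        ranges.push(start..end);
        start = end;
    }
    ranges
}

fn main() {
    // 10 rows in batches of 4: the last batch has only 2 rows.
    assert_eq!(batch_ranges(10, 4, false), vec![0..4, 4..8, 8..10]);
    // With drop_last, the 2-row remainder is skipped.
    assert_eq!(batch_ranges(10, 4, true), vec![0..4, 4..8]);
    // When the size divides evenly, drop_last changes nothing.
    assert_eq!(batch_ranges(8, 4, true), batch_ranges(8, 4, false));
    println!("{:?}", batch_ranges(10, 4, false));
}
```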


pub fn seed(self, seed: u64) -> Self

Sets the random seed for shuffling.

Setting a seed makes shuffling deterministic and reproducible. Requires the shuffle crate feature.
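Seeded shuffling is commonly implemented as a Fisher-Yates shuffle of row indices driven by a seeded pseudo-random generator. The sketch below assumes that approach and uses a hand-written SplitMix64 generator so it needs only std; the generator DataLoader actually uses is not documented here, so real row orders for a given seed will differ. It demonstrates the property this method promises: the same seed always produces the same permutation.

```rust
/// Minimal SplitMix64 generator (a stand-in, not the crate's actual RNG).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Fisher-Yates shuffle of the row indices 0..num_rows, driven by `seed`.
fn shuffled_indices(num_rows: usize, seed: u64) -> Vec<usize> {
    let mut rng = SplitMix64(seed);
    let mut indices: Vec<usize> = (0..num_rows).collect();
    for i in (1..num_rows).rev() {
        // Pick j in 0..=i (modulo bias is ignored in this sketch).
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        indices.swap(i, j);
    }
    indices
}

fn main() {
    let a = shuffled_indices(10, 42);
    let b = shuffled_indices(10, 42);
    assert_eq!(a, b); // same seed, same order

    // The result is still a permutation of every row index.
    let mut sorted = a.clone();
    sorted.sort();
    assert_eq!(sorted, (0..10).collect::<Vec<usize>>());
    println!("{a:?}");
}
```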


pub fn get_batch_size(&self) -> usize

Returns the configured batch size.


pub fn is_shuffle(&self) -> bool

Returns whether shuffling is enabled.


pub fn is_drop_last(&self) -> bool

Returns whether drop_last is enabled.


pub fn num_batches(&self) -> usize

Returns the number of batches that one full pass over the loader will yield, given the configured batch size and drop_last setting.
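Given the drop_last semantics described for drop_last, the count follows from integer division: with n rows and batch size b, one pass yields floor(n / b) batches when drop_last is true and ceil(n / b) otherwise. expected_num_batches is a hypothetical helper encoding this reading; it is not the crate's implementation.

```rust
/// Hypothetical helper: how many batches one pass yields for `num_rows` rows.
/// The batch size is clamped to at least 1, as the batch_size contract guarantees.
fn expected_num_batches(num_rows: usize, batch_size: usize, drop_last: bool) -> usize {
    let b = batch_size.max(1);
    if drop_last {
        num_rows / b // floor: the partial remainder is skipped
    } else {
        num_rows.div_ceil(b) // ceil: the remainder forms one short batch
    }
}

fn main() {
    assert_eq!(expected_num_batches(10, 4, false), 3);
    assert_eq!(expected_num_batches(10, 4, true), 2);
    assert_eq!(expected_num_batches(8, 4, true), 2);
    assert_eq!(expected_num_batches(0, 4, false), 0);
    assert_eq!(expected_num_batches(1000, 32, false), 32);
}
```

Shuffling does not change the count, since it only reorders rows.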


pub fn len(&self) -> usize

Returns the total number of rows in the underlying dataset.


pub fn is_empty(&self) -> bool

Returns true if the dataset is empty.

Trait Implementations

impl<D: Debug + Dataset> Debug for DataLoader<D>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<D: Dataset> IntoIterator for DataLoader<D>

type Item = RecordBatch

The type of the elements being iterated over.

type IntoIter = DataLoaderIterator<D>

Which kind of iterator are we turning this into?

fn into_iter(self) -> Self::IntoIter

Creates an iterator from a value.
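Because into_iter takes self by value, a for loop over a DataLoader consumes it; iterating the same binding a second time is a use-after-move compile error, and no Clone impl is listed among the trait implementations. The toy types below are hypothetical, with Vec<u32> standing in for RecordBatch. They mirror the by-value IntoIterator shape documented here and build a fresh loader for each epoch.

```rust
/// Hypothetical toy loader over in-memory rows.
struct ToyLoader {
    rows: Vec<u32>,
    batch_size: usize,
}

/// Named iterator type, analogous to DataLoaderIterator<D>.
struct ToyLoaderIter {
    rows: Vec<u32>,
    batch_size: usize,
    pos: usize,
}

impl Iterator for ToyLoaderIter {
    type Item = Vec<u32>; // stands in for RecordBatch

    fn next(&mut self) -> Option<Self::Item> {
        if self.pos >= self.rows.len() {
            return None;
        }
        let end = (self.pos + self.batch_size).min(self.rows.len());
        let batch = self.rows[self.pos..end].to_vec();
        self.pos = end;
        Some(batch)
    }
}

impl IntoIterator for ToyLoader {
    type Item = Vec<u32>;
    type IntoIter = ToyLoaderIter;

    // Takes `self` by value: the loader is moved into the iterator.
    fn into_iter(self) -> Self::IntoIter {
        ToyLoaderIter { rows: self.rows, batch_size: self.batch_size.max(1), pos: 0 }
    }
}

fn main() {
    let make_loader = || ToyLoader { rows: (0..5).collect(), batch_size: 2 };
    for epoch in 0..2 {
        // into_iter() consumes the loader, so build a fresh one per epoch.
        let sizes: Vec<usize> = make_loader().into_iter().map(|b| b.len()).collect();
        assert_eq!(sizes, vec![2, 2, 1]);
        println!("epoch {epoch}: {sizes:?}");
    }
}
```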

Auto Trait Implementations

impl<D> Freeze for DataLoader<D>

impl<D> RefUnwindSafe for DataLoader<D>
where D: RefUnwindSafe,

impl<D> Send for DataLoader<D>

impl<D> Sync for DataLoader<D>

impl<D> Unpin for DataLoader<D>

impl<D> UnsafeUnpin for DataLoader<D>

impl<D> UnwindSafe for DataLoader<D>
where D: RefUnwindSafe,

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<Unshared, Shared> IntoShared<Shared> for Unshared
where Shared: FromUnshared<Unshared>,

fn into_shared(self) -> Shared

Creates a shared type from an unshared type.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T> Same for T

type Output = T

Should always be Self

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,