
The ai-dataloader crate provides a Rust implementation of the PyTorch DataLoader.

Unlike the Python version, where almost everything happens at runtime, ai-dataloader is built on Rust’s powerful trait system.

Highlights

  • Shuffle or sequential sampler.
  • Customizable sampler.
  • Default collate function that covers most standard library types, including nested types.
  • Customizable collate function.

Examples

Examples can be found in the examples folder.

PyTorch DataLoader function equivalents

DataLoader creation

| PyTorch | ai-dataloader | Notes |
| --- | --- | --- |
| DataLoader(dataset) | DataLoader::builder(dataset).build() | Create a DataLoader with default parameters |
| DataLoader(dataset, batch_size=2) | DataLoader::builder(dataset).batch_size(2).build() | Set the batch size |
| DataLoader(dataset, shuffle=True) | DataLoader::builder(dataset).shuffle().build() | Shuffle the data |
| DataLoader(dataset, sampler=CustomSampler) | DataLoader::builder(dataset).sampler::<CustomSampler>().build() | Provide a custom sampler |

Combined options

| PyTorch | ai-dataloader |
| --- | --- |
| DataLoader(dataset, shuffle=True, batch_size=2, drop_last=True, collate_fn=CustomCollate) | DataLoaderBuilder::new(dataset).shuffle().batch_size(2).drop_last().collate_fn(CustomCollate).build() |
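To make the effect of batch_size and drop_last concrete, here is a minimal sketch using only the standard library. The function batch_indices is a hypothetical helper written for illustration, not part of ai-dataloader, and it assumes sequential (unshuffled) sampling; the crate's internals may work differently.

```rust
/// Hypothetical helper (not part of ai-dataloader): splits the indices
/// `0..len` into consecutive batches of at most `batch_size` elements.
/// When `drop_last` is true, a trailing batch shorter than `batch_size`
/// is discarded.
fn batch_indices(len: usize, batch_size: usize, drop_last: bool) -> Vec<Vec<usize>> {
    assert!(batch_size > 0, "batch_size must be positive");
    // Sequential sampling: indices are visited in order.
    let indices: Vec<usize> = (0..len).collect();
    indices
        .chunks(batch_size)
        // Keep every chunk unless `drop_last` is set and the chunk is incomplete.
        .filter(|chunk| !drop_last || chunk.len() == batch_size)
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

With five samples and a batch size of 2, batch_indices(5, 2, false) yields three batches, the last holding a single index, while batch_indices(5, 2, true) yields only the two full batches. A shuffling sampler would permute the indices before they are chunked.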

DataLoader iteration

| PyTorch | ai-dataloader | Notes |
| --- | --- | --- |
| for text, label in data_loader: | for (text, label) in data_loader.iter() | Simple iteration |
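Destructuring each batch as (text, label) works because collation turns a list of per-sample tuples into one tuple of per-field collections. The sketch below illustrates that idea with plain Vec values; collate_pairs is a hypothetical function for illustration only, and the container types produced by the crate's default collate function may differ.

```rust
/// Hypothetical collate step (not the crate's implementation): turns a
/// list of `(a, b)` samples into a single batch that holds all first
/// fields and all second fields, in their original order.
fn collate_pairs<A, B>(samples: Vec<(A, B)>) -> (Vec<A>, Vec<B>) {
    // `unzip` splits an iterator of pairs into a pair of collections.
    samples.into_iter().unzip()
}
```

Under this assumption, a batch built from the samples ("good", 1) and ("bad", 0) becomes (["good", "bad"], [1, 0]), which is what the loop pattern above binds to text and label.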

Modules

Merges a list of samples to form a batch.
Defines the strategy to draw samples from the dataset.

Structs

Basic builder for creating a DataLoader.
Data loader. Combines a dataset and a sampler, and provides an iterable over the given dataset.

Traits

A dataset is just something that has a length and is indexable. A Vec of dataset collate output must also be collatable.
Return a sample from the dataset at a given index.
Basic trait for anything that could have a length. Although many structs in the standard library have a len() method, to my knowledge this method is not part of any standard trait.
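The trait descriptions above can be pictured with a small standalone sketch. The names HasLen, SampleAt, and collect_all are hypothetical stand-ins chosen for this illustration; they are not the crate's actual trait names or signatures.

```rust
/// Hypothetical stand-in for a "has a length" trait.
trait HasLen {
    fn len(&self) -> usize;
}

/// Hypothetical stand-in for "return a sample at a given index".
trait SampleAt {
    type Sample;
    fn sample_at(&self, index: usize) -> Self::Sample;
}

impl<T> HasLen for Vec<T> {
    fn len(&self) -> usize {
        // Delegate to the inherent `Vec::len` method.
        Vec::len(self)
    }
}

impl<T: Clone> SampleAt for Vec<T> {
    type Sample = T;
    fn sample_at(&self, index: usize) -> T {
        // Panics on an out-of-bounds index, like ordinary slice indexing.
        self[index].clone()
    }
}

/// Generic code can now visit every sample of any type that
/// implements both traits, without knowing the concrete container.
fn collect_all<D: HasLen + SampleAt>(dataset: &D) -> Vec<D::Sample> {
    (0..dataset.len()).map(|i| dataset.sample_at(i)).collect()
}
```

Because the length comes from a trait rather than an inherent method, a function such as collect_all can be written once and reused for any container that implements both traits.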