# axonml-data

## Overview
axonml-data provides data-loading infrastructure for training neural networks in the AxonML framework. It includes the Dataset trait, a DataLoader with rayon-backed parallel sample collection, a GPU prefetch iterator that overlaps host loading with device compute, sampling strategies, composable data transforms, and collate utilities.
## Features
- `Dataset` trait — `TensorDataset` (caches flat data for O(row_size) access), `MapDataset`, `ConcatDataset`, `SubsetDataset` (with `random_split`), and `InMemoryDataset<T>` for arbitrary cloneable items.
- `DataLoader` — batched iteration with `shuffle`, `drop_last`, and `num_workers` (rayon-parallel sample collection per batch when `num_workers > 0`).
- GPU prefetch — `DataLoader::prefetch_to_gpu(device)` returns a `GpuPrefetchIter` that streams batches from a background thread through a bounded channel (2 batches buffered) so CPU loading overlaps with GPU compute.
- Samplers — `SequentialSampler`, `RandomSampler` (with/without replacement), `SubsetRandomSampler`, `WeightedRandomSampler` (O(log n) per sample via cumulative-sum binary search, swap-remove without replacement), and `BatchSampler`.
- Transforms — `Compose`, `ToTensor`, `Normalize` (scalar, per-channel, ImageNet preset), `RandomNoise` (Box-Muller Gaussian), `RandomCrop` (1D/2D/3D/4D), `RandomFlip` (generic N-d flip along any dim), `Scale`, `Clamp`, `Flatten`, `Reshape`, `DropoutTransform` (train/eval aware), `Lambda`.
- Collate — `DefaultCollate` and `StackCollate` (with `with_dim` for stacking along any axis), `GenericDataLoader` for arbitrary `Dataset` + `Collate` pairings, plus `stack_tensors` and `concat_tensors` helpers.
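As an aside on the `RandomNoise` internals mentioned above, the Box-Muller transform it names can be sketched in a few lines of plain Rust. This is an illustration of the technique only, not the crate's code; the tiny LCG here merely stands in for whatever RNG the crate actually uses.

```rust
// Box-Muller: turn two independent uniforms in (0, 1) into one
// standard-normal sample.
pub fn box_muller(u1: f64, u2: f64) -> f64 {
    (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
}

// Minimal 64-bit LCG supplying the uniforms (illustrative only).
pub struct Lcg(u64);

impl Lcg {
    pub fn next_uniform(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Map the top 53 bits into (0, 1), never exactly 0.
        (((self.0 >> 11) as f64) + 1.0) / (9007199254740992.0 + 2.0)
    }
}

fn main() {
    let mut rng = Lcg(42);
    // The empirical mean of many samples should sit near 0.
    let n = 100_000;
    let mean: f64 = (0..n)
        .map(|_| box_muller(rng.next_uniform(), rng.next_uniform()))
        .sum::<f64>()
        / n as f64;
    println!("empirical mean ≈ {mean:.4}");
    assert!(mean.abs() < 0.1);
}
```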
## Modules

| Module | Description |
|---|---|
| `dataset` | `Dataset` trait, `TensorDataset`, `MapDataset`, `ConcatDataset`, `SubsetDataset`, `InMemoryDataset` |
| `dataloader` | `DataLoader`, `DataLoaderIter`, `Batch`, `GpuPrefetchIter`, `GenericDataLoader`, `GenericDataLoaderIter` |
| `sampler` | `Sampler` trait, `SequentialSampler`, `RandomSampler`, `SubsetRandomSampler`, `WeightedRandomSampler`, `BatchSampler` |
| `transforms` | `Transform` trait, `Compose`, `ToTensor`, `Normalize`, `RandomNoise`, `RandomCrop`, `RandomFlip`, `Scale`, `Clamp`, `Flatten`, `Reshape`, `DropoutTransform`, `Lambda` |
| `collate` | `Collate` trait, `DefaultCollate`, `StackCollate`, `stack_tensors`, `concat_tensors` |
## Usage

Add to your `Cargo.toml`:

```toml
[dependencies]
axonml-data = "0.6.1"
```
### Creating a Dataset

```rust
use axonml_data::*;

// Pair feature and label tensors into a TensorDataset.
// Values, shapes, and the from_vec signature shown here are illustrative.
let x = Tensor::from_vec(vec![1.0, 2.0, 3.0, 4.0], &[4, 1]).unwrap();
let y = Tensor::from_vec(vec![0.0, 1.0, 0.0, 1.0], &[4, 1]).unwrap();
let dataset = TensorDataset::new(x, y);

assert_eq!(dataset.len(), 4);
let (xi, yi) = dataset.get(0).unwrap();
```
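The O(row_size) access that `TensorDataset` is described as providing comes from slicing rows out of a single flat buffer. A standalone sketch of that idea (not the crate's actual implementation; `FlatRows` is a hypothetical name):

```rust
// Row access over a flat buffer: get(i) returns one row slice,
// so cost is O(row_size) regardless of how many rows exist.
pub struct FlatRows {
    data: Vec<f32>,
    row_size: usize,
}

impl FlatRows {
    pub fn new(data: Vec<f32>, row_size: usize) -> Self {
        assert_eq!(data.len() % row_size, 0, "buffer must be whole rows");
        Self { data, row_size }
    }

    pub fn len(&self) -> usize {
        self.data.len() / self.row_size
    }

    pub fn get(&self, i: usize) -> Option<&[f32]> {
        let start = i.checked_mul(self.row_size)?;
        self.data.get(start..start + self.row_size)
    }
}

fn main() {
    let rows = FlatRows::new(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 2);
    assert_eq!(rows.len(), 3);
    assert_eq!(rows.get(1), Some(&[3.0f32, 4.0][..]));
    assert_eq!(rows.get(3), None); // out of range
    println!("row 1 = {:?}", rows.get(1).unwrap());
}
```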
### Using the DataLoader

```rust
use axonml_data::{DataLoader, TensorDataset};

// Batch size and worker count are illustrative.
let dataset = TensorDataset::new(x, y);
let loader = DataLoader::new(dataset, 32)
    .shuffle(true)
    .drop_last(true)
    .num_workers(4); // rayon-parallel sample collection per batch

for batch in loader.iter() {
    // each `batch` holds the collated samples for one step
}
```
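The `drop_last` batching behavior can be sketched standalone. This is an illustrative helper (`chunk_indices` is a hypothetical name, not the crate's code); shuffling would simply permute the index vector before chunking.

```rust
// Partition [0, n) into batches of `batch_size`; when `drop_last`
// is set, a trailing partial batch is discarded.
pub fn chunk_indices(n: usize, batch_size: usize, drop_last: bool) -> Vec<Vec<usize>> {
    let indices: Vec<usize> = (0..n).collect(); // shuffle here if desired
    let mut batches: Vec<Vec<usize>> = indices
        .chunks(batch_size)
        .map(|c| c.to_vec())
        .collect();
    if drop_last && n % batch_size != 0 {
        batches.pop(); // remove the short final batch
    }
    batches
}

fn main() {
    assert_eq!(chunk_indices(10, 4, false).len(), 3); // 4 + 4 + 2
    assert_eq!(chunk_indices(10, 4, true).len(), 2);  // trailing 2 dropped
    println!("{:?}", chunk_indices(5, 2, true));
}
```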
### GPU Prefetch

```rust
use axonml_core::Device; // import path illustrative
use axonml_data::DataLoader;

let loader = DataLoader::new(dataset, 32).shuffle(true).num_workers(2);

// Background thread produces batches and transfers to GPU;
// bounded to 2 batches in flight.
// `device` is a previously constructed Device handle.
for batch in loader.prefetch_to_gpu(device) {
    // batch tensors already live on `device`
}
```
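The prefetch pattern itself can be sketched with std primitives: a producer thread stays at most two batches ahead of the consumer by sending through a bounded channel. A real `GpuPrefetchIter` would additionally move each batch to the device before sending; `prefetch_sum` is a hypothetical name used only in this sketch.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Produce `n_batches` batches of 4 items on a background thread and
// consume them on the caller's thread, with at most 2 batches buffered.
pub fn prefetch_sum(n_batches: u32) -> u32 {
    let (tx, rx) = sync_channel::<Vec<u32>>(2); // bound = 2 batches in flight

    let producer = thread::spawn(move || {
        for b in 0..n_batches {
            let batch: Vec<u32> = (b * 4..(b + 1) * 4).collect(); // stand-in for loading
            if tx.send(batch).is_err() {
                break; // consumer hung up
            }
        }
        // tx dropped here, which ends the consumer's loop
    });

    let mut count = 0;
    for batch in rx {
        count += batch.len() as u32; // overlaps with the producer
    }
    producer.join().unwrap();
    count
}

fn main() {
    assert_eq!(prefetch_sum(5), 20);
    println!("consumed {} samples", prefetch_sum(5));
}
```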
### Implementing Custom Datasets

```rust
use axonml_data::Dataset;
use axonml_core::Tensor; // import path illustrative
```
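Since the trait's full signature isn't reproduced here, the following is a self-contained sketch assuming a shape consistent with the earlier snippets (`len()` plus `get(i) -> Option<Item>`); the crate's real trait may differ in item type and error handling.

```rust
// Assumed shape of the Dataset trait (illustrative, not the crate's).
pub trait Dataset {
    type Item;
    fn len(&self) -> usize;
    fn get(&self, index: usize) -> Option<Self::Item>;
}

// A custom dataset that generates (x, x²) pairs on the fly
// instead of storing them.
pub struct Squares {
    n: usize,
}

impl Dataset for Squares {
    type Item = (f32, f32);

    fn len(&self) -> usize {
        self.n
    }

    fn get(&self, index: usize) -> Option<Self::Item> {
        (index < self.n).then(|| (index as f32, (index * index) as f32))
    }
}

fn main() {
    let ds = Squares { n: 4 };
    assert_eq!(ds.len(), 4);
    assert_eq!(ds.get(3), Some((3.0, 9.0)));
    assert_eq!(ds.get(4), None); // out of range
    println!("item 3 = {:?}", ds.get(3));
}
```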
### Data Transforms

```rust
use axonml_data::transforms::{Compose, Normalize, RandomFlip, RandomNoise};

// Constructor arguments below are illustrative.
let transform = Compose::empty()
    .add(Normalize::imagenet()) // per-channel ImageNet stats
    .add(RandomFlip::new(2, 0.5)) // flip along dim 2 with probability 0.5
    .add(RandomNoise::new(0.0, 0.1)); // Gaussian noise, mean 0, std 0.1

let output = transform.apply(input);
```
### Using Samplers

```rust
use axonml_data::sampler::{BatchSampler, RandomSampler, SequentialSampler, WeightedRandomSampler};

// Constructor arguments below are illustrative.
let sampler = RandomSampler::new(dataset.len());
for idx in sampler.iter() {
    // indices in shuffled order
}

// Weighted sampling for class-imbalanced datasets (O(log n) per sample)
let weights = vec![0.1, 0.3, 0.6];
let sampler = WeightedRandomSampler::new(weights, 100, true); // 100 draws, with replacement

let base_sampler = SequentialSampler::new(dataset.len());
let batch_sampler = BatchSampler::new(base_sampler, 32, false); // batch 32, keep last partial batch
for batch_indices in batch_sampler.iter() {
    // one Vec<usize> of indices per batch
}
```
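The cumulative-sum binary search behind weighted sampling can be shown standalone. This is a sketch of the technique only, not the crate's code (`weighted_pick` and `cumulative` are hypothetical names); the swap-remove path used for sampling without replacement is omitted.

```rust
// Prefix sums of the weights: [w0, w0+w1, w0+w1+w2, ...].
pub fn cumulative(weights: &[f64]) -> Vec<f64> {
    let mut acc = 0.0;
    weights
        .iter()
        .map(|w| {
            acc += w;
            acc
        })
        .collect()
}

// One weighted draw: scale a uniform u in [0, 1) to [0, total),
// then binary-search for the first index whose prefix sum exceeds it.
// O(log n) per sample.
pub fn weighted_pick(cumsum: &[f64], u: f64) -> usize {
    let target = u * cumsum.last().copied().unwrap_or(0.0);
    cumsum.partition_point(|&c| c <= target)
}

fn main() {
    let cumsum = cumulative(&[0.1, 0.6, 0.3]); // → [0.1, 0.7, 1.0]
    assert_eq!(weighted_pick(&cumsum, 0.05), 0); // lands in [0, 0.1)
    assert_eq!(weighted_pick(&cumsum, 0.5), 1);  // lands in [0.1, 0.7)
    assert_eq!(weighted_pick(&cumsum, 0.95), 2); // lands in [0.7, 1.0)
    println!("picks ok");
}
```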
### Dataset Splitting

```rust
use axonml_data::{random_split, TensorDataset};

let dataset = TensorDataset::new(x, y);

// Shuffled random split (requires Dataset: Clone);
// split fractions are illustrative.
let splits = random_split(&dataset, &[0.8, 0.2]);
let train_dataset = &splits[0];
let val_dataset = &splits[1];
```
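Turning split fractions into concrete subset sizes is the fiddly part of a `random_split`-style helper. A sketch of one common approach (`split_sizes` is a hypothetical name, not the crate's code): floor each fraction, then let the last split absorb the rounding remainder so the sizes always sum to `n`.

```rust
// Convert fractions into subset sizes. Assumes the fractions sum to 1,
// so the floored sizes never exceed n.
pub fn split_sizes(n: usize, fractions: &[f64]) -> Vec<usize> {
    let mut sizes: Vec<usize> = fractions
        .iter()
        .map(|f| (f * n as f64) as usize) // floor via truncation
        .collect();
    let assigned: usize = sizes.iter().sum();
    if let Some(last) = sizes.last_mut() {
        *last += n - assigned; // last split takes the remainder
    }
    sizes
}

fn main() {
    assert_eq!(split_sizes(10, &[0.8, 0.2]), vec![8, 2]);
    // 7 doesn't divide evenly: 3.5 floors to 3, remainder goes last.
    assert_eq!(split_sizes(7, &[0.5, 0.5]), vec![3, 4]);
    println!("{:?}", split_sizes(7, &[0.5, 0.5]));
}
```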
### Combining Datasets

```rust
use axonml_data::{ConcatDataset, MapDataset};

// Arguments are illustrative.
let combined = ConcatDataset::new(vec![dataset_a, dataset_b]);
let mapped = MapDataset::new(dataset, |(x, y)| (x, y)); // per-item closure
```
### Generic DataLoader

Flexible loader that works with any `Dataset<Item = T>` and any `Collate<T>`:

```rust
use axonml_data::{GenericDataLoader, StackCollate};

// Arguments are illustrative.
let loader = GenericDataLoader::new(dataset, StackCollate::with_dim(0), 32)
    .shuffle(true)
    .num_workers(2);

for batch in loader.iter() {
    // batch assembled by StackCollate
}
```
## Tests

Run the test suite with `cargo test`.
## License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Last updated: 2026-04-16 (v0.6.1)