[](https://github.com/Tudyx/ai-dataloader/actions/workflows/ci.yml)
[](https://crates.io/crates/ai-dataloader)
[](https://docs.rs/ai-dataloader/)
# ai-dataloader
A rust port of [`pytorch`](https://pytorch.org/) `dataloader` library.
> Note: This project is still heavily in development and is at an early stage.
## Highlights
- Iterable or indexable (Map style) `DataLoader`.
- Customizable `Sampler`, `BatchSampler` and `collate_fn`.
- Parallel dataloader using [`rayon`] for indexable dataloader (experimental).
- Integration with [`ndarray`](https://docs.rs/ndarray/latest/ndarray/) and [`tch-rs`](https://github.com/LaurentMazare/tch-rs), CPU and GPU support.
- Default collate function that will automatically collate most of your type (supporting nesting).
- Shuffling for iterable and indexable `DataLoader`.
More info in the [documentation](https://docs.rs/ai-dataloader/).
## Examples
Examples can be found in the [examples](examples/) folder but here there is a simple one
```rust
use ai_dataloader::DataLoader;
let loader = DataLoader::builder(vec![(0, "hola"), (1, "hello"), (2, "hallo"), (3, "bonjour")]).batch_size(2).shuffle().build();
for (label, text) in &loader {
println!("Label {label:?}");
println!("Text {text:?}");
}
```
## [`tch-rs`](https://github.com/LaurentMazare/tch-rs) integration
In order to collate your data into torch tensor that can run on the GPU, you must activate the `tch` feature.
This feature relies on the tch crate for bindings to the C++ `libTorch` API. The `libtorch` library is required can be downloaded either automatically or manually. The following provides a reference on how to set up your environment to use these bindings, please refer to the [tch](https://github.com/LaurentMazare/tch-rs) for detailed information or support.
### Next Features
This features could be added in the future:
- `RandomSampler` with replacement
- parallel `dataloader` for iterable dataset
- distributed `dataloader`
### MSRV
The current MSRV is 1.60.
[`rayon`]: https://docs.rs/rayon/latest/rayon/