
#![cfg_attr(docsrs, feature(doc_cfg))]

//! The `ai-dataloader` crate provides a Rust implementation of the [`PyTorch`] `DataLoader`.
//!
//! Unlike the Python version, where almost everything happens at runtime, `ai-dataloader` is built on Rust's powerful trait system.
//!
//! ## Highlights
//!
//! - Iterable or indexable (map-style) `DataLoader`.
//! - Customizable `Sampler`, `BatchSampler` and `collate_fn`.
//! - Integration with [`ndarray`] and [`tch-rs`], with CPU and GPU support.
//! - A default collate function that automatically collates most of your types (with support for nesting).
//! - Shuffling for both iterable and indexable `DataLoader`s.
//!
//! ## Examples
//!
//! Examples can be found in the [examples] folder.
//!
//! ## `PyTorch` `DataLoader` function equivalents
//!
//! ### `DataLoader` creation
//!
//! `PyTorch` | `ai-dataloader` | Notes
//! --------|-----------------|-------
//! `DataLoader(dataset)` | `DataLoader::builder(dataset).build()` | Create a `DataLoader` with default parameters
//! `DataLoader(dataset, batch_size=2)` | `DataLoader::builder(dataset).batch_size(2).build()` | Set the batch size
//! `DataLoader(dataset, shuffle=True)` | `DataLoader::builder(dataset).shuffle().build()` | Shuffle the data
//! `DataLoader(dataset, sampler=CustomSampler)` | `DataLoader::builder(dataset).sampler::<CustomSampler>().build()` | Provide a custom sampler
//!
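//! For instance, building a shuffled `DataLoader` with batches of two (a minimal sketch; the `Vec` of tuples is hypothetical sample data, and any type implementing the dataset traits works):
//!
//! ```
//! use ai_dataloader::indexable::DataLoader;
//!
//! // Hypothetical (feature, label) pairs standing in for a real dataset.
//! let dataset = vec![(0, 1), (2, 3), (4, 5)];
//! let loader = DataLoader::builder(dataset).batch_size(2).shuffle().build();
//! ```
//!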
//! ### Combined options
//!
//! `PyTorch` | `ai-dataloader`
//! --------|-----------------
//! `DataLoader(dataset, shuffle=True, batch_size=2, drop_last=True, collate_fn=CustomCollate)` | `DataLoaderBuilder::new(dataset).shuffle().batch_size(2).drop_last().collate_fn(CustomCollate).build()`
//!
//! ### `DataLoader` iteration
//!
//! `PyTorch` | `ai-dataloader` | Notes
//! --------|-----------------|-------
//! `for text, label in data_loader:` | `for (text, label) in data_loader.iter()` | Simple iteration
//!
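//! For instance (a minimal sketch; the dataset is hypothetical sample data, and `text` and `label` are illustrative names):
//!
//! ```
//! use ai_dataloader::indexable::DataLoader;
//!
//! // Hypothetical (text id, label) pairs standing in for a real dataset.
//! let data_loader = DataLoader::builder(vec![(0, 1), (2, 3)]).build();
//!
//! for (text, label) in data_loader.iter() {
//!     // Each iteration yields one collated batch.
//!     println!("{text:?} {label:?}");
//! }
//! ```
//!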
//! ## Choosing between an iterable or an indexable dataloader
//!
//! Choose the iterable `DataLoader` if, for instance, your dataset arrives from a stream and you don't have random access into it.
//! It's also useful for large datasets, as it loads only a small part into RAM at a time. When order matters, for instance in reinforcement learning, the iterable
//! `DataLoader` is also a good fit.
//!
//! Otherwise, the indexable `DataLoader` (map-style in the [`PyTorch`] docs) may be a good fit.
//!
//! Both support shuffling the samples.
//!
//! To choose iterable:
//!
//! ```
//! use ai_dataloader::iterable::DataLoader;
//! ```
//!
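//! For instance (a minimal sketch, assuming the iterable builder mirrors the indexable one; the range stands in for data arriving from a stream):
//!
//! ```
//! use ai_dataloader::iterable::DataLoader;
//!
//! // A range stands in for a streamed dataset without random access.
//! let loader = DataLoader::builder(0..100).batch_size(10).build();
//! ```
//!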
//! To choose indexable:
//!
//! ```
//! use ai_dataloader::indexable::DataLoader;
//! ```
//!
//! [`tch-rs`]: https://github.com/LaurentMazare/tch-rs
//! [`ndarray`]: https://github.com/rust-ndarray/ndarray
//! [`PyTorch`]: https://pytorch.org/
//! [examples]: https://github.com/Tudyx/ai-dataloader/tree/main/examples

pub mod collate;
pub mod indexable;
pub mod iterable;

pub use indexable::{sampler, Dataset, GetSample, Len, NdarrayDataset};

#[cfg(feature = "rayon")]
use once_cell::sync::OnceCell;
#[cfg(feature = "rayon")]
use rayon::ThreadPool;

/// Thread pool used by the `DataLoader`.
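///
/// A minimal sketch of initializing the pool before building any `DataLoader`
/// (the thread count is illustrative; `set` returns an `Err` if the cell was
/// already initialized, in which case the existing pool is kept):
///
/// ```
/// use ai_dataloader::THREAD_POOL;
///
/// // Build a custom rayon pool with 4 worker threads and register it globally.
/// let pool = rayon::ThreadPoolBuilder::new()
///     .num_threads(4)
///     .build()
///     .expect("failed to build rayon thread pool");
/// THREAD_POOL.set(pool).ok();
/// ```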
#[cfg(feature = "rayon")]
pub static THREAD_POOL: OnceCell<ThreadPool> = OnceCell::new();