pub struct HuggingfaceDatasetLoader { /* private fields */ }
Available on crate features std and dataset only.
Load a dataset from huggingface datasets.
The dataset with all splits is stored in a single sqlite database (see SqliteDataset).
§Example
use burn_dataset::HuggingfaceDatasetLoader;
use burn_dataset::SqliteDataset;
use serde::{Deserialize, Serialize};
#[derive(Deserialize, Debug, Clone)]
struct MnistItemRaw {
    pub image_bytes: Vec<u8>,
    pub label: usize,
}
let train_ds: SqliteDataset<MnistItemRaw> = HuggingfaceDatasetLoader::new("mnist")
    .dataset("train")
    .unwrap();
§Note
This loader relies on the datasets library by HuggingFace
to download datasets. This is a Python library, so you must have an existing Python installation.
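Because the loader depends on a Python interpreter being present, it can be useful to check for one before attempting a download. The following is a minimal sketch using only the standard library. The helper name `python_available` is hypothetical and is not part of burn-dataset; it only probes whether an interpreter can be spawned, not whether the datasets library is installed.

```rust
use std::process::{Command, Stdio};

/// Hypothetical helper (not part of burn-dataset): returns `true` when
/// `<interpreter> --version` can be spawned and exits successfully.
fn python_available(interpreter: &str) -> bool {
    Command::new(interpreter)
        .arg("--version")
        // Discard the version banner; only the exit status matters here.
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        // A spawn failure (for example, binary not found) counts as unavailable.
        .unwrap_or(false)
}
```

Calling `python_available("python3")` before building the loader allows a clearer error message than an unwrapped ImporterError.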
Implementations§
impl HuggingfaceDatasetLoader
pub fn new(name: &str) -> HuggingfaceDatasetLoader
Available on crate features sqlite or sqlite-bundled only.
Create a huggingface dataset loader.
pub fn with_subset(self, subset: &str) -> HuggingfaceDatasetLoader
Available on crate features sqlite or sqlite-bundled only.
Create a huggingface dataset loader for a subset of the dataset.
The subset name must be one of the subsets listed in the dataset page.
If no subset names are listed, then do not use this method.
pub fn with_base_dir(self, base_dir: &str) -> HuggingfaceDatasetLoader
Available on crate features sqlite or sqlite-bundled only.
Specify a base directory to store the dataset.
If not specified, the dataset will be stored in ~/.cache/burn-dataset.
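The fallback described above can be sketched as a small path-resolution function. This is an illustration only: `resolve_base_dir` is a hypothetical helper, and passing the home directory explicitly is an assumption made to keep the sketch deterministic. It does not show how the crate itself locates the home directory.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of burn-dataset): use the explicit base
/// directory when one is given, otherwise fall back to
/// `<home>/.cache/burn-dataset`.
fn resolve_base_dir(base_dir: Option<&str>, home: &Path) -> PathBuf {
    match base_dir {
        Some(dir) => PathBuf::from(dir),
        None => home.join(".cache").join("burn-dataset"),
    }
}
```

With a home directory of `/home/alice` and no explicit base directory, this yields `/home/alice/.cache/burn-dataset`.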
pub fn with_huggingface_token(
    self,
    huggingface_token: &str,
) -> HuggingfaceDatasetLoader
Available on crate features sqlite or sqlite-bundled only.
Specify a huggingface token to download datasets behind authentication.
You can get a token from the tokens settings page of your Hugging Face account.
pub fn with_huggingface_cache_dir(
    self,
    huggingface_cache_dir: &str,
) -> HuggingfaceDatasetLoader
Available on crate features sqlite or sqlite-bundled only.
Specify a huggingface cache directory to store the downloaded datasets.
If not specified, the dataset will be stored in ~/.cache/huggingface/datasets.
pub fn with_huggingface_data_dir(
    self,
    huggingface_data_dir: &str,
) -> HuggingfaceDatasetLoader
Available on crate features sqlite or sqlite-bundled only.
Specify a relative path to a subset of a dataset. This is used in some datasets for the manual steps of the dataset download process.
Unless you’ve encountered a ManualDownloadError when loading your dataset you probably don’t have to worry about this setting.
pub fn with_trust_remote_code(
    self,
    trust_remote_code: bool,
) -> HuggingfaceDatasetLoader
Available on crate features sqlite or sqlite-bundled only.
Specify whether or not to trust remote code.
If not specified, trust remote code is set to true.
pub fn with_use_python_venv(
    self,
    use_python_venv: bool,
) -> HuggingfaceDatasetLoader
Available on crate features sqlite or sqlite-bundled only.
Specify whether or not to use the burn-dataset Python virtualenv for running the importer script. If false, the local python3 environment is used instead.
If not specified, the virtualenv is used.
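Each `with_*` method takes `self` by value and returns the loader, so options are chained before the final call. The sketch below models that pattern and the documented defaults with a stand-in type. `LoaderSettings` and its fields are hypothetical and exist only for illustration; the real struct's fields are private, and the dataset and subset names used with it are placeholders.

```rust
/// Hypothetical stand-in for the loader's settings; not the crate's real type.
#[derive(Debug, Clone, PartialEq)]
struct LoaderSettings {
    name: String,
    subset: Option<String>,
    trust_remote_code: bool,
    use_python_venv: bool,
}

impl LoaderSettings {
    fn new(name: &str) -> Self {
        Self {
            name: name.to_string(),
            subset: None,
            // Defaults as documented: both flags start out as `true`.
            trust_remote_code: true,
            use_python_venv: true,
        }
    }

    /// Consumes the settings and returns them with a subset selected.
    fn with_subset(mut self, subset: &str) -> Self {
        self.subset = Some(subset.to_string());
        self
    }

    /// Consumes the settings and returns them with the flag overridden.
    fn with_trust_remote_code(mut self, trust_remote_code: bool) -> Self {
        self.trust_remote_code = trust_remote_code;
        self
    }

    /// Consumes the settings and returns them with the flag overridden.
    fn with_use_python_venv(mut self, use_python_venv: bool) -> Self {
        self.use_python_venv = use_python_venv;
        self
    }
}
```

A chain such as `LoaderSettings::new("some-dataset").with_trust_remote_code(false)` overrides one flag and leaves the other defaults untouched, which mirrors how the real builder methods are meant to be combined.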
pub fn dataset<I>(self, split: &str) -> Result<SqliteDataset<I>, ImporterError>
where
    I: DeserializeOwned + Clone,
Available on crate features sqlite or sqlite-bundled only.
Load the requested split of the dataset as a SqliteDataset.
pub fn db_file(self) -> Result<PathBuf, ImporterError>
Available on crate features sqlite or sqlite-bundled only.
Get the path to the sqlite database file.
If the database file does not exist, it will be downloaded and imported.
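The "import only if missing" behaviour described above can be sketched with the standard library alone. This is an assumption-laden illustration rather than the crate's implementation: `db_file_or_import` is a hypothetical function, and the `import` closure stands in for the download and import step.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical sketch (not the crate's code): return `db_path`, running
/// `import` first only when the file is not already present.
fn db_file_or_import<F>(db_path: &Path, import: F) -> io::Result<PathBuf>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    if !db_path.exists() {
        // First use: let the caller-supplied step create the database file.
        import(db_path)?;
    }
    Ok(db_path.to_path_buf())
}
```

Under this model, a second call with the same path skips the import step entirely, so repeated runs reuse the database created by the first one.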