pub struct Corpus {
    pub name: String,
    pub root_path: PathBuf,
    pub images: Vec<CorpusImage>,
    pub metadata: CorpusMetadata,
}
A corpus of test images.
Fields§
name: String - Name of the corpus.
root_path: PathBuf - Root path of the corpus.
images: Vec<CorpusImage> - Images in the corpus.
metadata: CorpusMetadata - Metadata about the corpus.
Implementations§
impl Corpus
pub fn new(name: impl Into<String>, root_path: impl Into<PathBuf>) -> Self
Create a new empty corpus.
pub fn discover(path: impl AsRef<Path>) -> Result<Self>
Discover images in a directory.
Recursively scans the directory for supported image formats (PNG, JPEG, WebP, AVIF).
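A recursive scan of this kind could look roughly like the sketch below. It is not the crate's implementation: `scan_images` and `is_supported` are hypothetical names, and matching on lowercase file extensions (with `jpg` accepted alongside `jpeg`) is an assumption about how formats are detected.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Extensions treated as supported (assumption: detection is by file extension).
const SUPPORTED: [&str; 5] = ["png", "jpg", "jpeg", "webp", "avif"];

/// Returns true if the path has a supported image extension (case-insensitive).
fn is_supported(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| SUPPORTED.contains(&ext.to_ascii_lowercase().as_str()))
        .unwrap_or(false)
}

/// Collects supported image files under `dir`, descending into subdirectories.
/// The result is sorted so repeated scans return the same order.
fn scan_images(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut pending = vec![dir.to_path_buf()];
    while let Some(current) = pending.pop() {
        for entry in fs::read_dir(&current)? {
            let path = entry?.path();
            if path.is_dir() {
                pending.push(path);
            } else if is_supported(&path) {
                found.push(path);
            }
        }
    }
    found.sort();
    Ok(found)
}
```

Calling `scan_images(Path::new("./codec-corpus"))` would return every matching file below that directory, or the first I/O error encountered.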
pub fn get_dataset(dataset: &str) -> Result<Self>
Get corpus dataset, downloading if necessary.
When the corpus feature is enabled (default), uses the codec-corpus crate
for automatic download and caching. Otherwise, checks if the path exists locally.
§Arguments
dataset - Dataset path (e.g., “kodak”, “clic2025/training”)
§Example
// With corpus feature (default): downloads and caches automatically
let corpus = Corpus::get_dataset("kodak")?;
// Discovers images in the cached directory
println!("Found {} images", corpus.images.len());
pub fn discover_or_download(
    path: impl AsRef<Path>,
    _url: Option<&str>,
    _subsets: Option<&[&str]>,
) -> Result<Self>
Discover or download a corpus on demand (legacy, requires local path).
If the path exists, discovers images. Otherwise returns an error.
Use get_dataset() when the corpus feature is enabled for automatic downloads.
§Arguments
path - Local path for the corpus
_url - Ignored (for backward compatibility)
_subsets - Ignored (for backward compatibility)
pub fn download_dataset(dataset: &str) -> Result<Self>
pub fn get_or_download(preferred_path: impl AsRef<Path>) -> Result<Self>
Get corpus from local paths (legacy method).
Checks common locations for existing corpus:
- The specified path
- ./codec-corpus
- ../codec-corpus
- ../codec-comparison/codec-corpus
When the corpus feature is enabled, use get_dataset() instead
for automatic download and caching.
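The lookup order listed above can be sketched as follows. This is an illustration under assumptions, not the crate's code: `candidate_paths` and `first_existing` are hypothetical helpers, and the existence check is passed in as a closure so the ordering can be exercised without touching the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Builds the candidate list in the documented order: the preferred path
/// first, then the three conventional fallback locations.
fn candidate_paths(preferred: &Path) -> Vec<PathBuf> {
    vec![
        preferred.to_path_buf(),
        PathBuf::from("./codec-corpus"),
        PathBuf::from("../codec-corpus"),
        PathBuf::from("../codec-comparison/codec-corpus"),
    ]
}

/// Returns the first candidate for which `exists` reports true, if any.
fn first_existing(preferred: &Path, exists: impl Fn(&Path) -> bool) -> Option<PathBuf> {
    candidate_paths(preferred)
        .into_iter()
        .find(|p| exists(p.as_path()))
}
```

In real use the predicate would typically be `|p| p.exists()`, as in `first_existing(Path::new("my-corpus"), |p| p.exists())`.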
pub fn save(&self, path: impl AsRef<Path>) -> Result<()>
Save the corpus to a JSON manifest file.
pub fn filter_category(&self, category: ImageCategory) -> Vec<&CorpusImage>
Filter images by category.
pub fn filter_format(&self, format: &str) -> Vec<&CorpusImage>
Filter images by format.
pub fn filter_min_size(&self, min_width: u32, min_height: u32) -> Vec<&CorpusImage>
Filter images by minimum dimensions.
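As a rough illustration of a borrowing filter like this one, the sketch below uses a stand-in type. `ImageInfo` and its `width` and `height` fields are hypothetical (the fields of `CorpusImage` are not shown on this page), and requiring both dimensions to meet their minimum is an assumption.

```rust
/// Stand-in for `CorpusImage`; the `width` and `height` fields are assumptions.
#[derive(Debug, PartialEq)]
struct ImageInfo {
    name: &'static str,
    width: u32,
    height: u32,
}

/// Keeps images whose width and height both meet the given minimums.
/// Returns references, so the original slice is left untouched.
fn filter_min_size(images: &[ImageInfo], min_width: u32, min_height: u32) -> Vec<&ImageInfo> {
    images
        .iter()
        .filter(|img| img.width >= min_width && img.height >= min_height)
        .collect()
}
```

Returning `Vec<&ImageInfo>` mirrors the `Vec<&CorpusImage>` return type above: the filter hands back borrowed views rather than cloning image records.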
pub fn split(&self, train_ratio: f64) -> (Vec<&CorpusImage>, Vec<&CorpusImage>)
Split the corpus into training and validation sets.
Uses a deterministic split based on checksum to ensure reproducibility.
§Arguments
train_ratio - Fraction of images to include in the training set (0.0 to 1.0).
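One way a checksum-based split can be made deterministic is sketched below. The exact rule the crate uses is not documented here, so everything in this block is an assumption: checksums are treated as strings, hashed with FNV-1a (a fixed, seedless hash), and mapped to one of 10,000 buckets that is compared against `train_ratio`. The helper names are hypothetical.

```rust
/// FNV-1a over the checksum bytes. The hash has no random seed, so the
/// result is identical across runs and machines.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Maps a checksum to a bucket in 0..10_000 and compares it to the ratio.
fn is_training(checksum: &str, train_ratio: f64) -> bool {
    let bucket = fnv1a(checksum.as_bytes()) % 10_000;
    (bucket as f64) < train_ratio.clamp(0.0, 1.0) * 10_000.0
}

/// Splits checksums into (training, validation) without any randomness.
fn split<'a>(checksums: &[&'a str], train_ratio: f64) -> (Vec<&'a str>, Vec<&'a str>) {
    checksums
        .iter()
        .copied()
        .partition(|c| is_training(c, train_ratio))
}
```

Because each image is assigned independently from its own checksum, adding or removing other images never moves it between sets. The trade-off is that the realized training fraction only approximates `train_ratio`, especially for small corpora.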
pub fn compute_checksums(&mut self) -> Result<usize>
Compute checksums for all images that don’t have them.
pub fn find_duplicates(&self) -> Vec<Vec<&CorpusImage>>
Find duplicate images by checksum.
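Grouping by checksum can be sketched as below. `ImageEntry` is a hypothetical stand-in for `CorpusImage`, and modelling the checksum as an `Option` is an assumption (suggested by `compute_checksums` filling in missing ones). Skipping images without a checksum is likewise a choice made for this sketch, not documented behavior.

```rust
use std::collections::BTreeMap;

/// Stand-in for `CorpusImage`; the optional `checksum` field is an assumption.
#[derive(Debug, PartialEq)]
struct ImageEntry {
    path: &'static str,
    checksum: Option<&'static str>,
}

/// Groups images by checksum and returns only groups with two or more members.
/// A BTreeMap keeps the groups in a stable, checksum-sorted order.
fn find_duplicates(images: &[ImageEntry]) -> Vec<Vec<&ImageEntry>> {
    let mut groups: BTreeMap<&str, Vec<&ImageEntry>> = BTreeMap::new();
    for image in images {
        // Images without a checksum cannot be compared, so they are skipped.
        if let Some(sum) = image.checksum {
            groups.entry(sum).or_default().push(image);
        }
    }
    groups.into_values().filter(|g| g.len() > 1).collect()
}
```

Each inner vector holds the images that share one checksum, which matches the nested `Vec<Vec<&CorpusImage>>` shape of the return type.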
pub fn update_category_counts(&mut self)
Update category counts in metadata.
pub fn stats(&self) -> CorpusStats
Get statistics about the corpus.
Trait Implementations§
impl<'de> Deserialize<'de> for Corpus
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
Auto Trait Implementations§
impl Freeze for Corpus
impl RefUnwindSafe for Corpus
impl Send for Corpus
impl Sync for Corpus
impl Unpin for Corpus
impl UnwindSafe for Corpus
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.