Struct Corpus 

pub struct Corpus {
    pub name: String,
    pub root_path: PathBuf,
    pub images: Vec<CorpusImage>,
    pub metadata: CorpusMetadata,
}
A corpus of test images.

Fields§

§name: String

Name of the corpus.

§root_path: PathBuf

Root path of the corpus.

§images: Vec<CorpusImage>

Images in the corpus.

§metadata: CorpusMetadata

Metadata about the corpus.

Implementations§

Source§

impl Corpus

pub fn new(name: impl Into<String>, root_path: impl Into<PathBuf>) -> Self

Create a new empty corpus.

pub fn discover(path: impl AsRef<Path>) -> Result<Self>

Discover images in a directory.

Recursively scans the directory for supported image formats (PNG, JPEG, WebP, AVIF).
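A recursive scan of this kind can be sketched with the standard library alone. This is an illustration, not the crate's implementation: the extension list (`png`, `jpg`, `jpeg`, `webp`, `avif`), the case-insensitive comparison, and the helper names `is_supported_image` and `scan_images` are all assumptions made for the example.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Extensions treated as supported images (an assumption for this sketch).
const IMAGE_EXTENSIONS: [&str; 5] = ["png", "jpg", "jpeg", "webp", "avif"];

/// Returns true if the path has one of the assumed image extensions,
/// compared case-insensitively.
fn is_supported_image(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| IMAGE_EXTENSIONS.iter().any(|known| ext.eq_ignore_ascii_case(known)))
        .unwrap_or(false)
}

/// Recursively collects image paths under `dir` into `found`.
/// Hypothetical helper: symlink cycles are not guarded against here.
fn scan_images(dir: &Path, found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Descend into subdirectories.
            scan_images(&path, found)?;
        } else if is_supported_image(&path) {
            found.push(path);
        }
    }
    Ok(())
}
```

The order in which `read_dir` yields entries is platform-dependent, so a caller that needs a stable image order would likely sort the collected paths afterwards.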

pub fn get_dataset(dataset: &str) -> Result<Self>

Get a corpus dataset, downloading it if necessary.

When the corpus feature is enabled (default), uses the codec-corpus crate for automatic download and caching. Otherwise, checks if the path exists locally.

§Arguments
  • dataset - Dataset path (e.g., “kodak”, “clic2025/training”)
§Example
// With corpus feature (default): downloads and caches automatically
let corpus = Corpus::get_dataset("kodak")?;

// Discovers images in the cached directory
println!("Found {} images", corpus.len());

pub fn discover_or_download(path: impl AsRef<Path>, _url: Option<&str>, _subsets: Option<&[&str]>) -> Result<Self>

Discover or download a corpus on demand (legacy, requires local path).

If the path exists, discovers images. Otherwise returns an error. Use get_dataset() when the corpus feature is enabled for automatic downloads.

§Arguments
  • path - Local path for the corpus
  • _url - Ignored (for backward compatibility)
  • _subsets - Ignored (for backward compatibility)

pub fn download_dataset(dataset: &str) -> Result<Self>

Download a specific dataset (replaces download_subset).

With the corpus feature enabled, this uses codec-corpus for caching.

§Example
let corpus = Corpus::download_dataset("kodak")?;

pub fn get_or_download(preferred_path: impl AsRef<Path>) -> Result<Self>

Get corpus from local paths (legacy method).

Checks common locations for an existing corpus:

  1. The specified path
  2. ./codec-corpus
  3. ../codec-corpus
  4. ../codec-comparison/codec-corpus

When the corpus feature is enabled, use get_dataset() instead for automatic download and caching.
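The lookup order listed above can be sketched as a first-match search over candidate paths. The helpers `candidate_paths` and `first_existing` are hypothetical names for this illustration, and testing with `is_dir()` is an assumption; the actual method may check existence differently.

```rust
use std::path::{Path, PathBuf};

/// Builds the candidate list in the documented order, starting with
/// the caller's preferred path. Relative entries resolve against the
/// current working directory.
fn candidate_paths(preferred: &Path) -> Vec<PathBuf> {
    vec![
        preferred.to_path_buf(),
        PathBuf::from("./codec-corpus"),
        PathBuf::from("../codec-corpus"),
        PathBuf::from("../codec-comparison/codec-corpus"),
    ]
}

/// Returns the first candidate that exists as a directory, if any.
/// (Assumption: a directory check is what "existing corpus" means here.)
fn first_existing(candidates: &[PathBuf]) -> Option<&PathBuf> {
    candidates.iter().find(|p| p.is_dir())
}
```

Because the relative candidates depend on the working directory, the same call can resolve to different locations depending on where the program is started.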

pub fn load(path: impl AsRef<Path>) -> Result<Self>

Load a corpus from a JSON manifest file.

pub fn save(&self, path: impl AsRef<Path>) -> Result<()>

Save the corpus to a JSON manifest file.

pub fn len(&self) -> usize

Get the number of images in the corpus.

pub fn is_empty(&self) -> bool

Check if the corpus is empty.

pub fn filter_category(&self, category: ImageCategory) -> Vec<&CorpusImage>

Filter images by category.

pub fn filter_format(&self, format: &str) -> Vec<&CorpusImage>

Filter images by format.

pub fn filter_min_size(&self, min_width: u32, min_height: u32) -> Vec<&CorpusImage>

Filter images by minimum dimensions.
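As a rough illustration of the filtering pattern, the sketch below returns borrowed references to the images that meet both minimums. The `Image` struct is a hypothetical stand-in for `CorpusImage`, whose fields are not shown on this page, and requiring both dimensions to pass (rather than either one) is an assumption.

```rust
/// Hypothetical stand-in for `CorpusImage`; the real fields are not shown here.
#[derive(Debug, PartialEq)]
struct Image {
    name: &'static str,
    width: u32,
    height: u32,
}

/// Keeps images whose width AND height both meet the minimums.
/// Returns references, so the source slice is not copied or consumed.
fn filter_min_size(images: &[Image], min_width: u32, min_height: u32) -> Vec<&Image> {
    images
        .iter()
        .filter(|img| img.width >= min_width && img.height >= min_height)
        .collect()
}
```

Returning `Vec<&Image>` mirrors the `Vec<&CorpusImage>` return type in the signature above: the results borrow from the corpus and stay valid only as long as it does.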

pub fn split(&self, train_ratio: f64) -> (Vec<&CorpusImage>, Vec<&CorpusImage>)

Split the corpus into training and validation sets.

Uses a deterministic split based on checksum to ensure reproducibility.

§Arguments
  • train_ratio - Fraction of images to include in training set (0.0-1.0).
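One way a checksum-based deterministic split could work is to hash each checksum into a fixed bucket range and compare the bucket against the ratio. The sketch below is an assumption about the general technique, not the crate's actual algorithm: the `Image` stand-in, the FNV-1a hash, and the 10,000-bucket resolution are all chosen for illustration.

```rust
/// Hypothetical stand-in for `CorpusImage` with a precomputed checksum.
struct Image {
    checksum: String,
}

/// FNV-1a 64-bit hash. It is simple and gives the same value on every
/// run and platform, which the split below relies on.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Maps a checksum to a bucket in [0, 10_000) and assigns the image to
/// the training set when the bucket falls below `train_ratio`.
fn is_train(checksum: &str, train_ratio: f64) -> bool {
    let bucket = fnv1a(checksum.as_bytes()) % 10_000;
    (bucket as f64) < train_ratio * 10_000.0
}

/// Splits into (training, validation). The same image always lands in
/// the same set, regardless of its position in the slice.
fn split(images: &[Image], train_ratio: f64) -> (Vec<&Image>, Vec<&Image>) {
    images.iter().partition(|img| is_train(&img.checksum, train_ratio))
}
```

With this approach the set sizes only approximate `train_ratio`, since each image is assigned independently. In exchange, adding or removing images does not move the remaining ones between sets.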

pub fn compute_checksums(&mut self) -> Result<usize>

Compute checksums for all images that don’t have them.

pub fn find_duplicates(&self) -> Vec<Vec<&CorpusImage>>

Find duplicate images by checksum.
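Grouping by checksum is a natural fit for a map from checksum to the images that share it. The following is a minimal sketch under stated assumptions: `Image` is a hypothetical stand-in for `CorpusImage`, the checksum is modeled as `Option<String>`, and images without a checksum are skipped.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `CorpusImage`; `checksum` is `None` until computed.
struct Image {
    path: &'static str,
    checksum: Option<String>,
}

/// Groups images by checksum and returns only groups with more than one member.
/// Images without a checksum are ignored (an assumption for this sketch).
fn find_duplicates(images: &[Image]) -> Vec<Vec<&Image>> {
    // BTreeMap keeps groups ordered by checksum, so the output order is stable.
    let mut groups: BTreeMap<&str, Vec<&Image>> = BTreeMap::new();
    for img in images {
        if let Some(sum) = img.checksum.as_deref() {
            groups.entry(sum).or_default().push(img);
        }
    }
    groups.into_values().filter(|group| group.len() > 1).collect()
}
```

Under the assumption that unhashed images are skipped, a caller would likely run `compute_checksums()` first so that every image takes part in the comparison.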

pub fn update_category_counts(&mut self)

Update category counts in metadata.

pub fn stats(&self) -> CorpusStats

Get statistics about the corpus.

Trait Implementations§

Source§

impl Clone for Corpus

Source§

fn clone(&self) -> Corpus

Returns a duplicate of the value.
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
Source§

impl Debug for Corpus

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
Source§

impl<'de> Deserialize<'de> for Corpus

Source§

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.
Source§

impl Serialize for Corpus

Source§

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

Auto Trait Implementations§

§

impl Freeze for Corpus

§

impl RefUnwindSafe for Corpus

§

impl Send for Corpus

§

impl Sync for Corpus

§

impl Unpin for Corpus

§

impl UnwindSafe for Corpus

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,