Crate mnist

A crate for parsing the MNIST data set into vectors to be used by Rust programs.

About the MNIST Database

The MNIST database (Mixed National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. (Wikipedia)

The MNIST data set contains 70,000 images of handwritten digits and their corresponding labels. The images are 28x28 pixels, with grayscale values from 0 to 255. The labels are the digits from 0 to 9. By default, 60,000 of these images belong to the training set and 10,000 belong to the test set.

Setup

The MNIST data set is distributed as four gzip files, which can be downloaded from the MNIST database website. There is one file for each of the following: the training set images, the training set labels, the test set images, and the test set labels. Because of space limitations, the files themselves are not included in this crate, so the four files must be downloaded and extracted separately. By default, the crate looks for them in a "data" directory at the top level of your crate.
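Before loading, it can help to check that the extracted files are where the crate will look for them. The sketch below assumes the standard MNIST file names with the ".gz" suffix removed and a "data" directory relative to the working directory; `MNIST_FILES` and `missing_mnist_files` are hypothetical helpers, not part of this crate, and the exact names the crate expects are an assumption.

```rust
use std::path::Path;

/// Standard MNIST file names after extracting the .gz archives.
/// Assumption: these are the names the crate looks for.
const MNIST_FILES: [&str; 4] = [
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
];

/// Hypothetical helper: returns the expected file names that are not
/// present as regular files inside `dir`.
fn missing_mnist_files(dir: &Path) -> Vec<&'static str> {
    MNIST_FILES
        .iter()
        .copied()
        .filter(|name| !dir.join(name).is_file())
        .collect()
}

fn main() {
    // Assumed default location: a "data" directory in the current working directory.
    let dir = Path::new("data");
    let missing = missing_mnist_files(dir);
    if missing.is_empty() {
        println!("All MNIST files found in {}", dir.display());
    } else {
        println!("Missing from {}: {:?}", dir.display(), missing);
    }
}
```

Running this with `cargo run` from the top of your crate checks the same relative "data" path.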

Usage

A Mnist struct is used to represent the various sets of data. In machine learning, it is common to have three sets of data:

  • Training Set - Used to train a classifier.
  • Validation Set - Used to tune the training process, for example to choose hyperparameters or decide when to stop training (this set is not included in the default MNIST data set partitioning).
  • Test Set - Used after the training process to estimate how well the classifier performs on data it has never seen.
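Since the default partitioning has no validation set, one must be carved out of the 70,000 examples. The sketch below shows one simple way to do that, a sequential split by index ranges. It is an illustration only and does not claim to be how MnistBuilder partitions the data internally; `Split` and `split_indices` are hypothetical names.

```rust
use std::ops::Range;

/// Hypothetical type: index ranges for a train/validation/test split.
#[derive(Debug, PartialEq)]
struct Split {
    train: Range<usize>,
    validation: Range<usize>,
    test: Range<usize>,
}

/// Hypothetical helper: assigns the first `trn` examples to training, the
/// next `val` to validation, and the following `tst` to testing.
/// Returns None if the requested lengths add up to more than `total`.
fn split_indices(total: usize, trn: usize, val: usize, tst: usize) -> Option<Split> {
    if trn.checked_add(val)?.checked_add(tst)? > total {
        return None;
    }
    Some(Split {
        train: 0..trn,
        validation: trn..trn + val,
        test: trn + val..trn + val + tst,
    })
}

fn main() {
    // 70,000 examples split 50,000 / 10,000 / 10,000, the same lengths used
    // in the Examples section.
    let split = split_indices(70_000, 50_000, 10_000, 10_000).unwrap();
    println!("{:?}", split);
}
```

A sequential split keeps the bookkeeping trivial; a random split would require shuffling images and labels with the same permutation.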

Each set of data contains a vector of images and a vector of labels. The vectors are always completely flattened. For example, the default training set contains 60,000 images, so its image vector has 60,000 images x 28 rows x 28 cols = 47,040,000 elements.
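Because the image vector is flattened, image i starts at element i x 784, and pixel (row, col) of that image sits at i x 784 + row x 28 + col. The sketch below works on a plain `&[u8]` standing in for an image vector such as `trn_img`; `pixel_index` and `image_slice` are hypothetical helpers, not part of this crate.

```rust
/// MNIST image dimensions.
const ROWS: usize = 28;
const COLS: usize = 28;
/// Number of pixels per image (28 x 28 = 784).
const PIXELS: usize = ROWS * COLS;

/// Position of pixel (`row`, `col`) of image number `image` in a flattened image vector.
fn pixel_index(image: usize, row: usize, col: usize) -> usize {
    image * PIXELS + row * COLS + col
}

/// Borrows the 784 pixels that make up image number `image`.
/// Panics if the vector holds fewer than `image + 1` images.
fn image_slice(images: &[u8], image: usize) -> &[u8] {
    &images[image * PIXELS..(image + 1) * PIXELS]
}

fn main() {
    // Flattened length of the default 60,000-image training set.
    println!("{}", 60_000 * PIXELS);

    // Two stand-in images instead of real MNIST data: all 0s, then all 255s.
    let mut images = vec![0u8; 2 * PIXELS];
    for p in &mut images[PIXELS..] {
        *p = 255;
    }
    println!("first pixel of image 1: {}", images[pixel_index(1, 0, 0)]);
    println!("image 1 has {} pixels", image_slice(&images, 1).len());
}
```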

The MnistBuilder struct configures how the MNIST data is formatted, retrieves the data, and returns a Mnist struct. Configuration options include:

  • where to look for the MNIST data files.
  • how to format the label matrices.
  • how to partition the data between the training, validation, and test sets.
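With label_format_digit(), as in the Examples section, each label is a single u8 holding the digit. A common alternative layout is one-hot encoding, where each label becomes 10 elements with a 1 at the digit's position. The sketch below converts digit labels to that layout by hand; `to_one_hot` is a hypothetical helper and is not meant to describe this crate's own label formats.

```rust
/// Hypothetical helper: converts digit labels (one u8 per image, 0-9) into a
/// flattened one-hot vector with 10 elements per image.
fn to_one_hot(labels: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; labels.len() * 10];
    for (i, &digit) in labels.iter().enumerate() {
        assert!(digit < 10, "MNIST labels must be between 0 and 9");
        // Set the element for this digit within the i-th block of 10.
        out[i * 10 + digit as usize] = 1;
    }
    out
}

fn main() {
    // Labels for two images: a 3 and a 0.
    let one_hot = to_one_hot(&[3, 0]);
    println!("{:?}", one_hot);
}
```

The result stays flattened, so label i occupies elements i x 10 through i x 10 + 9.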

Examples

extern crate mnist;
extern crate rulinalg;

use mnist::{Mnist, MnistBuilder};
use rulinalg::matrix::{BaseMatrix, BaseMatrixMut, Matrix};

fn main() {
    let (trn_size, rows, cols) = (50_000, 28, 28);

    // Destructure the returned Mnist struct, keeping only the training set.
    let Mnist { trn_img, trn_lbl, .. } = MnistBuilder::new()
        .label_format_digit()
        .training_set_length(trn_size)
        .validation_set_length(10_000)
        .test_set_length(10_000)
        .finalize();

    // Get the label of the first digit.
    let first_label = trn_lbl[0];
    println!("The first digit is a {}.", first_label);

    // Convert the flattened training images vector to a matrix.
    let trn_img = Matrix::new((trn_size * rows) as usize, cols as usize, trn_img);

    // Get the image of the first digit, which occupies the first 28 rows.
    let row_indexes = (0..rows as usize).collect::<Vec<usize>>();
    let first_image = trn_img.select_rows(&row_indexes);
    println!("The image looks like... \n{}", first_image);

    // Convert the training images to f32 values scaled between 0 and 1.
    let trn_img: Matrix<f32> = trn_img.try_into().unwrap() / 255.0;

    // Get the image of the first digit and round the values to the nearest tenth.
    let first_image = trn_img.select_rows(&row_indexes)
        .apply(&|p| (p * 10.0).round() / 10.0);
    println!("The image looks like... \n{}", first_image);
}
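rulinalg is only needed for the matrix view. The sketch below performs the same scaling to values between 0 and 1 on a plain slice and renders one image as text, using only the standard library. `normalize` and `render` are hypothetical helpers, the 0.5 threshold is an arbitrary choice, and a synthetic image stands in for real data so the sketch runs without the data files.

```rust
const ROWS: usize = 28;
const COLS: usize = 28;

/// Scales raw pixel values (0 to 255) to f32 values between 0 and 1.
fn normalize(pixels: &[u8]) -> Vec<f32> {
    pixels.iter().map(|&p| p as f32 / 255.0).collect()
}

/// Renders one 28x28 image as text: '#' for values >= 0.5, '.' otherwise.
fn render(image: &[f32]) -> String {
    assert_eq!(image.len(), ROWS * COLS);
    image
        .chunks(COLS)
        .map(|row| row.iter().map(|&p| if p >= 0.5 { '#' } else { '.' }).collect::<String>())
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    // Synthetic image with a vertical stroke in column 14. With real data,
    // this would be the first ROWS * COLS elements of the trn_img vector.
    let mut raw = vec![0u8; ROWS * COLS];
    for r in 4..24 {
        raw[r * COLS + 14] = 255;
    }
    let image = normalize(&raw);
    println!("{}", render(&image));
}
```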

Structs

Mnist

Struct containing image and label vectors for the training, validation, and test sets.

MnistBuilder

Struct used for configuring how to load the MNIST data.