bhtsne

bhtsne provides two parallel implementations of the t-SNE algorithm: an exact version and an approximate version based on the Barnes-Hut algorithm.

The implementation supports custom data types and user-defined metrics. See tSNE for more details.
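
As a rough illustration of a user-defined metric, the sketch below computes the Manhattan (L1) distance between two f32 samples using only the standard library. The name `manhattan` is hypothetical and not part of bhtsne; the sketch assumes a metric receives two samples and returns an f32, as the Euclidean closure in the example on this page does.

```rust
/// Manhattan (L1) distance between two equally long samples.
/// `manhattan` is a hypothetical helper, not part of the bhtsne API.
fn manhattan(a: &[f32], b: &[f32]) -> f32 {
    // Sum of absolute coordinate-wise differences.
    a.iter().zip(b.iter()).map(|(x, y)| (x - y).abs()).sum()
}
```

Assuming samples are f32 slices, a closure such as `|a, b| manhattan(a, b)` could presumably be passed in place of the Euclidean one.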

This crate also includes load_csv, a convenience function that parses data, record by record, from a CSV file.
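
To make the record-by-record parsing concrete, here is a minimal standard-library sketch of the same idea applied to in-memory text. It is not the crate's load_csv: the function `parse_csv_text` and its parameter names are hypothetical, it does no quoting or error handling, and it only mirrors the options the example on this page uses (skip a header row, skip some columns, parse each field with a closure).

```rust
/// Parses CSV text record by record into a flat vector.
/// Hypothetical stand-in for illustration; not bhtsne's `load_csv`.
fn parse_csv_text<T>(
    text: &str,
    has_headers: bool,
    skip_cols: Option<&[usize]>,
    parse: impl Fn(&str) -> T,
) -> Vec<T> {
    text.lines()
        // Optionally drop the header row.
        .skip(if has_headers { 1 } else { 0 })
        // Ignore blank lines.
        .filter(|line| !line.trim().is_empty())
        .flat_map(|line| {
            line.split(',')
                .enumerate()
                // Drop the columns listed in `skip_cols`, if any.
                .filter(|(i, _)| !skip_cols.map_or(false, |cols| cols.contains(i)))
                .map(|(_, field)| parse(field.trim()))
                .collect::<Vec<T>>()
        })
        .collect()
}
```

The result is a flat vector in row-major order, which is the layout the example on this page splits into samples with `chunks`.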

Example

use bhtsne;

const N: usize = 150;         // Number of vectors to embed.
const D: usize = 4;           // The dimensionality of the
                              // original space.
const THETA: f32 = 0.5;       // Parameter used by the Barnes-Hut algorithm.
                              // Smaller values improve accuracy but increase computation time.
const PERPLEXITY: f32 = 10.0; // Perplexity of the conditional distribution.
const EPOCHS: usize = 2000;   // Number of fitting iterations.
const NO_DIMS: u8 = 2;        // Dimensionality of the embedded space.

// Loads the data from a CSV file, skipping the first row (the headers)
// and the 5th column (the class label).
// Note that you can also switch to f64 for higher precision.
let data: Vec<f32> = bhtsne::load_csv("iris.csv", true, Some(&[4]), |float| {
    float.parse().unwrap()
})?;
let samples: Vec<&[f32]> = data.chunks(D).collect();

// Executes the Barnes-Hut approximation of the algorithm and writes the embedding to the
// specified csv file.
bhtsne::tSNE::new(&samples)
    .embedding_dim(NO_DIMS)
    .perplexity(PERPLEXITY)
    .epochs(EPOCHS)
    .barnes_hut(THETA, |sample_a, sample_b| {
        sample_a
            .iter()
            .zip(sample_b.iter())
            .map(|(a, b)| (a - b).powi(2))
            .sum::<f32>()
            .sqrt()
    })
    .write_csv("iris_embedding.csv")?;
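
The example keeps all values in one flat vector and views it as samples of D values each. The sketch below isolates that step using only the standard library; `as_samples` is a hypothetical helper name, not part of bhtsne.

```rust
/// Views a flat, row-major buffer as samples of `d` values each.
/// `as_samples` is a hypothetical helper, not part of the bhtsne API.
fn as_samples(data: &[f32], d: usize) -> Vec<&[f32]> {
    // `chunks` panics if `d` is zero. If `data.len()` is not a multiple
    // of `d`, the last chunk is shorter than `d`.
    data.chunks(d).collect()
}
```

Because a trailing partial chunk is kept rather than dropped, it may be worth checking that the data length is a multiple of D before fitting. `chunks_exact` is a standard-library alternative that omits the remainder.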

Structs

tSNE: t-distributed stochastic neighbor embedding. Provides parallel implementations of both the exact algorithm and a tree-accelerated version based on space-partitioning trees.

Functions

load_csv: Loads data from a CSV file.