Module random_projection


§Random Projections

These algorithms build a low-distortion embedding of the input data in a low-dimensional Euclidean space by projecting the data onto a random subspace. The projection is realized by a randomly generated matrix (either Gaussian or sparse), and its guarantees follow from the Johnson-Lindenstrauss lemma.

This result states that if the dimension of the embedding is Ω(log(n_samples) / eps^2), then with high probability the projection p has distortion less than eps, where eps is a parameter with 0 < eps < 1. “Distortion less than eps” means that for all vectors u, v in the original dataset, (1 - eps) d(u, v) <= d(p(u), p(v)) <= (1 + eps) d(u, v), where d denotes the Euclidean distance between two vectors.

Note that the dimension of the embedding does not depend on the original dimension of the data set (the number of features).
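This behavior can be checked empirically. Below is a minimal, std-only sketch (the `jl_min_dim` helper, the toy RNG, and all names are illustrative assumptions, not part of this crate): it evaluates the classical Johnson-Lindenstrauss bound k >= 4 ln(n) / (eps^2/2 - eps^3/3), which indeed depends only on the number of samples and eps, then projects Gaussian data through a k x d Gaussian matrix scaled by 1/sqrt(k) and measures the worst pairwise distortion.

```rust
/// Classical JL bound on the embedding dimension: note it takes no
/// `n_features` argument, only `n_samples` and `eps`. (Illustrative helper.)
fn jl_min_dim(n_samples: usize, eps: f64) -> usize {
    let denom = eps * eps / 2.0 - eps * eps * eps / 3.0;
    (4.0 * (n_samples as f64).ln() / denom).ceil() as usize
}

/// Tiny deterministic LCG plus Box-Muller, standing in for a real RNG.
struct Rng(u64);
impl Rng {
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((self.0 >> 11) as f64) / ((1u64 << 53) as f64)
    }
    fn gauss(&mut self) -> f64 {
        let (u1, u2) = (self.next_f64().max(1e-12), self.next_f64());
        (-2.0 * u1.ln()).sqrt() * (std::f64::consts::TAU * u2).cos()
    }
}

fn main() {
    let (n, d, eps) = (50, 1000, 0.5);
    let k = jl_min_dim(n, eps);
    println!("embedding dim for n={n}, eps={eps}: {k}");

    let mut rng = Rng(42);
    // n random points in d dimensions, and a k x d Gaussian projection
    // matrix with entries N(0, 1/k) so distances are preserved in expectation.
    let data: Vec<Vec<f64>> = (0..n).map(|_| (0..d).map(|_| rng.gauss()).collect()).collect();
    let proj: Vec<Vec<f64>> = (0..k)
        .map(|_| (0..d).map(|_| rng.gauss() / (k as f64).sqrt()).collect())
        .collect();

    let embed = |x: &Vec<f64>| -> Vec<f64> {
        proj.iter()
            .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum())
            .collect()
    };
    let low: Vec<Vec<f64>> = data.iter().map(embed).collect();

    let dist = |a: &[f64], b: &[f64]| {
        a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
    };
    // Worst observed pairwise distortion |d(p(u), p(v)) / d(u, v) - 1|.
    let mut worst = 0.0f64;
    for i in 0..n {
        for j in (i + 1)..n {
            let ratio = dist(&low[i], &low[j]) / dist(&data[i], &data[j]);
            worst = worst.max((ratio - 1.0).abs());
        }
    }
    println!("worst observed distortion: {worst:.3}");
    assert!(worst < eps);
}
```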

§Comparison with other methods

To obtain a given accuracy on a given task, random projections often require a larger embedding dimension than other dimensionality-reduction methods such as PCA. However, random projections have a very low computational cost, since they consist only of sampling a random matrix, whereas PCA requires computing the pseudoinverse of a large matrix, which is computationally expensive.
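The sparse variant makes the sampling step even cheaper, since most entries are zero. Below is a minimal sketch of one classical sparse construction (Achlioptas-style, density 1/3); the `sparse_projection` helper and its toy RNG are illustrative assumptions, not this crate's API:

```rust
/// Sample a k x d sparse projection matrix whose entries are
/// +sqrt(3/k) with probability 1/6, -sqrt(3/k) with probability 1/6,
/// and 0 with probability 2/3 (Achlioptas construction; illustrative helper).
fn sparse_projection(k: usize, d: usize, seed: u64) -> Vec<Vec<f64>> {
    // Tiny deterministic LCG for uniform samples in [0, 1).
    let mut state = seed;
    let mut uniform = move || {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((state >> 11) as f64) / ((1u64 << 53) as f64)
    };
    let scale = (3.0 / k as f64).sqrt();
    (0..k)
        .map(|_| {
            (0..d)
                .map(|_| {
                    let u = uniform();
                    if u < 1.0 / 6.0 {
                        scale
                    } else if u < 1.0 / 3.0 {
                        -scale
                    } else {
                        0.0
                    }
                })
                .collect()
        })
        .collect()
}

fn main() {
    let m = sparse_projection(100, 300, 7);
    // Roughly two thirds of the entries should be exactly zero.
    let zeros = m.iter().flatten().filter(|&&v| v == 0.0).count();
    println!("zero fraction: {:.3}", zeros as f64 / (100.0 * 300.0));
}
```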

Structs§

RandomProjection
Embedding via random projection
RandomProjectionParams
Random projection hyperparameters
RandomProjectionValidParams
Random projection hyperparameters

Type Aliases§

GaussianRandomProjection
GaussianRandomProjectionParams
GaussianRandomProjectionValidParams
SparseRandomProjection
SparseRandomProjectionParams
SparseRandomProjectionValidParams