Struct EmbedderParams


pub struct EmbedderParams {
    pub asked_dim: usize,
    pub dmap_init: bool,
    pub beta: f64,
    pub b: f64,
    pub scale_rho: f64,
    pub grad_step: f64,
    pub nb_sampling_by_edge: usize,
    pub nb_grad_batch: usize,
    pub grad_factor: usize,
    pub hierarchy_layer: usize,
    pub hubness_weighting: bool,
}


The model used in the embedding is briefly described below.

§Definition of the weight of an edge of the graph to embed

First we define the local scale $\rho$ around a point. It is the mean distance from a point to its nearest neighbour, taken over the node under consideration and all its knbn neighbours. So $\rho$ is the mean of nearest-neighbour distances over knbn + 1 points around the current point.
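As a minimal sketch of this definition (not annembed's code), the hypothetical helper `local_scale` below takes the nearest-neighbour distances of the node and of its knbn neighbours and returns their mean. Gathering those distances from the k-NN graph is left out.

```rust
/// Local scale rho around a node (sketch, hypothetical helper).
/// `nn_dist` holds the nearest-neighbour distance of the node itself
/// followed by those of its knbn neighbours, i.e. knbn + 1 values.
fn local_scale(nn_dist: &[f64]) -> Option<f64> {
    if nn_dist.is_empty() {
        return None;
    }
    let sum: f64 = nn_dist.iter().sum();
    Some(sum / nn_dist.len() as f64)
}

fn main() {
    // a node with knbn = 3 neighbours gives 4 nearest-neighbour distances
    let nn_dist = [0.5, 1.0, 1.5, 1.0];
    println!("rho = {:?}", local_scale(&nn_dist));
}
```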

Let $(d_{i})_{i=0..k}$ be the distances from a node n to its neighbours, sorted in increasing order. The weight of the edge to neighbour i is $$w_{i} = \exp\left(- \left(\frac{d_{i} - d_{0}}{S \cdot \rho}\right)^{\beta} \right)$$

S is a scale factor modulating $\rho$. The weights are then normalized to a probability distribution.

Before normalization, $w_{0}$ is therefore always equal to 1. Increasing β to 2 makes the weights $w_{i}$ decrease faster (for neighbours with $d_{i} - d_{0} > S \cdot \rho$), so after normalization the range of weights from $w_{0}$ to $w_{k}$ is larger. Reducing S has a similar effect. However, the smallest edge weight must not fall below $10^{-5}$; this limits the range of weights and avoids numerical difficulties in the SVD. The code stops with an error if this happens, so any combined adjustment of $\beta$ and S must respect this constraint.
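The weight computation can be sketched as follows. This is an illustration under assumptions, not the crate's implementation: `edge_weights` and `MIN_WEIGHT` are hypothetical names, the distances are assumed already sorted, and the $10^{-5}$ bound is assumed to apply to the weights before normalization.

```rust
/// Lower bound on an edge weight (assumed to apply before normalization).
const MIN_WEIGHT: f64 = 1.0e-5;

/// Sketch of the original-space edge weights (hypothetical helper).
/// `dist` holds d_0..d_k sorted in increasing order, `rho` is the local scale,
/// `scale` is S and `beta` is the exponent. Returns normalized weights, or an
/// error if the smallest unnormalized weight falls below MIN_WEIGHT.
fn edge_weights(dist: &[f64], rho: f64, scale: f64, beta: f64) -> Result<Vec<f64>, String> {
    if dist.is_empty() {
        return Err("no neighbours".to_string());
    }
    let d0 = dist[0];
    // w_i = exp(-((d_i - d_0) / (S * rho))^beta), so w_0 is always 1
    let weights: Vec<f64> = dist
        .iter()
        .map(|&d| (-((d - d0) / (scale * rho)).powf(beta)).exp())
        .collect();
    let w_min = weights.iter().cloned().fold(f64::INFINITY, f64::min);
    if w_min < MIN_WEIGHT {
        return Err(format!("smallest edge weight {:e} is below {:e}", w_min, MIN_WEIGHT));
    }
    // normalize to a probability distribution
    let total: f64 = weights.iter().sum();
    Ok(weights.iter().map(|w| w / total).collect())
}

fn main() {
    let dist = [1.0, 1.2, 1.5, 2.0];
    match edge_weights(&dist, 1.0, 0.5, 1.0) {
        Ok(w) => println!("normalized weights: {:?}", w),
        Err(e) => eprintln!("error: {}", e),
    }
}
```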

Note that setting the scale as described above and renormalizing to a probability distribution gives a perplexity nearly equal to the number of neighbours.
This can be verified with logging (implemented with the crates env_logger and log) by setting RUST_LOG=annembed=INFO in your environment. Quantile summaries are then given for the distributions of edge distances, edge weights, and node perplexities. This helps in adjusting the parameters β and S and shows their impact on these quantiles.
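Perplexity is usually defined as $\exp(H)$, with $H$ the Shannon entropy of the distribution. Assuming that definition (the crate's exact formula is not shown here), the hypothetical helper `perplexity` illustrates why a nearly uniform distribution over k neighbours has a perplexity close to k, while a more peaked one has a smaller perplexity.

```rust
/// Perplexity exp(H) of a discrete distribution, with H the Shannon entropy
/// in nats; zero probabilities contribute nothing to H.
fn perplexity(p: &[f64]) -> f64 {
    let h: f64 = p
        .iter()
        .filter(|&&x| x > 0.0)
        .map(|&x| -x * x.ln())
        .sum();
    h.exp()
}

fn main() {
    // a uniform distribution over 4 neighbours has perplexity 4
    println!("{}", perplexity(&[0.25; 4]));
    // a more peaked distribution over 4 neighbours has a smaller perplexity
    println!("{}", perplexity(&[0.7, 0.1, 0.1, 0.1]));
}
```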

Default values:

$\beta = 1$, so that the weights are exponential, similar to UMAP.

$S = 0.5$

It is also possible to set β to 2 to get more Gaussian weights, or to reduce it to 0.5, adjusting S so that the constraint on edge weights is respected.
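The constraint on the smallest weight can be turned into a lower bound on S. Assuming the bound applies to the unnormalized weight of the farthest neighbour $w_{k}$, the condition $w_{k} \ge 10^{-5}$ is equivalent to $S \ge (d_{k} - d_{0}) / (\rho \, (\ln 10^{5})^{1/\beta})$. The hypothetical helper `min_scale` computes this bound as a sketch:

```rust
/// Smallest scale factor S (hypothetical helper) for which the unnormalized
/// weight of the farthest neighbour, exp(-((d_k - d_0) / (S * rho))^beta),
/// stays at or above `min_weight`.
/// From ((d_k - d_0) / (S * rho))^beta <= ln(1 / min_weight) we get
/// S >= (d_k - d_0) / (rho * ln(1 / min_weight)^(1 / beta)).
fn min_scale(d0: f64, dk: f64, rho: f64, beta: f64, min_weight: f64) -> f64 {
    let log_bound = (1.0 / min_weight).ln();
    (dk - d0) / (rho * log_bound.powf(1.0 / beta))
}

fn main() {
    for beta in [0.5, 1.0, 2.0] {
        let s = min_scale(1.0, 10.0, 1.0, beta, 1.0e-5);
        println!("beta = {beta}: S must be at least {s:.4}");
    }
}
```

For $d_{0} = 1$, $d_{k} = 10$ and $\rho = 1$, the bound is about 0.78 for β = 1 and about 2.65 for β = 2, while β = 0.5 brings it down to about 0.07; in this example, moving to β = 2 therefore requires a larger S.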

§Definition of the weight of an edge of the embedded graph

The weight of an embedded edge has the usual expression: $$ w(x,y) = \frac{1}{1 + \left\| (x - y)/a_{x} \right\|^{2b}} $$

By default b = 1. The coefficient $a_{x}$ is deduced from the scale coefficient in the original space, with some restriction to avoid overly large fluctuations.
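A direct transcription of this formula, as a sketch: `embedded_weight` is a hypothetical name, and $a_{x}$ is passed in as a plain number because the way the crate derives it from the original-space scale is not reproduced here.

```rust
/// Embedded-space edge weight w(x, y) = 1 / (1 + ||(x - y) / a_x||^(2b)).
fn embedded_weight(x: &[f64], y: &[f64], a_x: f64, b: f64) -> f64 {
    assert_eq!(x.len(), y.len(), "points must have the same dimension");
    // squared Euclidean norm of (x - y) / a_x
    let norm2: f64 = x
        .iter()
        .zip(y)
        .map(|(xi, yi)| ((xi - yi) / a_x).powi(2))
        .sum();
    // ||v||^(2b) = (||v||^2)^b
    1.0 / (1.0 + norm2.powf(b))
}

fn main() {
    let x = [0.0, 0.0];
    let y = [3.0, 4.0];
    // with a_x = 1 and b = 1 the squared distance is 25, so w = 1/26
    println!("{}", embedded_weight(&x, &y, 1.0, 1.0));
}
```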

Initial step of the gradient and number of batches

For the MNIST digits data, around 10 to 20 batches seem sufficient. The initial gradient step $\gamma_{0}$ can be chosen around 1 (in the range 1/5 to 5).
Reasonably, it should satisfy nb_batch $\cdot \gamma_{0} < 1$.

asked_dim: the embedding dimension, default 2.

§The optimization of the embedding

The embedding is optimized by minimizing the cross entropy (currently the Shannon cross entropy) between the distributions of original and embedded edge weights. The minimization is done by standard (multithreaded) stochastic gradient descent, with negative sampling for the unobserved edges (see Mnih and Teh, or Mikolov et al.).

The number of negative samples per edge is fixed at 5.
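To illustrate the principle, here is a sketch of one stochastic gradient step for a single embedded point, in the UMAP-style formulation of cross entropy with negative sampling. It is not annembed's gradient: it fixes b = 1 and $a_{x} = 1$, assumes positive edges are sampled in proportion to their original weights (so the attractive term carries no explicit weight), and takes the negative samples as input instead of drawing them at random. `sgd_step`, `sq_dist` and `NB_NEG` are hypothetical names.

```rust
/// Number of negative samples used per positive edge, as stated above.
const NB_NEG: usize = 5;

fn sq_dist(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(a, b)| (a - b).powi(2)).sum()
}

/// One gradient step on `x`, with q(x, y) = 1 / (1 + ||x - y||^2).
/// Attraction: minimize -ln q(x, y) for the observed edge (x, y).
/// Repulsion: minimize -ln(1 - q(x, z)) for each negative sample z.
fn sgd_step(x: &mut [f64], y: &[f64], negatives: &[&[f64]], step: f64) {
    let mut grad = vec![0.0; x.len()];
    // d/dx ln(1 + d2) = 2 (x - y) / (1 + d2)
    let d2 = sq_dist(x, y);
    for k in 0..x.len() {
        grad[k] += 2.0 * (x[k] - y[k]) / (1.0 + d2);
    }
    for z in negatives.iter().take(NB_NEG) {
        // d/dx -ln(1 - q) = -2 (x - z) / (d2 (1 + d2)); d2 is floored to avoid division by 0
        let d2 = sq_dist(x, z).max(1.0e-3);
        for k in 0..x.len() {
            grad[k] -= 2.0 * (x[k] - z[k]) / (d2 * (1.0 + d2));
        }
    }
    for k in 0..x.len() {
        x[k] -= step * grad[k];
    }
}

fn main() {
    let mut x = vec![0.0, 0.0];
    let y = [1.0, 0.0];
    let z = [-1.0, 0.0];
    // x moves towards its neighbour y and away from the negative sample z
    sgd_step(&mut x, &y, &[&z[..]], 0.1);
    println!("{:?}", x);
}
```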

Expression of the gradient

Here are the main parameters driving the embedding.

Fields§

§asked_dim: usize

Embedding dimension. Default is 2.

§dmap_init: bool

Whether the embedder is initialized by a diffusion map step. Default is true.

§beta: f64

Exponent β used to define edge weights in the original graph, typically 0.5 or 1. Default is 1.

§b: f64

Exponent b used in the embedded space. Default is 1.

§scale_rho: f64

Embedded scale factor. Default is 1.

§grad_step: f64

Initial gradient step. Default is 2.

§nb_sampling_by_edge: usize

Number of samplings per edge in a gradient step. Default is 10.

§nb_grad_batch: usize

Number of gradient batches. Default is 15.

§grad_factor: usize

In the hierarchical case, the number of gradient batches is nb_grad_batch multiplied by grad_factor. As the first iterations run on few points, more iterations can be afforded. Default is 4.

§hierarchy_layer: usize

If the layer is greater than 0, the initialization is hierarchical.
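Reading the grad_factor and hierarchy_layer descriptions together, the effective number of gradient batches could be computed as in this small sketch. The helper name is hypothetical, and whether the crate applies the factor to every layer or only to the first ones is not stated here.

```rust
/// Effective number of gradient batches (hypothetical helper):
/// nb_grad_batch * grad_factor when the initialization is hierarchical
/// (hierarchy_layer > 0), nb_grad_batch otherwise.
fn effective_nb_batches(nb_grad_batch: usize, grad_factor: usize, hierarchy_layer: usize) -> usize {
    if hierarchy_layer > 0 {
        nb_grad_batch * grad_factor
    } else {
        nb_grad_batch
    }
}

fn main() {
    // documented defaults: nb_grad_batch = 15, grad_factor = 4
    println!("flat: {}", effective_nb_batches(15, 4, 0));
    println!("hierarchical: {}", effective_nb_batches(15, 4, 1));
}
```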

§hubness_weighting: bool

Set to true to draw negative samples from a node distribution given by hubness weights.
Default is false.
It slightly improves the quality measured by the quality estimator.
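A sketch of hubness-weighted node sampling follows, under assumptions: hubness is taken as the usual k-occurrence count (how often a node appears in other nodes' neighbour lists), negative samples are drawn with probability proportional to it, and the uniform number u is passed in rather than drawn from a random generator. All function names are hypothetical.

```rust
/// Hubness of each node: how many times it appears in the other nodes'
/// neighbour lists (k-occurrence count, assumed definition).
fn hubness(knn: &[Vec<usize>], nb_nodes: usize) -> Vec<f64> {
    let mut counts = vec![0.0; nb_nodes];
    for neighbours in knn {
        for &j in neighbours {
            counts[j] += 1.0;
        }
    }
    counts
}

/// Cumulative distribution of node weights, normalized so the last entry is 1.
fn cumulative(weights: &[f64]) -> Vec<f64> {
    let total: f64 = weights.iter().sum();
    let mut acc = 0.0;
    weights.iter().map(|w| { acc += w / total; acc }).collect()
}

/// Draw a node given a uniform number u in [0, 1): the first index whose
/// cumulative probability exceeds u.
fn sample_node(cdf: &[f64], u: f64) -> usize {
    let idx = cdf.partition_point(|&c| c <= u);
    // guard against rounding in the last cumulative value
    idx.min(cdf.len() - 1)
}

fn main() {
    // node 0 never appears as a neighbour, so it is never drawn
    let knn = vec![vec![1, 2], vec![2], vec![1]];
    let cdf = cumulative(&hubness(&knn, 3));
    for u in [0.1, 0.6, 0.9] {
        println!("u = {u}: node {}", sample_node(&cdf, u));
    }
}
```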


Implementations§

impl EmbedderParams

pub fn default() -> Self

pub fn log(&self)

pub fn set_dmap_init(&mut self, val: bool)

pub fn set_nb_gradient_batch(&mut self, nb_batch: usize)

pub fn set_dim(&mut self, dim: usize)

pub fn set_nb_edge_sampling(&mut self, nb_sample_by_edge: usize)

pub fn get_dimension(&self) -> usize

pub fn set_hierarchy_layer(&mut self, layer: usize)

pub fn get_hierarchy_layer(&self) -> usize

Trait Implementations§

impl Clone for EmbedderParams

fn clone(&self) -> EmbedderParams

fn clone_from(&mut self, source: &Self)

impl Copy for EmbedderParams

Auto Trait Implementations§

impl Freeze for EmbedderParams

impl RefUnwindSafe for EmbedderParams

impl Send for EmbedderParams

impl Sync for EmbedderParams

impl Unpin for EmbedderParams

impl UnsafeUnpin for EmbedderParams

impl UnwindSafe for EmbedderParams

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> Conv for T

fn conv<T>(self) -> T
where Self: Into<T>,

impl<T> FmtForward for T

fn fmt_binary(self) -> FmtBinary<Self>
where Self: Binary,

fn fmt_display(self) -> FmtDisplay<Self>
where Self: Display,

fn fmt_lower_exp(self) -> FmtLowerExp<Self>
where Self: LowerExp,

fn fmt_lower_hex(self) -> FmtLowerHex<Self>
where Self: LowerHex,

fn fmt_octal(self) -> FmtOctal<Self>
where Self: Octal,

fn fmt_pointer(self) -> FmtPointer<Self>
where Self: Pointer,

fn fmt_upper_exp(self) -> FmtUpperExp<Self>
where Self: UpperExp,

fn fmt_upper_hex(self) -> FmtUpperHex<Self>
where Self: UpperHex,

fn fmt_list(self) -> FmtList<Self>
where &'a Self: for<'a> IntoIterator,

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

impl<T> Pipe for T
where T: ?Sized,

fn pipe<R>(self, func: impl FnOnce(Self) -> R) -> R
where Self: Sized,

fn pipe_ref<'a, R>(&'a self, func: impl FnOnce(&'a Self) -> R) -> R
where R: 'a,

fn pipe_ref_mut<'a, R>(&'a mut self, func: impl FnOnce(&'a mut Self) -> R) -> R
where R: 'a,

fn pipe_borrow<'a, B, R>(&'a self, func: impl FnOnce(&'a B) -> R) -> R
where Self: Borrow<B>, B: 'a + ?Sized, R: 'a,

fn pipe_borrow_mut<'a, B, R>( &'a mut self, func: impl FnOnce(&'a mut B) -> R, ) -> R
where Self: BorrowMut<B>, B: 'a + ?Sized, R: 'a,

fn pipe_as_ref<'a, U, R>(&'a self, func: impl FnOnce(&'a U) -> R) -> R
where Self: AsRef<U>, U: 'a + ?Sized, R: 'a,

fn pipe_as_mut<'a, U, R>(&'a mut self, func: impl FnOnce(&'a mut U) -> R) -> R
where Self: AsMut<U>, U: 'a + ?Sized, R: 'a,

fn pipe_deref<'a, T, R>(&'a self, func: impl FnOnce(&'a T) -> R) -> R
where Self: Deref<Target = T>, T: 'a + ?Sized, R: 'a,

fn pipe_deref_mut<'a, T, R>( &'a mut self, func: impl FnOnce(&'a mut T) -> R, ) -> R
where Self: DerefMut<Target = T> + Deref, T: 'a + ?Sized, R: 'a,

impl<T> Pointable for T

const ALIGN: usize

type Init = T

unsafe fn init(init: <T as Pointable>::Init) -> usize

unsafe fn deref<'a>(ptr: usize) -> &'a T

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

unsafe fn drop(ptr: usize)

impl<SS, SP> SupersetOf<SS> for SP
where SS: SubsetOf<SP>,

fn to_subset(&self) -> Option<SS>

fn is_in_subset(&self) -> bool

unsafe fn to_subset_unchecked(&self) -> SS

fn from_subset(element: &SS) -> SP

impl<T> Tap for T


fn tap_borrow<B>(self, func: impl FnOnce(&B)) -> Self
where Self: Borrow<B>, B: ?Sized,

fn tap_borrow_mut<B>(self, func: impl FnOnce(&mut B)) -> Self
where Self: BorrowMut<B>, B: ?Sized,

fn tap_ref<R>(self, func: impl FnOnce(&R)) -> Self
where Self: AsRef<R>, R: ?Sized,

fn tap_ref_mut<R>(self, func: impl FnOnce(&mut R)) -> Self
where Self: AsMut<R>, R: ?Sized,

fn tap_deref<T>(self, func: impl FnOnce(&T)) -> Self
where Self: Deref<Target = T>, T: ?Sized,

fn tap_deref_mut<T>(self, func: impl FnOnce(&mut T)) -> Self
where Self: DerefMut<Target = T> + Deref, T: ?Sized,

fn tap_borrow_dbg<B>(self, func: impl FnOnce(&B)) -> Self
where Self: Borrow<B>, B: ?Sized,

fn tap_borrow_mut_dbg<B>(self, func: impl FnOnce(&mut B)) -> Self
where Self: BorrowMut<B>, B: ?Sized,

fn tap_ref_dbg<R>(self, func: impl FnOnce(&R)) -> Self
where Self: AsRef<R>, R: ?Sized,

fn tap_ref_mut_dbg<R>(self, func: impl FnOnce(&mut R)) -> Self
where Self: AsMut<R>, R: ?Sized,

fn tap_deref_dbg<T>(self, func: impl FnOnce(&T)) -> Self
where Self: Deref<Target = T>, T: ?Sized,

fn tap_deref_mut_dbg<T>(self, func: impl FnOnce(&mut T)) -> Self
where Self: DerefMut<Target = T> + Deref, T: ?Sized,

impl<T> ToOwned for T
where T: Clone,

fn try_conv<T>(self) -> Result<T, Self::Error>
where Self: TryInto<T>,

impl<T, U> TryFrom<U> for T
where U: Into<T>,

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

impl<V, T> VZip<V> for T
where V: MultiLane<T>,