Module lace_cc::alg

Data types for choosing different methods of sampling in CrossCat

There are currently four algorithms, each with its own special considerations.

§Gibbs

This is the collapsed Gibbs sampler outlined by Radford Neal. In this method, rows/columns are removed one at a time in random order and reinserted into the table according to their contribution to the partition and their likelihood under each component.
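
Below is a minimal, self-contained sketch of one such sweep over row assignments, using a toy CRP mixture of Bernoulli data with a Beta(1, 1) prior so the posterior predictive has a closed form. The types and function names here are illustrative only; they are not part of the lace_cc API.

```rust
/// Tiny xorshift RNG so the sketch needs no external crates (seed must be nonzero).
struct Rng(u64);
impl Rng {
    fn unif(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Beta(1, 1)-Bernoulli posterior predictive for a single observation.
fn predictive(x: bool, successes: usize, n: usize) -> f64 {
    let p = (successes as f64 + 1.0) / (n as f64 + 2.0);
    if x { p } else { 1.0 - p }
}

/// One sweep: remove each row, then reinsert it according to the CRP
/// partition weight times the predictive likelihood under each component.
/// (A real implementation would also drop emptied components.)
fn gibbs_sweep(data: &[bool], asgn: &mut [usize], n_cats: &mut usize, alpha: f64, rng: &mut Rng) {
    const UNASSIGNED: usize = usize::MAX;
    for i in 0..data.len() {
        asgn[i] = UNASSIGNED; // remove row i from the table

        // CRP weight x predictive for each existing component, plus one
        // extra slot for a brand-new component weighted by alpha.
        let mut weights: Vec<f64> = (0..*n_cats)
            .map(|k| {
                let members: Vec<usize> =
                    (0..data.len()).filter(|&j| asgn[j] == k).collect();
                let succ = members.iter().filter(|&&j| data[j]).count();
                members.len() as f64 * predictive(data[i], succ, members.len())
            })
            .collect();
        weights.push(alpha * predictive(data[i], 0, 0)); // new component

        // Sample the new assignment from the normalized weights.
        let total: f64 = weights.iter().sum();
        let mut u = rng.unif() * total;
        let mut k = 0;
        while k + 1 < weights.len() && u > weights[k] {
            u -= weights[k];
            k += 1;
        }
        if k == *n_cats {
            *n_cats += 1; // row i founded a new component
        }
        asgn[i] = k;
    }
}

fn main() {
    let data = vec![true, true, false, false, true];
    let mut asgn = vec![0usize; data.len()];
    let mut n_cats = 1;
    let mut rng = Rng(0xBAD5EED);
    for _ in 0..100 {
        gibbs_sweep(&data, &mut asgn, &mut n_cats, 1.0, &mut rng);
    }
    println!("assignment: {:?}", asgn);
}
```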

§Pros

  • Simple
  • Valid MCMC

§Cons

  • Slow due to random access and moderately expensive posterior predictive computation
  • Mixes slowly due to moving only one row/column at a time

§Citations

Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2), 249-265.

§Sequential Adaptive Merge-Split (Sams)

Two rows/columns are selected at random. If they are in the same component, a split is proposed using an adaptive restricted Gibbs sweep; if they are in different components, a merge of the two components is proposed.
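
The sketch below illustrates only the split half of such a move, reusing the toy Beta(1, 1)-Bernoulli model from the Gibbs sketch above: it builds a restricted, sequentially allocated split of a component around two anchor rows and returns the probability of having generated that particular split, which the Metropolis-Hastings acceptance ratio would need. The acceptance step and the mirror-image merge branch are omitted, and none of these names come from the lace_cc API.

```rust
/// Tiny xorshift RNG so the sketch needs no external crates (seed must be nonzero).
struct Rng(u64);
impl Rng {
    fn unif(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Beta(1, 1)-Bernoulli posterior predictive for a single observation.
fn predictive(x: bool, successes: usize, n: usize) -> f64 {
    let p = (successes as f64 + 1.0) / (n as f64 + 2.0);
    if x { p } else { 1.0 - p }
}

/// Propose splitting the component containing anchor rows `i` and `j` into
/// two sets by allocating the remaining members one at a time. Returns the
/// two proposed member lists and the probability of this allocation
/// sequence, which the acceptance ratio of a full Sams step would use.
fn propose_split(
    data: &[bool],
    members: &[usize], // rows currently sharing the anchors' component
    i: usize,
    j: usize,
    rng: &mut Rng,
) -> (Vec<usize>, Vec<usize>, f64) {
    let mut set_i = vec![i];
    let mut set_j = vec![j];
    let mut q = 1.0; // probability of this particular split

    for &row in members.iter().filter(|&&r| r != i && r != j) {
        // Restricted Gibbs weight: component size times predictive likelihood.
        let stat = |set: &[usize]| {
            let succ = set.iter().filter(|&&r| data[r]).count();
            set.len() as f64 * predictive(data[row], succ, set.len())
        };
        let (wi, wj) = (stat(&set_i[..]), stat(&set_j[..]));
        let p_i = wi / (wi + wj);
        if rng.unif() < p_i {
            set_i.push(row);
            q *= p_i;
        } else {
            set_j.push(row);
            q *= 1.0 - p_i;
        }
    }
    (set_i, set_j, q)
}
```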

§Pros

  • Fast since only subsets of the data are looked at
  • Potential for big moves since Sams operates on swaths of data

§Cons

  • Many proposals are rejected, especially merge proposals
  • Difficult to do fine-grained moves. It’s best to pair it with another algorithm.

§Citations

Jain, S., & Neal, R. M. (2004). A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of Computational and Graphical Statistics, 13(1), 158-182.

§Finite Approximation (FiniteCpu)

In this algorithm a new component is proposed from the prior, then each row/column is reassigned according to its likelihood under each component. Unlike Sams and Gibbs, this algorithm does not marginalize away the component parameters; it represents them explicitly.
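
A sketch of one such sweep for the same toy Bernoulli model is shown below: component parameters are held explicitly, a fresh component is drawn from the prior, and each row is reassigned independently by its likelihood under every component, which is what makes the per-row loop easy to parallelize. The names are illustrative and not the lace_cc API.

```rust
/// Tiny xorshift RNG so the sketch needs no external crates (seed must be nonzero).
struct Rng(u64);
impl Rng {
    fn unif(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// One finite-approximation sweep: `thetas` are the explicitly represented
/// Bernoulli parameters, one per component.
fn finite_sweep(data: &[bool], asgn: &mut [usize], thetas: &mut Vec<f64>, rng: &mut Rng) {
    // Propose a brand-new component by drawing its parameter from the
    // Beta(1, 1) prior (i.e. Uniform(0, 1)).
    thetas.push(rng.unif());

    for (i, &x) in data.iter().enumerate() {
        // Likelihood of row i under each explicitly represented component.
        // Each row's weights depend only on the fixed parameter vector, so
        // in practice this loop is run in parallel.
        let weights: Vec<f64> = thetas
            .iter()
            .map(|&t| if x { t } else { 1.0 - t })
            .collect();

        // Sample the assignment proportionally to the weights.
        let total: f64 = weights.iter().sum();
        let mut u = rng.unif() * total;
        let mut k = 0;
        while k + 1 < weights.len() && u > weights[k] {
            u -= weights[k];
            k += 1;
        }
        asgn[i] = k;
    }
    // A real implementation would then resample each theta from its
    // posterior given its assigned rows and drop empty components.
}
```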

§Pros

  • Extremely fast due to cache efficiency and parallelization
  • Simple

§Cons

  • Not a valid MCMC method. It only approximates the target distribution, so it tends to create fewer components than it should, which can cause under-fitting.
  • Mixes slowly, since rows/columns are moved individually and, because the new component is drawn from the prior, it is not guaranteed to fit anything particularly well.

§Slice sampler (Slice)

This is nearly identical to FiniteCpu, but uses a stick-breaking prior and a slice (or beam) variable to approximate the correct distribution.

NOTE: It is best to use a Gamma prior (not InvGamma) on the CRP α when using this algorithm.
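
The sketch below shows the stick-breaking step that drives the slice sampler and why the note above matters: sticks are broken until the leftover mass drops below the smallest slice variable, and a very large α makes each break tiny, so the loop can effectively run forever. Everything here is illustrative, not the lace_cc API.

```rust
/// Tiny xorshift RNG so the sketch needs no external crates (seed must be nonzero).
struct Rng(u64);
impl Rng {
    fn unif(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
    /// Beta(1, alpha) draw via inverse CDF: 1 - U^(1/alpha).
    fn beta_1_a(&mut self, alpha: f64) -> f64 {
        1.0 - self.unif().powf(1.0 / alpha)
    }
}

/// Extend the stick-breaking weights until the remaining stick mass falls
/// below `u_min`, the smallest slice variable over all rows, so that each
/// row only ever competes over a finite set of components.
fn break_sticks(weights: &mut Vec<f64>, alpha: f64, u_min: f64, rng: &mut Rng) {
    let mut remaining = 1.0 - weights.iter().sum::<f64>();
    while remaining > u_min {
        let b = rng.beta_1_a(alpha);
        weights.push(remaining * b);
        remaining *= 1.0 - b;
        // With a very large alpha, b is typically near zero, `remaining`
        // shrinks slowly, and this loop can run effectively forever --
        // hence the recommendation of a finite-variance Gamma prior on alpha.
    }
}
```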

§Pros

  • Fast
  • Correct MCMC

§Cons

  • Users must be careful when using a prior on the CRP α that has high or infinite variance. If α is too high, the stick will break indefinitely and cause a panic.

Enums§