emcee 0.3.0

Implementation of Python's emcee affine-invariant mcmc ensemble sampler
Documentation
.. _pt:

.. module:: emcee


Parallel-Tempering Ensemble MCMC
================================

*Added in version 1.2.0*

When your posterior is multi-modal or otherwise hard to sample with a
standard MCMC, a good option to try is `parallel-tempered MCMC (PTMCMC)
<http://en.wikipedia.org/wiki/Parallel_tempering>`_.
PTMCMC runs multiple MCMC's at different temperatures, :math:`T`.  Each MCMC
samples from a modified posterior, given by

.. math::

   \pi_T(x) = \left[ l(x) \right]^{\frac{1}{T}} p(x)

As :math:`T \to \infty`, the posterior becomes the prior, which is
hopefully easy to sample.  If the likelihood is a Gaussian with
standard deviation :math:`\sigma`, then the tempered likelihood is
proportional to a Gaussian with standard deviation :math:`\sigma
\sqrt{T}`.

Periodically during the run, the different temperatures swap members
of their ensemble in a way that preserves detailed balance.  The hot
chains can more easily explore parameter space because the likelihood
is flatter and broader, while the cold chains do a good job of
exploring the peaks of the likelihood.  This can **dramatically**
improve convergence if your likelihood function has many
well-separated modes.

How To Sample a Multi-Modal Gaussian
------------------------------------

Suppose we want to sample from the posterior given by

.. math::

   \pi(\vec{x}) \propto \exp\left[ - \frac{1}{2}
        \left( \vec{x} - \vec{\mu}_1 \right)^T \Sigma^{-1}_1
        \left( \vec{x} - \vec{\mu}_1 \right) \right]
        + \exp\left[ -\frac{1}{2} \left( \vec{x} - \vec{\mu}_2 \right)^T
          \Sigma^{-1}_2 \left( \vec{x} - \vec{\mu}_2 \right) \right]

If the modes :math:`\mu_{1,2}` are well-separated with respect to the
scale of :math:`\Sigma_{1,2}`, then this distribution will be hard to
sample with the :class:`EnsembleSampler`.  Here is how we would sample
from it using the :class:`PTSampler`.

First, some preliminaries::

    import numpy as np
    from emcee import PTSampler

Define the means and standard deviations of our multi-modal likelihood::

    # mu1 = [1, 1], mu2 = [-1, -1]
    mu1 = np.ones(2)
    mu2 = -np.ones(2)

    # Width of 0.1 in each dimension
    sigma1inv = np.diag([100.0, 100.0])
    sigma2inv = np.diag([100.0, 100.0])

    def logl(x):
        dx1 = x - mu1
        dx2 = x - mu2

        return np.logaddexp(-np.dot(dx1, np.dot(sigma1inv, dx1))/2.0,
                            -np.dot(dx2, np.dot(sigma2inv, dx2))/2.0)

    # Use a flat prior
    def logp(x):
        return 0.0

Now we can construct a sampler object that will drive the PTMCMC;
arbitrarily, we choose to use 20 temperatures (the default is for each
temperature to increase by a factor of :math:`\sqrt{2}`, so the
highest temperature will be :math:`T = 1024`, resulting in an
effective :math:`\sigma_T = 32 \sigma = 3.2`, which is about the
separation of our modes).  Let's use 100 walkers in the ensemble at
each temperature::

    ntemps = 20
    nwalkers = 100
    ndim = 2

    sampler=PTSampler(ntemps, nwalkers, ndim, logl, logp)

Making the sampling multi-threaded is as simple as adding the
``threads=Nthreads`` argument to :class:`PTSampler`.  We could have
modified the temperature ladder using the ``betas`` optional argument
(which should be an array of :math:`\beta \equiv 1/T` values).  The
``pool`` argument also allows to specify our own pool of worker
threads if we wanted fine-grained control over the parallelism.

First, we run the sampler for 1000 burn-in iterations::

    p0 = np.random.uniform(low=-1.0, high=1.0, size=(ntemps, nwalkers, ndim))
    for p, lnprob, lnlike in sampler.sample(p0, iterations=1000):
        pass
    sampler.reset()

Now we sample for 10000 iterations, recording every 10th sample::

    for p, lnprob, lnlike in sampler.sample(p, lnprob0=lnprob,
                                               lnlike0=lnlike,
                                               iterations=10000, thin=10):
        pass

The resulting samples (1000 of them) are stored as the
``sampler.chain`` property::

    assert sampler.chain.shape == (ntemps, nwalkers, 1000, ndim)

    # Chain has shape (ntemps, nwalkers, nsteps, ndim)
    # Zero temperature mean:
    mu0 = np.mean(np.mean(sampler.chain[0,...], axis=0), axis=0)

    # Longest autocorrelation length (over any temperature)
    max_acl = np.max(sampler.acor)

    # etc


Implementation Notes
--------------------

For a description of the parallel-tempering algorithm, see, e.g. `Earl
& Deem (2010), Phys Chem Chem Phys, 7, 23, 3910
<http://adsabs.harvard.edu/abs/2005PCCP....7.3910E>`_. The algorithm
has some tunable parameters:

Temperature Ladder
    The choice of temperature for the chains will strongly influence
    the rate of convergence of the sampling.  By default, the
    :class:`PTSampler` class uses an exponential ladder, with each
    temperature increasing by a factor of :math:`\sqrt{2}`.  The user
    can supply their own ladder using the ``beta`` optional argument
    in the constructor.
Rate of Temperature Swaps
    The rate at which temperature swaps are proposed can, to a lesser
    extent, also influence the rate of convergence of the sampling.
    The goal is to make sure that good positions found by the high
    temperatures can propogate to the lower temperatures, but still
    ensure that the high-temperatures do not lose all memory of good
    locations.  Here we choose to implement one temperature swap
    proposal per walker per rung on the temperature ladder after each
    ensemble update.  This is not user-tunable, but seems to work well
    in practice.

The ``args`` optional argument is not available in the
:class:`PTSampler` constructor; use a custom class with a ``__call__``
method if you need to pass arguments to the ``lnlike`` or ``lnprior``
functions and do not want to use a global variable.

The ``thermodynamic_integration_log_evidence`` uses thermodynamic
integration (see, e.g., `Goggans & Chi (2004), AIP Conf Proc, 707, 59
<http://dx.doi.org/10.1063/1.1751356>`_) to estimate the evidence
integral.  Define the evidence as a function of inverse temperature:

.. math::

    Z(\beta) \equiv \int dx\, l^\beta(x) p(x)

We want to compute :math:`Z(1)`.  :math:`Z` satisfies the following
differential equation

.. math::

    \frac{d \ln Z}{d\beta}
        = \frac{1}{Z(\beta)} \int dx\, \ln l(x) l^\beta(x) p(x)
        = \left \langle \ln l \right\rangle_\beta

where :math:`\left\langle \ldots \right\rangle_\beta` is the average
of a quantity over the posterior at temperature :math:`T = 1/\beta`.
Integrating (note that :math:`Z(0) = 1` because the prior is
normalized), we have

.. math::

    \ln Z(1) = \int_0^1 d\beta \left \langle \ln l \right\rangle_\beta

This quantity can be estimated from a PTMCMC by computing the average
:math:`\ln l` within each chain and applying a quadrature formula to
estimate the integral.