# Crate programinduction

A library for program induction and learning representations.

# Bayesian program learning with the EC algorithm

Bayesian program learning (BPL) is a generative model that samples programs, where programs are
generative models that sample observations (or examples). BPL is an appealing model for its
ability to learn rich concepts from few examples. It was popularized by Lake et al. in the
2015 *Science* paper *Human-level concept learning through probabilistic program induction*.

The Exploration-Compression (EC) algorithm is an inference scheme for BPL that learns new
representations by *exploring* solutions to tasks and then *compressing* those solutions by
recognizing common pieces and re-introducing them as primitives. This is an iterative algorithm
that can be applied repeatedly and has been found to converge in practice. It was introduced by
Dechter et al. in the 2013 *IJCAI* paper *Bootstrap learning via modular concept discovery*.
The EC algorithm can be viewed as an expectation-maximization (EM) algorithm for approximate
maximum a posteriori (MAP) estimation of programs and their representation. Roughly, the
exploration step of EC corresponds to the expectation step of EM and the compression step of EC
corresponds to the maximization step of EM.
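The explore/compress alternation can be sketched with a toy model. Everything below is illustrative rather than this crate's API: "programs" are strings of primitive names, exploration is a naive filter, and compression invents a primitive by gluing solutions together, standing in for real fragment discovery.

```rust
// Toy sketch of the EC loop. Not the crate's API: a "DSL" is just a set of
// primitive strings, and a "program" is one of those strings.

#[derive(Clone, Debug)]
struct Dsl {
    primitives: Vec<String>,
}

/// Exploration: find candidate solutions for a task under the current DSL.
/// A real implementation enumerates typed programs by description length.
fn explore(dsl: &Dsl, task: &str) -> Vec<String> {
    dsl.primitives
        .iter()
        .filter(|p| task.contains(p.as_str()))
        .cloned()
        .collect()
}

/// Compression: grow the DSL from recurring solution structure.
/// Here we just concatenate the first two solutions found; real compression
/// searches for common program fragments across frontiers.
fn compress(dsl: &mut Dsl, frontiers: &[Vec<String>]) {
    let solutions: Vec<&String> = frontiers.iter().flatten().collect();
    if solutions.len() >= 2 {
        let invented = format!("{}{}", solutions[0], solutions[1]);
        if !dsl.primitives.contains(&invented) {
            dsl.primitives.push(invented);
        }
    }
}

fn main() {
    let tasks = ["ab", "ba"];
    let mut dsl = Dsl { primitives: vec!["a".into(), "b".into()] };
    for _iteration in 0..2 {
        // Explore each task, then compress the resulting frontiers.
        let frontiers: Vec<Vec<String>> =
            tasks.iter().map(|t| explore(&dsl, t)).collect();
        compress(&mut dsl, &frontiers);
    }
    println!("{:?}", dsl.primitives);
}
```

After two iterations the toy DSL has invented the fragment `"ab"`, mimicking how compression re-introduces common solution structure as a primitive.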

A domain-specific language (DSL) \(\mathcal{D}\) expresses the programming primitives and
encodes domain-specific knowledge about the space of programs. In conjunction with a weight
vector \(\theta\), the pair \((\mathcal{D}, \theta)\) determines a probability
distribution over programs. From a program \(p\), we can sample a task \(x\). Framed as a
hierarchical Bayesian model, we have that \(p \sim (\mathcal{D}, \theta)\) and \(x \sim
p\). Given a set of tasks \(X\) together with a likelihood model \(\mathbb{P}[x|p]\) for
scoring a task \(x\in X\) given a program \(p\), our goal is to jointly infer a DSL
\(\mathcal{D}\) and latent programs for each task that are likely (i.e. "solutions"). For
joint probability \(J\) of \((\mathcal{D},\theta)\) and \(X\), we want the
\(\mathcal{D}^*\) and \(\theta^*\) solving:
\[
\begin{aligned}
J(\mathcal{D},\theta) &\triangleq
\mathbb{P}[\mathcal{D},\theta]
\prod_{x\in X} \sum_p \mathbb{P}[x|p]\mathbb{P}[p|\mathcal{D},\theta] \\
\mathcal{D}^* &= \underset{\mathcal{D}}{\text{arg}\,\text{max}}
\int J(\mathcal{D},\theta)\;\mathrm{d}\theta \\
\theta^* &= \underset{\theta}{\text{arg}\,\text{max}}
J(\mathcal{D}^*,\theta)
\end{aligned}
\]
This is intractable because calculating \(J\) involves taking a sum over every possible
program. We therefore define the *frontier* of a task \(x\), written \(\mathcal{F}_x\),
to be a finite set of programs where \(\mathbb{P}[x|p] > 0\) for all \(p \in
\mathcal{F}_x\) and establish an intuitive lower bound:
\[
J\geq \mathscr{L}\triangleq \mathbb{P}[\mathcal{D},\theta]
\prod_{x\in X} \sum_{p\in \mathcal{F}_x}
\mathbb{P}[x|p]\mathbb{P}[p|\mathcal{D},\theta]
\]
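Once frontiers are fixed, \(\mathscr{L}\) is cheap to evaluate. The sketch below computes its log in log space with the standard log-sum-exp trick; the function names and the numbers in `main` are made up for illustration and are not part of this crate.

```rust
// Evaluate log L = log P[D,theta]
//                + sum over tasks of log sum_{p in frontier} P[x|p] P[p|D,theta],
// working in log space throughout for numerical stability.

fn log_sum_exp(xs: &[f64]) -> f64 {
    let m = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    m + xs.iter().map(|x| (x - m).exp()).sum::<f64>().ln()
}

/// Each frontier holds (log P[x|p], log P[p|D,theta]) pairs for its programs.
fn log_lower_bound(log_prior_dsl: f64, frontiers: &[Vec<(f64, f64)>]) -> f64 {
    log_prior_dsl
        + frontiers
            .iter()
            .map(|frontier| {
                let joints: Vec<f64> =
                    frontier.iter().map(|&(ll, lp)| ll + lp).collect();
                log_sum_exp(&joints)
            })
            .sum::<f64>()
}

fn main() {
    // Two tasks: one with a two-program frontier, one with a single program.
    let frontiers = vec![
        vec![(-0.5, -2.0), (-1.0, -3.0)],
        vec![(-0.2, -1.5)],
    ];
    println!("{:.4}", log_lower_bound(-1.0, &frontiers));
}
```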
We find programs and induce a DSL by alternating maximization with respect to the frontiers
\(\mathcal{F}_x\) and the DSL \(\mathcal{D}\). Frontiers are assigned by enumerating
programs up to some depth according to \((\mathcal{D}, \theta)\). The DSL \(\mathcal{D}\)
is modified by "compression", where common program fragments become invented primitives.
Estimation of \(\theta\) is a detail that a representation must define (for example, this is
done for probabilistic context-free grammars, or PCFGs, using the inside-outside algorithm).

In this library, we represent a `Task` as an `observation` \(x\) together with a
log-likelihood model (or `oracle`) \(\log\mathbb{P}[x|p]\). We additionally associate
tasks with a type, `tp`, to provide a straightforward constraint for finding programs which
may be solutions. Programs may be expressed under different representations, so we provide a
representation-agnostic trait `EC`. We additionally provide two such representations: a
polymorphically-typed lambda calculus in the `lambda` module, and a probabilistic
context-free grammar in the `pcfg` module.

See the `EC` trait for details and an example.
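The shape of a task can be mirrored with stand-in types. The struct below is a conceptual sketch only: the crate's actual `Task` uses a real expression type per representation and a proper type schema for `tp`, so every type and name here is illustrative.

```rust
// Illustrative mirror of the Task shape described above; not the crate's
// actual definition.

type Expr = String; // stand-in for a representation's program type

struct Task<O> {
    /// Log-likelihood model: log P[x|p], scoring a program against the task.
    oracle: Box<dyn Fn(&Expr) -> f64>,
    /// The observation x (examples, input/output pairs, etc.).
    observation: O,
    /// A type constraining which programs can be solutions.
    tp: &'static str, // stand-in for a real type schema
}

fn main() {
    // A toy task: the observation is an input/output pair, and the oracle
    // accepts exactly the program named "double".
    let task = Task {
        oracle: Box::new(|p: &Expr| {
            if p == "double" { 0.0 } else { f64::NEG_INFINITY }
        }),
        observation: (2, 4),
        tp: "int -> int",
    };
    println!("{}", (task.oracle)(&"double".to_string()));
    println!("{:?} : {}", task.observation, task.tp);
}
```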

# Genetic programming

Our primary resource is the 2008 book by Poli et al.: *A Field Guide to Genetic Programming*.

Genetic programming (GP) applies an evolutionary algorithm to a population of programs. The
evolutionary algorithm involves *mutations* to selected individuals and *crossover* between
pairs of selected individuals. Individuals are selected with a tournament, where a random
subset of the population is drawn and the best individual is selected (note: there are
alternative selection strategies). Crucial to GP is the notion of *fitness*, a way of measuring
how well individuals perform.

In this library, we represent a `Task` as a fitness function in the `oracle` field, and a
constraint for relevant programs as a type in the `tp` field. The `observation` field is
not utilized by GP (we recommend setting it to the unit type `()`). Programs may be expressed
under different representations, so we provide a representation-agnostic trait `GP`. We
provide an implementation for probabilistic context-free grammars in the `pcfg` module.

See the `GP` trait for details and an example.

## Modules

| Module | Description |
|---|---|
| `domains` | We've implemented some domains to make things easier for you. |
| `lambda` | (representation) Polymorphically-typed lambda calculus. |
| `pcfg` | (representation) Probabilistic context-free grammar without bound variables or polymorphism. |

## Structs

| Struct | Description |
|---|---|
| `ECFrontier` | A set of expressions which solve a task. |
| `ECParams` | Parameters for the EC algorithm. |
| `GPParams` | Parameters for genetic programming. |
| `Task` | A task which is solved by an expression under some representation. |

## Traits

| Trait | Description |
|---|---|
| `EC` | A kind of representation suitable for the EC algorithm. |
| `GP` | A kind of representation suitable for genetic programming. |