Crate programinduction[−][src]

A library for program induction and learning representations.

Bayesian program learning with the EC algorithm

Bayesian program learning (BPL) is a generative model that samples programs, where programs are generative models that sample observations (or examples). BPL is an appealing model for its ability to learn rich concepts from few examples. It was popularized by Lake et al. in the 2015 Science paper Human-level concept learning through probabilistic program induction.

The Exploration-Compression (EC) algorithm is an inference scheme for BPL that learns new representations by exploring solutions to tasks and then compressing those solutions by recognizing common pieces and re-introducing them as primitives. This is an iterative algorithm that can be applied repeatedly and has been found to converge in practice. It was introduced by Dechter et al. in the 2013 IJCAI paper Bootstrap learning via modular concept discovery. The EC algorithm can be viewed as an expectation-maximization (EM) algorithm for approximate maximum a posteriori (MAP) estimation of programs and their representation. Roughly, the exploration step of EC corresponds to the expectation step of EM and the compression step of EC corresponds to the maximization step of EM.

A domain-specific language (DSL) \(\mathcal{D}\) expresses the programming primitives and encodes domain-specific knowledge about the space of programs. In conjunction with a weight vector \(\theta\), the pair \((\mathcal{D}, \theta)\) determines a probability distribution over programs. From a program \(p\), we can sample a task \(x\). Framed as a hierarchical Bayesian model, we have that \(p \sim (\mathcal{D}, \theta)\) and \(x \sim p\). Given a set of tasks \(X\) together with a likelihood model \(\mathbb{P}[x|p]\) for scoring a task \(x\in X\) given a program \(p\), our goal is to jointly infer a DSL \(\mathcal{D}\) and latent programs for each task that are likely (i.e. "solutions"). For joint probability \(J\) of \((\mathcal{D},\theta)\) and \(X\), we want the \(\mathcal{D}^*\) and \(\theta^*\) solving: \[ \begin{aligned} J(\mathcal{D},\theta) &\triangleq \mathbb{P}[\mathcal{D},\theta] \prod_{x\in X} \sum_p \mathbb{P}[x|p]\mathbb{P}[p|\mathcal{D},\theta] \\ \mathcal{D}^* &= \underset{\mathcal{D}}{\text{arg}\,\text{max}} \int J(\mathcal{D},\theta)\;\mathrm{d}\theta \\ \theta^* &= \underset{\theta}{\text{arg}\,\text{max}} J(\mathcal{D}^*,\theta) \end{aligned} \] This is intractible because calculating \(J\) involves taking a sum over every possible program. We therefore define the frontier of a task \(x\), written \(\mathcal{F}_x\), to be a finite set of programs where \(\mathbb{P}[x|p] > 0\) for all \(p \in \mathcal{F}_x\) and establish an intuitive lower bound: \[ J\geq \mathscr{L}\triangleq \mathbb{P}[\mathcal{D},\theta] \prod_{x\in X} \sum_{p\in \mathcal{F}_x} \mathbb{P}[x|p]\mathbb{P}[p|\mathcal{D},\theta] \] We find programs and induce a DSL by alternating maximization with respect to the frontiers \(\mathcal{F}_x\) and the DSL \(\mathcal{D}\). Frontiers are assigned by enumerating programs up to some depth according to \((\mathcal{D}, \theta)\). The DSL \(\mathcal{D}\) is modified by "compression", where common program fragments become invented primitives. Estimation of \(\theta\) is a detail that a representation must define (for example, this is done for probabilistic context-free grammars, or PCFGs, using the inside-outside algorithm).

In this library, we represent a Task as an observation \(x\) together with a log-likelihood model (or oracle) \(\log\mathbb{P}[x|p]\). We additionally associate tasks with a type, tp, to provide a straightforward constraint for finding programs which may be solutions. Programs may be expressed under different representations, so we provide a representation-agnostic trait EC. We additionally provide two such representations: a polymorphically-typed lambda calculus in the lambda module, and a probabilistic context free grammar in the pcfg module.

See the EC trait for details and an example.

Genetic programming

Our primary resource is the 2008 book by Poli et al.: A Field Guide to Genetic Programming.

Genetic programming (GP) applies an evolutionary algorithm to a population of programs. The evolutionary algorithm involves mutations to selected individuals and crossover between pairs of selected individuals. Individuals are selected with a tournament, where a random subset of the population is drawn and the best individual is selected (note: there are alternative selection strategies). Crucial to GP is the notion of fitness, a way of measuring how well individuals perform.

In this library, we represent a Task as a fitness function in the oracle field, and a constraint for relevant programs as a type in the tp field. The observation field is not utilized by GP (we recommend setting it to unit). Programs may be expressed under different representations, so we provide a representation-agnostic trait GP. We provide an implementation for probabilistic context free grammars in the pcfg module.

See the GP trait for details and an example.

Modules

domains	We've implemented some domains to make things easier for you.
lambda	(representation) Polymorphically-typed lambda calculus.
pcfg	(representation) Probabilistic context-free grammar without bound variables or polymorphism.

Structs

ECFrontier	A set of expressions which solve a task.
ECParams	Parameters for the EC algorithm.
GPParams	Parameters for genetic programming.
Task	A task which is solved by an expression under some representation.

Traits

EC	A kind of representation suitable for exploration-compression.
GP	A kind of representation suitable for genetic programming.