Crate genetic_algorithm

Source
Expand description

A genetic algorithm implementation for Rust. Inspired by the book Genetic Algorithms in Elixir

There are three main elements to this approach:

  • The Genotype (the search space)
  • The Fitness function (the search goal)
  • The Strategy (the search strategy)
    • Evolve (evolution strategy)
    • Permutate (for small search spaces, with a 100% guarantee)
    • HillClimb (when search space is convex with little local optima or when crossover is impossible/inefficient)

Terminology:

  • Population: a population has population_size number of individuals (called chromosomes).
  • Chromosome: a chromosome has genes_size number of genes
  • Allele: alleles are the possible values of the genes
  • Gene: a gene is a combination of position in the chromosome and value of the gene (allele)
  • Genes: storage trait of the genes for a chromosome, mostly Vec<Allele> but alternatives possible
  • Genotype: Knows how to generate, mutate and crossover chromosomes efficiently
  • Fitness: knows how to determine the fitness of a chromosome

All multithreading mechanisms are implemented using rayon::iter and std::sync::mpsc.

§Quick Usage

use genetic_algorithm::strategy::evolve::prelude::*;

// the search space
let genotype = BinaryGenotype::builder() // boolean alleles
    .with_genes_size(100)                // 100 genes per chromosome
    .build()
    .unwrap();

println!("{}", genotype);

// the search goal to optimize towards (maximize or minimize)
#[derive(Clone, Debug)]
pub struct CountTrue;
impl Fitness for CountTrue {
    type Genotype = BinaryGenotype; // Genes = Vec<bool>
    fn calculate_for_chromosome(
        &mut self,
        chromosome: &FitnessChromosome<Self>,
        _genotype: &FitnessGenotype<Self>
    ) -> Option<FitnessValue> {
        Some(chromosome.genes.iter().filter(|&value| *value).count() as FitnessValue)
    }
}

// the search strategy
let evolve = Evolve::builder()
    .with_genotype(genotype)
    .with_select(SelectElite::new(0.9))               // sort the chromosomes by fitness to determine crossover order and select 90% of the population for crossover (drop 10% of population)
    .with_crossover(CrossoverUniform::new())          // crossover all individual genes between 2 chromosomes for offspring (and restore back to 100% of target population size by keeping the best parents alive)
    .with_mutate(MutateSingleGene::new(0.2))          // mutate offspring for a single gene with a 20% probability per chromosome
    .with_fitness(CountTrue)                          // count the number of true values in the chromosomes
    .with_fitness_ordering(FitnessOrdering::Maximize) // optional, default is Maximize, aim towards the most true values
    .with_target_population_size(100)                 // evolve with 100 chromosomes
    .with_target_fitness_score(100)                   // goal is 100 times true in the best chromosome
    .with_reporter(EvolveReporterSimple::new(100))    // optional builder step, report every 100 generations
    .call()
    .unwrap();

println!("{}", evolve);

// it's all about the best genes after all
let (best_genes, best_fitness_score) = evolve.best_genes_and_fitness_score().unwrap();
assert_eq!(best_genes, vec![true; 100]);
assert_eq!(best_fitness_score, 100);

§Tests

Use .with_rng_seed_from_u64(0) builder step to create deterministic tests results.

§Examples

§Performance considerations

For the Evolve strategy:

  • Reporting: start with EvolveReporterSimple for basic understanding of:
    • fitness v. framework overhead
    • staleness and population characteristics (cardinality etc.)
  • Select: no considerations. All selects are basically some form of in-place sorting of some kind. This is relatively fast compared to the rest of the operations.
  • Crossover: the workhorse of internal parts. Crossover touches most genes each generation and clones up to the whole population to restore lost population size in selection. See performance tips below. It also calculates new genes hashes if enabled on the Genotype, which has a relatively high overhead on the main Evolve loop.
  • Mutate: no considerations. It touches genes like crossover does, but should be used sparingly anyway; with low gene counts (<10%) and low probability (5-20%)
  • Fitness: can be anything. This fully depends on the user domain. Parallelize it using with_par_fitness() in the Builder. But beware that parallelization has it’s own overhead and is not always faster.

Performance Tips

  • Small genes sizes
    • It seems that CrossoverMultiGene with number_of_crossovers = genes_size / 2 and allow_duplicates = true is the best tradeoff between performance and effect. CrossoverUniform is an alias for the same approach, taking the genes_size from the genotype at runtime.
    • Restoring the population doesn’t matter that much as the cloning is relatively less pronounced (but becomes more prominent for larger population sizes)
  • Large genes sizes
    • It seems that CrossoverMultiPoint with number_of_crossovers = genes_size / 9 and allow_duplicates = false is the best tradeoff between performance and effect.
    • Restoring the population has considerable performance effects. Use a high selection_rate or even 100%, so there is little parent cloning. Explore non-Vec based genotypes like BitGenotype.

GPU acceleration

There are two genotypes where Genes (N) and Population (M) are a stored in single contiguous memory range of Alleles (T) with length N*M on the heap. A pointer to this data can be taken to calculate the whole population at once. These are:

Useful in the following strategies where a whole population is calculated:

Possibly a GPU compatible memory layout still needs to be added. The current implementation just provides all the basic building blocks to implement this. Please open a github issue for further support.

Modules§

allele
The possible values for a single gene
chromosome
The chromosome is a container for the genes and stores some useful values
crossover
The crossover phase where every two parent chromosomes create two children chromosomes. The selection phase determines the order and the amount of the parent pairing (overall with fitter first).
errors
extension
When approacking a (local) optimum in the fitness score, the variation in the population goes down dramatically. The offspring will become clones of the parents and the only factor seeding randomness is the mutation of the offspring. But this remaining randomness might not be selected for, killing of the offspring again. This reduces the efficiency, but also has the risk of local optimum lock-in. To increase the variation in the population, an extension mechanisms can optionally be used
fitness
The search goal to optimize towards (maximize or minimize).
genotype
The search space for the algorithm.
mutate
The mutation strategy, very important for avoiding local optimum lock-in. But don’t overdo it, as it degenerates the population too much if overused. Use a mutation probability generally between 5% and 20%.
population
The population is a container for Chromosomes
select
The selection phase, where chromosomes are lined up for pairing in the crossover phase, dropping the chromosomes outside of the selection_rate. Ensure the selection_rate >= 0.5 otherwise the population will decline and can’t restore.
strategy
solution strategies for finding the best chromosomes.