Expand description
A genetic algorithm implementation for Rust. Inspired by the book Genetic Algorithms in Elixir
There are three main elements to this approach:
- The Genotype (the search space)
- The Fitness function (the search goal)
- The Strategy (the search strategy)
Terminology:
- Population: a population has
population_size
number of individuals (called chromosomes). - Chromosome: a chromosome has
genes_size
number of genes - Allele: alleles are the possible values of the genes
- Gene: a gene is a combination of position in the chromosome and value of the gene (allele)
- Genes: storage trait of the genes for a chromosome, always
Vec<Allele>
- Genotype: Knows how to generate, mutate and crossover chromosomes efficiently
- Fitness: knows how to determine the fitness of a chromosome
All multithreading mechanisms are implemented using rayon::iter and std::sync::mpsc.
§Quick Usage
use genetic_algorithm::strategy::evolve::prelude::*;
// the search space
let genotype = BinaryGenotype::builder() // boolean alleles
.with_genes_size(100) // 100 genes per chromosome
.build()
.unwrap();
println!("{}", genotype);
// the search goal to optimize towards (maximize or minimize)
#[derive(Clone, Debug)]
pub struct CountTrue;
impl Fitness for CountTrue {
type Genotype = BinaryGenotype; // Genes = Vec<bool>
fn calculate_for_chromosome(
&mut self,
chromosome: &FitnessChromosome<Self>,
_genotype: &FitnessGenotype<Self>
) -> Option<FitnessValue> {
Some(chromosome.genes.iter().filter(|&value| *value).count() as FitnessValue)
}
}
// the search strategy
let evolve = Evolve::builder()
.with_genotype(genotype)
.with_select(SelectElite::new(0.5, 0.02)) // sort the chromosomes by fitness to determine crossover order. Strive to replace 50% of the population with offspring. Allow 2% through the non-generational best chromosomes gate before selection and replacement
.with_crossover(CrossoverUniform::new(0.7, 0.8)) // crossover all individual genes between 2 chromosomes for offspring with 70% parent selection (30% do not produce offspring) and 80% chance of crossover (20% of parents just clone)
.with_mutate(MutateSingleGene::new(0.2)) // mutate offspring for a single gene with a 20% probability per chromosome
.with_fitness(CountTrue) // count the number of true values in the chromosomes
.with_fitness_ordering(FitnessOrdering::Maximize) // optional, default is Maximize, aim towards the most true values
.with_target_population_size(100) // evolve with 100 chromosomes
.with_target_fitness_score(100) // goal is 100 times true in the best chromosome
.with_reporter(EvolveReporterSimple::new(100)) // optional builder step, report every 100 generations
.call()
.unwrap();
println!("{}", evolve);
// it's all about the best genes after all
let (best_genes, best_fitness_score) = evolve.best_genes_and_fitness_score().unwrap();
assert_eq!(best_genes, vec![true; 100]);
assert_eq!(best_fitness_score, 100);
§Tests
Use .with_rng_seed_from_u64(0)
builder step to create deterministic tests results.
§Examples
- N-Queens puzzle https://en.wikipedia.org/wiki/Eight_queens_puzzle
- See examples/evolve_nqueens
- See examples/hill_climb_nqueens
UniqueGenotype<u8>
with a 64x64 chess board setup- custom
NQueensFitness
fitness
- Knapsack problem: https://en.wikipedia.org/wiki/Knapsack_problem
- See examples/evolve_knapsack
- See examples/permutate_knapsack
BinaryGenotype<Item(weight, value)>
each gene encodes presence in the knapsack- custom
KnapsackFitness(&items, weight_limit)
fitness
- Infinite Monkey theorem: https://en.wikipedia.org/wiki/Infinite_monkey_theorem
- See examples/evolve_monkeys
ListGenotype<char>
100 monkeys randomly typing characters in a loop- custom fitness using hamming distance
- Permutation strategy instead of Evolve strategy for small search spaces, with a 100% guarantee
- HillClimb strategy instead of Evolve strategy, when crossover is impossible or inefficient
- Explore internal and external multithreading options
- Use superset StrategyBuilder for easier switching in implementation
- Use fitness LRU cache
- See examples/evolve_binary_cache_fitness
- Note: doesn’t help performance much in this case… or any case, better fix your population diversity
- Custom Reporting implementation
- Custom Mutate implementation
§Performance considerations
For the Evolve strategy:
- Reporting: start with EvolveReporterSimple for basic understanding of:
- fitness v. framework overhead
- staleness and population characteristics (cardinality etc.)
- Select: no considerations. All selects are basically some form of in-place sorting of some kind based on chromosome metadata. This is relatively fast compared to the rest of the operations.
- Crossover: the workhorse of internal parts. Crossover touches most genes each generation, calculates genes hashes and clones up to the whole population to produce offspring (depending on selection-rate).
- Mutate: no considerations. It touches genes like crossover does, but should be used sparingly anyway; with low gene counts (<10%) and low probability (5-20%)
- Fitness: can be anything, but usually very dominant (>80% total time). This fully
depends on the user domain. Parallelize it using
with_par_fitness()
in the Builder. But beware that parallelization has it’s own overhead and is not always faster.
So framework overhead is mostly Crossover. Practical overhead is mostly Fitness.
Regarding the optionality of genes hashing and chromosomes recycling: For large chromosomes, disabling chromosome recycling and enabling genes hashing leads to a 3x factor in framework overhead. For small chromosomes, neither feature has overhead effects. But do keep in mind that for large chromosomes the Fitness calculation will be even more dominant with regards to the framework overhead as it already is. See examples/evolve_large_genotype
Default configuration for correctness AND performance
- .with_genes_hashing(true) // Required for proper GA dynamics
- .with_chromosome_recycling(true) // Still worth it for large chromosomes, maybe disable for easier custom implementations
Modules§
- allele
- The possible values for a single gene
- chromosome
- The chromosome is a container for the genes and stores some useful values
- crossover
- The crossover phase where two parent chromosomes create two children chromosomes. The selection phase determines the order the parent pairing (overall with fitter first).
- errors
- extension
- When approacking a (local) optimum in the fitness score, the variation in the population goes down dramatically. The offspring will become clones of the parents and the only factor seeding randomness is the mutation of the offspring. But this remaining randomness might not be selected for, killing of the offspring again. This reduces the efficiency, but also has the risk of local optimum lock-in. To increase the variation in the population, an extension mechanisms can optionally be used
- fitness
- The search goal to optimize towards (maximize or minimize).
- genotype
- The search space for the algorithm.
- mutate
- The mutation strategy, very important for avoiding local optimum lock-in. But don’t overdo it, as it degenerates the population too much if overused. Use a mutation probability generally between 5% and 20%.
- population
- The population is a container for Chromosomes and handles optional chromsome recycling
- select
- The selection phase, where chromosomes are lined up for pairing in the crossover phase, dropping the chromosomes outside of the target_population_size.
- strategy
- solution strategies for finding the best chromosomes.
Macros§
- impl_
allele - Macro for implementing Allele with default hash_slice Use this for any type that implements Hash and needs the standard hashing behavior