Crate genetic_algorithm
source ·Expand description
A genetic algorithm implementation for Rust. Inspired by the book Genetic Algorithms in Elixir
There are three main elements to this approach:
- The Genotype (the search space)
- The Fitness function (the search goal)
- The Strategy (the search strategy)
Terminology:
- Population: a population has
population_size
number of individuals (called chromosomes). - Chromosome: a chromosome has
genes_size
number of genes - Allele: alleles are the possible values of the genes
- Gene: a gene is a combination of position in the chromosome and value of the gene (allele)
- Genes: storage trait of the genes for a chromosome, mostly
Vec<Allele>
but alternatives possible - Genotype: Knows how to generate, mutate and crossover chromosomes efficiently
- Fitness: knows how to determine the fitness of a chromosome
All multithreading mechanisms are implemented using rayon::iter and std::sync::mpsc.
§Quick Usage
use genetic_algorithm::strategy::evolve::prelude::*;
// the search space
let genotype = BinaryGenotype::builder() // boolean alleles
.with_genes_size(100) // 100 genes per chromosome
.build()
.unwrap();
println!("{}", genotype);
// the search goal to optimize towards (maximize or minimize)
#[derive(Clone, Debug)]
pub struct CountTrue;
impl Fitness for CountTrue {
type Genotype = BinaryGenotype; // Genes = Vec<bool>
fn calculate_for_chromosome(&mut self, chromosome: &Chromosome<Self::Genotype>) -> Option<FitnessValue> {
Some(chromosome.genes.iter().filter(|&value| *value).count() as FitnessValue)
}
}
// the search strategy
let evolve = Evolve::builder()
.with_genotype(genotype)
.with_select(SelectElite::new(0.9)) // sort the chromosomes by fitness to determine crossover order and select 90% of the population for crossover (drop 10% of population)
.with_crossover(CrossoverUniform::new()) // crossover all individual genes between 2 chromosomes for offspring (and restore back to 100% of target population size by keeping the best parents alive)
.with_mutate(MutateSingleGene::new(0.2)) // mutate offspring for a single gene with a 20% probability per chromosome
.with_fitness(CountTrue) // count the number of true values in the chromosomes
.with_target_population_size(100) // evolve with 100 chromosomes
.with_target_fitness_score(100) // goal is 100 times true in the best chromosome
.with_reporter(EvolveReporterSimple::new(100)) // optional builder step, report every 100 generations
.call()
.unwrap();
println!("{}", evolve);
§Tests
Use .with_rng_seed_from_u64(0)
builder step to create deterministic tests results.
§Examples
- N-Queens puzzle https://en.wikipedia.org/wiki/Eight_queens_puzzle
- See examples/evolve_nqueens
- See examples/hill_climb_nqueens
UniqueGenotype<u8>
with a 64x64 chess board setup- custom
NQueensFitness
fitness
- Knapsack problem: https://en.wikipedia.org/wiki/Knapsack_problem
- See examples/evolve_knapsack
- See examples/permutate_knapsack
BinaryGenotype<Item(weight, value)>
each gene encodes presence in the knapsack- custom
KnapsackFitness(&items, weight_limit)
fitness
- Infinite Monkey theorem: https://en.wikipedia.org/wiki/Infinite_monkey_theorem
- See examples/evolve_monkeys
ListGenotype<char>
100 monkeys randomly typing characters in a loop- custom fitness using hamming distance
- Permutation strategy instead of Evolve strategy for small search spaces, with a 100% guarantee
- HillClimb strategy instead of Evolve strategy, when crossover is impossible or inefficient
- Explore vector genes BinaryGenotype versus other storage BitGenotype
- Explore internal and external multithreading options
- Custom Fitness function with LRU cache
- See examples/evolve_binary_lru_cache_fitness
- Note: doesn’t help performance much in this case…
- Custom Reporting implementation
§Performance considerations
For the Evolve strategy:
- Select: no considerations. All selects are basically some form of in-place sorting of some kind. This is relatively fast compared to the rest of the operations.
- Crossover: the workhorse of internal parts. Crossover touches most genes each generation and clones up to the whole population to restore lost population size in selection. See performance tips below.
- Mutate: no considerations. It touches genes like crossover does, but should be used sparingly anyway; with low gene counts (<10%) and low probability (5-20%)
- Fitness: can be anything. This fully depends on the user domain. Parallelize
it using
with_par_fitness()
in the Builder. But beware that parallelization has it’s own overhead and is not always faster.
Performance Tips
- Small genes sizes
- It seems that CrossoverMultiGene with
number_of_crossovers = genes_size / 2
andallow_duplicates = true
is the best tradeoff between performance and effect. CrossoverUniform is an alias for the same approach, taking the genes_size from the genotype at runtime. - Restoring the population doesn’t matter that much as the cloning is relatively less pronounced (but becomes more prominent for larger population sizes)
- It seems that CrossoverMultiGene with
- Large genes sizes
- It seems that CrossoverMultiPoint with
number_of_crossovers = genes_size / 9
andallow_duplicates = false
is the best tradeoff between performance and effect. - Restoring the population has major performance effects and should be avoided. Use a high selection_rate or even 100%, so there is little parent cloning. Explore non-Vec based genotypes like BitGenotype.
- It seems that CrossoverMultiPoint with
Modules§
- The chromosome is a container for the genes and caches a fitness score
- The crossover phase where every two parent chromosomes create two children chromosomes. The selection phase determines the order and the amount of the parent pairing (overall with fitter first).
- When approacking a (local) optimum in the fitness score, the variation in the population goes down dramatically. This reduces the efficiency, but also has the risk of local optimum lock-in. To increase the variation in the population, an extension mechanisms can optionally be used
- The search goal to optimize towards (maximize or minimize).
- The search space for the algorithm.
- The mutation strategy, very important for avoiding local optimum lock-in. But don’t overdo it, as it degenerates the population too much if overused. Use a mutation probability generally between 5% and 20%.
- The population is a container for Chromosomes
- The selection phase, where chromosomes are lined up for pairing in the crossover phase, dropping the chromosomes outside of the selection_rate. Ensure the selection_rate >= 0.5 otherwise the population will decline and can’t restore.
- solution strategies for finding the best chromosomes.