genetic-algorithm
A genetic algorithm implementation for Rust. Inspired by the book Genetic Algorithms in Elixir
There are three main elements to this approach:
- The Genotype (the search space)
- The Fitness function (the search goal)
- The strategy (the search strategy)
- Evolve (evolution strategy)
- Permutate (for small search spaces, with a 100% guarantee)
- HillClimb (when search space is convex with little local optima or when crossover is impossible/inefficient)
Terminology:
- Population: a population has
population_size
number of individuals (called chromosomes). - Chromosome: a chromosome has
genes_size
number of genes - Allele: alleles are the possible values of the genes
- Gene: a gene is a combination of position in the chromosome and value of the gene (allele)
- Genes: storage trait of the genes for a chromosome, mostly
Vec<Allele>
, but alternatives possible - Genotype: Knows how to generate, mutate and crossover chromosomes efficiently
- Fitness: knows how to determine the fitness of a chromosome
All multithreading mechanisms are implemented using rayon::iter and std::sync::mpsc.
Documentation
See docs.rs
Quick Usage
use *;
// the search space
let genotype = builder // boolean alleles
.with_genes_size // 100 genes per chromosome
.build
.unwrap;
println!;
// the search goal to optimize towards (maximize or minimize)
;
}
// the search strategy
let evolve = builder
.with_genotype
.with_compete // sort the chromosomes by fitness to determine crossover order
.with_crossover // crossover all individual genes between 2 chromosomes for offspring, keep 50% of parents around for next generation
.with_mutate // mutate a single gene with a 20% probability per chromosome
.with_fitness // count the number of true values in the chromosomes
.with_target_population_size // evolve with 100 chromosomes
.with_target_fitness_score // goal is 100 times true in the best chromosome
.with_reporter // optional builder step, report every 100 generations
.call;
.unwrap
println!;
Examples
Run with cargo run --example [EXAMPLE_BASENAME] --release
- N-Queens puzzle https://en.wikipedia.org/wiki/Eight_queens_puzzle.
- See examples/evolve_nqueens
- See examples/hill_climb_nqueens
UniqueGenotype<u8>
with a 64x64 chess board setup- custom
NQueensFitness
fitness
- Knapsack problem: https://en.wikipedia.org/wiki/Knapsack_problem
- See examples/evolve_knapsack
- See examples/permutate_knapsack
BinaryGenotype<Item(weight, value)>
each gene encodes presence in the knapsack- custom
KnapsackFitness(&items, weight_limit)
fitness
- Infinite Monkey theorem: https://en.wikipedia.org/wiki/Infinite_monkey_theorem
- See examples/evolve_monkeys
ListGenotype<char>
100 monkeys randomly typing characters in a loop- custom fitness using hamming distance
- Permutation strategy instead of Evolve strategy for small search spaces, with a 100% guarantee
- HillClimb strategy instead of Evolve strategy, when crossover is impossible or inefficient
- Explore vector genes (BinaryGenotype) versus other storage (BitGenotype)
- Explore internal and external multithreading options
- Custom Fitness function with LRU cache
- See examples/evolve_binary_lru_cache_fitness
- Note: doesn't help performance much in this case...
- Custom Reporting implementation
Performance considerations
For the Evolve strategy:
- Compete: no considerations. All competes are basically some form of in-place sorting of some kind. This is relatively fast compared to the rest of the operations.
- Crossover: the workhorse of internal parts. Crossover touches most genes each generation and clones up to the whole population if you keep all the parents around. See performance tips below.
- Mutate: no considerations. It touches genes like crossover does, but should be used sparingly anyway; with low gene counts (<10%) and low probability (5-20%)
- Fitness: can be anything. This fully depends on the user domain. Parallelize
it using
with_par_fitness()
in the Builder. But beware that parallelization has it's own overhead and is not always faster.
Performance Tips
- Small genes sizes
- It seems that CrossoverMultiGene with
number_of_crossovers = genes_size / 2
andallow_duplicates = true
is the best tradeoff between performance and effect. CrossoverUniform is an alias for the same approach, taking the genes_size from the genotype at runtime. - Keeping the parents around doesn't matter that much as the cloning is relatively less pronounced (but becomes more prominent for larger population sizes)
- It seems that CrossoverMultiGene with
- Large genes sizes
- It seems that CrossoverMultiPoint with
number_of_crossovers = genes_size / 9
andallow_duplicates = false
is the best tradeoff between performance and effect. - Keeping the parents around has major performance effects and should be avoided. Use low parent_survival_rate or none at all. Explore non-Vec based genotypes like BitGenotype.
- It seems that CrossoverMultiPoint with
Tests
Run tests with cargo test
Use .with_rng_seed_from_u64(0)
builder step to create deterministic tests results.
Benchmarks
Implemented using criterion. Run benchmarks with cargo bench
Profiling
Implemented using criterion and pprof.
Uncomment in Cargo.toml
[profile.release]
debug = 1
Run with cargo run --example profile_evolve_binary --release -- --bench --profile-time 5
Find the flamegraph in: ./target/criterion/profile_evolve_binary/profile/flamegraph.svg
TODO
MAYBE
- Target cardinality range for Mutate Dynamic to avoid constant switching
- Default max_stale_generations to 1 for SteepestAscent
- Add scaling permutate? Can be done by grid search and then search within last grid with new scale
- Add scaling helper function
- Add simulated annealing strategy
- Add Roulette competition with and without duplicates (with fitness ordering)
- Add OrderOne crossover for UniqueGenotype?
- Add WholeArithmetic crossover for RangeGenotype?
- Add CountTrueWithWork instead of CountTrueWithSleep for better benchmarks?
- Explore more non-Vec genes: PackedSimd, ArrayVec
ISSUES
- permutate (and possibly others) with gene_size 0 panics. Maybe it should just return a empty chromosome?