1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228
/*! # BoolNetEvo Evolve populations of boolean networks to approximate bitstring functions and their (unknown) inverses. <details> <summary> ► Machinery <i>(click to show/hide)</i> </summary> ```text Layer k+1 ● ● ↑ ┌───────┼───────┐ ┌───────┼───────┐ ↑ ○ ○ ○ ○ ○ ○ ↑ / | \ / | \ / | \ / | \ / | \ / | \ Layer k ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ``` [Boolean network](https://en.wikipedia.org/wiki/Boolean_network), or *boolnet*, is a well-known particular case of an [artificial neural network](https://en.wikipedia.org/wiki/Artificial_neural_network) where inputs/outputs of each cell, or *boolon*, are bits — 0 (false) and 1 (true) — rather than values from a range, say, (-∞; +∞). Output of a boolon is a boolean function of its inputs, which are outputs of boolons from previous layer. In this implementation, boolean functions have 3 variables and are encoded by `u8` values, which represent the truth tables of such functions. Each boolon has input tree, whose leaves are its input boolons, and then, at each level of that tree, a boolean function of 3 values at current level gives 1 value at next level. Single value at root level is the output of this boolon. On the figure above, input tree has height 2, so each boolon's output is affected by 3<sup>2</sup> = 9 outputs of boolons from previous layer (some may coincide). Number of boolons is the same in all layers, and the signal can propagate through a boolnet for few times, called *cycles*: at the end of each cycle, outputs of the last layer become inputs for the first layer. Under these constraints, layers can be viewed as steps residing in time rather than space, if you identify i-th boolon of all layers, which changes its input tree at every step within cycle. Layers may have *rands* boolons whose outputs are random. Enter robustness and free (unpredictable) will... at the cost of slower learning. Array of 256 possible 3-variable boolean functions is used to perform simultaneous calculation for 128 data flows, in a *batch*. Several boolnets of the same architecture make a *population*, which evolves in time by means of the [genetic algorithm](https://en.wikipedia.org/wiki/Genetic_algorithm). At each *epoch*: - all boolnets are scored w.r.t. how they calculate some bitstring function or its inverse, on the average (the more wrong bits, the less the score), - boolnets with the lowest scores are replaced by inheritants of boolnets with the highest scores. Inheritance, as usually, is [crossover](https://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)) and [mutation](https://en.wikipedia.org/wiki/Mutation_(genetic_algorithm)), in this case for 1) indices of input boolons and 2) boolean functions in the input tree, for each boolon. Note that a single boolnet does not evolve; a population does. The purpose of the crate is to provide controls for this process and make it more interactive. </details> ## Quick Start **Warning**: use `release` profile, without optimizations the execution is 20–30 times slower. In your `code.rs`, bring `BitransformRegister` and `Evolver` structs into scope: ```ignore extern crate boolnetevo; use boolnetevo::{ BitransformRegister, Evolver }; ``` and create their default instances: ```ignore fn app() -> Result<(), String> { let register = BitransformRegister::new_default(); let mut evolver = Evolver::new_default(®ister)?; ... } ``` Now, you can run the built-in shell... ```ignore evolver.shell(®ister)?; ``` and get something like this (with colors) on your terminal: <div style="background: #000000; font-family: mono; font-size: 80%; color: #D0D0D0; padding: 16px;"> <font color="#729FCF"><b>BITRANSFORM:</b></font> InvXor { direct: Xor }<br> <font color="#729FCF"><b>POPULATION:</b></font> 250<br> <font color="#729FCF"><b>ARCHITECTURE:</b></font> <font color="#FCE94F">height</font> = 1, <font color="#FCE94F">size</font> = 48, <font color="#FCE94F">rands</font> = 2, <font color="#FCE94F">layers</font> = 1, <font color="#FCE94F">cycles</font> = 1<br> <font color="#729FCF"><b>PARAMETERS:</b></font> <font color="#FCE94F">batches</font> = 8, <font color="#FCE94F">replace_ratio</font> = 0.7, <font color="#FCE94F">par_ratio</font> = 0.8, <font color="#FCE94F">parents</font> = 2, <font color="#FCE94F">par_sw_prob</font> = 0.9, <font color="#FCE94F">mutat_prob</font> = 0.1<br> <font color="#3465A4"><b>SETTINGS:</b></font> <font color="#C4A000">log</font> <font color="#06989A">off</font>, <font color="#C4A000">test</font> <font color="#06989A">new</font>, <font color="#C4A000">print</font> <font color="#06989A">improve</font><br> <font color="#555753">Type "?" to see the list of available commands...</font><br> $ evol<br> <font color="#555753">Press any key to stop...</font><br> <font color="#AD7FA8"><b>Epoch</b></font> 1 ( 41 <font color="#3465A4">ms</font>)<font color="#AD7FA8"><b>:</b></font> <font color="#AD7FA8">min</font> = -9.0586<font color="#AD7FA8">, average</font> = -7.9585<font color="#AD7FA8">, max</font> = -6.8008<br> <font color="#AD7FA8"><b>Epoch</b></font> 2 ( 29 <font color="#3465A4">ms</font>)<font color="#AD7FA8"><b>:</b></font> <font color="#AD7FA8">min</font> = -8.9746<font color="#AD7FA8">, average</font> = -7.7293<font color="#AD7FA8">, max</font> = -6.7021<br> <font color="#AD7FA8"><b>Epoch</b></font> 3 ( 29 <font color="#3465A4">ms</font>)<font color="#AD7FA8"><b>:</b></font> <font color="#AD7FA8">min</font> = -8.9043<font color="#AD7FA8">, average</font> = -7.5695<font color="#AD7FA8">, max</font> = -6.4512<br> <font color="#AD7FA8"><b>Epoch</b></font> 4 ( 18 <font color="#3465A4">ms</font>)<font color="#AD7FA8"><b>:</b></font> <font color="#AD7FA8">min</font> = -8.9697<font color="#AD7FA8">, average</font> = -7.4082<font color="#AD7FA8">, max</font> = -5.7324<br> <font color="#AD7FA8"><b>Epoch</b></font> 6 ( 17 <font color="#3465A4">ms</font>)<font color="#AD7FA8"><b>:</b></font> <font color="#AD7FA8">min</font> = -8.7197<font color="#AD7FA8">, average</font> = -7.1513<font color="#AD7FA8">, max</font> = -5.7129<br> ...<br> <font color="#AD7FA8"><b>Epoch</b></font> 58 ( 29 <font color="#3465A4">ms</font>)<font color="#AD7FA8"><b>:</b></font> <font color="#AD7FA8">min</font> = -5.0742<font color="#AD7FA8">, average</font> = -1.9552<font color="#AD7FA8">, max</font> = -0.4736<br> <font color="#AD7FA8"><b>Epoch</b></font> 64 ( 16 <font color="#3465A4">ms</font>)<font color="#AD7FA8"><b>:</b></font> <font color="#AD7FA8">min</font> = -4.5156<font color="#AD7FA8">, average</font> = -1.8883<font color="#AD7FA8">, max</font> = -0.2500<br> <font color="#AD7FA8"><b>Epoch</b></font> 65 ( 17 <font color="#3465A4">ms</font>)<font color="#AD7FA8"><b>:</b></font> <font color="#AD7FA8">min</font> = -5.0068<font color="#AD7FA8">, average</font> = -1.8879<font color="#AD7FA8">, max</font> = 0.0000<br> <br> <font color="#729FCF"><b>BITRANSFORM:</b></font> InvXor { direct: Xor }<br> <font color="#729FCF"><b>POPULATION:</b></font> 250<br> <font color="#729FCF"><b>ARCHITECTURE:</b></font> <font color="#FCE94F">height</font> = 1, <font color="#FCE94F">size</font> = 48, <font color="#FCE94F">rands</font> = 2, <font color="#FCE94F">layers</font> = 1, <font color="#FCE94F">cycles</font> = 1<br> <font color="#729FCF"><b>PARAMETERS:</b></font> <font color="#FCE94F">batches</font> = 8, <font color="#FCE94F">replace_ratio</font> = 0.7, <font color="#FCE94F">par_ratio</font> = 0.8, <font color="#FCE94F">parents</font> = 2, <font color="#FCE94F">par_sw_prob</font> = 0.9, <font color="#FCE94F">mutat_prob</font> = 0.1<br> <font color="#3465A4"><b>SETTINGS:</b></font> <font color="#C4A000">log</font> <font color="#06989A">off</font>, <font color="#C4A000">test</font> <font color="#06989A">new</font>, <font color="#C4A000">print</font> <font color="#06989A">improve</font><br> $ run 0 a7b8<br> e9924e2a </div> <br> (To verify that `0xe992 XOR 0x4e2a = 0xa7b8`, <div style="background: #000000; font-family: mono; font-size: 80%; color: #D0D0D0; padding: 16px;"> $ dir e9924e2a<br> a7b8 </div> <br> or use any calculator in programming mode.) ...*OR* you can call `Evolver`'s methods directly: ```ignore evolver.bitran(®ister, "inv-sha1", &[1, 4])?; // 1 round of SHA-1 with 4-byte message evolver.arch(2, 240, 8, 2, 1)?; evolver.par("mutat_prob", "0.2")?; evolver.evol(100)?; evolver.par("mutat_prob", "0.1")?; evolver.evol(50)?; evolver.save("evolvers/invsha1-demo")?; ``` See `src/bin/shell.rs` and `src/bin/script.rs`. ## Custom Bitransforms The evolution optimizes a population towards precise calculation of some bitstring function. The related methods — constructing the function's parameters block, providing input and output size in bits, scoring the input-output batch pair in regard to the function — are gathered in `Bitransform` trait. The crate implements it for `xor`, `add`, `mult`, [`CRC32`](https://en.wikipedia.org/wiki/Cyclic_redundancy_check#CRC-32_algorithm), [`MD5`](https://en.wikipedia.org/wiki/MD5), [`SHA1`](https://en.wikipedia.org/wiki/SHA-1), [`SHA2`](https://en.wikipedia.org/wiki/SHA-2), [`SHA3/Keccak`](https://en.wikipedia.org/wiki/SHA-3) functions and their inverses, and you can implement it for your own functions. As an example, see `src/bitransform/xor.rs` and `src/bitransform/sha1.rs`. The difference between direct and inverse bitransforms is that in the former case, the *function of input* is compared bitwise with the *output*, while in the latter case, the same *function of output* is compared with the *input*. In other words, you do not need explicit inverse function to score something that tries to perform inversion... actually, "something" should become an inverter. Assume you have done it for `Func` in `func` module, then you add it to register: ```ignore mod func; use func::{Func, InvFunc}; fn app() -> Result<(), String> { let mut register = BitransformRegister::new_default(); register.add(vec![ Func::reg_entry(), InvFunc::reg_entry() ])?; ... } ``` and now you can use it with evolver: ```ignore let mut evolver = Evolver::new_default(®ister)?; evolver.bitran(®ister, "inv-func", &[7, 5, 3])?; evolver.evol(1000)?; ``` (Here `7, 5, 3` are your function's parameters.) ## Modifying the crate Well, copy its source and do whatever you like: extend architecture, apply another genetic algorithm, optimize performance, ... It is something else then, and you should give it a different name. ## Kin Some resembling frameworks, packages, projects: - [Boolean-Circuit-Evolution](https://github.com/JBQuim/Boolean-Circuit-Evolution) - [BoolNet](https://cran.r-project.org/web/packages/BoolNet/) - [PyBoolNet](https://github.com/hklarner/PyBoolNet) - [revonet](https://crates.io/crates/revonet) - [rsgenetic](https://crates.io/crates/rsgenetic) There are probably more... When searching for those, keep in mind that the "evolution" term in this area sometimes means change of a single boolnet state in time rather than optimization of a boolnet population. ## A*s*hievements So far, well... While populations relatively quickly learn to invert simple bitstring functions like bitwise XOR, they struggle with more complex ones, especially those created to prevent inversion, like rounds of cryptographic hashes SHA-x. Even ADD, due to carry, requires more "advanced" boolnet architecture (more layers etc.), which makes learning significantly slower. There seem to be non-zero bounds for the best score, and, of course, there is no guarantee to reach score 0, that is, to breed a boolnet that *always* calculates a function precisely. There may be a *proof* somewhere that this approach has insurmountable limitations when a function's complexity exceeds certain threshold, and then to apply it would be like trying to make a bridge of ashes. On the other hand, it can be a part of some more sophisticated scheme. ## A glitch **Case A.** Set the following evolver: - bitransform is `inv-sha2` with 64 rounds (i.e. full SHA2-256) and message length 60 bytes (480 bits) *OR* `inv-keccak` with hash length 256, 24 rounds (i.e. full SHA3-256), and the same message length; - population is 500; - architecture is `height = 1`, `size = 8000` (yes, most will be unused), `rands = 0`, `layers = 1` (single layer...), `cycles = 1`; - evolution parameters are default except for `batches = 16` (or `batches = 32`) and `mutat_prob = 0.001`; - settings: change `test` to `all` and `print` to `all`. Now, start the evolution, `$ e` in shell, and watch the `average` score column. It remains very close to `-128.0` however many epochs pass, you should never get more than `-127.95` or so (the more the `batches`, the more precision the law of large numbers provides). *This* outcome is quite expectable: not a single network of such simple architecture, — where each output bit is a function of 3 or less hash bits, — is able to invert these *preimage-resistant* cryptographic hashes statistically better than guessing a message (and thus its hash) randomly. The population as a whole shows the absence of improvement as well, of course. **Case B.** Same as Case A with a single difference: decrease message length to 40 bytes (320 bits — notice it is still larger than hash length); do not forget to `reset` the population after. Now, start the evolution again. After 100 or so epochs, the `average` score increases to nearly `-127.0` and remains there. It seemingly begins to improve at 20–30th epoch, when some networks gain (?) a very, very tiny ability to invert hashes better (?) than a random guessing; then their descendants spread in the population. Reset populations and repeat evolutions from Cases A and B several times to verify their stability. Or try it for SHA1, MD5 with similar results. What would that 1 bit (which is >10σ-deviation from mean of approximated normal distribution of the score) mean, were it not the header of this section... Hint: consider the inverter that, given a hash, tries several (even 2) fixed messages and selects one whose hash has maximum number of coinciding bits. */ mod base; mod bitransform; mod bool3funcs; mod boolnet; mod evolearn; mod evolver; mod rwbytes; pub use bitransform::{ Bitransform, BitransformDesc, BitransformRegister }; pub use evolver::Evolver;