symbios 1.4.0

A derivation engine for L-Systems (ABOP compliant).
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
# Symbios Performance Guide

This guide provides benchmarking results, optimization tips, and performance characteristics for Symbios.

## Table of Contents

1. [Benchmark Results]#benchmark-results
2. [Performance Characteristics]#performance-characteristics
3. [Optimization Tips]#optimization-tips
4. [Profiling Guide]#profiling-guide
5. [Common Bottlenecks]#common-bottlenecks
6. [Genetic Operator Performance]#genetic-operator-performance

---

## Benchmark Results

All benchmarks run on the standard test suite using Criterion.rs.

### Exponential Growth (Algae)

**Test:** Classic L-System doubling pattern
```
A → A B
B → A
Axiom: A
Depth: 15 steps
Result: 987 modules (Fibonacci sequence)
```

**Performance:**
- **Time:** ~79 µs (microseconds)
- **Throughput:** ~12,500 derivations/second
- **Per-module cost:** ~80 ns/module

**What this tests:**
- State allocation and module pushing
- Symbol matching speed
- Memory layout efficiency

### Context-Sensitive Matching

**Test:** Signal propagation with left/right context
```
A(x) < B(y) > C(z) → B(x+y+z)
Axiom: [A(1) B(1) C(1)] × 1000 = 3,000 modules
Depth: 1 step
```

**Performance:**
- **Time:** ~169 µs (microseconds)
- **Throughput:** ~17.7 million modules/second (3000 modules / 169µs)
- **Per-module cost:** ~56 ns/module

**What this tests:**
- Context matching algorithm with topology skip-links
- Parameter access speed (SoA layout)
- VM expression evaluation overhead
- Zero-allocation matching via `MatchScratch`

### Benchmark Summary

| Operation | Time | Modules | ns/module |
|-----------|------|---------|-----------|
| Algae Growth (15 steps) | 79 µs | 987 | 80 |
| Context Matching (1 step) | 169 µs | 3,000 | 56 |

---

## Performance Characteristics

### Algorithmic Complexity

| Operation | Time Complexity | Space Complexity | Notes |
|-----------|----------------|------------------|-------|
| Add rule | O(P) | O(P) | P = rule string length |
| Set axiom | O(A) | O(A) | A = axiom string length |
| Derive (N steps) | O(N × M × R) | O(M) | M = modules, R = rules per symbol |
| Context matching | O(1) | O(1) | Via skip-links |
| Parameter access | O(1) | O(1) | SoA flat indexing |
| Symbol comparison | O(1) | - | u16 equality |
| Expression eval | O(I) | O(S) | I = instructions, S = stack depth |
| Topology calculation | O(M) | O(D) | D = max nesting depth |
| Stochastic selection | O(R) | O(1) | R = matching rules for module |

### Memory Layout Efficiency

**Per-Module Overhead:**
```rust
// ModuleData size: 24 bytes (with alignment padding for f64)
// Enforced by compile-time assertion in src/core/mod.rs.
struct ModuleData {
    symbol: u16,        // 2 bytes
    birth_time: f64,    // 8 bytes (requires 8-byte alignment)
    param_start: u32,   // 4 bytes
    param_len: u16,     // 2 bytes
    topology_link: u32, // 4 bytes
}
// Raw sum: 20 bytes. Struct alignment (8) pads to 24 bytes.
```

**Parameters:** 8 bytes per parameter (f64)

**Example:** Module `A(1.0, 2.0, 3.0)` costs:
- ModuleData: 24 bytes
- Parameters: 3 × 8 = 24 bytes
- **Total: 48 bytes**

Compare to naive approach:
```rust
struct NaiveModule {
    symbol: String,        // 24 bytes (ptr + len + cap)
    birth_time: f64,       // 8 bytes
    parameters: Vec<f64>,  // 24 bytes
    topology_link: Option<usize>, // 16 bytes (Option + usize)
    // Padding: ~8 bytes
}
// Total: ~80 bytes + heap allocation overhead
```

**SoA saves 50% memory per module.**

---

## Optimization Tips

### 1. Pre-allocate State Capacity

If you know your L-System will grow to ~N modules, set the capacity:

```rust
let mut sys = System::new();
sys.max_capacity = 1_000_000;  // Adjust as needed
```

**Why:** The `max_capacity` both limits growth and signals expected size. Reduces reallocation during derivation.

### 2. Minimize Rule Count

Each module is tested against every rule for its symbol on every derivation step.

```rust
// Consolidate where possible. Fewer rules = fewer comparisons.
sys.add_rule("A(x) : x < 5 -> B(x)")?;
sys.add_rule("A(x) : x >= 5 -> C(x)")?;
```

### 3. Simplify Expressions

Complex expressions are compiled once but evaluated on every match.

**Slow:**
```rust
sys.add_rule("A(x) -> B(sin(x) + cos(x) + tan(x))")?;
```

**Fast:**
```rust
// Precompute if x is constant across derivations
sys.add_rule("A(x) -> B(x * 1.732)")?;  // Approximation
```

### 4. Avoid Deep Nesting

Topology calculation is O(N) but uses a stack. Deep nesting increases overhead.

**Slow:**
```
Axiom: A [ [ [ [ B ] ] ] ]  // 4 levels deep
```

**Fast:**
```
Axiom: A [ B ] [ C ] [ D ]  // 1 level, parallel branches
```

**Limit:** Max nesting is 4,096 (hard limit to prevent DoS).

### 5. Use Constants Instead of Literals

Constants defined with `#define` are inlined at compile time, with no extra cost. They also enable genetic algorithm coupling (constant mutation affects all references).

```rust
sys.add_directive("#define ANGLE 45")?;
sys.add_rule("A -> [ +(ANGLE) B ] [ -(ANGLE) C ]")?;
```

### 6. Batch Derivations

Instead of:
```rust
for i in 0..100 {
    sys.derive(1)?;
    // Do something with state
}
```

Do:
```rust
sys.derive(100)?;
// Process final state
```

**Why:** Reduces per-step setup overhead (topology recalculation, buffer reset).

### 7. Seed RNG for Consistent Benchmarking

```rust
sys.set_seed(42);
```

Eliminates variance from stochastic rule selection in benchmarks.

---

## Profiling Guide

### Running Benchmarks

```bash
cargo bench
```

Output will be in `target/criterion/`.

### Benchmark Groups

The benchmark suite (`benches/bench_main.rs`) covers six groups:

| Group | What it Measures |
|-------|------------------|
| `Derivation` | Core derive loop: allocation, matching, state push |
| `Context Matching` | Skip-link navigation, parameter loading, VM eval |
| `Genetic/Mutation` | Probability and constant perturbation (10-100 rules) |
| `Genetic/Crossover` | Uniform crossover with constant blending (10-50 rules) |
| `Genetic/StructuralMutation` | Module insert/delete/swap, bytecode perturbation |
| `Genetic/CombinedEvolution` | Full cycle: crossover + mutation + structural mutation |

### Custom Benchmarks

Add to `benches/bench_main.rs`:

```rust
fn bench_my_case(c: &mut Criterion) {
    c.bench_function("My Case", |b| {
        b.iter(|| {
            let mut sys = System::new();
            sys.add_rule("...").unwrap();
            sys.set_axiom("...").unwrap();
            sys.derive(black_box(10)).unwrap();
        })
    });
}
```

### Profiling with `perf` (Linux)

```bash
cargo build --release
perf record --call-graph=dwarf target/release/symbios
perf report
```

### Flamegraph

```bash
cargo install flamegraph
cargo flamegraph --bench bench_main
```

---

## Common Bottlenecks

### 1. State Export

**Problem:**
```rust
let output = sys.state.display(&sys.interner).to_string();  // Slow for large states
```

**Why:** Allocates a new `String` and formats every module.

**Solution:** Only export when needed. Use `sys.state.len()` for progress tracking.

### 2. Topology Calculation

**Problem:**
```rust
sys.derive(1)?;  // Recalculates topology each step if brackets are present
```

**Why:** Symbios recalculates skip-links after each derivation step when `[` and `]` are present.

**Solution:** This is unavoidable for branching L-Systems, but the cost is O(N) which is acceptable.

### 3. Rule Compilation

**Problem:**
```rust
for _ in 0..1000 {
    sys.add_rule("A(x) -> B(x + sin(x))")?;  // Repeated compilation
}
```

**Why:** Each `add_rule` parses and compiles the expression.

**Solution:** Add rules once, reuse the system. For evolutionary workloads, use `from_source()` which parses all rules in one pass.

### 4. Excessive Derivations

**Problem:**
```rust
sys.derive(100)?;  // Exponential growth
// State explodes to millions of modules
```

**Why:** L-Systems can grow exponentially (e.g., `A → AA` doubles every step).

**Solution:**
- Set `sys.max_capacity` to catch runaway growth
- Monitor `sys.state.len()` between derivations
- Use pruning rules to limit growth

```rust
sys.add_rule("A(x) : x < 100 -> A(x + 1)")?;  // Growth limit
```

### 5. Source Round-Tripping Overhead

**Problem:** Evolutionary loops using `SourceGenotype` pay parse + compile + decompile cost per generation.

**Why:** Each `mutate_with_rng` call on `SourceGenotype` parses source → builds `System` → applies operator → calls `to_source()`.

**Solution:** For tight loops where source preservation isn't needed, operate directly on `System` objects and only call `to_source()` for serialization/logging.

---

## Genetic Operator Performance

### Relative Cost

| Operator | Relative Cost | Scales With |
|----------|--------------|-------------|
| Probability mutation | Very low | Number of rules |
| Constant mutation | Very low | Number of constants |
| Gaussian jitter | Low | Total bytecode instructions |
| Operator flip | Low | Total bytecode instructions |
| Structural mutation | Medium | Rules × successors per rule |
| Rule duplication | Medium | Number of rules |
| Topological mutation | Low | Number of successor symbols |
| Literal promotion | Medium | Bytecode instructions × constants |
| Crossover (uniform) | Medium | Total rules across parents |
| Crossover (homologous) | Medium | Shared predecessor symbols |
| BLX-alpha blending | Medium | Shared rule pairs with matching structure |
| Sub-expression grafting | Medium | Successor sequence length |

### Evolutionary Loop Pattern

```rust
// Efficient: operate on System objects directly
let mut population: Vec<System> = /* ... */;
let crossover_config = CrossoverConfig::default();
let mutation_config = MutationConfig::default();

for generation in 0..1000 {
    // Select parents, crossover, mutate
    let mut offspring = population[0].crossover(&population[1], &crossover_config)?;
    offspring.mutate(&mutation_config);

    // Evaluate fitness (derive + measure)
    offspring.set_axiom("A(0)")?;
    offspring.derive(10)?;
    let fitness = offspring.state.len();  // Example fitness

    // Only serialize when needed (e.g., checkpointing)
    if generation % 100 == 0 {
        let source = offspring.to_source();
        // Save to file...
    }
}
```

---

## Optimization Checklist

Before deploying Symbios in production, verify:

- [ ] Rules are minimal and non-redundant
- [ ] Expressions are simplified where possible
- [ ] State capacity is set appropriately (`sys.max_capacity`)
- [ ] Derivation depth is bounded
- [ ] Stochastic rules use `set_seed` for reproducibility in testing
- [ ] Benchmarks pass performance regression tests (`cargo bench`)
- [ ] Memory usage is profiled for your use case
- [ ] Evolutionary loops operate on `System` objects (not `SourceGenotype`) in hot paths

---

## Expected Performance Ranges

Based on benchmarks and real-world usage:

| Use Case | Modules | Derivations/sec | Notes |
|----------|---------|-----------------|-------|
| Small (< 100 modules) | 10-100 | 100,000+ | Interactive rates |
| Medium (1K modules) | 1,000 | 10,000+ | Real-time generation |
| Large (10K modules) | 10,000 | 1,000+ | Batch processing |
| Huge (100K+ modules) | 100,000+ | 100+ | Precompute offline |

**Hardware:** Apple M1/M2, AMD Ryzen 5000+, Intel i5/i7 (modern CPUs).

---

## References

- [Criterion.rs Benchmarking]https://github.com/bheisler/criterion.rs
- [Rust Performance Book]https://nnethercote.github.io/perf-book/
- [Structure-of-Arrays Benefits]https://en.wikipedia.org/wiki/AoS_and_SoA