epsilon-engine
CSSR ε-machine: statistical complexity S_T and entropy rate H_T from arbitrary data streams.
Part of the Aevum physics kernel.
What It Is
An implementation of Causal State Splitting Reconstruction (CSSR) — the algorithm that infers a minimal ε-machine from a data stream and computes:
- S_T (statistical complexity, C_μ) — the amount of causal structure in the data
- H_T (entropy rate, h_μ) — residual unpredictability given the causal states
S_T ≠ Shannon entropy H(X). Shannon entropy measures static symbol distributions. S_T measures the minimal memory required to predict future symbols — a fundamentally different quantity.
S_T = 0 → a single causal state: no memory of the past improves prediction (i.i.d. noise or a constant stream)
S_T → ∞ → maximal causal structure
H_T = 0 → perfectly predictable given causal states
H_T → log(A) → maximally random (A = alphabet size)
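To make the two quantities concrete, here is a minimal sketch that computes them for an ε-machine that is already known. It assumes the machine is given as a stationary distribution over causal states plus a next-symbol distribution per state, and it uses base-2 logarithms. The function names are hypothetical and are not part of the epsilon-engine API.

```rust
/// Shannon entropy (in bits) of a discrete distribution.
/// Zero-probability entries contribute nothing.
fn entropy_bits(p: &[f64]) -> f64 {
    p.iter()
        .filter(|&&x| x > 0.0)
        .map(|&x| -x * x.log2())
        .sum()
}

/// Statistical complexity C_mu (S_T in this README): the entropy of the
/// stationary distribution over causal states.
fn statistical_complexity(state_probs: &[f64]) -> f64 {
    entropy_bits(state_probs)
}

/// Entropy rate h_mu (H_T in this README): the next-symbol entropy of each
/// causal state, averaged over the stationary state distribution.
/// `emissions[i]` is the next-symbol distribution of state `i`.
fn entropy_rate(state_probs: &[f64], emissions: &[Vec<f64>]) -> f64 {
    state_probs
        .iter()
        .zip(emissions)
        .map(|(&pi, e)| pi * entropy_bits(e))
        .sum()
}
```

Two hand-built machines show why S_T differs from the symbol entropy H(X). The alternating stream 0101... has two equally likely causal states, each emitting one symbol with certainty: `statistical_complexity(&[0.5, 0.5])` is 1 bit and the entropy rate is 0. A fair coin has one causal state emitting both symbols with probability 0.5: the statistical complexity is 0 and the entropy rate is 1 bit. Both streams have the same symbol entropy H(X) of 1 bit.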
The aevum_filter Use Case
Aevum uses epsilon-engine to filter AI agent responses: chunks with S_T below a threshold are dropped (repetitive/padded content), chunks above are kept (causal structure). Real benchmark results:
| Input | Compression | Notes |
|---|---|---|
| Pure repetition | 100% | Correctly: zero causal structure |
| Padded LLM response | 36–68% | Pleasantries dropped, insight kept |
| Dense technical text | 0% | Correctly: every sentence carries structure |
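The threshold rule described above can be sketched as follows. This is an illustration only, not the aevum_filter implementation: `ScoredChunk` and `filter_chunks` are hypothetical names, the S_T scores are assumed to be computed elsewhere, chunks exactly at the threshold are kept, and compression is measured as the fraction of chunks dropped (the benchmark may measure it differently).

```rust
/// Hypothetical container: a text chunk paired with an S_T score
/// that was computed elsewhere.
struct ScoredChunk<'a> {
    text: &'a str,
    s_t: f64,
}

/// Keep chunks whose S_T is at or above `threshold`.
/// Returns the kept texts and the fraction of chunks that were dropped.
fn filter_chunks<'a>(chunks: &[ScoredChunk<'a>], threshold: f64) -> (Vec<&'a str>, f64) {
    let kept: Vec<&'a str> = chunks
        .iter()
        .filter(|c| c.s_t >= threshold)
        .map(|c| c.text)
        .collect();
    let dropped = chunks.len() - kept.len();
    // An empty input has nothing to compress.
    let compression = if chunks.is_empty() {
        0.0
    } else {
        dropped as f64 / chunks.len() as f64
    };
    (kept, compression)
}
```

With three chunks scored 0.0, 1.2, and 0.1 and a threshold of 0.5, only the second chunk survives and two thirds of the chunks are dropped.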
Usage
use ;
// Option 1: Word-level symbolization (recommended for text)
let symbolizer = new;
let symbols = symbolizer.symbolize.unwrap;
// Option 2: Numeric data (equal-frequency binning)
let data = vec!;
let symbols = equal_frequency.unwrap;
// Infer ε-machine
let cfg = Config ;
let result = infer;
println!;
println!;
Symbolization Strategies
| Strategy | Input | Use Case |
|---|---|---|
| WordSymbolizer | Text | Semantic redundancy detection (e.g., "Home About Blog Contact" → low S_T) |
| equal_frequency | Numeric stream | Unknown distribution (default for time-series) |
| equal_width | Numeric stream | Known bounded range |
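The three strategies in the table can be sketched in a few lines each. These are simplified illustrations written for this README, not the crate's own code: the function names and signatures are hypothetical, the word mapper lowercases and splits on whitespace as an assumption, and the rank-based binning may place tied values in different bins.

```rust
use std::collections::HashMap;

/// Word-level symbolization: each distinct (lowercased) word gets the next
/// free integer id, in order of first appearance.
fn word_symbols(text: &str) -> Vec<usize> {
    let mut ids: HashMap<String, usize> = HashMap::new();
    text.split_whitespace()
        .map(|w| {
            let next = ids.len();
            *ids.entry(w.to_lowercase()).or_insert(next)
        })
        .collect()
}

/// Equal-frequency binning: assign bins by rank so that each bin holds
/// roughly the same number of values, whatever the distribution.
fn equal_frequency_bins(data: &[f64], bins: usize) -> Vec<usize> {
    assert!(bins > 0, "need at least one bin");
    let n = data.len();
    // Indices of `data` sorted by value; total_cmp is a total order on f64.
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| data[a].total_cmp(&data[b]));
    let mut symbols = vec![0usize; n];
    for (rank, &idx) in order.iter().enumerate() {
        symbols[idx] = rank * bins / n; // rank < n, so the result is < bins
    }
    symbols
}

/// Equal-width binning: split the known range [lo, hi] into `bins` intervals
/// of equal width; values outside the range are clamped to the end bins.
fn equal_width_bins(data: &[f64], bins: usize, lo: f64, hi: f64) -> Vec<usize> {
    assert!(bins > 0 && hi > lo, "need at least one bin and a non-empty range");
    data.iter()
        .map(|&x| {
            let t = ((x - lo) / (hi - lo)).clamp(0.0, 1.0);
            ((t * bins as f64) as usize).min(bins - 1)
        })
        .collect()
}
```

The difference between the two numeric strategies shows up with outliers. For the data 1, 100, 2, 3 and two bins, rank-based binning splits the values evenly, while equal-width binning over [0, 100] puts every value except the outlier into bin 0. This is why equal-frequency binning is the safer choice when the distribution is unknown.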
Algorithm Reference
Cosma Rohilla Shalizi and James P. Crutchfield, "Computational Mechanics: Pattern and Prediction, Structure and Simplicity," Journal of Statistical Physics 104 (2001): 817–879.
License
Apache-2.0 — see LICENSE.