# epsilon-engine
**CSSR ε-machine: statistical complexity S_T and entropy rate H_T from arbitrary data streams.**
Part of the [Aevum](https://github.com/kwailapt/aevum) physics kernel.
[crates.io](https://crates.io/crates/epsilon-engine) | [docs.rs](https://docs.rs/epsilon-engine) | [License](../../LICENSE)
## What It Is
An implementation of **Causal State Splitting Reconstruction (CSSR)** — the algorithm that infers a minimal ε-machine from a data stream and computes:
- **S_T** (statistical complexity, C_μ) — the amount of causal structure in the data
- **H_T** (entropy rate, h_μ) — residual unpredictability given the causal states
**S_T ≠ Shannon entropy H(X).** Shannon entropy measures static symbol distributions. S_T measures the minimal memory required to predict future symbols — a fundamentally different quantity.
```
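S_T = 0       → a single causal state: no memory needed (i.i.d. noise or pure repetition)
S_T → ∞       → increasingly rich causal structure (more memory needed to predict)
H_T = 0       → perfectly predictable given the causal state
H_T → log|A|  → maximally random (|A| = size of the symbol alphabet)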
```
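To make the two quantities concrete, here is a minimal sketch that computes them for an ε-machine that is already known, rather than inferred. It does not use the `epsilon-engine` API: the function names are illustrative, and the example machine (the two-state Golden Mean process, which never emits two 0s in a row) is an assumption chosen because its values are easy to check by hand. S_T is the Shannon entropy of the stationary distribution over causal states, and H_T is the state-weighted average of the per-state emission entropies.

```rust
/// Shannon entropy (in bits) of a discrete probability distribution.
fn entropy_bits(p: &[f64]) -> f64 {
    p.iter().filter(|&&x| x > 0.0).map(|&x| -x * x.log2()).sum()
}

/// Statistical complexity: entropy of the stationary distribution
/// over causal states.
fn statistical_complexity(stationary: &[f64]) -> f64 {
    entropy_bits(stationary)
}

/// Entropy rate: average, weighted by state probability, of the entropy
/// of each state's next-symbol distribution.
fn entropy_rate(stationary: &[f64], emissions: &[Vec<f64>]) -> f64 {
    stationary
        .iter()
        .zip(emissions)
        .map(|(&pi, e)| pi * entropy_bits(e))
        .sum()
}

fn main() {
    // Golden Mean process (illustrative, not taken from the crate):
    // state A emits 0 or 1 with probability 1/2 each; state B always emits 1.
    // Stationary probabilities are 2/3 for A and 1/3 for B.
    let stationary = [2.0 / 3.0, 1.0 / 3.0];
    let emissions = vec![vec![0.5, 0.5], vec![0.0, 1.0]];

    println!("S_T = {:.4} bits", statistical_complexity(&stationary));
    println!("H_T = {:.4} bits/symbol", entropy_rate(&stationary, &emissions));
}
```

For this machine S_T is about 0.918 bits and H_T is about 0.667 bits per symbol. A fair coin, by contrast, has a single causal state, so S_T = 0 while H_T = 1 bit per symbol, which is why a high symbol entropy does not imply a high S_T.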
## The `aevum_filter` Use Case
Aevum uses `epsilon-engine` to filter AI agent responses: chunks whose S_T falls below a threshold are dropped as repetitive or padded content, and chunks above it are kept as carrying causal structure. Benchmark results:
| Input | Dropped | Notes |
|---|---|---|
| Pure repetition | 100% | Correctly: zero causal structure |
| Padded LLM response | 36–68% | Pleasantries dropped, insight kept |
| Dense technical text | 0% | Correctly: every sentence carries structure |
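The thresholding step can be sketched as follows. This is not the `aevum_filter` implementation: `filter_chunks`, `dropped_fraction`, and `toy_score` are hypothetical names, and `toy_score` (a distinct-word count) is only a placeholder standing in for an S_T estimate. Keeping chunks that score exactly at the threshold is also an assumption.

```rust
use std::collections::HashSet;

/// Keep the chunks whose score is at or above `threshold`.
/// `score` stands in for an S_T estimate of each chunk.
fn filter_chunks<'a, F>(chunks: &[&'a str], threshold: f64, score: F) -> Vec<&'a str>
where
    F: Fn(&str) -> f64,
{
    chunks
        .iter()
        .copied()
        .filter(|c| score(*c) >= threshold)
        .collect()
}

/// Fraction of chunks removed by the filter.
fn dropped_fraction(total: usize, kept: usize) -> f64 {
    if total == 0 {
        0.0
    } else {
        (total - kept) as f64 / total as f64
    }
}

/// Placeholder scorer (NOT S_T): the number of distinct words in the chunk.
fn toy_score(chunk: &str) -> f64 {
    chunk.split_whitespace().collect::<HashSet<_>>().len() as f64
}

fn main() {
    let chunks = ["ok ok ok ok", "entropy rate bounds prediction error"];
    let kept = filter_chunks(&chunks, 2.0, toy_score);
    let dropped = 100.0 * dropped_fraction(chunks.len(), kept.len());
    println!("kept {:?}, dropped {:.0}%", kept, dropped);
}
```

In real use the closure passed as `score` would run symbolization and inference on the chunk and return the resulting S_T point estimate.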
## Usage
```rust
use epsilon_engine::{infer, Config, symbolize::WordSymbolizer};
// Option 1: Word-level symbolization (recommended for text)
let symbolizer = WordSymbolizer::new(4);
let symbols = symbolizer.symbolize("your text here").unwrap();
// Option 2: Numeric data (equal-frequency binning).
// This shadows `symbols` from Option 1, so keep only one of the two options.
let data = vec![0.1, 0.5, 0.3, 0.9, 0.2];
let symbols = epsilon_engine::symbolize::equal_frequency(&data, 4).unwrap();
// Infer ε-machine
let cfg = Config { max_depth: 3, ..Config::default() };
let result = infer(&symbols, cfg);
println!("S_T = {:.4}", result.cognitive_split.statistical_complexity.point);
println!("H_T = {:.4}", result.cognitive_split.entropy_rate.point);
```
## Symbolization Strategies
| Strategy | Input | Best for |
|---|---|---|
| `WordSymbolizer` | Text | Semantic redundancy detection (e.g., "Home About Blog Contact" → low S_T) |
| `equal_frequency` | Numeric stream | Unknown distribution (default for time-series) |
| `equal_width` | Numeric stream | Known bounded range |
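The difference between the two numeric strategies shows up as soon as the data contains an outlier. The sketch below is a standalone illustration of the two binning ideas, not the crate's code: `equal_width_bins` and `equal_frequency_bins` are hypothetical names, and the rank-based tie handling is an assumption that may differ from what `epsilon_engine::symbolize` does.

```rust
/// Equal-width binning: split [min, max] into `bins` intervals of equal size.
fn equal_width_bins(data: &[f64], bins: usize) -> Vec<usize> {
    assert!(bins > 0, "need at least one bin");
    let min = data.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = data.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let width = (max - min) / bins as f64;
    data.iter()
        .map(|&x| {
            if width == 0.0 {
                0 // all values identical: everything falls in one bin
            } else {
                // Clamp so that x == max lands in the last bin.
                (((x - min) / width) as usize).min(bins - 1)
            }
        })
        .collect()
}

/// Equal-frequency binning: assign symbols by rank so that each bin
/// receives roughly the same number of points.
fn equal_frequency_bins(data: &[f64], bins: usize) -> Vec<usize> {
    assert!(bins > 0, "need at least one bin");
    let n = data.len();
    let mut order: Vec<usize> = (0..n).collect();
    order.sort_by(|&a, &b| data[a].total_cmp(&data[b]));
    let mut symbols = vec![0; n];
    for (rank, &idx) in order.iter().enumerate() {
        symbols[idx] = rank * bins / n;
    }
    symbols
}

fn main() {
    // One outlier (100.0) dominates the value range.
    let data = [1.0, 2.0, 3.0, 100.0];
    println!("equal width:     {:?}", equal_width_bins(&data, 2));
    println!("equal frequency: {:?}", equal_frequency_bins(&data, 2));
}
```

With two bins, equal-width binning maps the data to `[0, 0, 0, 1]` because the outlier stretches the range, while equal-frequency binning yields `[0, 0, 1, 1]`. That is the reason equal-frequency is the safer choice when the distribution is unknown.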
## Algorithm Reference
Cosma Rohilla Shalizi and James P. Crutchfield, "Computational Mechanics: Pattern and Prediction, Structure and Simplicity," *Journal of Statistical Physics* 104 (2001): 817–879.
## License
Apache-2.0 — see [LICENSE](../../LICENSE).