epsilon-engine 0.1.0

CSSR ε-machine: statistical complexity S_T and entropy rate H_T from data streams.
Homepage: kwailapt/aevum · Owner: kwailapt

Part of the Aevum physics kernel.

What It Is

An implementation of Causal State Splitting Reconstruction (CSSR), an algorithm that infers a minimal ε-machine from a discrete symbol stream. From the inferred machine, epsilon-engine computes:

  • S_T (statistical complexity, C_μ): the amount of causal structure, measured as the Shannon entropy of the causal-state distribution
  • H_T (entropy rate, h_μ): the per-symbol unpredictability that remains once the current causal state is known
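
For reference, the standard definitions from computational mechanics (Shalizi and Crutchfield, 2001), written over the causal states σ ∈ 𝒮 with stationary probabilities P(σ), next-symbol distributions P(x | σ), and an alphabet 𝒜 of size A. The log base is left open: base 2 gives bits, and which base epsilon-engine uses is not stated in this README.

```latex
S_T = C_\mu = H[\mathcal{S}] = -\sum_{\sigma \in \mathcal{S}} P(\sigma)\,\log P(\sigma)

H_T = h_\mu = H[X_t \mid \mathcal{S}_t]
    = -\sum_{\sigma \in \mathcal{S}} P(\sigma) \sum_{x \in \mathcal{A}} P(x \mid \sigma)\,\log P(x \mid \sigma),
\qquad 0 \le H_T \le \log A
```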

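S_T ≠ H(X), the Shannon entropy of individual symbols. H(X) depends only on static symbol frequencies and ignores their order; S_T measures the minimal memory about the past required to predict future symbols, which depends on temporal structure. They are fundamentally different quantities.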

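S_T = 0 → no causal structure: a single causal state (e.g., IID noise or a constant sequence)
S_T large → rich causal structure (at most log N for N causal states)
H_T = 0 → perfectly predictable given the causal state
H_T → log(A) → maximally random (A = alphabet size)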
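
A small sketch that evaluates H(X), S_T, and H_T for three hand-built ε-machines whose causal states are known in closed form: a constant sequence, a fair coin, and the period-3 sequence 001001…. The `EpsilonMachine` struct and its methods are hypothetical illustrations, not epsilon-engine's API, and entropies are computed in bits.

```rust
/// Shannon entropy in bits of a probability vector; zero entries contribute 0.
fn entropy_bits(p: &[f64]) -> f64 {
    // q * log2(1/q) equals -q * log2(q) but avoids printing -0.000 when q = 1.
    p.iter().filter(|&&q| q > 0.0).map(|&q| q * (1.0 / q).log2()).sum()
}

/// Hypothetical hand-built ε-machine (not epsilon-engine's type): the stationary
/// probability of each causal state and the next-symbol distribution it emits.
struct EpsilonMachine {
    state_probs: Vec<f64>,
    emissions: Vec<Vec<f64>>,
}

impl EpsilonMachine {
    /// S_T = C_mu: entropy of the causal-state distribution.
    fn statistical_complexity(&self) -> f64 {
        entropy_bits(&self.state_probs)
    }

    /// H_T = h_mu: next-symbol entropy averaged over causal states.
    fn entropy_rate(&self) -> f64 {
        self.state_probs
            .iter()
            .zip(&self.emissions)
            .map(|(p, e)| p * entropy_bits(e))
            .sum()
    }

    /// H(X): entropy of the single-symbol frequencies, ignoring order.
    fn symbol_entropy(&self) -> f64 {
        let alphabet = self.emissions[0].len();
        let marginal: Vec<f64> = (0..alphabet)
            .map(|x| {
                self.state_probs
                    .iter()
                    .zip(&self.emissions)
                    .map(|(p, e)| p * e[x])
                    .sum::<f64>()
            })
            .collect();
        entropy_bits(&marginal)
    }
}

fn main() {
    let third = 1.0 / 3.0;
    let cases = [
        // One causal state that always emits 0.
        ("constant 000", EpsilonMachine { state_probs: vec![1.0], emissions: vec![vec![1.0, 0.0]] }),
        // One causal state (IID), both symbols equally likely.
        ("fair coin", EpsilonMachine { state_probs: vec![1.0], emissions: vec![vec![0.5, 0.5]] }),
        // Three phase states; each emits its symbol deterministically.
        ("period-3 001", EpsilonMachine {
            state_probs: vec![third; 3],
            emissions: vec![vec![1.0, 0.0], vec![1.0, 0.0], vec![0.0, 1.0]],
        }),
    ];
    for (name, m) in &cases {
        println!(
            "{name:<13} H(X) = {:.3}  S_T = {:.3}  H_T = {:.3}",
            m.symbol_entropy(),
            m.statistical_complexity(),
            m.entropy_rate()
        );
    }
}
```

The period-3 case shows the gap between the two entropies: its symbol frequencies give H(X) ≈ 0.918 bits, but predicting the next symbol requires tracking the phase, so S_T = log2 3 ≈ 1.585 bits while H_T = 0. The fair coin is the reverse: H(X) = H_T = 1 bit = log2(A), yet S_T = 0.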

The aevum_filter Use Case

Aevum uses epsilon-engine to filter AI agent responses: chunks whose S_T falls below a threshold are dropped as repetitive or padded content, and the remaining chunks are kept because they carry causal structure. Benchmark results reported by the project:

Input                  Compression (share dropped)  Notes
Pure repetition        100%                         Correctly: zero causal structure
Padded LLM response    36–68%                       Pleasantries dropped, insight kept
Dense technical text   0%                           Correctly: every sentence carries structure
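
A minimal sketch of the thresholding step described above, assuming each chunk already carries an S_T score (in aevum_filter that score would come from epsilon-engine; here it is supplied by hand). `ScoredChunk` and `filter_chunks` are hypothetical names, chunks exactly at the threshold are kept, and compression is measured as the share of input bytes dropped, which is one plausible reading of the table's column.

```rust
/// A text chunk paired with its statistical complexity S_T.
/// Hypothetical type: in aevum_filter the score would come from epsilon-engine.
struct ScoredChunk<'a> {
    text: &'a str,
    s_t: f64,
}

/// Drop chunks whose S_T is below `threshold` (ties are kept) and return the
/// surviving text plus the fraction of input bytes that was removed.
fn filter_chunks<'a>(chunks: &[ScoredChunk<'a>], threshold: f64) -> (Vec<&'a str>, f64) {
    let total: usize = chunks.iter().map(|c| c.text.len()).sum();
    let kept: Vec<&'a str> = chunks
        .iter()
        .filter(|c| c.s_t >= threshold)
        .map(|c| c.text)
        .collect();
    let kept_bytes: usize = kept.iter().map(|t| t.len()).sum();
    let dropped = if total == 0 {
        0.0
    } else {
        1.0 - kept_bytes as f64 / total as f64
    };
    (kept, dropped)
}

fn main() {
    // Made-up chunks and scores, for illustration only.
    let chunks = [
        ScoredChunk { text: "Great question! Happy to help.", s_t: 0.2 },
        ScoredChunk { text: "The cache misses because the key includes a timestamp.", s_t: 1.4 },
        ScoredChunk { text: "Let me know if you need anything else.", s_t: 0.1 },
    ];
    let (kept, dropped) = filter_chunks(&chunks, 0.5);
    println!("kept: {kept:?}");
    println!("compression: {:.0}%", dropped * 100.0);
}
```

Measuring by bytes rather than by chunk count keeps a single long padded chunk from counting the same as a short one; a chunk-count measure would be an equally simple swap.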

Usage

use epsilon_engine::{infer, Config, symbolize::WordSymbolizer};

fn main() {
    // Option 1: word-level symbolization (recommended for text)
    let symbolizer = WordSymbolizer::new(4);
    let text_symbols = symbolizer.symbolize("your text here").unwrap();

    // Option 2: numeric data (equal-frequency binning)
    let data = vec![0.1, 0.5, 0.3, 0.9, 0.2];
    let numeric_symbols = epsilon_engine::symbolize::equal_frequency(&data, 4).unwrap();

    // Infer the ε-machine from one stream (pass &numeric_symbols for numeric data)
    let cfg = Config { max_depth: 3, ..Config::default() };
    let result = infer(&text_symbols, cfg);

    println!("S_T = {:.4}", result.cognitive_split.statistical_complexity.point);
    println!("H_T = {:.4}", result.cognitive_split.entropy_rate.point);
}
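
CSSR builds causal states from empirical next-symbol distributions P(x | history) over recent histories, splitting states when those distributions differ significantly. The `max_depth` field presumably caps that history length (an assumption; check the crate's `Config` docs). The sketch below is not epsilon-engine code: it shows only the counting step, through a hypothetical `next_symbol_counts` over `u8` symbols, on the period-3 sequence 001001….

```rust
use std::collections::HashMap;

/// For every history of exactly `len` symbols, count how often each next symbol
/// follows it. Symbols must lie in 0..alphabet.
fn next_symbol_counts(symbols: &[u8], len: usize, alphabet: usize) -> HashMap<Vec<u8>, Vec<usize>> {
    let mut counts: HashMap<Vec<u8>, Vec<usize>> = HashMap::new();
    // If len >= symbols.len() the range is empty and no counts are produced.
    for i in len..symbols.len() {
        let history = symbols[i - len..i].to_vec();
        let row = counts.entry(history).or_insert_with(|| vec![0; alphabet]);
        row[symbols[i] as usize] += 1;
    }
    counts
}

fn main() {
    // The period-3 sequence 001001..., 30 symbols long.
    let symbols: Vec<u8> = [0u8, 0, 1].iter().copied().cycle().take(30).collect();
    for len in 1..=2 {
        let mut rows: Vec<_> = next_symbol_counts(&symbols, len, 2).into_iter().collect();
        rows.sort(); // HashMap order is arbitrary; sort for stable output
        for (history, counts) in rows {
            let n: usize = counts.iter().sum();
            let probs: Vec<f64> = counts.iter().map(|&c| c as f64 / n as f64).collect();
            println!("L={len} history={history:?} P(next)={probs:?}");
        }
    }
}
```

At length 1 the history "0" is followed by 0 and by 1 equally often, so it cannot pin down the phase; at length 2 every history predicts its next symbol with certainty. Histories "01" and "10" share the same next-symbol distribution, and CSSR separates them in a later determinization step that this sketch omits.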

Symbolization Strategies

Strategy          Input            Use case
WordSymbolizer    Text             Semantic redundancy detection (e.g., "Home About Blog Contact" → low S_T)
equal_frequency   Numeric stream   Unknown distribution (default for time series)
equal_width       Numeric stream   Known bounded range
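
The two numeric strategies differ in where the bin edges go. A self-contained sketch of both, using hypothetical names (`equal_width_bins`, `equal_frequency_bins`) rather than epsilon-engine's own functions, whose signatures and tie handling may differ; here ties in the equal-frequency case are split by input order.

```rust
/// Equal-width binning: split [min, max] into `bins` intervals of equal width.
fn equal_width_bins(data: &[f64], bins: usize) -> Vec<usize> {
    assert!(bins >= 1, "need at least one bin");
    let min = data.iter().copied().fold(f64::INFINITY, f64::min);
    let max = data.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let width = (max - min) / bins as f64;
    data.iter()
        .map(|&x| {
            if width > 0.0 {
                // Clamp so that x == max lands in the last bin.
                (((x - min) / width) as usize).min(bins - 1)
            } else {
                0 // all values equal
            }
        })
        .collect()
}

/// Equal-frequency binning: rank the values and give each bin about
/// len / bins of them, regardless of how they are spread out.
fn equal_frequency_bins(data: &[f64], bins: usize) -> Vec<usize> {
    assert!(bins >= 1, "need at least one bin");
    let mut order: Vec<usize> = (0..data.len()).collect();
    // Stable sort: equal values keep their input order.
    order.sort_by(|&a, &b| data[a].partial_cmp(&data[b]).expect("NaN in data"));
    let mut out = vec![0; data.len()];
    for (rank, &i) in order.iter().enumerate() {
        out[i] = rank * bins / data.len();
    }
    out
}

fn main() {
    // One outlier squeezes equal-width bins; equal-frequency bins adapt.
    let data = [1.0, 2.0, 3.0, 4.0, 100.0];
    println!("equal width:     {:?}", equal_width_bins(&data, 2));
    println!("equal frequency: {:?}", equal_frequency_bins(&data, 2));
}
```

With the outlier, equal-width binning puts four of the five values in one bin ([0, 0, 0, 0, 1]), while equal-frequency binning splits them 3/2 ([0, 0, 0, 1, 1]). This is why equal-width suits a known bounded range and equal-frequency suits an unknown distribution.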

Algorithm Reference

Cosma Rohilla Shalizi and James P. Crutchfield, "Computational Mechanics: Pattern and Prediction, Structure and Simplicity," Journal of Statistical Physics 104 (2001): 817–879.

License

Apache-2.0 — see LICENSE.