ries 1.1.1 - Docs.rs

# RIES-RS Search Model

This document defines the core computational model implemented by `ries-rs`.

Goal: make the search behavior explicit, reproducible, and reviewable.

## Scope

This is the model for the core RIES-style equation search engine:

- expression representation (postfix / RPN)
- well-formedness rules
- complexity scoring
- match ranking
- deterministic ordering

This document does not define:

- report-mode category heuristics (`docs/COMPLEXITY.md` + `src/report.rs`, `src/metrics.rs`)
- PSLQ mode
- future experimental search branches

## Expression Representation

Expressions are stored in postfix notation (Reverse Polish Notation) as a sequence of symbols.

- Implementation type: `Expression` in `src/expr.rs`
- Symbol definitions and default weights: `src/symbol.rs`
- Runtime symbol-table overrides (weights/names): `src/symbol_table.rs`

Examples:

- `52/` -> `5/2`
- `xT` -> `tanpi(x)`
- `xCr` -> `1/cospi(x)`

## Symbol Classes (Arity)

Each symbol has a fixed arity category (`Seft` in code):

- `A`: atom / constant / variable (`0`-ary push)
- `B`: unary operator (`1`-input, `1`-output)
- `C`: binary operator (`2`-input, `1`-output)

The search model is defined over the active symbol set after applying CLI/profile filters (`-S`, `-N`, presets, user constants/functions, etc.).

## Well-Formedness Rule (Postfix Grammar)

`ries-rs` generates and evaluates only well-formed postfix expressions. Operationally:

1. Start with stack depth `d = 0`.
2. For each symbol:
   - atom (`A`): `d := d + 1`
   - unary (`B`): requires `d >= 1`, then `d := d`
   - binary (`C`): requires `d >= 2`, then `d := d - 1`
3. A complete expression is valid iff final depth is exactly `1`.

Equivalent abstract grammar (stack-discipline form):

- atoms are terminals
- unary nodes apply to one valid subexpression
- binary nodes apply to two valid subexpressions

In infix form, parentheses are added according to precedence/associativity rules implemented in `src/expr.rs` (power is right-associative; multiplication/division bind tighter than addition/subtraction).

## Search Space Partition

RIES-style search builds equations by matching:

- LHS expressions containing `x`
- RHS expressions that are constant-only

The engine generates candidate expressions up to complexity limits, evaluates them numerically, and then tests LHS/RHS pairs with Newton refinement (unless disabled).

Search implementations:

- sequential: `search_with_stats_and_config`
- parallel: `search_parallel_with_stats_and_config`
- streaming: `search_streaming_with_config`
- one-sided: `search_one_sided_with_stats_and_config`

## Complexity Metric (Core Engine)

Core complexity is additive over active symbol weights.

For an expression `e = (s_1, ..., s_n)`:

`C_expr(e) = sum_i w(s_i)`

Where:

- `w(s)` is the active weight for symbol `s`
- by default, weights come from `Symbol::weight()` (`src/symbol.rs`)
- when overridden by profiles/CLI, generation uses the active `SymbolTable`

For a match (equation) `lhs = rhs`:

`C_match(lhs, rhs) = C_expr(lhs) + C_expr(rhs)`

Important note:

- The core ranking metric does **not** add explicit tree-depth penalties.
- Tree depth and operator count may be reported as metadata (for JSON/reporting), but they are not part of `C_match`.

See `docs/COMPLEXITY.md` for the default weight table and rationale.

## Search Level to Complexity Limits

The CLI and library APIs use different level mappings.

### CLI (`src/cli/runtime.rs::cli_level_to_complexity`)

For CLI `-l/--level = L`:

- `max_lhs_complexity = 35 + 10*L`
- `max_rhs_complexity = 35 + 10*L`

This is calibrated for practical coverage while avoiding expression explosion.

### Library helper (`src/search.rs::level_to_complexity`)

Programmatic API helper uses a lighter mapping:

- `max_lhs = 10 + 4*L`
- `max_rhs = 12 + 4*L`

This difference is intentional and should not be conflated in benchmark or reproducibility claims.

## Match Ranking (Ordering)

The canonical comparator is `compare_matches` in `src/pool.rs`.

Ordering is:

1. Exactness (exact before non-exact)
2. Absolute error (`|x_value - target|`, smaller first)
3. Ranking-mode tie-break:
   - `complexity`: lower `C_match` first
   - `parity`: lower legacy parity score first, then lower `C_match`
4. Lexicographic postfix order of LHS symbols
5. Lexicographic postfix order of RHS symbols

This final lexical tie-break is what makes total ordering explicit and stable.

## Determinism Contract

`--deterministic` exists for reproducibility-oriented runs.

In deterministic mode, `ries-rs`:

- disables parallel search execution
- applies stable final sorting with the canonical comparator

This yields stable output ordering for identical:

- target
- search level / complexity limits
- ranking mode
- symbol set / weights / profile configuration
- build + feature set

Practical caveat:

- floating-point behavior is deterministic within a given build/runtime environment, but results should still be treated as build-config specific unless a manifest is recorded (`--emit-manifest`).

## Reproducible CLI Output Modes

For automation and archival:

- `--json`: structured stdout results + search statistics
- `--emit-manifest FILE`: full run manifest JSON for replay/audit metadata

Recommended reproducibility workflow:

```bash
ries-rs 3.141592653589793 --deterministic --json --emit-manifest run-manifest.json
```

## Source of Truth

If this document and code diverge, code wins.

Authoritative implementation points:

- `src/expr.rs` (representation + infix formatting semantics)
- `src/symbol.rs` / `src/symbol_table.rs` (symbol set + weights)
- `src/search.rs` (search execution)
- `src/pool.rs` (ranking + dedupe + ordering)