gonidium 0.0.2 - Docs.rs

# Gonidium

Gonidium is a small domain-specific language (DSL) for symbolic expression pipelines over deep-learning-style math, especially scalar and elementwise kernels.

## Why a DSL?

Deep-learning frameworks often need a small, predictable language to represent and optimise math kernels:

- Parse compact surface syntax into a typed IR.
- Run algebraic simplification / fusion (e-graph).
- Interpret for validation or emit plain Python kernels for integration.

Gonidium focuses on that workflow and intentionally keeps the language small.

## Design Principles (Current)

- Domain-focused: elementwise math + common DL functions (`exp`, `log`, `sigmoid`, `relu`, `floor`, `ceil`, ...).
- Closed-world output: after lowering/optimisation, the result is always expressible using the supported operators/builtins.
- Typed IR first: a `TypedDag` is the semantic contract; surface syntax is just a frontend.
- Explicit annotations + implicit promotion: parameters are annotated explicitly; expression types are inferred through promotion rules.
- Diagnostics are part of the API: stable short codes + configurable severities.

Non-goals (by design):

- General CAS (e.g. symbolic integration), rational exactness, or arbitrary user-defined functions.
- Modules/package system.

## Quick Start

Build and run the CLI:

```bash
cargo build
cargo run -- repl
```

Run a one-off expression (it is interpreted when inputs are provided):

```bash
cargo run -- run -e "|x: f32| x + 1" --inputs "1"
```

`--inputs` is parsed against the declared parameter dtypes, so the inputs must match the signature: one value per parameter, each a valid literal for that parameter's dtype. For example:

```bash
cargo run -- run -e "|z: c128| z" --inputs "1+2j"
```
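To make "parsed against the declared parameter dtypes" concrete, here is a minimal Rust sketch of per-dtype input parsing. It is not Gonidium's parser: the `Value` enum, `parse_input`, `parse_complex`, the Python-style complex syntax (`1+2j`, `3j`) and the small set of covered dtypes are all assumptions for illustration.

```rust
/// Hypothetical runtime value (not Gonidium's real value type).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    I64(i64),
    F32(f32),
    F64(f64),
    C128(f64, f64), // (real, imaginary)
}

/// Parse a complex literal such as "1+2j", "1-2j", "3j" or "2" (assumed Python-style syntax).
fn parse_complex(s: &str) -> Option<(f64, f64)> {
    let s = s.trim();
    match s.strip_suffix('j') {
        // No `j` suffix: a purely real value.
        None => s.parse::<f64>().ok().map(|re| (re, 0.0)),
        Some(body) => {
            // Split at the last '+' or '-' that is neither a leading sign nor part of an exponent.
            let bytes = body.as_bytes();
            let split = (1..bytes.len()).rev().find(|&i| {
                (bytes[i] == b'+' || bytes[i] == b'-') && !matches!(bytes[i - 1], b'e' | b'E')
            });
            match split {
                Some(i) => Some((body[..i].parse::<f64>().ok()?, body[i..].parse::<f64>().ok()?)),
                // No split point: a purely imaginary value such as "3j".
                None => body.parse::<f64>().ok().map(|im| (0.0, im)),
            }
        }
    }
}

/// Parse one input string according to the parameter's declared dtype.
fn parse_input(text: &str, dtype: &str) -> Result<Value, String> {
    let t = text.trim();
    let bad = || format!("cannot parse {t:?} as {dtype}");
    match dtype {
        "bool" => t.parse::<bool>().map(Value::Bool).map_err(|_| bad()),
        "i64" => t.parse::<i64>().map(Value::I64).map_err(|_| bad()),
        "f32" => t.parse::<f32>().map(Value::F32).map_err(|_| bad()),
        "f64" => t.parse::<f64>().map(Value::F64).map_err(|_| bad()),
        "c128" => parse_complex(t).map(|(re, im)| Value::C128(re, im)).ok_or_else(bad),
        other => Err(format!("dtype {other} is not covered by this sketch")),
    }
}

fn main() {
    // `|z: c128| z` with `--inputs "1+2j"`
    println!("{:?}", parse_input("1+2j", "c128"));
    // `|x: f32| x + 1` with `--inputs "1"`
    println!("{:?}", parse_input("1", "f32"));
    // The same text is rejected for a dtype it does not fit.
    println!("{:?}", parse_input("1+2j", "f32"));
}
```

Dispatching on the declared dtype, rather than guessing a type from the text, is what lets the same string be accepted for one signature and rejected for another.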

If you omit `--inputs`, `run` prints the inferred signature and a simplified expression instead of evaluating it.

Python bindings are also available from the same crate via `maturin`:

```bash
uvx maturin develop
uv run python -c "import gonidium; print(gonidium.simplify_expr('|x: f32|\\nexp(x)'))"
```

The Python package exposes a small public facade rather than asking callers to import the native
extension directly:

- `gonidium.__version__`
- `gonidium.GonidiumError`
- `gonidium.simplify_expr(source, optimize=False)` -> simplified single-output expression
- `gonidium.diff(source, variable, optimize=False)` -> simplified symbolic derivative
- `gonidium.emit_python(source, function_name='_kernel', optimize=False)` -> full Python kernel source (`function_name` must be a valid Python identifier)
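The `function_name` rule for `emit_python` is easy to pre-check on the caller side. Below is a conservative Rust sketch of such a check; the function name is hypothetical, it accepts only ASCII identifiers and rejects Python's reserved keywords, and since Python itself also allows many Unicode identifiers it is an approximation, not the check Gonidium performs.

```rust
/// Python 3 hard keywords; none of these can be used as a function name.
const PY_KEYWORDS: &[&str] = &[
    "False", "None", "True", "and", "as", "assert", "async", "await", "break",
    "class", "continue", "def", "del", "elif", "else", "except", "finally", "for",
    "from", "global", "if", "import", "in", "is", "lambda", "nonlocal", "not",
    "or", "pass", "raise", "return", "try", "while", "with", "yield",
];

/// Conservative check: an ASCII letter or '_' first, then ASCII letters, digits or '_',
/// and not a keyword. Valid non-ASCII Python identifiers are rejected by this sketch.
fn is_safe_python_identifier(name: &str) -> bool {
    let mut chars = name.chars();
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_');
    first_ok
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        && !PY_KEYWORDS.contains(&name)
}

fn main() {
    for name in ["_kernel", "sigmoid_fwd", "2fast", "lambda", "my-kernel"] {
        println!("{name:>12} -> {}", is_safe_python_identifier(name));
    }
}
```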

For Rust, the root crate now exposes the small stable facade plus the main compile pipeline:

- root facade: `parse`, `simplify`, `diff_expr`, `emit_python_expr`, `diff`, `simplify_expr`, `emit_python`
- pipeline: `parse_dsl`, `lower`, `optimize`, `interp_eval`
- expression AST types: `Expr` (spanned root), `ExprNode`, `Span`, `Spanned`
- diagnostics: `Diag`, `DiagCode`, `DiagConfig`, `Severity`
- advanced differentiation internals: `gonidium::experimental::*`
- backend / config / repl helpers live under their modules, e.g. `gonidium::backend`, `gonidium::config`, `gonidium::repl`

The expression-level facade treats free variables as `f64` parameters by default, so it targets math-kernel-style expressions rather than fully annotated DSL functions.

## Language At a Glance

```text
|a: f32, b: f64|
q1 = a + 1
q2 = b + 1
q1 * q2

|x: f32|
if x > 0.0 then x else 0.0

|x: f32, w: f32, b: f32|
sigmoid(x * w + b)
```

Key syntax notes:

- `expr @ dtype` is postfix cast sugar, equivalent to `cast<dtype>(expr)`.
- `//` is integer floor division; `%` follows floor-division remainder semantics.
- Comparisons do not chain: `a < b < c` is rejected.
- Line comments start with `#`.
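Floor semantics differ from Rust's own `/` and `%`, which truncate toward zero. The Rust sketch below (an illustration, not Gonidium's interpreter code) implements a quotient rounded toward negative infinity and a remainder that takes the sign of the divisor, so that `a == (a // b) * b + a % b` always holds.

```rust
/// Floor division: like `a / b`, but rounds the quotient toward negative infinity.
/// Panics on `b == 0` (and on `i64::MIN / -1` overflow), as Rust's `/` does.
fn floor_div(a: i64, b: i64) -> i64 {
    let q = a / b; // Rust truncates toward zero
    let r = a % b;
    // A non-zero remainder whose sign differs from the divisor means truncation
    // rounded the quotient up, so step it down by one.
    if r != 0 && (r < 0) != (b < 0) { q - 1 } else { q }
}

/// Floor-division remainder: zero or carrying the sign of the divisor.
fn floor_mod(a: i64, b: i64) -> i64 {
    let r = a % b;
    if r != 0 && (r < 0) != (b < 0) { r + b } else { r }
}

fn main() {
    for (a, b) in [(7, 2), (-7, 2), (7, -2), (-7, -2)] {
        let (q, r) = (floor_div(a, b), floor_mod(a, b));
        assert_eq!(q * b + r, a); // the division identity holds
        println!("{a:>3} // {b:>2} = {q:>2}   {a:>3} % {b:>2} = {r:>2}   (Rust: {} and {})", a / b, a % b);
    }
}
```

The two conventions only disagree when the operands have opposite signs and the division is inexact, e.g. `-7 // 2` is `-4` under floor division but `-7 / 2` is `-3` in Rust.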

Full grammar/spec: `docs/grammar.md`.

## Type System Snapshot

- Supported dtypes: `bool`, `u8/u16/u32/u64`, `i8/i16/i32/i64`, `f16/bf16/f32/f64`, `c64/c128`.
- Default literal typing (when no `@dtype`):
  - int: smallest signed int that fits
  - float/complex: smallest IEEE type that represents the value exactly (promotes to wider if needed)
- Promotion inserts explicit `Cast` nodes in the typed DAG; each promotion emits an info diagnostic (configurable).
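The default literal-typing rules can be made concrete with a minimal Rust sketch. It assumes the candidate float types are `f16`, `f32` and `f64` (`bf16` is not an IEEE 754 format, so it is left out), that a float literal is first read as the nearest `f64`, and that `f64` is the fallback; complex literals and the full rules in `docs/type-system.md` are not modelled. Because stable Rust has no built-in `f16` to round-trip through, exactness is tested by writing the value as an odd integer times a power of two and checking it against each format's precision and exponent range.

```rust
/// Signed integer dtypes, narrowest first, with their value ranges.
const INT_TYPES: [(&str, i128, i128); 4] = [
    ("i8", i8::MIN as i128, i8::MAX as i128),
    ("i16", i16::MIN as i128, i16::MAX as i128),
    ("i32", i32::MIN as i128, i32::MAX as i128),
    ("i64", i64::MIN as i128, i64::MAX as i128),
];

/// Smallest signed integer dtype that holds `v` (None if even i64 is too small).
fn int_literal_dtype(v: i128) -> Option<&'static str> {
    INT_TYPES.iter().find(|&&(_, lo, hi)| lo <= v && v <= hi).map(|&(name, _, _)| name)
}

/// Binary float format: representable values are M * 2^k with integer |M| < 2^precision
/// and min_k <= k <= max_k.
struct FloatFormat { name: &'static str, precision: u32, min_k: i32, max_k: i32 }

const FLOAT_TYPES: [FloatFormat; 3] = [
    FloatFormat { name: "f16", precision: 11, min_k: -24, max_k: 5 },
    FloatFormat { name: "f32", precision: 24, min_k: -149, max_k: 104 },
    FloatFormat { name: "f64", precision: 53, min_k: -1074, max_k: 971 },
];

/// Split a finite, non-zero f64 into (odd m, e) with |v| = m * 2^e.
fn odd_mantissa_exp(v: f64) -> (u64, i32) {
    let bits = v.to_bits();
    let exp_field = ((bits >> 52) & 0x7ff) as i32;
    let frac = bits & ((1u64 << 52) - 1);
    let (mut m, mut e) = if exp_field == 0 {
        (frac, -1074) // subnormal
    } else {
        (frac | (1u64 << 52), exp_field - 1075) // normal: restore the implicit bit
    };
    let tz = m.trailing_zeros();
    m >>= tz;
    e += tz as i32;
    (m, e)
}

/// Is `v` exactly representable in `fmt`?
fn fits(v: f64, fmt: &FloatFormat) -> bool {
    if v == 0.0 {
        return true;
    }
    let (m, e) = odd_mantissa_exp(v);
    let bits_needed = 64 - m.leading_zeros();
    // An exponent above max_k must be absorbed into M, which costs extra bits.
    let extra = (e - fmt.max_k).max(0) as u32;
    e >= fmt.min_k && bits_needed + extra <= fmt.precision
}

/// Smallest float dtype that represents `v` exactly (None for NaN or infinity).
fn float_literal_dtype(v: f64) -> Option<&'static str> {
    if !v.is_finite() {
        return None;
    }
    FLOAT_TYPES.iter().find(|fmt| fits(v, fmt)).map(|fmt| fmt.name)
}

fn main() {
    for v in [1_i128, 200, -129, 3_000_000_000] {
        println!("{v} -> {:?}", int_literal_dtype(v));
    }
    for v in [0.5_f64, 65504.0, 65536.0, 2049.0, 0.1] {
        println!("{v} -> {:?}", float_literal_dtype(v));
    }
}
```

Under these assumptions `0.5` and `65504.0` (the largest finite `f16`) stay `f16`, `65536.0` and `2049.0` need `f32`, and `0.1`, which has no exact binary form, falls back to `f64`.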

Details: `docs/type-system.md`.

## Diagnostics

Diagnostics have stable short codes (e.g. `L301`) and configurable severities.

- Spec: `docs/diagnostics.md`
- Config file: `gonidium.toml` (project root)
- Config doc: `docs/config.md`
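Resolving a configurable severity is essentially a per-code lookup that falls back to a built-in default. The Rust sketch below illustrates only that idea: its `Severity` enum is a local stand-in (not Gonidium's exported `Severity`), the default levels given to `C201` and `L301` are placeholders, and the override map stands in for whatever `gonidium.toml` actually configures (see `docs/config.md` for the real keys).

```rust
use std::collections::HashMap;

/// Local stand-in for a severity level; not Gonidium's `Severity` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Severity { Off, Info, Warning, Error }

/// Placeholder built-in defaults keyed by diagnostic code (values are illustrative).
fn default_severity(code: &str) -> Severity {
    match code {
        "C201" => Severity::Warning,
        "L301" => Severity::Info,
        _ => Severity::Warning,
    }
}

/// A user override (e.g. loaded from a project config file) wins over the default.
fn effective_severity(code: &str, overrides: &HashMap<String, Severity>) -> Severity {
    overrides.get(code).copied().unwrap_or_else(|| default_severity(code))
}

fn main() {
    let mut overrides: HashMap<String, Severity> = HashMap::new();
    overrides.insert("C201".to_string(), Severity::Error); // escalate symbol reuse
    overrides.insert("L301".to_string(), Severity::Off); // silence this code
    for code in ["C201", "L301"] {
        println!("{code}: {:?}", effective_severity(code, &overrides));
    }
}
```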

## REPL

Start:

```bash
cargo run -- repl
```

REPL v1 accepts three kinds of single-line input:

- Input declaration: `x: f32`
- Assignment: `t1 = x + 1`
- Expression: `x + 1`

Output policy:

- If it can be interpreted: prints `value: dtype`
- Otherwise: prints `expr: dtype` (optionally simplified)
- Prints diagnostics produced by this line (promotions, precision loss, etc.)
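The three input kinds can be told apart with a cheap syntactic check before full parsing. Below is a rough Rust sketch; the `LineKind` enum and the heuristics (an identifier followed by `:` is a declaration, an identifier followed by a single `=` is an assignment, anything else is an expression) are assumptions, not the REPL's actual logic, which presumably goes through the real parser.

```rust
#[derive(Debug, PartialEq)]
enum LineKind<'a> {
    Declaration { name: &'a str, dtype: &'a str }, // `x: f32`
    Assignment { name: &'a str, rhs: &'a str },    // `t1 = x + 1`
    Expression(&'a str),                           // `x + 1`
}

/// ASCII identifier: letter or '_' first, then letters, digits or '_'.
fn is_ident(s: &str) -> bool {
    let mut cs = s.chars();
    matches!(cs.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && cs.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

fn classify(line: &str) -> LineKind<'_> {
    let line = line.trim();
    // `name: dtype`: a colon between two bare identifiers.
    if let Some((lhs, rhs)) = line.split_once(':') {
        let (name, dtype) = (lhs.trim(), rhs.trim());
        if is_ident(name) && is_ident(dtype) {
            return LineKind::Declaration { name, dtype };
        }
    }
    // `name = expr`: a single `=` after a bare identifier, so `==`, `<=`, `>=` fall through.
    if let Some(i) = line.find('=') {
        let (lhs, rest) = (line[..i].trim(), &line[i + 1..]);
        if is_ident(lhs) && !rest.starts_with('=') {
            return LineKind::Assignment { name: lhs, rhs: rest.trim() };
        }
    }
    LineKind::Expression(line)
}

fn main() {
    for line in ["x: f32", "t1 = x + 1", "x + 1", "x == 1"] {
        println!("{line:<12} => {:?}", classify(line));
    }
}
```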

## Architecture Overview

Pipeline (conceptual layers):

```text
source
  -> parse (chumsky)
  -> AST (untyped)
  -> lower (type inference + literal rules + checks)
  -> TypedDag
  -> const_fold (typed)
  -> strip_types (TypedDag -> RecExpr<MathLang> + TypeMap)
  -> egg runner (algebraic rewrites + fusion)
  -> extractor (FusionCost)
  -> restore_types (bottom-up)
  -> TypedDag (optimised)
  -> backend: interpret / codegen
```

- TypedDag layer: typing, checking, const-folding, explicit-vs-implicit cast marking.
- Opt (e-graph) layer: type-erased algebraic optimisation and fusion selection.
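The handoff between the two layers (strip types, rewrite the type-erased term, restore types bottom-up) can be pictured with a toy model. In the Rust sketch below, `DType`, `Expr`, `factor` and `restore_types` are hypothetical stand-ins for the real `TypedDag`, `RecExpr<MathLang>`, `TypeMap` and egg rewrites: parameter dtypes live in a side table, one hand-written rewrite plays the role of the e-graph, and promotion is simplified to "the wider float type wins".

```rust
use std::collections::HashMap;

/// Toy dtypes; declaration order is width order, so `Ord::max` picks the wider type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DType { F16, F32, F64 }

/// Type-erased expression standing in for the e-graph term. Parameter dtypes live in
/// a separate side table; literals and explicit casts carry their own dtype.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Param(String),
    Const(DType, f64),
    Cast(DType, Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Toy stand-in for one algebraic rewrite: factor `a*b + a*c` into `a*(b + c)`.
fn factor(e: Expr) -> Expr {
    match e {
        Expr::Add(l, r) => match (*l, *r) {
            (Expr::Mul(a1, b), Expr::Mul(a2, c)) if a1 == a2 => {
                Expr::Mul(a1, Box::new(Expr::Add(b, c)))
            }
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

/// Recompute the dtype bottom-up after rewriting: leaves come from the side table
/// (or their own annotation) and each binary op takes the wider operand dtype.
/// Returns `None` if a parameter is missing from the table.
fn restore_types(e: &Expr, params: &HashMap<String, DType>) -> Option<DType> {
    Some(match e {
        Expr::Param(name) => *params.get(name)?,
        Expr::Const(t, _) | Expr::Cast(t, _) => *t,
        Expr::Add(a, b) | Expr::Mul(a, b) => {
            restore_types(a, params)?.max(restore_types(b, params)?)
        }
    })
}

fn main() {
    // Parameters `x: f32, w: f16, b: f64`, expression `x*w + x*b`.
    let params: HashMap<String, DType> = [("x", DType::F32), ("w", DType::F16), ("b", DType::F64)]
        .iter()
        .map(|&(n, t)| (n.to_string(), t))
        .collect();
    let p = |n: &str| Box::new(Expr::Param(n.to_string()));
    let before = Expr::Add(Box::new(Expr::Mul(p("x"), p("w"))), Box::new(Expr::Mul(p("x"), p("b"))));
    let after = factor(before.clone());
    println!("{:?}", after);
    println!("{:?} -> {:?}", restore_types(&before, &params), restore_types(&after, &params));
}
```

Because this toy promotion rule is associative and commutative, the factored tree gets the same dtype (`F64`) as the original when types are recomputed from the leaves; a rewrite that changed the recomputed dtype would signal a typing problem that this toy does not handle.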

## Embedding (Kernel Composition)

If you embed Gonidium as a kernel IR in another framework:

1. `parse_dsl` to `FuncDef`
2. Compose graphs with `compose` / `compose_with_diags`
3. Optionally normalise parameter names with `rename_params`
4. `lower` -> `optimize`, then pass the result to your own `Backend` implementation (or the default `PythonBackend`), or evaluate it with `interp_eval`

`compose_with_diags` emits `C201 (compose-symbol-reuse)` when a symbol name appears on both sides and is treated as the same variable.
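The reuse rule behind `C201` can be sketched as a merge of two parameter lists in which a shared name is kept once and reported. The Rust below models only that name-merging step, with a hypothetical `merge_params` function and diagnostic text; it does not show how `compose` connects the graph bodies, or what happens when a shared name has different dtypes on each side.

```rust
/// Merge two parameter lists in first-seen order. A name present on both sides
/// is kept once (treated as the same variable) and reported as C201.
fn merge_params<'a>(left: &[&'a str], right: &[&'a str]) -> (Vec<&'a str>, Vec<String>) {
    let mut merged: Vec<&'a str> = left.to_vec();
    let mut diags = Vec::new();
    for &name in right {
        if left.contains(&name) {
            diags.push(format!("C201 (compose-symbol-reuse): `{name}` is shared by both graphs"));
        } else if !merged.contains(&name) {
            merged.push(name);
        }
    }
    (merged, diags)
}

fn main() {
    // Two kernels that both read a `scale` parameter.
    let (params, diags) = merge_params(&["x", "scale"], &["y", "scale"]);
    println!("{params:?}");
    for d in &diags {
        println!("{d}");
    }
}
```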

## Project Layout

```text
docs/        language + diagnostic + config docs
src/parse/   lexer + chumsky parser
src/ir/      TypedDag + dtypes + lowering
src/opt/     e-graph language + rewrite rules + extraction + roundtrip
src/backend/ interpreter + codegen backends
tests/       parser/lower/opt/backend integration tests
```

## Status / Roadmap

Implemented (v0.0.1):

- Parser + grammar-driven precedence
- Lowering: literal typing, promotion + explicit Cast insertion, range/precision checks
- Optimisation: const-fold + e-graph rewrites + fusion cost extraction
- Backends: interpreter, Python codegen
- REPL + configurable diagnostics

Planned directions (non-binding):

- More builtins and rewrite coverage (focus: DL kernels)
- Better backend docs and stability guarantees for embedding
- Improved tooling (format/check, richer trace/visualisation)