# Gonidium
Gonidium is a DSL for symbolic expression pipelines used in deep-learning-style math, especially scalar and elementwise kernels.
## Why a DSL?
Deep-learning frameworks often need a small, predictable language to represent and optimise math kernels:
- Parse compact surface syntax into a typed IR.
- Run algebraic simplification / fusion (e-graph).
- Interpret for validation or emit plain Python kernels for integration.
Gonidium focuses on that workflow and intentionally keeps the language small.
## Design Principles (Current)
- Domain-focused: elementwise math + common DL functions (`exp`, `log`, `sigmoid`, `relu`, `floor`, `ceil`, ...).
- Closed-world output: after lowering/optimisation, the result is always expressible using the supported operators/builtins.
- Typed IR first: a `TypedDag` is the semantic contract; surface syntax is just a frontend.
- Explicit annotations + implicit promotion: params are annotated; expressions are inferred via promotion rules.
- Diagnostics are part of the API: stable short codes + configurable severities.
Non-goals (by design):
- General CAS (e.g. symbolic integration), rational exactness, or arbitrary user-defined functions.
- Modules/package system.
## Quick Start
Build and run the CLI:
```bash
cargo build
cargo run -- repl
```
Run a one-off expression with the `run` subcommand; it interprets the expression when inputs are provided.
`--inputs` is parsed against the declared parameter dtypes, so the input shape should match the
signature.
If you omit `--inputs`, `run` prints the inferred signature and a simplified expression instead of evaluating.
Python bindings are also available from the same crate via `maturin`:
```bash
uvx maturin develop
uv run python -c "import gonidium; print(gonidium.simplify_expr('|x: f32|\\nexp(x)'))"
```
The Python package exposes a small public facade rather than asking callers to import the native
extension directly:
- `gonidium.__version__`
- `gonidium.GonidiumError`
- `gonidium.simplify_expr(source, optimize=False)` -> simplified single-output expression
- `gonidium.diff(source, variable, optimize=False)` -> simplified symbolic derivative
- `gonidium.emit_python(source, function_name='_kernel', optimize=False)` -> full Python kernel source (`function_name` must be a valid Python identifier)
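The `function_name` constraint can be checked before calling `emit_python`. The helper below is a minimal sketch using only the Python standard library; `is_valid_function_name` is a hypothetical name, not part of the `gonidium` package, and Gonidium's own validation may be stricter or differ in detail.

```python
import keyword


def is_valid_function_name(name: str) -> bool:
    """Return True if `name` can be used as a Python function name.

    Hypothetical pre-check; not part of the gonidium package.
    """
    # str.isidentifier() also accepts reserved words such as "class",
    # so keywords are rejected separately.
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ("_kernel", "fused_gelu", "2fast", "class", "my-kernel"):
    print(f"{candidate!r}: {is_valid_function_name(candidate)}")
```

Running such a check up front lets a caller report a bad name in its own terms instead of relying on the error raised by the binding.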
For Rust, the root crate exposes a small stable facade plus the main compile pipeline:
- root facade: `parse`, `simplify`, `diff_expr`, `emit_python_expr`, `diff`, `simplify_expr`, `emit_python`
- pipeline: `parse_dsl`, `lower`, `optimize`, `interp_eval`
- expression AST types: `Expr` (spanned root), `ExprNode`, `Span`, `Spanned`
- diagnostics: `Diag`, `DiagCode`, `DiagConfig`, `Severity`
- advanced differentiation internals: `gonidium::experimental::*`
- backend / config / repl helpers live under their modules, e.g. `gonidium::backend`, `gonidium::config`, `gonidium::repl`
The expression-level facade treats free variables as `f64` parameters by default, so it is aimed at math-kernel-style expressions rather than fully annotated DSL functions.
## Language At a Glance
```text
q2 = b + 1
q1 * q2
```
Key syntax notes:
- `expr @ dtype` is postfix cast sugar, equivalent to `cast<dtype>(expr)`.
- `//` is integer floor division; `%` follows floor-division remainder semantics.
- Comparisons do not chain: `a < b < c` is rejected.
- Line comments start with `#`.
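The floor-division note is easiest to see with negative operands. Python's built-in `//` and `%` on integers also use floor semantics, so the sketch below uses them as a stand-in and contrasts them with C-style truncation. It assumes Gonidium's integer behaviour matches Python's here; `docs/grammar.md` remains the authority.

```python
def floor_divmod(a: int, b: int) -> tuple[int, int]:
    """Floor semantics: quotient rounds toward -inf, remainder takes the sign of b."""
    return a // b, a % b


def trunc_divmod(a: int, b: int) -> tuple[int, int]:
    """C-style semantics, shown for contrast: quotient rounds toward zero."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q, a - q * b


for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    fq, fr = floor_divmod(a, b)
    tq, tr = trunc_divmod(a, b)
    # Both conventions satisfy a == q * b + r; they differ in how q is rounded.
    assert a == fq * b + fr and a == tq * b + tr
    print(f"a={a:>2} b={b:>2}  floor=({fq}, {fr})  trunc=({tq}, {tr})")
```

For `a=-7, b=2` the floor pair is `(-4, 1)` while the truncating pair is `(-3, -1)`; the two conventions agree whenever the operands share a sign.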
Full grammar/spec: `docs/grammar.md`.
## Type System Snapshot
- Supported dtypes: `bool`, `u8/u16/u32/u64`, `i8/i16/i32/i64`, `f16/bf16/f32/f64`, `c64/c128`.
- Default literal typing (when no `@dtype`):
  - int: smallest signed int that fits
  - float/complex: smallest IEEE type that represents the value exactly (promotes to wider if needed)
- Promotion inserts explicit `Cast` nodes in the typed DAG; each promotion emits an info diagnostic (configurable).
Details: `docs/type-system.md`.
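The default literal rules can be illustrated with a small standalone sketch. The function names are hypothetical, and the float candidates are assumed to be `f16`, `f32`, and `f64` in that order (`bf16` is left out because it is not an IEEE interchange format). Gonidium's actual rules, including complex literals, are specified in `docs/type-system.md`.

```python
import struct

# Candidate signed integer dtypes, narrowest first.
SIGNED_INT_BITS = (("i8", 8), ("i16", 16), ("i32", 32), ("i64", 64))

# Candidate float dtypes and their struct format codes (assumed ordering).
FLOAT_FORMATS = (("f16", "e"), ("f32", "f"), ("f64", "d"))


def smallest_signed_int(value: int) -> str:
    """Return the narrowest signed dtype whose range contains `value`."""
    for name, bits in SIGNED_INT_BITS:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        if lo <= value <= hi:
            return name
    raise OverflowError(f"{value} does not fit in i64")


def smallest_exact_float(value: float) -> str:
    """Return the narrowest float dtype that round-trips `value` exactly."""
    for name, fmt in FLOAT_FORMATS:
        try:
            roundtrip = struct.unpack(fmt, struct.pack(fmt, value))[0]
        except OverflowError:
            continue  # value is out of range for this width
        if roundtrip == value:
            return name
    return "f64"  # fallback, e.g. for NaN, which never compares equal


print(smallest_signed_int(127), smallest_signed_int(128))      # i8 i16
print(smallest_exact_float(0.5), smallest_exact_float(0.1))    # f16 f64
```

Values such as `0.1` have no exact binary representation at any width, so under this sketch they fall through to the widest type.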
## Diagnostics
Diagnostics have stable short codes (e.g. `L301`) and configurable severities.
- Spec: `docs/diagnostics.md`
- Config file: `gonidium.toml` (project root)
- Config doc: `docs/config.md`
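To make "stable codes with configurable severities" concrete, here is a small Python model of the idea. It is an illustration only: the severity levels, the default severities, the `render` helper, and the placeholder message for `L301` are assumptions, and the real types are the Rust `Diag`, `DiagCode`, `DiagConfig`, and `Severity` described in `docs/diagnostics.md`.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Assumed levels for illustration; see docs/diagnostics.md for the real set.
    ALLOW = 0
    INFO = 1
    WARNING = 2
    ERROR = 3


@dataclass(frozen=True)
class Diag:
    code: str                 # stable short code, e.g. "L301"
    message: str
    default_severity: Severity


def render(diags: list[Diag], overrides: dict[str, Severity]) -> list[str]:
    """Apply per-code severity overrides and drop diagnostics set to ALLOW."""
    lines = []
    for diag in diags:
        severity = overrides.get(diag.code, diag.default_severity)
        if severity is Severity.ALLOW:
            continue
        lines.append(f"{severity.name.lower()}[{diag.code}]: {diag.message}")
    return lines


diags = [
    Diag("L301", "placeholder message", Severity.INFO),      # message is a placeholder
    Diag("C201", "compose-symbol-reuse", Severity.WARNING),  # default is assumed
]
print(render(diags, {}))
print(render(diags, {"L301": Severity.ALLOW, "C201": Severity.ERROR}))
```

Because the codes stay fixed, a project can silence or escalate individual diagnostics without depending on message text.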
## REPL
Start:
```bash
cargo run -- repl
```
REPL v1 accepts three kinds of single-line input:
- Input declaration: `x: f32`
- Assignment: `t1 = x + 1`
- Expression: `x + 1`
Output policy:
- If it can be interpreted: prints `value: dtype`
- Otherwise: prints `expr: dtype` (optionally simplified)
- Prints diagnostics produced by this line (promotions, precision loss, etc.)
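The three input kinds can be told apart with a couple of patterns. The classifier below is a rough sketch of that distinction, not the REPL's actual parser (which uses the chumsky grammar); the regular expressions and the `classify` name are assumptions made for illustration.

```python
import re

# Dtype names taken from the type system snapshot.
DTYPES = {
    "bool", "u8", "u16", "u32", "u64", "i8", "i16", "i32", "i64",
    "f16", "bf16", "f32", "f64", "c64", "c128",
}

DECL = re.compile(r"^([A-Za-z_]\w*)\s*:\s*([A-Za-z]\w*)$")
# `=(?!=)` keeps a comparison like `x == 1` from being read as an assignment.
ASSIGN = re.compile(r"^([A-Za-z_]\w*)\s*=(?!=)\s*(.+)$")


def classify(line: str) -> tuple[str, ...]:
    """Classify one REPL line as a declaration, assignment, or expression."""
    text = line.strip()
    match = DECL.match(text)
    if match and match.group(2) in DTYPES:
        return ("declaration", match.group(1), match.group(2))
    match = ASSIGN.match(text)
    if match:
        return ("assignment", match.group(1), match.group(2))
    return ("expression", text)


for line in ("x: f32", "t1 = x + 1", "x + 1", "x == 1"):
    print(classify(line))
```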
## Architecture Overview
Pipeline (conceptual layers):
```text
source
-> parse (chumsky)
-> AST (untyped)
-> lower (type inference + literal rules + checks)
-> TypedDag
-> const_fold (typed)
-> strip_types (TypedDag -> RecExpr<MathLang> + TypeMap)
-> egg runner (algebraic rewrites + fusion)
-> extractor (FusionCost)
-> restore_types (bottom-up)
-> TypedDag (optimised)
-> backend: interpret / codegen
```
- TypedDag layer: typing, checking, const-folding, explicit-vs-implicit cast marking.
- Opt (e-graph) layer: type-erased algebraic optimisation and fusion selection.
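As a rough picture of what the `const_fold` stage does before the e-graph runs, the sketch below folds constant subtrees in a tiny untyped expression tree. It is a simplified stand-in: the real pass operates on a `TypedDag` and respects dtypes, and the tuple encoding and operator table here are invented for the example.

```python
import operator

# Binary operators supported by this toy tree (illustrative subset).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def const_fold(node: tuple) -> tuple:
    """Fold constant subtrees bottom-up.

    Nodes are ("const", value), ("var", name), or (op, lhs, rhs).
    """
    kind = node[0]
    if kind in ("const", "var"):
        return node
    lhs = const_fold(node[1])
    rhs = const_fold(node[2])
    if lhs[0] == "const" and rhs[0] == "const":
        # Both operands are known, so evaluate now instead of at run time.
        return ("const", OPS[kind](lhs[1], rhs[1]))
    return (kind, lhs, rhs)


expr = ("mul", ("var", "x"), ("add", ("const", 2), ("const", 3)))
print(const_fold(expr))  # ('mul', ('var', 'x'), ('const', 5))
```

Folding first keeps literal arithmetic out of the e-graph, so the rewrite search can concentrate on algebraic structure.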
## Embedding (Kernel Composition)
If you embed Gonidium as a kernel IR in another framework:
1. `parse_dsl` to `FuncDef`
2. Compose graphs with `compose` / `compose_with_diags`
3. Optionally normalise parameter names with `rename_params`
4. `lower` -> `optimize`, then either a `Backend` (your own implementation or the default `PythonBackend`) or `interp_eval`
`compose_with_diags` emits `C201 (compose-symbol-reuse)` when a symbol name appears on both sides and is treated as the same variable.
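The symbol-reuse condition amounts to an intersection of the names used on each side. The sketch below models only that check in plain Python; `shared_symbols` and `reuse_diags` are hypothetical helpers, the message wording is illustrative, and the real `compose_with_diags` works on Gonidium's own graph types.

```python
def shared_symbols(left: list[str], right: list[str]) -> list[str]:
    """Names that occur on both sides of a composition, in sorted order."""
    return sorted(set(left) & set(right))


def reuse_diags(left: list[str], right: list[str]) -> list[str]:
    """One C201-style message per shared name (illustrative wording)."""
    return [
        f"C201 (compose-symbol-reuse): `{name}` appears on both sides "
        "and is treated as the same variable"
        for name in shared_symbols(left, right)
    ]


# `x` is used by both kernels, so composing them shares it.
for message in reuse_diags(["x", "scale"], ["x", "bias"]):
    print(message)
```

If sharing is not intended, renaming one side's parameters before composing (step 3 above) avoids the overlap.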
## Project Layout
```text
docs/ language + diagnostic + config docs
src/parse/ lexer + chumsky parser
src/ir/ TypedDag + dtypes + lowering
src/opt/ e-graph language + rewrite rules + extraction + roundtrip
src/backend/ interpreter + codegen backends
tests/ parser/lower/opt/backend integration tests
```
## Status / Roadmap
Implemented (v0.0.1):
- Parser + grammar-driven precedence
- Lowering: literal typing, promotion + explicit Cast insertion, range/precision checks
- Optimisation: const-fold + e-graph rewrites + fusion cost extraction
- Backends: interpreter, Python codegen
- REPL + configurable diagnostics
Planned directions (non-binding):
- More builtins and rewrite coverage (focus: DL kernels)
- Better backend docs and stability guarantees for embedding
- Improved tooling (format/check, richer trace/visualisation)