gonidium 0.0.2

A DSL compiler and REPL for typed numeric expression pipelines

Gonidium

Gonidium is a DSL for symbolic expression pipelines of the kind used in deep-learning math, especially scalar and elementwise kernels.

Why a DSL?

Deep-learning frameworks often need a small, predictable language to represent and optimise math kernels:

  • Parse compact surface syntax into a typed IR.
  • Run algebraic simplification / fusion (e-graph).
  • Interpret for validation or emit plain Python kernels for integration.

Gonidium focuses on that workflow and intentionally keeps the language small.

Design Principles (Current)

  • Domain-focused: elementwise math + common DL functions (exp, log, sigmoid, relu, floor, ceil, ...).
  • Closed-world output: after lowering/optimisation, the result is always expressible using the supported operators/builtins.
  • Typed IR first: a TypedDag is the semantic contract; surface syntax is just a frontend.
  • Explicit annotations + implicit promotion: params are annotated; expressions are inferred via promotion rules.
  • Diagnostics are part of the API: stable short codes + configurable severities.

Non-goals (by design):

  • General CAS (e.g. symbolic integration), rational exactness, or arbitrary user-defined functions.
  • Modules/package system.

Quick Start

Build and run the CLI:

cargo build
cargo run -- repl

Run a one-off expression (it is interpreted when inputs are provided):

cargo run -- run -e "|x: f32| x + 1" --inputs "1"

--inputs is parsed against the declared parameter dtypes, so each input value must match the corresponding parameter in the signature. For example, a c128 parameter takes a complex literal:

cargo run -- run -e "|z: c128| z" --inputs "1+2j"

If you omit --inputs, run prints the inferred signature and a simplified expression instead of evaluating.

Python bindings are also available from the same crate via maturin:

uvx maturin develop
uv run python -c "import gonidium; print(gonidium.simplify_expr('|x: f32|\\nexp(x)'))"

The Python package exposes a small public facade rather than asking callers to import the native extension directly:

  • gonidium.__version__
  • gonidium.GonidiumError
  • gonidium.simplify_expr(source, optimize=False) -> simplified single-output expression
  • gonidium.diff(source, variable, optimize=False) -> simplified symbolic derivative
  • gonidium.emit_python(source, function_name='_kernel', optimize=False) -> full Python kernel source (function_name must be a valid Python identifier)
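The identifier requirement on function_name can be checked on the caller's side before the call reaches the native extension. The following is a minimal sketch, not part of the package: is_valid_kernel_name and emit_kernel are hypothetical helper names, rejecting Python keywords is an assumption about what "valid identifier" should mean here, and treating gonidium.GonidiumError as the exception raised on bad input is inferred from its name only.

```python
import keyword


def is_valid_kernel_name(name: str) -> bool:
    """Return True if `name` can be used as a Python function name."""
    # str.isidentifier() accepts keywords such as "def", so exclude them explicitly.
    return name.isidentifier() and not keyword.iskeyword(name)


def emit_kernel(source: str, function_name: str = "_kernel") -> str:
    """Validate the name, then ask Gonidium for the full kernel source."""
    if not is_valid_kernel_name(function_name):
        raise ValueError(f"not a valid Python identifier: {function_name!r}")

    # Imported lazily so the name check works even when the extension is not built.
    import gonidium

    try:
        return gonidium.emit_python(source, function_name=function_name)
    except gonidium.GonidiumError as err:
        # Assumption: compile failures surface as GonidiumError.
        raise RuntimeError(f"Gonidium rejected the source: {err}") from err
```

With the extension installed via maturin, emit_kernel("|x: f32|\nexp(x)", "exp_kernel") would be a typical call; the name check itself needs only the standard library.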

For Rust, the root crate exposes a small stable facade alongside the main compile pipeline:

  • root facade: parse, simplify, diff_expr, emit_python_expr, diff, simplify_expr, emit_python
  • pipeline: parse_dsl, lower, optimize, interp_eval
  • expression AST types: Expr (spanned root), ExprNode, Span, Spanned
  • diagnostics: Diag, DiagCode, DiagConfig, Severity
  • advanced differentiation internals: gonidium::experimental::*
  • backend / config / repl helpers live under their modules, e.g. gonidium::backend, gonidium::config, gonidium::repl

The new expression-level facade treats free variables as f64 parameters by default, so it is aimed at math-kernel style expressions rather than fully annotated DSL functions.

Language At a Glance

|a: f32, b: f64|
q1 = a + 1
q2 = b + 1
q1 * q2

|x: f32|
if x > 0.0 then x else 0.0

|x: f32, w: f32, b: f32|
sigmoid(x * w + b)
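When validating interpreter output for snippets like the last two, a hand-written reference in plain Python can be useful. The sketch below is an illustration only: it is not the output of emit_python, the function names are hypothetical, and Python floats are f64, so results for f32 parameters may differ in the last bits.

```python
import math


def relu(x: float) -> float:
    # Mirrors: |x: f32|  if x > 0.0 then x else 0.0
    return x if x > 0.0 else 0.0


def sigmoid(z: float) -> float:
    # Numerically stable logistic function: never exponentiates a large positive number.
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense_sigmoid(x: float, w: float, b: float) -> float:
    # Mirrors: |x: f32, w: f32, b: f32|  sigmoid(x * w + b)
    return sigmoid(x * w + b)
```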

Key syntax notes:

  • expr @ dtype is postfix cast sugar, equivalent to cast<dtype>(expr).
  • // is integer floor division; % follows floor-division remainder semantics.
  • Comparisons do not chain: a < b < c is rejected.
  • Line comments start with #.
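Floor-division semantics differ from the truncating division found in C or Rust once negative operands are involved. Python's own // and % already follow the floor convention, so a short Python sketch can show what the rule implies; floor_divmod is a hypothetical name, and the assumption is that Gonidium's integer // and % agree with this convention as the note above describes.

```python
def floor_divmod(a: int, b: int) -> tuple[int, int]:
    """Quotient rounded toward negative infinity, plus the matching remainder."""
    q = a // b      # Python's // already floors
    r = a - q * b   # remainder defined by the identity a == q * b + r
    return q, r


# The remainder takes the sign of the divisor, unlike truncating division.
for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    q, r = floor_divmod(a, b)
    print(f"{a} // {b} = {q}, {a} % {b} = {r}")
```

For example, -7 // 2 is -4 with remainder 1, where truncating division would give -3 with remainder -1.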

Full grammar/spec: docs/grammar.md.

Type System Snapshot

  • Supported dtypes: bool, u8/u16/u32/u64, i8/i16/i32/i64, f16/bf16/f32/f64, c64/c128.
  • Default literal typing (when no @dtype):
    • int: smallest signed int that fits
    • float/complex: smallest IEEE type that represents the value exactly (promotes to wider if needed)
  • Promotion inserts explicit Cast nodes in the typed DAG; each promotion emits an info diagnostic (configurable).
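The default literal rules can be approximated in a few lines of Python. This is a sketch under stated assumptions rather than the crate's lowering code: the helper names are hypothetical, "exactly" is read as "survives a round trip through the narrower format", bf16 and complex literals are left out, and the authoritative rules are the ones in docs/type-system.md.

```python
import struct

SIGNED_INTS = [("i8", 8), ("i16", 16), ("i32", 32), ("i64", 64)]


def smallest_signed_int(value: int) -> str:
    """Pick the narrowest signed integer dtype whose range contains `value`."""
    for name, bits in SIGNED_INTS:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
        if lo <= value <= hi:
            return name
    raise OverflowError(f"{value} does not fit in i64")


def smallest_exact_float(value: float) -> str:
    """Pick the narrowest IEEE float dtype that stores `value` without rounding."""
    # struct format codes: "e" is IEEE binary16, "f" is IEEE binary32.
    for name, code in (("f16", "<e"), ("f32", "<f")):
        try:
            roundtrip = struct.unpack(code, struct.pack(code, value))[0]
        except OverflowError:
            continue  # magnitude too large for this width, try the next one
        if roundtrip == value:
            return name
    return "f64"
```

Under these assumptions 0.5 fits in f16, while 0.1 needs f64 because neither narrower format reproduces the parsed double exactly.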

Details: docs/type-system.md.

Diagnostics

Diagnostics have stable short codes (e.g. L301) and configurable severities.

  • Spec: docs/diagnostics.md
  • Config file: gonidium.toml (project root)
  • Config doc: docs/config.md

REPL

Start:

cargo run -- repl

REPL v1 accepts three kinds of single-line input:

  • Input declaration: x: f32
  • Assignment: t1 = x + 1
  • Expression: x + 1

Output policy:

  • If it can be interpreted: prints value: dtype
  • Otherwise: prints expr: dtype (optionally simplified)
  • Prints diagnostics produced by this line (promotions, precision loss, etc.)
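The three input kinds can be told apart with a simple line classifier. The sketch below is an illustration in Python, not the REPL's actual parser: the identifier pattern is an assumption, the dtype list is copied from the Type System Snapshot, and the guard against == is there only in case the expression grammar has an equality operator.

```python
import re

DTYPES = (
    "bool", "u8", "u16", "u32", "u64", "i8", "i16", "i32", "i64",
    "f16", "bf16", "f32", "f64", "c64", "c128",
)
_IDENT = r"[A-Za-z_][A-Za-z0-9_]*"  # assumed identifier shape
_DECL = re.compile(rf"^\s*({_IDENT})\s*:\s*({'|'.join(DTYPES)})\s*$")
_ASSIGN = re.compile(rf"^\s*({_IDENT})\s*=(?!=)\s*(.+)$")


def classify(line: str) -> str:
    """Return 'declaration', 'assignment', or 'expression' for one REPL line."""
    if _DECL.match(line):
        return "declaration"
    if _ASSIGN.match(line):
        return "assignment"
    return "expression"
```

Anything that is neither a declaration nor an assignment falls through to the expression case, where a real parser would report syntax errors.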

Architecture Overview

Pipeline (conceptual layers):

source
  -> parse (chumsky)
  -> AST (untyped)
  -> lower (type inference + literal rules + checks)
  -> TypedDag
  -> const_fold (typed)
  -> strip_types (TypedDag -> RecExpr<MathLang> + TypeMap)
  -> egg runner (algebraic rewrites + fusion)
  -> extractor (FusionCost)
  -> restore_types (bottom-up)
  -> TypedDag (optimised)
  -> backend: interpret / codegen

  • TypedDag layer: typing, checking, const-folding, explicit-vs-implicit cast marking.
  • Opt (e-graph) layer: type-erased algebraic optimisation and fusion selection.
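To give a feel for what the const_fold stage does conceptually, here is a toy bottom-up constant folder over nested tuples. It is a sketch only: the tuple encoding and the const_fold function below are hypothetical, it ignores dtypes entirely (the real pass is typed), and it does not model the e-graph stages that follow.

```python
import operator

# Toy node encodings: ("const", value), ("var", name), or (op, lhs, rhs).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def const_fold(node):
    """Fold subtrees whose operands are all constants, working bottom-up."""
    if node[0] in ("const", "var"):
        return node  # leaves are already in simplest form
    op, lhs, rhs = node
    lhs, rhs = const_fold(lhs), const_fold(rhs)
    if lhs[0] == "const" and rhs[0] == "const":
        return ("const", OPS[op](lhs[1], rhs[1]))
    return (op, lhs, rhs)


# x + (2 * 3) folds to x + 6; the variable keeps the outer add from folding.
expr = ("add", ("var", "x"), ("mul", ("const", 2), ("const", 3)))
print(const_fold(expr))
```

Folding before the e-graph stage keeps trivially constant subtrees out of the rewrite search, which is one plausible reason for the ordering shown in the pipeline.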

Embedding (Kernel Composition)

If you embed Gonidium as a kernel IR in another framework:

  1. parse_dsl to FuncDef
  2. Compose graphs with compose / compose_with_diags
  3. Optionally normalise parameter names with rename_params
  4. lower -> optimize, then pass the result to your own Backend implementation, the default PythonBackend, or interp_eval

compose_with_diags emits C201 (compose-symbol-reuse) when a symbol name appears on both sides and is treated as the same variable.
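The symbol-reuse rule can be pictured as a merge over two parameter lists. The Python sketch below is a conceptual model only: merge_params and its (name, dtype) tuples are hypothetical, the message text is made up for illustration, and it does not address what happens when the two sides declare different dtypes for the same name.

```python
def merge_params(left, right):
    """Merge two lists of (name, dtype) pairs, reporting names seen on both sides."""
    merged = list(left)
    seen = {name for name, _ in left}
    diags = []
    for name, dtype in right:
        if name in seen:
            # Same name on both sides: treated as one variable, reported as C201.
            diags.append(("C201", f"symbol '{name}' appears on both sides"))
        else:
            merged.append((name, dtype))
            seen.add(name)
    return merged, diags


params, diags = merge_params([("x", "f32"), ("w", "f32")], [("x", "f32"), ("b", "f32")])
print(params)
print(diags)
```

If sharing is not intended, renaming one side's parameters (step 3 above) before composing avoids the diagnostic.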

Project Layout

docs/         language + diagnostic + config docs
src/parse/    lexer + chumsky parser
src/ir/       TypedDag + dtypes + lowering
src/opt/      e-graph language + rewrite rules + extraction + roundtrip
src/backend/  interpreter + codegen backends
tests/        parser/lower/opt/backend integration tests

Status / Roadmap

Implemented (v0.0.1):

  • Parser + grammar-driven precedence
  • Lowering: literal typing, promotion + explicit Cast insertion, range/precision checks
  • Optimisation: const-fold + e-graph rewrites + fusion cost extraction
  • Backends: interpreter, Python codegen
  • REPL + configurable diagnostics

Planned directions (non-binding):

  • More builtins and rewrite coverage (focus: DL kernels)
  • Better backend docs and stability guarantees for embedding
  • Improved tooling (format/check, richer trace/visualisation)