webnn-graph 0.3.0

# webnn-graph

`webnn-graph` is a small Rust library and CLI that defines a **WebNN-oriented
graph DSL**, parses it into a minimal AST, and enables multiple downstream uses
such as graph validation, serialization, and WebNN graph construction.

The goal is to keep the language surface **very close to WebNN itself**, while
allowing graphs to be expressed declaratively and reused across tooling.

The project also implements a Netron-like **WebNN graph visualizer** that
allows for interactive exploration of graph structure.

Check it out at [https://blog.ziade.org/webnn-graph](https://blog.ziade.org/webnn-graph)

## Development

### Git Hooks (Recommended)

Enable pre-commit hooks to run clippy locally:

```bash
./scripts/setup-githooks.sh
```

## Conceptual Model

A WebNN graph defined with this project is split across **three distinct files**, each with a single responsibility.

### 1. Graph definition (`.webnn`)

The `.webnn` file describes **only the structure of the graph**:

- Inputs and their types
- Constants and their shapes
- Operator calls and their wiring
- Named outputs

It contains **no actual tensor data**.

This file is intended to be:
- Small
- Human-readable
- Easy to diff and review
- Stable across weight updates

Its EBNF-like grammar:

```
File         ::= Header Block* EOF

Header       ::= "webnn_graph" String "v" Int ("@quantized")? "{"

Block        ::= InputsBlock
               | ConstsBlock
               | NodesBlock
               | OutputsBlock
               | "}"              (* closes the graph *)

InputsBlock  ::= "inputs"  "{" InputDecl*  "}"
ConstsBlock  ::= "consts"  "{" ConstDecl*  "}"
NodesBlock   ::= "nodes"   "{" Stmt*       "}"
OutputsBlock ::= "outputs" "{" OutputItem* "}"

InputDecl    ::= Ident ":" Type ";"
ConstDecl    ::= Ident ":" Type ConstAnnot* ";"

OutputItem   ::= Ident ("," Ident)* ";"?    (* optional semicolon *)

Stmt         ::= (MultiAssign | Assign) ";"
Assign       ::= Ident "=" Expr
MultiAssign  ::= "[" Ident ("," Ident)* "]" "=" Expr

Expr         ::= Call | Ident | Literal
Call         ::= Ident "(" Args? ")"

Args         ::= Arg ("," Arg)*
Arg          ::= Ident "=" Value | Value

Value        ::= Literal | Ident

Literal      ::= Array | String | Number | Boolean | Null
Array        ::= "[" (Value ("," Value)*)? "]"

Boolean      ::= "true" | "false"
Null         ::= "null"

Type         ::= DType Shape
DType        ::= "f32" | "f16" | "i4" | "u4" | "i32" | "u32" | "i64" | "u64" | "i8" | "u8"
Shape        ::= "[" (Int ("," Int)*)? "]"

ConstAnnot   ::= "@weights" "(" String ")"
               | "@scalar"  "(" Number ")"

Ident        ::= (ALPHA | "_") (ALNUM | "_")*
Int          ::= DIGIT+
Number       ::= "-"? DIGIT+ ("." DIGIT+)? (("e"|"E") ("+"|"-")? DIGIT+)? 
String       ::= "\"" ( "\\\"" | "\\\\" | (ANY-but-quote) )* "\""
```

### 2. Weights manifest (`.manifest.json`, optional)

If the graph references external weights using `@weights("key")`, a manifest file can be provided to:

- Describe tensor shapes and data types
- Define offsets and sizes inside a binary weights file
- Validate that referenced weights are well-formed

The manifest is metadata only. It does not contain raw tensor bytes.

### 3. Binary weights file (`.weights`, optional)

The `.weights` file is a simple concatenation of raw tensor data.

It is:
- Compact
- Fast to load
- Independent from graph structure

This separation allows the same graph definition to be reused with different trained weights.


## Core Idea

The library parses the `.webnn` DSL into a **very small, intentionally simple AST**:

- Inputs
- Constants
- Nodes (operator name, inputs, options)
- Outputs

This AST is the **true internal representation** of a graph.

Once parsed, the AST can be:
- Validated
- Serialized
- Transformed
- Used to construct a WebNN graph

## Using the AST

The AST is designed to be easy to consume from other tools. In particular, it can be used to:

- load, save a build an WebNN graph and its weights using **rustnn** or **PyWebNN**
- Generate WebNN JavaScript `MLGraphBuilder` calls
- Perform lightweight graph analysis or transformations

The library does not attempt to deeply re-specify WebNN semantics. Anything not
explicitly checked is passed through and left to the WebNN runtime to validate.

## JSON Serialization (Secondary)

In addition to the text DSL, the AST can be serialized to a **canonical JSON format**.

Important points:

- JSON is **not** the primary authoring format
- It exists as a convenience for programmatic manipulation
- It supports full round-trip conversion back to `.webnn`
- It can store optional metadata such as the graph name

The JSON format is roughly **10x larger** than the `.webnn` DSL and is best suited for tooling, not manual editing.

All CLI commands auto-detect and accept both formats.


## Features

- **Convert ONNX models** to WebNN format with static lowering and dynamic input metadata
- Parse WebNN graph text (`.webnn`) into a simple AST
- Serialize the AST to canonical JSON
- Serialize JSON back to `.webnn` with full round-trip support
- Validate graph structure and optional weights manifest
- Emit WebNN JavaScript builder code (`MLGraphBuilder` calls)
- Pack and unpack binary weight files

This is intended as a **small, hackable reference scaffold**, not a heavy framework.

## Install

### From source (local dev)

```bash
git clone https://github.com/tarekziade/webnn-graph
cd webnn-graph
make build
make run
# Or:
webnn-graph --help
```

### Install the CLI with Cargo

```bash
cargo install webnn-graph
```

## Formats

### Text format: .webnn

The DSL is block-based and declarative:

- inputs {} declares typed inputs
- consts {} declares typed constants
- nodes {} lists operator calls in order
- outputs {} declares named graph outputs


Types use:
```
dtype[dim0, dim1, ...]
```

Supported dtypes: `f32`, `f16`, `i4`, `u4`, `i32`, `u32`, `i64`, `u64`, `i8`, `u8`.

## ONNX to WebNN Conversion

The CLI includes a powerful ONNX-to-WebNN converter that enables you to take existing ONNX
models and convert them to the WebNN format.

### Prerequisites: Static Lowering for Shape Expressions

**Important:** ONNX models may contain symbolic input dimensions. `webnn-graph` can preserve unresolved
symbolic input dimensions in graph metadata (v2), but operations such as `reshape` still require static
shape expressions for WebNN lowering. Dynamic input preservation is experimental and must be enabled with
`--experimental-dynamic-inputs`. Use `--optimize` and `--override-dim` as needed.

#### Why is this necessary?

WebNN's `reshape` operation requires the shape parameter to be a constant, not a dynamically
computed value. Many ONNX models (especially transformers/BERT) use dynamic shape patterns like:

```
Shape → Gather → Concat → Reshape
```

These patterns are typically resolved to static constants during conversion.

#### Built-in Constant Folding

The converter includes a constant folding engine that automatically:
- Evaluates `Shape` operations at conversion time
- Resolves `Gather` and `Concat` operations on constant data
- Eliminates dynamic shape computation patterns
- Reduces model size by 40-50% for transformer models

Simply use the `--optimize` flag to enable constant folding:

```bash
webnn-graph convert-onnx --input model.onnx --optimize \
  --override-dim batch_size=1 \
  --override-dim sequence_length=128
```

**What constant folding does:**
- Identifies nodes with all-constant inputs
- Evaluates them at conversion time
- Replaces them with their computed results
- Removes the evaluated nodes from the graph

**See also:** [Dynamic Dimensions Guide](docs/dynamic-dimensions-guide.md) for help choosing dimension values.

### Converting ONNX Models

Convert ONNX models to WebNN format with built-in constant folding:

```bash
# Basic conversion with optimization (recommended)
webnn-graph convert-onnx --input model.onnx --optimize \
  --override-dim batch_size=1 \
  --override-dim sequence_length=128

# Output: model.webnn + model.weights + model.manifest.json

# Custom output paths
webnn-graph convert-onnx \
  --input model.onnx \
  --optimize \
  --output graph.webnn \
  --weights graph.weights \
  --manifest graph.manifest.json \
  --override-dim batch_size=1 \
  --override-dim sequence_length=128

# Inline weights for small models (not recommended for large models)
webnn-graph convert-onnx --input model.onnx --optimize --inline-weights \
  --override-dim batch_size=1

# Output to JSON format instead of .webnn
webnn-graph convert-onnx --input model.onnx --optimize --output model.json \
  --override-dim batch_size=1 \
  --override-dim sequence_length=128
```

### Example: Converting SmolLM-135M

Download the ONNX model:

```bash
curl -L "https://huggingface.co/HuggingFaceTB/SmolLM-135M/resolve/main/onnx/model.onnx?download=true" \
  -o /tmp/smol_hf.onnx
```

Then convert:

```bash
webnn-graph convert-onnx \
  --input /tmp/smol_hf.onnx \
  --optimize \
  --experimental-dynamic-inputs \
  --output /tmp/smol_hf.webnn \
  --weights /tmp/smol_hf.weights \
  --manifest /tmp/smol_hf.manifest.json
```

Observed output from that run:
- Graph header: `webnn_graph "main_graph" v2`
- `/tmp/smol_hf.webnn`: ~694 KB
- `/tmp/smol_hf.weights`: ~513 MB
- `/tmp/smol_hf.manifest.json`: ~423 KB

### Example: Converting all-MiniLM-L6-v2

Download the ONNX model:

```bash
curl -L "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx?download=true" \
  -o /tmp/minilm.onnx
```

Convert with common sentence-embedding overrides:

```bash
webnn-graph convert-onnx \
  --input /tmp/minilm.onnx \
  --optimize \
  --override-dim batch_size=1 \
  --override-dim sequence_length=128 \
  --output /tmp/minilm.webnn \
  --weights /tmp/minilm.weights \
  --manifest /tmp/minilm.manifest.json
```

### Supported ONNX Operations

The converter focuses on NLP/Transformer operations:

- **Matrix operations**: MatMul, Gemm
- **Element-wise**: Add, Sub, Mul, Div, Pow
- **Normalization**: LayerNormalization, Softmax
- **Tensor manipulation**: Reshape, Transpose, Concat, Split, Squeeze, Unsqueeze
- **Activation**: Relu, Sigmoid, Tanh, Gelu, etc.
- **Reduction**: ReduceMean, ReduceSum, ReduceMax, ReduceMin
- **Utility**: Gather, Slice

### Complete ONNX Workflow Example

```bash
# Step 1: Convert ONNX → WebNN with constant folding
webnn-graph convert-onnx --input bert-base.onnx --optimize \
  --override-dim batch_size=1 \
  --override-dim sequence_length=128 \
  --override-dim token_type_ids=128

# Output: bert-base.webnn + bert-base.weights + bert-base.manifest.json

# Step 2: Generate JavaScript for browser/runtime
webnn-graph emit-js bert-base.webnn > buildGraph.js

# Step 3: (Optional) Create HTML visualizer
webnn-graph emit-html bert-base.webnn > visualizer.html
open visualizer.html

# The --optimize flag performs constant folding automatically:
# - Eliminates Shape/Gather/Concat patterns
# - Reduces model size by 40-50%
# - No external preprocessing needed!
```

**Example results for BERT models with `--optimize`:**
- **Original ONNX**: 637 nodes with Shape operations
- **After constant folding**: 317 nodes (50% reduction), no Shape operations
- All reshape shape parameters become static constants
- **WebNN output**: All reshape operations use static constants, fully compatible

## Examples

Below is the same graph expressed in webnn and JSON.

### Text 

```webnn
webnn_graph "resnet_head" v1 {
  inputs {
    x: f32[1, 2048];
  }

  consts {
    W: f32[2048, 1000] @weights("W");
    b: f32[1000]       @weights("b");
  }

  nodes {
    logits0 = matmul(x, W);
    logits  = add(logits0, b);
    probs   = softmax(logits, axis=1);
  }

  outputs { probs; }
}
```

### JSON

```json
{
  "format": "webnn-graph-json",
  "version": 1,
  "inputs": {
    "x": { "dataType": "float32", "shape": [1, 2048] }
  },
  "consts": {
    "W": {
      "dataType": "float32",
      "shape": [2048, 1000],
      "init": { "kind": "weights", "ref": "W" }
    },
    "b": {
      "dataType": "float32",
      "shape": [1000],
      "init": { "kind": "weights", "ref": "b" }
    }
  },
  "nodes": [
    { "id": "logits0", "op": "matmul", "inputs": ["x", "W"], "options": {} },
    { "id": "logits",  "op": "add",    "inputs": ["logits0", "b"], "options": {} },
    { "id": "probs",   "op": "softmax", "inputs": ["logits"], "options": { "axis": 1 } }
  ],
  "outputs": { "probs": "probs" }
}
```

## Notes

- Validation is intentionally lightweight and structural.
- Operator semantics are mostly pass-through.
- The design favors simplicity and reuse over completeness.
- The AST is stable and meant to be consumed by other WebNN tooling.