rustnn 0.3.0 - Docs.rs

<div align="center">
  <img src="logo/rustnn.png" alt="rustnn logo" width="200"/>

  # rustnn / PyWebNN

  A Rust implementation of WebNN graph handling with Python bindings that implement the W3C WebNN API specification.
</div>

---

## [WARNING] **EXPERIMENTAL - DO NOT USE IN PRODUCTION**

**This project is a proof-of-concept and experimental implementation. It is NOT ready for production use.**

This is an early-stage experiment to explore WebNN graph handling and format conversion. Many features are incomplete, untested, or may change significantly. Use at your own risk for research and experimentation only.

---

**Features:**
- **Rust Library**: Validates WebNN graphs and converts to ONNX/CoreML formats
- **Python API**: Complete W3C WebNN API implementation via PyO3 bindings
- [TARGET] **Runtime Backend Selection**: Choose CPU, GPU, or NPU execution at context creation
- [STATS] **Format Conversion**: Export graphs to ONNX (cross-platform) and CoreML (macOS)
- [DEPLOY] **Model Execution**: Run converted models on CPU, GPU, and Neural Engine (macOS)
- [FAST] **Async Support**: Non-blocking execution with Python asyncio integration
- [SEARCH] **Graph Visualization**: Generate Graphviz diagrams of your neural networks
- [OK] **Validation**: Comprehensive graph validation matching Chromium's WebNN implementation
- [MATH] **Shape Inference**: Automatic shape computation with NumPy-style broadcasting
- [STYLE] **Real Examples**: Complete 106-layer MobileNetV2 reaching 99.60% top-1 confidence on a sample image, plus Transformer text generation with attention

---

## [PACKAGE] Installation

### Python Package (PyWebNN)

**Quick Start (Validation & Conversion Only):**

```bash
pip install pywebnn
```

This installs the base package for graph validation and format conversion (no execution).

**For Full Execution Support:**

To execute neural networks, you need ONNX Runtime:

```bash
# Install PyWebNN + ONNX Runtime for CPU execution
pip install pywebnn onnxruntime

# Or for GPU execution (requires CUDA)
pip install pywebnn onnxruntime-gpu
```

**Note:** The PyPI package currently includes validation and conversion features. ONNX Runtime execution requires the `onnxruntime` package to be installed separately. We're working on better integration in future releases.
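
Because execution depends on a separately installed `onnxruntime`, a script can check for it before trying to run a graph. The following is a minimal sketch using only the standard library; the helper name `has_module` is hypothetical and not part of PyWebNN.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


if has_module("onnxruntime"):
    print("ONNX Runtime found: graph execution should be available")
else:
    print("ONNX Runtime missing: validation and conversion only")
```

This only detects whether the module is importable; it does not verify the ONNX Runtime version listed under Requirements.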

**Build from Source (with Execution Built-in):**

For a fully integrated package with execution support:

```bash
# Clone the repository
git clone https://github.com/tarekziade/rustnn.git
cd rustnn

# Install with ONNX Runtime support (recommended)
make python-dev  # Sets up venv and builds with ONNX Runtime

# Or manually with maturin
pip install maturin
maturin develop --features python,onnx-runtime

# macOS only: Add CoreML support
maturin develop --features python,onnx-runtime,coreml-runtime
```

**Requirements:**
- Python 3.11+
- NumPy 1.20+
- ONNX Runtime 1.23+ (for execution)

### Rust Library

Add to your `Cargo.toml`:

```toml
[dependencies]
rustnn = "0.3"
```

Or use directly from this repository.

---

## [DEPLOY] Quick Start

### Python API

```python
import webnn
import numpy as np

# Create ML context - use hints for device selection
ml = webnn.ML()
context = ml.create_context(accelerated=False)  # CPU-only execution
# Or: context = ml.create_context(accelerated=True)  # Request GPU/NPU if available

# Create graph builder
builder = context.create_graph_builder()

# Define a simple graph: z = relu(x + y)
x = builder.input("x", [2, 3], "float32")
y = builder.input("y", [2, 3], "float32")
z = builder.add(x, y)
output = builder.relu(z)

# Compile the graph (creates backend-agnostic representation)
graph = builder.build({"output": output})

# Prepare input data
x_data = np.array([[1, -2, 3], [4, -5, 6]], dtype=np.float32)
y_data = np.array([[-1, 2, -3], [-4, 5, -6]], dtype=np.float32)

# Execute: converts to backend-specific format and runs
results = context.compute(graph, {"x": x_data, "y": y_data})
print(results["output"])  # Computed by ONNX Runtime; all zeros here because y_data is the negation of x_data

# Optional: Export the ONNX model to file (for deployment, inspection, etc.)
context.convert_to_onnx(graph, "model.onnx")
```
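
To sanity-check a result from `context.compute`, the same `relu(x + y)` computation can be reproduced in plain NumPy. This is an independent reference sketch, not part of the PyWebNN API; the second input `ones` is added here only to show a non-trivial result.

```python
import numpy as np

x_data = np.array([[1, -2, 3], [4, -5, 6]], dtype=np.float32)
y_data = np.array([[-1, 2, -3], [-4, 5, -6]], dtype=np.float32)

# relu(x + y): element-wise sum, then clamp negatives to zero
expected = np.maximum(x_data + y_data, 0)
print(expected)  # all zeros, since y_data == -x_data

# A less degenerate input: adding 1 keeps the positive entries alive
ones = np.ones_like(x_data)
shifted = np.maximum(x_data + ones, 0)
print(shifted)
```

Comparing the WebNN output against such a reference with `np.allclose` is one simple way to spot conversion problems.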

### Backend Selection

Following the [W3C WebNN Device Selection spec](https://github.com/webmachinelearning/webnn/blob/main/device-selection-explainer.md), device selection uses **hints** rather than explicit device types:

```python
# Request GPU/NPU acceleration (default)
context = ml.create_context(accelerated=True, power_preference="default")
print(f"Accelerated: {context.accelerated}")  # Check if acceleration is available

# Request low-power execution (prefers NPU over GPU)
context = ml.create_context(accelerated=True, power_preference="low-power")

# Request high-performance execution (prefers GPU)
context = ml.create_context(accelerated=True, power_preference="high-performance")

# CPU-only execution (no acceleration)
context = ml.create_context(accelerated=False)
```

**Device Selection Logic:**
- `accelerated=True` + `power_preference="low-power"` → **NPU** > GPU > CPU
- `accelerated=True` + `power_preference="high-performance"` → **GPU** > NPU > CPU
- `accelerated=True` + `power_preference="default"` → **GPU** > NPU > CPU
- `accelerated=False` → **CPU only**
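
The preference orders above can be written down as a small lookup. This is an illustrative sketch of the rules as listed, not the library's actual selection code; `device_preference` and `pick_device` are hypothetical names.

```python
def device_preference(accelerated: bool, power_preference: str = "default") -> list[str]:
    """Return device kinds in the order they would be preferred."""
    if not accelerated:
        return ["cpu"]
    if power_preference == "low-power":
        return ["npu", "gpu", "cpu"]
    if power_preference in ("default", "high-performance"):
        return ["gpu", "npu", "cpu"]
    raise ValueError(f"unknown power_preference: {power_preference!r}")


def pick_device(preference: list[str], available: set[str]) -> str:
    """Pick the first preferred device that the platform actually offers."""
    for device in preference:
        if device in available:
            return device
    raise RuntimeError("no usable device")


# On a machine without an NPU, a low-power request falls back to the GPU.
print(pick_device(device_preference(True, "low-power"), {"cpu", "gpu"}))
```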

**Platform-Specific Backends:**
- **NPU**: CoreML Neural Engine (Apple Silicon macOS only)
- **GPU**: ONNX Runtime GPU (cross-platform) or CoreML GPU (macOS)
- **CPU**: ONNX Runtime CPU (cross-platform)

**Important:** The `accelerated` property indicates **platform capability**, not a guarantee. Query `context.accelerated` after creation to check if GPU/NPU resources are available. The platform controls actual device allocation based on runtime conditions.

The graph compilation (`builder.build()`) creates a **backend-agnostic representation**. Backend-specific conversion happens automatically during `compute()` based on the context's selected backend.

### Async Execution

PyWebNN supports asynchronous execution following the W3C specification. Use `AsyncMLContext` for non-blocking operations:

```python
import asyncio
import numpy as np
import webnn

async def main():
    # Create context
    ml = webnn.ML()
    context = ml.create_context(accelerated=False)
    async_context = webnn.AsyncMLContext(context)

    # Build graph
    builder = async_context.create_graph_builder()
    x = builder.input("x", [2, 3], "float32")
    y = builder.input("y", [2, 3], "float32")
    z = builder.add(x, y)
    output = builder.relu(z)
    graph = builder.build({"output": output})

    # Async dispatch (non-blocking execution)
    x_data = np.array([[1, -2, 3], [4, -5, 6]], dtype=np.float32)
    y_data = np.array([[-1, 2, -3], [-4, 5, -6]], dtype=np.float32)
    await async_context.dispatch(graph, {"x": x_data, "y": y_data})

    print("Graph executed asynchronously!")

asyncio.run(main())
```
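
The benefit of non-blocking execution is that several long-running calls can overlap on one event loop. The sketch below illustrates that general pattern with a stand-in blocking function; `blocking_compute` and `run_many` are hypothetical, and this is not a description of how `AsyncMLContext` is implemented.

```python
import asyncio
import time


def blocking_compute(name: str, seconds: float) -> str:
    """Stand-in for a blocking inference call (hypothetical)."""
    time.sleep(seconds)
    return f"{name} done"


async def run_many() -> list[str]:
    # asyncio.to_thread runs each blocking call in a worker thread,
    # so the event loop stays free and the two calls overlap.
    return await asyncio.gather(
        asyncio.to_thread(blocking_compute, "graph-a", 0.1),
        asyncio.to_thread(blocking_compute, "graph-b", 0.1),
    )


if __name__ == "__main__":
    print(asyncio.run(run_many()))
```

`asyncio.gather` returns results in the order the awaitables were passed, regardless of which finishes first.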

### Rust Library

```rust
use rustnn::{GraphInfo, GraphValidator, ContextProperties};
use rustnn::converters::{ConverterRegistry, OnnxConverter};

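// Load graph from JSON (path assumed here; point it at your own graph file)
let json_data = std::fs::read_to_string("examples/sample_graph.json")?;
let graph_info: GraphInfo = serde_json::from_str(&json_data)?;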

// Validate the graph
let validator = GraphValidator::new(&graph_info, ContextProperties::default());
let artifacts = validator.validate()?;

// Convert to ONNX
let mut registry = ConverterRegistry::new();
registry.register(Box::new(OnnxConverter));
let converted = registry.convert("onnx", &graph_info)?;

// Save to file
std::fs::write("model.onnx", &converted.data)?;

// Execute with ONNX Runtime (requires "onnx-runtime" feature)
#[cfg(feature = "onnx-runtime")]
{
    use rustnn::executors::onnx::run_onnx_zeroed;

    // Execute model with zeroed inputs
    run_onnx_zeroed(&converted.data)?;
    println!("Model executed successfully with ONNX Runtime");
}

// Execute with CoreML (requires "coreml-runtime" feature, macOS only)
#[cfg(all(target_os = "macos", feature = "coreml-runtime"))]
{
    use rustnn::executors::coreml::run_coreml_zeroed_cached;
    use rustnn::converters::CoremlMlProgramConverter;

    // Convert to CoreML MLProgram
    registry.register(Box::new(CoremlMlProgramConverter::default()));
    let coreml = registry.convert("coreml", &graph_info)?;

    // Execute on GPU (0=CPU, 1=GPU, 2=Neural Engine)
    run_coreml_zeroed_cached(&coreml.data, 1)?;
    println!("Model executed successfully with CoreML");
}
```

---

## [STYLE] Examples

### Real Image Classification with Complete Pretrained MobileNetV2

The `examples/mobilenetv2_complete.py` script demonstrates real image classification using the **complete 106-layer pretrained MobileNetV2** from the [WebNN test-data repository](https://github.com/webmachinelearning/test-data):

```bash
# Download all 106 pretrained weight files (first time only)
bash scripts/download_mobilenet_weights.sh

# Run with CPU backend
python examples/mobilenetv2_complete.py examples/images/test.jpg --backend cpu

# Run with GPU backend
python examples/mobilenetv2_complete.py examples/images/test.jpg --backend gpu

# Run with CoreML backend (macOS only - fastest!)
python examples/mobilenetv2_complete.py examples/images/test.jpg --backend coreml
```

**Sample Output** (classifying a red panda):

```
======================================================================
Complete MobileNetV2 Image Classification with WebNN
======================================================================
Image: examples/images/test.jpg
Backend: ONNX CPU

Loading all pretrained MobileNetV2 weights...
   [OK] Loaded 106 weight tensors
   Weight load time: 22.79ms

Building complete MobileNetV2 graph...
   Layer 0: Initial conv 3->32
   Block 0: 32->16 (stride=1, expansion=1)
   Block 1: 16->24 (stride=2, expansion=6)
   ...
   Block 16: 160->320 (stride=1, expansion=6)
   Layer final: Conv 320->1280
   [OK] Complete MobileNetV2 graph built!
   Graph build time: 913.78ms

Top 5 Predictions (Real ImageNet Labels):
----------------------------------------------------------------------
   1. lesser panda                                        99.60%
   2. polecat                                              0.20%
   3. weasel                                               0.09%
   4. black-footed ferret                                  0.02%
   5. kit fox                                              0.01%

Performance Summary:
  - Weight Load:   22.79ms
  - Preprocessing: 15.52ms
  - Graph Build:   913.78ms
  - Inference:     74.41ms (CPU) / 77.14ms (GPU) / 51.93ms (CoreML)
======================================================================
```

**How It Works:**
- **Complete 106-layer architecture** - All pretrained weights from WebNN test-data
- **17 inverted residual blocks** - Full MobileNetV2 architecture
- **Built with WebNN operations** - Uses conv2d, add, clamp, global_average_pool, gemm, softmax
- **Real ImageNet-1000 labels** - Accurate real-world predictions
- **Three backend support** - ONNX CPU, ONNX GPU, CoreML (Neural Engine on Apple Silicon)
- **High-confidence prediction** - 99.60% confidence on the correct class for the sample image

**Architecture Details:**
- Initial conv: 3→32 channels (stride 2)
- 17 inverted residual blocks with varying expansions (1x or 6x)
- Depthwise separable convolutions using groups parameter
- Residual connections for stride=1 blocks
- ReLU6 activations (clamp 0-6)
- Final conv: 320→1280 channels
- Global average pooling + classifier (1280→1000)
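
The 17-block layout can be derived from the standard MobileNetV2 stage table (expansion, output channels, repeats, first stride). The sketch below assumes the example script follows that standard table; it is not taken from the script itself, but it does reproduce the `Block 0`, `Block 1`, and `Block 16` lines in the sample output above.

```python
# (expansion t, output channels c, repeats n, stride of first block s)
STAGES = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def mobilenetv2_blocks(in_channels: int = 32) -> list[dict]:
    """Expand the stage table into one entry per inverted residual block."""
    blocks = []
    for expansion, out_channels, repeats, first_stride in STAGES:
        for i in range(repeats):
            stride = first_stride if i == 0 else 1
            blocks.append({
                "in": in_channels,
                "out": out_channels,
                "stride": stride,
                "expansion": expansion,
                # A skip connection needs matching shapes on both sides
                "residual": stride == 1 and in_channels == out_channels,
            })
            in_channels = out_channels
    return blocks


for index, block in enumerate(mobilenetv2_blocks()):
    print(f"Block {index}: {block['in']}->{block['out']} "
          f"(stride={block['stride']}, expansion={block['expansion']})")
```

Note that a residual connection also requires equal input and output channel counts, so only the repeated blocks within a stage carry one.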

This implementation **mirrors the JavaScript WebNN demos**, building the complete graph layer by layer using WebNN API operations.

### Text Generation with Transformer Attention

The `examples/text_generation_gpt.py` script demonstrates next-token generation using a simplified transformer with attention, similar to the [JavaScript WebNN text generation demo](https://github.com/microsoft/webnn-developer-preview/tree/main/demos/text-generation):

```bash
# Run basic generation on all 3 backends
make text-gen-demo

# Or run on a specific backend
python examples/text_generation_gpt.py --prompt "Hello world" --tokens 30 --backend cpu
python examples/text_generation_gpt.py --prompt "Hello world" --tokens 30 --backend gpu
python examples/text_generation_gpt.py --prompt "Hello world" --tokens 30 --backend coreml

# Train the model on sample data
make text-gen-train

# Generate with trained weights
make text-gen-trained

# Run enhanced version with KV cache
make text-gen-enhanced
```

**Sample Output:**

```
======================================================================
Next-Token Generation with Attention (WebNN)
======================================================================
Backend: ONNX CPU
Model: vocab=256 (byte-level), d_model=64, max_seq=32

[OK] Context created (accelerated=False)
[OK] Model initialized

Prompt: 'Hello world'
Prompt tokens (11): [72, 101, 108, 108, 111, 32, 119, 111, 114, 108]...

Generating 30 tokens autoregressively...
======================================================================
  Token 1/30: 87 (prob: 0.0042)
  Token 10/30: 123 (prob: 0.0043)
  Token 20/30: 136 (prob: 0.0037)
  Token 30/30: 99 (prob: 0.0040)
======================================================================

WebNN Operations Demonstrated:
  [OK] matmul - Matrix multiplication for projections
  [OK] layer_normalization - Normalizing activations
  [OK] relu - Activation function
  [OK] softmax - Output probability distribution
  [OK] reduce_mean - Simplified attention pooling
  [OK] gemm - General matrix multiply with transpose
======================================================================
```
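
The token IDs in the sample output follow directly from the byte-level vocabulary: each token is one UTF-8 byte of the prompt. The short sketch below reproduces them with the standard library. It also shows that the printed probabilities of about 0.004 are close to a uniform distribution over 256 tokens, which would be consistent with untrained weights.

```python
prompt = "Hello world"

# Byte-level tokenization: one token per UTF-8 byte, vocabulary size 256
tokens = list(prompt.encode("utf-8"))
print(len(tokens), tokens)

# Decoding is the inverse operation
decoded = bytes(tokens).decode("utf-8")
print(decoded)

# Probability of each token under a uniform distribution over 256 entries
uniform = 1 / 256
print(f"{uniform:.4f}")
```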

**How It Works:**
- **Transformer architecture** - Single-head attention, layer normalization, feed-forward networks
- **Autoregressive generation** - Generates one token at a time based on context
- **Positional embeddings** - Sinusoidal position encodings
- **Temperature sampling** - Configurable randomness in token selection
- **Training support** - Train on custom text with `train_text_model.py`
- **KV caching** - Enhanced version with efficient key-value caching
- **Three backend support** - ONNX CPU, ONNX GPU, CoreML (Neural Engine on Apple Silicon)
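
Temperature sampling, mentioned in the list above, rescales the logits before the softmax: low temperatures sharpen the distribution toward the most likely token, high temperatures flatten it. The following is a generic standard-library sketch of the technique; `sample_with_temperature` is a hypothetical helper, not the function used in the example script.

```python
import math
import random


def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token index from logits after temperature scaling."""
    if rng is None:
        rng = random.Random()
    if temperature <= 0:
        # Treat zero temperature as greedy decoding (argmax)
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    # random.choices normalizes the weights, so this is a softmax sample
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.0, 2.0, 1.0]
rng = random.Random(0)  # seeded for repeatable runs
print(sample_with_temperature(logits, temperature=0.0))
print([sample_with_temperature(logits, 1.0, rng) for _ in range(5)])
```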

**Complete Workflow:**
```bash
# 1. Train on sample data (10 epochs, ~1-2 minutes)
make text-gen-train

# 2. Generate with trained weights (better quality)
make text-gen-trained

# 3. Or use enhanced version with KV cache
make text-gen-enhanced
```

The training script (`examples/train_text_model.py`) uses simple gradient descent to train on text data, and the enhanced version (`examples/text_generation_enhanced.py`) includes KV caching for efficient generation and HuggingFace tokenizer support.

### Additional Examples

- **`examples/python_simple.py`** - Basic graph building and execution
- **`examples/python_matmul.py`** - Matrix multiplication operations
- **`examples/image_classification.py`** - Full classification pipeline (random weights)

See the [examples/](examples/) directory for more code samples.

---

## Documentation

The Python API implements the [W3C WebNN specification](https://www.w3.org/TR/webnn/).

**Quick Links:**
- **[API Reference](docs/api-reference.md)** - Complete Python API documentation
- **[Getting Started](docs/getting-started.md)** - Installation and first steps
- **[Architecture](docs/architecture.md)** - Design principles and structure
- **[Examples](examples/)** - Working code samples

---

## Rust CLI Usage

The Rust crate includes a CLI for validating, visualizing, converting, and executing WebNN graphs.

### Validate a Graph

```bash
cargo run -- examples/sample_graph.json
```

### Visualize a Graph

```bash
# Generate DOT file
cargo run -- examples/sample_graph.json --export-dot graph.dot

# Convert to PNG (requires graphviz)
dot -Tpng graph.dot -o graph.png

# Or use the Makefile shortcut (macOS)
make viz
```
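
For readers unfamiliar with the DOT format, the sketch below shows how a small graph such as `relu(x + y)` could be written out as Graphviz DOT text. The node styling and the `to_dot` helper are assumptions for illustration; the actual output of `--export-dot` may look different.

```python
def to_dot(inputs, ops, outputs):
    """Render a tiny dataflow graph as Graphviz DOT text.

    inputs:  list of input names
    ops:     list of (op_name, result_name, [operand_names])
    outputs: dict mapping output name -> producing result name
    """
    lines = ["digraph webnn {", "  rankdir=LR;"]
    for name in inputs:
        lines.append(f'  "{name}" [shape=box];')
    for op, result, operands in ops:
        lines.append(f'  "{result}" [label="{op}"];')
        for operand in operands:
            lines.append(f'  "{operand}" -> "{result}";')
    for name, source in outputs.items():
        lines.append(f'  "{name}" [shape=doublecircle];')
        lines.append(f'  "{source}" -> "{name}";')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot(
    inputs=["x", "y"],
    ops=[("add", "z", ["x", "y"]), ("relu", "r", ["z"])],
    outputs={"output": "r"},
)
print(dot)
```

The printed text can be saved to a `.dot` file and rendered with the same `dot -Tpng` command shown above.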

### Convert to ONNX

```bash
cargo run -- examples/sample_graph.json \
    --convert onnx \
    --convert-output model.onnx
```

### Convert to CoreML

```bash
cargo run -- examples/sample_graph.json \
    --convert coreml \
    --convert-output model.mlmodel
```

### Execute Models

**ONNX Runtime** (cross-platform):

```bash
cargo run --features onnx-runtime -- \
    examples/sample_graph.json \
    --convert onnx \
    --run-onnx
```

**CoreML Runtime** (macOS only):

```bash
cargo run --features coreml-runtime -- \
    examples/sample_graph.json \
    --convert coreml \
    --run-coreml \
    --device gpu  # or 'cpu', 'ane' for Neural Engine
```

### Makefile Targets

```bash
make help              # Show all available targets
make build             # Build Rust project
make test              # Run Rust tests
make python-dev        # Install Python package in dev mode
make python-test       # Run Python tests
make docs-serve        # Serve documentation locally
make validate-all-env  # Run full test pipeline
```

---

## Architecture

**Design Principles:**
- **Backend-Agnostic Graphs** - Platform-independent representation, runtime backend selection
- **WebNN Spec Compliance** - Implements W3C Device Selection and MLTensor specs
- **Rust-First** - Pure Rust core with thin Python bindings
- **Lazy Conversion** - Backend conversion happens during execution, not compilation

See **[Architecture Guide](docs/architecture.md)** for details.
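
The lazy conversion principle can be pictured as a graph object that holds only a backend-agnostic description and converts it the first time a backend needs it. The sketch below is a simplified illustration under that assumption; `LazyGraph`, `model_for`, and the caching behavior are hypothetical and not the crate's actual types.

```python
from typing import Callable


class LazyGraph:
    """Backend-agnostic graph that converts itself on first use (hypothetical)."""

    def __init__(self, description: str) -> None:
        self.description = description          # stays backend-agnostic
        self._converted: dict[str, str] = {}    # backend name -> converted model
        self.conversions = 0

    def model_for(self, backend: str, convert: Callable[[str], str]) -> str:
        # Convert only when a backend actually asks for the model,
        # and reuse the result on later calls (caching is an assumption).
        if backend not in self._converted:
            self._converted[backend] = convert(self.description)
            self.conversions += 1
        return self._converted[backend]


graph = LazyGraph("relu(add(x, y))")
print(graph.conversions)  # building the graph converts nothing

onnx_model = graph.model_for("onnx", lambda d: f"onnx::{d}")
graph.model_for("onnx", lambda d: f"onnx::{d}")  # second call reuses the model
print(onnx_model, graph.conversions)
```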

---

## Development

```bash
# Clone and build
git clone https://github.com/tarekziade/rustnn.git
cd rustnn
cargo build --release
maturin develop --features python

# Run tests
cargo test && python -m pytest tests/
```

See **[Development Guide](docs/development.md)** for detailed instructions.

---


## 🧪 Testing

### Python Tests

```bash
# Install test dependencies
pip install -e ".[dev]"

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_python_api.py -v

# Run integration tests with cleanup
python tests/test_integration.py --cleanup
```

### Rust Tests

```bash
# All tests
cargo test

# Specific module
cargo test converters

# With features
cargo test --features onnx-runtime,coreml-runtime
```

---

## Project Status

**[SUCCESS] 85 WebNN operations fully implemented across all backends!**

- [OK] W3C WebNN API implementation in Python
- [OK] Runtime backend selection (CPU, GPU, Neural Engine)
- [OK] 85/95 WebNN operations (89% spec coverage)
- [OK] ONNX Runtime execution (cross-platform)
- [OK] CoreML execution (macOS GPU/Neural Engine)
- [OK] Async execution with MLTensor management
- [OK] Shape inference with NumPy-style broadcasting
- [OK] Complete MobileNetV2 + Transformer examples

See [docs/operator-status.md](docs/operator-status.md) for complete implementation details.
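
The NumPy-style broadcasting rule behind shape inference for element-wise operations can be stated in a few lines: align the shapes from the trailing dimension, and require each pair of dimensions to be equal or to contain a 1. The following is a generic sketch of that rule, not the crate's implementation; `broadcast_shapes` is a hypothetical helper.

```python
from itertools import zip_longest


def broadcast_shapes(a, b):
    """Return the broadcast result shape of two shapes, NumPy-style."""
    result = []
    # Walk both shapes from the last dimension, padding the shorter with 1s
    for da, db in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return list(reversed(result))


print(broadcast_shapes([2, 3], [3]))
print(broadcast_shapes([4, 1, 3], [5, 1]))
```

Incompatible shapes such as `[2, 3]` and `[4]` raise a `ValueError`, which is the kind of error graph validation is meant to surface before execution.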

---

## 🤝 Contributing

Contributions are welcome! Please see:

- [AGENTS.md](AGENTS.md) - Project architecture and conventions for AI agents
- [TODO.txt](TODO.txt) - Feature requests and known limitations

### Quick Contribution Guide

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/my-feature`
3. **Install git hooks** (optional but recommended):
   ```bash
   ./scripts/install-git-hooks.sh
   ```
   This installs a pre-commit hook that automatically checks code formatting before each commit.
4. Make your changes
5. Run tests: `cargo test && pytest tests/`
6. Format code: `cargo fmt` (or let the pre-commit hook handle it)
7. Commit: `git commit -m "Add my feature"`
8. Push and create a pull request

**Note:** The pre-commit hook will prevent commits with unformatted code. If needed, you can bypass it with `git commit --no-verify`, but this is not recommended.

---

## License

Licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) for details.

---

## Links

- **GitHub**: [https://github.com/tarekziade/rustnn](https://github.com/tarekziade/rustnn)
- **PyPI**: [https://pypi.org/project/pywebnn/](https://pypi.org/project/pywebnn/)
- **Documentation**: [https://tarekziade.github.io/rustnn/](https://tarekziade.github.io/rustnn/)
- **W3C WebNN Spec**: [https://www.w3.org/TR/webnn/](https://www.w3.org/TR/webnn/)
- **Issues**: [https://github.com/tarekziade/rustnn/issues](https://github.com/tarekziade/rustnn/issues)

---

## Acknowledgments

- W3C WebNN Community Group for the specification
- Chromium WebNN implementation for reference
- PyO3 project for excellent Python-Rust bindings
- Maturin for seamless Python package building

---

**Made by [Tarek Ziade](https://github.com/tarekziade)**