<div align="center">
<img src="logo/rustnn.png" alt="rustnn logo" width="200"/>
# rustnn / PyWebNN
A Rust implementation of WebNN graph handling with Python bindings that implement the W3C WebNN API specification.
</div>
---
## [WARNING] **EXPERIMENTAL - DO NOT USE IN PRODUCTION**
**This project is a proof-of-concept and experimental implementation. It is NOT ready for production use.**
This is an early-stage experiment to explore WebNN graph handling and format conversion. Many features are incomplete, untested, or may change significantly. Use at your own risk for research and experimentation only.
---
**Features:**
- **Rust Library**: Validates WebNN graphs and converts them to ONNX/CoreML formats
- **Python API**: Complete W3C WebNN API implementation via PyO3 bindings
- [TARGET] **Runtime Backend Selection**: Choose CPU, GPU, or NPU execution at context creation
- [STATS] **Format Conversion**: Export graphs to ONNX (cross-platform) and CoreML (macOS)
- [DEPLOY] **Model Execution**: Run converted models on CPU, GPU, and Neural Engine (macOS)
- [FAST] **Async Support**: Non-blocking execution with Python asyncio integration
- [SEARCH] **Graph Visualization**: Generate Graphviz diagrams of your neural networks
- [OK] **Validation**: Comprehensive graph validation matching Chromium's WebNN implementation
- [MATH] **Shape Inference**: Automatic shape computation with NumPy-style broadcasting
- [STYLE] **Real Examples**: Complete 106-layer MobileNetV2 (99.60% confidence on the sample image) plus Transformer text generation with attention
---
## [PACKAGE] Installation
### Python Package (PyWebNN)
**Quick Start (Validation & Conversion Only):**
```bash
pip install pywebnn
```
This installs the base package for graph validation and format conversion (no execution).
**For Full Execution Support:**
To execute neural networks, you need ONNX Runtime:
```bash
# Install PyWebNN + ONNX Runtime for CPU execution
pip install pywebnn onnxruntime
# Or for GPU execution (requires CUDA)
pip install pywebnn onnxruntime-gpu
```
**Note:** The PyPI package currently includes validation and conversion features. ONNX Runtime execution requires the `onnxruntime` package to be installed separately. We're working on better integration in future releases.
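If you are unsure whether execution will work in your environment, a quick import check against the `onnxruntime` package itself (not a PyWebNN API) settles it:
```python
# Probe for ONNX Runtime before attempting execution.
try:
    import onnxruntime
    print(f"ONNX Runtime {onnxruntime.__version__} detected")
except ImportError:
    print("onnxruntime not installed; execution is unavailable")
```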
**Build from Source (with Execution Built-in):**
For a fully integrated package with execution support:
```bash
# Clone the repository
git clone https://github.com/tarekziade/rustnn.git
cd rustnn
# Install with ONNX Runtime support (recommended)
make python-dev # Sets up venv and builds with ONNX Runtime
# Or manually with maturin
pip install maturin
maturin develop --features python,onnx-runtime
# macOS only: Add CoreML support
maturin develop --features python,onnx-runtime,coreml-runtime
```
**Requirements:**
- Python 3.11+
- NumPy 1.20+
- ONNX Runtime 1.23+ (for execution)
### Rust Library
Add to your `Cargo.toml`:
```toml
[dependencies]
rustnn = "0.1"
```
Or use directly from this repository.
---
## [DEPLOY] Quick Start
### Python API
```python
import webnn
import numpy as np
# Create ML context - use hints for device selection
ml = webnn.ML()
context = ml.create_context(accelerated=False) # CPU-only execution
# Or: context = ml.create_context(accelerated=True) # Request GPU/NPU if available
# Create graph builder
builder = context.create_graph_builder()
# Define a simple graph: z = relu(x + y)
x = builder.input("x", [2, 3], "float32")
y = builder.input("y", [2, 3], "float32")
z = builder.add(x, y)
output = builder.relu(z)
# Compile the graph (creates backend-agnostic representation)
graph = builder.build({"output": output})
# Prepare input data
x_data = np.array([[1, -2, 3], [4, -5, 6]], dtype=np.float32)
y_data = np.array([[-1, 2, -3], [-4, 5, -6]], dtype=np.float32)
# Execute: converts to backend-specific format and runs
results = context.compute(graph, {"x": x_data, "y": y_data})
print(results["output"]) # Actual computed values from ONNX Runtime
# Optional: Export the ONNX model to file (for deployment, inspection, etc.)
context.convert_to_onnx(graph, "model.onnx")
```
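As a sanity check, the exported file can be run with ONNX Runtime directly. This sketch assumes the export preserves the builder's input names (`x` and `y`); verify against your own model:
```python
# Run the exported model with ONNX Runtime directly (assumes the input
# names "x" and "y" survive the export; adjust if your graph differs).
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
(onnx_out,) = session.run(None, {"x": x_data, "y": y_data})
print(onnx_out)  # should match results["output"] above
```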
### Backend Selection
Following the [W3C WebNN Device Selection spec](https://github.com/webmachinelearning/webnn/blob/main/device-selection-explainer.md), device selection uses **hints** rather than explicit device types:
```python
# Request GPU/NPU acceleration (default)
context = ml.create_context(accelerated=True, power_preference="default")
print(f"Accelerated: {context.accelerated}") # Check if acceleration is available
# Request low-power execution (prefers NPU over GPU)
context = ml.create_context(accelerated=True, power_preference="low-power")
# Request high-performance execution (prefers GPU)
context = ml.create_context(accelerated=True, power_preference="high-performance")
# CPU-only execution (no acceleration)
context = ml.create_context(accelerated=False)
```
**Device Selection Logic:**
- `accelerated=True` + `power_preference="low-power"` → **NPU** > GPU > CPU
- `accelerated=True` + `power_preference="high-performance"` → **GPU** > NPU > CPU
- `accelerated=True` + `power_preference="default"` → **GPU** > NPU > CPU
- `accelerated=False` → **CPU only**
**Platform-Specific Backends:**
- **NPU**: CoreML Neural Engine (Apple Silicon macOS only)
- **GPU**: ONNX Runtime GPU (cross-platform) or CoreML GPU (macOS)
- **CPU**: ONNX Runtime CPU (cross-platform)
**Important:** The `accelerated` property indicates **platform capability**, not a guarantee. Query `context.accelerated` after creation to check if GPU/NPU resources are available. The platform controls actual device allocation based on runtime conditions.
The graph compilation (`builder.build()`) creates a **backend-agnostic representation**. Backend-specific conversion happens automatically during `compute()` based on the context's selected backend.
### Async Execution
WebNN supports asynchronous execution following the W3C specification. Use `AsyncMLContext` for non-blocking operations:
```python
import asyncio
import numpy as np
import webnn
async def main():
    # Create context
    ml = webnn.ML()
    context = ml.create_context(accelerated=False)
    async_context = webnn.AsyncMLContext(context)

    # Build graph
    builder = async_context.create_graph_builder()
    x = builder.input("x", [2, 3], "float32")
    y = builder.input("y", [2, 3], "float32")
    z = builder.add(x, y)
    output = builder.relu(z)
    graph = builder.build({"output": output})

    # Async dispatch (non-blocking execution)
    x_data = np.array([[1, -2, 3], [4, -5, 6]], dtype=np.float32)
    y_data = np.array([[-1, 2, -3], [-4, 5, -6]], dtype=np.float32)
    await async_context.dispatch(graph, {"x": x_data, "y": y_data})
    print("Graph executed asynchronously!")

asyncio.run(main())
```
### Rust Library
```rust
use rustnn::{GraphInfo, GraphValidator, ContextProperties};
use rustnn::converters::{ConverterRegistry, OnnxConverter};
// Load graph from JSON
let graph_info: GraphInfo = serde_json::from_str(&json_data)?;
// Validate the graph
let validator = GraphValidator::new(&graph_info, ContextProperties::default());
let artifacts = validator.validate()?;
// Convert to ONNX
let mut registry = ConverterRegistry::new();
registry.register(Box::new(OnnxConverter));
let converted = registry.convert("onnx", &graph_info)?;
// Save to file
std::fs::write("model.onnx", &converted.data)?;
// Execute with ONNX Runtime (requires "onnx-runtime" feature)
#[cfg(feature = "onnx-runtime")]
{
    use rustnn::executors::onnx::run_onnx_zeroed;

    // Execute model with zeroed inputs
    run_onnx_zeroed(&converted.data)?;
    println!("Model executed successfully with ONNX Runtime");
}

// Execute with CoreML (requires "coreml-runtime" feature, macOS only)
#[cfg(all(target_os = "macos", feature = "coreml-runtime"))]
{
    use rustnn::executors::coreml::run_coreml_zeroed_cached;
    use rustnn::converters::CoremlMlProgramConverter;

    // Convert to CoreML MLProgram
    registry.register(Box::new(CoremlMlProgramConverter::default()));
    let coreml = registry.convert("coreml", &graph_info)?;

    // Execute on GPU (0 = CPU, 1 = GPU, 2 = Neural Engine)
    run_coreml_zeroed_cached(&coreml.data, 1)?;
    println!("Model executed successfully with CoreML");
}
```
---
## [STYLE] Examples
### Real Image Classification with Complete Pretrained MobileNetV2
The `examples/mobilenetv2_complete.py` script demonstrates real image classification using the **complete 106-layer pretrained MobileNetV2** from the [WebNN test-data repository](https://github.com/webmachinelearning/test-data):
```bash
# Download all 106 pretrained weight files (first time only)
bash scripts/download_mobilenet_weights.sh
# Run with CPU backend
python examples/mobilenetv2_complete.py examples/images/test.jpg --backend cpu
# Run with GPU backend
python examples/mobilenetv2_complete.py examples/images/test.jpg --backend gpu
# Run with CoreML backend (macOS only - fastest!)
python examples/mobilenetv2_complete.py examples/images/test.jpg --backend coreml
```
**Sample Output** (classifying a red panda):
```
======================================================================
Complete MobileNetV2 Image Classification with WebNN
======================================================================
Image: examples/images/test.jpg
Backend: ONNX CPU
Loading all pretrained MobileNetV2 weights...
[OK] Loaded 106 weight tensors
Weight load time: 22.79ms
Building complete MobileNetV2 graph...
Layer 0: Initial conv 3->32
Block 0: 32->16 (stride=1, expansion=1)
Block 1: 16->24 (stride=2, expansion=6)
...
Block 16: 160->320 (stride=1, expansion=6)
Layer final: Conv 320->1280
[OK] Complete MobileNetV2 graph built!
Graph build time: 913.78ms
Top 5 Predictions (Real ImageNet Labels):
----------------------------------------------------------------------
1. lesser panda 99.60%
2. polecat 0.20%
3. weasel 0.09%
4. black-footed ferret 0.02%
5. kit fox 0.01%
Performance Summary:
- Weight Load: 22.79ms
- Preprocessing: 15.52ms
- Graph Build: 913.78ms
- Inference: 74.41ms (CPU) / 77.14ms (GPU) / 51.93ms (CoreML)
======================================================================
```
**How It Works:**
- **Complete 106-layer architecture** - All pretrained weights from WebNN test-data
- **17 inverted residual blocks** - Full MobileNetV2 architecture
- **Built with WebNN operations** - Uses conv2d, add, clamp, global_average_pool, gemm, softmax
- **Real ImageNet-1000 labels** - Accurate real-world predictions
- **Three backends** - ONNX CPU, ONNX GPU, and CoreML (Neural Engine on Apple Silicon)
- **Production-quality accuracy** - 99.60% confidence on the correct class
**Architecture Details:**
- Initial conv: 3→32 channels (stride 2)
- 17 inverted residual blocks with varying expansions (1x or 6x)
- Depthwise separable convolutions via the `groups` parameter (see the sketch below)
- Residual connections for stride=1 blocks
- ReLU6 activations (clamp 0-6)
- Final conv: 320→1280 channels
- Global average pooling + classifier (1280→1000)
This implementation **exactly matches the JavaScript WebNN demos**, building the complete graph layer-by-layer using WebNN API operations.
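The depthwise separable pattern is the core trick the `groups` parameter enables, so it is worth spelling out. The following is a minimal sketch, not an excerpt from the example script; the `conv2d` and `clamp` keyword names are assumptions, so check the [API Reference](docs/api-reference.md) for the exact signatures:
```python
# Minimal sketch of one depthwise-separable step with ReLU6 (MobileNetV2-style).
# NOTE: keyword names (groups, padding, min_value, max_value) are assumptions;
# consult the API reference for the builder's exact signatures.
def depthwise_separable(builder, x, in_channels, dw_filter, pw_filter):
    # Depthwise conv: groups == input channels, one filter per channel
    dw = builder.conv2d(x, dw_filter, groups=in_channels, padding=[1, 1, 1, 1])
    dw = builder.clamp(dw, min_value=0.0, max_value=6.0)  # ReLU6
    # Pointwise 1x1 conv mixes channels (in_channels -> out_channels)
    pw = builder.conv2d(dw, pw_filter)
    return builder.clamp(pw, min_value=0.0, max_value=6.0)  # ReLU6
```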
### Text Generation with Transformer Attention
The `examples/text_generation_gpt.py` script demonstrates next-token generation using a simplified transformer with attention, similar to the [JavaScript WebNN text generation demo](https://github.com/microsoft/webnn-developer-preview/tree/main/demos/text-generation):
```bash
# Run basic generation on all 3 backends
make text-gen-demo
# Or run on a specific backend
python examples/text_generation_gpt.py --prompt "Hello world" --tokens 30 --backend cpu
python examples/text_generation_gpt.py --prompt "Hello world" --tokens 30 --backend gpu
python examples/text_generation_gpt.py --prompt "Hello world" --tokens 30 --backend coreml
# Train the model on sample data
make text-gen-train
# Generate with trained weights
make text-gen-trained
# Run enhanced version with KV cache
make text-gen-enhanced
```
**Sample Output:**
```
======================================================================
Next-Token Generation with Attention (WebNN)
======================================================================
Backend: ONNX CPU
Model: vocab=256 (byte-level), d_model=64, max_seq=32
[OK] Context created (accelerated=False)
[OK] Model initialized
Prompt: 'Hello world'
Prompt tokens (11): [72, 101, 108, 108, 111, 32, 119, 111, 114, 108]...
Generating 30 tokens autoregressively...
======================================================================
Token 1/30: 87 (prob: 0.0042)
Token 10/30: 123 (prob: 0.0043)
Token 20/30: 136 (prob: 0.0037)
Token 30/30: 99 (prob: 0.0040)
======================================================================
WebNN Operations Demonstrated:
[OK] matmul - Matrix multiplication for projections
[OK] layer_normalization - Normalizing activations
[OK] relu - Activation function
[OK] softmax - Output probability distribution
[OK] reduce_mean - Simplified attention pooling
[OK] gemm - General matrix multiply with transpose
======================================================================
```
**How It Works:**
- **Transformer architecture** - Single-head attention, layer normalization, feed-forward networks
- **Autoregressive generation** - Generates one token at a time based on context
- **Positional embeddings** - Sinusoidal position encodings
- **Temperature sampling** - Configurable randomness in token selection (see the sketch after this list)
- **Training support** - Train on custom text with `train_text_model.py`
- **KV caching** - Enhanced version with efficient key-value caching
- **Three backends** - ONNX CPU, ONNX GPU, and CoreML (Neural Engine on Apple Silicon)
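Temperature sampling is small enough to show in full. Here is a self-contained NumPy sketch of the technique, independent of the example script's internals:
```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Sample a token id from logits; lower temperature -> greedier choices."""
    scaled = logits / max(temperature, 1e-6)       # sharpen or flatten the distribution
    scaled -= scaled.max()                         # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()  # softmax
    return int(np.random.choice(len(probs), p=probs))
```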
**Complete Workflow:**
```bash
# 1. Train on sample data (10 epochs, ~1-2 minutes)
make text-gen-train
# 2. Generate with trained weights (better quality)
make text-gen-trained
# 3. Or use enhanced version with KV cache
make text-gen-enhanced
```
The training script (`examples/train_text_model.py`) uses simple gradient descent to fit the model to text data. The enhanced version (`examples/text_generation_enhanced.py`) adds KV caching for efficient generation and HuggingFace tokenizer support; the idea behind the cache is sketched below.
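The point of the KV cache is that each generation step only computes the newest token's key/value projections; everything earlier is reused. A schematic NumPy illustration of that pattern (not the example's actual code):
```python
import numpy as np

class KVCache:
    """Append-only cache of per-step key/value vectors (illustrative)."""
    def __init__(self, d_model: int):
        self.keys = np.empty((0, d_model), dtype=np.float32)
        self.values = np.empty((0, d_model), dtype=np.float32)

    def append(self, k: np.ndarray, v: np.ndarray):
        # Only the newest token's projections are computed each step;
        # everything already in the cache is reused as-is.
        self.keys = np.vstack([self.keys, k[None, :]])
        self.values = np.vstack([self.values, v[None, :]])

    def attend(self, q: np.ndarray) -> np.ndarray:
        # Single-head scaled dot-product attention over all cached steps
        scores = self.keys @ q / np.sqrt(q.shape[-1])
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return w @ self.values
```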
### Additional Examples
- **`examples/python_simple.py`** - Basic graph building and execution
- **`examples/python_matmul.py`** - Matrix multiplication operations
- **`examples/image_classification.py`** - Full classification pipeline (random weights)
See the [examples/](examples/) directory for more code samples.
---
## Documentation
The Python API implements the [W3C WebNN specification](https://www.w3.org/TR/webnn/).
**Quick Links:**
- **[API Reference](docs/api-reference.md)** - Complete Python API documentation
- **[Getting Started](docs/getting-started.md)** - Installation and first steps
- **[Architecture](docs/architecture.md)** - Design principles and structure
- **[Examples](examples/)** - Working code samples
---
## Rust CLI Usage
The Rust library includes a CLI tool for validating, visualizing, converting, and executing WebNN graphs.
### Validate a Graph
```bash
cargo run -- examples/sample_graph.json
```
### Visualize a Graph
```bash
# Generate DOT file
cargo run -- examples/sample_graph.json --export-dot graph.dot
# Convert to PNG (requires graphviz)
dot -Tpng graph.dot -o graph.png
# Or use the Makefile shortcut (macOS)
make viz
```
### Convert to ONNX
```bash
cargo run -- examples/sample_graph.json \
  --convert onnx \
  --convert-output model.onnx
```
### Convert to CoreML
```bash
cargo run -- examples/sample_graph.json \
  --convert coreml \
  --convert-output model.mlmodel
```
### Execute Models
**ONNX Runtime** (cross-platform):
```bash
cargo run --features onnx-runtime -- \
  examples/sample_graph.json \
  --convert onnx \
  --run-onnx
```
**CoreML Runtime** (macOS only):
```bash
cargo run --features coreml-runtime -- \
  examples/sample_graph.json \
  --convert coreml \
  --run-coreml \
  --device gpu  # or 'cpu', 'ane' for Neural Engine
```
### Makefile Targets
```bash
make help # Show all available targets
make build # Build Rust project
make test # Run Rust tests
make python-dev # Install Python package in dev mode
make python-test # Run Python tests
make docs-serve # Serve documentation locally
make validate-all-env # Run full test pipeline
```
---
## Architecture
**Design Principles:**
- **Backend-Agnostic Graphs** - Platform-independent representation, runtime backend selection
- **WebNN Spec Compliance** - Implements W3C Device Selection and MLTensor specs
- **Rust-First** - Pure Rust core with thin Python bindings
- **Lazy Conversion** - Backend conversion happens during execution, not compilation
See **[Architecture Guide](docs/architecture.md)** for details.
---
## Development
```bash
# Clone and build
git clone https://github.com/tarekziade/rustnn.git
cd rustnn
cargo build --release
maturin develop --features python
# Run tests
cargo test && python -m pytest tests/
```
See **[Development Guide](docs/development.md)** for detailed instructions.
---
## 🧪 Testing
### Python Tests
```bash
# Install test dependencies
pip install -e ".[dev]"
# Run all tests
pytest tests/ -v
# Run specific test file
pytest tests/test_python_api.py -v
# Run integration tests with cleanup
python tests/test_integration.py --cleanup
```
### Rust Tests
```bash
# All tests
cargo test
# Specific module
cargo test converters
# With features
cargo test --features onnx-runtime,coreml-runtime
```
---
## Project Status
**[SUCCESS] 85 WebNN operations fully implemented across all backends!**
- [OK] W3C WebNN API implementation in Python
- [OK] Runtime backend selection (CPU, GPU, Neural Engine)
- [OK] 85/95 WebNN operations (89% spec coverage)
- [OK] ONNX Runtime execution (cross-platform)
- [OK] CoreML execution (macOS GPU/Neural Engine)
- [OK] Async execution with MLTensor management
- [OK] Shape inference with NumPy-style broadcasting
- [OK] Complete MobileNetV2 + Transformer examples
See [docs/operator-status.md](docs/operator-status.md) for complete implementation details.
---
## 🤝 Contributing
Contributions are welcome! Please see:
- [AGENTS.md](AGENTS.md) - Project architecture and conventions for AI agents
- [TODO.txt](TODO.txt) - Feature requests and known limitations
### Quick Contribution Guide
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/my-feature`
3. **Install git hooks** (optional but recommended):
   ```bash
   ./scripts/install-git-hooks.sh
   ```
   This installs a pre-commit hook that automatically checks code formatting before each commit.
4. Make your changes
5. Run tests: `cargo test && pytest tests/`
6. Format code: `cargo fmt` (or let the pre-commit hook handle it)
7. Commit: `git commit -m "Add my feature"`
8. Push and create a pull request
**Note:** The pre-commit hook will prevent commits with unformatted code. If needed, you can bypass it with `git commit --no-verify`, but this is not recommended.
---
## License
Licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) for details.
---
## Links
- **GitHub**: [https://github.com/tarekziade/rustnn](https://github.com/tarekziade/rustnn)
- **PyPI**: [https://pypi.org/project/pywebnn/](https://pypi.org/project/pywebnn/)
- **Documentation**: [https://tarekziade.github.io/rustnn/](https://tarekziade.github.io/rustnn/)
- **W3C WebNN Spec**: [https://www.w3.org/TR/webnn/](https://www.w3.org/TR/webnn/)
- **Issues**: [https://github.com/tarekziade/rustnn/issues](https://github.com/tarekziade/rustnn/issues)
---
## Acknowledgments
- W3C WebNN Community Group for the specification
- Chromium WebNN implementation for reference
- PyO3 project for excellent Python-Rust bindings
- Maturin for seamless Python package building
---
**Made with ❤️ by [Tarek Ziade](https://github.com/tarekziade)**