rustnn / PyWebNN
A Rust implementation of WebNN graph handling with Python bindings that implement the W3C WebNN API specification.
[WARNING] EXPERIMENTAL - DO NOT USE IN PRODUCTION
This project is a proof-of-concept and experimental implementation. It is NOT ready for production use.
This is an early-stage experiment to explore WebNN graph handling and format conversion. Many features are incomplete, untested, or may change significantly. Use at your own risk for research and experimentation only.
Features:
- Rust Library: Validates WebNN graphs and converts them to ONNX/CoreML formats
- Python API: Complete W3C WebNN API implementation via PyO3 bindings
- [TARGET] Runtime Backend Selection: Choose CPU, GPU, or NPU execution at context creation
- [STATS] Format Conversion: Export graphs to ONNX (cross-platform) and CoreML (macOS)
- [DEPLOY] Model Execution: Run converted models on CPU, GPU, and Neural Engine (macOS)
- [FAST] Async Support: Non-blocking execution with Python asyncio integration
- [SEARCH] Graph Visualization: Generate Graphviz diagrams of your neural networks
- [OK] Validation: Comprehensive graph validation matching Chromium's WebNN implementation
- [MATH] Shape Inference: Automatic shape computation with NumPy-style broadcasting
- [STYLE] Real Examples: Complete 106-layer MobileNetV2 (99.60% top-1 confidence on the sample image) + Transformer text generation with attention
[PACKAGE] Installation
Python Package (PyWebNN)
Quick Start (Validation & Conversion Only):

```bash
pip install pywebnn
```

This installs the base package for graph validation and format conversion (no execution).
For Full Execution Support:
To execute neural networks, you need ONNX Runtime:
```bash
# Install PyWebNN + ONNX Runtime for CPU execution
pip install pywebnn onnxruntime

# Or for GPU execution (requires CUDA)
pip install pywebnn onnxruntime-gpu
```
Note: The PyPI package currently includes validation and conversion features. ONNX Runtime execution requires the onnxruntime package to be installed separately. We're working on better integration in future releases.
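To check which execution providers your ONNX Runtime install exposes, you can query the standard onnxruntime API directly (this is plain onnxruntime usage, not part of PyWebNN):

```python
import onnxruntime as ort

# Lists providers such as CPUExecutionProvider or CUDAExecutionProvider
print(ort.get_available_providers())
```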
Build from Source (with Execution Built-in):
For a fully integrated package with execution support:
```bash
# Clone the repository
git clone https://github.com/tarekziade/rustnn
cd rustnn

# Install with ONNX Runtime support (recommended)
pip install maturin
maturin develop --release --features onnx-runtime

# Or manually with maturin
maturin build --release --features onnx-runtime
pip install target/wheels/pywebnn-*.whl  # wheel name assumed

# macOS only: Add CoreML support
maturin develop --release --features onnx-runtime,coreml-runtime
```
Requirements:
- Python 3.11+
- NumPy 1.20+
- ONNX Runtime 1.23+ (for execution)
Rust Library
Add to your Cargo.toml:
```toml
[dependencies]
rustnn = "0.1"
```
Or use directly from this repository.
[DEPLOY] Quick Start
Python API
A minimal end-to-end example (builder method names follow the W3C WebNN API; see examples/python_simple.py for a runnable version):

```python
import numpy as np
from pywebnn import ml  # import path assumed

# Create ML context - use hints for device selection
context = ml.create_context(accelerated=False)  # CPU-only execution
# Or: context = ml.create_context(accelerated=True)  # Request GPU/NPU if available

# Create graph builder
builder = context.create_graph_builder()  # factory name assumed

# Define a simple graph: z = relu(x + y)
x = builder.input("x", [2, 2])
y = builder.input("y", [2, 2])
total = builder.add(x, y)
z = builder.relu(total)

# Compile the graph (creates backend-agnostic representation)
graph = builder.build({"z": z})

# Prepare input data
x_data = np.array([[1.0, -2.0], [3.0, -4.0]], dtype=np.float32)
y_data = np.array([[1.0, 1.0], [1.0, 1.0]], dtype=np.float32)

# Execute: converts to backend-specific format and runs
results = context.compute(graph, {"x": x_data, "y": y_data})

# Actual computed values from ONNX Runtime
print(results["z"])

# Optional: Export the ONNX model to file (for deployment, inspection, etc.)
```
Backend Selection
Following the W3C WebNN Device Selection spec, device selection uses hints rather than explicit device types:
```python
# Request GPU/NPU acceleration (default)
context = ml.create_context(accelerated=True)

# Check if acceleration is available
print(context.accelerated)

# Request low-power execution (prefers NPU over GPU)
context = ml.create_context(accelerated=True, power_preference="low-power")

# Request high-performance execution (prefers GPU)
context = ml.create_context(accelerated=True, power_preference="high-performance")

# CPU-only execution (no acceleration)
context = ml.create_context(accelerated=False)
```
Device Selection Logic:
- accelerated=True + power_preference="low-power" → NPU > GPU > CPU
- accelerated=True + power_preference="high-performance" → GPU > NPU > CPU
- accelerated=True + power_preference="default" → GPU > NPU > CPU
- accelerated=False → CPU only
Platform-Specific Backends:
- NPU: CoreML Neural Engine (Apple Silicon macOS only)
- GPU: ONNX Runtime GPU (cross-platform) or CoreML GPU (macOS)
- CPU: ONNX Runtime CPU (cross-platform)
Important: The accelerated property indicates platform capability, not a guarantee. Query context.accelerated after creation to check if GPU/NPU resources are available. The platform controls actual device allocation based on runtime conditions.
The graph compilation (builder.build()) creates a backend-agnostic representation. Backend-specific conversion happens automatically during compute() based on the context's selected backend.
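For example, you can request acceleration and then check what the platform actually granted (using the create_context and accelerated APIs shown above):

```python
context = ml.create_context(accelerated=True, power_preference="low-power")
if context.accelerated:
    print("GPU/NPU backend selected")  # NPU preferred under low-power
else:
    print("Platform fell back to CPU-only execution")
```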
Async Execution
WebNN supports asynchronous execution following the W3C specification. Use AsyncMLContext for non-blocking operations:
A sketch of the non-blocking flow (the AsyncMLContext constructor and compute signatures are assumed here; see the examples/ directory for the exact API):

```python
import numpy as np
from pywebnn import AsyncMLContext  # import path assumed

# Create context
context = AsyncMLContext(accelerated=False)  # constructor signature assumed
builder = context.create_graph_builder()

# Build graph
x = builder.input("x", [2, 2])
y = builder.input("y", [2, 2])
z = builder.add(x, y)
graph = builder.build({"z": z})
x_data = np.ones((2, 2), dtype=np.float32)
y_data = np.ones((2, 2), dtype=np.float32)

# Async dispatch (non-blocking execution)
results = await context.compute(graph, {"x": x_data, "y": y_data})
print(results["z"])
```
Rust Library
A sketch of the validation-and-conversion flow (module paths and type names are assumed; see the crate documentation for exact signatures):

```rust
use rustnn::GraphInfo;                 // path assumed
use rustnn::validator::GraphValidator; // path assumed

// Load graph from JSON
let graph_info: GraphInfo = serde_json::from_str(&json)?;

// Validate the graph
let validator = GraphValidator::new();
let artifacts = validator.validate(&graph_info)?;

// Convert to ONNX
let mut registry = ConverterRegistry::new();        // type name assumed
registry.register(Box::new(OnnxConverter::new()));  // converter type assumed
let converted = registry.convert("onnx", &graph_info, &artifacts)?;

// Save to file
std::fs::write("model.onnx", &converted)?;

// Execute with ONNX Runtime (requires "onnx-runtime" feature)
// Execute with CoreML (requires "coreml-runtime" feature, macOS only)
```
[STYLE] Examples
Real Image Classification with Complete Pretrained MobileNetV2
The examples/mobilenetv2_complete.py script demonstrates real image classification using the complete 106-layer pretrained MobileNetV2 from the WebNN test-data repository. On first run, download the 106 pretrained weight files, then launch the script with the CPU, GPU, or CoreML backend (CoreML is macOS only and typically the fastest).
Sample Output (classifying a red panda):
======================================================================
Complete MobileNetV2 Image Classification with WebNN
======================================================================
Image: examples/images/test.jpg
Backend: ONNX CPU
Loading all pretrained MobileNetV2 weights...
[OK] Loaded 106 weight tensors
Weight load time: 22.79ms
Building complete MobileNetV2 graph...
Layer 0: Initial conv 3->32
Block 0: 32->16 (stride=1, expansion=1)
Block 1: 16->24 (stride=2, expansion=6)
...
Block 16: 160->320 (stride=1, expansion=6)
Layer final: Conv 320->1280
[OK] Complete MobileNetV2 graph built!
Graph build time: 913.78ms
Top 5 Predictions (Real ImageNet Labels):
----------------------------------------------------------------------
1. lesser panda 99.60%
2. polecat 0.20%
3. weasel 0.09%
4. black-footed ferret 0.02%
5. kit fox 0.01%
Performance Summary:
- Weight Load: 22.79ms
- Preprocessing: 15.52ms
- Graph Build: 913.78ms
- Inference: 74.41ms (CPU) / 77.14ms (GPU) / 51.93ms (CoreML)
======================================================================
How It Works:
- Complete 106-layer architecture - All pretrained weights from WebNN test-data
- 17 inverted residual blocks - Full MobileNetV2 architecture
- Built with WebNN operations - Uses conv2d, add, clamp, global_average_pool, gemm, softmax
- Real ImageNet-1000 labels - Accurate real-world predictions
- Three backend support - ONNX CPU, ONNX GPU, CoreML (Neural Engine on Apple Silicon)
- Production-quality accuracy - 99.60% confidence on correct class
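The preprocessing step timed above is standard ImageNet input preparation; a generic sketch (the mean/std values are the usual ImageNet constants, assumed here rather than taken from the example script):

```python
import numpy as np
from PIL import Image

def preprocess(path):
    # Resize, scale to [0, 1], normalize, and reorder to NCHW for conv2d
    img = Image.open(path).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    x = (x - mean) / std
    return x.transpose(2, 0, 1)[np.newaxis]  # shape (1, 3, 224, 224)
```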
Architecture Details:
- Initial conv: 3→32 channels (stride 2)
- 17 inverted residual blocks with varying expansions (1x or 6x)
- Depthwise separable convolutions using the groups parameter
- Residual connections for stride=1 blocks
- ReLU6 activations (clamp 0-6)
- Final conv: 320→1280 channels
- Global average pooling + classifier (1280→1000)
This implementation exactly matches the JavaScript WebNN demos, building the complete graph layer-by-layer using WebNN API operations.
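For illustration, one inverted residual block maps onto those operations roughly as follows (a hedged sketch: the weights are placeholder operands and the option names follow the W3C WebNN spec, not necessarily PyWebNN's exact kwargs):

```python
def inverted_residual(builder, x, w_expand, w_dw, w_project, dw_channels, stride):
    # 1x1 expansion conv followed by ReLU6 (clamp to [0, 6])
    h = builder.conv2d(x, w_expand)
    h = builder.clamp(h, min_value=0.0, max_value=6.0)
    # 3x3 depthwise conv: groups equal to the channel count
    h = builder.conv2d(h, w_dw, strides=[stride, stride],
                       padding=[1, 1, 1, 1], groups=dw_channels)
    h = builder.clamp(h, min_value=0.0, max_value=6.0)
    # 1x1 linear projection (no activation)
    h = builder.conv2d(h, w_project)
    # Residual connection only for stride=1 blocks with matching shapes
    if stride == 1:
        h = builder.add(x, h)
    return h
```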
Text Generation with Transformer Attention
The examples/text_generation_gpt.py script demonstrates next-token generation using a simplified transformer with attention, similar to the JavaScript WebNN text generation demo. It can run basic generation on all three backends or on a specific one; examples/train_text_model.py trains the model on sample data so generation can use the trained weights, and examples/text_generation_enhanced.py runs an enhanced version with a KV cache.
Sample Output:
======================================================================
Next-Token Generation with Attention (WebNN)
======================================================================
Backend: ONNX CPU
Model: vocab=256 (byte-level), d_model=64, max_seq=32
[OK] Context created (accelerated=False)
[OK] Model initialized
Prompt: 'Hello world'
Prompt tokens (11): [72, 101, 108, 108, 111, 32, 119, 111, 114, 108]...
Generating 30 tokens autoregressively...
======================================================================
Token 1/30: 87 (prob: 0.0042)
Token 10/30: 123 (prob: 0.0043)
Token 20/30: 136 (prob: 0.0037)
Token 30/30: 99 (prob: 0.0040)
======================================================================
WebNN Operations Demonstrated:
[OK] matmul - Matrix multiplication for projections
[OK] layer_normalization - Normalizing activations
[OK] relu - Activation function
[OK] softmax - Output probability distribution
[OK] reduce_mean - Simplified attention pooling
[OK] gemm - General matrix multiply with transpose
======================================================================
How It Works:
- Transformer architecture - Single-head attention, layer normalization, feed-forward networks
- Autoregressive generation - Generates one token at a time based on context
- Positional embeddings - Sinusoidal position encodings
- Temperature sampling - Configurable randomness in token selection (see the sketch below)
- Training support - Train on custom text with train_text_model.py
- KV caching - Enhanced version with efficient key-value caching
- Three backend support - ONNX CPU, ONNX GPU, CoreML (Neural Engine on Apple Silicon)
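Temperature sampling, referenced above, amounts to rescaling logits before softmax; a generic NumPy sketch (not the script's exact code):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    scaled = logits / temperature          # <1.0 sharpens, >1.0 flattens
    scaled -= scaled.max()                 # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)
```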
Complete Workflow:
1. Train on sample data (10 epochs, ~1-2 minutes) with examples/train_text_model.py
2. Generate with trained weights (better quality) using examples/text_generation_gpt.py
3. Or use the enhanced version with KV cache: examples/text_generation_enhanced.py
The training script (examples/train_text_model.py) uses simple gradient descent to train on text data, and the enhanced version (examples/text_generation_enhanced.py) includes KV caching for efficient generation and HuggingFace tokenizer support.
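The caching idea itself is simple (a conceptual NumPy sketch, independent of the PyWebNN API): keep the keys and values of already-processed tokens and append only the new token's projections each step, instead of recomputing them for the whole sequence:

```python
import numpy as np

d_model = 64
k_cache = np.zeros((0, d_model), dtype=np.float32)
v_cache = np.zeros((0, d_model), dtype=np.float32)

def attend_with_cache(q_new, k_new, v_new):
    """Single-head attention for one new token against all cached tokens."""
    global k_cache, v_cache
    k_cache = np.vstack([k_cache, k_new])  # append new key -> (seq, d_model)
    v_cache = np.vstack([v_cache, v_new])  # append new value
    scores = (k_cache @ q_new) / np.sqrt(d_model)  # (seq,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v_cache               # context vector: (d_model,)
```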
Additional Examples
- examples/python_simple.py - Basic graph building and execution
- examples/python_matmul.py - Matrix multiplication operations
- examples/image_classification.py - Full classification pipeline (random weights)
See the examples/ directory for more code samples.
Documentation
The Python API implements the W3C WebNN specification.
Quick Links:
- API Reference - Complete Python API documentation
- Getting Started - Installation and first steps
- Architecture - Design principles and structure
- Examples - Working code samples
Rust CLI Usage
The Rust library includes a CLI tool for validating, visualizing, converting, and executing WebNN graphs.
Validate a Graph
Visualize a Graph
Generate a DOT file with the CLI, then convert it to PNG with Graphviz (for example, dot -Tpng graph.dot -o graph.png), or use the Makefile shortcut on macOS.
Convert to ONNX
Convert to CoreML
Execute Models
ONNX Runtime (cross-platform):
CoreML Runtime (macOS only):
Makefile Targets
Architecture
Design Principles:
- Backend-Agnostic Graphs - Platform-independent representation, runtime backend selection
- WebNN Spec Compliance - Implements W3C Device Selection and MLTensor specs
- Rust-First - Pure Rust core with thin Python bindings
- Lazy Conversion - Backend conversion happens during execution, not compilation
See Architecture Guide for details.
Development
```bash
# Clone and build
git clone https://github.com/tarekziade/rustnn
cd rustnn
cargo build

# Run tests
cargo test && pytest tests/
```
See Development Guide for detailed instructions.
Testing
Python Tests
```bash
# Install test dependencies
pip install pytest

# Run all tests
pytest tests/

# Run specific test file
pytest tests/test_graph.py  # substitute any test module under tests/

# Run integration tests with cleanup
# (see the Makefile targets for the exact command)
```
Rust Tests
```bash
# All tests
cargo test

# Specific module
cargo test validator  # substitute the module name

# With features
cargo test --features onnx-runtime
```
Project Status
[SUCCESS] 85 WebNN operations fully implemented across all backends!
- [OK] W3C WebNN API implementation in Python
- [OK] Runtime backend selection (CPU, GPU, Neural Engine)
- [OK] 85/95 WebNN operations (89% spec coverage)
- [OK] ONNX Runtime execution (cross-platform)
- [OK] CoreML execution (macOS GPU/Neural Engine)
- [OK] Async execution with MLTensor management
- [OK] Shape inference with NumPy-style broadcasting
- [OK] Complete MobileNetV2 + Transformer examples
See docs/operator-status.md for complete implementation details.
Contributing
Contributions are welcome! Please see:
- AGENTS.md - Project architecture and conventions for AI agents
- TODO.txt - Feature requests and known limitations
Quick Contribution Guide
- Fork the repository
- Create a feature branch: git checkout -b feature/my-feature
- Install git hooks (optional but recommended); this installs a pre-commit hook that automatically checks code formatting before each commit
- Make your changes
- Run tests: cargo test && pytest tests/
- Format code: cargo fmt (or let the pre-commit hook handle it)
- Commit: git commit -m "Add my feature"
- Push and create a pull request
Note: The pre-commit hook will prevent commits with unformatted code. If needed, you can bypass it with git commit --no-verify, but this is not recommended.
License
Licensed under the Apache License, Version 2.0. See LICENSE for details.
Links
- GitHub: https://github.com/tarekziade/rustnn
- PyPI: https://pypi.org/project/pywebnn/
- Documentation: https://tarekziade.github.io/rustnn/
- W3C WebNN Spec: https://www.w3.org/TR/webnn/
- Issues: https://github.com/tarekziade/rustnn/issues
Acknowledgments
- W3C WebNN Community Group for the specification
- Chromium WebNN implementation for reference
- PyO3 project for excellent Python-Rust bindings
- Maturin for seamless Python package building
Made with ❤️ by Tarek Ziade