OxiONNX

Pure Rust ONNX Inference Engine -- Zero C/C++ Dependencies

OxiONNX is a high-performance ONNX inference engine written in pure Rust. It implements 147 ONNX operators, offers GPU acceleration via wgpu and SIMD optimization, and runs anywhere Rust compiles, including WebAssembly.

30,000+ lines of Rust | 590+ tests | 0 clippy warnings

Features

  • Pure Rust -- Zero C/C++/Fortran dependencies. Safe, portable, auditable.
  • 147 ONNX operators -- Math, NN, Conv, Shape, Indexing, Comparison, RNN, Attention, ML
  • GPU acceleration -- wgpu compute shaders for MatMul, Softmax, ReLU, etc.
  • SIMD optimization -- NEON (aarch64) and AVX2 (x86_64) for element-wise ops
  • Multi-dtype -- f32, f16, bf16, i8, i32, i64 with automatic type promotion
  • INT8 quantization -- Quantized MatMul with per-channel scale/zero-point
  • Mixed precision -- f16 activations with f32 accumulation
  • Graph optimization -- Constant folding, operator fusion, CSE, dead code elimination
  • Memory efficiency -- Arena allocator, buffer pooling, strided tensor views
  • Streaming inference -- Token-by-token generation for autoregressive models
  • Async execution -- Non-blocking inference via run_async()
  • Control flow -- If/Loop/Scan operators with nested subgraph execution
  • Model encryption -- AES-GCM encrypted model files
  • WebAssembly -- Run in the browser via wasm-bindgen
  • no_std -- Core types work without std (alloc only)
  • Session caching -- Save/load pre-optimized graphs to skip re-optimization
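The INT8 path above maps float tensors to 8-bit integers, each channel carrying its own scale and zero-point. The arithmetic behind that mapping can be sketched in plain Rust as follows; this is an illustration of the quantization scheme, not OxiONNX's internal kernels, and the function names are invented for the example:

```rust
// Symmetric INT8 quantization for one channel: q = round(v / scale) + zero_point,
// clamped to the i8 range. Dequantization reverses it: v = (q - zero_point) * scale.
fn quantize(values: &[f32], scale: f32, zero_point: i8) -> Vec<i8> {
    values
        .iter()
        .map(|&v| {
            let q = (v / scale).round() + zero_point as f32;
            q.clamp(i8::MIN as f32, i8::MAX as f32) as i8
        })
        .collect()
}

fn dequantize(values: &[i8], scale: f32, zero_point: i8) -> Vec<f32> {
    values
        .iter()
        .map(|&q| (q as i32 - zero_point as i32) as f32 * scale)
        .collect()
}

fn main() {
    // Scale chosen so the largest |weight| lands near 127.
    let weights = [0.5f32, -1.0, 0.25];
    let scale = 1.0 / 127.0;
    let q = quantize(&weights, scale, 0);
    let back = dequantize(&q, scale, 0);
    for (w, b) in weights.iter().zip(&back) {
        // Round-trip error is bounded by one quantization step.
        assert!((w - b).abs() < scale);
    }
    println!("{:?} -> {:?}", weights, q);
}
```

Per-channel scales (rather than one scale for the whole tensor) keep channels with small weight magnitudes from losing precision to channels with large ones.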

Quick Start

use oxionnx::{Session, Tensor};
use std::collections::HashMap;

// Load model
let session = Session::from_file("model.onnx".as_ref())?;

// Prepare input
let mut inputs = HashMap::new();
inputs.insert("input", Tensor::new(vec![1.0, 2.0, 3.0], vec![1, 3]));

// Run inference
let outputs = session.run(&inputs)?;
println!("{:?}", outputs);

Session Builder

use oxionnx::{Session, OptLevel};

let session = Session::builder()
    .with_optimization_level(OptLevel::All)
    .with_memory_pool(true)
    .with_parallel_execution(true)
    .with_profiling()
    .load("model.onnx".as_ref())?;

Supported Operators

OxiONNX implements 147 operators spanning the major categories of the ONNX specification:

Category        Count   Examples
Math              38    MatMul, Gemm, Add, Mul, Pow, Sqrt, Reduce*, Trig
Neural Network    35    Relu, Sigmoid, Softmax, LayerNorm, BatchNorm, GELU, SiLU
Convolution        8    Conv, ConvTranspose, MaxPool, AveragePool, GlobalAvgPool
Shape             14    Reshape, Transpose, Concat, Slice, Split, Flatten
Indexing          12    Gather, Scatter, Where, OneHot, Compress, Unique
Comparison        13    Equal, Greater, Less, And, Or, Not, IsInf, IsNaN
RNN/Attention      8    LSTM, GRU, Attention, MultiHeadAttention, RotaryEmbedding
ONNX-ML           12    LinearClassifier, TreeEnsemble, SVM, Normalizer, TfIdf
Control Flow       3    If, Loop, Scan
Quantization       4    QuantizeLinear, DequantizeLinear, QLinearMatMul, QLinearConv
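Most of these operators reduce to short numeric kernels. As an illustration of the kind of computation a Softmax operator performs, here is the standard numerically stable formulation in plain Rust (a sketch of the math, not the engine's actual kernel):

```rust
// Numerically stable softmax: subtract the max before exponentiating so
// large inputs don't overflow, then normalize so the outputs sum to 1.
fn softmax(x: &[f32]) -> Vec<f32> {
    let max = x.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = x.iter().map(|&v| (v - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

fn main() {
    let probs = softmax(&[1.0, 2.0, 3.0]);
    let total: f32 = probs.iter().sum();
    assert!((total - 1.0).abs() < 1e-6); // outputs form a probability distribution
    assert!(probs[2] > probs[1] && probs[1] > probs[0]); // order is preserved
    println!("{:?}", probs);
}
```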

Feature Flags

Feature      Description
gpu          GPU acceleration via wgpu
simd         SIMD-accelerated element-wise ops
encryption   AES-GCM model encryption
mmap         Memory-mapped weight loading
wasm         WebAssembly browser bindings
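Assuming standard Cargo feature syntax, enabling optional backends in a downstream project would look roughly like this (feature names from the table above; verify the exact version on crates.io):

```toml
# Hypothetical Cargo.toml dependency entry; check crates.io
# for the current version and default feature set.
[dependencies]
oxionnx = { version = "0.1", features = ["gpu", "simd"] }
```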

Architecture

oxionnx (root)           -- Session, optimizer, execution engine
  oxionnx-core           -- Tensor, DType, Graph, Operator trait, OnnxError
  oxionnx-ops            -- 147 operator implementations
  oxionnx-proto          -- Pure Rust ONNX protobuf parser
  oxionnx-gpu            -- wgpu compute backend (optional)

License

Apache-2.0

Author

COOLJAPAN OU (Team Kitasan)