ort-web 0.1.0+1.23

ONNX Runtime on the web 🌐 - An alternative backend for ort
Documentation

ort-web is an [ort] backend that enables the use of ONNX Runtime on the web.

Usage

CORS

ort-web dynamically fetches the required scripts & WASM binary at runtime. By default, it will fetch the build from the cdn.pyke.io domain, so make sure it is accessible via CORS if you have that configured.

You can also use a self-hosted build with [Dist]; see the api function for an example. The scripts & binary can be acquired from the dist folder of the onnxruntime-web npm package.

Telemetry

ort-web collects telemetry data by default and sends it to signal.pyke.io. This telemetry data helps us understand how ort-web is being used so we can improve it. Zero PII is collected; you can see what is sent in _telemetry.js. If you wish to contribute telemetry data, please allowlist signal.pyke.io; otherwise, it can be disabled via EnvironmentBuilder::with_telemetry.

Initialization

ort must have the alternative-backend feature enabled, as this enables the usage of [ort::set_api].

You can choose which build of ONNX Runtime to fetch by choosing any combination of these 3 feature flags: [FEATURE_WEBGL], [FEATURE_WEBGPU], [FEATURE_WEBNN]. These enable the usage of the [WebGL][ort::ep::WebGL], [WebGPU][ort::ep::WebGPU], and [WebNN][ort::ep::WebNN] EPs respectively. You can combine features with the bitwise OR operator (|) to enable multiple at once:

use ort_web::{FEATURE_WEBGL, FEATURE_WEBGPU};
ort::set_api(ort_web::api(FEATURE_WEBGL | FEATURE_WEBGPU).await?);

You'll still need to configure the EPs on a per-session basis later like you would normally, but this allows you to e.g. only fetch the CPU build if the user doesn't have hardware acceleration.
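The flag-selection step described above can be sketched without ort-web at all. In the sketch below, the constants are hypothetical stand-ins (the type and bit values of the real FEATURE_* constants may differ), Capabilities and select_features are illustrative names that do not exist in ort-web, and it is an assumption that passing no flags selects the CPU-only build:

```rust
// Hypothetical stand-ins for ort_web's feature flags. The real constants'
// type and bit values may differ; only the OR-combination pattern matters.
const FEATURE_WEBGL: u8 = 1 << 0;
const FEATURE_WEBGPU: u8 = 1 << 1;
const FEATURE_WEBNN: u8 = 1 << 2;

/// Hypothetical result of probing the browser for acceleration support.
#[derive(Clone, Copy, Default)]
struct Capabilities {
    webgl: bool,
    webgpu: bool,
    webnn: bool,
}

/// Builds the flag set to request, one bit per available backend.
/// An empty set (0) is assumed here to mean the CPU-only build.
fn select_features(caps: Capabilities) -> u8 {
    let mut features = 0;
    if caps.webgl {
        features |= FEATURE_WEBGL;
    }
    if caps.webgpu {
        features |= FEATURE_WEBGPU;
    }
    if caps.webnn {
        features |= FEATURE_WEBNN;
    }
    features
}
```

In a real application, the value returned by a function like this would be what gets passed to ort_web::api, so a browser without any hardware acceleration would end up requesting the smallest build.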

Session creation

Sessions can only be created from a URL, or indirectly from memory - that means no SessionBuilder::commit_from_memory_directly for .ort format models, and no SessionBuilder::commit_from_file.

The remaining commit functions, SessionBuilder::commit_from_url and SessionBuilder::commit_from_memory, are marked async and must be awaited. commit_from_url is always available when targeting WASM and does not require the fetch-models feature flag to be enabled for ort.

Inference

Only Session::run_async is supported; Session::run will always return an error.

Inference outputs are not synchronized by default (see the next section). If you need access to the data of all session outputs from Rust, the [sync_outputs] function can be used to sync them all at once.

Synchronization

ONNX Runtime is loaded as a separate WASM module, and ort-web acts as an intermediary between the two. There is no mechanism in WASM for two modules to share memory, so tensors often need to be 'synchronized' when one side needs to see data from the other.

Tensor::new should never be used to create inputs: such a tensor starts out allocated on the ONNX Runtime side, so it must be synced (with empty data) to Rust before it can be written to. Prefer Tensor::from_array or TensorRef::from_array_view instead, as tensors created this way never require synchronization.

As stated in the Inference section, session outputs are not synchronized by default. To use their data in Rust, either sync all outputs at once with [sync_outputs], or sync one tensor at a time (if you only use a few outputs):

use ort_web::{TensorExt, SyncDirection};

let mut outputs = session.run_async(ort::inputs![...]).await?;

let mut bounding_boxes = outputs.remove("bounding_boxes").unwrap();
bounding_boxes.sync(SyncDirection::Rust).await?;

// now we can use the data
let data = bounding_boxes.try_extract_tensor::<f32>()?;

Once a session output is synced, that tensor becomes backed by a Rust buffer. Updates to the tensor's data from the Rust side will not reflect in ONNX Runtime until the tensor is synced with SyncDirection::Runtime. Likewise, updates to the tensor's data from ONNX Runtime won't reflect in Rust until Rust syncs that tensor with SyncDirection::Rust. You don't have to worry about this behavior if you only ever read from session outputs, though.
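The direction semantics can be illustrated with a toy model that keeps one copy of the data per side. This is not ort-web's implementation: MirroredTensor is a hypothetical type, the SyncDirection enum here is a local stand-in that only borrows the variant names, and sync is synchronous for simplicity, whereas the real TensorExt::sync must be awaited.

```rust
/// Local stand-in naming the side that a sync copies data *to*.
#[derive(Clone, Copy)]
enum SyncDirection {
    Rust,
    Runtime,
}

/// Toy tensor with one independent copy of the data per WASM module.
struct MirroredTensor {
    rust_side: Vec<f32>,
    runtime_side: Vec<f32>,
}

impl MirroredTensor {
    /// Models a fresh session output: the data exists only on the runtime
    /// side, and the Rust side sees nothing until the first sync.
    fn from_runtime(data: Vec<f32>) -> Self {
        Self { rust_side: Vec::new(), runtime_side: data }
    }

    /// Overwrites the destination side with a copy of the other side.
    fn sync(&mut self, direction: SyncDirection) {
        match direction {
            SyncDirection::Rust => self.rust_side = self.runtime_side.clone(),
            SyncDirection::Runtime => self.runtime_side = self.rust_side.clone(),
        }
    }
}
```

In this model, a write to rust_side leaves runtime_side untouched until sync(SyncDirection::Runtime) is called, which mirrors the behavior described above for real tensors.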

Limitations