Crate rten

rten is an inference runtime for machine learning models.

It enables you to take machine learning models trained using PyTorch or other frameworks and run them in Rust.

§Preparing models

A model trained with a framework such as PyTorch must first be exported to ONNX format before RTen can use it. There are several ways to obtain models in this format:

  • The model authors may already provide the model in ONNX format. On Hugging Face you can find models available in ONNX format by searching for the ONNX tag.

  • Hugging Face provides a tool called Optimum which takes as input a Hugging Face model repository URL and exports an ONNX model. This is a convenient way to export many popular pre-trained models to ONNX format.

  • PyTorch has built-in ONNX export functions. This can be used to convert custom models or any other model which is not available in ONNX format via another means.

RTen can load and run ONNX models directly, but it also supports a custom .rten file format. Models can be converted from ONNX to this format via rten-convert. The .rten format can be faster to load and supports large (> 2GB) models in a single file, whereas ONNX models of this size must use external files for weights. It is recommended to start with the ONNX format and consider .rten later if you need these benefits.
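As an illustration, converting an ONNX model to the .rten format might look like this. The `rten-convert` tool is distributed via PyPI; the exact flags, defaults and output location may vary between versions, so check the rten-convert documentation:

```sh
pip install rten-convert
rten-convert model.onnx  # writes a .rten model next to the input
```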

See the model formats documentation for more details on the format differences.

§Loading and running models

The basic workflow for loading and running a model is:

  1. Load the model using Model::load_file or Model::load_mmap.

  2. Load the input data (images, audio, text etc.)

  3. Pre-process the input data to convert it into tensors in the format the model expects. For this you can use RTen’s own tensor types (see rten-tensor) or ndarray.

    If using ndarray, you will need to convert to RTen tensor types before running the model and convert the output back to ndarray types afterwards. See rten-ndarray-demo for an example.

  4. Execute the model using Model::run.

  5. Post-process the results to convert them into meaningful outputs.

See the example projects in rten-examples for how all these pieces fit together.
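The steps above can be sketched as follows. This is a minimal, hedged sketch, not a complete program: the model path, the input shape (a typical image-classifier input), and the assumption of a single input and output are for illustration only; consult the Model docs for the exact run signature.

```rust
use rten::Model;
use rten_tensor::NdTensor;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Load the model.
    let model = Model::load_file("model.rten")?;

    // 2-3. Prepare an input tensor. The shape [1, 3, 224, 224] is an
    // assumed example; use whatever shape your model expects.
    let input: NdTensor<f32, 4> = NdTensor::zeros([1, 3, 224, 224]);

    // 4. Run the model, feeding the tensor to the first model input
    // and requesting the first output.
    let input_id = model.input_ids()[0];
    let output_id = model.output_ids()[0];
    let mut outputs = model.run(vec![(input_id, input.into())], &[output_id], None)?;

    // 5. Post-process: convert the output Value back into a typed tensor.
    let scores: NdTensor<f32, 2> = outputs.remove(0).try_into()?;
    println!("output shape: {:?}", scores.shape());
    Ok(())
}
```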

§Threading

RTen automatically executes models using multiple threads. For this purpose it creates its own Rayon ThreadPool which is sized to match the number of physical cores. You can access this pool using threading::thread_pool if you want to run your own tasks in this pool.
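For example, to schedule your own work on RTen's pool (this sketch assumes ThreadPool exposes a run method that delegates to Rayon's install; check the ThreadPool docs before relying on this):

```rust
use rten::thread_pool;

fn main() {
    // Run a task inside RTen's shared pool so it cooperates with model
    // execution instead of oversubscribing CPU cores.
    let sum: u64 = thread_pool().run(|| (0..1_000u64).sum());
    println!("sum = {}", sum);
}
```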

§Supported models and hardware

§Hardware

RTen currently executes models on the CPU. It builds for most architectures that the Rust compiler supports. SIMD acceleration is available for x86-64, Arm64 (AArch64) and WebAssembly.

§Data types

RTen supports tensors with the following data types:

  • f32, i32, i8, u8
  • i64 and bool tensors are supported by converting them to i32 tensors, on the assumption that i64 values fit in the i32 range. When preparing inputs for models whose ONNX inputs use these data types, you will need to supply i32 tensors.
  • f64 tensors are supported by converting them to f32.
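When a model's ONNX inputs are i64 (common for token IDs exported from PyTorch), the values can be narrowed to i32 before building the tensor. A std-only sketch of a checked conversion (the function name is illustrative, not part of RTen's API):

```rust
// Narrow a slice of i64 values to i32, failing if any value is out of range.
fn narrow_to_i32(values: &[i64]) -> Result<Vec<i32>, String> {
    values
        .iter()
        .map(|&v| i32::try_from(v).map_err(|_| format!("value {} does not fit in i32", v)))
        .collect()
}

fn main() {
    let token_ids: Vec<i64> = vec![101, 2023, 102];
    let narrowed = narrow_to_i32(&token_ids).expect("token IDs exceed i32 range");
    println!("{:?}", narrowed);
}
```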

Some operators support a more limited set of data types than described in the ONNX specification. Please file an issue if you need an operator to support additional data types.

Support for additional types (eg. f16, bf16) is planned for the future.

§Supported operators

RTen supports most ONNX operators. See the tracking issue for details.

Some operators require additional dependencies and are only available if certain crate features are enabled:

  • The fft feature enables operators related to the Fast Fourier Transform (eg. STFT) using rustfft.
  • The random feature enables operators that generate random numbers (eg. RandomUniform) using fastrand.

As a convenience, the all-ops feature enables all of the above features.

§Quantized models

RTen supports quantized models where activations are in uint8 format and weights are in int8 format. This combination is the default when an ONNX model is quantized using dynamic quantization. The tools/ort-quantize.py script in the RTen repository can be used to quantize an existing model with float tensors into this format.

See the quantization guide for a tutorial on how to quantize models and more information about quantization in ONNX and the nuances of quantization support in RTen.

§Inspecting models

The rten-cli tool can be used to query basic information about a .rten or .onnx model, such as the inputs and outputs. It can also be used to test model compatibility and inference performance by running models with randomly generated inputs.

To examine a .onnx model in more detail, the Netron application is very useful. It shows the complete model graph and enables inspecting individual nodes.

§Performance

See the performance guide for information on profiling and improving model execution performance.

§Crate features

  • all-ops - Enables all operators which are not enabled by default
  • fft - Enables FFT operators
  • mmap - Enables loading models with memory mapping via Model::load_mmap
  • onnx_format (enabled by default) - Enables support for loading .onnx models
  • random - Enables operators that generate random numbers
  • rten_format (enabled by default) - Enables support for loading .rten models
  • wasm_api - Generates a WebAssembly API using wasm-bindgen

At least one of the onnx_format or rten_format features must be enabled.
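In Cargo.toml this might look like the following (the version is a placeholder; check crates.io for the current release):

```toml
[dependencies]
# Default features enable both the ONNX and .rten loaders.
rten = { version = "*", features = ["mmap", "fft"] }

# Or: keep only the .rten loader while adding memory-mapped loading.
# rten = { version = "*", default-features = false, features = ["rten_format", "mmap"] }
```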

§Re-exports

pub use ops::FloatOperators;
pub use ops::Operators;

§Modules

ctc
Connectionist Temporal Classification (CTC) sequence decoding tools.
op_types
Types that can be used with OpRegistry::with_ops.
ops
The ops module exposes the various operators available for machine-learning models.

§Macros

op_registry
Construct an OpRegistry with a given set of operators enabled.

§Structs

BufferPool
A pool which enables reuse of data buffers from tensors and other containers.
LoadError
Errors that occur when loading a model.
Model
The central type used to execute machine learning models.
ModelMetadata
Collection of (name, value) metadata entries for a model.
ModelOptions
Options which customize how a model is loaded.
NodeId
ID of a node in a Model graph.
NodeInfo
Provides access to metadata about a graph node.
OpRegistry
Registry used to deserialize operators when loading a model.
PoolRef
A smart pointer which wraps a tensor or other container and returns it to a pool when dropped.
RunError
Errors that occur when running a model.
RunOptions
Options that control logging and other behaviors when executing a Model.
ThreadPool
A wrapper around the Rayon thread pool used to run models.

§Enums

DataType
Element type of a tensor.
Dimension
Represents the size of a dimension of a runtime-provided value, such as an operator input, output or intermediate value.
LoadErrorKind
Categories of error when loading a model.
RunErrorKind
The category of model execution error. See RunError::kind.
Sequence
A list of tensors.
TimingSort
Specifies sort order for graph run timings.
TryFromValueError
Errors when converting a Value or ValueView to a tensor of a specific type and/or rank.
Value
Owned value type used for model inputs and outputs.
ValueOrView
Value type used for model inputs that can be either owned or borrowed.
ValueType
Collection and element type of a value.
ValueView
Borrowed value type used for model inputs.

§Traits

ExtractBuffer
Trait for extracting the data buffer from a tensor or other container.
RegisterOp
Enable an operator in a registry. See OpRegistry.

§Functions

thread_pool
Return the Rayon thread pool which is used to execute RTen models.

§Type Aliases

ModelLoadError (deprecated)