rten is an inference runtime for machine learning models.
It enables you to take machine learning models trained using PyTorch or other frameworks and run them in Rust.
§Preparing models
To use a model trained with a framework such as PyTorch, it must first be exported to ONNX format. There are several ways to obtain models in this format:
- The model authors may already provide the model in ONNX format. On Hugging Face you can find models available in ONNX format by searching for the ONNX tag.
- Hugging Face provides a tool called Optimum which takes as input a Hugging Face model repository URL and exports an ONNX model. This is a convenient way to export many popular pre-trained models to ONNX format.
- PyTorch has built-in ONNX export functions. These can be used to convert custom models or any other model which is not available in ONNX format via another means.
RTen can load and run ONNX models directly, but it also supports a custom .rten file format. Models can be converted from ONNX to this format via rten-convert. The .rten format can be faster to load and supports large (> 2GB) models in a single file, whereas ONNX models of this size must use external files for weights. It is recommended to start with the ONNX format and consider .rten later if you need these benefits.
See the model formats documentation for more details on the format differences.
§Loading and running models
The basic workflow for loading and running a model is:
- Load the model using Model::load_file or Model::load_mmap.
- Load the input data (images, audio, text etc.)
- Pre-process the input data to convert it into tensors in the format the model expects. For this you can use RTen’s own tensor types (see rten-tensor) or ndarray. If using ndarray, you will need to convert to RTen tensor types before running the model and convert the output back to ndarray types afterwards. See rten-ndarray-demo for an example.
- Execute the model using Model::run.
- Post-process the results to convert them into meaningful outputs.
See the example projects in rten-examples for how all these pieces fit together.
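Put together, the steps above might look like the following sketch. This is illustrative only: it assumes the rten and rten-tensor crates, and the exact signatures of Model::run, the input/output ID accessors, and the value conversions are assumptions based on the item names documented on this page, so consult the API documentation before copying.

```rust
// Sketch only: names taken from this page (Model, NodeId, ValueOrView,
// RunOptions); exact signatures are assumptions and may differ.
use rten::{Model, ValueOrView};
use rten_tensor::NdTensor;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Load the model.
    let model = Model::load_file("model.rten")?;

    // 2-3. Load and pre-process input data into a tensor of the shape
    // the model expects (a placeholder all-zeros tensor here).
    let input: NdTensor<f32, 2> = NdTensor::zeros([1, 10]);

    // 4. Execute the model, mapping inputs and outputs by node ID.
    let input_id = model.input_ids()[0];
    let output_id = model.output_ids()[0];
    let outputs = model.run(
        vec![(input_id, ValueOrView::from(input.view()))],
        &[output_id],
        None, // default RunOptions
    )?;

    // 5. Post-process the results into meaningful outputs.
    println!("got {} output value(s)", outputs.len());
    Ok(())
}
```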
§Threading
RTen automatically executes models using multiple threads. For this purpose it creates its own Rayon ThreadPool which is sized to match the number of physical cores. You can access this pool using threading::thread_pool if you want to run your own tasks in this pool.
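Note that RTen's pool is sized to physical cores, which is usually smaller than the logical CPU count that std reports. A plain std Rust sketch (independent of RTen) of querying the logical count, for comparison when sizing your own workloads:

```rust
use std::thread;

fn main() {
    // Number of logical CPUs visible to the process. With SMT
    // (hyper-threading) this is typically 2x the physical core count
    // that RTen sizes its thread pool to.
    let logical = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    println!("logical CPUs: {logical}");
}
```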
§Supported models and hardware
§Hardware
RTen currently executes models on the CPU. It can be built for most architectures that the Rust compiler supports. SIMD acceleration is available for x86-64, Arm 64 and WebAssembly.
§Data types
RTen supports tensors with the following data types:
- f32
- i32
- i8
- u8
i64 and bool tensors are supported by converting them to i32 tensors, on the assumption that the values in i64 tensors will be in the i32 range. When preparing model inputs that expect these data types in ONNX, you will need to convert them to i32. f64 tensors are supported by converting them to f32.
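For example, when a model input is declared as i64 in ONNX (token IDs from a tokenizer are a common case), the data needs to be narrowed to i32 before being passed to RTen. A checked conversion in plain std Rust (independent of the RTen API) avoids silently truncating out-of-range values:

```rust
fn main() {
    // Token IDs as a typical tokenizer produces them (i64 in ONNX).
    let ids_i64: Vec<i64> = vec![101, 2023, 2003, 102];

    // Narrow to i32, failing loudly if any value is outside the i32
    // range rather than truncating silently.
    let ids_i32: Vec<i32> = ids_i64
        .iter()
        .map(|&v| i32::try_from(v).expect("value out of i32 range"))
        .collect();

    assert_eq!(ids_i32, vec![101, 2023, 2003, 102]);
    println!("{ids_i32:?}");
}
```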
Some operators support a more limited set of data types than described in the ONNX specification. Please file an issue if you need an operator to support additional data types.
Support for additional types (eg. f16, bf16) is planned for the
future.
§Supported operators
RTen supports most ONNX operators. See the tracking issue for details.
Some operators require additional dependencies and are only available if certain crate features are enabled:
- The fft feature enables operators related to the Fast Fourier Transform (eg. STFT) using rustfft.
- The random feature enables operators that generate random numbers (eg. RandomUniform) using fastrand.
As a convenience, the all-ops feature enables all of the above features.
§Quantized models
RTen supports quantized models where activations are in uint8 format and
weights are in int8 format. This combination is the default when an ONNX
model is quantized using dynamic
quantization.
The tools/ort-quantize.py script in the RTen repository can be used to
quantize an existing model with float tensors into this format.
See the quantization guide for a tutorial on how to quantize models and more information about quantization in ONNX and the nuances of quantization support in RTen.
§Inspecting models
The rten-cli tool can be used to query
basic information about a .rten or .onnx model, such as the inputs and
outputs. It can also be used to test model compatibility and inference
performance by running models with randomly generated inputs.
To examine a .onnx model in more detail, the Netron
application is very useful. It shows the complete model graph and enables
inspecting individual nodes.
§Performance
See the performance guide for information on profiling and improving model execution performance.
§Crate features
- all-ops - Enables all operators which are not enabled by default
- fft - Enables FFT operators
- mmap - Enables loading models with memory mapping via Model::load_mmap
- onnx_format (enabled by default) - Enables support for loading .onnx models
- random - Enables operators that generate random numbers
- rten_format (enabled by default) - Enables support for loading .rten models
- wasm_api - Generates a WebAssembly API using wasm-bindgen
At least one of the onnx_format or rten_format features must be enabled.
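As an illustration, a Cargo.toml dependency entry enabling optional features on top of the defaults might look like the following (the version is a placeholder, not a recommendation):

```toml
# Feature names taken from the list above; replace "*" with a real version.
[dependencies]
rten = { version = "*", features = ["mmap", "all-ops"] }
```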
Re-exports§
pub use ops::FloatOperators;
pub use ops::Operators;
Modules§
- ctc - Connectionist Temporal Classification (CTC) sequence decoding tools.
- op_types - Types that can be used with OpRegistry::with_ops.
- ops - The ops module exposes the various operators available for machine-learning models.
Macros§
- op_registry - Construct an OpRegistry with a given set of operators enabled.
Structs§
- BufferPool - A pool which enables reuse of data buffers from tensors and other containers.
- LoadError - Errors that occur when loading a model.
- Model - The central type used to execute machine learning models.
- ModelMetadata - Collection of (name, value) metadata entries for a model.
- ModelOptions - Options which customize how a model is loaded.
- NodeId - ID of a node in a Model graph.
- NodeInfo - Provides access to metadata about a graph node.
- OpRegistry - Registry used to deserialize operators when loading a model.
- PoolRef - A smart pointer which wraps a tensor or other container and returns it to a pool when dropped.
- RunError - Errors that occur when running a model.
- RunOptions - Options that control logging and other behaviors when executing a Model.
- ThreadPool - A wrapper around the Rayon thread pool used to run models.
Enums§
- DataType - Element type of a tensor.
- Dimension - Represents the size of a dimension of a runtime-provided value, such as an operator input, output or intermediate value.
- LoadErrorKind - Categories of error when loading a model.
- RunErrorKind - The category of model execution error. See RunError::kind.
- Sequence - A list of tensors.
- TimingSort - Specifies sort order for graph run timings.
- TryFromValueError - Errors when converting a Value or ValueView to a tensor of a specific type and/or rank.
- Value - Owned value type used for model inputs and outputs.
- ValueOrView - Value type used for model inputs that can be either owned or borrowed.
- ValueType - Collection and element type of a value.
- ValueView - Borrowed value type used for model inputs.
Traits§
- ExtractBuffer - Trait for extracting the data buffer from a tensor or other container.
- RegisterOp - Enable an operator in a registry. See OpRegistry.
Functions§
- thread_pool - Return the Rayon thread pool which is used to execute RTen models.
Type Aliases§
- ModelLoadError (Deprecated)