rten is a runtime for machine learning models.
§Preparing models
To use a model trained with a framework such as PyTorch, it must first be exported to ONNX format and then converted to the .rten format using the rten-convert tool. See the rten model format docs for more details on the file format.
§Loading and running models
The basic workflow for loading and running a model is:
- Load the model using Model::load_file.
- Load the input data (images, audio, text etc.).
- Pre-process the input data to convert it into tensors in the format the model expects. For this you can use RTen’s own tensor types (see rten-tensor) or ndarray. If using ndarray, you will need to convert to RTen tensor types before running the model and convert the output back to ndarray types afterwards. See rten-ndarray-demo for an example.
- Execute the model using Model::run.
- Post-process the results to convert them into meaningful outputs.

The example projects in rten-examples show how all these pieces fit together, and a minimal sketch of the workflow is shown below.
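For orientation, here is a minimal sketch of that workflow in Rust. The file path and input shape are made up, and the input_ids/output_ids helpers, conversions via into/try_into and the exact Model::run signature shown here are assumptions for illustration; consult the Model documentation for the precise API.

```rust
use rten::Model;
use rten_tensor::NdTensor;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load a converted .rten model (path is illustrative).
    let model = Model::load_file("model.rten")?;

    // Pre-processed input in the shape the model expects
    // (a 1x3x224x224 image batch here, purely as an example).
    let input: NdTensor<f32, 4> = NdTensor::zeros([1, 3, 224, 224]);

    // Look up the model's input and output nodes and run the graph.
    let input_id = model.input_ids()[0];
    let output_id = model.output_ids()[0];
    let mut outputs = model.run(
        vec![(input_id, input.into())],
        &[output_id],
        None, // Default run options.
    )?;

    // Post-process: convert the first output back into a typed tensor.
    let output: NdTensor<f32, 2> = outputs.remove(0).try_into()?;
    println!("output shape: {:?}", output.shape());

    Ok(())
}
```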
§Threading
RTen automatically executes models using multiple threads. For this purpose it creates its own Rayon ThreadPool, sized to match the number of physical cores. You can access this pool using threading::thread_pool if you want to run your own tasks on it.
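As a rough sketch, using the thread_pool function listed under Functions below and assuming the returned ThreadPool exposes a run method that executes a closure inside the pool (an assumption; verify the method name against the ThreadPool docs), scheduling your own work might look like this:

```rust
fn main() {
    // Rough sketch: thread_pool() is listed in this crate's functions, while
    // the run() method used here is an assumption; check the ThreadPool docs.
    let sum: u64 = rten::thread_pool().run(|| (0..1_000u64).sum());
    println!("sum computed on RTen's thread pool: {sum}");
}
```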
§Supported models and hardware
§Hardware
RTen currently executes models on the CPU. It can be built for most architectures that the Rust compiler supports. SIMD acceleration is available for x86-64, Arm 64 and WebAssembly.
§Data types
RTen supports tensors with the following data types: f32, i32, i8, u8.

i64 and bool tensors are supported by converting them to i32 as part of the model conversion process. When preparing model inputs that expect these data types in ONNX, you will need to convert them to i32.
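For example, token IDs that an ONNX model declares as i64 can be cast to i32 before building the input tensor. This is a sketch with made-up values, assuming rten-tensor's NdTensor::from_data constructor:

```rust
use rten_tensor::NdTensor;

fn main() {
    // Token IDs that the original ONNX model declares as i64 (illustrative values).
    let token_ids: Vec<i64> = vec![101, 7592, 2088, 102];

    // The converted .rten model expects i32, so cast before building the tensor.
    let input: NdTensor<i32, 2> = NdTensor::from_data(
        [1, token_ids.len()],
        token_ids.iter().map(|&id| id as i32).collect(),
    );
    println!("input shape: {:?}", input.shape());
}
```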
Some operators support a more limited set of data types than described in the ONNX specification. Please file an issue if you need an operator to support additional data types.
Support for additional types (eg. f16, bf16) is planned for the future.
§Supported operators
RTen supports most ONNX operators. See the tracking issue for details.
Some operators require additional dependencies and are only available if certain crate features are enabled:
- The fft feature enables operators related to the Fast Fourier Transform (eg. STFT) using rustfft.
- The random feature enables operators that generate random numbers (eg. RandomUniform) using fastrand.

As a convenience, the all-ops feature enables all of the above features.
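For example, these features can be enabled in Cargo.toml (the version below is a placeholder; use the latest published release):

```toml
[dependencies]
# Enable the FFT and random-number operators; use "all-ops" to enable both.
rten = { version = "*", features = ["fft", "random"] }
```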
§Quantized models
RTen supports quantized models where activations are in uint8 format and weights are in int8 format. This combination is the default when an ONNX model is quantized using dynamic quantization.

The tools/ort-quantize.py script in the RTen repository can be used to quantize an existing model with float tensors into this format.
See the quantization guide for a tutorial on how to quantize models and more information about quantization in ONNX and the nuances of quantization support in RTen.
§Inspecting models
The rten-cli tool can be used to query
basic information about a .rten
model, such as the inputs and outputs.
It can also be used to test model compatibility and inference performance
by running models with randomly generated inputs.
§Performance
See the performance guide for information on profiling and improving model execution performance.
Re-exports§
pub use ops::FloatOperators;
pub use ops::Operators;
pub use ops::Input; (Deprecated)
pub use ops::InputOrOutput; (Deprecated)
pub use ops::Output; (Deprecated)
Modules§
- ctc: Connectionist Temporal Classification (CTC) sequence decoding tools.
- ops: The ops module exposes the various operators available for machine-learning models.
Structs§
- BufferPool: A pool which enables reuse of data buffers from tensors and other containers.
- Model: The central type used to execute RTen machine learning models.
- ModelMetadata: Metadata for an RTen model.
- ModelOptions: Options which customize how a model is loaded.
- NodeId: ID of a node in a Model graph.
- NodeInfo: Provides access to metadata about a graph node.
- OpRegistry: Registry used to deserialize operators when loading a model.
- PoolRef: A smart pointer which wraps a tensor or other container and returns it to a pool when dropped.
- RunOptions: Options that control logging and other behaviors when executing a Model.
- ThreadPool: A wrapper around the Rayon thread pool used to run models.
Enums§
- DataType: Enum specifying the data type of a tensor.
- Dimension: Represents the size of a dimension of a runtime-provided value, such as an operator input, output or intermediate value.
- ModelLoadError: Errors reported by Model::load.
- ReadOpError: Error type for errors that occur when de-serializing an operator.
- RunError: Reasons why a graph execution failed.
- Sequence: A list of tensors.
- TimingSort: Specifies sort order for graph run timings.
- Value: An owned value that can be used as an operator input or output.
- ValueOrView: An owned or borrowed value that can be used as a model or operator input.
- ValueView: A borrowed value that can be used as a model or operator input.
Traits§
- ExtractBuffer: Trait for extracting the data buffer from a tensor or other container.
- ReadOp: Trait that deserializes an operator from a .rten file into an Operator implementation.
Functions§
- thread_pool: Return the Rayon thread pool which is used to execute RTen models.