Crate rten


rten is a runtime for machine learning models.

§Preparing models

A model trained with a framework such as PyTorch must first be exported to ONNX format and then converted to .rten format using the rten-convert tool. See the rten model format docs for more details on the file format.
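As a sketch, conversion looks like the following. The file names are placeholders, and the exact converter arguments may differ between versions; check the rten-convert documentation for details:

```shell
# Install the converter (distributed as a Python package) and convert
# an ONNX model to the .rten format. File names are placeholders.
pip install rten-convert
rten-convert model.onnx model.rten
```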

§Loading and running models

The basic workflow for loading and running a model is:

  1. Load the model using Model::load_file.

  2. Load the input data (images, audio, text, etc.)

  3. Pre-process the input data to convert it into tensors in the format the model expects. For this you can use RTen’s own tensor types (see rten-tensor) or ndarray.

    If using ndarray, you will need to convert to RTen tensor types before running the model and convert the output back to ndarray types afterwards. See rten-ndarray-demo for an example.

  4. Execute the model using Model::run.

  5. Post-process the results to convert them into meaningful outputs.

The example projects in rten-examples show how all these pieces fit together.
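As a rough sketch, the workflow above looks like this in code. The model path, input shape, and the exact `run` signature are assumptions based on the crate's published examples and may differ between versions:

```rust
use rten::Model;
use rten_tensor::prelude::*;
use rten_tensor::Tensor;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Load the model (assumes a converted `.rten` file exists on disk).
    let model = Model::load_file("model.rten")?;

    // 2-3. Load and pre-process input data. Here we fake it with a zeroed
    // batch of one 224x224 RGB image in NCHW layout.
    let input = Tensor::<f32>::zeros(&[1, 3, 224, 224]);

    // 4. Execute the model, mapping input node IDs to values and listing
    // the output nodes we want back.
    let input_id = model.input_ids()[0];
    let output_id = model.output_ids()[0];
    let mut outputs = model.run(
        vec![(input_id, input.view().into())],
        &[output_id],
        None,
    )?;

    // 5. Post-process: convert the first output back into a typed tensor.
    let scores: Tensor<f32> = outputs.remove(0).try_into()?;
    println!("output shape: {:?}", scores.shape());
    Ok(())
}
```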

§Threading

RTen automatically executes models using multiple threads. For this purpose it creates its own Rayon ThreadPool which is sized to match the number of physical cores. You can access this pool using threading::thread_pool if you want to run your own tasks in this pool.
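A minimal sketch of scheduling your own work in that pool is shown below. The `run` method name on the returned pool is an assumption here, so check the `threading` module docs for the actual entry point:

```rust
// Hypothetical sketch: schedule CPU-heavy work on RTen's own Rayon pool so
// it shares worker threads with model execution instead of oversubscribing
// the CPU with a second pool. The `run` method name is an assumption.
let pool = rten::threading::thread_pool();
pool.run(|| {
    // ... your parallel pre- or post-processing here ...
});
```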

§Supported models and hardware

§Hardware

RTen currently executes models on the CPU. It can be built for most architectures that the Rust compiler supports. SIMD acceleration is available for x86-64, Arm64 and WebAssembly.

§Data types

RTen supports tensors with the following data types:

  • f32, i32, i8, u8
  • i64 and bool are supported by converting them to i32 during model conversion. When preparing inputs for a model that declares these data types in ONNX, you will need to supply i32 tensors instead.
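For example, when an ONNX model declares an int64 input (common for token IDs), the values need to be narrowed to i32 before building the input tensor. A plain-Rust sketch of a checked conversion (the function name and use case are illustrative):

```rust
/// Narrow i64 values (e.g. token IDs from a tokenizer) to the i32 range
/// expected by a converted .rten model, failing on overflow rather than
/// silently wrapping.
fn to_i32(values: &[i64]) -> Result<Vec<i32>, String> {
    values
        .iter()
        .map(|&v| {
            i32::try_from(v).map_err(|_| format!("value {} out of i32 range", v))
        })
        .collect()
}

fn main() {
    let token_ids: Vec<i64> = vec![101, 2023, 2003, 102];
    let narrowed = to_i32(&token_ids).expect("token IDs fit in i32");
    println!("{:?}", narrowed);
}
```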

Some operators support a more limited set of data types than described in the ONNX specification. Please file an issue if you need an operator to support additional data types.

Support for additional types (e.g. f16, bf16) is planned for the future.

§Supported operators

RTen supports most ONNX operators. See the tracking issue for details.

Some operators require additional dependencies and are only available if certain crate features are enabled:

  • The fft feature enables operators related to the Fast Fourier Transform (e.g. STFT) using rustfft.
  • The random feature enables operators that generate random numbers (e.g. RandomUniform) using fastrand.

As a convenience, the all-ops feature enables all of the above features.
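In Cargo.toml, enabling these features looks like the following (the version is a placeholder):

```toml
[dependencies]
# Placeholder version; use the latest rten release.
rten = { version = "*", features = ["fft", "random"] }

# Or, equivalently, enable everything:
# rten = { version = "*", features = ["all-ops"] }
```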

§Quantized models

RTen supports quantized models where activations are in uint8 format and weights are in int8 format. This combination is the default when an ONNX model is quantized using dynamic quantization. The tools/ort-quantize.py script in the RTen repository can be used to quantize an existing model with float tensors into this format.

See the quantization guide for a tutorial on how to quantize models and more information about quantization in ONNX and the nuances of quantization support in RTen.

§Inspecting models

The rten-cli tool can be used to query basic information about a .rten model, such as the inputs and outputs. It can also be used to test model compatibility and inference performance by running models with randomly generated inputs.
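A typical session might look like the following; the installation command and default behavior shown here are assumptions, so consult the rten-cli documentation for the real usage:

```shell
# Assumed invocation: install the CLI and inspect a converted model.
cargo install rten-cli
rten model.rten    # assumed: prints inputs/outputs and runs with random inputs
```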

§Performance

See the performance guide for information on profiling and improving model execution performance.

Re-exports§

pub use ops::FloatOperators;
pub use ops::Operators;
pub use ops::Input; (Deprecated)
pub use ops::InputOrOutput; (Deprecated)
pub use ops::Output; (Deprecated)

Modules§

ctc
Connectionist Temporal Classification (CTC) sequence decoding tools.
ops
The ops module exposes the various operators available for machine-learning models.

Structs§

BufferPool
A pool which enables reuse of data buffers from tensors and other containers.
Model
The central type used to execute RTen machine learning models.
ModelMetadata
Metadata for an RTen model.
ModelOptions
Options which customize how a model is loaded.
NodeId
ID of a node in a Model graph.
NodeInfo
Provides access to metadata about a graph node.
OpRegistry
Registry used to deserialize operators when loading a model.
PoolRef
A smart pointer which wraps a tensor or other container and returns it to a pool when dropped.
RunOptions
Options that control logging and other behaviors when executing a Model.
ThreadPool
A wrapper around the Rayon thread pool used to run models.

Enums§

DataType
Enum specifying the data type of a tensor.
Dimension
Represents the size of a dimension of a runtime-provided value, such as an operator input, output or intermediate value.
ModelLoadError
Errors reported by Model::load.
ReadOpError
Error type for errors that occur when de-serializing an operator.
RunError
Reasons why a graph execution failed.
Sequence
A list of tensors.
TimingSort
Specifies sort order for graph run timings.
Value
An owned value that can be used as an operator input or output.
ValueOrView
An owned or borrowed value that can be used as a model or operator input.
ValueView
A borrowed value that can be used as a model or operator input.

Traits§

ExtractBuffer
Trait for extracting the data buffer from a tensor or other container.
ReadOp
Trait that deserializes an operator from a .rten file into an Operator implementation.

Functions§

thread_pool
Return the Rayon thread pool which is used to execute RTen models.