//! rten is an inference runtime for machine learning models.
//!
//! It enables you to take machine learning models trained using PyTorch
//! or other frameworks and run them in Rust.
//!
//! # Preparing models
//!
//! To use a model trained with a framework such as
//! [PyTorch](https://pytorch.org), it needs to first be exported into
//! [ONNX](https://onnx.ai) format. There are several ways to obtain models
//! in this format:
//!
//! - The model authors may already provide the model in ONNX
//! format. On [Hugging Face](https://huggingface.co/) you can find models
//! available in ONNX format by searching for the [ONNX
//! tag](https://huggingface.co/models?library=onnx&sort=trending).
//!
//! - Hugging Face provides a tool called
//! [Optimum](https://huggingface.co/docs/optimum-onnx/onnx/usage_guides/export_a_model)
//! which takes as input a Hugging Face model repository URL and exports an
//! ONNX model. This is a convenient way to export many popular pre-trained
//! models to ONNX format.
//!
//! - PyTorch has built-in [ONNX export functions](https://docs.pytorch.org/tutorials/beginner/onnx/export_simple_model_to_onnx_tutorial.html).
//! This can be used to convert custom models or any other model which is not
//! available in ONNX format via another means.
//!
//! RTen can load and run ONNX models directly, but it also supports a custom
//! [`.rten` file format][rten_format]. Models can be converted from ONNX to
//! this format via [rten-convert](https://pypi.org/project/rten-convert/). The
//! `.rten` format can be faster to load and supports large (> 2GB) models in a
//! single file, whereas ONNX models of this size must use external files for
//! weights. It is recommended to start with the ONNX format and consider
//! `.rten` later if you need these benefits.
//!
//! See the [model formats][model_formats] documentation for more details on
//! the format differences.
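//!
//! As a concrete sketch, converting a model with `rten-convert` typically
//! looks like the following (the output filename is illustrative; see the
//! tool's `--help` for the exact arguments):
//!
//! ```sh
//! # Install the converter, a Python tool distributed on PyPI.
//! pip install rten-convert
//!
//! # Convert an ONNX model to the .rten format.
//! rten-convert model.onnx model.rten
//! ```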
//!
//! # Loading and running models
//!
//! The basic workflow for loading and running a model is:
//!
//! 1. Load the model using [`Model::load_file`] or [`Model::load_mmap`].
//! 2. Load the input data (images, audio, text, etc.).
//! 3. Pre-process the input data to convert it into tensors in the format the
//! model expects. For this you can use RTen's own tensor types (see
//! [rten-tensor](rten_tensor)) or
//! [ndarray](https://docs.rs/ndarray/latest/ndarray/#).
//!
//! If using ndarray, you will need to convert to RTen tensor types before
//! running the model and convert the output back to ndarray types
//! afterwards. See
//! [rten-ndarray-demo](https://github.com/robertknight/rten-ndarray-demo)
//! for an example.
//!
//! 4. Execute the model using [`Model::run`].
//! 5. Post-process the results to convert them into meaningful outputs.
//!
//! See the example projects in [rten-examples][rten_examples] to see how all
//! these pieces fit together.
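//!
//! As an illustration of step 5, a classifier typically returns a vector of
//! logits which you post-process yourself. This sketch uses only plain Rust
//! (no RTen types) to apply softmax and pick the most likely class:
//!
//! ```rust
//! /// Return the index of the most likely class and its softmax probability.
//! fn top_class(logits: &[f32]) -> (usize, f32) {
//!     // Subtract the max logit for numerical stability.
//!     let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
//!     let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
//!     let sum: f32 = exps.iter().sum();
//!     let (best, score) = exps
//!         .iter()
//!         .enumerate()
//!         .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
//!         .map(|(i, &e)| (i, e / sum))
//!         .unwrap();
//!     (best, score)
//! }
//!
//! let (class, prob) = top_class(&[1.0, 3.0, 0.5]);
//! assert_eq!(class, 1);
//! assert!(prob > 0.8);
//! ```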
//!
//! ## Threading
//!
//! RTen automatically executes models using multiple threads. For this purpose
//! it creates its own Rayon
//! [ThreadPool](https://docs.rs/rayon/latest/rayon/struct.ThreadPool.html)
//! which is sized to match the number of physical cores. You can access this
//! pool using [`threading::thread_pool`] if you want to run your own tasks in
//! this pool.
//!
//! # Supported models and hardware
//!
//! ## Hardware
//!
//! RTen currently executes models on the CPU. It can be built for most
//! architectures that the Rust compiler supports. SIMD acceleration is
//! available for x86-64, 64-bit Arm and WebAssembly.
//!
//! ## Data types
//!
//! RTen supports tensors with the following data types:
//!
//! - `f32`, `i32`, `i8`, `u8`
//! - `i64` and `bool` tensors are supported by converting them to `i32`
//! tensors, on the assumption that the values in `i64` tensors fit within the
//! `i32` range. When preparing inputs for models whose ONNX graph expects
//! these data types, you will need to convert them to `i32`.
//! - `f64` tensors are supported by converting them to `f32`.
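//!
//! Since `i64` inputs must be supplied as `i32`, a checked conversion such as
//! the following (a plain-Rust helper, not part of the RTen API) avoids
//! silently truncating out-of-range values:
//!
//! ```rust
//! /// Convert `i64` data (eg. token IDs) to the `i32` values RTen expects,
//! /// failing if any value is out of range rather than truncating.
//! fn to_i32(values: &[i64]) -> Result<Vec<i32>, std::num::TryFromIntError> {
//!     values.iter().map(|&v| i32::try_from(v)).collect()
//! }
//!
//! assert_eq!(to_i32(&[101, 2023, 102]).unwrap(), vec![101, 2023, 102]);
//! assert!(to_i32(&[i64::MAX]).is_err());
//! ```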
//!
//! Some operators support a more limited set of data types than described in
//! the ONNX specification. Please file an issue if you need an operator to
//! support additional data types.
//!
//! Support for additional types (eg. `f16`, `bf16`) is planned for the
//! future.
//!
//! ## Supported operators
//!
//! RTen supports most ONNX operators. See the [tracking
//! issue](https://github.com/robertknight/rten/issues/14) for details.
//!
//! Some operators require additional dependencies and are only available if
//! certain crate features are enabled:
//!
//! - The `fft` feature enables operators related to the Fast Fourier Transform
//! (eg. STFT) using [rustfft](https://docs.rs/crate/rustfft).
//! - The `random` feature enables operators that generate random numbers (eg.
//! `RandomUniform`) using [fastrand](https://docs.rs/crate/fastrand).
//!
//! As a convenience, the `all-ops` feature enables all of the above features.
//!
//! ## Quantized models
//!
//! RTen supports quantized models where activations are in uint8 format and
//! weights are in int8 format. This combination is the default when an ONNX
//! model is quantized using [dynamic
//! quantization](https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#dynamic-quantization).
//! The `tools/ort-quantize.py` script in the RTen repository can be used to
//! quantize an existing model with float tensors into this format.
//!
//! See the [quantization
//! guide](https://github.com/robertknight/rten/blob/main/docs/quantization.md)
//! for a tutorial on how to quantize models and more information about
//! quantization in ONNX and the nuances of quantization support in RTen.
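//!
//! The affine (scale and zero point) scheme used for `u8` quantization in
//! ONNX can be sketched in plain Rust as follows (illustrative helpers, not
//! RTen's internal implementation):
//!
//! ```rust
//! /// Quantize an `f32` value to `u8`: q = round(x / scale) + zero_point,
//! /// clamped to the 0..=255 range.
//! fn quantize(x: f32, scale: f32, zero_point: i32) -> u8 {
//!     ((x / scale).round() as i32 + zero_point).clamp(0, 255) as u8
//! }
//!
//! /// Recover an approximate `f32` value from a quantized `u8`.
//! fn dequantize(q: u8, scale: f32, zero_point: i32) -> f32 {
//!     (q as i32 - zero_point) as f32 * scale
//! }
//!
//! let (scale, zero_point) = (0.02, 128);
//! let q = quantize(0.5, scale, zero_point);
//! // Round-tripping loses at most half a quantization step.
//! assert!((dequantize(q, scale, zero_point) - 0.5).abs() <= scale / 2.0);
//! ```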
//!
//! # Inspecting models
//!
//! The [rten-cli](https://crates.io/crates/rten-cli) tool can be used to query
//! basic information about a `.rten` or `.onnx` model, such as the inputs and
//! outputs. It can also be used to test model compatibility and inference
//! performance by running models with randomly generated inputs.
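//!
//! For example (assuming the tool has been installed with `cargo install
//! rten-cli`; run `rten-cli --help` for the full set of options):
//!
//! ```sh
//! cargo install rten-cli
//! # Print the model's inputs and outputs and run it with random inputs.
//! rten-cli model.onnx
//! ```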
//!
//! To examine a `.onnx` model in more detail, the [Netron](https://netron.app/)
//! application is very useful. It shows the complete model graph and enables
//! inspecting individual nodes.
//!
//! # Performance
//!
//! See the [performance
//! guide](https://github.com/robertknight/rten/blob/main/docs/performance.md) for
//! information on profiling and improving model execution performance.
//!
//! # Crate features
//!
//! - **all-ops** - Enables all operators that are not enabled by default
//! - **fft** - Enables FFT operators
//! - **mmap** - Enables loading models with memory mapping via [`Model::load_mmap`]
//! - **onnx_format** (enabled by default) - Enables support for loading `.onnx` models
//! - **random** - Enables operators that generate random numbers
//! - **rten_format** (enabled by default) - Enables support for loading `.rten` models
//! - **wasm_api** - Generates a WebAssembly API using wasm-bindgen
//!
//! At least one of the **onnx_format** or **rten_format** features must be enabled.
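//!
//! For example, a manifest entry that loads only ONNX models and enables
//! memory mapping might look like this (the version is a placeholder; pick
//! the latest release from crates.io):
//!
//! ```toml
//! [dependencies]
//! rten = { version = "*", default-features = false, features = [
//!     "onnx_format",
//!     "mmap",
//! ] }
//! ```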
//!
//! [model_formats]: https://github.com/robertknight/rten/blob/main/docs/model-formats.md
//! [rten_examples]: https://github.com/robertknight/rten/tree/main/rten-examples
//! [rten_format]: https://github.com/robertknight/rten/blob/main/docs/rten-file-format.md
// Docs only
use ;
// Temporarily included in this crate. These functions should be moved into
// a separate crate in future.
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use TimingSort;
pub use ;
pub type ModelLoadError = LoadError;