pub struct RknnModel { /* private fields */ }
A loaded RKNN model ready for inference.
This is the main type you interact with. It holds the model, pre-allocated zero-copy memory buffers for input and outputs, and handles all communication with the NPU.
§Lifecycle
- Load a model with load or load_with_lib. This initializes the NPU context and allocates memory buffers.
- Inspect tensor metadata via input_attr and output_attrs to learn expected shapes and formats.
- Run inference with run - pass raw RGB bytes (NHWC, u8, no normalization).
- Read results with output_raw (zero-copy &[i8]) or output_f32 (dequantized Vec<f32>).
§Example
use rknn_runtime::RknnModel;
let model = RknnModel::load("model.rknn")?;
// Check what the model expects
let input = model.input_attr();
// e.g. [1, 320, 320, 3]
println!("Input shape: {:?}", input.shape);
// Run inference (rgb_bytes: raw RGB pixels as u8, NHWC, matching the input shape)
model.run(&rgb_bytes)?;
// Get raw INT8 output (zero-copy - no allocation, just a slice into NPU memory)
let raw = model.output_raw(0)?;
// Or get dequantized f32 output (allocates a new Vec)
let floats = model.output_f32(0)?;
§Drop order
Internally, memory buffers are dropped before the RKNN context. This is handled automatically - you don’t need to worry about it.
§Implementations
impl RknnModel
pub fn load(model_path: &str) -> Result<Self, Error>
Load a .rknn model from a file.
Uses the default library path (/usr/lib/librknnmrt.so).
If your librknnmrt.so is elsewhere, use load_with_lib.
§Errors
- Error::IoError if the file cannot be read.
- Error::LibraryNotFound if librknnmrt.so is not found.
- Error::InitFailed if the NPU rejects the model.
pub fn load_with_lib(model_path: &str, lib_path: &str) -> Result<Self, Error>
Load a .rknn model from a file, using a custom library path.
let model = RknnModel::load_with_lib(
"model.rknn",
"/opt/rknn/lib/librknnmrt.so",
)?;
pub fn load_from_bytes(model_data: &[u8], lib_path: &str) -> Result<Self, Error>
Load a model from raw bytes already in memory.
Useful when the .rknn file is embedded in your binary or received
over the network.
pub fn input_attr(&self) -> &TensorAttr
Input tensor metadata (shape, format, data type).
The shape is typically [1, H, W, 3] (NHWC).
Use this to know what image size the model expects:
let input = model.input_attr();
let (h, w) = (input.shape[1], input.shape[2]);
println!("Model expects {}x{} RGB image", h, w);
pub fn output_attrs(&self) -> &[TensorAttr]
Output tensor metadata for all outputs.
Most models have a single output, but some have several.
Each TensorAttr contains the shape, format, quantization zero-point
and scale - everything you need to decode the output.
pub fn run(&self, input: &[u8]) -> Result<(), Error>
Run inference on the NPU.
input must be raw RGB bytes in NHWC format ([1, H, W, 3]).
No normalization, no channel reordering - just plain u8 pixel values.
After this returns, read results with output_raw
or output_f32.
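Because run takes a plain byte slice, it can help to check the buffer length against the input shape before calling it. The following is a minimal sketch, not part of this crate: expected_input_len and check_input are hypothetical helpers, and the shape is assumed to be available as a slice of usize (the actual field type of TensorAttr may differ).

```rust
/// Hypothetical helper (not part of rknn_runtime): number of u8 elements
/// implied by a tensor shape such as [1, H, W, 3].
/// Returns None if the product overflows usize.
fn expected_input_len(shape: &[usize]) -> Option<usize> {
    shape.iter().try_fold(1usize, |acc, &d| acc.checked_mul(d))
}

/// Hypothetical helper: checks that `input` holds exactly one byte per
/// element of `shape` (NHWC, u8, no padding assumed).
fn check_input(input: &[u8], shape: &[usize]) -> Result<(), String> {
    match expected_input_len(shape) {
        Some(n) if n == input.len() => Ok(()),
        Some(n) => Err(format!("expected {} bytes, got {}", n, input.len())),
        None => Err("shape product overflows usize".to_string()),
    }
}
```

For a [1, 320, 320, 3] input this comes to 307200 bytes, so a buffer of any other length would be rejected before it reaches the NPU.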
§What happens inside
- Copies input bytes into the pre-allocated NPU input buffer.
- Calls rknn_run() - the NPU executes the model.
- Calls rknn_mem_sync() on each output buffer (syncs NPU cache to CPU). In my case, this step is critical on RV1106 - without it, I get stale data.
pub fn output_raw(&self, index: usize) -> Result<&[i8], Error>
Raw INT8 output data for the given output index.
Returns a slice pointing directly into the NPU’s zero-copy buffer. No allocation, no copying - this is as fast as it gets.
The data is in whatever layout the NPU uses (often NC1HWC2).
Use nc1hwc2_to_flat to convert it
to standard NCHW if needed.
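To illustrate what such a conversion involves, here is a standalone sketch of an NC1HWC2 to NCHW reordering for a single batch item. It is not the crate's nc1hwc2_to_flat, whose signature is not shown here. The function name nc1hwc2_to_nchw and its parameters are hypothetical, and the layout is assumed to be [C1, H, W, C2] with logical channel c stored at block c / C2, lane c % C2, and padding lanes in the last block.

```rust
/// Hypothetical sketch (not the crate's nc1hwc2_to_flat): reorders one batch
/// item from NC1HWC2 to flat CHW order, dropping padding lanes.
/// `channels` is the logical channel count C; `c2` is the inner block size.
fn nc1hwc2_to_nchw(
    src: &[i8],
    channels: usize,
    height: usize,
    width: usize,
    c2: usize,
) -> Vec<i8> {
    assert!(c2 > 0, "c2 must be non-zero");
    // Number of channel blocks, rounded up so a partial block is included.
    let c1 = (channels + c2 - 1) / c2;
    assert_eq!(src.len(), c1 * height * width * c2, "unexpected source length");

    let mut dst = vec![0i8; channels * height * width];
    for c in 0..channels {
        let (block, lane) = (c / c2, c % c2);
        for h in 0..height {
            for w in 0..width {
                // Source index in [C1, H, W, C2] order.
                let src_idx = ((block * height + h) * width + w) * c2 + lane;
                // Destination index in [C, H, W] order.
                let dst_idx = (c * height + h) * width + w;
                dst[dst_idx] = src[src_idx];
            }
        }
    }
    dst
}
```

With 3 channels, a 1x2 spatial size and C2 = 2, the source holds two blocks of four values each, and the second lane of the second block is padding that the output omits.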
§Errors
Returns Error::InvalidIndex if index is out of range.
pub fn output_f32(&self, index: usize) -> Result<Vec<f32>, Error>
Dequantized f32 output for the given output index.
Converts each raw INT8 value to f32 using affine dequantization:
value = (raw_i8 - zero_point) * scale
Zero-point and scale are read from the tensor’s quantization parameters (set during model conversion).
Note: This allocates a new Vec<f32>. If you need to dequantize
only part of the output (e.g. after NC1HWC2 conversion), use
dequantize_affine directly.
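The affine formula above can be sketched as a small standalone function. This is an illustration only, not the crate's dequantize_affine: the name dequantize and the parameter types (i32 zero-point, f32 scale) are assumptions.

```rust
/// Hypothetical sketch of affine dequantization:
/// value = (raw_i8 - zero_point) * scale.
/// The subtraction is done in i32 so it cannot overflow the i8 range.
fn dequantize(raw: &[i8], zero_point: i32, scale: f32) -> Vec<f32> {
    raw.iter()
        .map(|&q| (q as i32 - zero_point) as f32 * scale)
        .collect()
}
```

Since it takes a slice, the same function can be applied to just part of an output, for example dequantize(&raw[..n], zero_point, scale), which avoids converting values that are never read.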