Struct RknnModel

pub struct RknnModel { /* private fields */ }

A loaded RKNN model ready for inference.

This is the main type you interact with. It holds the model and the pre-allocated zero-copy memory buffers for the input and the outputs, and it handles all communication with the NPU.

§Lifecycle

1. Load a model with load, load_with_lib, or load_from_bytes. This initializes the NPU context and allocates the memory buffers.
2. Inspect tensor metadata via input_attr and output_attrs to learn the expected shapes and formats.
3. Run inference with run, passing raw RGB bytes (NHWC, u8, no normalization).
4. Read results with output_raw (zero-copy &[i8]) or output_f32 (dequantized Vec<f32>).

§Example

use rknn_runtime::RknnModel;

let model = RknnModel::load("model.rknn")?;

// Check what the model expects
let input = model.input_attr();
// e.g. [1, 320, 320, 3]
println!("Input shape: {:?}", input.shape);

// Run inference (rgb_bytes holds raw RGB pixels: H * W * 3 bytes, NHWC, u8)
model.run(&rgb_bytes)?;

// Get raw INT8 output (zero-copy - no allocation, just a slice into NPU memory)
let raw = model.output_raw(0)?;

// Or get dequantized f32 output (allocates a new Vec)
let floats = model.output_f32(0)?;

§Drop order

Internally, memory buffers are dropped before the RKNN context. This is handled automatically - you don’t need to worry about it.
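One common way to get this ordering in Rust is to rely on the language rule that struct fields are dropped in declaration order. The sketch below illustrates that rule only. The Buffer, Context, and Model types are hypothetical stand-ins, not the crate's internals, and the crate may arrange its drop order differently.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared log that records the order in which values are dropped.
type Log = Rc<RefCell<Vec<&'static str>>>;

// Hypothetical stand-ins for an NPU memory buffer and the RKNN context.
struct Buffer {
    log: Log,
}

struct Context {
    log: Log,
}

impl Drop for Buffer {
    fn drop(&mut self) {
        self.log.borrow_mut().push("buffer");
    }
}

impl Drop for Context {
    fn drop(&mut self) {
        self.log.borrow_mut().push("context");
    }
}

// Fields are dropped in declaration order, so the buffer goes first.
struct Model {
    _buffer: Buffer,
    _context: Context,
}

// Builds a Model, drops it, and returns the recorded drop order.
fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let model = Model {
        _buffer: Buffer { log: Rc::clone(&log) },
        _context: Context { log: Rc::clone(&log) },
    };
    drop(model);
    let order = log.borrow().clone();
    order
}
```

Swapping the two field declarations in Model would reverse the recorded order, which is why the declaration order matters when a buffer must be released while its context is still alive.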

Implementations§

impl RknnModel

pub fn load(model_path: &str) -> Result<Self, Error>

Load a .rknn model from a file.

Uses the default library path (/usr/lib/librknnmrt.so). If your librknnmrt.so is elsewhere, use load_with_lib.

§Errors

Error::IoError if the file cannot be read.
Error::LibraryNotFound if librknnmrt.so is not found.
Error::InitFailed if the NPU rejects the model.

pub fn load_with_lib(model_path: &str, lib_path: &str) -> Result<Self, Error>

Load a .rknn model from a file, using a custom library path.

let model = RknnModel::load_with_lib(
    "model.rknn",
    "/opt/rknn/lib/librknnmrt.so",
)?;

pub fn load_from_bytes(model_data: &[u8], lib_path: &str) -> Result<Self, Error>

Load a model from raw bytes already in memory, using the runtime library at lib_path.

Useful when the .rknn file is embedded in your binary or received over the network.

pub fn input_attr(&self) -> &TensorAttr

Input tensor metadata (shape, format, data type).

The shape is typically [1, H, W, 3] (NHWC). Use it to find out what image size the model expects:

let input = model.input_attr();
let (h, w) = (input.shape[1], input.shape[2]);
println!("Model expects {}x{} RGB image", h, w);

pub fn output_attrs(&self) -> &[TensorAttr]

Output tensor metadata for all outputs.

Most models have a single output, but some have several. Each TensorAttr contains the shape, format, and quantization zero-point and scale: everything you need to decode the output.

pub fn run(&self, input: &[u8]) -> Result<(), Error>

Run inference on the NPU.

input must be raw RGB bytes in NHWC format ([1, H, W, 3]). No normalization, no channel reordering - just plain u8 pixel values.

After this returns, read results with output_raw or output_f32.
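Because run takes a plain byte slice, it can help to check the buffer length against the input shape before calling it. The following is a minimal sketch of such a check. The helper names are hypothetical and not part of rknn_runtime, and the shape is assumed to be available as a slice of usize (the actual element type of TensorAttr's shape field is not shown here).

```rust
/// Number of u8 values in a tensor of the given shape, e.g. [1, H, W, 3].
/// Hypothetical helper, not part of rknn_runtime.
fn expected_input_len(shape: &[usize]) -> usize {
    shape.iter().product()
}

/// Returns an error message if `input` does not hold exactly one
/// NHWC u8 image of the given shape.
fn check_input(shape: &[usize], input: &[u8]) -> Result<(), String> {
    let expected = expected_input_len(shape);
    if input.len() == expected {
        Ok(())
    } else {
        Err(format!(
            "input has {} bytes, but shape {:?} needs {}",
            input.len(),
            shape,
            expected
        ))
    }
}
```

For the [1, 320, 320, 3] shape used in the example above, the expected length is 320 * 320 * 3 = 307200 bytes.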

§What happens inside

Copies input bytes into the pre-allocated NPU input buffer.
Calls rknn_run() - the NPU executes the model.
Calls rknn_mem_sync() on each output buffer to sync the NPU cache to the CPU. In the author's experience this step is critical on RV1106: without it, reads of the output return stale data.

pub fn output_raw(&self, index: usize) -> Result<&[i8], Error>

Raw INT8 output data for the given output index.

Returns a slice pointing directly into the NPU’s zero-copy buffer. No allocation, no copying - this is as fast as it gets.

The data is in whatever layout the NPU uses (often NC1HWC2). Use nc1hwc2_to_flat to convert it to standard NCHW if needed.
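To make the layout concrete: NC1HWC2 stores channels in blocks of C2, so logical channel c lives in block c / C2 at lane c % C2, and the last block is padded when C is not a multiple of C2. The sketch below shows one way to reorder such data into NCHW under those assumptions. It is an illustration with a hypothetical name and signature, not the crate's nc1hwc2_to_flat, whose parameters may differ.

```rust
/// Reorders NC1HWC2 data into NCHW, dropping channel padding.
/// `c` is the real channel count and `c2` the channel block size.
/// Hypothetical sketch, not the crate's nc1hwc2_to_flat.
fn nc1hwc2_to_nchw_sketch(
    src: &[i8],
    n: usize,
    c: usize,
    h: usize,
    w: usize,
    c2: usize,
) -> Vec<i8> {
    assert!(c2 > 0, "channel block size must be non-zero");
    // Number of channel blocks, rounded up to cover all channels.
    let c1 = (c + c2 - 1) / c2;
    assert_eq!(src.len(), n * c1 * h * w * c2, "source length mismatch");

    let mut dst = vec![0i8; n * c * h * w];
    for ni in 0..n {
        for ci in 0..c {
            let (block, lane) = (ci / c2, ci % c2);
            for hi in 0..h {
                for wi in 0..w {
                    // Source index in [N, C1, H, W, C2] order.
                    let s = (((ni * c1 + block) * h + hi) * w + wi) * c2 + lane;
                    // Destination index in [N, C, H, W] order.
                    let d = ((ni * c + ci) * h + hi) * w + wi;
                    dst[d] = src[s];
                }
            }
        }
    }
    dst
}
```

With 3 channels, a 1x2 image, and C2 = 4, each pixel occupies four lanes in the source (three values plus one padding lane), and the result groups the values by channel instead.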

§Errors

Returns Error::InvalidIndex if index is out of range.

pub fn output_f32(&self, index: usize) -> Result<Vec<f32>, Error>

Dequantized f32 output for the given output index.

Converts each raw INT8 value to f32 using affine dequantization:

value = (raw_i8 - zero_point) * scale

Zero-point and scale are read from the tensor’s quantization parameters (set during model conversion).

Note: This allocates a new Vec<f32>. If you need to dequantize only part of the output (e.g. after NC1HWC2 conversion), use dequantize_affine directly.
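The affine formula above can be sketched as a small standalone function. This is an illustration only: the name and signature are hypothetical and are not taken from the crate's dequantize_affine, and the zero-point is assumed to be an i32. Because it works on any slice, it can also be applied to just part of an output.

```rust
/// Applies value = (raw_i8 - zero_point) * scale to every element.
/// Hypothetical sketch, not the crate's dequantize_affine.
fn dequantize_sketch(raw: &[i8], zero_point: i32, scale: f32) -> Vec<f32> {
    raw.iter()
        // Widen to i32 before subtracting so the difference cannot overflow i8.
        .map(|&q| (q as i32 - zero_point) as f32 * scale)
        .collect()
}
```

Passing a sub-slice such as &raw[..100] dequantizes only the first 100 values, which avoids converting data that will be discarded anyway.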