Crate rknn_runtime

Expand description

Rust bindings for RKNN NPU inference on Rockchip SoCs.

This crate allows you to load *.rknn model, run it on the NPU, and read back the results. It wraps the C library librknnmrt.so and handles all the low-level details: zero-copy memory allocation, NPU-to-CPU cache sync, and the unusual NC1HWC2 tensor layout.

§Usage

use rknn_runtime::RknnModel;

// Load a model file
let model = RknnModel::load("model.rknn").unwrap();

// Prepare input: raw RGB bytes in NHWC layout, no normalization needed.
// The byte length must match the model's expected input size.

// Run inference on the NPU
model.run(&rgb_data).unwrap();

// Read output as raw INT8 (zero-copy, no allocation)
let raw: &[i8] = model.output_raw(0).unwrap();

// ...or as dequantized f32 (allocates a new Vec)
let floats: Vec<f32> = model.output_f32(0).unwrap();

§NC1HWC2 output layout

RKNN models (especially on RV1106) often output tensors in NC1HWC2 format instead of standard NCHW. Channels are packed into blocks of c2 (typically 16). Use nc1hwc2_to_flat to convert this into a normal flat array before parsing:

let flat = nc1hwc2_to_flat(raw, c1, h, w, c2, total_channels);
let data = dequantize_affine(&flat, output.zp, output.scale);
// data[channel * num_predictions + prediction_index]

§Linking modes

dynamic (default) - loads librknnmrt.so at runtime via libloading. You can compile on x86 without having the RKNN library installed.
static-link - links at compile time. Requires librknnmrt.so to be present on the build machine.

Re-exports§

pub use error::Error;
pub use inference::RknnModel;
pub use tensor::TensorAttr;
pub use tensor::TensorFormat;
pub use tensor::TensorType;
pub use tensor::QuantType;
pub use tensor::dequantize_affine;
pub use tensor::nc1hwc2_to_flat;

Modules§

error: Error types for RKNN operations.
ffi: Raw C FFI types and function signatures (for librknnmrt.so obviously).
inference: High-level inference API.
tensor