# ARA-2 Client Library
Rust client library for the Kinara ARA-2 neural network accelerator. Provides session management, model loading, and inference on NXP i.MX platforms equipped with ARA-2 PCIe hardware.
## Supported Platforms
| Platform | SoC | Status |
|---|---|---|
| NXP FRDM i.MX 8M Plus | i.MX 8M Plus | Tested |
| NXP FRDM i.MX 95 | i.MX 95 | Tested |
Requires EdgeFirst Yocto Images with ARA-2 SDK support.
## Workspace

| Crate | Description |
|---|---|
| `ara2` | Core client library — session, endpoint, model, and DVM metadata APIs |
| `ara2-sys` | FFI bindings to `libaraclient.so` via `libloading` |
## Integration with edgefirst-hal

The `ara2` crate integrates with `edgefirst-hal` (enabled by default via the `hal` feature) for:
- Tensor memory management — DMA-backed tensors for zero-copy NPU transfers
- Image preprocessing — Hardware-accelerated format conversion and scaling
- Post-processing — YOLO decoding, overlay rendering, segmentation masks
Disable the hal feature for a minimal FFI-only build:
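For example, in `Cargo.toml` (the version number shown is illustrative):

```toml
[dependencies]
ara2 = { version = "0.1", default-features = false }
```

Equivalently, pass `--no-default-features` to `cargo build` when building the crate directly.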
## Python Bindings

Python bindings are available as a separate package on PyPI.
See crates/ara2-py/README.md for the Python API reference.
## Quick Start

```rust
use ara2::Session;
// (Import path, socket/file paths, and argument lists below are illustrative
// reconstructions; see the crate documentation for exact signatures.)

// Connect to the ARA-2 proxy service
let session = Session::create_via_unix_socket("/run/ara2-proxy.sock")?;

// Enumerate NPU endpoints and check status
let endpoints = session.list_endpoints()?;
let endpoint = &endpoints[0];
println!("endpoint status: {endpoint:?}");

// Load a compiled model (.dvm) and allocate DMA tensors
let mut model = endpoint.load_model_from_file("model.dvm")?;
model.allocate_tensors()?;

// Run inference
let timing = model.run()?;
println!("inference timing: {timing:?}");
# Ok::<(), Box<dyn std::error::Error>>(())
```
## Runtime Requirements
The following must be present on the target system:
- `libaraclient.so.1` — Kinara client library (from the ARA-2 SDK)
- `ara2-proxy` — system service providing NPU access; must be running
- ARA-2 hardware — PCIe accelerator card visible via `lspci`
## Building
### Native
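A standard Cargo invocation (nothing project-specific assumed):

```shell
cargo build --release
```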
### Cross-compile for aarch64 (NXP i.MX)
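A sketch using the standard glibc target triple; an EdgeFirst Yocto SDK may instead supply its own toolchain and linker configuration:

```shell
rustup target add aarch64-unknown-linux-gnu
cargo build --release --target aarch64-unknown-linux-gnu
```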
## Performance

Benchmarked on NXP i.MX 8M Plus + ARA-2 with YOLOv8n (640x640). The Python API adds minimal overhead over native Rust thanks to DMA-BUF zero-copy tensor sharing: the GPU and NPU operate on the same physical buffers, with no CPU copies in the data path.
| Stage | Rust | Python | Overhead |
|---|---|---|---|
| GPU preprocess (RGBA → CHW) | 6.35 ms | 6.37 ms | +0.02 ms |
| NPU inference (wall clock) | 8.95 ms | 9.13 ms | +0.18 ms |
| NPU execution | 3.33 ms | 3.33 ms | — |
| DMA input upload | 2.21 ms | 2.20 ms | — |
| DMA output download | 1.96 ms | 1.96 ms | — |
| Postprocess (decode + NMS) | 1.41 ms | 2.53 ms | +1.12 ms |
| Total pipeline | 16.71 ms | 18.03 ms | +1.32 ms |
| Throughput | 59.9 FPS | 55.5 FPS | −4.4 FPS |
Steady-state mean over 20 iterations after warmup. The Python overhead is entirely in postprocessing (numpy array marshalling); GPU preprocessing and NPU inference are identical since both use the same DMA-BUF tensors.
## Examples

| Example | Description |
|---|---|
| `yolov8.rs` | Rust — YOLOv8 detection/segmentation with HAL pre/post-processing |
| `yolov8.py` | Python — YOLOv8 detection with DMA-BUF pipeline and HAL decoder |
| `endpoints.py` | Python — Connect, list endpoints, check status |
| `test_dvm_metadata.rs` | Rust — Read and display DVM model metadata |
Run examples (invocations and paths are illustrative):

```sh
# Rust
cargo run --release --example yolov8

# Python
python examples/yolov8.py
```
## Testing

Tests require an NXP i.MX + ARA-2 system with the proxy running (test filters shown are illustrative):

```sh
# All tests (on-target with hardware)
cargo test

# Metadata tests only (no hardware needed)
cargo test metadata

# Model tests (needs a .dvm file)
ARA2_TEST_MODEL=/path/to/model.dvm cargo test model
```
## Documentation
- ARCHITECTURE.md — System architecture and ownership model
- CONTRIBUTING.md — Contribution guidelines
- SECURITY.md — Security policy
- CHANGELOG.md — Release history
## License
Licensed under the Apache License 2.0. See LICENSE for details.
Copyright 2025 Au-Zone Technologies. All Rights Reserved.