ember-infer-ref 0.1.0

Pure Rust reference backend for ember-rs embedded INT8 inference engine

ember-rs

ember-rs is a no_std embedded TinyML inference engine for INT8 models. It is a fork and redesign of microflow-rs, with a pluggable backend interface so optimized hardware kernels can be swapped in without changing model-facing code.

The long-term goal is to be the Burn-style backend abstraction for embedded INT8 inference: the model graph is generated at compile time, while operator execution is delegated to a backend that implements ember_infer_core::KernelBackend.

Workspace

This repository currently contains three crates:

| Crate | Purpose |
| --- | --- |
| ember-infer-core | Core no_std API: KernelBackend, operator parameter structs, errors, and status type. |
| ember-infer-ref | Pure Rust reference backend. Implements all 7 operators (conv2d, depthwise_conv2d, fully_connected, avg_pool, max_pool, softmax, add) with correct INT8 fixed-point quantization arithmetic. Verified against sine.tflite, speech.tflite, and person_detect.tflite. |
| ember-infer-macros | Procedural macro crate that reads .tflite models and generates backend-dispatched inference wrappers. |

The ESP32-S3 backend is intentionally not part of this workspace. It lives in a separate repository as ember-esp and implements the same KernelBackend trait using Espressif esp-nn kernels.

Requirements

Use nightly Rust:

cargo +nightly check --workspace

The workspace includes rust-toolchain.toml, so normal cargo commands should select nightly automatically in this directory.

Usage: Model Inference

ember-rs is designed so application code references a TensorFlow Lite model at compile time, chooses a concrete backend, and then runs inference through static dispatch.

The intended high-level flow is:

  1. Put a quantized INT8 .tflite model in your project, for example models/sine.tflite.
  2. Annotate a model struct with ember-infer-macros.
  3. Create a backend value, such as RefBackend or an external EspBackend.
  4. Pass input and output buffers to the generated inference method.

The macro generates input_len(), output_len(), scratch_len::<B>(), predict_quantized(...), and predict_quantized_with_scratch(...) for the annotated struct:

use ember_infer_macros::model;
use ember_infer_ref::RefBackend;

#[model("models/sine.tflite")]
pub struct SineModel;

fn main() -> Result<(), ember_infer_core::KernelError> {
    let mut backend = RefBackend;

    let input = [0i8; SineModel::input_len()];
    let mut output = [0i8; SineModel::output_len()];

    SineModel::predict_quantized(&mut backend, &input, &mut output)?;

    Ok(())
}

The important part is that the backend is a normal argument. Switching inference engines does not change the model wrapper:

use ember_esp::EspBackend;

let mut backend = EspBackend::new();
let input = [0i8; SineModel::input_len()];
let mut output = [0i8; SineModel::output_len()];

// Pick a fixed size appropriate for your model/backend, or derive it from
// `SineModel::scratch_len::<EspBackend>()` during bring-up.
const SCRATCH_LEN: usize = 4096;
let mut scratch = [0u8; SCRATCH_LEN];

SineModel::predict_quantized_with_scratch(&mut backend, &input, &mut output, &mut scratch)?;

For backends that need scratch memory, query the required length for that backend:

let required = SineModel::scratch_len::<EspBackend>();

On embedded targets you usually turn that value into a fixed stack/static buffer according to your platform's memory policy.
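
One way to apply such a policy is sketched below. The SCRATCH_LEN constant, the ScratchTooSmall type, and the checked_scratch helper are hypothetical names introduced for illustration; none of them is part of ember-rs. The `required` argument stands in for the value returned by SineModel::scratch_len::<EspBackend>(). The idea is to reserve a fixed buffer and verify once, at start-up, that it is large enough.

```rust
/// Hypothetical fixed budget chosen by the application (not an ember-rs constant).
pub const SCRATCH_LEN: usize = 4096;

/// Hypothetical error: the fixed buffer cannot hold what the backend asks for.
#[derive(Debug, PartialEq, Eq)]
pub struct ScratchTooSmall {
    pub required: usize,
    pub available: usize,
}

/// Returns the leading `required` bytes of `buf`, or an error if `buf` is too short.
/// `required` stands in for the value of `SineModel::scratch_len::<B>()`.
pub fn checked_scratch(buf: &mut [u8], required: usize) -> Result<&mut [u8], ScratchTooSmall> {
    let available = buf.len();
    buf.get_mut(..required)
        .ok_or(ScratchTooSmall { required, available })
}
```

The returned slice can then be passed as the scratch argument of predict_quantized_with_scratch. Failing at start-up rather than mid-inference makes an undersized budget easy to spot during bring-up.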

The generated inference methods are generic over the backend:

pub fn predict_quantized<B: ember_infer_core::KernelBackend>(
    backend: &mut B,
    input: &[i8],
    output: &mut [i8],
) -> ember_infer_core::Status

pub fn predict_quantized_with_scratch<B: ember_infer_core::KernelBackend>(
    backend: &mut B,
    input: &[i8],
    output: &mut [i8],
    scratch: &mut [u8],
) -> ember_infer_core::Status

The input and output slices must have exactly input_len() and output_len() elements, respectively. If either slice has the wrong length, inference returns KernelError::InvalidShape.
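
Callers that prefer to fail before touching a backend can apply the same length rule themselves. The sketch below uses stand-in types, not the real ember_infer_core definitions: it assumes Status is an alias for Result<(), KernelError>, models only the InvalidShape variant, and uses hypothetical INPUT_LEN and OUTPUT_LEN constants in place of the generated input_len() and output_len().

```rust
/// Stand-in for `ember_infer_core::KernelError`; only the variant named above is modelled.
#[derive(Debug, PartialEq, Eq)]
pub enum KernelError {
    InvalidShape,
}

/// Stand-in for `ember_infer_core::Status` (assumed to be a `Result` alias).
pub type Status = Result<(), KernelError>;

/// Hypothetical lengths for a model with one input value and one output value.
pub const INPUT_LEN: usize = 1;
pub const OUTPUT_LEN: usize = 1;

/// Rejects mismatched buffers before any operator would run.
pub fn check_lengths(input: &[i8], output: &[i8]) -> Status {
    if input.len() != INPUT_LEN || output.len() != OUTPUT_LEN {
        return Err(KernelError::InvalidShape);
    }
    Ok(())
}
```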

The current generated API is quantized-only. Feed INT8 input tensors and read INT8 output tensors. Floating-point convenience helpers can be added above this API by quantizing into an INT8 input buffer before calling predict_quantized.
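
Such a helper could look like the sketch below, which uses the standard affine INT8 scheme q = round(x / scale) + zero_point. The function names are hypothetical and not part of ember-rs, and the scale and zero_point values are assumed to come from the model's tensor metadata; how to obtain them is not shown here.

```rust
/// Quantizes `x` to INT8: q = round(x / scale) + zero_point, saturated to the i8 range.
pub fn quantize(x: f32, scale: f32, zero_point: i32) -> i8 {
    // The float-to-int cast saturates, and saturating_add avoids overflow
    // when the zero point is added to an extreme value.
    let q = ((x / scale).round() as i32).saturating_add(zero_point);
    q.clamp(i8::MIN as i32, i8::MAX as i32) as i8
}

/// Maps an INT8 value back to a float: x = (q - zero_point) * scale.
pub fn dequantize(q: i8, scale: f32, zero_point: i32) -> f32 {
    (q as i32 - zero_point) as f32 * scale
}

/// Quantizes `src` element-wise into `dst`; extra elements on either side are ignored.
pub fn quantize_into(src: &[f32], dst: &mut [i8], scale: f32, zero_point: i32) {
    for (d, &s) in dst.iter_mut().zip(src) {
        *d = quantize(s, scale, zero_point);
    }
}
```

This version relies on f32::round from the standard library, so it suits host-side code as written. A no_std target would need a substitute rounding routine, for example from a math crate.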

Generated Operator Calls

ember-infer-macros turns each supported TFLite operator into a KernelBackend call. In other words, generated model code is equivalent to this low-level pattern:

use ember_infer_core::{
    Conv2dParams, ElementwiseAddParams, FullyConnectedParams, KernelBackend, PoolParams,
    SoftmaxParams, Status,
};

fn run_model<B: KernelBackend>(backend: &mut B) -> Status {
    // The macro emits concrete parameter structs using model metadata,
    // embedded weights, input/output slices, and intermediate buffers.
    // backend.conv2d(Conv2dParams { ... })?;
    // backend.fully_connected(FullyConnectedParams { ... })?;
    // backend.softmax(SoftmaxParams { ... })?;

    let _ = (
        core::mem::size_of::<Conv2dParams<'_>>(),
        core::mem::size_of::<FullyConnectedParams<'_>>(),
        core::mem::size_of::<PoolParams<'_>>(),
        core::mem::size_of::<SoftmaxParams<'_>>(),
        core::mem::size_of::<ElementwiseAddParams<'_>>(),
    );

    Ok(())
}

ember-infer-ref provides a complete pure-Rust INT8 reference implementation. Use it for host-side testing, CI, and as the baseline when bringing up a new hardware backend:

use ember_infer_ref::RefBackend;

let mut backend = RefBackend;
SineModel::predict_quantized(&mut backend, &input, &mut output)?;
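
When using the reference backend as a baseline, one simple check is to run the same input through both backends and compare the INT8 outputs element by element. The helper below is a hypothetical sketch, not an ember-rs API.

```rust
/// Largest absolute element-wise difference between two INT8 output tensors.
/// Returns `None` if the slices differ in length.
pub fn max_abs_diff(reference: &[i8], candidate: &[i8]) -> Option<u8> {
    if reference.len() != candidate.len() {
        return None;
    }
    let worst = reference
        .iter()
        .zip(candidate)
        .map(|(&r, &c)| r.abs_diff(c)) // i8::abs_diff yields u8, so it cannot overflow
        .max()
        .unwrap_or(0); // two empty tensors are trivially identical
    Some(worst)
}
```

Whether the comparison should demand a difference of 0 or tolerate a small deviation depends on how the optimized kernels round. That threshold is a project decision rather than something this README specifies.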

Custom Backends

To add a backend, implement the ember_infer_core::KernelBackend trait for your backend type. The trait is the only required backend contract.

use ember_infer_core::{
    Conv2dParams, DepthwiseConv2dParams, ElementwiseAddParams, FullyConnectedParams,
    KernelBackend, KernelError, PoolParams, SoftmaxParams, Status,
};

pub struct MyBackend;

impl KernelBackend for MyBackend {
    fn conv2d(&mut self, params: Conv2dParams<'_>) -> Status {
        let _ = params;
        Err(KernelError::InternalError)
    }

    fn depthwise_conv2d(&mut self, params: DepthwiseConv2dParams<'_>) -> Status {
        let _ = params;
        Err(KernelError::InternalError)
    }

    fn fully_connected(&mut self, params: FullyConnectedParams<'_>) -> Status {
        let _ = params;
        Err(KernelError::InternalError)
    }

    fn avg_pool(&mut self, params: PoolParams<'_>) -> Status {
        let _ = params;
        Err(KernelError::InternalError)
    }

    fn max_pool(&mut self, params: PoolParams<'_>) -> Status {
        let _ = params;
        Err(KernelError::InternalError)
    }

    fn softmax(&mut self, params: SoftmaxParams<'_>) -> Status {
        let _ = params;
        Err(KernelError::InternalError)
    }

    fn add(&mut self, params: ElementwiseAddParams<'_>) -> Status {
        let _ = params;
        Err(KernelError::InternalError)
    }
}

The required invoke methods are:

| Method | Operator |
| --- | --- |
| conv2d | CONV_2D |
| depthwise_conv2d | DEPTHWISE_CONV_2D |
| fully_connected | FULLY_CONNECTED |
| avg_pool | AVERAGE_POOL_2D |
| max_pool | MAX_POOL_2D |
| softmax | SOFTMAX |
| add | ADD |

Backends that need temporary memory should also override the scratch-size associated functions:

impl KernelBackend for MyBackend {
    // required invoke methods omitted

    fn conv2d_scratch_size(
        input_shape: [usize; 4],
        weights_shape: [usize; 4],
        output_shape: [usize; 4],
    ) -> usize {
        let _ = (input_shape, weights_shape, output_shape);
        0
    }

    fn depthwise_conv2d_scratch_size(
        input_shape: [usize; 4],
        weights_shape: [usize; 4],
        output_shape: [usize; 4],
    ) -> usize {
        let _ = (input_shape, weights_shape, output_shape);
        0
    }

    fn softmax_scratch_size(num_classes: usize) -> usize {
        let _ = num_classes;
        0
    }
}

These functions default to 0, which is appropriate for backends that do not need scratch memory. Optimized kernels such as esp-nn or CMSIS-NN-style implementations should return the exact number of bytes needed by the corresponding operator.
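
As an illustration of a non-zero result, the sketch below sizes scratch for one im2col-style patch of INT8 values, that is KH * KW * C_in bytes. This sizing rule is purely hypothetical: real kernels such as esp-nn define their own requirements, and a backend should report those instead. The free function mirrors the shape of the conv2d_scratch_size associated function and assumes the [C_out, KH, KW, C_in] weights layout.

```rust
/// Hypothetical sizing rule: room for one im2col patch of INT8 values,
/// i.e. KH * KW * C_in bytes. Not the requirement of any real kernel.
pub fn conv2d_scratch_size(
    input_shape: [usize; 4],
    weights_shape: [usize; 4],
    output_shape: [usize; 4],
) -> usize {
    // This particular rule depends only on the kernel window and input channels.
    let _ = (input_shape, output_shape);
    let [_c_out, kh, kw, c_in] = weights_shape;
    kh * kw * c_in
}
```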

Backend Semantics

Parameter structs in ember-infer-core intentionally mirror TFLite Micro naming and layout semantics. Tensor data is INT8, and operator tensors use the layouts described in the trait documentation:

| Parameter type | Layout |
| --- | --- |
| Conv2dParams input/output | NHWC |
| Conv2dParams weights | [C_out, KH, KW, C_in] |
| DepthwiseConv2dParams input/output | NHWC |
| FullyConnectedParams weights | [output_depth, input_depth] |
| SoftmaxParams input | [batch, num_classes] |
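
For backend authors, the layouts in the table translate into flat-buffer offsets as sketched below. The helper names are hypothetical, and the sketch assumes contiguous row-major storage with the last dimension varying fastest, as is conventional for TFLite tensors.

```rust
/// Flat offset of element (n, h, w, c) in an NHWC tensor with shape [N, H, W, C].
pub fn nhwc_offset(shape: [usize; 4], n: usize, h: usize, w: usize, c: usize) -> usize {
    let [_batch, height, width, channels] = shape;
    ((n * height + h) * width + w) * channels + c
}

/// Flat offset of element (co, kh, kw, ci) in conv2d weights [C_out, KH, KW, C_in].
pub fn conv2d_weight_offset(
    shape: [usize; 4],
    co: usize,
    kh: usize,
    kw: usize,
    ci: usize,
) -> usize {
    let [_c_out, k_h, k_w, c_in] = shape;
    ((co * k_h + kh) * k_w + kw) * c_in + ci
}

/// Flat offset of element (out, inp) in fully connected weights [output_depth, input_depth].
pub fn fc_weight_offset(shape: [usize; 2], out: usize, inp: usize) -> usize {
    let [_output_depth, input_depth] = shape;
    out * input_depth + inp
}
```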

The trait covers the invoke phase only. Shape inference, tensor allocation, and scratch array sizing are intended to be handled at compile time by ember-infer-macros.

Development

Useful checks:

cargo +nightly fmt --all
cargo +nightly check --workspace
cargo +nightly clippy --workspace -- -D warnings
cargo +nightly doc --workspace --no-deps

Lineage

ember-rs is based on microflow-rs, originally developed by Matteo Carnelos as part of his master's thesis project at the University of Padova in collaboration with Grepit AB.

License

Licensed under either of:

at your option.