ember-rs
ember-rs is a no_std embedded TinyML inference engine for INT8 models.
It is a fork and redesign of microflow-rs,
with a pluggable backend interface so optimized hardware kernels can be swapped in without
changing model-facing code.
The long-term goal is to be the Burn-style backend abstraction for embedded INT8 inference:
the model graph is generated at compile time, while operator execution is delegated to a
backend that implements ember_infer_core::KernelBackend.
Workspace
This repository currently contains three crates:
| Crate | Purpose |
|---|---|
| ember-infer-core | Core no_std API: KernelBackend, operator parameter structs, errors, and status type. |
| ember-infer-ref | Pure Rust reference backend. Implements all 7 operators (conv2d, depthwise_conv2d, fully_connected, avg_pool, max_pool, softmax, add) with correct INT8 fixed-point quantization arithmetic. Verified against sine.tflite, speech.tflite, and person_detect.tflite. |
| ember-infer-macros | Procedural macro crate that reads .tflite models and generates backend-dispatched inference wrappers. |
The ESP32-S3 backend is intentionally not part of this workspace. It lives in a separate
repository as ember-esp and implements the same KernelBackend trait using Espressif
esp-nn kernels.
Requirements
Use nightly Rust. The workspace includes rust-toolchain.toml, so normal cargo commands
should select nightly automatically in this directory.
Crates
For an application using the generated model wrapper with the reference backend:
[dependencies]
ember-infer-core = "0.1.0"
ember-infer-ref = "0.1.0"
ember-infer-macros = "0.1.0"
Rust imports use underscores, because Cargo package names with hyphens are exposed as crate names with underscores:
use ember_infer_macros::model;
use ember_infer_ref::RefBackend;
Usage: Model Inference
ember-rs is designed so application code references a TensorFlow Lite model at compile time, chooses a concrete backend, and then runs inference through static dispatch.
The intended high-level flow is:
- Put a quantized INT8 .tflite model in your project, for example models/sine.tflite.
- Annotate a model struct with ember-infer-macros.
- Create a backend value, such as RefBackend or an external EspBackend.
- Pass input and output buffers to the generated inference method.
The macro generates input_len(), output_len(), scratch_len::<B>(),
predict_quantized(...), and predict_quantized_with_scratch(...) for the annotated
struct:
use ember_infer_macros::model;
use ember_infer_ref::RefBackend;

#[model("models/sine.tflite")]
struct SineModel;
The important part is that the backend is a normal argument. Switching inference engines does not change the model wrapper:
use EspBackend;
let mut backend = EspBackend::new();
let input = ;
let mut output = ;
// Pick a fixed size appropriate for your model/backend, or derive it from
// `SineModel::scratch_len::<EspBackend>()` during bring-up.
const SCRATCH_LEN: usize = 4096;
let mut scratch = [0u8; SCRATCH_LEN];
predict_quantized_with_scratch?;
For backends that need scratch memory, query the required length for that backend:
let required = SineModel::scratch_len::<EspBackend>();
On embedded targets you usually turn that value into a fixed stack/static buffer according to your platform's memory policy.
The generated inference methods are generic over the backend:
input and output must match input_len() and output_len(). If either slice has the
wrong length, inference returns KernelError::InvalidShape.
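The length check and backend-generic call described here can be pictured with a small stand-alone sketch. Everything in it is a simplified local stand-in: the one-method `KernelBackend` trait, the `identity` method, `IdentityBackend`, `ToyModel`, and the argument order of `predict_quantized` are assumptions made for illustration, not the real ember-infer-core definitions or the macro's actual output.

```rust
/// Local stand-in for the error type; only the variant named in the text.
#[derive(Debug, PartialEq)]
enum KernelError {
    InvalidShape,
}

/// Local stand-in for the backend trait, reduced to one hypothetical method.
trait KernelBackend {
    fn identity(&mut self, input: &[i8], output: &mut [i8]) -> Result<(), KernelError>;
}

/// Hypothetical backend that copies its input to its output.
struct IdentityBackend;

impl KernelBackend for IdentityBackend {
    fn identity(&mut self, input: &[i8], output: &mut [i8]) -> Result<(), KernelError> {
        if input.len() != output.len() {
            return Err(KernelError::InvalidShape);
        }
        output.copy_from_slice(input);
        Ok(())
    }
}

/// Hypothetical model wrapper with fixed tensor lengths.
struct ToyModel;

impl ToyModel {
    const fn input_len() -> usize {
        4
    }

    const fn output_len() -> usize {
        4
    }

    /// Generic over the backend, so the call is resolved by static dispatch.
    fn predict_quantized<B: KernelBackend>(
        backend: &mut B,
        input: &[i8],
        output: &mut [i8],
    ) -> Result<(), KernelError> {
        // Reject slices whose lengths do not match the model's tensors.
        if input.len() != Self::input_len() || output.len() != Self::output_len() {
            return Err(KernelError::InvalidShape);
        }
        backend.identity(input, output)
    }
}
```

Because the backend is a type parameter, swapping `IdentityBackend` for another implementor changes no code inside `ToyModel`.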
The current generated API is quantized-only. Feed INT8 input tensors and read INT8 output
tensors. Floating-point convenience helpers can be added above this API by quantizing into
an INT8 input buffer before calling predict_quantized.
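A floating-point helper of this kind might look like the sketch below. It assumes the usual TFLite affine quantization scheme, real = scale * (q - zero_point). The function names are hypothetical and not part of the ember-rs API, and in practice `scale` and `zero_point` would come from the model's tensor metadata.

```rust
/// Quantize one f32 value to INT8 (hypothetical helper, affine scheme assumed).
/// Uses std's `f32::round`; a no_std build would need a math crate such as libm.
fn quantize(value: f32, scale: f32, zero_point: i32) -> i8 {
    // The float-to-int cast saturates, and saturating_add avoids overflow.
    let q = ((value / scale).round() as i32).saturating_add(zero_point);
    q.clamp(i8::MIN as i32, i8::MAX as i32) as i8
}

/// Map an INT8 value back to f32 under the same scheme.
fn dequantize(q: i8, scale: f32, zero_point: i32) -> f32 {
    scale * (q as i32 - zero_point) as f32
}

/// Quantize a whole input buffer; returns false if the lengths differ.
fn quantize_slice(values: &[f32], scale: f32, zero_point: i32, out: &mut [i8]) -> bool {
    if values.len() != out.len() {
        return false;
    }
    for (dst, &v) in out.iter_mut().zip(values) {
        *dst = quantize(v, scale, zero_point);
    }
    true
}
```

The resulting `i8` buffer is what would be passed to predict_quantized, and `dequantize` would be applied to the INT8 output with the output tensor's own scale and zero point.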
Generated Operator Calls
ember-infer-macros turns each supported TFLite operator into a KernelBackend call. In other
words, generated model code is equivalent to this low-level pattern:
use ;
ember-infer-ref provides a complete pure-Rust INT8 reference implementation. Use it for
host-side testing, CI, and as the baseline when bringing up a new hardware backend:
use RefBackend;
let mut backend = RefBackend;
predict_quantized?;
Custom Backends
To add a backend, implement the ember_infer_core::KernelBackend trait for your backend type.
The trait is the only required backend contract.
use ;
;
The required invoke methods are:
| Method | Operator |
|---|---|
| conv2d | CONV_2D |
| depthwise_conv2d | DEPTHWISE_CONV_2D |
| fully_connected | FULLY_CONNECTED |
| avg_pool | AVERAGE_POOL_2D |
| max_pool | MAX_POOL_2D |
| softmax | SOFTMAX |
| add | ADD |
Backends that need temporary memory should also override the scratch-size associated functions:
These functions default to 0, which is appropriate for backends that do not need scratch
memory. Optimized kernels such as esp-nn or CMSIS-NN-style implementations should return
the exact number of bytes needed by the corresponding operator.
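The default-to-zero pattern can be illustrated with a stand-alone sketch. The trait name `ScratchSizes`, the function names, the `Conv2dShape` struct, and both backends are hypothetical. The real associated functions live on KernelBackend and may take different parameters. Taking the maximum over operators is also an assumption, based on the idea that operators run one after another and can reuse the same buffer.

```rust
/// Hypothetical, reduced description of one convolution.
struct Conv2dShape {
    kernel_h: usize,
    kernel_w: usize,
    in_channels: usize,
}

/// Stand-in for the scratch-size part of a backend trait.
trait ScratchSizes {
    /// Bytes of scratch needed for one conv2d call; 0 unless overridden.
    fn conv2d_scratch_len(_shape: &Conv2dShape) -> usize {
        0
    }
}

/// Backend without temporary buffers: keeps the default of 0.
struct PlainBackend;
impl ScratchSizes for PlainBackend {}

/// Backend that stages one kernel-sized INT8 patch (1 byte per element).
struct PatchBackend;
impl ScratchSizes for PatchBackend {
    fn conv2d_scratch_len(shape: &Conv2dShape) -> usize {
        shape.kernel_h * shape.kernel_w * shape.in_channels
    }
}

/// Model-level requirement: the largest per-operator requirement.
fn model_scratch_len<B: ScratchSizes>(convs: &[Conv2dShape]) -> usize {
    convs
        .iter()
        .map(|shape| B::conv2d_scratch_len(shape))
        .max()
        .unwrap_or(0)
}
```

With this shape, a backend that needs no scratch memory writes no extra code, while an optimized backend overrides only the operators that need a buffer.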
Backend Semantics
Parameter structs in ember-infer-core intentionally mirror TFLite Micro naming and layout
semantics. Tensor data is INT8 and operator tensors use the same layouts expected by the
trait documentation:
| Parameter type | Layout |
|---|---|
| Conv2dParams input/output | NHWC |
| Conv2dParams weights | [C_out, KH, KW, C_in] |
| DepthwiseConv2dParams input/output | NHWC |
| FullyConnectedParams weights | [output_depth, input_depth] |
| SoftmaxParams input | [batch, num_classes] |
The trait covers the invoke phase only. Shape inference, tensor allocation, and scratch
array sizing are intended to be handled at compile time by ember-infer-macros.
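To make the layouts in the table concrete, the sketch below computes flat buffer offsets for each of them. It assumes contiguous, row-major storage with no padding, which is the usual TFLite convention but is not stated explicitly here. The helper names are hypothetical and do not exist in ember-infer-core.

```rust
/// Flat offset of element (n, h, w, c) in an NHWC tensor with dims [N, H, W, C].
fn nhwc_index(dims: [usize; 4], n: usize, h: usize, w: usize, c: usize) -> usize {
    let [_, height, width, channels] = dims;
    ((n * height + h) * width + w) * channels + c
}

/// Flat offset into conv2d weights laid out as [C_out, KH, KW, C_in].
fn conv_weight_index(dims: [usize; 4], co: usize, kh: usize, kw: usize, ci: usize) -> usize {
    let [_, k_h, k_w, c_in] = dims;
    ((co * k_h + kh) * k_w + kw) * c_in + ci
}

/// Flat offset into fully-connected weights laid out as [output_depth, input_depth].
fn fc_weight_index(input_depth: usize, out: usize, inp: usize) -> usize {
    out * input_depth + inp
}
```

In every case the last dimension varies fastest, so for NHWC all channels of one pixel sit next to each other in memory.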
Development
Useful checks:
Lineage
ember-rs is based on microflow-rs, originally developed by Matteo Carnelos as part of
his master's thesis project at the University of Padova in collaboration with Grepit AB.
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.