rnn
Quick links: Overview · Architecture · Format · FFI · Compatibility · Production
rnn is a low-level Rust neural-network core built around explicit memory control, binary model formats, and FFI interoperability.
It is designed for native/embedded-style workflows where you want to control:
- how model bytes are created,
- how buffers are allocated,
- how inference is executed,
- and how the same core is reused across Rust and non-Rust runtimes.
Table of Contents
- What this project does
- Why the generated neural network exists
- Schema of the generated neural network
- Real drawing of the generated network (actual sample values)
- Conceptual schema of networks built by the library
- Conceptual schema of model construction
- How this network is created (exact pipeline)
- Binary dense format (`RMD1`) details
- RMD1 binary layout (concise spec)
- Runtime parser path (`RNN\0`) and format split
- Core inference execution model
- Complete module reference
- Public API groups (selected)
- Compatibility matrix
- Build and artifacts
- Generate and validate a sample `.rnn`
- FFI integration lifecycle
- Performance notes
- Security and safety notes
- Versioning and stability policy
- Project validation scripts
- Testing status
- FAQ
- Production checklist
- Contributing
- License
What this project does
This project provides end-to-end building blocks to:
- Define dense network topology and layer specs
- Validate parameter counts and index ranges
- Serialize models into compact binary payloads
- Deserialize/validate payloads safely
- Run deterministic inference with caller-provided scratch buffers
- Expose the same runtime through a C ABI
In addition to dense flow, the crate includes modules for attention, KV cache, RoPE, MoE routing, quantization, sampling, beam search, convolutions, normalization, and profiling/runtime estimation.
Why the generated neural network exists
The sample generator in examples/generate_sample_model.rs exists to provide a deterministic, minimal artifact used for:
- format validation,
- API smoke checks,
- FFI integration checks,
- cross-language consistency checks.
It generates a tiny dense model with:
- topology: `[2, 1]`
- weights: `[2.0, -1.0]`
- bias: `[0.5]`
- activation: `Identity`
So the output is:
$$ y = 2.0 \cdot x_0 - 1.0 \cdot x_1 + 0.5 $$
This tiny model is intentionally simple so behavior is easy to verify in every language binding.
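As a reference point for those cross-language checks, the closed form can be written directly (a minimal sketch; the function name is illustrative):

```rust
// Closed form of the generated sample model; handy as a
// cross-language reference value. The function name is illustrative.
fn sample_model(x0: f32, x1: f32) -> f32 {
    2.0 * x0 - 1.0 * x1 + 0.5
}
```

For example, inputs `(1.0, 2.0)` should produce `0.5` in every binding.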
Schema of the generated neural network
```mermaid
flowchart LR
    X0((x0)) --> N[Dense neuron]
    X1((x1)) --> N
    B((bias=0.5)) --> N
    N --> Y((y))
```
Parameter mapping for this sample:
- `w0 = 2.0` applied to `x0`
- `w1 = -1.0` applied to `x1`
- `b = 0.5`
- activation = `Identity`
Real drawing of the generated network (actual sample values)
This is the exact neuron-level network generated by examples/generate_sample_model.rs:
```mermaid
graph LR
    x0((Input x0)) -- "w0 = +2.0" --> n1((Neuron n1))
    x1((Input x1)) -- "w1 = -1.0" --> n1
    b((Bias +0.5)) --> n1
    n1 -- "Identity" --> y((Output y))
```
Operationally, the neuron computes:
$$ z = (2.0 \cdot x_0) + (-1.0 \cdot x_1) + 0.5 $$
Because the output activation is Identity, the final output is:
$$ y = z $$
So for this generated sample model:
$$ y = 2.0 \cdot x_0 - x_1 + 0.5 $$
Conceptual schema of networks built by the library
Beyond the tiny sample model, the core dense path implemented by this crate is conceptually a feed-forward stack of dense layers:
```mermaid
flowchart LR
    I[Input vector x] --> L1[Dense Layer 1\nW1 x + b1\nActivation a1]
    L1 --> L2[Dense Layer 2\nW2 h1 + b2\nActivation a2]
    L2 --> L3[... Optional hidden layers ...]
    L3 --> O[Output layer\nWn h(n-1) + bn\nOutput activation]
```
Each dense layer is represented internally with:
- `input_size`
- `output_size`
- `weight_offset`
- `bias_offset`
- `activation`

Those descriptors are chained and validated before execution (`LayerPlan::validate`).
Conceptual schema of model construction
The dense model creation flow is explicit and deterministic:
```mermaid
flowchart TD
    T[Topology\nexample: 2 -> 1 or 8 -> 16 -> 4] --> S[Build dense layer specs\ninput/output sizes + offsets + activations]
    P[Weights + Biases] --> S
    S --> V[Range/count validation\nweights_len and biases_len checks]
    V --> E[Encode binary model\nRMD1 header + layer metadata + tensors]
    E --> F[.rnn file payload]
    F --> D[Decode + validate at runtime]
    D --> R[Run inference with explicit scratch buffers]
```
Why this design:
- predictable memory behavior (no hidden runtime allocations in core path),
- strict structural checks before compute,
- straightforward interop with FFI consumers.
How this network is created (exact pipeline)
Step 1: Topology and parameters
- `topology = [2, 1]`
- user-provided `weights`, `biases`
Step 2: Build layer specs
`build_dense_specs_from_layers` computes for each layer:
- `input_size`, `output_size`
- `weight_offset`, `bias_offset`
- activation choice (hidden vs output)
It also validates consistency with total weights/biases.
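The offset bookkeeping in this step can be sketched as follows (a simplified stand-in for `build_dense_specs_from_layers`; the struct and function names here are illustrative):

```rust
// Illustrative stand-in for the crate's layer-spec builder:
// derive per-layer sizes and parameter offsets from a topology.
struct LayerSpec {
    input_size: usize,
    output_size: usize,
    weight_offset: usize, // index into the packed weights tensor
    bias_offset: usize,   // index into the packed biases tensor
}

fn specs_from_topology(topology: &[usize]) -> Vec<LayerSpec> {
    let mut weight_offset = 0;
    let mut bias_offset = 0;
    let mut specs = Vec::new();
    for w in topology.windows(2) {
        let (input_size, output_size) = (w[0], w[1]);
        specs.push(LayerSpec { input_size, output_size, weight_offset, bias_offset });
        weight_offset += input_size * output_size;
        bias_offset += output_size;
    }
    specs
}
```

After building, the final `weight_offset`/`bias_offset` totals can be checked against the provided `weights`/`biases` lengths, which is the consistency validation described above.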
Step 3: Encode binary payload
`encode_dense_model_v1` writes:
- magic/version/header
- layer metadata
- packed weights
- packed biases
Step 4: Persist bytes
The example writes the result as a .rnn file.
Step 5: Runtime consumption
At inference time:
- `rnn_required_dense_from_bytes_v1` inspects required counts
- `decode_dense_model_v1` reconstructs layer specs/parameters
- `forward_dense_plan` executes with caller scratch buffers
Binary dense format (RMD1) details
Dense format helpers are in src/model_format and src/rnn_api.
Key characteristics:
- Magic: `RMD1`
- Versioned header
- Layer metadata contains input/output sizes, offsets, activation id
- All critical ranges are validated before use
- Decode fails on truncation, bad version/magic, invalid offsets, or capacity mismatch
This gives a strict producer/consumer contract for dense models.
RMD1 binary layout (concise spec)
Dense RMD1 payload layout used by model_format:
- Header (20 bytes total):
  - `magic` (4 bytes): `RMD1`
  - `version` (u16)
  - `flags` (u16, currently reserved)
  - `layer_count` (u32)
  - `weights_len` (u32)
  - `biases_len` (u32)
- Layer metadata array (`layer_count` entries, 20 bytes each):
  - `input_size` (u32)
  - `output_size` (u32)
  - `weight_offset` (u32)
  - `bias_offset` (u32)
  - `activation` (u8)
  - `reserved` (3 bytes)
- Weights payload (`weights_len * 4` bytes, f32 little-endian)
- Biases payload (`biases_len * 4` bytes, f32 little-endian)
Validation guarantees include:
- non-zero dimensions,
- checked offset arithmetic,
- bounds checks against tensor payload lengths,
- truncation/version/magic checks at decode.
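A minimal header-parsing sketch matching the layout above (illustrative only; the real decoder is `decode_dense_model_v1` in `model_format`, and little-endian header integers are an assumption consistent with the stated little-endian tensor payloads):

```rust
// Sketch of parsing the 20-byte RMD1 header described above.
// Assumes little-endian header fields; not the crate's real decoder.
fn parse_rmd1_header(bytes: &[u8]) -> Result<(u16, u16, u32, u32, u32), &'static str> {
    if bytes.len() < 20 {
        return Err("truncated header");
    }
    if &bytes[0..4] != b"RMD1" {
        return Err("bad magic");
    }
    let u16le = |b: &[u8]| u16::from_le_bytes([b[0], b[1]]);
    let u32le = |b: &[u8]| u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
    let version = u16le(&bytes[4..6]);
    let flags = u16le(&bytes[6..8]);
    let layer_count = u32le(&bytes[8..12]);
    let weights_len = u32le(&bytes[12..16]);
    let biases_len = u32le(&bytes[16..20]);
    Ok((version, flags, layer_count, weights_len, biases_len))
}
```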
Runtime parser path (RNN\0) and format split
The repository also contains parser utilities in src/rnn_format with RNN\0 magic.
So there are two format domains in the project:
- Dense model serialization path (`RMD1`)
- Runtime blob parser path (`RNN\0`)
This is intentional in code, but requires clear pipeline discipline in production.
Core inference execution model
Dense execution path is explicit and buffer-oriented:
- Validate plan and shape chain
- Compute scratch requirement from max width and batch size
- Use two alternating scratch lanes for layer-by-layer forward pass
- Copy final lane into output buffer
This avoids hidden execution state and keeps runtime behavior predictable.
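The lane-alternating forward pass can be sketched like this (simplified stand-in types, Identity activation only, heap-allocated lanes for brevity even though the real core is `no_std`/slice-based; not the crate's actual `forward_dense_plan` signature):

```rust
// Illustrative dense forward pass with two alternating scratch lanes.
struct DenseLayer {
    input_size: usize,
    output_size: usize,
    weights: Vec<f32>, // row-major: output_size rows of input_size
    biases: Vec<f32>,
}

fn forward(layers: &[DenseLayer], input: &[f32], max_width: usize, output: &mut [f32]) {
    // Two scratch lanes sized to the widest layer in the plan.
    let mut lanes = [vec![0.0f32; max_width], vec![0.0f32; max_width]];
    lanes[0][..input.len()].copy_from_slice(input);
    let mut src = 0;
    for layer in layers {
        let dst = 1 - src;
        for o in 0..layer.output_size {
            let row = &layer.weights[o * layer.input_size..(o + 1) * layer.input_size];
            let mut acc = layer.biases[o];
            for i in 0..layer.input_size {
                acc += row[i] * lanes[src][i];
            }
            lanes[dst][o] = acc; // Identity activation in this sketch
        }
        src = dst; // alternate lanes for the next layer
    }
    // Copy the final lane into the caller-provided output buffer.
    let last = layers.last().unwrap().output_size;
    output[..last].copy_from_slice(&lanes[src][..last]);
}
```

Running this on the sample model (`[2, 1]`, weights `[2.0, -1.0]`, bias `[0.5]`) reproduces the closed-form result.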
Complete module reference
Core execution
- `network`: network-level checks and stats
- `layers`: layer descriptors, chaining/range validation, topology→spec conversion
- `engine`: dense forward kernels, scratch sizing, shape checks
- `inference`: batch forward wrappers, stable softmax and logits helpers
- `runtime`: memory/flops/throughput/budget estimators
- `model_config`: predefined config helpers
Tensor and numerics
- `tensor`: tensor views, indexing, layout checks
- `scratch`: temporary memory helpers
- `activations`: activation kinds and vector application
- `normalization`: layer norm / RMS norm
- `quantization`: i8/f32 quant/dequant and mixed matmul
- `math` (in src/lib.rs): no-std-friendly approximations
Training-adjacent
- `losses`: loss and reduction logic
- `metrics`: MSE/MAE/accuracy/argmax and running means
- `gradients`: norm, clipping, finite checks
- `optimizers`: optimizer update paths
- `schedulers`: LR scheduling
- `trainer`: SGD-oriented step helpers
- `initializers`: parameter count/init helpers
Transformer-style blocks
- `attention`: scaled dot-product attention + masks/shapes
- `kv_cache`: KV cache views/errors
- `rope`: rotary position embedding application
- `sampling`: temperature/top-k/top-p sampling primitives
- `beam_search`: beam selection utilities
- `moe`: top-1 gating and routing
- `embeddings`: embedding gather and tied projection
- `lora`: LoRA delta application
Spatial/specialized operators
- `conv3d`: 3D convolution and compatibility checks
- `conv5d`: 5D convolution forward/backward
- `sphere5d`: 5D sphere structures/helpers
- `batching`: padding and mask generation
Formats and interop
- `model_format`: dense model encoding/decoding (RMD1)
- `rnn_api`: high-level dense lifecycle APIs
- `rnn_format`: runtime blob parser (RNN\0)
- `ffi_api`: C ABI implementation
- `public_api`: re-exported public surface
- `crypto`: hashing/integrity helpers
- `profiler`: operation counting helpers
Legacy note
`embedings` exists as a legacy spelling path in repository history/structure.
Public API groups (selected)
The crate re-exports many symbols through src/public_api.rs.
Examples by category:
- Dense lifecycle: `rnn_required_dense_from_bytes_v1`, `rnn_pack_dense_v1`, `rnn_run_dense_v1`
- Format: `encode_dense_model_v1`, `decode_dense_model_v1`, `encoded_size_v1`
- Inference ops: `forward_dense_batch`, `scaled_dot_product_attention`, `apply_rope_in_place`
- Optimization: `dense_sgd_step`, `apply_optimizer_step`, `clip_by_global_norm`
- Runtime estimates: `estimate_runtime_memory`, `estimate_runtime_flops`, `check_runtime_budget`
- FFI C API: model create/run/destroy + ABI checks in include/rnn_ffi.h
Compatibility matrix
This project is designed to be compatible across all major desktop/server OSes:
| Platform | Rust crate build | FFI artifacts | Notes |
|---|---|---|---|
| Linux | Supported | Supported (`.so`, `.a`) | Primary native flow |
| macOS | Supported | Supported (`.dylib`, `.a`) | Standard clang/ld toolchain |
| Windows | Supported | Supported (`.dll`, `.lib`) | MSVC/MinGW depending on toolchain |
General requirements:
- Rust stable toolchain
- C/C++ toolchain when consuming FFI outputs
- Platform-specific linker/runtime setup for shared libraries
Build and artifacts
Build with Cargo's release profile (`cargo build --release`). With the current crate config, release builds can emit Rust and native artifacts according to platform/toolchain (rlib, cdylib, staticlib).
Generate and validate a sample .rnn
Generate: run the sample-model example (`cargo run --example generate_sample_model`).
Sanity-check: inspect the start of the emitted payload; the expected dense header bytes correspond to the `RMD1` magic.
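One way to eyeball the magic from a POSIX shell (the `MODEL` path is a placeholder; point it at the file the generator wrote):

```sh
# Print the first four bytes of a payload; a valid dense model
# starts with the RMD1 magic. MODEL is a placeholder path.
MODEL=${MODEL:-demo.rnn}
if [ ! -f "$MODEL" ]; then
  # stand-in payload so the snippet is runnable as-is
  printf 'RMD1' > "$MODEL"
fi
head -c 4 "$MODEL"
```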
FFI integration lifecycle
C header: include/rnn_ffi.h
Recommended host flow:
1. `rnn_ffi_api_version` / `rnn_ffi_is_abi_compatible`
2. `rnn_ffi_model_create_from_bytes_v1`
3. `rnn_ffi_model_get_info`
4. `rnn_ffi_model_run_dense` or `rnn_ffi_model_run_dense_batch`
5. `rnn_ffi_model_destroy`
Performance notes
- Dense forward cost is dominated by matrix-vector products per layer.
- For dense stacks, per-sample compute is approximately proportional to:
$$ \sum_{l=1}^{L} (\text{in}_l \times \text{out}_l) $$
- Batch mode reuses the same plan and alternates scratch lanes for better locality.
- Scratch requirements scale with `batch_size * max_layer_width * 2` in the current engine path.
- Quantization and runtime estimation modules can be used to pre-plan deployment budgets.
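The two scaling formulas above can be computed directly from a topology (a small sketch; the function names are illustrative, not crate APIs):

```rust
// Per-sample multiply-accumulate count for a dense stack:
// the sum over layers of input_size * output_size.
fn dense_macs(topology: &[usize]) -> usize {
    topology.windows(2).map(|w| w[0] * w[1]).sum()
}

// Scratch floats needed by the two-lane engine path:
// batch_size * max_layer_width * 2.
fn scratch_floats(topology: &[usize], batch_size: usize) -> usize {
    let max_width = topology.iter().copied().max().unwrap_or(0);
    batch_size * max_width * 2
}
```

For an `[8, 16, 4]` stack this gives `8*16 + 16*4 = 192` MACs per sample, and a batch of 4 needs `4 * 16 * 2 = 128` scratch floats.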
Security and safety notes
- Never trust external model bytes by default.
- Always validate incoming payloads before inference (`required_*` and decode checks).
- Keep ABI checks enabled in cross-language hosts (`rnn_ffi_is_abi_compatible`).
- Treat model files as untrusted input in service contexts (sandbox, size limits, resource guards).
- Keep `check_abi_contract.sh` in CI if you publish FFI artifacts.
Visualization usage
This crate exposes allocation-free mesh helpers (slice-based) for wrapper authors and visualization tooling.
- Vertex layout: 8 floats per vertex in this order: x, y, z, nx, ny, nz, u, v.
- Index buffer type: `u32` (tri/quad indices).
API notes (examples show the common functions available in the visualization module):
- `mesh_required_buffers_from_bytes(bytes: &[u8], quad_neurons: bool, threshold: f32) -> Result<(usize, usize), Error>` returns the `(vertex_count, index_count)` required for the mesh.
- `fill_mesh_from_bytes(bytes: &[u8], quad_neurons: bool, threshold: f32, quad_size: f32, vertex_buf: &mut [f32], index_buf: &mut [u32]) -> Result<(usize, usize), Error>` fills `vertex_buf` and `index_buf` with vertex floats and indices; returns the counts actually written.
Example: using IA_for_NNN (high-level integration crate). The exact import path and argument values below are assumptions; the function names and shapes follow the API notes above.

```rust
// Import path and argument values are illustrative assumptions.
use ia_for_nnn::{model_bytes, mesh_required_buffers_from_bytes, fill_mesh_from_bytes};

let bytes = model_bytes();
let (vertex_count, index_count) =
    mesh_required_buffers_from_bytes(&bytes, /* quad_neurons */ true, /* threshold */ 0.5).unwrap();
let mut vertices = vec![0.0f32; vertex_count * 8]; // 8 floats per vertex
let mut indices = vec![0u32; index_count];
let (vertices_written, indices_written) =
    fill_mesh_from_bytes(&bytes, true, 0.5, /* quad_size */ 1.0, &mut vertices, &mut indices).unwrap();
```
Example: using Native_Neural_Network_std (the std wrapper). The import path, file name, and `*_from_rnn` signatures are assumptions mirroring the byte-based variants above.

```rust
// Paths, file name, and handle-based signatures are illustrative assumptions.
use native_neural_network_std::{DenseModel, mesh_required_buffers_from_rnn, fill_mesh_from_rnn};

let model = DenseModel::from_file("model.rnn").unwrap();
let handle = model.handle();
let (vertex_count, index_count) =
    mesh_required_buffers_from_rnn(handle, true, 0.5).unwrap();
let mut vertices = vec![0.0f32; vertex_count * 8];
let mut indices = vec![0u32; index_count];
let (vertices_written, indices_written) =
    fill_mesh_from_rnn(handle, true, 0.5, 1.0, &mut vertices, &mut indices).unwrap();
```
Notes:
- Allocate `vertex_buf` as `vertex_count * 8` `f32` entries.
- Allocate `index_buf` as `index_count` `u32` entries.
- The functions are no-heap and expect caller-provided slices; they will not reallocate.
Versioning and stability policy
- The Rust crate API should follow semantic versioning for public surface changes.
- C ABI changes should be treated as compatibility-sensitive and version-gated.
- Model format changes (`RMD1`) should be versioned explicitly and decoded defensively.
- Breaking changes should be documented in release notes and migration guidance.
Project validation scripts
- scripts/check_abi_contract.sh: validates expected ABI symbols
- scripts/prod_ready_check.sh: broad production-style checks
Note: prod_ready_check.sh references optional wrapper ecosystems (wrappers/python, wrappers/javascript, wrappers/java, wrappers/cpp) and related tooling.
Subtleties and design constraints
These are important, non-obvious project subtleties:
- `no_std` core behavior: the crate is intentionally low-level and optimized for explicit runtime control.
- Dual format domain (`RMD1` and `RNN\0`): dense serialization and runtime blob parsing are separate concerns and must be selected deliberately per pipeline.
- Explicit scratch management: inference APIs rely on caller-allocated buffers. This is by design for deterministic memory behavior.
- Strict range validation: layer offsets, dimensions, and capacities are validated before execution to prevent unsafe indexing paths.
- FFI ABI contract stability: any C ABI change must stay synchronized between src/ffi_api and include/rnn_ffi.h.
- Broad domain coverage: the crate is not a tiny single-purpose dense runner; it is a wide NN systems toolbox.
Testing status
As requested for this repository:
- no in-repo unit-test focus is currently documented here,
- a dedicated `std` wrapper crate is planned,
- all unit tests are intended to be centralized in that wrapper (`Native_Neural_Network_std`).
FAQ
Why no_std?
To keep the core deterministic and portable for constrained/native runtimes.
Why both RMD1 and RNN\0 paths?
They represent two format domains in the repository (dense serialization vs runtime parser utilities). Keep pipeline usage explicit.
Why a separate std wrapper for unit tests?
To keep this core focused on runtime/format/FFI behavior while enabling richer testing ergonomics in a host-friendly crate.
Can I use this on Windows/Linux/macOS?
Yes. The crate and FFI flow are designed for all three platforms with standard Rust + native toolchains.
Production checklist
- Build release artifacts (`cargo build --release`)
- Validate ABI contract (`scripts/check_abi_contract.sh`)
- Generate and verify sample model (`examples/generate_sample_model.rs`)
- Verify FFI lifecycle in your host runtime (create/run/destroy)
- Apply resource limits and input validation for model loading
- Track runtime budgets (memory/FLOPs/throughput) before deployment
Contributing
Contributions are welcome.
Suggested local checks: a release build (`cargo build --release`) plus the repository validation scripts (`scripts/check_abi_contract.sh`, `scripts/prod_ready_check.sh`).
For major changes, open an issue first with:
- scope,
- impacted modules,
- compatibility expectations.
License
MIT.
See LICENSE.