# rnn



Quick links: [Overview](#what-this-project-does) · [Architecture](#complete-module-reference) · [Format](#binary-dense-format-rmd1-details) · [FFI](#ffi-integration-lifecycle) · [Compatibility](#compatibility-matrix) · [Production](#production-checklist)
`rnn` is a low-level Rust neural-network core built around explicit memory control, binary model formats, and FFI interoperability.
It is designed for native/embedded-style workflows where you want to control:
- how model bytes are created,
- how buffers are allocated,
- how inference is executed,
- and how the same core is reused across Rust and non-Rust runtimes.
## Table of Contents
- [What this project does](#what-this-project-does)
- [Why the generated neural network exists](#why-the-generated-neural-network-exists)
- [Schema of the generated neural network](#schema-of-the-generated-neural-network)
- [Real drawing of the generated network (actual sample values)](#real-drawing-of-the-generated-network-actual-sample-values)
- [Conceptual schema of networks built by the library](#conceptual-schema-of-networks-built-by-the-library)
- [Conceptual schema of model construction](#conceptual-schema-of-model-construction)
- [How this network is created (exact pipeline)](#how-this-network-is-created-exact-pipeline)
- [Binary dense format (`RMD1`) details](#binary-dense-format-rmd1-details)
- [RMD1 binary layout (concise spec)](#rmd1-binary-layout-concise-spec)
- [Runtime parser path (`RNN\0`) and format split](#runtime-parser-path-rnn0-and-format-split)
- [Core inference execution model](#core-inference-execution-model)
- [Complete module reference](#complete-module-reference)
- [Public API groups (selected)](#public-api-groups-selected)
- [Compatibility matrix](#compatibility-matrix)
- [Build and artifacts](#build-and-artifacts)
- [Generate and validate a sample `.rnn`](#generate-and-validate-a-sample-rnn)
- [FFI integration lifecycle](#ffi-integration-lifecycle)
- [Performance notes](#performance-notes)
- [Security and safety notes](#security-and-safety-notes)
- [Versioning and stability policy](#versioning-and-stability-policy)
- [Project validation scripts](#project-validation-scripts)
- [Subtleties and design constraints](#subtleties-and-design-constraints)
- [Testing status](#testing-status)
- [FAQ](#faq)
- [Production checklist](#production-checklist)
- [Contributing](#contributing)
- [License](#license)
## What this project does
This project provides end-to-end building blocks to:
1. Define dense network topology and layer specs
2. Validate parameter counts and index ranges
3. Serialize models into compact binary payloads
4. Deserialize/validate payloads safely
5. Run deterministic inference with caller-provided scratch buffers
6. Expose the same runtime through a C ABI
In addition to dense flow, the crate includes modules for attention, KV cache, RoPE, MoE routing, quantization, sampling, beam search, convolutions, normalization, and profiling/runtime estimation.
## Why the generated neural network exists
The sample generator in [examples/generate_sample_model.rs](examples/generate_sample_model.rs) exists to provide a deterministic, minimal artifact used for:
- format validation,
- API smoke checks,
- FFI integration checks,
- cross-language consistency checks.
It generates a tiny dense model with:
- topology: `[2, 1]`
- weights: `[2.0, -1.0]`
- bias: `[0.5]`
- activation: `Identity`
So the output is:
$$
y = 2.0 \cdot x_0 - 1.0 \cdot x_1 + 0.5
$$
This tiny model is intentionally simple so behavior is easy to verify in every language binding.
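Because the sample is this small, its expected outputs can be computed by hand and used as reference points in every binding. A standalone sketch in plain Rust (no crate APIs involved):

```rust
// Closed-form output of the generated sample model:
// y = 2.0 * x0 - 1.0 * x1 + 0.5, with Identity activation.
fn sample_model_output(x0: f32, x1: f32) -> f32 {
    let weights = [2.0_f32, -1.0];
    let bias = 0.5_f32;
    weights[0] * x0 + weights[1] * x1 + bias
}

fn main() {
    // Reference points for cross-language consistency checks.
    assert_eq!(sample_model_output(0.0, 0.0), 0.5);
    assert_eq!(sample_model_output(1.0, 1.0), 1.5);
    println!("sample model OK");
}
```

Any binding that loads the generated `.rnn` file should reproduce these values exactly.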
## Schema of the generated neural network
```mermaid
flowchart LR
X0((x0)) --> N[Dense neuron]
X1((x1)) --> N
B((bias=0.5)) --> N
N --> Y((y))
```
Parameter mapping for this sample:
- `w0 = 2.0` applied to `x0`
- `w1 = -1.0` applied to `x1`
- `b = 0.5`
- activation = `Identity`
## Real drawing of the generated network (actual sample values)
This is the exact neuron-level network generated by [examples/generate_sample_model.rs](examples/generate_sample_model.rs):
```mermaid
graph LR
x0((Input x0)) -- "w0 = +2.0" --> n1((Neuron n1))
x1((Input x1)) -- "w1 = -1.0" --> n1
b((Bias +0.5)) --> n1
n1 -- "Identity" --> y((Output y))
```
Operationally, the neuron computes:
$$
z = (2.0 \cdot x_0) + (-1.0 \cdot x_1) + 0.5
$$
Because the output activation is `Identity`, the final output is:
$$
y = z
$$
So for this generated sample model:
$$
y = 2.0 \cdot x_0 - x_1 + 0.5
$$
## Conceptual schema of networks built by the library
Beyond the tiny sample model, the core dense path implemented by this crate is conceptually a feed-forward stack of dense layers:
```mermaid
flowchart LR
I[Input vector x] --> L1["Dense Layer 1<br/>W1 x + b1<br/>Activation a1"]
L1 --> L2["Dense Layer 2<br/>W2 h1 + b2<br/>Activation a2"]
L2 --> L3["... optional hidden layers ..."]
L3 --> O["Output layer<br/>Wn h(n-1) + bn<br/>Output activation"]
```
Each dense layer is represented internally with:
- `input_size`
- `output_size`
- `weight_offset`
- `bias_offset`
- `activation`
Those descriptors are chained and validated before execution (`LayerPlan::validate`).
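The exact `LayerPlan` type is internal to the crate, but the chaining rule it enforces can be sketched independently. The field names below come from the list above; the struct and `validate_chain` function are illustrative, not the crate's API:

```rust
// Hypothetical mirror of a dense layer descriptor; field names follow
// the list above, but this is a sketch, not the crate's type.
struct DenseLayerSpec {
    input_size: usize,
    output_size: usize,
    weight_offset: usize,
    bias_offset: usize,
}

// Chain rule: each layer's input width must equal the previous layer's
// output width, and every offset range must stay inside the tensors.
fn validate_chain(specs: &[DenseLayerSpec], weights_len: usize, biases_len: usize) -> bool {
    let mut prev_out = None;
    for s in specs {
        if s.input_size == 0 || s.output_size == 0 {
            return false;
        }
        if let Some(p) = prev_out {
            if s.input_size != p {
                return false;
            }
        }
        let w_end = s.weight_offset + s.input_size * s.output_size;
        let b_end = s.bias_offset + s.output_size;
        if w_end > weights_len || b_end > biases_len {
            return false;
        }
        prev_out = Some(s.output_size);
    }
    true
}

fn main() {
    // A 2 -> 3 -> 1 stack: 2*3 + 3*1 = 9 weights, 3 + 1 = 4 biases.
    let specs = [
        DenseLayerSpec { input_size: 2, output_size: 3, weight_offset: 0, bias_offset: 0 },
        DenseLayerSpec { input_size: 3, output_size: 1, weight_offset: 6, bias_offset: 3 },
    ];
    assert!(validate_chain(&specs, 9, 4));
    println!("chain OK");
}
```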
## Conceptual schema of model construction
The dense model creation flow is explicit and deterministic:
```mermaid
flowchart TD
T["Topology<br/>example: 2 -> 1 or 8 -> 16 -> 4"] --> S["Build dense layer specs<br/>input/output sizes + offsets + activations"]
P[Weights + Biases] --> S
S --> V["Range/count validation<br/>weights_len and biases_len checks"]
V --> E["Encode binary model<br/>RMD1 header + layer metadata + tensors"]
E --> F[.rnn file payload]
F --> D[Decode + validate at runtime]
D --> R[Run inference with explicit scratch buffers]
```
Why this design:
- predictable memory behavior (no hidden runtime allocations in core path),
- strict structural checks before compute,
- straightforward interop with FFI consumers.
## How this network is created (exact pipeline)
### Step 1: Topology and parameters
- `topology = [2, 1]`
- user-provided `weights`, `biases`
### Step 2: Build layer specs
`build_dense_specs_from_layers` computes for each layer:
- `input_size`, `output_size`
- `weight_offset`, `bias_offset`
- activation choice (hidden vs output)
It also validates consistency with total weights/biases.
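The offset bookkeeping in this step follows directly from the topology: each layer consumes `input * output` weights and `output` biases, and its offsets are the running totals of all earlier layers. A sketch with an illustrative function name (not the crate's `build_dense_specs_from_layers`):

```rust
// For a topology [n0, n1, ..., nk], layer l uses n(l-1)*n(l) weights
// and n(l) biases; its offsets are running sums over earlier layers.
fn dense_offsets(topology: &[usize]) -> Vec<(usize, usize)> {
    let (mut w_off, mut b_off) = (0, 0);
    let mut out = Vec::new();
    for pair in topology.windows(2) {
        out.push((w_off, b_off));
        w_off += pair[0] * pair[1];
        b_off += pair[1];
    }
    out
}

fn main() {
    // Topology 8 -> 16 -> 4: layer 2 starts at weight 8*16 = 128, bias 16.
    assert_eq!(dense_offsets(&[8, 16, 4]), vec![(0, 0), (128, 16)]);
    // The sample model [2, 1] has a single layer at offset (0, 0).
    assert_eq!(dense_offsets(&[2, 1]), vec![(0, 0)]);
    println!("offsets OK");
}
```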
### Step 3: Encode binary payload
`encode_dense_model_v1` writes:
- magic/version/header
- layer metadata
- packed weights
- packed biases
### Step 4: Persist bytes
The example writes the result as a `.rnn` file.
### Step 5: Runtime consumption
At inference time:
- `rnn_required_dense_from_bytes_v1` inspects required counts
- `decode_dense_model_v1` reconstructs layer specs/parameters
- `forward_dense_plan` executes with caller scratch buffers
## Binary dense format (`RMD1`) details
Dense format helpers are in [src/model_format](src/model_format) and [src/rnn_api](src/rnn_api).
Key characteristics:
- Magic: `RMD1`
- Versioned header
- Layer metadata contains input/output sizes, offsets, activation id
- All critical ranges are validated before use
- Decode fails on truncation, bad version/magic, invalid offsets, or capacity mismatch
This gives a strict producer/consumer contract for dense models.
## RMD1 binary layout (concise spec)
Dense `RMD1` payload layout used by `model_format`:
- Header (20 bytes total):
  - `magic` (4 bytes): `RMD1`
  - `version` (u16)
  - `flags` (u16, currently reserved)
  - `layer_count` (u32)
  - `weights_len` (u32)
  - `biases_len` (u32)
- Layer metadata array (`layer_count` entries, 20 bytes each):
  - `input_size` (u32)
  - `output_size` (u32)
  - `weight_offset` (u32)
  - `bias_offset` (u32)
  - `activation` (u8)
  - `reserved` (3 bytes)
- Weights payload (`weights_len * 4` bytes, f32 little-endian)
- Biases payload (`biases_len * 4` bytes, f32 little-endian)
Validation guarantees include:
- non-zero dimensions,
- checked offset arithmetic,
- bounds checks against tensor payload lengths,
- truncation/version/magic checks at decode.
## Runtime parser path (`RNN\0`) and format split
The repository also contains parser utilities in [src/rnn_format](src/rnn_format) with `RNN\0` magic.
So there are two format domains in the project:
- Dense model serialization path (`RMD1`)
- Runtime blob parser path (`RNN\0`)
This split is intentional, but production pipelines must be explicit about which format each stage produces and consumes.
## Core inference execution model
Dense execution path is explicit and buffer-oriented:
- Validate plan and shape chain
- Compute scratch requirement from max width and batch size
- Use two alternating scratch lanes for layer-by-layer forward pass
- Copy final lane into output buffer
This avoids hidden execution state and keeps runtime behavior predictable.
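The alternating-lane pattern can be sketched in plain Rust. The crate's actual kernels and signatures differ; this only illustrates the execution model described above:

```rust
// One dense layer: out[j] = sum_i w[j*in + i] * x[i] + b[j] (Identity).
fn dense_layer(x: &[f32], w: &[f32], b: &[f32], out: &mut [f32]) {
    for (j, o) in out.iter_mut().enumerate() {
        let row = &w[j * x.len()..(j + 1) * x.len()];
        *o = row.iter().zip(x).map(|(wi, xi)| wi * xi).sum::<f32>() + b[j];
    }
}

// Layer-by-layer forward pass over two alternating scratch lanes,
// each sized to the widest layer, then one copy into the output buffer.
fn forward(input: &[f32], layers: &[(&[f32], &[f32], usize)], output: &mut [f32]) {
    let max_w = layers.iter().map(|l| l.2).max().unwrap().max(input.len());
    let mut lanes = [vec![0.0_f32; max_w], vec![0.0_f32; max_w]];
    lanes[0][..input.len()].copy_from_slice(input);
    let mut cur = input.len();
    let mut src = 0;
    for &(w, b, out_size) in layers {
        let (a, bnk) = lanes.split_at_mut(1);
        let (x, y) = if src == 0 { (&a[0], &mut bnk[0]) } else { (&bnk[0], &mut a[0]) };
        dense_layer(&x[..cur], w, b, &mut y[..out_size]);
        cur = out_size;
        src ^= 1;
    }
    output.copy_from_slice(&lanes[src][..cur]);
}

fn main() {
    // The sample model [2, 1]: w = [2.0, -1.0], b = [0.5], Identity.
    let w: &[f32] = &[2.0, -1.0];
    let b: &[f32] = &[0.5];
    let layers = [(w, b, 1usize)];
    let mut out = [0.0_f32];
    forward(&[1.0, 1.0], &layers, &mut out);
    assert_eq!(out[0], 1.5);
    println!("forward OK");
}
```

The two-lane scheme is what lets the engine size scratch memory up front from the widest layer instead of allocating per layer.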
## Complete module reference
### Core execution
- `network`: network-level checks and stats
- `layers`: layer descriptors, chaining/range validation, topology→spec conversion
- `engine`: dense forward kernels, scratch sizing, shape checks
- `inference`: batch forward wrappers, stable softmax and logits helpers
- `runtime`: memory/flops/throughput/budget estimators
- `model_config`: predefined config helpers
### Tensor and numerics
- `tensor`: tensor views, indexing, layout checks
- `scratch`: temporary memory helpers
- `activations`: activation kinds and vector application
- `normalization`: layer norm / RMS norm
- `quantization`: i8/f32 quant/dequant and mixed matmul
- `math` (in [src/lib.rs](src/lib.rs)): no-std-friendly approximations
### Training-adjacent
- `losses`: loss and reduction logic
- `metrics`: MSE/MAE/accuracy/argmax and running means
- `gradients`: norm, clipping, finite checks
- `optimizers`: optimizer update paths
- `schedulers`: LR scheduling
- `trainer`: SGD-oriented step helpers
- `initializers`: parameter count/init helpers
### Transformer-style blocks
- `attention`: scaled dot-product attention + masks/shapes
- `kv_cache`: KV cache views/errors
- `rope`: rotary position embedding application
- `sampling`: temperature/top-k/top-p sampling primitives
- `beam_search`: beam selection utilities
- `moe`: top-1 gating and routing
- `embeddings`: embedding gather and tied projection
- `lora`: LoRA delta application
### Spatial/specialized operators
- `conv3d`: 3D convolution and compatibility checks
- `conv5d`: 5D convolution forward/backward
- `sphere5d`: 5D sphere structures/helpers
- `batching`: padding and mask generation
### Formats and interop
- `model_format`: dense model encoding/decoding (`RMD1`)
- `rnn_api`: high-level dense lifecycle APIs
- `rnn_format`: runtime blob parser (`RNN\0`)
- `ffi_api`: C ABI implementation
- `public_api`: re-exported public surface
- `crypto`: hashing/integrity helpers
- `profiler`: operation counting helpers
### Legacy note
- `embedings` exists as a legacy spelling path in repository history/structure.
## Public API groups (selected)
The crate re-exports many symbols through [src/public_api.rs](src/public_api.rs).
Examples by category:
- Dense lifecycle: `rnn_required_dense_from_bytes_v1`, `rnn_pack_dense_v1`, `rnn_run_dense_v1`
- Format: `encode_dense_model_v1`, `decode_dense_model_v1`, `encoded_size_v1`
- Inference ops: `forward_dense_batch`, `scaled_dot_product_attention`, `apply_rope_in_place`
- Optimization: `dense_sgd_step`, `apply_optimizer_step`, `clip_by_global_norm`
- Runtime estimates: `estimate_runtime_memory`, `estimate_runtime_flops`, `check_runtime_budget`
- FFI C API: model create/run/destroy + ABI checks in [include/rnn_ffi.h](include/rnn_ffi.h)
## Compatibility matrix
This project is designed to be compatible across all major desktop/server OSes:

| Platform | Crate build | FFI artifacts | Notes |
| --- | --- | --- | --- |
| Linux | Supported | Supported (`.so`, `.a`) | Primary native flow |
| macOS | Supported | Supported (`.dylib`, `.a`) | Standard clang/ld toolchain |
| Windows | Supported | Supported (`.dll`, `.lib`) | MSVC/MinGW depending on toolchain |
General requirements:
- Rust stable toolchain
- C/C++ toolchain when consuming FFI outputs
- Platform-specific linker/runtime setup for shared libraries
## Build and artifacts
Build:
```bash
cargo build
cargo build --release
```
With the current crate configuration, release builds emit Rust and native artifacts (`rlib`, `cdylib`, `staticlib`) as supported by the platform and toolchain.
## Generate and validate a sample `.rnn`
Generate:
```bash
cargo run --example generate_sample_model -- /tmp/sample.rnn
```
Sanity-check:
```bash
ls -lh /tmp/sample.rnn
xxd -l 4 /tmp/sample.rnn
```
The first four bytes should be `52 4d 44 31`, the ASCII encoding of `RMD1`.
## FFI integration lifecycle
C header: [include/rnn_ffi.h](include/rnn_ffi.h)
Recommended host flow:
1. `rnn_ffi_api_version` / `rnn_ffi_is_abi_compatible`
2. `rnn_ffi_model_create_from_bytes_v1`
3. `rnn_ffi_model_get_info`
4. `rnn_ffi_model_run_dense` or `rnn_ffi_model_run_dense_batch`
5. `rnn_ffi_model_destroy`
## Performance notes
- Dense forward cost is dominated by matrix-vector products per layer.
- For dense stacks, per-sample compute is approximately proportional to:
$$
\sum_{l=1}^{L} (\text{in}_l \times \text{out}_l)
$$
- Batch mode reuses the same plan and alternates scratch lanes for better locality.
- Scratch requirements scale with `batch_size * max_layer_width * 2` in the current engine path.
- Quantization and runtime estimation modules can be used to pre-plan deployment budgets.
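The two rules of thumb above (per-sample compute and scratch size) are cheap to evaluate ahead of deployment. A minimal sketch with illustrative function names:

```rust
// Per-sample multiply-accumulate count for a dense stack:
// sum over layers of in_l * out_l.
fn dense_macs(topology: &[usize]) -> usize {
    topology.windows(2).map(|w| w[0] * w[1]).sum()
}

// Scratch floats for the two-lane engine path, per the note above:
// batch_size * max_layer_width * 2.
fn scratch_floats(topology: &[usize], batch_size: usize) -> usize {
    batch_size * topology.iter().copied().max().unwrap_or(0) * 2
}

fn main() {
    let topo = [8, 16, 4];
    assert_eq!(dense_macs(&topo), 8 * 16 + 16 * 4); // 192 MACs per sample
    assert_eq!(scratch_floats(&topo, 4), 4 * 16 * 2); // 128 scratch floats
    println!("estimates OK");
}
```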
## Security and safety notes
- Never trust external model bytes by default.
- Always validate incoming payloads before inference (`required_*` and decode checks).
- Keep ABI checks enabled in cross-language hosts (`rnn_ffi_is_abi_compatible`).
- Treat model files as untrusted input in service contexts (sandbox, size limits, resource guards).
- Keep `check_abi_contract.sh` in CI if you publish FFI artifacts.
## Versioning and stability policy
- Rust crate API should follow semantic versioning for public surface changes.
- C ABI changes should be treated as compatibility-sensitive and version-gated.
- Model format changes (`RMD1`) should be versioned explicitly and decoded defensively.
- Breaking changes should be documented in release notes and migration guidance.
## Project validation scripts
- [scripts/check_abi_contract.sh](scripts/check_abi_contract.sh): validates expected ABI symbols
- [scripts/prod_ready_check.sh](scripts/prod_ready_check.sh): broad production-style checks
Note: `prod_ready_check.sh` references optional wrapper ecosystems (`wrappers/python`, `wrappers/javascript`, `wrappers/java`, `wrappers/cpp`) and related tooling.
## Subtleties and design constraints
These are important, non-obvious project subtleties:
1. **`no_std` core behavior**
The core is built to run without the Rust standard library, keeping it low-level and suited to explicit runtime control.
2. **Dual format domain (`RMD1` and `RNN\0`)**
Dense serialization and runtime blob parsing are separate concerns and must be selected deliberately per pipeline.
3. **Explicit scratch management**
Inference APIs rely on caller-allocated buffers. This is by design for deterministic memory behavior.
4. **Strict range validation**
Layer offsets, dimensions, and capacities are validated before execution to prevent unsafe indexing paths.
5. **FFI ABI contract stability matters**
Any C ABI change must stay synchronized between [src/ffi_api](src/ffi_api) and [include/rnn_ffi.h](include/rnn_ffi.h).
6. **Repository currently includes broad domain modules**
The crate is not a tiny single-purpose dense runner; it is a wide NN systems toolbox.
## Testing status
Current testing approach for this repository:
- no in-repo unit-test focus is currently documented here,
- a dedicated `std` wrapper crate is planned,
- all unit tests are intended to be centralized in that wrapper.
## FAQ
### Why `no_std`?
To keep the core deterministic and portable for constrained/native runtimes.
### Why both `RMD1` and `RNN\0` paths?
They represent two format domains in the repository (dense serialization vs runtime parser utilities). Keep pipeline usage explicit.
### Why a separate `std` wrapper for unit tests?
To keep this core focused on runtime/format/FFI behavior while enabling richer testing ergonomics in a host-friendly crate.
### Can I use this on Windows/Linux/macOS?
Yes. The crate and FFI flow are designed for all three platforms with standard Rust + native toolchains.
## Production checklist
- [ ] Build release artifacts (`cargo build --release`)
- [ ] Validate ABI contract (`scripts/check_abi_contract.sh`)
- [ ] Generate and verify sample model (`examples/generate_sample_model.rs`)
- [ ] Verify FFI lifecycle in your host runtime (create/run/destroy)
- [ ] Apply resource limits and input validation for model loading
- [ ] Track runtime budgets (memory/FLOPs/throughput) before deployment
## Contributing
Contributions are welcome.
Suggested local checks:
```bash
cargo fmt --all
cargo clippy --all-targets -- -D warnings
cargo build --release
```
For major changes, open an issue first with:
- scope,
- impacted modules,
- compatibility expectations.
## License
MIT.
See [LICENSE](LICENSE).