native_neural_network 0.1.0

A no_std Rust library for native neural networks (.rnn)
Documentation

rnn

License: MIT · Platform: Linux | macOS | Windows · Rust: stable

Quick links: Overview · Architecture · Format · FFI · Compatibility · Production

rnn is a low-level Rust neural-network core built around explicit memory control, binary model formats, and FFI interoperability.

It is designed for native/embedded-style workflows where you want to control:

  • how model bytes are created,
  • how buffers are allocated,
  • how inference is executed,
  • and how the same core is reused across Rust and non-Rust runtimes.

What this project does

This project provides end-to-end building blocks to:

  1. Define dense network topology and layer specs
  2. Validate parameter counts and index ranges
  3. Serialize models into compact binary payloads
  4. Deserialize/validate payloads safely
  5. Run deterministic inference with caller-provided scratch buffers
  6. Expose the same runtime through a C ABI

In addition to the dense flow, the crate includes modules for attention, KV caching, RoPE, MoE routing, quantization, sampling, beam search, convolutions, normalization, and profiling/runtime estimation.

Why the generated sample model exists

The sample generator in examples/generate_sample_model.rs exists to provide a deterministic, minimal artifact used for:

  • format validation,
  • API smoke checks,
  • FFI integration checks,
  • cross-language consistency checks.

It generates a tiny dense model with:

  • topology: [2, 1]
  • weights: [2.0, -1.0]
  • bias: [0.5]
  • activation: Identity

So the output is:

$$ y = 2.0 \cdot x_0 - 1.0 \cdot x_1 + 0.5 $$

This tiny model is intentionally simple so behavior is easy to verify in every language binding.
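
For example, with inputs x0 = 1.0 and x1 = 1.0 the expected output is:

$$ y = 2.0 \cdot 1.0 - 1.0 \cdot 1.0 + 0.5 = 1.5 $$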

Schema of the generated neural network

flowchart LR
  X0((x0)) --> N[Dense neuron]
  X1((x1)) --> N
  B((bias=0.5)) --> N
  N --> Y((y))

Parameter mapping for this sample:

  • w0 = 2.0 applied to x0
  • w1 = -1.0 applied to x1
  • b = 0.5
  • activation = Identity

Neuron-level drawing of the generated network (actual sample values)

This is the exact neuron-level network generated by examples/generate_sample_model.rs:

graph LR
   x0((Input x0)) -- "w0 = +2.0" --> n1((Neuron n1))
   x1((Input x1)) -- "w1 = -1.0" --> n1
   b((Bias +0.5)) --> n1
   n1 -- "Identity" --> y((Output y))

Operationally, the neuron computes:

$$ z = (2.0 \cdot x_0) + (-1.0 \cdot x_1) + 0.5 $$

Because the output activation is Identity, the final output is:

$$ y = z $$

So for this generated sample model:

$$ y = 2.0 \cdot x_0 - x_1 + 0.5 $$

Conceptual schema of networks built by the library

Beyond the tiny sample model, the core dense path implemented by this crate is conceptually a feed-forward stack of dense layers:

flowchart LR
   I[Input vector x] --> L1[Dense Layer 1\nW1 x + b1\nActivation a1]
   L1 --> L2[Dense Layer 2\nW2 h1 + b2\nActivation a2]
   L2 --> L3[... Optional hidden layers ...]
   L3 --> O[Output layer\nWn h(n-1) + bn\nOutput activation]

Each dense layer is represented internally with:

  • input_size
  • output_size
  • weight_offset
  • bias_offset
  • activation

Those descriptors are chained and validated before execution (LayerPlan::validate).
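
As a rough sketch of what such a descriptor looks like (field names taken from the list above; the actual definition in src/layers may differ, e.g. it may use u32 to match the on-disk format, and the Activation type name is assumed here):

pub struct DenseLayerSpec {
    pub input_size: usize,      // width of the incoming activation vector
    pub output_size: usize,     // number of neurons in this layer
    pub weight_offset: usize,   // start index into the packed weights slice
    pub bias_offset: usize,     // start index into the packed biases slice
    pub activation: Activation, // activation kind (enum name assumed)
}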

Conceptual schema of model construction

The dense model creation flow is explicit and deterministic:

flowchart TD
   T[Topology\nexample: 2 -> 1 or 8 -> 16 -> 4] --> S[Build dense layer specs\ninput/output sizes + offsets + activations]
   P[Weights + Biases] --> S
   S --> V[Range/count validation\nweights_len and biases_len checks]
   V --> E[Encode binary model\nRMD1 header + layer metadata + tensors]
   E --> F[.rnn file payload]
   F --> D[Decode + validate at runtime]
   D --> R[Run inference with explicit scratch buffers]

Why this design:

  • predictable memory behavior (no hidden runtime allocations in core path),
  • strict structural checks before compute,
  • straightforward interop with FFI consumers.

How this network is created (exact pipeline)

Step 1: Topology and parameters

  • topology = [2, 1]
  • user-provided weights, biases

Step 2: Build layer specs

build_dense_specs_from_layers computes for each layer:

  • input_size, output_size
  • weight_offset, bias_offset
  • activation choice (hidden vs output)

It also validates consistency with total weights/biases.
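
For the sample topology [2, 1], that means weights_len must be 2 × 1 = 2 and biases_len must be 1, which matches the weights [2.0, -1.0] and bias [0.5] above; any mismatch is rejected before encoding.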

Step 3: Encode binary payload

encode_dense_model_v1 writes:

  • magic/version/header
  • layer metadata
  • packed weights
  • packed biases

Step 4: Persist bytes

The example writes the result as a .rnn file.

Step 5: Runtime consumption

At inference time:

  • rnn_required_dense_from_bytes_v1 inspects required counts
  • decode_dense_model_v1 reconstructs layer specs/parameters
  • forward_dense_plan executes with caller scratch buffers
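
Putting the steps together, the host-side flow could look roughly like this. The function names are the ones listed above, but the signatures shown are assumptions for illustration only; src/rnn_api is the authoritative reference:

// ILLUSTRATIVE ONLY -- the exact signatures live in src/rnn_api.
let payload: &[u8] = /* bytes read from the .rnn file */;

// Inspect required buffer sizes before allocating (hypothetical shape).
let required = rnn_required_dense_from_bytes_v1(payload)?;

// Reconstruct and validate layer specs + parameters.
let model = decode_dense_model_v1(payload)?;

// Execute with caller-provided scratch and output buffers.
forward_dense_plan(&model, &input, &mut scratch, &mut output)?;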

Binary dense format (RMD1) details

Dense format helpers are in src/model_format and src/rnn_api.

Key characteristics:

  • Magic: RMD1
  • Versioned header
  • Layer metadata contains input/output sizes, offsets, activation id
  • All critical ranges are validated before use
  • Decode fails on truncation, bad version/magic, invalid offsets, or capacity mismatch

This gives a strict producer/consumer contract for dense models.

RMD1 binary layout (concise spec)

Dense RMD1 payload layout used by model_format:

  • Header (20 bytes total):
    • magic (4 bytes): RMD1
    • version (u16)
    • flags (u16, currently reserved)
    • layer_count (u32)
    • weights_len (u32)
    • biases_len (u32)
  • Layer metadata array (layer_count entries, 20 bytes each):
    • input_size (u32)
    • output_size (u32)
    • weight_offset (u32)
    • bias_offset (u32)
    • activation (u8)
    • reserved (3 bytes)
  • Weights payload (weights_len * 4 bytes, f32 little-endian)
  • Biases payload (biases_len * 4 bytes, f32 little-endian)

Validation guarantees include:

  • non-zero dimensions,
  • checked offset arithmetic,
  • bounds checks against tensor payload lengths,
  • truncation/version/magic checks at decode.
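
A defensive header parse matching the layout above might look like this. It assumes the header fields are little-endian, consistent with the little-endian f32 payloads (the authoritative decoder is in src/model_format):

fn parse_rmd1_header(bytes: &[u8]) -> Option<(u16, u16, u32, u32, u32)> {
    // Truncation check: the header is exactly 20 bytes.
    if bytes.len() < 20 {
        return None;
    }
    // Magic check.
    if &bytes[0..4] != b"RMD1" {
        return None;
    }
    let u16_at = |i: usize| u16::from_le_bytes([bytes[i], bytes[i + 1]]);
    let u32_at = |i: usize| {
        u32::from_le_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]])
    };
    let version = u16_at(4);
    let flags = u16_at(6);
    Some((version, flags, u32_at(8), u32_at(12), u32_at(16)))
}

// Checked offset arithmetic for one layer entry: the range
// weight_offset .. weight_offset + input_size * output_size must
// fit inside the weights payload without integer overflow.
fn weights_in_bounds(weight_offset: u32, input_size: u32, output_size: u32, weights_len: u32) -> bool {
    input_size
        .checked_mul(output_size)
        .and_then(|n| weight_offset.checked_add(n))
        .map_or(false, |end| end <= weights_len)
}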

Runtime parser path (RNN\0) and format split

The repository also contains parser utilities in src/rnn_format with RNN\0 magic.

So there are two format domains in the project:

  • Dense model serialization path (RMD1)
  • Runtime blob parser path (RNN\0)

This is intentional in code, but requires clear pipeline discipline in production.

Core inference execution model

Dense execution path is explicit and buffer-oriented:

  • Validate plan and shape chain
  • Compute scratch requirement from max width and batch size
  • Use two alternating scratch lanes for layer-by-layer forward pass
  • Copy final lane into output buffer

This avoids hidden execution state and keeps runtime behavior predictable.
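
A minimal self-contained sketch of that alternating-lane pattern (plain no_std Rust; row-major weight packing is an assumption here, and activations and batching are omitted -- the real engine lives in src/engine):

fn forward_two_lane(
    layers: &[(usize, usize)], // (input_size, output_size) per layer
    weights: &[f32],           // packed per layer, assumed row-major
    biases: &[f32],            // packed per layer
    input: &[f32],
    scratch: &mut [f32],       // caller-provided, at least 2 * max layer width
    output: &mut [f32],
) {
    let max_width = layers.iter().map(|&(i, o)| i.max(o)).max().unwrap_or(0);
    let (lane_a, lane_b) = scratch.split_at_mut(max_width);
    lane_a[..input.len()].copy_from_slice(input);
    let (mut src, mut dst) = (lane_a, lane_b);
    let (mut w_off, mut b_off) = (0, 0);
    for &(in_size, out_size) in layers {
        for o in 0..out_size {
            let row = &weights[w_off + o * in_size..w_off + (o + 1) * in_size];
            dst[o] = row.iter().zip(&src[..in_size]).map(|(w, x)| w * x).sum::<f32>()
                + biases[b_off + o];
        }
        w_off += in_size * out_size;
        b_off += out_size;
        core::mem::swap(&mut src, &mut dst); // alternate scratch lanes
    }
    // After the final swap, src holds the last layer's result.
    let out_size = layers.last().map(|&(_, o)| o).unwrap_or(0);
    output[..out_size].copy_from_slice(&src[..out_size]);
}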

Complete module reference

Core execution

  • network: network-level checks and stats
  • layers: layer descriptors, chaining/range validation, topology→spec conversion
  • engine: dense forward kernels, scratch sizing, shape checks
  • inference: batch forward wrappers, stable softmax and logits helpers
  • runtime: memory/flops/throughput/budget estimators
  • model_config: predefined config helpers

Tensor and numerics

  • tensor: tensor views, indexing, layout checks
  • scratch: temporary memory helpers
  • activations: activation kinds and vector application
  • normalization: layer norm / RMS norm
  • quantization: i8/f32 quant/dequant and mixed matmul
  • math (in src/lib.rs): no-std-friendly approximations

Training-adjacent

  • losses: loss and reduction logic
  • metrics: MSE/MAE/accuracy/argmax and running means
  • gradients: norm, clipping, finite checks
  • optimizers: optimizer update paths
  • schedulers: LR scheduling
  • trainer: SGD-oriented step helpers
  • initializers: parameter count/init helpers

Transformer-style blocks

  • attention: scaled dot-product attention + masks/shapes
  • kv_cache: KV cache views/errors
  • rope: rotary position embedding application
  • sampling: temperature/top-k/top-p sampling primitives
  • beam_search: beam selection utilities
  • moe: top-1 gating and routing
  • embeddings: embedding gather and tied projection
  • lora: LoRA delta application

Spatial/specialized operators

  • conv3d: 3D convolution and compatibility checks
  • conv5d: 5D convolution forward/backward
  • sphere5d: 5D sphere structures/helpers
  • batching: padding and mask generation

Formats and interop

  • model_format: dense model encoding/decoding (RMD1)
  • rnn_api: high-level dense lifecycle APIs
  • rnn_format: runtime blob parser (RNN\0)
  • ffi_api: C ABI implementation
  • public_api: re-exported public surface
  • crypto: hashing/integrity helpers
  • profiler: operation counting helpers

Legacy note

  • embedings exists as a legacy spelling path in repository history/structure.

Public API groups (selected)

The crate re-exports many symbols through src/public_api.rs.

Examples by category:

  • Dense lifecycle: rnn_required_dense_from_bytes_v1, rnn_pack_dense_v1, rnn_run_dense_v1
  • Format: encode_dense_model_v1, decode_dense_model_v1, encoded_size_v1
  • Inference ops: forward_dense_batch, scaled_dot_product_attention, apply_rope_in_place
  • Optimization: dense_sgd_step, apply_optimizer_step, clip_by_global_norm
  • Runtime estimates: estimate_runtime_memory, estimate_runtime_flops, check_runtime_budget
  • FFI C API: model create/run/destroy + ABI checks in include/rnn_ffi.h

Compatibility matrix

This project is designed to be compatible across all major desktop/server OSes:

Platform   Rust crate build   FFI artifacts            Notes
Linux      Supported          Supported (.so, .a)      Primary native flow
macOS      Supported          Supported (.dylib, .a)   Standard clang/ld toolchain
Windows    Supported          Supported (.dll, .lib)   MSVC/MinGW depending on toolchain

General requirements:

  • Rust stable toolchain
  • C/C++ toolchain when consuming FFI outputs
  • Platform-specific linker/runtime setup for shared libraries

Build and artifacts

Build:

cargo build
cargo build --release

With the current crate configuration, release builds emit both Rust and native artifacts (rlib, cdylib, staticlib), depending on platform and toolchain.
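
That typically corresponds to a lib section along these lines (illustrative; the crate's actual Cargo.toml is authoritative):

[lib]
crate-type = ["rlib", "cdylib", "staticlib"]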

Generate and validate a sample .rnn

Generate:

cargo run --example generate_sample_model -- /tmp/sample.rnn

Sanity-check:

ls -lh /tmp/sample.rnn
xxd -l 4 /tmp/sample.rnn

Expected dense header bytes correspond to RMD1.
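
Since the magic is the ASCII string RMD1, the first four bytes are 52 4d 44 31 in hex. A minimal host-side check (std Rust, illustrative):

fn main() -> std::io::Result<()> {
    let bytes = std::fs::read("/tmp/sample.rnn")?;
    assert!(bytes.len() >= 4, "file too short");
    assert_eq!(&bytes[..4], b"RMD1", "unexpected magic");
    println!("magic OK ({} bytes total)", bytes.len());
    Ok(())
}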

FFI integration lifecycle

C header: include/rnn_ffi.h

Recommended host flow:

  1. rnn_ffi_api_version / rnn_ffi_is_abi_compatible
  2. rnn_ffi_model_create_from_bytes_v1
  3. rnn_ffi_model_get_info
  4. rnn_ffi_model_run_dense or rnn_ffi_model_run_dense_batch
  5. rnn_ffi_model_destroy
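
From a Rust host linking against the shared library, the start of that flow might be declared as below. The names come from the list above, but every signature shown is a guess; include/rnn_ffi.h is the authoritative contract:

// HYPOTHETICAL signatures -- consult include/rnn_ffi.h for the real ones.
extern "C" {
    fn rnn_ffi_api_version() -> u32;
    fn rnn_ffi_is_abi_compatible(expected_version: u32) -> bool;
    // create_from_bytes/get_info/run/destroy operate on an opaque model
    // handle and status codes defined by the header, not reproduced here.
}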

Performance notes

  • Dense forward cost is dominated by matrix-vector products per layer.
  • For dense stacks, per-sample compute is approximately proportional to:

$$ \sum_{l=1}^{L} (\text{in}_l \times \text{out}_l) $$

  • Batch mode reuses the same plan and alternates scratch lanes for better locality.
  • Scratch requirements scale with batch_size * max_layer_width * 2 in the current engine path.
  • Quantization and runtime estimation modules can be used to pre-plan deployment budgets.
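
As a plain-Rust illustration of the compute and scratch bullets above (not the crate's estimator API; the runtime module provides the real helpers):

// Per-sample multiply-accumulate count for a dense stack:
// the sum over layers of input_size * output_size.
fn dense_macs_per_sample(layers: &[(usize, usize)]) -> usize {
    layers.iter().map(|&(i, o)| i * o).sum()
}

// Scratch requirement in f32 elements for the two-lane engine path:
// batch_size * max_layer_width * 2.
fn scratch_f32_count(layers: &[(usize, usize)], batch_size: usize) -> usize {
    let max_width = layers.iter().map(|&(i, o)| i.max(o)).max().unwrap_or(0);
    batch_size * max_width * 2
}

For the sample model, dense_macs_per_sample(&[(2, 1)]) returns 2 and scratch_f32_count(&[(2, 1)], 1) returns 4.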

Security and safety notes

  • Never trust external model bytes by default.
  • Always validate incoming payloads before inference (required_* and decode checks).
  • Keep ABI checks enabled in cross-language hosts (rnn_ffi_is_abi_compatible).
  • Treat model files as untrusted input in service contexts (sandbox, size limits, resource guards).
  • Keep check_abi_contract.sh in CI if you publish FFI artifacts.

Versioning and stability policy

  • Rust crate API should follow semantic versioning for public surface changes.
  • C ABI changes should be treated as compatibility-sensitive and version-gated.
  • Model format changes (RMD1) should be versioned explicitly and decoded defensively.
  • Breaking changes should be documented in release notes and migration guidance.

Project validation scripts

Note: prod_ready_check.sh references optional wrapper ecosystems (wrappers/python, wrappers/javascript, wrappers/java, wrappers/cpp) and related tooling.

Subtleties and design constraints

These are important, non-obvious project subtleties:

  1. no_std core behavior. The crate is intentionally low-level and optimized for explicit runtime control.

  2. Dual format domain (RMD1 and RNN\0). Dense serialization and runtime blob parsing are separate concerns and must be selected deliberately per pipeline.

  3. Explicit scratch management. Inference APIs rely on caller-allocated buffers; this is by design for deterministic memory behavior.

  4. Strict range validation. Layer offsets, dimensions, and capacities are validated before execution to prevent unsafe indexing paths.

  5. FFI ABI contract stability matters. Any C ABI change must stay synchronized between src/ffi_api and include/rnn_ffi.h.

  6. Repository currently includes broad domain modules. The crate is not a tiny single-purpose dense runner; it is a wide NN systems toolbox.

Testing status

Current testing status for this repository:

  • no in-repo unit-test focus is currently documented here,
  • a dedicated std wrapper crate is planned,
  • all unit tests are intended to be centralized in that wrapper.

FAQ

Why no_std?

To keep the core deterministic and portable for constrained/native runtimes.

Why both RMD1 and RNN\0 paths?

They represent two format domains in the repository (dense serialization vs runtime parser utilities). Keep pipeline usage explicit.

Why a separate std wrapper for unit tests?

To keep this core focused on runtime/format/FFI behavior while enabling richer testing ergonomics in a host-friendly crate.

Can I use this on Windows/Linux/macOS?

Yes. The crate and FFI flow are designed for all three platforms with standard Rust + native toolchains.

Production checklist

  • Build release artifacts (cargo build --release)
  • Validate ABI contract (scripts/check_abi_contract.sh)
  • Generate and verify sample model (examples/generate_sample_model.rs)
  • Verify FFI lifecycle in your host runtime (create/run/destroy)
  • Apply resource limits and input validation for model loading
  • Track runtime budgets (memory/FLOPs/throughput) before deployment

Contributing

Contributions are welcome.

Suggested local checks:

cargo fmt --all
cargo clippy --all-targets -- -D warnings
cargo build --release

For major changes, open an issue first with:

  • scope,
  • impacted modules,
  • compatibility expectations.

License

MIT.

See LICENSE.