native_neural_network 0.1.1

A `no_std` Rust library for native neural networks (`.rnn`)
# rnn

![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)
![Platform: Linux%20%7C%20macOS%20%7C%20Windows](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows-blue)
![Rust: stable](https://img.shields.io/badge/Rust-stable-orange)

Quick links: [Overview](#what-this-project-does) · [Architecture](#complete-module-reference) · [Format](#binary-dense-format-rmd1-details) · [FFI](#ffi-integration-lifecycle) · [Compatibility](#compatibility-matrix) · [Production](#production-checklist)

`rnn` is a low-level Rust neural-network core built around explicit memory control, binary model formats, and FFI interoperability.

It is designed for native/embedded-style workflows where you want to control:
- how model bytes are created,
- how buffers are allocated,
- how inference is executed,
- and how the same core is reused across Rust and non-Rust runtimes.

## Table of Contents

- [What this project does](#what-this-project-does)
- [Why the generated neural network exists](#why-the-generated-neural-network-exists)
- [Schema of the generated neural network](#schema-of-the-generated-neural-network)
- [Real drawing of the generated network (actual sample values)](#real-drawing-of-the-generated-network-actual-sample-values)
- [Conceptual schema of networks built by the library](#conceptual-schema-of-networks-built-by-the-library)
- [Conceptual schema of model construction](#conceptual-schema-of-model-construction)
- [How this network is created (exact pipeline)](#how-this-network-is-created-exact-pipeline)
- [Binary dense format (`RMD1`) details](#binary-dense-format-rmd1-details)
- [RMD1 binary layout (concise spec)](#rmd1-binary-layout-concise-spec)
- [Runtime parser path (`RNN\0`) and format split](#runtime-parser-path-rnn0-and-format-split)
- [Core inference execution model](#core-inference-execution-model)
- [Complete module reference](#complete-module-reference)
- [Public API groups (selected)](#public-api-groups-selected)
- [Compatibility matrix](#compatibility-matrix)
- [Build and artifacts](#build-and-artifacts)
- [Generate and validate a sample `.rnn`](#generate-and-validate-a-sample-rnn)
- [FFI integration lifecycle](#ffi-integration-lifecycle)
- [Performance notes](#performance-notes)
- [Security and safety notes](#security-and-safety-notes)
- [Versioning and stability policy](#versioning-and-stability-policy)
- [Project validation scripts](#project-validation-scripts)
- [Subtleties and design constraints](#subtleties-and-design-constraints)
- [Testing status](#testing-status)
- [FAQ](#faq)
- [Production checklist](#production-checklist)
- [Contributing](#contributing)
- [License](#license)

## What this project does

This project provides end-to-end building blocks to:

1. Define dense network topology and layer specs
2. Validate parameter counts and index ranges
3. Serialize models into compact binary payloads
4. Deserialize/validate payloads safely
5. Run deterministic inference with caller-provided scratch buffers
6. Expose the same runtime through a C ABI

In addition to dense flow, the crate includes modules for attention, KV cache, RoPE, MoE routing, quantization, sampling, beam search, convolutions, normalization, and profiling/runtime estimation.

## Why the generated neural network exists

The sample generator in [examples/generate_sample_model.rs](examples/generate_sample_model.rs) exists to provide a deterministic, minimal artifact used for:
- format validation,
- API smoke checks,
- FFI integration checks,
- cross-language consistency checks.

It generates a tiny dense model with:
- topology: `[2, 1]`
- weights: `[2.0, -1.0]`
- bias: `[0.5]`
- activation: `Identity`

So the output is:

$$
y = 2.0 \cdot x_0 - 1.0 \cdot x_1 + 0.5
$$

This tiny model is intentionally simple so behavior is easy to verify in every language binding.
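As a sanity reference for bindings, the sample model's arithmetic can be reproduced in a few lines of standalone Rust (an illustrative sketch; it does not call the crate's API):

```rust
/// Forward pass of the generated sample model: y = 2.0*x0 - 1.0*x1 + 0.5.
fn sample_forward(x0: f32, x1: f32) -> f32 {
    let (w0, w1, b) = (2.0_f32, -1.0_f32, 0.5_f32);
    // Identity activation: the pre-activation sum is the output.
    w0 * x0 + w1 * x1 + b
}

fn main() {
    // A few reference points useful when checking language bindings.
    assert_eq!(sample_forward(0.0, 0.0), 0.5);
    assert_eq!(sample_forward(1.0, 1.0), 1.5);
    assert_eq!(sample_forward(1.0, 2.0), 0.5);
    println!("sample model checks passed");
}
```

Any host runtime consuming the generated `.rnn` should reproduce these exact values.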

## Schema of the generated neural network

```mermaid
flowchart LR
  X0((x0)) --> N[Dense neuron]
  X1((x1)) --> N
  B((bias=0.5)) --> N
  N --> Y((y))
```

Parameter mapping for this sample:
- `w0 = 2.0` applied to `x0`
- `w1 = -1.0` applied to `x1`
- `b = 0.5`
- activation = `Identity`

## Real drawing of the generated network (actual sample values)

This is the exact neuron-level network generated by [examples/generate_sample_model.rs](examples/generate_sample_model.rs):

```mermaid
graph LR
   x0((Input x0)) -- "w0 = +2.0" --> n1((Neuron n1))
   x1((Input x1)) -- "w1 = -1.0" --> n1
   b((Bias +0.5)) --> n1
   n1 -- "Identity" --> y((Output y))
```

Operationally, the neuron computes:

$$
z = (2.0 \cdot x_0) + (-1.0 \cdot x_1) + 0.5
$$

Because the output activation is `Identity`, the final output is:

$$
y = z
$$

So for this generated sample model:

$$
y = 2.0 \cdot x_0 - x_1 + 0.5
$$

## Conceptual schema of networks built by the library

Beyond the tiny sample model, the core dense path implemented by this crate is conceptually a feed-forward stack of dense layers:

```mermaid
flowchart LR
   I[Input vector x] --> L1["Dense Layer 1<br/>W1 x + b1<br/>Activation a1"]
   L1 --> L2["Dense Layer 2<br/>W2 h1 + b2<br/>Activation a2"]
   L2 --> L3["... optional hidden layers ..."]
   L3 --> O["Output layer<br/>Wn h(n-1) + bn<br/>Output activation"]
```

Each dense layer is represented internally with:
- `input_size`
- `output_size`
- `weight_offset`
- `bias_offset`
- `activation`

Those descriptors are chained and validated before execution (`LayerPlan::validate`).
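The descriptor fields and the chain check can be sketched in plain Rust. The struct and function below are illustrative stand-ins, not the crate's actual types; only the field names come from the list above:

```rust
/// Hypothetical mirror of a dense layer descriptor (field names follow the
/// list above; the crate's internal type may differ).
#[derive(Debug, Clone, Copy)]
struct DenseSpec {
    input_size: usize,
    output_size: usize,
    weight_offset: usize,
    bias_offset: usize,
    activation: u8,
}

/// Chain validation in the spirit of `LayerPlan::validate`: each layer's
/// input width must equal the previous layer's output width, and every
/// parameter range must fit inside the packed tensors.
fn validate_chain(specs: &[DenseSpec], weights_len: usize, biases_len: usize) -> bool {
    specs.windows(2).all(|w| w[0].output_size == w[1].input_size)
        && specs.iter().all(|s| {
            s.input_size > 0
                && s.output_size > 0
                && s.weight_offset + s.input_size * s.output_size <= weights_len
                && s.bias_offset + s.output_size <= biases_len
        })
}

fn main() {
    let specs = [
        DenseSpec { input_size: 2, output_size: 3, weight_offset: 0, bias_offset: 0, activation: 1 },
        DenseSpec { input_size: 3, output_size: 1, weight_offset: 6, bias_offset: 3, activation: 0 },
    ];
    assert!(validate_chain(&specs, 9, 4));
    assert!(!validate_chain(&specs, 8, 4)); // truncated weights are rejected
}
```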

## Conceptual schema of model construction

The dense model creation flow is explicit and deterministic:

```mermaid
flowchart TD
   T["Topology<br/>example: 2 -> 1 or 8 -> 16 -> 4"] --> S["Build dense layer specs<br/>input/output sizes + offsets + activations"]
   P[Weights + Biases] --> S
   S --> V["Range/count validation<br/>weights_len and biases_len checks"]
   V --> E["Encode binary model<br/>RMD1 header + layer metadata + tensors"]
   E --> F[.rnn file payload]
   F --> D[Decode + validate at runtime]
   D --> R[Run inference with explicit scratch buffers]
```

Why this design:
- predictable memory behavior (no hidden runtime allocations in core path),
- strict structural checks before compute,
- straightforward interop with FFI consumers.

## How this network is created (exact pipeline)

### Step 1: Topology and parameters
- `topology = [2, 1]`
- user-provided `weights`, `biases`

### Step 2: Build layer specs
`build_dense_specs_from_layers` computes for each layer:
- `input_size`, `output_size`
- `weight_offset`, `bias_offset`
- activation choice (hidden vs output)

It also validates consistency with total weights/biases.
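Conceptually, the offsets follow directly from the topology: weights and biases are packed layer after layer. The sketch below shows that arithmetic (illustrative only; it uses `std` collections for brevity even though the core crate is `no_std`, and it is not the actual `build_dense_specs_from_layers`):

```rust
/// For a topology like [2, 4, 1], compute per-layer
/// (input_size, output_size, weight_offset, bias_offset) tuples implied by
/// the packed layout: weights first-to-last, one bias per output neuron.
fn dense_offsets(topology: &[usize]) -> Vec<(usize, usize, usize, usize)> {
    let (mut w_off, mut b_off) = (0, 0);
    topology
        .windows(2)
        .map(|pair| {
            let (inp, out) = (pair[0], pair[1]);
            let spec = (inp, out, w_off, b_off);
            w_off += inp * out; // weights packed layer after layer
            b_off += out;       // one bias per output neuron
            spec
        })
        .collect()
}

fn main() {
    let specs = dense_offsets(&[2, 4, 1]);
    assert_eq!(specs, vec![(2, 4, 0, 0), (4, 1, 8, 4)]);
    // Totals must match weights_len / biases_len at validation time.
    let total_weights: usize = specs.iter().map(|s| s.0 * s.1).sum();
    assert_eq!(total_weights, 12);
}
```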

### Step 3: Encode binary payload
`encode_dense_model_v1` writes:
- magic/version/header
- layer metadata
- packed weights
- packed biases

### Step 4: Persist bytes
The example writes the result as a `.rnn` file.

### Step 5: Runtime consumption
At inference time:
- `rnn_required_dense_from_bytes_v1` inspects required counts
- `decode_dense_model_v1` reconstructs layer specs/parameters
- `forward_dense_plan` executes with caller scratch buffers

## Binary dense format (`RMD1`) details

Dense format helpers are in [src/model_format](src/model_format) and [src/rnn_api](src/rnn_api).

Key characteristics:
- Magic: `RMD1`
- Versioned header
- Layer metadata contains input/output sizes, offsets, activation id
- All critical ranges are validated before use
- Decode fails on truncation, bad version/magic, invalid offsets, or capacity mismatch

This gives a strict producer/consumer contract for dense models.

## RMD1 binary layout (concise spec)

Dense `RMD1` payload layout used by `model_format`:

- Header (20 bytes total):
   - `magic` (4 bytes): `RMD1`
   - `version` (u16)
   - `flags` (u16, currently reserved)
   - `layer_count` (u32)
   - `weights_len` (u32)
   - `biases_len` (u32)
- Layer metadata array (`layer_count` entries, 20 bytes each):
   - `input_size` (u32)
   - `output_size` (u32)
   - `weight_offset` (u32)
   - `bias_offset` (u32)
   - `activation` (u8)
   - `reserved` (3 bytes)
- Weights payload (`weights_len * 4` bytes, f32 little-endian)
- Biases payload (`biases_len * 4` bytes, f32 little-endian)

Validation guarantees include:
- non-zero dimensions,
- checked offset arithmetic,
- bounds checks against tensor payload lengths,
- truncation/version/magic checks at decode.
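A defensive header parse following this layout can be sketched as below. The integer fields are assumed little-endian here (the spec above only states that for the f32 payloads), and `Rmd1Header`/`parse_header` are illustrative names, not the crate's API; verify against `model_format` before relying on it:

```rust
/// Illustrative parse of the 20-byte RMD1 header described above.
#[derive(Debug, PartialEq)]
struct Rmd1Header {
    version: u16,
    flags: u16,
    layer_count: u32,
    weights_len: u32,
    biases_len: u32,
}

fn parse_header(bytes: &[u8]) -> Option<Rmd1Header> {
    // Truncation and magic checks come first, mirroring the decode guarantees.
    if bytes.len() < 20 || bytes[0..4] != *b"RMD1" {
        return None;
    }
    let u16_at = |i: usize| u16::from_le_bytes([bytes[i], bytes[i + 1]]);
    let u32_at = |i: usize| u32::from_le_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]]);
    Some(Rmd1Header {
        version: u16_at(4),
        flags: u16_at(6),
        layer_count: u32_at(8),
        weights_len: u32_at(12),
        biases_len: u32_at(16),
    })
}

fn main() {
    let mut buf = Vec::from(*b"RMD1");
    buf.extend_from_slice(&1u16.to_le_bytes()); // version
    buf.extend_from_slice(&0u16.to_le_bytes()); // flags
    buf.extend_from_slice(&1u32.to_le_bytes()); // layer_count
    buf.extend_from_slice(&2u32.to_le_bytes()); // weights_len
    buf.extend_from_slice(&1u32.to_le_bytes()); // biases_len
    let h = parse_header(&buf).unwrap();
    assert_eq!(h.version, 1);
    assert_eq!(h.layer_count, 1);
    assert!(parse_header(b"RMD1").is_none()); // truncated payload is rejected
}
```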

## Runtime parser path (`RNN\0`) and format split

The repository also contains parser utilities in [src/rnn_format](src/rnn_format) with `RNN\0` magic.

So there are two format domains in the project:
- Dense model serialization path (`RMD1`)
- Runtime blob parser path (`RNN\0`)

This is intentional in code, but requires clear pipeline discipline in production.

## Core inference execution model

Dense execution path is explicit and buffer-oriented:
- Validate plan and shape chain
- Compute scratch requirement from max width and batch size
- Use two alternating scratch lanes for layer-by-layer forward pass
- Copy final lane into output buffer

This avoids hidden execution state and keeps runtime behavior predictable.
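The alternating-lane pattern can be shown with a small standalone forward pass (an illustrative sketch of the technique, not the crate's `forward_dense_plan`; weights are assumed row-major `[out][in]` here):

```rust
/// Layer-by-layer forward pass using two alternating scratch lanes, as
/// described above: each layer reads one lane and writes the other, so no
/// per-layer allocation happens during execution.
fn forward_two_lane(
    input: &[f32],
    layers: &[(usize, usize, Vec<f32>, Vec<f32>)], // (in, out, weights, biases)
    output: &mut [f32],
) {
    // Both lanes are sized for the widest activation vector in the chain.
    let max_width = layers.iter().map(|l| l.1).chain([input.len()]).max().unwrap();
    let mut lanes = [vec![0.0f32; max_width], vec![0.0f32; max_width]];
    lanes[0][..input.len()].copy_from_slice(input);
    let mut cur = 0;
    for (inp, out, w, b) in layers {
        let (src, dst) = (cur, 1 - cur);
        for o in 0..*out {
            let row = &w[o * inp..(o + 1) * inp];
            let acc: f32 = row.iter().zip(&lanes[src][..*inp]).map(|(a, x)| a * x).sum();
            lanes[dst][o] = acc + b[o];
        }
        cur = dst;
    }
    let last_out = layers.last().map(|l| l.1).unwrap_or(input.len());
    output.copy_from_slice(&lanes[cur][..last_out]);
}

fn main() {
    // The sample model from earlier: y = 2*x0 - x1 + 0.5.
    let layers = vec![(2usize, 1usize, vec![2.0f32, -1.0], vec![0.5f32])];
    let mut out = [0.0f32; 1];
    forward_two_lane(&[1.0, 2.0], &layers, &mut out);
    assert_eq!(out[0], 0.5);
}
```

The final copy into `output` is what the last step of the list above refers to.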

## Complete module reference

### Core execution
- `network`: network-level checks and stats
- `layers`: layer descriptors, chaining/range validation, topology→spec conversion
- `engine`: dense forward kernels, scratch sizing, shape checks
- `inference`: batch forward wrappers, stable softmax and logits helpers
- `runtime`: memory/flops/throughput/budget estimators
- `model_config`: predefined config helpers

### Tensor and numerics
- `tensor`: tensor views, indexing, layout checks
- `scratch`: temporary memory helpers
- `activations`: activation kinds and vector application
- `normalization`: layer norm / RMS norm
- `quantization`: i8/f32 quant/dequant and mixed matmul
- `math` (in [src/lib.rs](src/lib.rs)): no-std-friendly approximations

### Training-adjacent
- `losses`: loss and reduction logic
- `metrics`: MSE/MAE/accuracy/argmax and running means
- `gradients`: norm, clipping, finite checks
- `optimizers`: optimizer update paths
- `schedulers`: LR scheduling
- `trainer`: SGD-oriented step helpers
- `initializers`: parameter count/init helpers

### Transformer-style blocks
- `attention`: scaled dot-product attention + masks/shapes
- `kv_cache`: KV cache views/errors
- `rope`: rotary position embedding application
- `sampling`: temperature/top-k/top-p sampling primitives
- `beam_search`: beam selection utilities
- `moe`: top-1 gating and routing
- `embeddings`: embedding gather and tied projection
- `lora`: LoRA delta application

### Spatial/specialized operators
- `conv3d`: 3D convolution and compatibility checks
- `conv5d`: 5D convolution forward/backward
- `sphere5d`: 5D sphere structures/helpers
- `batching`: padding and mask generation

### Formats and interop
- `model_format`: dense model encoding/decoding (`RMD1`)
- `rnn_api`: high-level dense lifecycle APIs
- `rnn_format`: runtime blob parser (`RNN\0`)
- `ffi_api`: C ABI implementation
- `public_api`: re-exported public surface
- `crypto`: hashing/integrity helpers
- `profiler`: operation counting helpers

### Legacy note
- `embedings` exists as a legacy spelling path in repository history/structure.

## Public API groups (selected)

The crate re-exports many symbols through [src/public_api.rs](src/public_api.rs).

Examples by category:
- Dense lifecycle: `rnn_required_dense_from_bytes_v1`, `rnn_pack_dense_v1`, `rnn_run_dense_v1`
- Format: `encode_dense_model_v1`, `decode_dense_model_v1`, `encoded_size_v1`
- Inference ops: `forward_dense_batch`, `scaled_dot_product_attention`, `apply_rope_in_place`
- Optimization: `dense_sgd_step`, `apply_optimizer_step`, `clip_by_global_norm`
- Runtime estimates: `estimate_runtime_memory`, `estimate_runtime_flops`, `check_runtime_budget`
- FFI C API: model create/run/destroy + ABI checks in [include/rnn_ffi.h](include/rnn_ffi.h)

## Compatibility matrix

This project is designed to be compatible across all major desktop/server OSes:

| Platform | Rust crate build | FFI artifacts | Notes |
|---|---|---|---|
| Linux | Supported | Supported (`.so`, `.a`) | Primary native flow |
| macOS | Supported | Supported (`.dylib`, `.a`) | Standard clang/ld toolchain |
| Windows | Supported | Supported (`.dll`, `.lib`) | MSVC/MinGW depending on toolchain |

General requirements:
- Rust stable toolchain
- C/C++ toolchain when consuming FFI outputs
- Platform-specific linker/runtime setup for shared libraries

## Build and artifacts

Build:

```bash
cargo build
cargo build --release
```

With the current crate configuration, release builds emit both Rust and native artifacts (`rlib`, `cdylib`, `staticlib`), depending on platform and toolchain.
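Multi-artifact output of this kind is normally driven by the `crate-type` list in `Cargo.toml`; a sketch of what such a configuration looks like (the exact manifest in this repository may differ):

```toml
[lib]
crate-type = ["rlib", "cdylib", "staticlib"]
```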

## Generate and validate a sample `.rnn`

Generate:

```bash
cargo run --example generate_sample_model -- /tmp/sample.rnn
```

Sanity-check:

```bash
ls -lh /tmp/sample.rnn
xxd -l 4 /tmp/sample.rnn
```

The first four bytes should be the ASCII magic `RMD1`.

## FFI integration lifecycle

C header: [include/rnn_ffi.h](include/rnn_ffi.h)

Recommended host flow:
1. `rnn_ffi_api_version` / `rnn_ffi_is_abi_compatible`
2. `rnn_ffi_model_create_from_bytes_v1`
3. `rnn_ffi_model_get_info`
4. `rnn_ffi_model_run_dense` or `rnn_ffi_model_run_dense_batch`
5. `rnn_ffi_model_destroy`

## Performance notes

- Dense forward cost is dominated by matrix-vector products per layer.
- For dense stacks, per-sample compute is approximately proportional to:

$$
\sum_{l=1}^{L} (\text{in}_l \times \text{out}_l)
$$

- Batch mode reuses the same plan and alternates scratch lanes for better locality.
- Scratch requirements scale with `batch_size * max_layer_width * 2` in the current engine path.
- Quantization and runtime estimation modules can be used to pre-plan deployment budgets.
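The two formulas above are easy to turn into back-of-the-envelope planning helpers (an illustrative sketch; the crate's `runtime` module estimators may compute these differently):

```rust
/// Multiply-accumulates per sample for a dense stack: sum of in_l * out_l.
fn dense_macs_per_sample(topology: &[usize]) -> usize {
    topology.windows(2).map(|p| p[0] * p[1]).sum()
}

/// Scratch elements for the two-lane engine path:
/// batch_size * max_layer_width * 2 (f32 elements, not bytes).
fn scratch_f32_elems(topology: &[usize], batch_size: usize) -> usize {
    let max_width = topology.iter().copied().max().unwrap_or(0);
    batch_size * max_width * 2
}

fn main() {
    let topo = [8, 16, 4];
    assert_eq!(dense_macs_per_sample(&topo), 8 * 16 + 16 * 4); // 192 MACs
    assert_eq!(scratch_f32_elems(&topo, 32), 32 * 16 * 2);     // 1024 f32 elems
}
```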

## Security and safety notes

- Never trust external model bytes by default.
- Always validate incoming payloads before inference (`required_*` and decode checks).
- Keep ABI checks enabled in cross-language hosts (`rnn_ffi_is_abi_compatible`).
- Treat model files as untrusted input in service contexts (sandbox, size limits, resource guards).
- Keep `check_abi_contract.sh` in CI if you publish FFI artifacts.

## Versioning and stability policy

- Rust crate API should follow semantic versioning for public surface changes.
- C ABI changes should be treated as compatibility-sensitive and version-gated.
- Model format changes (`RMD1`) should be versioned explicitly and decoded defensively.
- Breaking changes should be documented in release notes and migration guidance.

## Project validation scripts

- [scripts/check_abi_contract.sh](scripts/check_abi_contract.sh): validates expected ABI symbols
- [scripts/prod_ready_check.sh](scripts/prod_ready_check.sh): broad production-style checks

Note: `prod_ready_check.sh` references optional wrapper ecosystems (`wrappers/python`, `wrappers/javascript`, `wrappers/java`, `wrappers/cpp`) and related tooling.

## Subtleties and design constraints

These are important, non-obvious project subtleties:

1. **`no_std` core behavior**
   The crate is intentionally low-level and optimized for explicit runtime control.

2. **Dual format domain (`RMD1` and `RNN\0`)**
   Dense serialization and runtime blob parsing are separate concerns and must be selected deliberately per pipeline.

3. **Explicit scratch management**
   Inference APIs rely on caller-allocated buffers. This is by design for deterministic memory behavior.

4. **Strict range validation**
   Layer offsets, dimensions, and capacities are validated before execution to prevent unsafe indexing paths.

5. **FFI ABI contract stability matters**
   Any C ABI change must stay synchronized between [src/ffi_api](src/ffi_api) and [include/rnn_ffi.h](include/rnn_ffi.h).

6. **Repository currently includes broad domain modules**
   The crate is not a tiny single-purpose dense runner; it is a wide NN systems toolbox.


## Testing status

Current status for this repository:
- no in-repo unit tests are documented here yet,
- a dedicated `std` wrapper crate is planned,
- all unit tests are intended to be centralized in that wrapper.

## FAQ

### Why `no_std`?
To keep the core deterministic and portable for constrained/native runtimes.

### Why both `RMD1` and `RNN\0` paths?
They represent two format domains in the repository (dense serialization vs runtime parser utilities). Keep pipeline usage explicit.

### Why a separate `std` wrapper for unit tests?
To keep this core focused on runtime/format/FFI behavior while enabling richer testing ergonomics in a host-friendly crate.

### Can I use this on Windows/Linux/macOS?
Yes. The crate and FFI flow are designed for all three platforms with standard Rust + native toolchains.

## Production checklist

- [ ] Build release artifacts (`cargo build --release`)
- [ ] Validate ABI contract (`scripts/check_abi_contract.sh`)
- [ ] Generate and verify sample model (`examples/generate_sample_model.rs`)
- [ ] Verify FFI lifecycle in your host runtime (create/run/destroy)
- [ ] Apply resource limits and input validation for model loading
- [ ] Track runtime budgets (memory/FLOPs/throughput) before deployment

## Contributing

Contributions are welcome.

Suggested local checks:

```bash
cargo fmt --all
cargo clippy --all-targets -- -D warnings
cargo build --release
```

For major changes, open an issue first with:
- scope,
- impacted modules,
- compatibility expectations.

## License

MIT.

See [LICENSE](LICENSE).