native_neural_network 0.1.1

A `no_std` Rust library for native neural networks (`.rnn`)
# rnn

![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)
![Platform: Linux%20%7C%20macOS%20%7C%20Windows](https://img.shields.io/badge/Platform-Linux%20%7C%20macOS%20%7C%20Windows-blue)
![Rust: stable](https://img.shields.io/badge/Rust-stable-orange)

Quick links: [Overview](#what-this-project-does) · [Architecture](#complete-module-reference) · [Format](#binary-dense-format-rmd1-details) · [FFI](#ffi-integration-lifecycle) · [Compatibility](#compatibility-matrix) · [Production](#production-checklist)

`rnn` is a low-level Rust neural-network core built around explicit memory control, binary model formats, and FFI interoperability.

It is designed for native/embedded-style workflows where you want to control:
- how model bytes are created,
- how buffers are allocated,
- how inference is executed,
- and how the same core is reused across Rust and non-Rust runtimes.

## Table of Contents

- [What this project does](#what-this-project-does)
- [Why the generated neural network exists](#why-the-generated-neural-network-exists)
- [Schema of the generated neural network](#schema-of-the-generated-neural-network)
- [Real drawing of the generated network (actual sample values)](#real-drawing-of-the-generated-network-actual-sample-values)
- [Conceptual schema of networks built by the library](#conceptual-schema-of-networks-built-by-the-library)
- [Conceptual schema of model construction](#conceptual-schema-of-model-construction)
- [How this network is created (exact pipeline)](#how-this-network-is-created-exact-pipeline)
- [Binary dense format (`RMD1`) details](#binary-dense-format-rmd1-details)
- [RMD1 binary layout (concise spec)](#rmd1-binary-layout-concise-spec)
- [Runtime parser path (`RNN\0`) and format split](#runtime-parser-path-rnn0-and-format-split)
- [Core inference execution model](#core-inference-execution-model)
- [Complete module reference](#complete-module-reference)
- [Public API groups (selected)](#public-api-groups-selected)
- [Compatibility matrix](#compatibility-matrix)
- [Build and artifacts](#build-and-artifacts)
- [Generate and validate a sample `.rnn`](#generate-and-validate-a-sample-rnn)
- [FFI integration lifecycle](#ffi-integration-lifecycle)
- [Performance notes](#performance-notes)
- [Security and safety notes](#security-and-safety-notes)
- [Versioning and stability policy](#versioning-and-stability-policy)
- [Project validation scripts](#project-validation-scripts)
- [Subtleties and design constraints](#subtleties-and-design-constraints)
- [Testing status](#testing-status)
- [FAQ](#faq)
- [Production checklist](#production-checklist)
- [Contributing](#contributing)
- [License](#license)

## What this project does

This project provides end-to-end building blocks to:

1. Define dense network topology and layer specs
2. Validate parameter counts and index ranges
3. Serialize models into compact binary payloads
4. Deserialize/validate payloads safely
5. Run deterministic inference with caller-provided scratch buffers
6. Expose the same runtime through a C ABI

In addition to dense flow, the crate includes modules for attention, KV cache, RoPE, MoE routing, quantization, sampling, beam search, convolutions, normalization, and profiling/runtime estimation.

## Why the generated neural network exists

The sample generator in [examples/generate_sample_model.rs](examples/generate_sample_model.rs) exists to provide a deterministic, minimal artifact used for:
- format validation,
- API smoke checks,
- FFI integration checks,
- cross-language consistency checks.

It generates a tiny dense model with:
- topology: `[2, 1]`
- weights: `[2.0, -1.0]`
- bias: `[0.5]`
- activation: `Identity`

So the output is:

$$
y = 2.0 \cdot x_0 - 1.0 \cdot x_1 + 0.5
$$

This tiny model is intentionally simple so behavior is easy to verify in every language binding.
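As a sanity reference for bindings, the sample model's arithmetic can be reproduced in a few lines of standalone Rust (an illustrative sketch; it does not call the crate's API):

```rust
/// Forward pass of the generated sample model: y = 2.0*x0 - 1.0*x1 + 0.5.
fn sample_forward(x0: f32, x1: f32) -> f32 {
    let (w0, w1, b) = (2.0_f32, -1.0_f32, 0.5_f32);
    // Identity activation: the pre-activation sum is the output.
    w0 * x0 + w1 * x1 + b
}

fn main() {
    // A few reference points useful when checking language bindings.
    assert_eq!(sample_forward(0.0, 0.0), 0.5);
    assert_eq!(sample_forward(1.0, 1.0), 1.5);
    assert_eq!(sample_forward(1.0, 2.0), 0.5);
    println!("sample model checks passed");
}
```

Any host runtime consuming the generated `.rnn` should reproduce these exact values.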

## Schema of the generated neural network

```mermaid
flowchart LR
  X0((x0)) --> N[Dense neuron]
  X1((x1)) --> N
  B((bias=0.5)) --> N
  N --> Y((y))
```

Parameter mapping for this sample:
- `w0 = 2.0` applied to `x0`
- `w1 = -1.0` applied to `x1`
- `b = 0.5`
- activation = `Identity`

## Real drawing of the generated network (actual sample values)

This is the exact neuron-level network generated by [examples/generate_sample_model.rs](examples/generate_sample_model.rs):

```mermaid
graph LR
   x0((Input x0)) -- "w0 = +2.0" --> n1((Neuron n1))
   x1((Input x1)) -- "w1 = -1.0" --> n1
   b((Bias +0.5)) --> n1
   n1 -- "Identity" --> y((Output y))
```

Operationally, the neuron computes:

$$
z = (2.0 \cdot x_0) + (-1.0 \cdot x_1) + 0.5
$$

Because the output activation is `Identity`, the final output is:

$$
y = z
$$

So for this generated sample model:

$$
y = 2.0 \cdot x_0 - x_1 + 0.5
$$

## Conceptual schema of networks built by the library

Beyond the tiny sample model, the core dense path implemented by this crate is conceptually a feed-forward stack of dense layers:

```mermaid
flowchart LR
   I[Input vector x] --> L1["Dense Layer 1<br/>W1 x + b1<br/>Activation a1"]
   L1 --> L2["Dense Layer 2<br/>W2 h1 + b2<br/>Activation a2"]
   L2 --> L3["... optional hidden layers ..."]
   L3 --> O["Output layer<br/>Wn h(n-1) + bn<br/>Output activation"]
```

Each dense layer is represented internally with:
- `input_size`
- `output_size`
- `weight_offset`
- `bias_offset`
- `activation`

Those descriptors are chained and validated before execution (`LayerPlan::validate`).
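The descriptor fields and the chain check can be sketched in plain Rust. The struct and function below are illustrative stand-ins, not the crate's actual types; only the field names come from the list above:

```rust
/// Hypothetical mirror of a dense layer descriptor (field names follow the
/// list above; the crate's internal type may differ).
#[derive(Debug, Clone, Copy)]
struct DenseSpec {
    input_size: usize,
    output_size: usize,
    weight_offset: usize,
    bias_offset: usize,
    activation: u8,
}

/// Chain validation in the spirit of `LayerPlan::validate`: each layer's
/// input width must equal the previous layer's output width, and every
/// parameter range must fit inside the packed tensors.
fn validate_chain(specs: &[DenseSpec], weights_len: usize, biases_len: usize) -> bool {
    specs.windows(2).all(|w| w[0].output_size == w[1].input_size)
        && specs.iter().all(|s| {
            s.input_size > 0
                && s.output_size > 0
                && s.weight_offset + s.input_size * s.output_size <= weights_len
                && s.bias_offset + s.output_size <= biases_len
        })
}

fn main() {
    let specs = [
        DenseSpec { input_size: 2, output_size: 3, weight_offset: 0, bias_offset: 0, activation: 1 },
        DenseSpec { input_size: 3, output_size: 1, weight_offset: 6, bias_offset: 3, activation: 0 },
    ];
    assert!(validate_chain(&specs, 9, 4));
    assert!(!validate_chain(&specs, 8, 4)); // truncated weights are rejected
}
```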

## Conceptual schema of model construction

The dense model creation flow is explicit and deterministic:

```mermaid
flowchart TD
   T["Topology<br/>example: 2 -> 1 or 8 -> 16 -> 4"] --> S["Build dense layer specs<br/>input/output sizes + offsets + activations"]
   P[Weights + Biases] --> S
   S --> V["Range/count validation<br/>weights_len and biases_len checks"]
   V --> E["Encode binary model<br/>RMD1 header + layer metadata + tensors"]
   E --> F[.rnn file payload]
   F --> D[Decode + validate at runtime]
   D --> R[Run inference with explicit scratch buffers]
```

Why this design:
- predictable memory behavior (no hidden runtime allocations in core path),
- strict structural checks before compute,
- straightforward interop with FFI consumers.

## How this network is created (exact pipeline)

### Step 1: Topology and parameters
- `topology = [2, 1]`
- user-provided `weights`, `biases`

### Step 2: Build layer specs
`build_dense_specs_from_layers` computes for each layer:
- `input_size`, `output_size`
- `weight_offset`, `bias_offset`
- activation choice (hidden vs output)

It also validates consistency with total weights/biases.
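Conceptually, the offsets follow directly from the topology: weights and biases are packed layer after layer. The sketch below shows that arithmetic (illustrative only; it uses `std` collections for brevity even though the core crate is `no_std`, and it is not the actual `build_dense_specs_from_layers`):

```rust
/// For a topology like [2, 4, 1], compute per-layer
/// (input_size, output_size, weight_offset, bias_offset) tuples implied by
/// the packed layout: weights first-to-last, one bias per output neuron.
fn dense_offsets(topology: &[usize]) -> Vec<(usize, usize, usize, usize)> {
    let (mut w_off, mut b_off) = (0, 0);
    topology
        .windows(2)
        .map(|pair| {
            let (inp, out) = (pair[0], pair[1]);
            let spec = (inp, out, w_off, b_off);
            w_off += inp * out; // weights packed layer after layer
            b_off += out;       // one bias per output neuron
            spec
        })
        .collect()
}

fn main() {
    let specs = dense_offsets(&[2, 4, 1]);
    assert_eq!(specs, vec![(2, 4, 0, 0), (4, 1, 8, 4)]);
    // Totals must match weights_len / biases_len at validation time.
    let total_weights: usize = specs.iter().map(|s| s.0 * s.1).sum();
    assert_eq!(total_weights, 12);
}
```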

### Step 3: Encode binary payload
`encode_dense_model_v1` writes:
- magic/version/header
- layer metadata
- packed weights
- packed biases

### Step 4: Persist bytes
The example writes the result as a `.rnn` file.

### Step 5: Runtime consumption
At inference time:
- `rnn_required_dense_from_bytes_v1` inspects required counts
- `decode_dense_model_v1` reconstructs layer specs/parameters
- `forward_dense_plan` executes with caller scratch buffers

## Binary dense format (`RMD1`) details

Dense format helpers are in [src/model_format](src/model_format) and [src/rnn_api](src/rnn_api).

Key characteristics:
- Magic: `RMD1`
- Versioned header
- Layer metadata contains input/output sizes, offsets, activation id
- All critical ranges are validated before use
- Decode fails on truncation, bad version/magic, invalid offsets, or capacity mismatch

This gives a strict producer/consumer contract for dense models.

## RMD1 binary layout (concise spec)

Dense `RMD1` payload layout used by `model_format`:

- Header (20 bytes total):
   - `magic` (4 bytes): `RMD1`
   - `version` (u16)
   - `flags` (u16, currently reserved)
   - `layer_count` (u32)
   - `weights_len` (u32)
   - `biases_len` (u32)
- Layer metadata array (`layer_count` entries, 20 bytes each):
   - `input_size` (u32)
   - `output_size` (u32)
   - `weight_offset` (u32)
   - `bias_offset` (u32)
   - `activation` (u8)
   - `reserved` (3 bytes)
- Weights payload (`weights_len * 4` bytes, f32 little-endian)
- Biases payload (`biases_len * 4` bytes, f32 little-endian)

Validation guarantees include:
- non-zero dimensions,
- checked offset arithmetic,
- bounds checks against tensor payload lengths,
- truncation/version/magic checks at decode.
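A defensive header parse following this layout can be sketched as below. The integer fields are assumed little-endian here (the spec above only states that for the f32 payloads), and `Rmd1Header`/`parse_header` are illustrative names, not the crate's API; verify against `model_format` before relying on it:

```rust
/// Illustrative parse of the 20-byte RMD1 header described above.
#[derive(Debug, PartialEq)]
struct Rmd1Header {
    version: u16,
    flags: u16,
    layer_count: u32,
    weights_len: u32,
    biases_len: u32,
}

fn parse_header(bytes: &[u8]) -> Option<Rmd1Header> {
    // Truncation and magic checks come first, mirroring the decode guarantees.
    if bytes.len() < 20 || bytes[0..4] != *b"RMD1" {
        return None;
    }
    let u16_at = |i: usize| u16::from_le_bytes([bytes[i], bytes[i + 1]]);
    let u32_at = |i: usize| u32::from_le_bytes([bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3]]);
    Some(Rmd1Header {
        version: u16_at(4),
        flags: u16_at(6),
        layer_count: u32_at(8),
        weights_len: u32_at(12),
        biases_len: u32_at(16),
    })
}

fn main() {
    let mut buf = Vec::from(*b"RMD1");
    buf.extend_from_slice(&1u16.to_le_bytes()); // version
    buf.extend_from_slice(&0u16.to_le_bytes()); // flags
    buf.extend_from_slice(&1u32.to_le_bytes()); // layer_count
    buf.extend_from_slice(&2u32.to_le_bytes()); // weights_len
    buf.extend_from_slice(&1u32.to_le_bytes()); // biases_len
    let h = parse_header(&buf).unwrap();
    assert_eq!(h.version, 1);
    assert_eq!(h.layer_count, 1);
    assert!(parse_header(b"RMD1").is_none()); // truncated payload is rejected
}
```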

## Runtime parser path (`RNN\0`) and format split

The repository also contains parser utilities in [src/rnn_format](src/rnn_format) with `RNN\0` magic.

So there are two format domains in the project:
- Dense model serialization path (`RMD1`)
- Runtime blob parser path (`RNN\0`)

This is intentional in code, but requires clear pipeline discipline in production.

## Core inference execution model

Dense execution path is explicit and buffer-oriented:
- Validate plan and shape chain
- Compute scratch requirement from max width and batch size
- Use two alternating scratch lanes for layer-by-layer forward pass
- Copy final lane into output buffer

This avoids hidden execution state and keeps runtime behavior predictable.
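The alternating-lane pattern can be shown with a small standalone forward pass (an illustrative sketch of the technique, not the crate's `forward_dense_plan`; weights are assumed row-major `[out][in]` here):

```rust
/// Layer-by-layer forward pass using two alternating scratch lanes, as
/// described above: each layer reads one lane and writes the other, so no
/// per-layer allocation happens during execution.
fn forward_two_lane(
    input: &[f32],
    layers: &[(usize, usize, Vec<f32>, Vec<f32>)], // (in, out, weights, biases)
    output: &mut [f32],
) {
    // Both lanes are sized for the widest activation vector in the chain.
    let max_width = layers.iter().map(|l| l.1).chain([input.len()]).max().unwrap();
    let mut lanes = [vec![0.0f32; max_width], vec![0.0f32; max_width]];
    lanes[0][..input.len()].copy_from_slice(input);
    let mut cur = 0;
    for (inp, out, w, b) in layers {
        let (src, dst) = (cur, 1 - cur);
        for o in 0..*out {
            let row = &w[o * inp..(o + 1) * inp];
            let acc: f32 = row.iter().zip(&lanes[src][..*inp]).map(|(a, x)| a * x).sum();
            lanes[dst][o] = acc + b[o];
        }
        cur = dst;
    }
    let last_out = layers.last().map(|l| l.1).unwrap_or(input.len());
    output.copy_from_slice(&lanes[cur][..last_out]);
}

fn main() {
    // The sample model from earlier: y = 2*x0 - x1 + 0.5.
    let layers = vec![(2usize, 1usize, vec![2.0f32, -1.0], vec![0.5f32])];
    let mut out = [0.0f32; 1];
    forward_two_lane(&[1.0, 2.0], &layers, &mut out);
    assert_eq!(out[0], 0.5);
}
```

The final copy into `output` is what the last step of the list above refers to.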

## Complete module reference

### Core execution
- `network`: network-level checks and stats
- `layers`: layer descriptors, chaining/range validation, topology→spec conversion
- `engine`: dense forward kernels, scratch sizing, shape checks
- `inference`: batch forward wrappers, stable softmax and logits helpers
- `runtime`: memory/flops/throughput/budget estimators
- `model_config`: predefined config helpers

### Tensor and numerics
- `tensor`: tensor views, indexing, layout checks
- `scratch`: temporary memory helpers
- `activations`: activation kinds and vector application
- `normalization`: layer norm / RMS norm
- `quantization`: i8/f32 quant/dequant and mixed matmul
- `math` (in [src/lib.rs](src/lib.rs)): no-std-friendly approximations

### Training-adjacent
- `losses`: loss and reduction logic
- `metrics`: MSE/MAE/accuracy/argmax and running means
- `gradients`: norm, clipping, finite checks
- `optimizers`: optimizer update paths
- `schedulers`: LR scheduling
- `trainer`: SGD-oriented step helpers
- `initializers`: parameter count/init helpers

### Transformer-style blocks
- `attention`: scaled dot-product attention + masks/shapes
- `kv_cache`: KV cache views/errors
- `rope`: rotary position embedding application
- `sampling`: temperature/top-k/top-p sampling primitives
- `beam_search`: beam selection utilities
- `moe`: top-1 gating and routing
- `embeddings`: embedding gather and tied projection
- `lora`: LoRA delta application

### Spatial/specialized operators
- `conv3d`: 3D convolution and compatibility checks
- `conv5d`: 5D convolution forward/backward
- `sphere5d`: 5D sphere structures/helpers
- `batching`: padding and mask generation

### Formats and interop
- `model_format`: dense model encoding/decoding (`RMD1`)
- `rnn_api`: high-level dense lifecycle APIs
- `rnn_format`: runtime blob parser (`RNN\0`)
- `ffi_api`: C ABI implementation
- `public_api`: re-exported public surface
- `crypto`: hashing/integrity helpers
- `profiler`: operation counting helpers

### Legacy note
- `embedings` exists as a legacy spelling path in repository history/structure.

## Public API groups (selected)

The crate re-exports many symbols through [src/public_api.rs](src/public_api.rs).

Examples by category:
- Dense lifecycle: `rnn_required_dense_from_bytes_v1`, `rnn_pack_dense_v1`, `rnn_run_dense_v1`
- Format: `encode_dense_model_v1`, `decode_dense_model_v1`, `encoded_size_v1`
- Inference ops: `forward_dense_batch`, `scaled_dot_product_attention`, `apply_rope_in_place`
- Optimization: `dense_sgd_step`, `apply_optimizer_step`, `clip_by_global_norm`
- Runtime estimates: `estimate_runtime_memory`, `estimate_runtime_flops`, `check_runtime_budget`
- FFI C API: model create/run/destroy + ABI checks in [include/rnn_ffi.h](include/rnn_ffi.h)

## Compatibility matrix

This project is designed to be compatible across all major desktop/server OSes:

| Platform | Rust crate build | FFI artifacts | Notes |
|---|---|---|---|
| Linux | Supported | Supported (`.so`, `.a`) | Primary native flow |
| macOS | Supported | Supported (`.dylib`, `.a`) | Standard clang/ld toolchain |
| Windows | Supported | Supported (`.dll`, `.lib`) | MSVC/MinGW depending on toolchain |

General requirements:
- Rust stable toolchain
- C/C++ toolchain when consuming FFI outputs
- Platform-specific linker/runtime setup for shared libraries

## Build and artifacts

Build:

```bash
cargo build
cargo build --release
```

With the current crate configuration, release builds emit both Rust and native artifacts (`rlib`, `cdylib`, `staticlib`), depending on platform and toolchain.
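Multi-artifact output of this kind is normally driven by the `crate-type` list in `Cargo.toml`; a sketch of what such a configuration looks like (the exact manifest in this repository may differ):

```toml
[lib]
crate-type = ["rlib", "cdylib", "staticlib"]
```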

## Generate and validate a sample `.rnn`

Generate:

```bash
cargo run --example generate_sample_model -- /tmp/sample.rnn
```

Sanity-check:

```bash
ls -lh /tmp/sample.rnn
xxd -l 4 /tmp/sample.rnn
```

The first four bytes should be the ASCII magic `RMD1`.

## FFI integration lifecycle

C header: [include/rnn_ffi.h](include/rnn_ffi.h)

Recommended host flow:
1. `rnn_ffi_api_version` / `rnn_ffi_is_abi_compatible`
2. `rnn_ffi_model_create_from_bytes_v1`
3. `rnn_ffi_model_get_info`
4. `rnn_ffi_model_run_dense` or `rnn_ffi_model_run_dense_batch`
5. `rnn_ffi_model_destroy`

## Performance notes

- Dense forward cost is dominated by matrix-vector products per layer.
- For dense stacks, per-sample compute is approximately proportional to:

$$
\sum_{l=1}^{L} (\text{in}_l \times \text{out}_l)
$$

- Batch mode reuses the same plan and alternates scratch lanes for better locality.
- Scratch requirements scale with `batch_size * max_layer_width * 2` in the current engine path.
- Quantization and runtime estimation modules can be used to pre-plan deployment budgets.
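The two formulas above are easy to turn into back-of-the-envelope planning helpers (an illustrative sketch; the crate's `runtime` module estimators may compute these differently):

```rust
/// Multiply-accumulates per sample for a dense stack: sum of in_l * out_l.
fn dense_macs_per_sample(topology: &[usize]) -> usize {
    topology.windows(2).map(|p| p[0] * p[1]).sum()
}

/// Scratch elements for the two-lane engine path:
/// batch_size * max_layer_width * 2 (f32 elements, not bytes).
fn scratch_f32_elems(topology: &[usize], batch_size: usize) -> usize {
    let max_width = topology.iter().copied().max().unwrap_or(0);
    batch_size * max_width * 2
}

fn main() {
    let topo = [8, 16, 4];
    assert_eq!(dense_macs_per_sample(&topo), 8 * 16 + 16 * 4); // 192 MACs
    assert_eq!(scratch_f32_elems(&topo, 32), 32 * 16 * 2);     // 1024 f32 elems
}
```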

## Security and safety notes

- Never trust external model bytes by default.
- Always validate incoming payloads before inference (`required_*` and decode checks).
- Keep ABI checks enabled in cross-language hosts (`rnn_ffi_is_abi_compatible`).
- Treat model files as untrusted input in service contexts (sandbox, size limits, resource guards).
- Keep `check_abi_contract.sh` in CI if you publish FFI artifacts.

## Versioning and stability policy

- Rust crate API should follow semantic versioning for public surface changes.
- C ABI changes should be treated as compatibility-sensitive and version-gated.
- Model format changes (`RMD1`) should be versioned explicitly and decoded defensively.
- Breaking changes should be documented in release notes and migration guidance.

## Project validation scripts

- [scripts/check_abi_contract.sh](scripts/check_abi_contract.sh): validates expected ABI symbols
- [scripts/prod_ready_check.sh](scripts/prod_ready_check.sh): broad production-style checks

Note: `prod_ready_check.sh` references optional wrapper ecosystems (`wrappers/python`, `wrappers/javascript`, `wrappers/java`, `wrappers/cpp`) and related tooling.

## Subtleties and design constraints

These are important, non-obvious project subtleties:

1. **`no_std` core behavior**
   The crate is intentionally low-level and optimized for explicit runtime control.

2. **Dual format domain (`RMD1` and `RNN\0`)**
   Dense serialization and runtime blob parsing are separate concerns and must be selected deliberately per pipeline.

3. **Explicit scratch management**
   Inference APIs rely on caller-allocated buffers. This is by design for deterministic memory behavior.

4. **Strict range validation**
   Layer offsets, dimensions, and capacities are validated before execution to prevent unsafe indexing paths.

5. **FFI ABI contract stability matters**
   Any C ABI change must stay synchronized between [src/ffi_api](src/ffi_api) and [include/rnn_ffi.h](include/rnn_ffi.h).

6. **Repository currently includes broad domain modules**
   The crate is not a tiny single-purpose dense runner; it is a wide NN systems toolbox.


## Testing status

Current status for this repository:
- no in-repo unit tests are documented here yet,
- a dedicated `std` wrapper crate is planned,
- all unit tests are intended to be centralized in that wrapper.

## FAQ

### Why `no_std`?
To keep the core deterministic and portable for constrained/native runtimes.

### Why both `RMD1` and `RNN\0` paths?
They represent two format domains in the repository (dense serialization vs runtime parser utilities). Keep pipeline usage explicit.

### Why a separate `std` wrapper for unit tests?
To keep this core focused on runtime/format/FFI behavior while enabling richer testing ergonomics in a host-friendly crate.

### Can I use this on Windows/Linux/macOS?
Yes. The crate and FFI flow are designed for all three platforms with standard Rust + native toolchains.

## Production checklist

- [ ] Build release artifacts (`cargo build --release`)
- [ ] Validate ABI contract (`scripts/check_abi_contract.sh`)
- [ ] Generate and verify sample model (`examples/generate_sample_model.rs`)
- [ ] Verify FFI lifecycle in your host runtime (create/run/destroy)
- [ ] Apply resource limits and input validation for model loading
- [ ] Track runtime budgets (memory/FLOPs/throughput) before deployment

## Contributing

Contributions are welcome.

Suggested local checks:

```bash
cargo fmt --all
cargo clippy --all-targets -- -D warnings
cargo build --release
```

For major changes, open an issue first with:
- scope,
- impacted modules,
- compatibility expectations.

## License

MIT.

See [LICENSE](LICENSE).