§rnn
Quick links: Overview · Architecture · Format · FFI · Compatibility · Production
rnn is a low-level Rust neural-network core built around explicit memory control, binary model formats, and FFI interoperability.
It is designed for native/embedded-style workflows where you want to control:
- how model bytes are created,
- how buffers are allocated,
- how inference is executed,
- and how the same core is reused across Rust and non-Rust runtimes.
§Table of Contents
- What this project does
- Why the generated neural network exists
- Schema of the generated neural network
- Real drawing of the generated network (actual sample values)
- Conceptual schema of networks built by the library
- Conceptual schema of model construction
- How this network is created (exact pipeline)
- Binary dense format (RMD1) details
- RMD1 binary layout (concise spec)
- Runtime parser path (RNN\0) and format split
- Core inference execution model
- Complete module reference
- Public API groups (selected)
- Compatibility matrix
- Build and artifacts
- Generate and validate a sample .rnn
- FFI integration lifecycle
- Performance notes
- Security and safety notes
- Versioning and stability policy
- Project validation scripts
- Testing status
- FAQ
- Production checklist
- Contributing
- License
§What this project does
This project provides end-to-end building blocks to:
- Define dense network topology and layer specs
- Validate parameter counts and index ranges
- Serialize models into compact binary payloads
- Deserialize/validate payloads safely
- Run deterministic inference with caller-provided scratch buffers
- Expose the same runtime through a C ABI
In addition to dense flow, the crate includes modules for attention, KV cache, RoPE, MoE routing, quantization, sampling, beam search, convolutions, normalization, and profiling/runtime estimation.
§Why the generated neural network exists
The sample generator in examples/generate_sample_model.rs exists to provide a deterministic, minimal artifact used for:
- format validation,
- API smoke checks,
- FFI integration checks,
- cross-language consistency checks.
It generates a tiny dense model with:
- topology: [2, 1]
- weights: [2.0, -1.0]
- bias: [0.5]
- activation: Identity
So the output is:
$$ y = 2.0 \cdot x_0 - 1.0 \cdot x_1 + 0.5 $$
This tiny model is intentionally simple so behavior is easy to verify in every language binding.
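The computation can be reproduced in plain Rust, independent of the crate's APIs (a standalone sketch of the sample model, useful as a cross-language reference value):

```rust
// Standalone reproduction of the generated sample model:
// topology [2, 1], weights [2.0, -1.0], bias [0.5], Identity activation.
fn sample_model_forward(x0: f32, x1: f32) -> f32 {
    let weights = [2.0f32, -1.0];
    let bias = 0.5f32;
    // Dense neuron: z = w0*x0 + w1*x1 + b; Identity leaves z unchanged.
    weights[0] * x0 + weights[1] * x1 + bias
}

fn main() {
    // y = 2.0*1.0 - 1.0*2.0 + 0.5 = 0.5
    println!("{}", sample_model_forward(1.0, 2.0));
}
```

Any binding that loads the generated `.rnn` file should produce these same values bit-for-bit, which is what makes the artifact useful for consistency checks.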
§Schema of the generated neural network
flowchart LR
X0((x0)) --> N[Dense neuron]
X1((x1)) --> N
B((bias=0.5)) --> N
N --> Y((y))
Parameter mapping for this sample:
- w0 = 2.0 applied to x0
- w1 = -1.0 applied to x1
- b = 0.5
- activation = Identity
§Real drawing of the generated network (actual sample values)
This is the exact neuron-level network generated by examples/generate_sample_model.rs:
graph LR
x0((Input x0)) -- "w0 = +2.0" --> n1((Neuron n1))
x1((Input x1)) -- "w1 = -1.0" --> n1
b((Bias +0.5)) --> n1
n1 -- "Identity" --> y((Output y))
Operationally, the neuron computes:
$$ z = (2.0 \cdot x_0) + (-1.0 \cdot x_1) + 0.5 $$
Because the output activation is Identity, the final output is:
$$ y = z $$
So for this generated sample model:
$$ y = 2.0 \cdot x_0 - x_1 + 0.5 $$
§Conceptual schema of networks built by the library
Beyond the tiny sample model, the core dense path implemented by this crate is conceptually a feed-forward stack of dense layers:
flowchart LR
I[Input vector x] --> L1[Dense Layer 1\nW1 x + b1\nActivation a1]
L1 --> L2[Dense Layer 2\nW2 h1 + b2\nActivation a2]
L2 --> L3[... Optional hidden layers ...]
L3 --> O[Output layer\nWn h(n-1) + bn\nOutput activation]
Each dense layer is represented internally with:
- input_size
- output_size
- weight_offset
- bias_offset
- activation
Those descriptors are chained and validated before execution (LayerPlan::validate).
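The shape-chaining rule can be illustrated with a minimal sketch. The struct and function names here are illustrative, not the crate's actual `DenseLayerDesc`/`LayerPlan::validate` types, which also check offsets and activations:

```rust
// Illustrative layer descriptor; the crate's real descriptors also carry
// weight/bias offsets and an activation id.
struct LayerDim { input_size: usize, output_size: usize }

// Chain validation: each layer's input width must match the previous
// layer's output width, and no dimension may be zero.
fn validate_chain(layers: &[LayerDim]) -> Result<(), String> {
    for (i, l) in layers.iter().enumerate() {
        if l.input_size == 0 || l.output_size == 0 {
            return Err(format!("layer {i}: zero dimension"));
        }
        if i > 0 && layers[i - 1].output_size != l.input_size {
            return Err(format!("layer {i}: shape mismatch"));
        }
    }
    Ok(())
}

fn main() {
    let ok = [LayerDim { input_size: 2, output_size: 3 },
              LayerDim { input_size: 3, output_size: 1 }];
    assert!(validate_chain(&ok).is_ok());
}
```

Running this kind of check once, up front, is what lets the forward kernels index weight slices without per-element bounds surprises.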
§Conceptual schema of model construction
The dense model creation flow is explicit and deterministic:
flowchart TD
T[Topology\nexample: 2 -> 1 or 8 -> 16 -> 4] --> S[Build dense layer specs\ninput/output sizes + offsets + activations]
P[Weights + Biases] --> S
S --> V[Range/count validation\nweights_len and biases_len checks]
V --> E[Encode binary model\nRMD1 header + layer metadata + tensors]
E --> F[.rnn file payload]
F --> D[Decode + validate at runtime]
D --> R[Run inference with explicit scratch buffers]
Why this design:
- predictable memory behavior (no hidden runtime allocations in core path),
- strict structural checks before compute,
- straightforward interop with FFI consumers.
§How this network is created (exact pipeline)
§Step 1: Topology and parameters
- topology = [2, 1]
- user-provided weights, biases
§Step 2: Build layer specs
build_dense_specs_from_layers computes for each layer:
- input_size, output_size
- weight_offset, bias_offset
- activation choice (hidden vs output)
It also validates consistency with total weights/biases.
§Step 3: Encode binary payload
encode_dense_model_v1 writes:
- magic/version/header
- layer metadata
- packed weights
- packed biases
§Step 4: Persist bytes
The example writes the result as a .rnn file.
§Step 5: Runtime consumption
At inference time:
- rnn_required_dense_from_bytes_v1 inspects required counts
- decode_dense_model_v1 reconstructs layer specs/parameters
- forward_dense_plan executes with caller scratch buffers
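The offset bookkeeping in step 2 can be sketched independently of the crate. This hypothetical helper (not the actual `build_dense_specs_from_layers` signature) shows how per-layer offsets follow from a topology:

```rust
// For a topology [d0, d1, ..., dn], layer l consumes d_l * d_{l+1} weights
// and d_{l+1} biases; its offsets are running sums over preceding layers.
fn dense_offsets(topology: &[usize]) -> Vec<(usize, usize)> {
    let (mut w_off, mut b_off) = (0usize, 0usize);
    let mut out = Vec::new();
    for pair in topology.windows(2) {
        out.push((w_off, b_off)); // (weight_offset, bias_offset) for this layer
        w_off += pair[0] * pair[1];
        b_off += pair[1];
    }
    out
}

fn main() {
    // topology [2, 3, 1]: layer 0 starts at (0, 0),
    // layer 1 starts at weight offset 2*3 = 6, bias offset 3.
    assert_eq!(dense_offsets(&[2, 3, 1]), vec![(0, 0), (6, 3)]);
}
```

Because offsets are fully determined by the topology, the final offsets plus the last layer's sizes also yield the expected `weights_len`/`biases_len` totals that the validation step checks.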
§Binary dense format (RMD1) details
Dense format helpers are in src/model_format and src/rnn_api.
Key characteristics:
- Magic: RMD1
- Versioned header
- Layer metadata contains input/output sizes, offsets, activation id
- All critical ranges are validated before use
- Decode fails on truncation, bad version/magic, invalid offsets, or capacity mismatch
This gives a strict producer/consumer contract for dense models.
§RMD1 binary layout (concise spec)
Dense RMD1 payload layout used by model_format:
- Header (20 bytes total):
  - magic (4 bytes): RMD1
  - version (u16)
  - flags (u16, currently reserved)
  - layer_count (u32)
  - weights_len (u32)
  - biases_len (u32)
- Layer metadata array (layer_count entries, 20 bytes each):
  - input_size (u32)
  - output_size (u32)
  - weight_offset (u32)
  - bias_offset (u32)
  - activation (u8)
  - reserved (3 bytes)
- Weights payload (weights_len * 4 bytes, f32 little-endian)
- Biases payload (biases_len * 4 bytes, f32 little-endian)
Validation guarantees include:
- non-zero dimensions,
- checked offset arithmetic,
- bounds checks against tensor payload lengths,
- truncation/version/magic checks at decode.
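The header layout above can be sketched as a standalone encoder. This is an illustration, not the crate's `encode_dense_model_v1`; little-endian header fields are an assumption here, chosen to match the stated f32 payload encoding:

```rust
// Sketch of the 20-byte RMD1 header per the layout above:
// magic (4) + version (2) + flags (2) + layer_count (4)
// + weights_len (4) + biases_len (4) = 20 bytes.
fn encode_rmd1_header(layer_count: u32, weights_len: u32, biases_len: u32) -> Vec<u8> {
    let mut out = Vec::with_capacity(20);
    out.extend_from_slice(b"RMD1");              // magic (4 bytes)
    out.extend_from_slice(&1u16.to_le_bytes());  // version (assumed v1)
    out.extend_from_slice(&0u16.to_le_bytes());  // flags (reserved)
    out.extend_from_slice(&layer_count.to_le_bytes());
    out.extend_from_slice(&weights_len.to_le_bytes());
    out.extend_from_slice(&biases_len.to_le_bytes());
    out
}

fn main() {
    // For the sample model: 1 layer, 2 weights, 1 bias.
    let h = encode_rmd1_header(1, 2, 1);
    assert_eq!(h.len(), 20);
    assert_eq!(&h[0..4], b"RMD1");
}
```

A decoder would reverse this field-by-field, rejecting the payload on any length, magic, or version mismatch before touching the tensor data.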
§Runtime parser path (RNN\0) and format split
The repository also contains parser utilities in src/rnn_format with RNN\0 magic.
So there are two format domains in the project:
- Dense model serialization path (RMD1)
- Runtime blob parser path (RNN\0)
This is intentional in code, but requires clear pipeline discipline in production.
§Core inference execution model
Dense execution path is explicit and buffer-oriented:
- Validate plan and shape chain
- Compute scratch requirement from max width and batch size
- Use two alternating scratch lanes for layer-by-layer forward pass
- Copy final lane into output buffer
This avoids hidden execution state and keeps runtime behavior predictable.
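The alternating-lane pattern can be shown with a minimal self-contained sketch. Names and the nested-`Vec` weight layout are illustrative; the crate's `forward_dense_plan` instead works on packed slices addressed by layer offsets:

```rust
// Minimal two-lane scratch forward pass over a dense stack: each layer
// reads one lane and writes the other, then the lanes are swapped.
fn forward_two_lane(
    layers: &[(Vec<Vec<f32>>, Vec<f32>)], // (weights[out][in], biases[out]) per layer
    input: &[f32],
) -> Vec<f32> {
    let mut cur: Vec<f32> = input.to_vec();
    let mut next: Vec<f32> = Vec::new();
    for (w, b) in layers {
        next.clear();
        for (row, bias) in w.iter().zip(b.iter()) {
            // Matrix-vector product plus bias for one output unit.
            let z: f32 = row.iter().zip(cur.iter()).map(|(wi, xi)| wi * xi).sum();
            next.push(z + bias);
        }
        std::mem::swap(&mut cur, &mut next); // alternate lanes
    }
    cur // final lane holds the output; the engine copies it to the caller buffer
}

fn main() {
    // The sample model: one layer, W = [[2.0, -1.0]], b = [0.5].
    let layers = vec![(vec![vec![2.0, -1.0]], vec![0.5])];
    assert_eq!(forward_two_lane(&layers, &[1.0, 2.0]), vec![0.5]);
}
```

Only two lanes are ever needed regardless of depth, which is why scratch requirements depend on the widest layer rather than on layer count.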
§Complete module reference
§Core execution
- network: network-level checks and stats
- layers: layer descriptors, chaining/range validation, topology→spec conversion
- engine: dense forward kernels, scratch sizing, shape checks
- inference: batch forward wrappers, stable softmax and logits helpers
- runtime: memory/flops/throughput/budget estimators
- model_config: predefined config helpers
§Tensor and numerics
- tensor: tensor views, indexing, layout checks
- scratch: temporary memory helpers
- activations: activation kinds and vector application
- normalization: layer norm / RMS norm
- quantization: i8/f32 quant/dequant and mixed matmul
- math (in src/lib.rs): no-std-friendly approximations
§Training-adjacent
- losses: loss and reduction logic
- metrics: MSE/MAE/accuracy/argmax and running means
- gradients: norm, clipping, finite checks
- optimizers: optimizer update paths
- schedulers: LR scheduling
- trainer: SGD-oriented step helpers
- initializers: parameter count/init helpers
§Transformer-style blocks
- attention: scaled dot-product attention + masks/shapes
- kv_cache: KV cache views/errors
- rope: rotary position embedding application
- sampling: temperature/top-k/top-p sampling primitives
- beam_search: beam selection utilities
- moe: top-1 gating and routing
- embeddings: embedding gather and tied projection
- lora: LoRA delta application
§Spatial/specialized operators
- conv3d: 3D convolution and compatibility checks
- conv5d: 5D convolution forward/backward
- sphere5d: 5D sphere structures/helpers
- batching: padding and mask generation
§Formats and interop
- model_format: dense model encoding/decoding (RMD1)
- rnn_api: high-level dense lifecycle APIs
- rnn_format: runtime blob parser (RNN\0)
- ffi_api: C ABI implementation
- public_api: re-exported public surface
- crypto: hashing/integrity helpers
- profiler: operation counting helpers
§Legacy note
embedings exists as a legacy spelling path in repository history/structure.
§Public API groups (selected)
The crate re-exports many symbols through src/public_api.rs.
Examples by category:
- Dense lifecycle: rnn_required_dense_from_bytes_v1, rnn_pack_dense_v1, rnn_run_dense_v1
- Format: encode_dense_model_v1, decode_dense_model_v1, encoded_size_v1
- Inference ops: forward_dense_batch, scaled_dot_product_attention, apply_rope_in_place
- Optimization: dense_sgd_step, apply_optimizer_step, clip_by_global_norm
- Runtime estimates: estimate_runtime_memory, estimate_runtime_flops, check_runtime_budget
- FFI C API: model create/run/destroy + ABI checks in include/rnn_ffi.h
§Compatibility matrix
This project is designed to be compatible across all major desktop/server OSes:
| Platform | Rust crate build | FFI artifacts | Notes |
|---|---|---|---|
| Linux | Supported | Supported (.so, .a) | Primary native flow |
| macOS | Supported | Supported (.dylib, .a) | Standard clang/ld toolchain |
| Windows | Supported | Supported (.dll, .lib) | MSVC/MinGW depending on toolchain |
General requirements:
- Rust stable toolchain
- C/C++ toolchain when consuming FFI outputs
- Platform-specific linker/runtime setup for shared libraries
§Build and artifacts
Build:
cargo build
cargo build --release
With the current crate config, release builds can emit Rust + native artifacts according to platform/toolchain (rlib, cdylib, staticlib).
§Generate and validate a sample .rnn
Generate:
cargo run --example generate_sample_model -- /tmp/sample.rnn
Sanity-check:
ls -lh /tmp/sample.rnn
xxd -l 4 /tmp/sample.rnn
Expected dense header bytes correspond to RMD1.
§FFI integration lifecycle
C header: include/rnn_ffi.h
Recommended host flow:
- rnn_ffi_api_version / rnn_ffi_is_abi_compatible
- rnn_ffi_model_create_from_bytes_v1
- rnn_ffi_model_get_info
- rnn_ffi_model_run_dense or rnn_ffi_model_run_dense_batch
- rnn_ffi_model_destroy
§Performance notes
- Dense forward cost is dominated by matrix-vector products per layer.
- For dense stacks, per-sample compute is approximately proportional to:
$$ \sum_{l=1}^{L} (\text{in}_l \times \text{out}_l) $$
- Batch mode reuses the same plan and alternates scratch lanes for better locality.
- Scratch requirements scale with batch_size * max_layer_width * 2 in the current engine path.
- Quantization and runtime estimation modules can be used to pre-plan deployment budgets.
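The two formulas above translate directly into back-of-envelope helpers. These are illustrative, not the crate's `required_batch_scratch_len` or `estimate_runtime_flops`:

```rust
// Scratch requirement: two alternating lanes, each wide enough for the
// widest layer, per sample in the batch.
fn scratch_len(batch_size: usize, topology: &[usize]) -> usize {
    let max_width = topology.iter().copied().max().unwrap_or(0);
    batch_size * max_width * 2
}

// Per-sample compute: sum over layers of in_l * out_l multiply-accumulates.
fn per_sample_macs(topology: &[usize]) -> usize {
    topology.windows(2).map(|p| p[0] * p[1]).sum()
}

fn main() {
    // topology [8, 16, 4], batch of 4:
    assert_eq!(scratch_len(4, &[8, 16, 4]), 4 * 16 * 2);
    assert_eq!(per_sample_macs(&[8, 16, 4]), 8 * 16 + 16 * 4);
}
```

Estimates like these are enough to pre-size buffers and sanity-check deployment budgets before any model bytes are loaded.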
§Security and safety notes
- Never trust external model bytes by default.
- Always validate incoming payloads before inference (required_* and decode checks).
- Keep ABI checks enabled in cross-language hosts (rnn_ffi_is_abi_compatible).
- Treat model files as untrusted input in service contexts (sandbox, size limits, resource guards).
- Keep check_abi_contract.sh in CI if you publish FFI artifacts.
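Cheap guards can run before any structural decoding. This sketch is illustrative (the crate performs its own truncation/magic/version checks at decode); the size cap and helper name are assumptions for the example:

```rust
// Pre-decode guards for untrusted model bytes: enforce a size limit and
// check the magic before any parsing work is done.
fn precheck_model_bytes(bytes: &[u8], max_len: usize) -> Result<(), &'static str> {
    if bytes.len() > max_len {
        return Err("payload exceeds size limit");
    }
    if bytes.len() < 20 {
        return Err("payload shorter than header");
    }
    if &bytes[0..4] != b"RMD1" {
        return Err("bad magic");
    }
    Ok(())
}

fn main() {
    // 20 bytes, correct magic: passes the cheap checks.
    assert!(precheck_model_bytes(b"RMD1aaaaaaaaaaaaaaaa", 1024).is_ok());
    // Wrong magic: rejected before any decoding.
    assert!(precheck_model_bytes(b"XXXXaaaaaaaaaaaaaaaa", 1024).is_err());
}
```

Guards like this bound the work an attacker can force from a hostile payload; full validation still happens in the decode path.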
§Versioning and stability policy
- Rust crate API should follow semantic versioning for public surface changes.
- C ABI changes should be treated as compatibility-sensitive and version-gated.
- Model format changes (RMD1) should be versioned explicitly and decoded defensively.
- Breaking changes should be documented in release notes and migration guidance.
§Project validation scripts
- scripts/check_abi_contract.sh: validates expected ABI symbols
- scripts/prod_ready_check.sh: broad production-style checks
Note: prod_ready_check.sh references optional wrapper ecosystems (wrappers/python, wrappers/javascript, wrappers/java, wrappers/cpp) and related tooling.
§Subtleties and design constraints
These are important, non-obvious project subtleties:
- no_std core behavior: The crate is intentionally low-level and optimized for explicit runtime control.
- Dual format domain (RMD1 and RNN\0): Dense serialization and runtime blob parsing are separate concerns and must be selected deliberately per pipeline.
- Explicit scratch management: Inference APIs rely on caller-allocated buffers. This is by design for deterministic memory behavior.
- Strict range validation: Layer offsets, dimensions, and capacities are validated before execution to prevent unsafe indexing paths.
- FFI ABI contract stability matters: Any C ABI change must stay synchronized between src/ffi_api and include/rnn_ffi.h.
- Repository currently includes broad domain modules: The crate is not a tiny single-purpose dense runner; it is a wide NN systems toolbox.
§Testing status
As requested for this repository:
- no in-repo unit-test focus is currently documented here,
- a dedicated std wrapper crate is planned,
- all unit tests are intended to be centralized in that wrapper.
§FAQ
§Why no_std?
To keep the core deterministic and portable for constrained/native runtimes.
§Why both RMD1 and RNN\0 paths?
They represent two format domains in the repository (dense serialization vs runtime parser utilities). Keep pipeline usage explicit.
§Why a separate std wrapper for unit tests?
To keep this core focused on runtime/format/FFI behavior while enabling richer testing ergonomics in a host-friendly crate.
§Can I use this on Windows/Linux/macOS?
Yes. The crate and FFI flow are designed for all three platforms with standard Rust + native toolchains.
§Production checklist
- Build release artifacts (cargo build --release)
- Validate ABI contract (scripts/check_abi_contract.sh)
- Generate and verify sample model (examples/generate_sample_model.rs)
- Verify FFI lifecycle in your host runtime (create/run/destroy)
- Apply resource limits and input validation for model loading
- Track runtime budgets (memory/FLOPs/throughput) before deployment
§Contributing
Contributions are welcome.
Suggested local checks:
cargo fmt --all
cargo clippy --all-targets -- -D warnings
cargo build --release
For major changes, open an issue first with:
- scope,
- impacted modules,
- compatibility expectations.
§License
MIT.
See LICENSE.
Re-exports§
pub use crate::network::NeuralNetwork;
pub use crate::network::NetworkStats;
pub use crate::network::network_stats;
pub use crate::network::validate_network_parts;
pub use crate::tensor::TensorView;
pub use crate::tensor::tensor_fill;
pub use crate::tensor::tensor_scale_in_place;
pub use crate::tensor::tensor_add_in_place;
pub use crate::scratch::Scratch;
pub use crate::rnn_format::parse_rnn_from_bytes;
pub use crate::rnn_format::RnnHandle;
pub use crate::rnn_format::BlobMeta;
pub use crate::rnn_api::RnnApiError;
pub use crate::rnn_api::rnn_required_dense_from_topology;
pub use crate::rnn_api::rnn_required_dense_from_bytes_v1;
pub use crate::rnn_api::rnn_dense_required_buffers;
pub use crate::rnn_api::rnn_dense_required_infer_scratch_from_specs;
pub use crate::rnn_api::rnn_validate_dense_topology;
pub use crate::rnn_api::rnn_validate_dense_counts;
pub use crate::rnn_api::rnn_pack_dense_v1;
pub use crate::rnn_api::rnn_unpack_dense_v1;
pub use crate::rnn_api::rnn_run_dense_v1;
pub use crate::crypto::Sha256Ctx;
pub use crate::crypto::Sha512Ctx;
pub use crate::crypto::sha256_bytes;
pub use crate::crypto::sha512_bytes;
pub use crate::crypto::digest_to_hex_lower;
pub use crate::crypto::constant_time_eq;
pub use crate::crypto::verify_sha256;
pub use crate::crypto::verify_sha512;
pub use crate::conv3d::conv3d_forward;
pub use crate::conv3d::conv3d_layout_compatible;
pub use crate::conv3d::conv3d_is_compatible;
pub use crate::conv5d::conv5d_forward;
pub use crate::conv5d::conv5d_backward;
pub use crate::sphere5d::Sphere5D;
pub use crate::sphere5d::NeuronPoint;
pub use crate::sphere5d::SphereError;
pub use crate::activations::ActivationKind;
pub use crate::layers::LayerSpec;
pub use crate::layers::DenseLayerDesc;
pub use crate::layers::LayerPlan;
pub use crate::layers::LayerError;
pub use crate::engine::forward_dense_plan;
pub use crate::engine::forward_dense_plan_big_kernel;
pub use crate::engine::required_batch_scratch_len;
pub use crate::engine::ForwardError;
pub use crate::engine::required_single_infer_scratch;
pub use crate::engine::validate_forward_io;
pub use crate::model_format::encode_dense_model_v1;
pub use crate::model_format::decode_dense_model_v1;
pub use crate::model_format::encoded_size_v1;
pub use crate::model_format::DecodedCounts;
pub use crate::model_format::ModelFormatError;
pub use crate::losses::LossKind;
pub use crate::losses::LossError;
pub use crate::losses::loss_and_gradient;
pub use crate::losses::reduce_sum;
pub use crate::losses::reduce_mean;
pub use crate::metrics::MetricError;
pub use crate::metrics::mse;
pub use crate::metrics::mae;
pub use crate::metrics::argmax;
pub use crate::metrics::accuracy_top1_from_one_hot;
pub use crate::metrics::cross_entropy_from_probabilities;
pub use crate::metrics::RunningMean;
pub use crate::initializers::InitKind;
pub use crate::initializers::InitError;
pub use crate::initializers::expected_parameter_counts;
pub use crate::initializers::initialize_dense_parameters;
pub use crate::inference::InferenceError;
pub use crate::inference::softmax_stable;
pub use crate::inference::forward_dense_batch;
pub use crate::inference::normalize_logits_in_place;
pub use crate::inference::argmax_index;
pub use crate::trainer::DenseSgdConfig;
pub use crate::trainer::TrainError;
pub use crate::trainer::required_train_buffer_len;
pub use crate::trainer::dense_sgd_step;
pub use crate::optimizers::OptimizerKind;
pub use crate::optimizers::OptimizerError;
pub use crate::optimizers::optimizer_state_len;
pub use crate::optimizers::apply_optimizer_step;
pub use crate::schedulers::LrSchedule;
pub use crate::schedulers::ScheduleError;
pub use crate::schedulers::compute_learning_rate;
pub use crate::normalization::NormError;
pub use crate::normalization::layer_norm_in_place;
pub use crate::normalization::layer_norm;
pub use crate::normalization::rms_norm_in_place;
pub use crate::normalization::rms_norm;
pub use crate::attention::AttentionError;
pub use crate::attention::AttentionMask;
pub use crate::attention::AttentionShape;
pub use crate::attention::scaled_dot_product_attention;
pub use crate::quantization::QuantError;
pub use crate::quantization::quantize_i8_symmetric;
pub use crate::quantization::dequantize_i8_symmetric;
pub use crate::quantization::matmul_i8_f32;
pub use crate::model_config::TransformerConfig;
pub use crate::model_config::ConfigError;
pub use crate::model_config::tiny_transformer;
pub use crate::model_config::small_transformer;
pub use crate::model_config::base_transformer;
pub use crate::runtime::RuntimeProfile;
pub use crate::runtime::RuntimeEstimate;
pub use crate::runtime::RuntimeError;
pub use crate::runtime::RuntimeFlopsEstimate;
pub use crate::runtime::ThroughputEstimate;
pub use crate::runtime::BudgetFit;
pub use crate::runtime::estimate_runtime_memory;
pub use crate::runtime::estimate_runtime_flops;
pub use crate::runtime::estimate_tokens_per_second;
pub use crate::runtime::check_runtime_budget;
pub use crate::runtime::fit_from_estimate;
pub use crate::sampling::SamplingError;
pub use crate::sampling::softmax_temperature;
pub use crate::sampling::argmax_sample;
pub use crate::sampling::sample_from_cumulative;
pub use crate::sampling::top_k_mask;
pub use crate::sampling::top_p_cutoff;
pub use crate::kv_cache::KvCacheError;
pub use crate::kv_cache::KvCacheView;
pub use crate::rope::RopeError;
pub use crate::rope::apply_rope_in_place;
pub use crate::embeddings::EmbeddingError;
pub use crate::embeddings::gather_embeddings;
pub use crate::embeddings::tied_output_projection;
pub use crate::lora::LoraError;
pub use crate::lora::apply_lora_delta;
pub use crate::moe::MoeError;
pub use crate::moe::top1_gating;
pub use crate::moe::route_top1;
pub use crate::beam_search::BeamError;
pub use crate::beam_search::select_top_beams;
pub use crate::gradients::GradientError;
pub use crate::gradients::l2_norm;
pub use crate::gradients::clip_by_global_norm;
pub use crate::gradients::all_finite;
pub use crate::batching::BatchError;
pub use crate::batching::pad_sequences_u32;
pub use crate::batching::make_padding_mask;
pub use crate::profiler::OpCounter;