tensor-forge 1.0.0

A minimal, deterministic compute graph runtime for tensor operations in Rust.

Repository: noah-CAL/tensor-forge


tensor-forge is a small, focused runtime project for building and executing tensor compute graphs with deterministic scheduling, graph-level validation, and pluggable kernel dispatch. It is designed to be readable, well-tested, and extensible rather than maximally optimized.

It provides:

  • Graph for constructing directed acyclic tensor compute graphs
  • Executor for deterministic graph execution
  • KernelRegistry for pluggable operation dispatch
  • Tensor as the runtime value type used for inputs and outputs
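
To make "pluggable operation dispatch" concrete, here is a minimal, std-only sketch of the general registry pattern. It is not tensor-forge's actual API: `OpKind`, `Kernel`, `Registry`, and `elementwise` are hypothetical names invented for this illustration, and the crate's real `KernelRegistry` may look quite different.

```rust
use std::collections::HashMap;

/// Hypothetical operation tag (an assumption, not a tensor-forge type).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum OpKind {
    Add,
    Mul,
}

/// Hypothetical kernel signature: flat input buffers in, one flat buffer out.
type Kernel = Box<dyn Fn(&[&[f64]]) -> Result<Vec<f64>, String>>;

/// Hypothetical registry mapping operation tags to kernels.
struct Registry {
    kernels: HashMap<OpKind, Kernel>,
}

impl Registry {
    fn new() -> Self {
        Registry { kernels: HashMap::new() }
    }

    /// Registers the kernel for `op`, replacing any previous one.
    fn register(&mut self, op: OpKind, kernel: Kernel) {
        self.kernels.insert(op, kernel);
    }

    /// Looks up the kernel for `op` and runs it on `inputs`.
    fn dispatch(&self, op: OpKind, inputs: &[&[f64]]) -> Result<Vec<f64>, String> {
        let kernel = self
            .kernels
            .get(&op)
            .ok_or_else(|| format!("no kernel registered for {:?}", op))?;
        kernel(inputs)
    }
}

/// Builds an element-wise binary kernel from a scalar function.
fn elementwise(f: fn(f64, f64) -> f64) -> Kernel {
    Box::new(move |inputs: &[&[f64]]| -> Result<Vec<f64>, String> {
        if inputs.len() != 2 || inputs[0].len() != inputs[1].len() {
            return Err("expected two inputs of equal length".to_string());
        }
        Ok(inputs[0]
            .iter()
            .zip(inputs[1])
            .map(|(a, b)| f(*a, *b))
            .collect())
    })
}
```

In this sketch, calling `register(OpKind::Add, elementwise(|a, b| a + b))` installs an addition kernel, and `dispatch` returns an error for any operation that has no kernel. Keeping the lookup separate from the executor is what lets a user swap in a custom implementation without touching graph code.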

Highlights

  • Deterministic execution — graph execution order is stable and independent of map iteration order
  • Validated graph construction — operations are shape-checked before execution
  • Pluggable kernels — operation implementations are dispatched through a registry
  • Reusable graphs — graph structure is defined once and executed with runtime input bindings
  • Well-tested and documented — includes unit tests, integration tests, doctests, CI, and runnable examples
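
One common way to get an execution order that does not depend on map iteration order is Kahn's algorithm with an ordered ready set. The sketch below is std-only and illustrative: `deterministic_topo_order` and the plain `u32` node ids are assumptions made for this example, and the text above does not say which algorithm tensor-forge actually uses.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns a topological order that always picks the smallest ready node id
/// first. Returns `None` if the graph has a cycle or an edge names an
/// unknown node. `edges` holds hypothetical `(from, to)` id pairs.
fn deterministic_topo_order(nodes: &[u32], edges: &[(u32, u32)]) -> Option<Vec<u32>> {
    // Ordered maps keep every traversal independent of hashing.
    let mut indegree: BTreeMap<u32, usize> = nodes.iter().map(|&n| (n, 0)).collect();
    let mut successors: BTreeMap<u32, Vec<u32>> = BTreeMap::new();
    for &(from, to) in edges {
        if !indegree.contains_key(&from) {
            return None;
        }
        *indegree.get_mut(&to)? += 1;
        successors.entry(from).or_default().push(to);
    }

    // Nodes with no unmet dependencies, kept sorted by id.
    let mut ready: BTreeSet<u32> = indegree
        .iter()
        .filter(|(_, d)| **d == 0)
        .map(|(&n, _)| n)
        .collect();

    let mut order = Vec::with_capacity(indegree.len());
    while let Some(n) = ready.pop_first() {
        order.push(n);
        if let Some(succs) = successors.get(&n) {
            for &next in succs {
                let d = indegree.get_mut(&next)?;
                *d -= 1;
                if *d == 0 {
                    ready.insert(next);
                }
            }
        }
    }

    // Any node left unscheduled sits on a cycle.
    if order.len() == indegree.len() {
        Some(order)
    } else {
        None
    }
}
```

Because the ready set is a `BTreeSet`, ties between independent nodes are always broken by node id, so the same graph yields the same order regardless of the order in which edges were inserted.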

Installation

Add this to your project's Cargo.toml:

[dependencies]
tensor-forge = "1.0.0"

In Rust code, import the crate as:

use tensor_forge::{Executor, Graph, KernelRegistry, Tensor};

Quick Example

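use tensor_forge::{Executor, Graph, KernelRegistry, Tensor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut g = Graph::new();

    // Declare two 2x2 input nodes; tensors are bound to them at execution time.
    let a = g.input_node(vec![2, 2]);
    let b = g.input_node(vec![2, 2]);

    // Add an addition node and mark it as the graph output.
    let out = g.add(a, b)?;
    g.set_output_node(out)?;

    // Build the runtime input tensors from a shape and a flat data vector.
    let a_tensor = Tensor::from_vec(vec![2, 2], vec![1.0, 2.0, 3.0, 4.0])?;
    let b_tensor = Tensor::from_vec(vec![2, 2], vec![10.0, 20.0, 30.0, 40.0])?;

    // Execute the graph with the default kernels and the input bindings.
    let exec = Executor::new(KernelRegistry::default());
    let outputs = exec.execute(&g, vec![(a, a_tensor), (b, b_tensor)])?;

    // Look up the result for the output node and check its shape and values.
    let output = &outputs[&out];
    assert_eq!(output.shape(), &[2, 2]);
    assert_eq!(output.data(), &[11.0, 22.0, 33.0, 44.0]);

    Ok(())
}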

Runnable Examples

The examples/ directory includes:

  • addition_graph.rs — smallest complete graph execution example
  • branching_graph.rs — branching graph with multiple operations
  • custom_kernel.rs — defining and registering a custom kernel
  • feedforward_neural_net.rs — programmatic construction of a small feedforward network

Run them with:

cargo run --example addition_graph
cargo run --example branching_graph
cargo run --example custom_kernel
cargo run --example feedforward_neural_net

Current capabilities

tensor-forge currently supports:

  • contiguous row-major tensor storage with Vec<f64>
  • graph construction with explicit input and output nodes
  • shape-checked graph operations
  • deterministic topological ordering
  • kernel dispatch through a default or user-provided registry
  • end-to-end graph execution through Executor
  • custom kernel definition and registration
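
The first and third items above can be illustrated with a short std-only sketch of row-major indexing and shape checking. The function names here are hypothetical and are not part of tensor-forge's API. The matrix-product rule is included only as an example of a non-trivial shape check; the text above does not state which operations the crate ships.

```rust
/// Number of elements implied by a shape (1 for an empty shape).
fn num_elements(shape: &[usize]) -> usize {
    shape.iter().product()
}

/// Flat offset of `index` in a contiguous row-major buffer, or `None` if the
/// index has the wrong rank or is out of bounds.
fn row_major_offset(shape: &[usize], index: &[usize]) -> Option<usize> {
    if shape.len() != index.len() {
        return None;
    }
    let mut offset = 0;
    for (&dim, &i) in shape.iter().zip(index) {
        if i >= dim {
            return None;
        }
        // The last axis varies fastest in row-major layout.
        offset = offset * dim + i;
    }
    Some(offset)
}

/// Output shape of an element-wise op: both operand shapes must match exactly.
fn elementwise_shape(a: &[usize], b: &[usize]) -> Result<Vec<usize>, String> {
    if a == b {
        Ok(a.to_vec())
    } else {
        Err(format!("shape mismatch: {:?} vs {:?}", a, b))
    }
}

/// Output shape of a 2-D matrix product: [m, k] x [k, n] -> [m, n].
fn matmul_shape(a: &[usize], b: &[usize]) -> Result<Vec<usize>, String> {
    match (a, b) {
        ([m, k1], [k2, n]) if k1 == k2 => Ok(vec![*m, *n]),
        _ => Err(format!("cannot multiply {:?} by {:?}", a, b)),
    }
}
```

Under this layout, index `[1, 0]` of a `[2, 2]` tensor maps to flat offset 2. Running checks like `elementwise_shape` while the graph is being built is what allows a mismatch to be reported before any kernel executes.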

Scope and tradeoffs

tensor-forge is intentionally narrow in scope. Current tradeoffs include:

  • serial execution only
  • f64 tensors only
  • no GPU backend
  • no graph-level optimization passes yet

These constraints keep the runtime core compact and make the implementation easier to reason about.

Planned improvements

Planned next steps include:

  • parallel execution support
  • additional tensor operations
  • graph-level optimization passes such as dead-node elimination
  • improved evaluation strategies for memory usage and performance
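
As a rough idea of what a dead-node elimination pass involves, the sketch below marks every node that the outputs transitively depend on; anything unmarked could be skipped or removed. This is a generic illustration only, since the pass is not yet implemented in the crate. The `live_nodes` function, the `u32` node ids, and the `inputs_of` map are all hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the ids of all nodes that `outputs` transitively depend on.
/// `inputs_of` maps each node id to the ids of its operand nodes.
fn live_nodes(inputs_of: &BTreeMap<u32, Vec<u32>>, outputs: &[u32]) -> BTreeSet<u32> {
    let mut live = BTreeSet::new();
    let mut stack: Vec<u32> = outputs.to_vec();
    while let Some(n) = stack.pop() {
        // `insert` returns false for nodes that were already visited.
        if live.insert(n) {
            if let Some(operands) = inputs_of.get(&n) {
                stack.extend(operands.iter().copied());
            }
        }
    }
    live
}
```

A pass built on this would keep only the nodes in the returned set. Using a `BTreeSet` keeps the result ordered, which fits the runtime's emphasis on deterministic behavior.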

Development

Common commands:

cargo fmt --all
cargo clippy --all-targets --all-features -- -D warnings
cargo test

Contributing

Contributions, bug reports, and suggestions are welcome. For substantial changes, please open an issue first to discuss the design.

Before submitting a pull request, please make sure formatting, linting, tests, and doctests all pass.

License

This project is licensed under the MIT License.

See LICENSE for details.