tensor-forge
A minimal, deterministic compute graph runtime for tensor operations in Rust.
tensor-forge is a small, focused runtime project for building and executing tensor compute graphs with deterministic scheduling, graph-level validation, and pluggable kernel dispatch. It is designed to be readable, well-tested, and extensible rather than maximally optimized.
It provides:
- `Graph` — for constructing directed acyclic tensor compute graphs
- `Executor` — for deterministic graph execution
- `KernelRegistry` — for pluggable operation dispatch
- `Tensor` — as the runtime value type used for inputs and outputs
Highlights
- Deterministic execution — graph execution order is stable and independent of map iteration order
- Validated graph construction — operations are shape-checked before execution
- Pluggable kernels — operation implementations are dispatched through a registry
- Reusable graphs — graph structure is defined once and executed with runtime input bindings
- Well-tested and documented — includes unit tests, integration tests, doctests, CI, and runnable examples
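Determinism of the execution order comes from breaking ties on stable node IDs instead of relying on hash-map iteration order. As a self-contained sketch of that idea (not tensor-forge's actual implementation; `deterministic_topo_order` and the edge-list representation here are illustrative assumptions), Kahn's algorithm with an ordered ready set always yields the same schedule:

```rust
use std::collections::BTreeMap;

// Kahn's algorithm with a deterministic tie-break: among all nodes whose
// dependencies are satisfied, always pick the smallest ID next. A BTreeMap
// keeps candidates sorted, so the result never depends on hash iteration.
fn deterministic_topo_order(edges: &[(u32, u32)], nodes: &[u32]) -> Vec<u32> {
    let mut indegree: BTreeMap<u32, usize> = nodes.iter().map(|&n| (n, 0)).collect();
    for &(_, to) in edges {
        *indegree.get_mut(&to).expect("edge references unknown node") += 1;
    }
    let mut order = Vec::with_capacity(nodes.len());
    loop {
        // Smallest-ID node with indegree 0 (deterministic tie-break).
        let next = indegree.iter().find(|&(_, &d)| d == 0).map(|(&n, _)| n);
        let Some(n) = next else { break };
        indegree.remove(&n);
        order.push(n);
        for &(from, to) in edges {
            if from == n {
                *indegree.get_mut(&to).unwrap() -= 1;
            }
        }
    }
    order
}

fn main() {
    // Diamond graph: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
    // Nodes are supplied in arbitrary order; the schedule is still stable.
    let edges = [(0, 1), (0, 2), (1, 3), (2, 3)];
    let order = deterministic_topo_order(&edges, &[3, 1, 2, 0]);
    println!("{:?}", order); // prints [0, 1, 2, 3]
}
```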
Installation
Add this to your project's Cargo.toml:
```toml
[dependencies]
tensor-forge = "1.0.0"
```
In Rust code, import the crate as `tensor_forge` (Cargo converts the hyphen in the package name to an underscore):

```rust
use tensor_forge;
```
Quick Example
```rust
use tensor_forge::{Executor, Graph, KernelRegistry, Tensor};
```

See the runnable examples below for complete end-to-end programs.
Runnable Examples
The examples/ directory includes:
- `addition_graph.rs` — smallest complete graph execution example
- `branching_graph.rs` — branching graph with multiple operations
- `custom_kernel.rs` — defining and registering a custom kernel
- `feedforward_neural_net.rs` — programmatic construction of a small feedforward network
Run them with:
```shell
cargo run --example addition_graph
cargo run --example branching_graph
cargo run --example custom_kernel
cargo run --example feedforward_neural_net
```
Current capabilities
tensor-forge currently supports:
- contiguous row-major tensor storage with `Vec<f64>`
- graph construction with explicit input and output nodes
- shape-checked graph operations
- deterministic topological ordering
- kernel dispatch through a default or user-provided registry
- end-to-end graph execution through `Executor`
- custom kernel definition and registration
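To illustrate the registry-based dispatch idea, here is a standalone sketch; the `Registry` struct, `Kernel` type alias, and method names are assumptions for illustration, not tensor-forge's actual API:

```rust
use std::collections::HashMap;

// Illustrative only: a kernel maps input tensors (flat f64 buffers, as in
// the crate's row-major Vec<f64> storage) to a new output buffer.
type Kernel = Box<dyn Fn(&[&[f64]]) -> Vec<f64>>;

struct Registry {
    kernels: HashMap<String, Kernel>,
}

impl Registry {
    // A default registry ships with built-in operations.
    fn new() -> Self {
        let mut kernels: HashMap<String, Kernel> = HashMap::new();
        // Default element-wise addition kernel.
        kernels.insert(
            "add".to_string(),
            Box::new(|inputs| {
                inputs[0].iter().zip(inputs[1]).map(|(a, b)| a + b).collect()
            }),
        );
        Registry { kernels }
    }

    // User code can override defaults or add new operations.
    fn register(&mut self, name: &str, kernel: Kernel) {
        self.kernels.insert(name.to_string(), kernel);
    }

    // The executor looks operations up by name at run time.
    fn dispatch(&self, name: &str, inputs: &[&[f64]]) -> Vec<f64> {
        (self.kernels[name])(inputs)
    }
}

fn main() {
    let mut reg = Registry::new();
    // Register a custom element-wise doubling kernel.
    reg.register(
        "scale2",
        Box::new(|inputs| inputs[0].iter().map(|x| x * 2.0).collect()),
    );
    let sum = reg.dispatch("add", &[&[1.0, 2.0], &[3.0, 4.0]]);
    println!("{:?}", sum); // prints [4.0, 6.0]
}
```

Boxed closures keep the registry open to arbitrary user kernels without the graph core knowing anything about individual operations.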
Scope and tradeoffs
tensor-forge is intentionally narrow in scope. Current tradeoffs include:
- serial execution only
- `f64` tensors only
- no GPU backend
- no graph-level optimization passes yet
These constraints keep the runtime core compact and make the implementation easier to reason about.
Planned improvements
Planned next steps include:
- parallel execution support
- additional tensor operations
- graph-level optimization passes such as dead-node elimination
- improved evaluation strategies for memory usage and performance
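Dead-node elimination, for instance, amounts to a backwards reachability pass from the graph's output nodes. The sketch below is illustrative only (`live_nodes` and the edge-list form are hypothetical, not a planned API):

```rust
use std::collections::HashSet;

// Walk edges backwards from the outputs; any node never reached does not
// contribute to an output and can be pruned before execution.
fn live_nodes(edges: &[(u32, u32)], outputs: &[u32]) -> HashSet<u32> {
    let mut live: HashSet<u32> = outputs.iter().copied().collect();
    let mut stack: Vec<u32> = outputs.to_vec();
    while let Some(n) = stack.pop() {
        for &(from, to) in edges {
            // insert returns true only on first visit, so each node is
            // pushed at most once.
            if to == n && live.insert(from) {
                stack.push(from);
            }
        }
    }
    live
}

fn main() {
    // 0 -> 1 -> 3 (output); node 2 hangs off 0 but feeds no output.
    let edges = [(0, 1), (1, 3), (0, 2)];
    let live = live_nodes(&edges, &[3]);
    assert!(live.contains(&0) && live.contains(&1) && live.contains(&3));
    assert!(!live.contains(&2)); // node 2 is dead and can be pruned
}
```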
Development
Common commands:
- `cargo fmt` — check and apply formatting
- `cargo clippy` — run lints
- `cargo test` — run unit tests, integration tests, and doctests
Contributing
Contributions, bug reports, and suggestions are welcome. For substantial changes, please open an issue first to discuss the design.
Before submitting a pull request, please make sure formatting, linting, tests, and doctests all pass.
License
This project is licensed under the MIT License.
See LICENSE for details.