CausalTensor - A Flexible Tensor for Dynamic Data
The CausalTensor provides a flexible, multi-dimensional array (tensor) backed by a single, contiguous Vec<T>. It is
designed for efficient numerical computations, featuring a stride-based memory layout that supports broadcasting for
element-wise binary operations. It offers a comprehensive API for shape manipulation, element access, and common
reduction operations like sum and mean, making it a versatile tool for causal modeling and other data-intensive
tasks.
📚 Docs
Examples
To run the examples, use cargo run --example <example_name>.
- Applicative Causal Tensor
- Basic Causal Tensor
- Effect System Causal Tensor
- Einstein Summation Causal Tensor
- Functor Causal Tensor
Usage
CausalTensor is straightforward to use. You create it from a flat vector of data and a vector defining its shape.
use CausalTensor;
Einstein Sum (ein_sum)
The ein_sum function provides a powerful and flexible way to perform various tensor operations, including matrix
multiplication, dot products, and more, by constructing an Abstract Syntax Tree (AST) of operations.
use CausalTensor;
use EinSumOp;
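To make the idea of evaluating a tree of tensor operations concrete, here is a minimal, self-contained sketch. The `Expr` enum and the `eval` function are hypothetical names chosen for illustration and are not the crate's `EinSumOp` API. The sketch only shows how a matrix multiplication node (the contraction ij,jk->ik) could be evaluated recursively over flat, row-major data.

```rust
/// Hypothetical, simplified expression tree (NOT the crate's EinSumOp type).
enum Expr {
    /// A matrix stored as flat, row-major data.
    Leaf { data: Vec<f64>, rows: usize, cols: usize },
    /// Matrix multiplication of two sub-expressions (contraction ij,jk->ik).
    MatMul(Box<Expr>, Box<Expr>),
}

/// Recursively evaluates the tree. Returns (data, rows, cols),
/// or None if a leaf is malformed or the inner dimensions do not match.
fn eval(expr: &Expr) -> Option<(Vec<f64>, usize, usize)> {
    match expr {
        Expr::Leaf { data, rows, cols } => {
            if data.len() != rows * cols {
                return None;
            }
            Some((data.clone(), *rows, *cols))
        }
        Expr::MatMul(lhs, rhs) => {
            let (a, m, k) = eval(lhs)?;
            let (b, k2, n) = eval(rhs)?;
            if k != k2 {
                return None;
            }
            let mut out = vec![0.0; m * n];
            for i in 0..m {
                for j in 0..n {
                    // Sum over the shared index p.
                    for p in 0..k {
                        out[i * n + j] += a[i * k + p] * b[p * n + j];
                    }
                }
            }
            Some((out, m, n))
        }
    }
}

fn main() {
    let a = Expr::Leaf { data: vec![1.0, 2.0, 3.0, 4.0], rows: 2, cols: 2 };
    let b = Expr::Leaf { data: vec![5.0, 6.0, 7.0, 8.0], rows: 2, cols: 2 };
    let product = Expr::MatMul(Box::new(a), Box::new(b));
    println!("{:?}", eval(&product));
}
```

Nesting `MatMul` nodes inside each other would express a chain of contractions that is evaluated bottom-up.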
Functional Composition
CausalTensor implements a higher-kinded type (HKT) as a witness type via the deep_causality_haft crate. When imported, the
CausalTensorWitness type allows monadic composition and abstract type programming. For example, one can write generic
functions that uniformly process tensors and other types:
use ;
use ;
Functional composition of HKT tensors works best via an effect system that captures side effects and provides detailed errors and logs for each processing step. In the example below, tensors are composed and the container MyMonadEffect3 captures the final tensor value, optional errors, and detailed logs from each processing step.
// ... Truncated
// 4. Chain Operations using Monad::bind
println!;
let final_effect = bind;
let final_effect = bind;
let final_effect = bind;
println!;
println!;
println!;
println!;
println!;
For complex data processing pipelines, this information is invaluable for debugging and optimization. If more detailed information is required, e.g. the processing time for each step, an Effect Monad of arity 4 or 5 can be used to capture additional fields at each step.
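The pattern of carrying a value, an optional error, and accumulated logs through a chain of steps can be sketched without any dependencies. The `Effect` struct, its `pure` and `bind` methods, and the `scale` and `mean` steps below are hypothetical illustrations and do not reproduce the deep_causality_haft API or the MyMonadEffect3 type.

```rust
/// Hypothetical effect container: a value, an optional error, and accumulated logs.
#[derive(Debug, Clone, PartialEq)]
struct Effect<T> {
    value: Option<T>,
    error: Option<String>,
    logs: Vec<String>,
}

impl<T> Effect<T> {
    /// Wraps a plain value with no error and no logs.
    fn pure(value: T) -> Self {
        Effect { value: Some(value), error: None, logs: Vec::new() }
    }

    /// Applies `f` to the value unless an earlier step failed.
    /// Logs from earlier steps are always carried forward.
    fn bind<U, F>(self, f: F) -> Effect<U>
    where
        F: FnOnce(T) -> Effect<U>,
    {
        match (self.value, self.error) {
            (Some(v), None) => {
                let mut next = f(v);
                let mut logs = self.logs;
                logs.append(&mut next.logs);
                next.logs = logs;
                next
            }
            // A previous step failed: skip `f`, keep the error and the logs.
            (_, error) => Effect { value: None, error, logs: self.logs },
        }
    }
}

/// Example step: scales every element and logs what it did.
fn scale(data: Vec<f64>, factor: f64) -> Effect<Vec<f64>> {
    let out: Vec<f64> = data.iter().map(|x| x * factor).collect();
    Effect { value: Some(out), error: None, logs: vec![format!("scaled by {factor}")] }
}

/// Example step: computes the mean, failing on empty input.
fn mean(data: Vec<f64>) -> Effect<f64> {
    if data.is_empty() {
        return Effect {
            value: None,
            error: Some("mean of empty data".to_string()),
            logs: vec!["mean failed".to_string()],
        };
    }
    let m = data.iter().sum::<f64>() / data.len() as f64;
    Effect { value: Some(m), error: None, logs: vec!["computed mean".to_string()] }
}

fn main() {
    let result = Effect::pure(vec![1.0, 2.0, 3.0])
        .bind(|d| scale(d, 2.0))
        .bind(mean);
    println!("value = {:?}", result.value);
    println!("error = {:?}", result.error);
    println!("logs  = {:?}", result.logs);
}
```

In this sketch, a failing step short-circuits the remaining steps while the logs collected so far stay available for inspection.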
GPU Acceleration (Apple Silicon)
CausalTensor supports optional GPU acceleration via MLX on Apple Silicon (M1/M2/M3). Enable with the mlx feature flag.
Prerequisites
MLX requires Xcode and the Metal Toolchain. Run the following setup steps:
# 1. Run Xcode first-launch setup (installs command-line tools)
# 2. Download the Metal Toolchain for GPU shader compilation
# 3. Build with MLX feature enabled
RUSTFLAGS='-C target-cpu=native'
# 4. Run MLX tests (must use single thread due to Metal command buffer serialization)
Note: MLX tests must run with --test-threads=1 due to Metal command buffer serialization requirements. Parallel test execution causes Metal assertion failures.
Enabling MLX
# Cargo.toml
[]
= { = "0.2", = ["mlx"] }
Precision vs Bulk Compute: f32 vs f64
Apple's Metal GPU does not support f64. All GPU operations run in f32. This creates a natural separation:
| Workload Type | Precision | Use |
|---|---|---|
| Precision workloads | f64 | Accumulation over large N, small differences of large numbers, clock drift (10⁻¹⁵ scale) |
| Bulk compute | f32 | Matrix multiplication, eigendecomposition, neural network inference |
Rule of thumb: If your smallest meaningful quantity ε and largest M satisfy log₁₀(M/ε) > 7, use f64.
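The rule of thumb can be expressed as a small helper. This is a sketch in plain Rust; `needs_f64` is a hypothetical name and not part of the crate. The `main` function also shows why the threshold sits near 7 decimal digits: f32 has a 24-bit significand, so adding 1 to 100,000,000 is lost to rounding.

```rust
/// Rule of thumb from the text: prefer f64 when log10(M / eps) > 7,
/// where M is the largest and eps the smallest meaningful quantity.
fn needs_f64(largest: f64, smallest_meaningful: f64) -> bool {
    (largest / smallest_meaningful).log10() > 7.0
}

fn main() {
    // Clock drift at the 1e-15 scale relative to 1 second: 15 orders of magnitude.
    println!("clock drift needs f64: {}", needs_f64(1.0, 1e-15));
    // Values up to 1000 with a resolution of 0.01: 5 orders of magnitude.
    println!("coarse data needs f64: {}", needs_f64(1000.0, 0.01));

    // In f32 the spacing between neighbouring values near 1e8 is 8,
    // so an increment of 1 disappears.
    let big_f32: f32 = 100_000_000.0;
    assert_eq!(big_f32 + 1.0, big_f32);

    // f64 still resolves the increment.
    let big_f64: f64 = 100_000_000.0;
    assert_ne!(big_f64 + 1.0, big_f64);
}
```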
MlxCausalTensor
For GPU-accelerated operations, use MlxCausalTensor which stores data directly in MLX's unified memory:
use ;
// Scenario 1: Direct GPU construction (no conversion overhead)
let mlx_a = new_f32 ?;
let mlx_b = new_f32 ?;
let result = mlx_a.matmul ?;
let output = result.to_causal_tensor ?; // Back to CausalTensor<f32>
// Scenario 2: Bridge from f64 physics simulation (with downcast)
let physics_data: = /* precision-critical simulation */;
let mlx_tensor = from_causal_tensor_f64 ?;
// GPU-accelerated matmul runs in f32
let accelerated = mlx_tensor.matmul ?;
Native Operations & EinSum
The MLX backend provides fully native GPU configurations for:
- ein_sum: Native GPU execution via recursive AST interpretation. No CPU roundtrips.
- Linear Algebra: matmul, svd, qr, cholesky_decomposition, solve_least_squares_cholsky, inverse.
- Tensor Ops: slice, permute, reshape, broadcast, etc.
Recommended Pattern for Physics
Separate precision-critical storage from bulk compute:
// 1. Store raw data in f64 for precision
let clock_drifts: = load_satellite_data; // femtosecond precision
// 2. Downcast for GPU-accelerated matrix ops
let covariance = from_causal_tensor_f64 ?;
let eigenvalues = covariance.eigendecomposition ?;
// 3. Upcast results if precision needed for next stage
let eigenvalues_f64: = eigenvalues.to_causal_tensor ?
.data.iter.map.collect;
Note: The copy overhead (Rust → MLX → Rust) means MLX is most beneficial for large tensors (N > 10,000) or complex O(N³) operations where compute time dominates data transfer time. The native ein_sum implementation ensures that complex contraction chains remain entirely on the GPU.
Performance
CPU Benchmarks
The following benchmarks were run on a small 100×100 CausalTensor (10,000 f64 elements).
| Operation | Time | Notes |
|---|---|---|
| tensor_get | ~2.31 ns | Accessing a single element. |
| tensor_reshape | ~2.46 µs | Metadata only, but clones data in the test. |
| tensor_scalar_add | ~4.95 µs | Element-wise addition with a scalar. |
| tensor_tensor_add_broadcast | ~46.67 µs | Element-wise addition with broadcasting. |
| tensor_sum_full_reduction | ~10.56 µs | Summing all 10,000 elements of the tensor. |
CPU / GPU (MLX) Benchmarks
| Operation | Size | CPU Time | GPU Time | Speedup |
|---|---|---|---|---|
| MatMul | 128×128 | 1.50 ms | 0.17 ms | 8.8x |
| MatMul | 512×512 | 134.6 ms | 0.22 ms | 612x |
| MatMul | 1024×1024 | 1,087 ms | 0.41 ms | 2,651x |
Hardware:
- Architecture: ARM64 (Apple Silicon, M3 Max)
- OS: macOS 26.2 | Kernel Version 25.2.0
For detailed benchmarks and a comparison to MLX / GPU, see the BENCHMARK file.
Technical Implementation
Strides
The core of CausalTensor is its stride-based memory layout. For a given shape (e.g., [d1, d2, d3]), the strides
represent the number of elements to skip in the flat data vector to move one step along a particular dimension. For a
row-major layout, the strides would be [d2*d3, d3, 1]. This allows the tensor to calculate the flat index for any
multi-dimensional index [i, j, k] with a simple dot product: i*strides[0] + j*strides[1] + k*strides[2].
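The stride calculation described above can be sketched in a few lines of standalone Rust. The function names `row_major_strides` and `flat_index` are hypothetical and are not taken from the crate; the sketch only mirrors the arithmetic in the text.

```rust
/// Computes row-major strides for a shape, e.g. [d1, d2, d3] -> [d2*d3, d3, 1].
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    // Walk from the second-to-last dimension leftwards, accumulating the product.
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Maps a multi-dimensional index to a flat offset via a dot product with the strides.
/// Returns None if the ranks differ or any index is out of bounds.
fn flat_index(index: &[usize], shape: &[usize], strides: &[usize]) -> Option<usize> {
    if index.len() != shape.len() || shape.len() != strides.len() {
        return None;
    }
    let mut offset = 0;
    for ((&i, &dim), &stride) in index.iter().zip(shape).zip(strides) {
        if i >= dim {
            return None;
        }
        offset += i * stride;
    }
    Some(offset)
}

fn main() {
    let shape = [2, 3, 4];
    let strides = row_major_strides(&shape);
    println!("strides = {:?}", strides); // [12, 4, 1]

    // 1*12 + 2*4 + 3*1 = 23
    let offset = flat_index(&[1, 2, 3], &shape, &strides);
    println!("offset = {:?}", offset); // Some(23)
}
```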
Broadcasting
Binary operations support broadcasting, which follows rules similar to those in libraries like NumPy. When operating on
two tensors, CausalTensor compares their shapes dimension by dimension (from right to left). Two dimensions are
compatible if:
- They are equal.
- One of them is 1.
The smaller tensor's data is conceptually "stretched" or repeated along the dimensions where its size is 1 to match the
larger tensor's shape, without actually copying the data. The optimized binary_op implementation achieves this by
manipulating how it calculates the flat index for each tensor inside the computation loop.
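One common way to implement this kind of broadcasting without copying is to give every broadcast dimension a stride of 0, so the same element is read repeatedly. The sketch below uses that technique in standalone Rust. It is an assumption about how such a loop can look, not the crate's binary_op implementation, and all function names are hypothetical. It also assumes each data slice has exactly as many elements as its shape implies.

```rust
/// Computes the broadcast shape, comparing dimensions from right to left.
/// Returns None if the shapes are incompatible.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let rank = a.len().max(b.len());
    let mut out = vec![0; rank];
    for i in 0..rank {
        // Missing leading dimensions are treated as size 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[rank - 1 - i] = if da == db || db == 1 {
            da
        } else if da == 1 {
            db
        } else {
            return None;
        };
    }
    Some(out)
}

/// Row-major strides padded to `out_rank`, with stride 0 wherever the
/// tensor is broadcast (size 1 or missing dimension).
fn broadcast_strides(shape: &[usize], out_rank: usize) -> Vec<usize> {
    let mut strides = vec![0; out_rank];
    let offset = out_rank - shape.len();
    let mut acc = 1;
    for i in (0..shape.len()).rev() {
        strides[offset + i] = if shape[i] == 1 { 0 } else { acc };
        acc *= shape[i];
    }
    strides
}

/// Element-wise addition with broadcasting over flat, row-major data.
fn broadcast_add(
    a: &[f64],
    a_shape: &[usize],
    b: &[f64],
    b_shape: &[usize],
) -> Option<(Vec<f64>, Vec<usize>)> {
    let out_shape = broadcast_shape(a_shape, b_shape)?;
    let rank = out_shape.len();
    let sa = broadcast_strides(a_shape, rank);
    let sb = broadcast_strides(b_shape, rank);
    let total: usize = out_shape.iter().product();
    let mut out = Vec::with_capacity(total);
    let mut idx = vec![0usize; rank];
    for _ in 0..total {
        let ia: usize = idx.iter().zip(&sa).map(|(i, s)| i * s).sum();
        let ib: usize = idx.iter().zip(&sb).map(|(i, s)| i * s).sum();
        out.push(a[ia] + b[ib]);
        // Advance the multi-index like an odometer, last dimension fastest.
        for d in (0..rank).rev() {
            idx[d] += 1;
            if idx[d] < out_shape[d] {
                break;
            }
            idx[d] = 0;
        }
    }
    Some((out, out_shape))
}

fn main() {
    // A [2, 3] matrix plus a [3] row vector: the row is reused for both rows.
    let a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let b = [10.0, 20.0, 30.0];
    println!("{:?}", broadcast_add(&a, &[2, 3], &b, &[3]));
}
```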
API Overview
The CausalTensor API is designed to be comprehensive and intuitive:
- Constructor: CausalTensor::new(data: Vec<T>, shape: Vec<usize>)
- Inspectors: shape(), num_dim(), len(), is_empty(), as_slice()
- Indexing: get(), get_mut()
- Shape Manipulation: reshape(), ravel()
- Reduction Operations: sum_axes(), mean_axes(), arg_sort()
- Arithmetic: Overloaded +, -, *, / operators for both tensor-scalar and tensor-tensor operations.
👨💻👩💻 Contribution
Contributions are welcome, especially those related to documentation, example code, and fixes. If you are unsure where to start, just open an issue and ask.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in deep_causality by you, shall be licensed under the MIT licence, without any additional terms or conditions.
📜 Licence
This project is licensed under the MIT licence.
👮️ Security
For details about security, please read the security policy.
💻 Author
- Marvin Hansen.
- Github GPG key ID: 369D5A0B210D39BC
- GPG Fingerprint: 4B18 F7B2 04B9 7A72 967E 663E 369D 5A0B 210D 39BC