CausalTensor - A Flexible Tensor for Dynamic Data

The CausalTensor provides a flexible, multi-dimensional array (tensor) backed by a single, contiguous Vec<T>. It is designed for efficient numerical computations, featuring a stride-based memory layout that supports broadcasting for element-wise binary operations. It offers a comprehensive API for shape manipulation, element access, and common reduction operations like sum and mean, making it a versatile tool for causal modeling and other data-intensive tasks.

📚 Docs

Examples

To run the examples, use cargo run --example <example_name>.

Applicative Causal Tensor

cargo run --example applicative_causal_tensor

Basic Causal Tensor
```
cargo run --example causal_tensor
```

Effect System Causal Tensor

cargo run --example effect_system_causal_tensor

Einstein Summation Causal Tensor

cargo run --example ein_sum_causal_tensor

Functor Causal Tensor

cargo run --example functor_causal_tensor

Usage

CausalTensor is straightforward to use. You create it from a flat vector of data and a vector defining its shape.

use deep_causality_tensor::CausalTensor;

fn main() {
    // 1. Create a 2x3 tensor.
    let data = vec![1, 2, 3, 4, 5, 6];
    let shape = vec![2, 3];
    let tensor = CausalTensor::new(data, shape).unwrap();
    println!("Original Tensor: {}", tensor);

    // 2. Get an element
    let element = tensor.get(&[1, 2]).unwrap();
    assert_eq!(*element, 6);
    println!("Element at [1, 2]: {}", element);

    // 3. Reshape the tensor
    let reshaped = tensor.reshape(&[3, 2]).unwrap();
    assert_eq!(reshaped.shape(), &[3, 2]);
    println!("Reshaped to 3x2: {}", reshaped);

    // 4. Perform tensor-scalar addition
    let added = &tensor + 10;
    assert_eq!(added.as_slice(), &[11, 12, 13, 14, 15, 16]);
    println!("Tensor + 10: {}", added);

    // 5. Perform tensor-tensor addition with broadcasting
    let t1 = CausalTensor::new(vec![1, 2, 3, 4, 5, 6], vec![2, 3]).unwrap();
    // A [1, 3] tensor...
    let t2 = CausalTensor::new(vec![10, 20, 30], vec![1, 3]).unwrap();
    // ...is broadcasted across the rows of the [2, 3] tensor.
    let result = (&t1 + &t2).unwrap();
    assert_eq!(result.as_slice(), &[11, 22, 33, 14, 25, 36]);
    println!("Tensor-Tensor Add with Broadcast: {}", result);

    // 6. Sum all elements in the tensor (full reduction)
    let sum = tensor.sum_axes(&[]).unwrap();
    assert_eq!(sum.as_slice(), &[21]);
    println!("Sum of all elements: {}", sum);
}

Einstein Sum (ein_sum)

The ein_sum function provides a powerful and flexible way to perform various tensor operations, including matrix multiplication, dot products, and more, by constructing an Abstract Syntax Tree (AST) of operations.

use deep_causality_tensor::CausalTensor;
use deep_causality_tensor::types::causal_tensor::op_tensor_ein_sum::EinSumOp;

fn main() {
    // Example: Matrix Multiplication using ein_sum
    let lhs_data = vec![1.0, 2.0, 3.0, 4.0];
    let lhs_tensor = CausalTensor::new(lhs_data, vec![2, 2]).unwrap();

    let rhs_data = vec![5.0, 6.0, 7.0, 8.0];
    let rhs_tensor = CausalTensor::new(rhs_data, vec![2, 2]).unwrap();

    // Construct the AST for matrix multiplication
    let mat_mul_ast = EinSumOp::mat_mul(lhs_tensor, rhs_tensor);

    // Execute the Einstein summation
    let result = CausalTensor::ein_sum(&mat_mul_ast).unwrap();

    println!("Result of Matrix Multiplication:\n{:?}", result);
    // Expected: CausalTensor { data: [19.0, 22.0, 43.0, 50.0], shape: [2, 2], strides: [2, 1] }

    // Example: Dot Product
    let vec1_data = vec![1.0, 2.0, 3.0];
    let vec1_shape = vec![3];
    let vec1_tensor = CausalTensor::new(vec1_data, vec1_shape).unwrap();

    let vec2_data = vec![4.0, 5.0, 6.0];
    let vec2_shape = vec![3];
    let vec2_tensor = CausalTensor::new(vec2_data, vec2_shape).unwrap();

    // Execute the Einstein summation for dot product 
    let result_dot_prod = CausalTensor::ein_sum(&EinSumOp::dot_prod(vec1_tensor, vec2_tensor)).unwrap();
    println!("Result of Dot Product:\n{:?}", result_dot_prod);
}
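For readers new to Einstein summation, the two contractions above stand for the following arithmetic. This is a plain-Rust sketch over flat row-major buffers, not the crate's `ein_sum` implementation; the helper names `mat_mul` and `dot` are hypothetical and exist only for this illustration.

```rust
/// Matrix multiplication as the contraction C[i,k] = sum over j of A[i,j] * B[j,k].
/// `a` is n x m and `b` is m x p, both stored row-major in flat slices.
fn mat_mul(a: &[f64], b: &[f64], n: usize, m: usize, p: usize) -> Vec<f64> {
    let mut c = vec![0.0; n * p];
    for i in 0..n {
        for j in 0..m {
            for k in 0..p {
                // The shared index j is summed away; i and k survive in the output.
                c[i * p + k] += a[i * m + j] * b[j * p + k];
            }
        }
    }
    c
}

/// Dot product as the contraction sum over i of a[i] * b[i].
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

With the inputs from the example above, `mat_mul(&[1.0, 2.0, 3.0, 4.0], &[5.0, 6.0, 7.0, 8.0], 2, 2, 2)` gives `[19.0, 22.0, 43.0, 50.0]`, and `dot(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0])` gives `32.0`.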

Functional Composition

CausalTensor implements a higher-kinded type (HKT) through a witness type built on the deep_causality_haft crate. When imported, the CausalTensorWitness type enables monadic composition and programming against abstract container types. For example, one can write generic functions that uniformly process tensors and other types:

use deep_causality_haft::{Functor, HKT, OptionWitness, ResultWitness};
use deep_causality_tensor::{CausalTensor, CausalTensorWitness};

fn triple_value<F>(m_a: F::Type<i32>) -> F::Type<i32>
where
    F: Functor<F> + HKT,
{
    F::fmap(m_a, |x| x * 3)
}

fn main() {
    println!("--- Functor Example: Tripling values in different containers ---");

    // Using triple_value with Option
    let opt = Some(5);
    println!("Original Option: {:?}", opt);
    let proc_opt = triple_value::<OptionWitness>(opt);
    println!("Tripled Option: {:?}", proc_opt);
    assert_eq!(proc_opt, Some(15));

    // Using triple_value with Result
    let res = Ok(5);
    println!("Original Result: {:?}", res);
    let proc_res = triple_value::<ResultWitness<i32>>(res);
    println!("Tripled Result: {:?}", proc_res);
    assert_eq!(proc_res, Ok(15));

    // Using triple_value with CausalTensor
    let tensor = CausalTensor::new(vec![1, 2, 3], vec![3]).unwrap();
    println!("Original CausalTensor: {:?}", tensor);
    let proc_tensor = triple_value::<CausalTensorWitness>(tensor);
    println!("Tripled CausalTensor: {:?}", proc_tensor);
    assert_eq!(proc_tensor.data(), &[3, 6, 9]);
}
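The witness pattern itself can be sketched with the standard library alone. The traits `Kind` and `Mappable` and the witnesses `OptionKind` and `VecKind` below are hypothetical stand-ins, not the deep_causality_haft API; they only illustrate how a zero-sized marker type can name a container "shape" so that one generic function works across containers.

```rust
/// A witness names a type constructor: `Of<T>` is the container applied to T.
trait Kind {
    type Of<T>;
}

/// Containers whose contents can be mapped element by element.
trait Mappable: Kind {
    fn fmap<A, B, F: FnMut(A) -> B>(m: Self::Of<A>, f: F) -> Self::Of<B>;
}

/// Witness for Option<_>.
struct OptionKind;
impl Kind for OptionKind {
    type Of<T> = Option<T>;
}
impl Mappable for OptionKind {
    fn fmap<A, B, F: FnMut(A) -> B>(m: Self::Of<A>, f: F) -> Self::Of<B> {
        m.map(f)
    }
}

/// Witness for Vec<_>, standing in for a tensor-like container.
struct VecKind;
impl Kind for VecKind {
    type Of<T> = Vec<T>;
}
impl Mappable for VecKind {
    fn fmap<A, B, F: FnMut(A) -> B>(m: Self::Of<A>, f: F) -> Self::Of<B> {
        m.into_iter().map(f).collect()
    }
}

/// One generic function, written once, usable with every witness.
fn triple<K: Mappable>(m: K::Of<i32>) -> K::Of<i32> {
    K::fmap(m, |x| x * 3)
}
```

Calling `triple::<OptionKind>(Some(5))` returns `Some(15)`, and `triple::<VecKind>(vec![1, 2, 3])` returns `[3, 6, 9]`. The caller picks the container by naming its witness.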

Functional composition of HKT tensors works best through an effect system that captures side effects and provides detailed errors and logs for each processing step. In the example below, tensors are composed, and the container MyMonadEffect3 captures the final tensor value, optional errors, and detailed logs from each processing step.

    // ... Truncated  

// 4. Chain Operations using Monad::bind
println!("Processing steps...");
let final_effect = MyMonadEffect3::bind(initial_effect, step1);
let final_effect = MyMonadEffect3::bind(final_effect, step2);
let final_effect = MyMonadEffect3::bind(final_effect, step3);

println!();
println!("--- Final Result ---");
println!("Final CausalTensor: {:?}", final_effect.value);
println!("Error: {:?}", final_effect.error);
println!("Logs: {:?}", final_effect.logs);

For complex data processing pipelines, this information is invaluable for debugging and optimization. If more detail is required, for example the processing time of each step, an Effect Monad of arity 4 or 5 can capture additional fields at each step.
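As a rough illustration of the idea, the following standard-library sketch models an effect with three fields and a `bind` that short-circuits on error while accumulating logs. The `Effect` struct and the `scale` and `normalize` steps are hypothetical; they are not the `MyMonadEffect3` type from the example, whose definition is truncated above.

```rust
/// Hypothetical three-field effect: a value, an optional error, and accumulated logs.
#[derive(Debug)]
struct Effect<T> {
    value: T,
    error: Option<String>,
    logs: Vec<String>,
}

impl<T> Effect<T> {
    /// Wraps a plain value with no error and no logs.
    fn pure(value: T) -> Self {
        Effect { value, error: None, logs: Vec::new() }
    }

    /// Runs `step` on the value unless an earlier step already failed.
    fn bind<F>(self, step: F) -> Effect<T>
    where
        F: FnOnce(T) -> Effect<T>,
    {
        if self.error.is_some() {
            return self; // Skip remaining steps; keep the error and logs so far.
        }
        let mut next = step(self.value);
        // Prepend the earlier logs so the history stays in execution order.
        let mut logs = self.logs;
        logs.append(&mut next.logs);
        next.logs = logs;
        next
    }
}

/// Example step: multiplies every element by 2.
fn scale(data: Vec<f64>) -> Effect<Vec<f64>> {
    let value = data.iter().map(|x| x * 2.0).collect();
    Effect { value, error: None, logs: vec!["scale: multiplied by 2".to_string()] }
}

/// Example step: divides by the sum, failing when the sum is zero.
fn normalize(data: Vec<f64>) -> Effect<Vec<f64>> {
    let sum: f64 = data.iter().sum();
    if sum == 0.0 {
        return Effect {
            value: data,
            error: Some("normalize: sum is zero".to_string()),
            logs: vec!["normalize: failed".to_string()],
        };
    }
    let value = data.iter().map(|x| x / sum).collect();
    Effect { value, error: None, logs: vec!["normalize: divided by sum".to_string()] }
}
```

Chaining `Effect::pure(vec![1.0, 3.0]).bind(scale).bind(normalize)` ends with the value `[0.25, 0.75]`, no error, and two log entries. If a step fails, later steps are skipped and the logs show how far the pipeline got.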

GPU Acceleration (Apple Silicon)

CausalTensor supports optional GPU acceleration via MLX on Apple Silicon (M1/M2/M3). Enable with the mlx feature flag.

Prerequisites

MLX requires Xcode and the Metal Toolchain. Run the following setup steps:

# 1. Run Xcode first-launch setup (installs command-line tools)
xcodebuild -runFirstLaunch

# 2. Download the Metal Toolchain for GPU shader compilation
xcodebuild -downloadComponent MetalToolchain

# 3. Build with MLX feature enabled
RUSTFLAGS='-C target-cpu=native' cargo build --release -p deep_causality_tensor --features mlx

# 4. Run MLX tests (must use single thread due to Metal command buffer serialization)
cargo test -p deep_causality_tensor --features mlx mlx -- --test-threads=1

Note: MLX tests must run with --test-threads=1 due to Metal command buffer serialization requirements. Parallel test execution causes Metal assertion failures.

Enabling MLX

# Cargo.toml
[dependencies]
deep_causality_tensor = { version = "0.2", features = ["mlx"] }

Precision vs Bulk Compute: f32 vs f64

Apple's Metal GPU does not support f64. All GPU operations run in f32. This creates a natural separation:

Workload Type	Precision	Use
Precision workloads	f64	Accumulation over large N, small differences of large numbers, clock drift (10⁻¹⁵ scale)
Bulk compute	f32	Matrix multiplication, eigendecomposition, neural network inference

Rule of thumb: If your smallest meaningful quantity ε and largest M satisfy log₁₀(M/ε) > 7, use f64.
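The rule of thumb can be written down directly. The sketch below uses only the standard library; the helper names `needs_f64` and `survives_f32` are hypothetical, and both assume positive, non-zero inputs.

```rust
/// Rule of thumb from the text: prefer f64 when log10(M / eps) exceeds 7,
/// roughly the number of decimal digits an f32 can hold.
fn needs_f64(largest: f64, smallest: f64) -> bool {
    (largest / smallest).log10() > 7.0
}

/// Checks whether adding `small` to `large` is still visible after both
/// values are downcast to f32.
fn survives_f32(large: f64, small: f64) -> bool {
    let large32 = large as f32;
    let sum = large32 + small as f32;
    sum != large32
}
```

For example, `needs_f64(1.0, 1e-15)` is `true` for a clock-drift scale quantity, while `needs_f64(1000.0, 1.0)` is `false`. Likewise, `survives_f32(1.0e8, 1.0)` is `false` because neighbouring f32 values near 10⁸ are 8 apart, so the added 1.0 is rounded away.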

MlxCausalTensor

For GPU-accelerated operations, use MlxCausalTensor, which stores data directly in MLX's unified memory:

use deep_causality_tensor::{CausalTensor, MlxCausalTensor};

// Scenario 1: Direct GPU construction (no conversion overhead)
let mlx_a = MlxCausalTensor::new_f32(vec![1.0, 2.0, 3.0, 4.0], vec![2, 2])?;
let mlx_b = MlxCausalTensor::new_f32(vec![5.0, 6.0, 7.0, 8.0], vec![2, 2])?;
let result = mlx_a.matmul(&mlx_b)?;
let output = result.to_causal_tensor()?;  // Back to CausalTensor<f32>

// Scenario 2: Bridge from f64 physics simulation (with downcast)
let physics_data: CausalTensor<f64> = /* precision-critical simulation */;
let mlx_tensor = MlxCausalTensor::from_causal_tensor_f64(&physics_data)?;
// GPU-accelerated matmul runs in f32
let accelerated = mlx_tensor.matmul(&other)?;

Native Operations & EinSum

The MLX backend provides fully native GPU implementations of:

ein_sum: Native GPU execution via recursive AST interpretation. No CPU roundtrips.
Linear Algebra: matmul, svd, qr, cholesky_decomposition, solve_least_squares_cholsky, inverse.
Tensor Ops: slice, permute, reshape, broadcast, etc.

Recommended Pattern for Physics

Separate precision-critical storage from bulk compute:

// 1. Store raw data in f64 for precision
let clock_drifts: CausalTensor<f64> = load_satellite_data();  // femtosecond precision

// 2. Downcast for GPU-accelerated matrix ops
let covariance = MlxCausalTensor::from_causal_tensor_f64(&clock_drifts)?;
let eigenvalues = covariance.eigendecomposition()?;

// 3. Upcast results if precision needed for next stage
let eigenvalues_f64: Vec<f64> = eigenvalues.to_causal_tensor()?
    .data().iter().map(|&x| x as f64).collect();

Note: The copy overhead (Rust → MLX → Rust) means MLX is most beneficial for large tensors (N > 10,000) or complex O(N³) operations where compute time dominates data transfer time. The Native ein_sum implementation ensures that complex contraction chains remain entirely on the GPU.

Performance

CPU Benchmarks

The following benchmarks were run on a small 100x100 CausalTensor (10,000 f64 elements).

Operation	Time	Notes
`tensor_get`	~2.31 ns	Accessing a single element.
`tensor_reshape`	~2.46 µs	Metadata only, but clones data in the test.
`tensor_scalar_add`	~4.95 µs	Element-wise addition with a scalar.
`tensor_tensor_add_broadcast`	~46.67 µs	Element-wise addition with broadcasting.
`tensor_sum_full_reduction`	~10.56 µs	Summing all 10,000 elements of the tensor.

CPU / GPU (MLX) Benchmarks

Operation	Size	CPU Time	GPU Time	Speedup
MatMul	128×128	1.50 ms	0.17 ms	8.8x
MatMul	512×512	134.6 ms	0.22 ms	612x
MatMul	1024×1024	1,087 ms	0.41 ms	2,651x

Hardware:

Architecture: ARM64 (Apple Silicon, M3 Max)
OS: macOS 26.2 | Kernel Version 25.2.0

For detailed benchmarks and a comparison to MLX / GPU, see the BENCHMARK file.

Technical Implementation

Strides

The core of CausalTensor is its stride-based memory layout. For a given shape (e.g., [d1, d2, d3]), the strides represent the number of elements to skip in the flat data vector to move one step along a particular dimension. For a row-major layout, the strides would be [d2*d3, d3, 1]. This allows the tensor to calculate the flat index for any multi-dimensional index [i, j, k] with a simple dot product: i*strides[0] + j*strides[1] + k*strides[2].
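The stride arithmetic described above can be sketched in a few lines of standard-library Rust. The function names `row_major_strides` and `flat_index` are hypothetical and are not part of the CausalTensor API; the sketch assumes a dense row-major layout.

```rust
/// Computes row-major strides for a shape, e.g. [d1, d2, d3] -> [d2*d3, d3, 1].
fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    // Walk from the second-to-last dimension back to the first.
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Maps a multi-dimensional index to a flat offset via a dot product with the strides.
/// Returns None on a rank mismatch or an out-of-bounds index.
fn flat_index(index: &[usize], shape: &[usize], strides: &[usize]) -> Option<usize> {
    if index.len() != shape.len() || shape.len() != strides.len() {
        return None;
    }
    let mut offset = 0;
    for ((&i, &dim), &stride) in index.iter().zip(shape).zip(strides) {
        if i >= dim {
            return None;
        }
        offset += i * stride;
    }
    Some(offset)
}
```

For a shape of `[2, 3, 4]`, `row_major_strides` returns `[12, 4, 1]`, and the index `[1, 2, 3]` maps to flat offset 1*12 + 2*4 + 3*1 = 23. For the 2x3 tensor from the Usage section, index `[1, 2]` with strides `[3, 1]` maps to offset 5, the position of the value 6.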

Broadcasting

Binary operations support broadcasting, which follows rules similar to those in libraries like NumPy. When operating on two tensors, CausalTensor compares their shapes dimension by dimension (from right to left). Two dimensions are compatible if:

They are equal.
One of them is 1.

The smaller tensor's data is conceptually "stretched" or repeated along the dimensions where its size is 1 to match the larger tensor's shape, without actually copying the data. The optimized binary_op implementation achieves this by manipulating how it calculates the flat index for each tensor inside the computation loop.
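One common way to realise this index manipulation is to give every stretched dimension a stride of 0, so the same element is read repeatedly. The sketch below shows that technique with the standard library only; it is an assumption about the general approach, not a copy of the crate's `binary_op`, and the names `broadcast_shape`, `broadcast_strides`, and `broadcast_add` are hypothetical. It assumes each data slice matches its shape and that the first argument of `broadcast_strides` has no more dimensions than the second.

```rust
/// Returns the broadcast shape of two shapes, or None if they are incompatible.
fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let rank = a.len().max(b.len());
    let mut out = vec![0; rank];
    for i in 0..rank {
        // Align from the right; missing leading dimensions count as 1.
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[rank - 1 - i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None,
        };
    }
    Some(out)
}

/// Strides for reading `shape` as if it had the `target` shape.
/// Dimensions of size 1 (or missing ones) get stride 0, so no data is copied.
fn broadcast_strides(shape: &[usize], target: &[usize]) -> Vec<usize> {
    let offset = target.len() - shape.len();
    let mut strides = vec![0; target.len()];
    let mut step = 1;
    for i in (0..shape.len()).rev() {
        if shape[i] != 1 {
            strides[offset + i] = step;
        }
        step *= shape[i];
    }
    strides
}

/// Element-wise addition with broadcasting over flat row-major buffers.
fn broadcast_add(
    a: &[i32],
    a_shape: &[usize],
    b: &[i32],
    b_shape: &[usize],
) -> Option<(Vec<i32>, Vec<usize>)> {
    let shape = broadcast_shape(a_shape, b_shape)?;
    let sa = broadcast_strides(a_shape, &shape);
    let sb = broadcast_strides(b_shape, &shape);
    let total: usize = shape.iter().product();
    let mut out = Vec::with_capacity(total);
    let mut idx = vec![0usize; shape.len()];
    for _ in 0..total {
        let ia: usize = idx.iter().zip(&sa).map(|(i, s)| i * s).sum();
        let ib: usize = idx.iter().zip(&sb).map(|(i, s)| i * s).sum();
        out.push(a[ia] + b[ib]);
        // Advance the multi-dimensional index like an odometer.
        for d in (0..shape.len()).rev() {
            idx[d] += 1;
            if idx[d] < shape[d] {
                break;
            }
            idx[d] = 0;
        }
    }
    Some((out, shape))
}
```

With the tensors from the Usage section, `broadcast_add(&[1, 2, 3, 4, 5, 6], &[2, 3], &[10, 20, 30], &[1, 3])` yields the data `[11, 22, 33, 14, 25, 36]` with shape `[2, 3]`. The `[1, 3]` operand is read with strides `[0, 1]`, which is what "stretching without copying" amounts to.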

API Overview

The CausalTensor API is designed to be comprehensive and intuitive:

Constructor: CausalTensor::new(data: Vec<T>, shape: Vec<usize>)
Inspectors: shape(), num_dim(), len(), is_empty(), as_slice()
Indexing: get(), get_mut()
Shape Manipulation: reshape(), ravel()
Reduction Operations: sum_axes(), mean_axes(), arg_sort()
Arithmetic: Overloaded +, -, *, / operators for both tensor-scalar and tensor-tensor operations.
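To illustrate what an axis reduction computes on a flat row-major buffer, here is a standard-library sketch of summing along a single axis. The function `sum_axis` is hypothetical and is not the crate's `sum_axes`, which accepts a list of axes.

```rust
/// Sums a row-major buffer along one axis and returns (data, shape) of the result.
/// Returns None if the axis is out of range or the data does not match the shape.
fn sum_axis(data: &[i64], shape: &[usize], axis: usize) -> Option<(Vec<i64>, Vec<usize>)> {
    if axis >= shape.len() || data.len() != shape.iter().product::<usize>() {
        return None;
    }
    // Split the index space into: dimensions before the axis, the axis, and those after it.
    let outer: usize = shape[..axis].iter().product();
    let len = shape[axis];
    let inner: usize = shape[axis + 1..].iter().product();
    let mut out = vec![0; outer * inner];
    for o in 0..outer {
        for a in 0..len {
            for i in 0..inner {
                out[o * inner + i] += data[(o * len + a) * inner + i];
            }
        }
    }
    // The reduced axis disappears from the result shape.
    let mut out_shape = shape.to_vec();
    out_shape.remove(axis);
    Some((out, out_shape))
}
```

For the 2x3 tensor `[1, 2, 3, 4, 5, 6]`, summing along axis 0 gives `[5, 7, 9]` with shape `[3]`, and summing along axis 1 gives `[6, 15]` with shape `[2]`. A mean along the same axis would divide each result by the axis length.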

👨‍💻👩‍💻 Contribution

Contributions are welcome, especially those related to documentation, example code, and fixes. If you are unsure where to start, open an issue and ask.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in deep_causality by you, shall be licensed under the MIT licence, without any additional terms or conditions.

📜 Licence

This project is licensed under the MIT license.

👮️ Security

For details about security, please read the security policy.

💻 Author

Marvin Hansen.
GitHub GPG key ID: 369D5A0B210D39BC
GPG Fingerprint: 4B18 F7B2 04B9 7A72 967E 663E 369D 5A0B 210D 39BC

deep_causality_tensor 0.2.2