CausalTensor - A Flexible Tensor for Dynamic Data
The CausalTensor provides a flexible, multi-dimensional array (tensor) backed by a single, contiguous Vec<T>. It is designed for efficient numerical computations, featuring a stride-based memory layout that supports broadcasting for
element-wise binary operations. It offers a comprehensive API for shape manipulation, element access, and common reduction operations like sum and mean, making it a versatile tool for causal modeling and other data-intensive
tasks.
Docs
Usage
CausalTensor is straightforward to use. You create it from a flat vector of data and a vector defining its shape.
use CausalTensor;
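The construction step described above can be sketched with a small stand-in type. This is a minimal illustration, not the crate's implementation: `MiniTensor` and its error type (a plain `String`) are hypothetical, and only the idea of pairing a flat `Vec<T>` with a shape vector comes from this document. The sketch assumes the constructor should reject data whose length does not match the product of the shape.

```rust
/// Hypothetical stand-in used for illustration only; it is NOT the crate's CausalTensor.
#[derive(Debug, Clone, PartialEq)]
pub struct MiniTensor<T> {
    data: Vec<T>,
    shape: Vec<usize>,
}

impl<T> MiniTensor<T> {
    /// Builds a tensor from flat data and a shape.
    /// Assumption: a length mismatch is reported as an error rather than a panic.
    pub fn new(data: Vec<T>, shape: Vec<usize>) -> Result<Self, String> {
        // The number of elements a shape describes is the product of its dimensions.
        let expected: usize = shape.iter().product();
        if data.len() != expected {
            return Err(format!(
                "shape {:?} needs {} elements, got {}",
                shape,
                expected,
                data.len()
            ));
        }
        Ok(Self { data, shape })
    }

    /// Dimensions of the tensor.
    pub fn shape(&self) -> &[usize] {
        &self.shape
    }

    /// Total number of stored elements.
    pub fn len(&self) -> usize {
        self.data.len()
    }

    /// True when the tensor holds no elements.
    pub fn is_empty(&self) -> bool {
        self.data.is_empty()
    }

    /// Read-only view of the flat, contiguous storage.
    pub fn as_slice(&self) -> &[T] {
        &self.data
    }
}
```

With this stand-in, `MiniTensor::new(vec![1, 2, 3, 4, 5, 6], vec![2, 3])` yields a 2x3 tensor, while passing three elements with shape `[2, 2]` returns an error.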
Performance
The following benchmarks were run on a CausalTensor of size 100x100 (10,000 f64 elements).
| Operation | Time | Notes |
|---|---|---|
| tensor_get | ~2.31 ns | Accessing a single element. |
| tensor_reshape | ~2.46 µs | Metadata only, but clones data in the test. |
| tensor_scalar_add | ~4.95 µs | Element-wise addition with a scalar. |
| tensor_tensor_add_broadcast | ~46.67 µs | Element-wise addition with broadcasting. |
| tensor_sum_full_reduction | ~10.56 µs | Summing all 10,000 elements of the tensor. |
Key Observations
- Element Access (get): A single access takes about 2.31 ns, reflecting the low cost of the stride-based index calculation.
- Shape Manipulation (reshape): The operation itself only adjusts metadata (shape and strides); the measured ~2.46 µs also includes a clone of the underlying data vector made in the benchmark.
- Arithmetic Operations: Scalar addition over all 10,000 elements takes about 4.95 µs, and broadcast tensor-tensor addition about 46.67 µs. The optimized binary_op function provides efficient broadcasting for tensor-tensor operations, avoiding allocations in hot loops.
Technical Details
- Sample size: 10 measurements per benchmark
- All benchmarks were run with random access patterns to simulate real-world usage
Hardware & OS
- Architecture: ARM64 (Apple Silicon, M3 Max)
- OS: macOS 15.1
Technical Implementation
Strides
The core of CausalTensor is its stride-based memory layout. For a given shape (e.g., [d1, d2, d3]), the strides represent the number of elements to skip in the flat data vector to move one step along a particular dimension. For a row-major layout, the strides would be [d2*d3, d3, 1]. This allows the tensor to calculate the flat index for any multi-dimensional index [i, j, k] with a simple dot product: i*strides[0] + j*strides[1] + k*strides[2].
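The stride calculation and the dot-product index lookup described above can be sketched as two free functions. This is an illustrative sketch under stated assumptions: the names `row_major_strides` and `flat_index` are hypothetical helpers, not part of the CausalTensor API, and the bounds check returning `None` is a design choice made for this example.

```rust
/// Row-major strides for `shape`: the last dimension is contiguous (stride 1),
/// and each earlier stride is the product of all dimensions to its right.
/// Hypothetical helper, not part of the crate's API.
pub fn row_major_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    // Walk right to left, accumulating the product of the trailing dimensions.
    for i in (0..shape.len().saturating_sub(1)).rev() {
        strides[i] = strides[i + 1] * shape[i + 1];
    }
    strides
}

/// Flat offset of a multi-dimensional index, computed as the dot product of
/// the index with the strides. Returns None on a rank mismatch or when any
/// component is out of bounds.
pub fn flat_index(index: &[usize], shape: &[usize], strides: &[usize]) -> Option<usize> {
    if index.len() != shape.len() || index.len() != strides.len() {
        return None;
    }
    if index.iter().zip(shape).any(|(&i, &d)| i >= d) {
        return None;
    }
    Some(index.iter().zip(strides).map(|(&i, &s)| i * s).sum())
}
```

For a shape of `[2, 3, 4]` the strides are `[12, 4, 1]`, so the index `[1, 2, 3]` maps to `1*12 + 2*4 + 3*1 = 23`.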
Broadcasting
Binary operations support broadcasting, which follows rules similar to those in libraries like NumPy. When operating on two tensors, CausalTensor compares their shapes dimension by dimension (from right to left). Two dimensions are compatible if:
- They are equal.
- One of them is 1.
The smaller tensor's data is conceptually "stretched" or repeated along the dimensions where its size is 1 to match the larger tensor's shape, without actually copying the data. The optimized binary_op implementation achieves this by manipulating how it calculates the flat index for each tensor inside the computation loop.
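One common way to realize the "stretching without copying" described above is to give every broadcast dimension a stride of 0, so the same element is re-read as the index along that dimension advances. The sketch below illustrates that technique on plain `f64` slices. It is an assumption about how such a loop can be written, not a description of the crate's binary_op internals, and the names `broadcast_shape`, `broadcast_strides`, and `add_broadcast` are hypothetical.

```rust
/// Result shape of broadcasting `a` against `b`, or None if incompatible.
/// Shapes are aligned from the right; missing leading dimensions count as 1.
pub fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let rank = a.len().max(b.len());
    let mut out = vec![0; rank];
    for i in 0..rank {
        let da = if i < a.len() { a[a.len() - 1 - i] } else { 1 };
        let db = if i < b.len() { b[b.len() - 1 - i] } else { 1 };
        out[rank - 1 - i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None,
        };
    }
    Some(out)
}

/// Row-major strides of `shape` viewed as `out_shape`.
/// Dimensions of size 1 (and missing leading dimensions) get stride 0.
fn broadcast_strides(shape: &[usize], out_shape: &[usize]) -> Vec<usize> {
    let offset = out_shape.len() - shape.len();
    let mut strides = vec![0; out_shape.len()];
    let mut running = 1;
    for i in (0..shape.len()).rev() {
        if shape[i] != 1 {
            strides[offset + i] = running;
        }
        running *= shape[i];
    }
    strides
}

/// Element-wise addition with broadcasting over flat row-major buffers.
pub fn add_broadcast(
    a: &[f64],
    a_shape: &[usize],
    b: &[f64],
    b_shape: &[usize],
) -> Option<(Vec<f64>, Vec<usize>)> {
    if a.len() != a_shape.iter().product::<usize>() || b.len() != b_shape.iter().product::<usize>() {
        return None;
    }
    let out_shape = broadcast_shape(a_shape, b_shape)?;
    let sa = broadcast_strides(a_shape, &out_shape);
    let sb = broadcast_strides(b_shape, &out_shape);
    let total: usize = out_shape.iter().product();
    let mut out = Vec::with_capacity(total);
    // One multi-index buffer is allocated up front and reused for every element.
    let mut idx = vec![0usize; out_shape.len()];
    for _ in 0..total {
        let ia: usize = idx.iter().zip(&sa).map(|(&i, &s)| i * s).sum();
        let ib: usize = idx.iter().zip(&sb).map(|(&i, &s)| i * s).sum();
        out.push(a[ia] + b[ib]);
        // Advance the multi-index like an odometer, last dimension fastest.
        for d in (0..idx.len()).rev() {
            idx[d] += 1;
            if idx[d] < out_shape[d] {
                break;
            }
            idx[d] = 0;
        }
    }
    Some((out, out_shape))
}
```

Adding a `[2, 3]` tensor holding 1 through 6 to a `[3]` tensor holding 10, 20, 30 reuses the second operand for both rows and gives `[11, 22, 33, 14, 25, 36]`. Shapes such as `[2, 3]` and `[2]` are rejected because the trailing dimensions 3 and 2 are neither equal nor 1.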
API Overview
The CausalTensor API is designed to be comprehensive and intuitive:
- Constructor: CausalTensor::new(data: Vec<T>, shape: Vec<usize>)
- Inspectors: shape(), num_dim(), len(), is_empty(), as_slice()
- Indexing: get(), get_mut()
- Shape Manipulation: reshape(), ravel()
- Reduction Operations: sum_axes(), mean_axes(), arg_sort()
- Arithmetic: Overloaded +, -, *, / operators for both tensor-scalar and tensor-tensor operations.
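To make the reduction entries above more concrete, the following sketch shows how a sum along a single axis can be computed over a flat row-major buffer. The exact signatures of sum_axes() and mean_axes() are not given in this document, so `sum_axis` here is a hypothetical free function that reduces one axis of an `f64` buffer; it illustrates the general idea only.

```rust
/// Sums a row-major buffer along one axis and returns the reduced data
/// together with the reduced shape (the summed axis is removed).
/// Hypothetical helper for illustration; not the crate's sum_axes().
pub fn sum_axis(data: &[f64], shape: &[usize], axis: usize) -> Option<(Vec<f64>, Vec<usize>)> {
    if axis >= shape.len() || data.len() != shape.iter().product::<usize>() {
        return None;
    }
    // Split the shape into [outer..., n, inner...] around the reduced axis.
    let outer: usize = shape[..axis].iter().product();
    let n = shape[axis];
    let inner: usize = shape[axis + 1..].iter().product();

    let mut out = vec![0.0; outer * inner];
    for o in 0..outer {
        for k in 0..n {
            for i in 0..inner {
                // Element (o, k, i) of the input accumulates into (o, i) of the output.
                out[o * inner + i] += data[(o * n + k) * inner + i];
            }
        }
    }

    let mut out_shape = shape.to_vec();
    out_shape.remove(axis);
    Some((out, out_shape))
}
```

For a `[2, 3]` tensor holding 1 through 6, summing along axis 0 gives `[5, 7, 9]` with shape `[3]`, and summing along axis 1 gives `[6, 15]` with shape `[2]`. A mean along the same axis would divide each result by the length of the reduced axis.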
Contribution
Contributions are welcome, especially those related to documentation, example code, and fixes. If you are unsure where to start, open an issue and ask.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in deep_causality by you, shall be licensed under the MIT licence, without any additional terms or conditions.
Licence
This project is licensed under the MIT license.
Security
For details about security, please read the security policy.
Author
- Marvin Hansen.
- Github GPG key ID: 369D5A0B210D39BC
- GPG Fingerprint: 4B18 F7B2 04B9 7A72 967E 663E 369D 5A0B 210D 39BC