tenrso-kernels
Tensor kernel operations: Khatri-Rao, Kronecker, Hadamard, n-mode products, and MTTKRP.
Overview
tenrso-kernels provides high-performance tensor operation kernels that are fundamental building blocks for tensor decompositions and contractions:
- Khatri-Rao product - Column-wise Kronecker product
- Kronecker product - Tensor product of matrices
- Hadamard product - Element-wise (pointwise) multiplication
- N-mode product - Tensor-matrix/tensor-tensor products (TTM/TTT)
- MTTKRP - Matricized Tensor Times Khatri-Rao Product
All kernels are optimized for performance with SIMD acceleration and parallel execution.
Features
- Cache-friendly implementations
- SIMD-accelerated operations (via scirs2_core)
- Parallel execution with Rayon (optional feature)
- Generic over scalar types (f32, f64)
- Minimal allocations
- Correctness property tests
Usage
Add to your Cargo.toml:
[dependencies]
tenrso-kernels = "0.1"
# With parallel execution
tenrso-kernels = { version = "0.1", features = ["parallel"] }
Khatri-Rao Product (TODO: M1)
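use tenrso_kernels::khatri_rao;
use scirs2_core::ndarray_ext::Array2; // import path assumed
// Fill values are illustrative; only the shapes matter here
let a = Array2::from_shape_fn((100, 10), |(i, j)| (i + j) as f64);
let b = Array2::from_shape_fn((50, 10), |(i, j)| (i * j) as f64);
// Column-wise Kronecker product: (100*50) × 10
let kr = khatri_rao(&a.view(), &b.view()); // argument form assumed
assert_eq!(kr.shape(), &[5000, 10]);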
Kronecker Product (TODO: M1)
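use tenrso_kernels::kronecker;
use scirs2_core::ndarray_ext::Array2; // import path assumed
// Fill values are illustrative
let a = Array2::from_elem((2, 3), 2.0);
let b = Array2::from_elem((4, 5), 3.0);
// Tensor product: (2*4) × (3*5)
let kron = kronecker(&a.view(), &b.view()); // argument form assumed
assert_eq!(kron.shape(), &[8, 15]);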
Hadamard Product (TODO: M1)
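use tenrso_kernels::hadamard;
use scirs2_core::ndarray_ext::Array; // import path assumed
// Shapes and fill values are illustrative; the two shapes must match
let a = Array::from_elem((3, 4), 2.0);
let b = Array::from_elem((3, 4), 3.0);
// Element-wise multiplication
let result = hadamard(&a.view(), &b.view())?; // argument form assumed
assert_eq!(result.shape(), &[3, 4]);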
N-Mode Product (TODO: M1)
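use tenrso_kernels::nmode_product;
use scirs2_core::ndarray_ext::{Array, Array2}; // import path and items assumed
// Tensor: 10 × 20 × 30
let tensor = Array::<f64, _>::zeros((10, 20, 30));
// Matrix: 15 × 20 (contracts along mode 1)
let matrix = Array2::<f64>::zeros((15, 20));
// Result: 10 × 15 × 30
let result = nmode_product(&tensor.view(), &matrix.view(), 1)?; // argument form assumed
assert_eq!(result.shape(), &[10, 15, 30]);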
MTTKRP (TODO: M1)
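use tenrso_kernels::mttkrp;
use scirs2_core::ndarray_ext::{Array, Array2}; // import path and items assumed
// Tensor: 100 × 200 × 300
let tensor = Array::<f64, _>::zeros((100, 200, 300));
// Factor matrices for CP decomposition (rank 64, inferred from the result shape)
let u = Array2::<f64>::zeros((200, 64)); // Mode 1 factors
let v = Array2::<f64>::zeros((300, 64)); // Mode 2 factors
// MTTKRP along mode 0: (100, 64)
let result = mttkrp(&tensor.view(), &[u.view(), v.view()], 0)?; // argument form assumed
assert_eq!(result.shape(), &[100, 64]);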
API Reference
Khatri-Rao Product
Computes the column-wise Kronecker product. Both inputs must have the same number of columns. Output shape: (a.nrows() * b.nrows(), a.ncols()).
Complexity: O(I × J × K) where a is I×K and b is J×K.
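The shape and complexity above can be checked against a naive reference. The following is a minimal sketch on row-major `f64` slices, not the crate's implementation; the name `khatri_rao_ref` and its slice-based signature are hypothetical and chosen only for illustration.

```rust
/// Naive Khatri-Rao product on row-major slices (illustrative only).
/// `a` is a_rows × cols, `b` is b_rows × cols; the result is (a_rows * b_rows) × cols.
/// Hypothetical helper, not part of tenrso-kernels.
pub fn khatri_rao_ref(
    a: &[f64],
    a_rows: usize,
    b: &[f64],
    b_rows: usize,
    cols: usize,
) -> Vec<f64> {
    assert_eq!(a.len(), a_rows * cols, "a must be a_rows x cols");
    assert_eq!(b.len(), b_rows * cols, "b must be b_rows x cols");

    let mut out = vec![0.0; a_rows * b_rows * cols];
    for i in 0..a_rows {
        for j in 0..b_rows {
            // Output row (i, j) is the element-wise product of row i of a and row j of b.
            let row = i * b_rows + j;
            for k in 0..cols {
                out[row * cols + k] = a[i * cols + k] * b[j * cols + k];
            }
        }
    }
    out
}
```

The three nested loops visit each of the I × J × K output entries exactly once, which matches the stated complexity.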
Kronecker Product
Computes the tensor product. Output shape: (a.nrows() * b.nrows(), a.ncols() * b.ncols()).
Complexity: O(I × J × K × L) where a is I×K and b is J×L.
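As a reference for the output shape, here is a minimal sketch of the Kronecker product on row-major slices. The function name `kronecker_ref` and the tuple-based shape arguments are assumptions made for this example and do not reflect the crate's API.

```rust
/// Naive Kronecker product on row-major slices (illustrative only).
/// `a` is ar × ac, `b` is br × bc; the result is (ar * br) × (ac * bc).
/// Hypothetical helper, not part of tenrso-kernels.
pub fn kronecker_ref(
    a: &[f64],
    (ar, ac): (usize, usize),
    b: &[f64],
    (br, bc): (usize, usize),
) -> Vec<f64> {
    assert_eq!(a.len(), ar * ac, "a must be ar x ac");
    assert_eq!(b.len(), br * bc, "b must be br x bc");

    let out_cols = ac * bc;
    let mut out = vec![0.0; ar * br * out_cols];
    for i in 0..ar {
        for j in 0..ac {
            let a_ij = a[i * ac + j];
            // Block (i, j) of the output is a_ij * b.
            for p in 0..br {
                for q in 0..bc {
                    out[(i * br + p) * out_cols + (j * bc + q)] = a_ij * b[p * bc + q];
                }
            }
        }
    }
    out
}
```

Every output entry is written once, so the work is proportional to the output size I × J × K × L.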
Hadamard Product
Element-wise multiplication. Shapes must match exactly.
Complexity: O(N) where N is total elements.
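The usage example applies `?` to the result, which suggests a fallible call when shapes differ. A minimal sketch of that behaviour on flat slices follows; the name `hadamard_ref`, the `String` error type, and the use of slice length as a stand-in for shape are all assumptions for illustration.

```rust
/// Element-wise product of two equally sized buffers (illustrative only).
/// Slice length stands in for the array shape; returns an error on mismatch.
/// Hypothetical helper, not part of tenrso-kernels.
pub fn hadamard_ref(a: &[f64], b: &[f64]) -> Result<Vec<f64>, String> {
    if a.len() != b.len() {
        return Err(format!(
            "shape mismatch: {} elements vs {} elements",
            a.len(),
            b.len()
        ));
    }
    // One multiplication per element: O(N).
    Ok(a.iter().zip(b).map(|(x, y)| x * y).collect())
}
```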
N-Mode Product
Contracts a tensor with a matrix along the specified mode.
Complexity: O(I₀ × ... × Iₙ × J) where the tensor is I₀×...×Iₙ and the matrix is J×Iₘ for the contracted mode m.
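To make the contraction concrete, the sketch below computes an n-mode product directly on a row-major buffer without unfolding. It is a naive reference under stated assumptions: the name `nmode_product_ref`, the slice-plus-shape representation, and the `String` error type are hypothetical and not taken from the crate.

```rust
/// Naive n-mode product on a row-major tensor buffer (illustrative only).
/// `matrix` is j_dim × shape[mode]; the result replaces shape[mode] with j_dim.
/// Hypothetical helper, not part of tenrso-kernels.
pub fn nmode_product_ref(
    tensor: &[f64],
    shape: &[usize],
    matrix: &[f64],
    j_dim: usize,
    mode: usize,
) -> Result<(Vec<f64>, Vec<usize>), String> {
    if mode >= shape.len() {
        return Err(format!("mode {} out of range for {} modes", mode, shape.len()));
    }
    let i_mode = shape[mode];
    if tensor.len() != shape.iter().product::<usize>() {
        return Err("tensor length does not match shape".to_string());
    }
    if matrix.len() != j_dim * i_mode {
        return Err("matrix must be j_dim x shape[mode]".to_string());
    }

    // View the tensor as outer × i_mode × inner around the contracted mode.
    let outer: usize = shape[..mode].iter().product();
    let inner: usize = shape[mode + 1..].iter().product();

    let mut out = vec![0.0; outer * j_dim * inner];
    for o in 0..outer {
        for j in 0..j_dim {
            for i in 0..i_mode {
                let m = matrix[j * i_mode + i];
                for n in 0..inner {
                    out[(o * j_dim + j) * inner + n] += m * tensor[(o * i_mode + i) * inner + n];
                }
            }
        }
    }

    let mut out_shape = shape.to_vec();
    out_shape[mode] = j_dim;
    Ok((out, out_shape))
}
```

The loop nest performs outer × J × Iₘ × inner multiply-adds, which equals the product of all tensor dimensions times J.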
MTTKRP
Matricized Tensor Times Khatri-Rao Product. Core operation for CP-ALS.
Complexity: O(I₀ × ... × Iₙ × R) where tensor is I₀×...×Iₙ, rank is R.
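For a 3-way tensor, the mode-0 MTTKRP can be written without forming the Khatri-Rao product explicitly: each output entry is out[i, r] = Σ over j, k of X[i, j, k] · U[j, r] · V[k, r]. The sketch below implements exactly that formula on row-major slices; the name `mttkrp_mode0_ref` and its signature are hypothetical, and the restriction to three modes and mode 0 is a simplification for illustration.

```rust
/// Naive mode-0 MTTKRP for a 3-way row-major tensor (illustrative only).
/// `x` is i_dim × j_dim × k_dim, `u` is j_dim × rank, `v` is k_dim × rank.
/// Returns an i_dim × rank matrix. Hypothetical helper, not part of tenrso-kernels.
pub fn mttkrp_mode0_ref(
    x: &[f64],
    (i_dim, j_dim, k_dim): (usize, usize, usize),
    u: &[f64],
    v: &[f64],
    rank: usize,
) -> Vec<f64> {
    assert_eq!(x.len(), i_dim * j_dim * k_dim, "x must be i_dim x j_dim x k_dim");
    assert_eq!(u.len(), j_dim * rank, "u must be j_dim x rank");
    assert_eq!(v.len(), k_dim * rank, "v must be k_dim x rank");

    let mut out = vec![0.0; i_dim * rank];
    for i in 0..i_dim {
        for j in 0..j_dim {
            for k in 0..k_dim {
                let x_ijk = x[(i * j_dim + j) * k_dim + k];
                // Accumulate the contribution of X[i, j, k] to every rank column.
                for r in 0..rank {
                    out[i * rank + r] += x_ijk * u[j * rank + r] * v[k * rank + r];
                }
            }
        }
    }
    out
}
```

Each tensor entry is touched once per rank column, giving the stated I₀ × ... × Iₙ × R cost.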
Performance Optimization
SIMD Acceleration
Kernels use scirs2_core SIMD operations when available:
- AVX2 for f32/f64 operations
- Automatic vectorization for element-wise ops
- Cache-friendly memory access patterns
Parallel Execution
Enable the parallel feature for multi-threaded execution:
[dependencies]
tenrso-kernels = { version = "0.1", features = ["parallel"] }
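The idea behind parallel execution of an element-wise kernel is to split the output into disjoint chunks and process each on its own thread. The crate uses Rayon for this; the sketch below swaps in `std::thread::scope` from the standard library so that it stays dependency-free. The function name `hadamard_parallel` and the `threads` parameter are hypothetical.

```rust
use std::thread;

/// Element-wise product computed over disjoint chunks on scoped threads.
/// Illustrative stand-in for a Rayon-based kernel; not the crate's implementation.
pub fn hadamard_parallel(a: &[f64], b: &[f64], threads: usize) -> Vec<f64> {
    assert_eq!(a.len(), b.len(), "inputs must have the same length");
    assert!(threads > 0, "need at least one thread");

    let mut out = vec![0.0; a.len()];
    // Ceiling division so that at most `threads` chunks are produced.
    let chunk = ((a.len() + threads - 1) / threads).max(1);

    thread::scope(|s| {
        let chunks = out
            .chunks_mut(chunk)
            .zip(a.chunks(chunk))
            .zip(b.chunks(chunk));
        for ((o, x), y) in chunks {
            // Each thread owns one mutable output chunk, so no locking is needed.
            s.spawn(move || {
                for ((o_i, x_i), y_i) in o.iter_mut().zip(x).zip(y) {
                    *o_i = x_i * y_i;
                }
            });
        }
    });
    out
}
```

Spawning OS threads per call is only a teaching device here; a work-stealing pool such as Rayon's avoids that overhead for small inputs.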
Blocked Operations
Large matrices use blocked algorithms to improve cache locality:
- MTTKRP uses tiled iteration
- Khatri-Rao batches column operations
- N-mode product uses chunked unfolding
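Tiled iteration in general can be illustrated with a plain matrix multiply, where the loops are split into blocks so that each block of the operands is reused while it is still in cache. This is a generic sketch of the technique, not the crate's MTTKRP kernel; the name `matmul_tiled` and the `tile` parameter are hypothetical.

```rust
/// Tiled row-major matrix multiply C = A · B (illustrative only).
/// `a` is m × k, `b` is k × n, and `tile` is the block edge length.
/// Hypothetical helper, not part of tenrso-kernels.
pub fn matmul_tiled(a: &[f64], b: &[f64], m: usize, k: usize, n: usize, tile: usize) -> Vec<f64> {
    assert!(tile > 0, "tile size must be positive");
    assert_eq!(a.len(), m * k, "a must be m x k");
    assert_eq!(b.len(), k * n, "b must be k x n");

    let mut c = vec![0.0; m * n];
    // Outer loops walk over tiles; inner loops stay within one tile.
    for i0 in (0..m).step_by(tile) {
        for p0 in (0..k).step_by(tile) {
            for j0 in (0..n).step_by(tile) {
                let i1 = (i0 + tile).min(m);
                let p1 = (p0 + tile).min(k);
                let j1 = (j0 + tile).min(n);
                for i in i0..i1 {
                    for p in p0..p1 {
                        let a_ip = a[i * k + p];
                        for j in j0..j1 {
                            c[i * n + j] += a_ip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
    c
}
```

The result is independent of `tile` up to floating-point summation order; a good tile size depends on the cache sizes of the target CPU.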
Benchmarks
Run performance benchmarks:
# Compare with baseline
Expected performance (16-core CPU):
- Khatri-Rao: > 50 GFLOP/s (1000×1000, rank 64)
- MTTKRP: > 80% of optimal GEMM rate
- N-mode: > 75% of GEMM baseline
Testing
# Unit tests
# Property tests
# With all features
Examples
See the examples/ directory:
- khatri_rao.rs - Basic usage and benchmarking (TODO)
- mttkrp_cp.rs - MTTKRP in CP decomposition context (TODO)
- nmode_tucker.rs - N-mode product for Tucker (TODO)
Feature Flags
- default = ["parallel"] - Parallel execution enabled
- parallel - Use Rayon for multi-threading
Dependencies
- tenrso-core - Tensor types
- scirs2-core - Array operations, SIMD
- rayon (optional) - Parallel iteration
- num-traits - Generic numeric traits
Contributing
See ../../CONTRIBUTING.md for development guidelines.
License
Apache-2.0