# One More Einsum Contraction Order (OMECO)
Tensor network contraction order optimization in Rust.
Ported from OMEinsumContractionOrders.jl.
## Documentation
📖 Read the full documentation (mdBook)
- Getting Started Guide
- Algorithm Comparison
- GPU Optimization
- PyTorch Integration
- API Reference
- Rust API Docs
## Overview
When contracting multiple tensors together, the order of contractions significantly affects computational cost. Finding the optimal contraction order is NP-complete, but good heuristics can find near-optimal solutions quickly.
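To make the cost gap concrete, here is a small, crate-independent sketch comparing the two possible orders for a three-matrix chain (dimensions chosen purely for illustration):

```rust
// Cost of multiplying an (m x n) matrix by an (n x p) matrix: m*n*p scalar multiplications.
fn matmul_cost(m: u64, n: u64, p: u64) -> u64 {
    m * n * p
}

fn main() {
    // Chain A(10x100) * B(100x5) * C(5x50); both orders produce the same result.
    let cost_ab_c = matmul_cost(10, 100, 5) + matmul_cost(10, 5, 50); // (AB)C
    let cost_a_bc = matmul_cost(100, 5, 50) + matmul_cost(10, 100, 50); // A(BC)
    println!("(AB)C: {cost_ab_c} mults"); // 7500
    println!("A(BC): {cost_a_bc} mults"); // 75000, 10x more work
}
```

Same tensors, same result, a 10x cost difference; for large networks the gap is exponential.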
This library provides:
- GreedyMethod: Fast O(n² log n) greedy algorithm for contraction order
- TreeSA: Simulated annealing for higher quality contraction orders
- TreeSASlicer: Automatic slicing optimization to reduce memory usage
## Installation

### Python

```bash
pip install omeco
```

### Rust

Add to your `Cargo.toml`:

```toml
[dependencies]
omeco = "0.1"
```
## Python Quick Start

```python
import omeco

# Matrix chain: A[0,1] × B[1,2] × C[2,3] → D[0,3]
code = omeco.EinCode([[0, 1], [1, 2], [2, 3]], [0, 3])
sizes = {0: 100, 1: 100, 2: 100, 3: 100}
optimizer = omeco.TreeSA()

# 1) Optimize contraction order
result = omeco.optimize_code(code, sizes, optimizer)
complexity = omeco.contraction_complexity(result, sizes)

# 2) Slice to reduce memory (automatic optimization)
slicer = omeco.TreeSASlicer(sc_target=10.0)
sliced = omeco.slice_code(result, sizes, slicer)
sliced_comp = omeco.sliced_complexity(sliced, sizes)

# Use with PyTorch (see examples/pytorch_tensor_network_example.py)
tree = result.to_dict()  # Convert to dict for traversal
```
## Rust Quick Start

The quick start below covers the two core features: optimizing a contraction order and slicing it for lower peak memory.

```rust
use omeco::{
    contraction_complexity, optimize_code, slice_code, sliced_complexity, EinCode, TreeSA,
    TreeSASlicer,
};
use std::collections::HashMap;

// Matrix chain: A[i,j] * B[j,k] * C[k,l] -> D[i,l]
let code = EinCode::new(
    vec![vec!['i', 'j'], vec!['j', 'k'], vec!['k', 'l']],
    vec!['i', 'l'],
);

// Define tensor dimensions
let mut sizes = HashMap::new();
sizes.insert('i', 100);
sizes.insert('j', 100);
sizes.insert('k', 100);
sizes.insert('l', 100);

// 1) Optimize contraction order
let optimized = optimize_code(&code, &sizes, &TreeSA::default()).unwrap();
let complexity = contraction_complexity(&optimized, &sizes);
println!("time complexity (log2): {:.2}", complexity.tc);
println!("space complexity (log2): {:.2}", complexity.sc);

// 2) Slice to reduce memory (automatic optimization)
let slicer = TreeSASlicer::fast().with_sc_target(10.0);
let sliced = slice_code(&optimized, &sizes, &slicer).unwrap();
let sliced_comp = sliced_complexity(&sliced, &sizes);
println!("sliced space complexity (log2): {:.2}", sliced_comp.sc);
```
## Additional Resources

- Examples: see `examples/` for complete working examples
- Troubleshooting: see the troubleshooting guide
- API Reference: Python & Rust API docs
## Algorithms

### GreedyMethod

Iteratively contracts the tensor pair with the minimum cost:

```rust
use omeco::{optimize_code, uniform_size_dict, EinCode, GreedyMethod};

let code = EinCode::new(vec![vec!['i', 'j'], vec!['j', 'k']], vec!['i', 'k']);
let sizes = uniform_size_dict(&code, 2);

// Default: deterministic greedy
let result = optimize_code(&code, &sizes, &GreedyMethod::default());

// Stochastic greedy with temperature
let stochastic = GreedyMethod::new(0.0, 1.0); // alpha = 0.0, temperature = 1.0
let result = optimize_code(&code, &sizes, &stochastic);
```

Parameters:

- alpha: balances output size vs. input size reduction (0.0 to 1.0)
- temperature: enables Boltzmann sampling (0.0 = deterministic)
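As a rough, crate-independent sketch of what such a greedy score looks like (the crate's exact cost function may differ), a candidate pair is ranked by the size of the intermediate it creates minus `alpha` times the sizes of the inputs it eliminates; the pair with the lowest score is contracted first:

```rust
// Illustrative greedy score: lower is better. `alpha` trades off making a small
// intermediate (first term) against removing large inputs (second term).
fn greedy_score(out_size: f64, in_a: f64, in_b: f64, alpha: f64) -> f64 {
    out_size - alpha * (in_a + in_b)
}

fn main() {
    // Two candidate contractions over the same inputs: one creates a huge
    // intermediate tensor, the other a tiny one.
    let big = greedy_score(1_000_000.0, 1000.0, 1000.0, 0.5);
    let small = greedy_score(100.0, 1000.0, 1000.0, 0.5);
    assert!(small < big); // the deterministic greedy method picks the small one
    println!("scores: big={big}, small={small}");
}
```

With a nonzero temperature, pairs are instead sampled with probability proportional to `exp(-score / temperature)`, which lets repeated runs escape a locally optimal but globally poor choice.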
### TreeSA

Simulated annealing with local tree mutations:

```rust
use omeco::{optimize_code, uniform_size_dict, EinCode, TreeSA};

let code = EinCode::new(vec![vec!['i', 'j'], vec!['j', 'k']], vec!['i', 'k']);
let sizes = uniform_size_dict(&code, 2);

// Fast configuration (fewer iterations)
let result = optimize_code(&code, &sizes, &TreeSA::fast());

// Default configuration (higher quality)
let result = optimize_code(&code, &sizes, &TreeSA::default());

// Custom configuration with space constraint
let custom = TreeSA::default()
    .with_sc_target(20.0) // Target space complexity
    .with_ntrials(8);     // More parallel trials
let result = optimize_code(&code, &sizes, &custom);
```

Parameters:

- betas: inverse temperature schedule
- ntrials: number of parallel trials (uses rayon)
- niters: iterations per temperature level
- score: scoring function with complexity weights
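The betas schedule drives a standard Metropolis-style acceptance rule: a tree mutation that worsens the score by `delta` is still accepted with probability `exp(-beta * delta)`, so early (low-beta) phases explore freely and late (high-beta) phases only refine. An illustrative sketch, not the crate's internal code:

```rust
// Metropolis acceptance rule used by simulated annealing (illustrative).
fn accept_probability(delta: f64, beta: f64) -> f64 {
    if delta <= 0.0 {
        1.0 // improvements are always accepted
    } else {
        (-beta * delta).exp() // worse moves survive with decaying probability
    }
}

fn main() {
    // The same bad mutation (delta = 2.0) at both ends of a 0.01..15.0 schedule:
    let early = accept_probability(2.0, 0.01); // near 1: almost always accepted
    let late = accept_probability(2.0, 15.0);  // vanishingly small
    assert!(late < early);
    println!("accept early={early:.3}, late={late:.3e}");
}
```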
## Complexity Metrics

Three complexity metrics are computed (all in log2 scale):

```rust
use omeco::{contraction_complexity, optimize_code, EinCode, GreedyMethod};
use std::collections::HashMap;

let code = EinCode::new(vec![vec!['i', 'j'], vec!['j', 'k']], vec!['i', 'k']);
let mut sizes = HashMap::new();
sizes.insert('i', 100);
sizes.insert('j', 100);
sizes.insert('k', 100);

let optimized = optimize_code(&code, &sizes, &GreedyMethod::default()).unwrap();
let c = contraction_complexity(&optimized, &sizes);
println!("tc (time, log2 flops): {:.2}", c.tc);
println!("sc (space, log2 largest tensor): {:.2}", c.sc);
println!("rwc (read-write, log2 memory traffic): {:.2}", c.rwc);
```
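For a single pairwise contraction these metrics can be checked by hand (illustrative only; the library aggregates them over the whole contraction tree):

```rust
// By-hand log2 metrics for one contraction A[i,j] * B[j,k] -> C[i,k].
fn matmul_log2_metrics(i: f64, j: f64, k: f64) -> (f64, f64) {
    let tc = (i * j * k).log2(); // log2 of the number of scalar multiply-adds
    let sc = (i * k).log2();     // log2 of the largest tensor (here the output)
    (tc, sc)
}

fn main() {
    // With |i| = |j| = |k| = 100: 10^6 multiply-adds, a 10^4-element output.
    let (tc, sc) = matmul_log2_metrics(100.0, 100.0, 100.0);
    println!("tc = {tc:.2}, sc = {sc:.2}"); // tc ≈ 19.93, sc ≈ 13.29
}
```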
## Custom Scoring

Control the trade-off between time and space complexity:

```rust
use omeco::{ScoreFunction, TreeSA};

// Optimize primarily for time (ignore space)
let time_score = ScoreFunction::time_optimized();

// Optimize for space with a target of 2^15 elements
let space_score = ScoreFunction::space_optimized(15.0);

// Custom weights: time weight, space weight, read-write weight, sc target
let custom_score = ScoreFunction::new(1.0, 1.0, 0.0, 15.0);
let config = TreeSA { score: custom_score, ..TreeSA::default() };
```
## Working with NestedEinsum

The optimized result is a NestedEinsum representing the contraction tree:

```rust
use omeco::{optimize_code, uniform_size_dict, EinCode, GreedyMethod};

let code = EinCode::new(
    vec![vec!['i', 'j'], vec!['j', 'k'], vec!['k', 'l']],
    vec!['i', 'l'],
);
let sizes = uniform_size_dict(&code, 2);
let tree = optimize_code(&code, &sizes, &GreedyMethod::default()).unwrap();

// Inspect the tree
println!("is leaf: {}", tree.is_leaf());
println!("depth: {}", tree.depth());
println!("tree: {:?}", tree);

// Get contraction order (leaf indices)
let order = tree.leaf_indices();
println!("leaf order: {:?}", order);
```
## Sliced Contractions

For memory-constrained scenarios, use SlicedEinsum:

```rust
use omeco::{EinCode, NestedEinsum, SlicedEinsum};

// Assume we have an optimized tree
let leaf0 = NestedEinsum::leaf(0);
let leaf1 = NestedEinsum::leaf(1);
let eins = EinCode::new(vec![vec!['i', 'j'], vec!['j', 'k']], vec!['i', 'k']);
let tree = NestedEinsum::node(eins, vec![leaf0, leaf1]);

// Slice over index 'j' to reduce memory
let sliced = SlicedEinsum::new(vec!['j'], tree);
println!("{:?}", sliced);
```
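Conceptually, slicing turns one big contraction into an outer loop over the sliced index, with partial results summed; each iteration only needs one slice of each input in memory. A minimal, crate-independent sketch:

```rust
// Sliced contraction of A[i,j] * B[j,k] -> C[i,k], stored row-major in flat Vecs.
// Each j iteration is one slice; its partial result is accumulated into `out`.
fn contract_sliced(a: &[f64], b: &[f64], ni: usize, nj: usize, nk: usize) -> Vec<f64> {
    let mut out = vec![0.0; ni * nk];
    for j in 0..nj {
        for i in 0..ni {
            for k in 0..nk {
                out[i * nk + k] += a[i * nj + j] * b[j * nk + k];
            }
        }
    }
    out
}

fn main() {
    let (ni, nj, nk) = (2, 3, 2);
    let a = vec![1.0; ni * nj]; // A: all ones
    let b = vec![1.0; nj * nk]; // B: all ones
    let out = contract_sliced(&a, &b, ni, nj, nk);
    println!("{out:?}"); // each entry sums the 3 j-slices: [3.0, 3.0, 3.0, 3.0]
}
```

The trade-off: peak memory drops by roughly the size of the sliced index, at the cost of redoing any work shared across slices.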
## Integer Labels

For programmatic use, integer labels are often more convenient:

```rust
use omeco::{optimize_code, EinCode, GreedyMethod};
use std::collections::HashMap;

// Using usize labels instead of char
let code: EinCode<usize> = EinCode::new(vec![vec![0, 1], vec![1, 2], vec![2, 3]], vec![0, 3]);
let mut sizes = HashMap::new();
sizes.insert(0, 100);
sizes.insert(1, 100);
sizes.insert(2, 100);
sizes.insert(3, 100);

let result = optimize_code(&code, &sizes, &GreedyMethod::default());
```
## Performance Tips

- Use TreeSA::fast() for quick results: fewer iterations, single trial
- Increase ntrials for large problems: more parallel exploration
- Set sc_target based on available memory: constrains space complexity
- Use GreedyMethod for very large networks: O(n² log n) vs O(n² × iterations)
## Benchmarks
We benchmark TreeSA performance by comparing the Rust implementation against the original Julia implementation (OMEinsumContractionOrders.jl).
Configuration: ntrials=1, niters=50-100, βs=0.01:0.05:15.0
### TreeSA Results
| Problem | Tensors | Indices | Rust tc | Julia tc | Rust (ms) | Julia (ms) |
|---|---|---|---|---|---|---|
| chain_10 | 10 | 11 | 23.10 | 23.10 | 21.0 | 40.6 |
| chain_20 | 20 | 21 | 24.18 | 24.18 | 46.1 | 61.2 |
| grid_4x4 | 16 | 24 | 9.18 | 9.18 | 170.4 | 144.6 |
| grid_5x5 | 25 | 40 | 10.96 | 10.96 | 208.1 | 292.6 |
| grid_6x6 | 36 | 60 | 12.15 | 12.15 | 457.8 | 471.7 |
| petersen | 10 | 15 | 8.49 | 8.49 | 41.1 | 37.0 |
| reg3_50 | 50 | 74 | 12.97 | 12.96 | 529.3 | 608.5 |
| reg3_100 | 100 | 148 | 17.23 | 17.24 | 1214.7 | 1471.9 |
| reg3_250 | 250 | 372 | 46.03 | 48.01 | 3702.2 | 5148.0 |
Key observations:

- Rust TreeSA produces tc values equivalent to Julia's on all tested problems (within stochastic variation)
- Rust is 1.2-1.4x faster than Julia on large graphs (reg3_100, reg3_250) thanks to a bitset optimization
- The reg3_250 benchmark shows TreeSA reduces tc from ~70 (greedy) to ~46, a 34% improvement
To run the benchmarks yourself:

```bash
# Run all benchmarks
cargo bench

# Or individually:
cargo bench --bench <name>
```
## License
MIT