§tenrso-planner
Contraction order planning and optimization for TenRSo.
Version: 0.1.0-alpha.2
Tests: 271 (264 passing + 7 ignored) - 100%
Status: M4 complete with advanced enhancements and production features
This crate provides algorithms and tools for finding efficient execution plans for tensor network contractions, particularly those expressed in Einstein summation notation.
§Features
- 6 Planning Algorithms: Greedy, DP, Beam Search, Simulated Annealing, Genetic Algorithm, Adaptive
- Parallel Planning: Ensemble planner runs multiple algorithms concurrently for best quality
- ML-Based Calibration: Learn from execution history to improve cost predictions over time
- Cost Estimation: FLOPs and memory prediction for dense and sparse tensors
- Plan Caching: LRU/LFU/ARC eviction policies for memoization
- Execution Simulation: Hardware-specific cost modeling (CPU + 6 GPU models with Tensor Cores)
- Quality Tracking: Real-time execution feedback and adaptive tuning
- Representation Selection: Automatic choice between dense, sparse, and low-rank formats
- Tiling and Blocking: Multi-level cache-aware strategies for efficient memory access
- Production Features: Profiling, validation, visualization, serialization
- SciRS2-Core Integration: Professional-grade RNG for reproducible stochastic algorithms
§Quick Start
use tenrso_planner::{greedy_planner, EinsumSpec, PlanHints};
// Parse Einstein summation notation
let spec = EinsumSpec::parse("ij,jk->ik").unwrap();
// Define tensor shapes
let shapes = vec![vec![100, 200], vec![200, 300]];
// Create a plan with default hints
let hints = PlanHints::default();
let plan = greedy_planner(&spec, &shapes, &hints).unwrap();
// Inspect plan details
println!("Estimated FLOPs: {:.2e}", plan.estimated_flops);
println!("Peak memory: {} bytes", plan.estimated_memory);§Planners
TenRSo-Planner provides 6 production-grade planning algorithms:
§1. Adaptive Planner (⭐ Recommended)
The AdaptivePlanner automatically selects the best algorithm based on problem size.
- Small networks (n ≤ 5): Uses DP (optimal)
- Medium networks (5 < n ≤ 20): Uses Beam Search or DP
- Large networks (n > 20): Uses Greedy, SA, or GA based on time budget
Use when: You want optimal results without manual algorithm selection
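For example (a minimal sketch: it assumes AdaptivePlanner implements Default and exposes a plan method mirroring greedy_planner's (&spec, &shapes, &hints) signature; check the order module for the actual API):
use tenrso_planner::{AdaptivePlanner, EinsumSpec, PlanHints};
// Hypothetical usage: the constructor and `plan` method below are
// assumptions, not confirmed API.
let spec = EinsumSpec::parse("ab,bc,cd->ad").unwrap();
let shapes = vec![vec![50, 60], vec![60, 70], vec![70, 80]];
let hints = PlanHints::default();
let planner = AdaptivePlanner::default();
// With 3 tensors (n <= 5), the adaptive planner would delegate to DP.
let plan = planner.plan(&spec, &shapes, &hints).unwrap();
println!("Estimated FLOPs: {:.2e}", plan.estimated_flops);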
§2. Greedy Planner
The greedy_planner repeatedly contracts the pair of tensors with minimum cost
at each step. It is fast (O(n³)) and produces good results for most tensor networks.
Use when: Planning time is critical, many tensors (> 10)
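To make the strategy concrete, here is an illustrative, self-contained sketch of greedy pairwise selection. It is not the crate's implementation; for brevity it merges index maps without dropping summed-away indices:
use std::collections::BTreeMap;

// Each tensor is a map from index label to dimension size; the cost of
// contracting a pair is 2 * (product of all dimensions involved).
fn pair_cost(a: &BTreeMap<char, usize>, b: &BTreeMap<char, usize>) -> f64 {
    let mut dims = a.clone();
    dims.extend(b.iter().map(|(&k, &v)| (k, v)));
    2.0 * dims.values().map(|&d| d as f64).product::<f64>()
}

fn greedy_order(mut tensors: Vec<BTreeMap<char, usize>>) -> Vec<(usize, usize)> {
    let mut order = Vec::new();
    while tensors.len() > 1 {
        // Scan all remaining pairs (O(n²)) and pick the cheapest; the
        // n - 1 such steps give the O(n³) overall behaviour noted above.
        let (mut best, mut best_cost) = ((0, 1), f64::INFINITY);
        for i in 0..tensors.len() {
            for j in (i + 1)..tensors.len() {
                let c = pair_cost(&tensors[i], &tensors[j]);
                if c < best_cost {
                    best = (i, j);
                    best_cost = c;
                }
            }
        }
        // Contract the pair; a real planner would drop summed-away
        // indices from the merged result here.
        let (i, j) = best;
        let b = tensors.remove(j); // j > i, so remove j first
        let mut a = tensors.remove(i);
        a.extend(b);
        tensors.push(a);
        order.push((i, j));
    }
    order
}

fn main() {
    // "ij,jk,kl->il" as index -> dimension maps.
    let t1 = BTreeMap::from([('i', 10), ('j', 1000)]);
    let t2 = BTreeMap::from([('j', 1000), ('k', 20)]);
    let t3 = BTreeMap::from([('k', 20), ('l', 500)]);
    println!("{:?}", greedy_order(vec![t1, t2, t3]));
}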
§3. Beam Search Planner
The BeamSearchPlanner explores multiple contraction paths simultaneously,
yielding better quality than greedy while remaining tractable.
Use when: Medium networks (8-20 tensors), want better than greedy quality
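A hypothetical usage sketch (the constructor taking the beam width k from the Performance section below, and the plan method, are assumptions; verify against the order module):
use tenrso_planner::{BeamSearchPlanner, EinsumSpec, PlanHints};
// Assumed API: `new(beam_width)` and `plan(&spec, &shapes, &hints)`.
let spec = EinsumSpec::parse("ab,bc,cd,de->ae").unwrap();
let shapes = vec![vec![32, 64], vec![64, 128], vec![128, 64], vec![64, 16]];
let hints = PlanHints::default();
let planner = BeamSearchPlanner::new(5); // keep the 5 best partial orders
let plan = planner.plan(&spec, &shapes, &hints).unwrap();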
§4. Dynamic Programming Planner
The dp_planner uses bitmask DP to find globally optimal contraction order.
More expensive (O(3ⁿ) time, O(2ⁿ) space) but guarantees optimality.
Use when: Few tensors (≤ 20), need provably optimal plans
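A hedged comparison sketch, assuming dp_planner shares greedy_planner's (&spec, &shapes, &hints) signature (both are re-exported from the order module):
use tenrso_planner::{dp_planner, greedy_planner, EinsumSpec, PlanHints};
// A skewed chain where contraction order matters a lot.
let spec = EinsumSpec::parse("ij,jk,kl->il").unwrap();
let shapes = vec![vec![10, 1000], vec![1000, 20], vec![20, 500]];
let hints = PlanHints::default();
let greedy = greedy_planner(&spec, &shapes, &hints).unwrap();
let optimal = dp_planner(&spec, &shapes, &hints).unwrap();
// Under the same cost model, DP's cost is a lower bound on greedy's.
assert!(optimal.estimated_flops <= greedy.estimated_flops);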
§5. Simulated Annealing Planner
The SimulatedAnnealingPlanner uses stochastic search to escape local minima.
Use when: Large networks (> 20), quality more important than planning speed
§6. Genetic Algorithm Planner
The GeneticAlgorithmPlanner uses evolutionary search with population-based
exploration, crossover, and mutation to find high-quality contraction orders.
Use when: Very large networks (> 20), need best quality with enough time budget
§Cost Model
The planner estimates costs using:
- FLOPs: Multiply-add operations (each multiply-add counts as 2 FLOPs)
- Memory: Bytes for intermediate results and outputs
- Sparsity: Density-based predictions for sparse operations
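As a worked instance of these rules, take the Quick Start contraction "ij,jk->ik" with shapes [100, 200] and [200, 300]. The 8-byte element size below is an assumption (f64), not something the crate fixes:
let (i, j, k) = (100u64, 200u64, 300u64);
// Each of the i*k output elements needs j multiply-adds, 2 FLOPs apiece.
let flops = 2 * i * j * k; // 12,000,000
// Output buffer for the f64 result.
let output_bytes = i * k * 8; // 240,000
assert_eq!(flops, 12_000_000);
assert_eq!(output_bytes, 240_000);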
§Examples
See the examples/ directory for comprehensive usage demonstrations:
- basic_matmul.rs: Simple matrix multiplication planning
- matrix_chain.rs: Comparing greedy vs DP for chain contractions
- planning_hints.rs: Using hints to control optimization
§Performance
Planning overhead is typically negligible compared to execution:
- Adaptive: Overhead matches whichever algorithm it selects (see above)
- Greedy: ~1ms for 10 tensors, ~10ms for 100 tensors
- Beam Search (k=5): ~5ms for 10 tensors, ~50ms for 100 tensors
- DP: ~1ms for 5 tensors, ~100ms for 10 tensors, ~10s for 15 tensors
- Simulated Annealing: ~10-100ms (configurable iterations)
- Genetic Algorithm: ~50-500ms (depends on population and generations)
§References
- Matrix chain multiplication: Cormen et al., “Introduction to Algorithms”
- Tensor network contraction: Pfeifer et al., “Faster identification of optimal contraction sequences” (2014)
- Einsum optimization: opt_einsum library (Python)
§Re-exports
pub use order::beam_search_planner;
pub use order::dp_planner;
pub use order::genetic_algorithm_planner;
pub use order::greedy_planner;
pub use order::refine_plan;
pub use order::simulated_annealing_planner;
pub use order::AdaptivePlanner;
pub use order::BeamSearchPlanner;
pub use order::DPPlanner;
pub use order::GeneticAlgorithmPlanner;
pub use order::GreedyPlanner;
pub use order::SimulatedAnnealingPlanner;
pub use api::*;
pub use cache::*;
pub use cost::*;
pub use ml_cost::*;
pub use parallel::*;
pub use parser::*;
pub use profiling::*;
pub use quality::*;
pub use repr::*;
pub use simulator::*;
pub use tiling::*;
§Modules
- api: Public planner API
- cache: Plan caching for memoization and performance optimization
- cost: Cost models for tensor contractions
- ml_cost: Machine learning-based cost model refinement
- order: Auto-generated module structure
- parallel: Parallel and distributed planning strategies
- parser: Einsum specification parser
- profiling: Profiling and instrumentation for plan operations
- quality: Plan quality tracking and adaptive tuning
- repr: Representation selection (dense/sparse/low-rank)
- simulator: Plan execution simulation and cost estimation
- tiling: Tiling and blocking strategies