tensorlogic-infer
Engine-agnostic execution traits, optimization utilities, and planning API for TensorLogic.
Overview
tensorlogic-infer provides the abstract execution interface and optimization infrastructure for TensorLogic backends. This crate defines the traits that backends must implement, along with utilities for optimization, scheduling, profiling, and memory management.
Key Components
Core Execution Traits
- TlExecutor: Basic forward execution of compiled graphs
- TlAutodiff: Forward/backward pass for automatic differentiation
- TlEagerAutodiff: Eager mode autodiff with dynamic graph building
- TlBatchExecutor: Efficient batch execution with parallel support
- TlStreamingExecutor: Streaming execution for large datasets
- TlCompilableExecutor: Ahead-of-time graph compilation support
- TlJitExecutor: Just-In-Time compilation with hot path detection
- TlDistributedExecutor: Multi-device distributed execution
- TlRecoverableExecutor: Execution with error recovery and checkpointing
- TlCapabilities: Backend capability queries (devices, dtypes, features)
- TlProfiledExecutor: Execution profiling and performance analysis
Optimization Infrastructure
- GraphOptimizer: Fusion detection, dead node elimination, redundancy analysis
- FusionPlanner: Planning and validation of operation fusion
- Scheduler: Execution scheduling (sequential, parallel, cost-based)
- PlacementOptimizer: Multi-device placement and coordination
- GraphCompiler: AOT graph compilation with multiple optimization levels
- CompilationCache: Caching of compiled graphs to avoid recompilation
- MemoryEstimator: Memory usage estimation and lifetime analysis
- ShapeInferenceContext: Tensor shape inference for optimization
Runtime Utilities
- TensorCache: Result caching with LRU/FIFO/LFU eviction
- MemoryPool: Tensor memory pooling for allocation reuse
- ExecutionStrategy: Complete strategy configuration
- ExecutionContext: State management with lifecycle hooks
- GraphValidator: Graph validation and diagnostics
Testing & Development Tools
- BackendTestAdapter: Comprehensive test templates for backend validation
- GradientChecker: Numerical gradient checking for autodiff verification
- PerfRegression: Performance regression testing with baseline comparison
- Variable & EagerTape: Eager mode execution with gradient tracking
Quick Start
// Sketch; import paths and argument lists are illustrative.
use tensorlogic_infer::TlAutodiff;
use tensorlogic_scirs_backend::Scirs2Exec;
use tensorlogic_compiler::EinsumGraph;
// Create executor
let mut executor = Scirs2Exec::new();
// Forward pass
let outputs = executor.forward(&graph, &inputs)?;
// Backward pass
executor.backward(&graph, &loss_grads)?;
let param_grads = executor.get_gradients()?;
Core Traits
TlExecutor
Basic execution interface for forward passes:
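A minimal sketch of the shape such a trait might take. The names `EinsumGraph`, `Tensor`, and the `execute` signature below are illustrative stand-ins, not the crate's actual definitions:

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the crate's graph and tensor types.
pub struct EinsumGraph;
pub type Tensor = Vec<f64>;

// Illustrative sketch of a forward-execution trait like TlExecutor.
pub trait TlExecutor {
    type Error;
    // Run the compiled graph once over named input tensors,
    // returning named output tensors.
    fn execute(
        &mut self,
        graph: &EinsumGraph,
        inputs: &HashMap<String, Tensor>,
    ) -> Result<HashMap<String, Tensor>, Self::Error>;
}

// Trivial identity implementation, useful for tests.
pub struct Dummy;
impl TlExecutor for Dummy {
    type Error = String;
    fn execute(
        &mut self,
        _graph: &EinsumGraph,
        inputs: &HashMap<String, Tensor>,
    ) -> Result<HashMap<String, Tensor>, String> {
        Ok(inputs.clone())
    }
}
```

A real backend implements `execute` by walking the compiled einsum graph on its own tensor representation.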
TlAutodiff
Automatic differentiation support:
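The method names below (`forward`, `backward`, `get_gradients`) follow this README; the signatures and helper types are assumptions sketched for illustration:

```rust
use std::collections::HashMap;

pub struct EinsumGraph;
pub type Tensor = Vec<f64>;

// Illustrative sketch of an autodiff trait in the spirit of TlAutodiff.
pub trait TlAutodiff {
    type Error;
    fn forward(
        &mut self,
        graph: &EinsumGraph,
        inputs: &HashMap<String, Tensor>,
    ) -> Result<HashMap<String, Tensor>, Self::Error>;
    // Propagate output gradients back through the recorded forward pass.
    fn backward(&mut self, loss_grads: &HashMap<String, Tensor>) -> Result<(), Self::Error>;
    // Gradients accumulated for each parameter during backward().
    fn get_gradients(&self) -> Result<HashMap<String, Tensor>, Self::Error>;
}

// Toy implementation of f(x) = 2x, whose gradient is 2 * upstream.
pub struct DoubleExec {
    pub grads: HashMap<String, Tensor>,
}
impl TlAutodiff for DoubleExec {
    type Error = String;
    fn forward(
        &mut self,
        _graph: &EinsumGraph,
        inputs: &HashMap<String, Tensor>,
    ) -> Result<HashMap<String, Tensor>, String> {
        Ok(inputs
            .iter()
            .map(|(k, v)| (k.clone(), v.iter().map(|x| 2.0 * x).collect()))
            .collect())
    }
    fn backward(&mut self, loss_grads: &HashMap<String, Tensor>) -> Result<(), String> {
        self.grads = loss_grads
            .iter()
            .map(|(k, v)| (k.clone(), v.iter().map(|g| 2.0 * g).collect()))
            .collect();
        Ok(())
    }
    fn get_gradients(&self) -> Result<HashMap<String, Tensor>, String> {
        Ok(self.grads.clone())
    }
}
```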
TlBatchExecutor
Efficient batch execution with parallel support:
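One plausible way such a trait could be structured (sketch only; the crate's real trait may differ) is a per-item method plus a provided batch method that a backend can override with a parallel version:

```rust
use std::collections::HashMap;

pub struct EinsumGraph;
pub type Tensor = Vec<f64>;
pub type NamedTensors = HashMap<String, Tensor>;

// Illustrative sketch: batch execution with a sequential default.
pub trait TlBatchExecutor {
    type Error;
    fn execute_one(
        &mut self,
        graph: &EinsumGraph,
        inputs: &NamedTensors,
    ) -> Result<NamedTensors, Self::Error>;

    // Default: sequential loop; backends may override with a parallel impl.
    fn execute_batch(
        &mut self,
        graph: &EinsumGraph,
        batch: &[NamedTensors],
    ) -> Result<Vec<NamedTensors>, Self::Error> {
        batch.iter().map(|inputs| self.execute_one(graph, inputs)).collect()
    }
}

// Identity backend for illustration.
pub struct Echo;
impl TlBatchExecutor for Echo {
    type Error = String;
    fn execute_one(
        &mut self,
        _graph: &EinsumGraph,
        inputs: &NamedTensors,
    ) -> Result<NamedTensors, String> {
        Ok(inputs.clone())
    }
}
```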
TlStreamingExecutor
Streaming execution for large datasets:
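The core idea, sketched as a free function under assumed types (the crate's trait presumably wraps something similar): process an input stream in fixed-size chunks so the full dataset never has to reside in memory at once.

```rust
// Illustrative sketch of chunked streaming execution.
pub fn execute_stream<I, T, E>(
    chunk_size: usize,
    items: I,
    mut run_chunk: impl FnMut(&[T]) -> Result<Vec<T>, E>,
) -> Result<Vec<T>, E>
where
    I: IntoIterator<Item = T>,
{
    let mut out = Vec::new();
    let mut buf = Vec::with_capacity(chunk_size);
    for item in items {
        buf.push(item);
        if buf.len() == chunk_size {
            // Run the executor on a full chunk, then reuse the buffer.
            out.extend(run_chunk(&buf)?);
            buf.clear();
        }
    }
    if !buf.is_empty() {
        out.extend(run_chunk(&buf)?); // final partial chunk
    }
    Ok(out)
}
```

Prefetching and checkpointing (see the streaming modes below) layer on top of this loop: prefetch fills the next buffer while the current chunk executes, and checkpointing snapshots state between chunks.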
Streaming Modes:
// Sketch; the config type name and arguments are illustrative.
use tensorlogic_infer::StreamConfig;
// Fixed chunk size
let config = StreamConfig::fixed(1024)
    .with_prefetch(2)
    .with_checkpointing(true);
// Dynamic chunk sizing based on available memory
let config = StreamConfig::dynamic(512 * 1024 * 1024);
// Adaptive chunking based on observed performance
let config = StreamConfig::adaptive();
TlCapabilities
Query backend capabilities:
// Example usage (field names illustrative)
let caps = executor.capabilities();
println!("devices: {:?}", caps.devices);
println!("dtypes: {:?}", caps.supported_dtypes);
println!("supports autodiff: {}", caps.supports_autodiff);
TlProfiledExecutor
Execution profiling and performance analysis:
// Example usage
executor.enable_profiling();
executor.execute(&graph, &inputs)?;
let profile = executor.get_profile_data();
for op in &profile.op_profiles {
    println!("{}: {:?}", op.name, op.duration);
}
TlJitExecutor
Just-In-Time compilation with hot path detection and adaptive optimization:
// Example usage (config type and thresholds illustrative)
use tensorlogic_infer::JitConfig;
let config = JitConfig::default()
    .with_hot_path_threshold(100)
    .with_max_cache_size(64);
let outputs = executor.execute_jit(&graph, &inputs, &config)?;
let stats = executor.get_jit_stats();
println!("cache hits: {}", stats.cache_hits);
println!("hot paths detected: {}", stats.hot_paths);
JIT Features:
- Hot Path Detection: Automatically identifies frequently executed code paths
- Adaptive Optimization: Progressively optimizes based on runtime profiling
- Graph Specialization: Specializes graphs for observed tensor shapes
- Intelligent Caching: LRU-based cache for compiled graphs
TlDistributedExecutor
Multi-device distributed execution with data/model/pipeline parallelism:
// Example usage - Data Parallelism (type names and fields illustrative)
use tensorlogic_infer::{DistributedConfig, ParallelismStrategy};
let devices = vec![device0, device1];
let config = DistributedConfig::new(devices)
    .with_strategy(ParallelismStrategy::DataParallel);
let outputs = executor.execute_distributed(&graph, &inputs, &config)?;
let stats = executor.get_distributed_stats();
println!("devices used: {}", stats.device_count);
println!("communication time: {:?}", stats.comm_time);
println!("load balance: {:.2}", stats.load_balance);
Distributed Parallelism Strategies:
- DataParallel: Replicate the model across devices and split the data
- ModelParallel: Split the model across devices
- PipelineParallel: Split the model into pipeline stages
- Hybrid: Combine multiple strategies
TlRecoverableExecutor
Execution with error recovery, checkpointing, and fault tolerance:
// Example usage (config type and arguments illustrative)
use tensorlogic_infer::{RecoveryConfig, RecoveryStrategy};
let config = RecoveryConfig::default()
    .with_strategy(RecoveryStrategy::RetryWithBackoff)
    .with_retry_policy(3)
    .with_checkpointing(true);
let outputs = executor.execute_with_recovery(&graph, &inputs, &config)?;
Recovery Strategies:
- RetryWithBackoff: Exponential backoff retry
- Checkpoint: Periodic checkpointing with restart
- FallbackExecution: Fall back to alternative execution path
- GracefulDegradation: Continue with reduced functionality
Zero-Copy Tensor Operations
Efficient memory-safe tensor views and slicing without data duplication:
// Sketch; constructor arguments are illustrative.
use tensorlogic_infer::{TensorView, ViewBuilder, SliceSpec};
// Create a tensor view
let view = TensorView::new(&tensor, offset, shape, strides)?;
// Check properties
println!("shape: {:?}", view.shape());
println!("contiguous: {}", view.is_contiguous());
// Ergonomic view builder
let view = ViewBuilder::new(&tensor)
    .range_dim(0, 10..20) // Slice dimension 0
    .index_dim(1, 5)      // Index dimension 1
    .with_offset(0)
    .build()?;
// Compose views (create a view of a view)
let composed = view1.compose(&view2)?;
// Slice specifications
let specs = vec![SliceSpec::Range(0..10), SliceSpec::Index(3)];
Async Execution
Non-blocking execution with async/await support (feature-gated):
// Sketch; method and field names are illustrative.
use tensorlogic_infer::{TlAsyncExecutor, AsyncExecutorPool};
// Enable the async feature in Cargo.toml:
// [dependencies]
// tensorlogic-infer = { version = "*", features = ["async"] }
// Async execution
let outputs = executor.execute_async(&graph, &inputs).await?;
// Async batch processing
let batch_outputs = executor.execute_batch_async(&batch_inputs).await?;
// Load-balanced executor pool
let pool = AsyncExecutorPool::new(executors);
// Pool automatically distributes work
let output = pool.execute(&graph, &inputs).await?;
let stats = pool.stats();
println!("requests served: {}", stats.requests);
println!("average latency: {:?}", stats.avg_latency);
Enhanced Diagnostics
Rich error messages with helpful suggestions and context:
// Sketch; message strings and codes are illustrative.
use tensorlogic_infer::{Diagnostic, DiagnosticCollector, SourceLocation};
// Create diagnostic with context
let diag = Diagnostic::error("shape mismatch in einsum")
    .with_code("E0042")
    .with_context("while executing node 'matmul_1'")
    .with_suggestion("check the input tensor shapes")
    .with_location(SourceLocation::new(file!(), line!()));
println!("{}", diag);
// Diagnostic collector
let mut collector = DiagnosticCollector::new();
collector.add(diag1);
collector.add(diag2);
if collector.has_errors() {
    // report and abort
}
Graph Compilation
TlCompilableExecutor
Ahead-of-time graph compilation with multiple optimization levels:
// Example usage (config type name illustrative)
use tensorlogic_infer::{CompilationConfig, OptimizationLevel};
let config = CompilationConfig::default()
    .with_optimization_level(OptimizationLevel::Aggressive)
    .with_fusion_enabled(true)
    .with_constant_folding(true);
// Compile once
let compiled = executor.compile_graph(&graph, &config)?;
// Execute multiple times with different inputs
let outputs1 = executor.execute_compiled(&compiled, &inputs1)?;
let outputs2 = executor.execute_compiled(&compiled, &inputs2)?;
Optimization Levels:
- None: No optimization, fastest compilation
- Basic: Dead code elimination only
- Standard: DCE + common subexpression elimination
- Aggressive: All optimizations + fusion planning
Optimization Utilities
GraphOptimizer
Analyze and optimize computation graphs:
// Sketch; result field names are illustrative.
use tensorlogic_infer::{GraphOptimizer, OptimizationResult};
let optimizer = GraphOptimizer::new();
let result: OptimizationResult = optimizer.analyze(&graph);
println!("fusion opportunities: {}", result.fusion_opportunities.len());
println!("dead nodes: {}", result.dead_nodes.len());
println!("redundant computations: {}", result.redundancies.len());
FusionPlanner
Plan operation fusion:
use tensorlogic_infer::FusionPlanner;
let planner = FusionPlanner::new();
let opportunities = planner.find_fusion_opportunities(&graph);
for opp in &opportunities {
    println!("can fuse nodes: {:?}", opp.nodes); // field name illustrative
}
Scheduler
Execution scheduling with multiple strategies:
// Sketch; schedule field names are illustrative.
use tensorlogic_infer::{Scheduler, SchedulingStrategy};
let scheduler = Scheduler::new(SchedulingStrategy::CostBased);
let schedule = scheduler.schedule(&graph)?;
println!("execution order: {:?}", schedule.order);
println!("parallel stages: {}", schedule.stages.len());
Scheduling Strategies:
- Sequential: Simple topological order
- Parallel: Maximize parallelism across independent nodes
- CostBased: Balance parallelism with execution cost
PlacementOptimizer
Multi-device placement optimization:
// Sketch; device construction is illustrative.
use tensorlogic_infer::PlacementOptimizer;
let devices = vec![device0, device1];
let optimizer = PlacementOptimizer::new(devices);
let plan = optimizer.optimize(&graph)?;
for (node, device) in &plan.node_placements {
    println!("{} -> {:?}", node, device);
}
Memory Management
TensorCache: Cache computation results
use tensorlogic_infer::TensorCache;
let mut cache = TensorCache::new(1000); // 1000 MB limit
cache.insert(key, tensor);
if let Some(tensor) = cache.get(&key) {
    // cache hit
}
MemoryPool: Reuse tensor allocations
use tensorlogic_infer::MemoryPool;
let mut pool = MemoryPool::new();
// Allocate or reuse
let tensor = pool.allocate(&shape)?;
// Return to pool
pool.deallocate(tensor);
let stats = pool.stats();
println!("reuse rate: {:.2}%", stats.reuse_rate * 100.0); // field name illustrative
ExecutionStrategy
Configure complete execution strategy:
// Sketch; the optimizer type name is illustrative.
use tensorlogic_infer::{ExecutionStrategy, StrategyOptimizer};
let strategy = ExecutionStrategy::default();
let optimizer = StrategyOptimizer::new();
let optimized = optimizer.optimize_for_throughput(strategy);
ExecutionContext
Manage execution state with lifecycle hooks:
// Sketch; the event type and hook form are illustrative.
use tensorlogic_infer::{ExecutionContext, ExecutionEvent};
let mut context = ExecutionContext::new();
context.add_hook(Box::new(my_hook));
context.notify(ExecutionEvent::Started);
context.notify(ExecutionEvent::NodeCompleted("matmul_1".into()));
context.notify(ExecutionEvent::Finished);
Validation and Analysis
GraphValidator
Validate computation graphs:
use tensorlogic_infer::GraphValidator;
let validator = GraphValidator::new();
let result = validator.validate(&graph);
if !result.is_valid {
    for issue in &result.issues {
        eprintln!("{}", issue); // field names illustrative
    }
}
MemoryEstimator
Estimate memory usage:
use tensorlogic_infer::MemoryEstimator;
let estimator = MemoryEstimator::new();
let estimate = estimator.estimate(&graph);
println!("peak memory: {} bytes", estimate.peak_bytes);
println!("total allocations: {}", estimate.total_allocations);
ShapeInferenceContext
Infer tensor shapes:
use tensorlogic_infer::ShapeInferenceContext;
let mut ctx = ShapeInferenceContext::new();
ctx.set_input_shape("x", vec![32, 128]);
let inferred = ctx.infer_shapes(&graph)?;
for (tensor, shape) in &inferred {
    println!("{}: {:?}", tensor, shape);
}
Debugging Tools
ExecutionTracer
Record and analyze execution flow:
use tensorlogic_infer::ExecutionTracer;
let mut tracer = ExecutionTracer::new();
tracer.enable();
tracer.start_trace();
// Execute operations...
let handle = tracer.record_operation_start("matmul_1");
// ... operation execution ...
tracer.record_operation_end(handle);
// Get trace
let trace = tracer.get_trace();
let summary = trace.summary();
println!("total operations: {}", summary.op_count);
println!("total time: {:?}", summary.total_time);
// Find slowest operations
let slowest = trace.slowest_operations(5);
for entry in slowest {
    println!("{}: {:?}", entry.name, entry.duration);
}
TensorInspector
Examine intermediate tensor values:
// Sketch; the stats config type is illustrative.
use tensorlogic_infer::{TensorInspector, InspectionConfig};
let mut inspector = TensorInspector::new();
inspector.enable();
inspector.watch("hidden_0"); // Watch specific tensor
// Record statistics
let stats = InspectionConfig::new()
    .with_statistics(true);
inspector.record_stats("hidden_0", &tensor);
// Check for numerical issues (NaN/Inf)
let problematic = inspector.find_problematic_tensors();
for tensor in problematic {
    eprintln!("numerical issue in {}", tensor);
}
BreakpointManager
Pause execution for debugging:
// Sketch; breakpoint arguments are illustrative.
use std::time::Duration;
use tensorlogic_infer::BreakpointManager;
let mut breakpoints = BreakpointManager::new();
breakpoints.enable();
// Add various breakpoint types
breakpoints.add_node_breakpoint("matmul_1");
breakpoints.add_operation_breakpoint("einsum");
breakpoints.add_numerical_issue_breakpoint();
breakpoints.add_time_threshold_breakpoint(Duration::from_millis(5)); // 5ms
// Check during execution
if let Some(bp) = breakpoints.should_break(&node) {
    // pause and inspect
}
ExecutionRecorder
Full execution recording for replay:
use tensorlogic_infer::ExecutionRecorder;
let mut recorder = ExecutionRecorder::new();
recorder.enable();
// All debugging features enabled
recorder.tracer.start_trace();
recorder.inspector.watch("hidden_0");
recorder.breakpoints.add_node_breakpoint("matmul_1");
// Generate comprehensive report
let report = recorder.generate_report();
println!("{}", report);
Advanced Profiling
TimelineProfiler
Create detailed execution timelines:
// Sketch; the hook type name is illustrative.
use tensorlogic_infer::{TimelineProfiler, TimelineHook};
let mut profiler = TimelineProfiler::new();
let hook = TimelineHook::new(&profiler);
// Attach to context
context.add_hook(Box::new(hook));
// Execute
executor.execute(&graph, &inputs)?;
// Analyze timeline
let entries = profiler.entries();
for entry in entries {
    println!("{}: {:?} - {:?}", entry.name, entry.start, entry.end);
}
BottleneckAnalyzer
Identify performance bottlenecks:
use tensorlogic_infer::BottleneckAnalyzer;
let analyzer = BottleneckAnalyzer::new();
let report = analyzer.analyze(&profile);
println!("bottlenecks found: {}", report.bottlenecks.len());
for bottleneck in &report.bottlenecks {
    println!("{}: {:?}", bottleneck.node, bottleneck.cost); // field names illustrative
}
for rec in &report.recommendations {
    println!("suggestion: {}", rec);
}
PerformanceComparison
Compare execution strategies:
use tensorlogic_infer::PerformanceComparison;
let baseline = PerformanceComparison::from_profile(&baseline_profile);
let comparison = PerformanceComparison::new(&baseline, &current_profile);
println!("speedup: {:.2}x", comparison.speedup); // field names illustrative
println!("memory delta: {} bytes", comparison.memory_delta);
Testing Support
DummyExecutor
Minimal executor for testing:
use tensorlogic_infer::DummyExecutor;
let executor = DummyExecutor::new();
let outputs = executor.execute(&graph, &inputs)?;
// Returns empty outputs for testing
Examples
Basic Execution
use tensorlogic_infer::TlExecutor;
use tensorlogic_scirs_backend::Scirs2Exec;
use std::collections::HashMap;
let executor = Scirs2Exec::new();
let mut inputs = HashMap::new();
inputs.insert("x".to_string(), tensor);
let outputs = executor.execute(&graph, &inputs)?;
Batch Processing
use tensorlogic_infer::TlBatchExecutor;
let batch_inputs = vec![inputs1, inputs2, inputs3];
let result = executor.execute_batch_parallel(&graph, &batch_inputs)?;
println!("batches processed: {}", result.outputs.len()); // field names illustrative
println!("total time: {:?}", result.total_time);
Streaming Large Datasets
// Sketch; the config type name is illustrative.
use tensorlogic_infer::{TlStreamingExecutor, StreamConfig};
let config = StreamConfig::fixed(1024).with_prefetch(2);
let results = executor.execute_stream(&graph, input_stream, &config)?;
for result in results {
    // process each chunk's outputs
}
Training with Autodiff
use tensorlogic_infer::TlAutodiff;
// Forward pass
let outputs = executor.forward(&graph, &inputs)?;
// Compute loss gradients
let loss_grads = compute_loss_gradients(&outputs, &targets);
// Backward pass
executor.backward(&graph, &loss_grads)?;
// Get parameter gradients
let grads = executor.get_gradients()?;
// Update parameters
for (name, grad) in grads {
    // apply optimizer step
}
Architecture
tensorlogic-infer
├── Core Traits
│ ├── TlExecutor (basic execution)
│ ├── TlAutodiff (training with gradients)
│ ├── TlEagerAutodiff (eager mode autodiff)
│ ├── TlAsyncExecutor (async/await execution) [feature = "async"]
│ ├── TlAsyncBatchExecutor (async batching) [feature = "async"]
│ ├── TlAsyncStreamExecutor (async streaming) [feature = "async"]
│ ├── TlBatchExecutor (batch processing)
│ ├── TlStreamingExecutor (streaming for large datasets)
│ ├── TlCompilableExecutor (AOT graph compilation)
│ ├── TlJitExecutor (JIT compilation)
│ ├── TlDistributedExecutor (multi-device)
│ ├── TlRecoverableExecutor (error recovery)
│ ├── TlCapabilities (backend queries)
│ └── TlProfiledExecutor (profiling & analysis)
├── Compilation & Optimization
│ ├── GraphCompiler (AOT compilation)
│ ├── CompilationCache (compiled graph caching)
│ ├── JitCompiler (runtime compilation)
│ ├── JitCache (JIT-specific caching)
│ ├── HotPathDetector (hot path identification)
│ ├── AdaptiveOptimizer (adaptive optimization)
│ ├── GraphOptimizer (fusion, DCE, redundancy)
│ ├── FusionPlanner (operation fusion)
│ ├── Scheduler (execution ordering)
│ ├── FusionOptimizer (advanced kernel fusion)
│ ├── RewriteEngine (pattern-based graph transformations)
│ ├── ProfilingOptimizer (profiling-guided optimization)
│ ├── CacheOptimizer (memory hierarchy aware optimization)
│ └── PlacementOptimizer (device placement)
├── Distributed Execution
│ ├── DistributedExecutor (multi-device coordinator)
│ ├── DataParallelCoordinator (data parallelism)
│ ├── ModelParallelCoordinator (model parallelism)
│ ├── PipelineParallelCoordinator (pipeline parallelism)
│ └── CommunicationBackend (device communication)
├── Runtime & Memory
│ ├── TensorCache (result caching)
│ ├── MemoryPool (allocation pooling)
│ ├── TensorView (zero-copy views)
│ ├── ViewBuilder (ergonomic view API)
│ ├── WorkspacePool (memory workspace management)
│ ├── ExecutionStrategy (strategy config)
│ ├── ExecutionContext (state management)
│ ├── AsyncExecutorPool (async load balancing) [feature = "async"]
│ ├── CheckpointManager (checkpointing)
│ └── StreamProcessor (streaming processing)
├── Advanced Execution
│ ├── WorkStealingScheduler (work-stealing parallel scheduler)
│ ├── AutoParallelizer (automatic parallelization)
│ ├── SpeculativeExecutor (speculative execution)
│ ├── DynamicBatcher (adaptive request batching)
│ ├── MultiModelCoordinator (ensemble & multi-model)
│ └── LearnedOptimizer (ML-based optimization)
├── Precision & Sparsity
│ ├── Quantizer (INT8/INT4/FP8/Binary quantization)
│ ├── MixedPrecisionConfig (FP16/BF16/FP8 training)
│ ├── LossScaler (automatic loss scaling)
│ └── SparseTensor (CSR/CSC/COO sparse formats)
├── Analysis & Validation
│ ├── GraphValidator (graph validation)
│ ├── MemoryEstimator (memory estimation)
│ ├── ShapeInferenceContext (shape inference)
│ └── BottleneckAnalyzer (performance analysis)
├── Debugging & Profiling
│ ├── ExecutionTracer (execution recording)
│ ├── TensorInspector (tensor inspection)
│ ├── BreakpointManager (execution breakpoints)
│ ├── ExecutionRecorder (full history recording)
│ ├── TimelineProfiler (timeline visualization)
│ └── Visualization (DOT, JSON, GraphML export)
├── Enhanced Diagnostics
│ ├── Diagnostic (rich error messages)
│ ├── DiagnosticCollector (error aggregation)
│ ├── ShapeMismatchDiagnostic (shape errors)
│ ├── MemoryDiagnostic (memory issues)
│ ├── PerformanceDiagnostic (performance warnings)
│ └── SourceLocation (error tracking)
├── Inference Utilities
│ ├── MemoCache (LRU/LFU/FIFO expression memoization) [v0.1.17]
│ ├── MemoKey (SHA-256 structural hash key for TLExpr)
│ ├── MemoCacheBuilder (builder API with pre-warming)
│ ├── ConstraintNetwork (AC-3 arc-consistency propagation) [v0.1.19]
│ ├── CspSolver (backtracking CSP with MRV/DegreeHeuristic)
│ ├── CausalGraph (d-separation, backdoor/frontdoor criteria) [v0.1.19]
│ ├── AteBackdoor / AteInstrumentalVariable (ATE estimators)
│ └── MetropolisHastings / HamiltonianMonteCarlo (MCMC samplers) [v0.1.21]
└── Testing Support
├── DummyExecutor (test executor)
├── BackendTestAdapter (backend test templates)
├── GradientChecker (numerical gradient checking)
└── PerfRegression (performance regression testing)
Expression Memoization (v0.1.17)
MemoCache provides a generic LRU/LFU/FIFO memoization store keyed by MemoKey (structural SHA-256 hash of a TLExpr):
// Sketch; the eviction-policy enum name is illustrative.
use tensorlogic_infer::{MemoCacheBuilder, MemoKey, EvictionPolicy};
// Build a cache with LRU eviction and capacity 512
let mut cache = MemoCacheBuilder::new()
    .capacity(512)
    .policy(EvictionPolicy::Lru)
    .build();
let key = MemoKey::from_expr(&expr);
match cache.get(&key) {
    Some(graph) => { /* reuse the compiled graph */ }
    None => { /* compile and insert */ }
}
let stats = cache.stats();
println!("hit rate: {:.2}%", stats.hit_rate * 100.0);
ExprMemoCache is a pre-configured MemoCache<TLExpr, EinsumGraph> specialisation for the most common caching pattern.
Constraint Propagation (v0.1.19)
ConstraintNetwork provides AC-3 arc-consistency and CspSolver for full backtracking constraint satisfaction:
// Sketch; variable domains and the constraint form are illustrative.
use tensorlogic_infer::{ConstraintNetwork, CspSolver};
let mut net = ConstraintNetwork::new();
net.add_variable("x", vec![1, 2, 3]);
net.add_variable("y", vec![1, 2, 3]);
net.add_constraint("x", "y", |x, y| x < y);
// AC-3 arc-consistency
let reduced = net.propagate()?;
println!("domains reduced: {}", reduced);
// Full CSP solving with MRV heuristic
let solver = CspSolver::new(net);
let result = solver.solve();
println!("solution: {:?}", result);
Causal Inference (v0.1.19)
CausalGraph supports d-separation, backdoor/frontdoor criteria, do-calculus interventions, and ATE estimation:
// Sketch; data construction is illustrative.
use tensorlogic_infer::CausalGraph;
let mut g = CausalGraph::new();
g.add_edge("Z", "X");
g.add_edge("Z", "Y");
g.add_edge("X", "Y");
// d-separation check
let sep = g.d_separated("X", "Y", &["Z"]);
println!("X _||_ Y | Z: {}", sep);
// Backdoor criterion
let valid = g.satisfies_backdoor_criterion("X", "Y", &["Z"]);
println!("{{Z}} is a valid backdoor set: {}", valid);
// ATE estimation under backdoor adjustment
let data = Dataset::new(records);
let ate = g.ate_backdoor("X", "Y", &["Z"], &data)?;
println!("ATE: {:.4}", ate);
MCMC Sampling (v0.1.21)
MetropolisHastings and HamiltonianMonteCarlo provide MCMC samplers with diagnostic utilities:
// Sketch; constructor arguments are illustrative.
use tensorlogic_infer::{MetropolisHastings, HamiltonianMonteCarlo, effective_sample_size, gelman_rubin};
// Metropolis-Hastings with Gaussian proposal
let mh = MetropolisHastings::new(log_density, proposal_std);
let samples = mh.sample(10_000, initial_state);
// Hamiltonian Monte Carlo (leapfrog + finite-diff gradients)
let hmc = HamiltonianMonteCarlo::new(log_density, step_size, n_leapfrog);
let samples = hmc.sample(10_000, initial_state);
// Diagnostics
let ess = effective_sample_size(&samples);
let r_hat = gelman_rubin(&chains);
println!("ESS: {:.1}, R-hat: {:.3}", ess, r_hat);
Integration with Other Crates
tensorlogic-scirs-backend: Reference implementation using SciRS2
use tensorlogic_scirs_backend::Scirs2Exec;
let executor = Scirs2Exec::new();
tensorlogic-train: Training infrastructure
use tensorlogic_train::Trainer; // type name illustrative
let trainer = Trainer::new(executor);
tensorlogic-compiler: Compile TLExpr to EinsumGraph
use tensorlogic_compiler::compile;
let graph = compile(&expr)?;
let outputs = executor.execute(&graph, &inputs)?;
Performance Considerations
Optimization Checklist
- Enable fusion for consecutive operations
- Use batch execution for multiple inputs
- Enable memory pooling to reduce allocations
- Use streaming for large datasets that don't fit in memory
- Profile execution to identify bottlenecks
- Optimize placement for multi-device execution
- Cache results for repeated computations
Benchmarking
# Run benchmarks
cargo bench
Testing
# Run all tests
cargo test
# Run with output
cargo test -- --nocapture
# Run a specific test
cargo test <test_name>
Test Coverage: 909 tests covering all traits and utilities (100% passing)
Contributing
See CONTRIBUTING.md for guidelines.
License
Apache-2.0
Status: Stable (v0.1.0)
Last Updated: 2026-04-06
Tests: 909 passing (100%)
Code: 74 files, ~26,000 lines
Completeness: 100%
Dependencies: Random number generation via scirs2_core::random (no direct rand dependency)
Part of: TensorLogic Ecosystem