tensorlogic-scirs-backend
Production-Ready SciRS2-Powered Tensor Execution Backend for TensorLogic
Overview
Production-ready execution backend that runs EinsumGraph computations using SciRS2 (Scientific Computing in Rust v2) for high-performance CPU/SIMD tensor operations.
Input: EinsumGraph from tensorlogic-compiler
Output: Computed tensor values with full autodiff support
Quick Start
use Scirs2Exec;
use TlAutodiff;
use compile_to_einsum;
use ;
// Define a rule: knows(x, y)
let rule = pred;
// Compile to execution graph
let graph = compile_to_einsum?;
// Create executor and provide input tensor
let mut executor = new;
let knows_matrix = from_vec?;
executor.add_tensor;
// Execute forward pass
let result = executor.forward?;
// Backward pass for training
let grad_out = ones?;
let mut grads = new;
grads.insert;
let input_grads = executor.backward?;
Key Features
Execution Engine
- Real Execution: Full implementation of forward pass with all operations
- Autodiff: Production-ready backward pass with gradient computation
- Einsum Operations: Matrix multiplication, tensor contractions via scirs2-linalg
- Element-wise Ops: Unary (ReLU, Sigmoid, Tanh, OneMinus, Abs, Neg, Exp, Log, Sqrt, Square, Clip) and Binary (Add, Sub, Mul, Div, Comparisons)
- Reductions: Sum, Max, Min, Mean, Product over specified axes
- Logical Ops: AND, OR (Max/ProbSum), NAND, NOR, XOR, FORALL
Performance
- Graph Optimization: Dead code elimination, CSE, constant folding, operation fusion
- Memory Planning: Liveness analysis, peak memory estimation, reuse detection
- In-Place Operations: 24 operations with zero-allocation execution
- Parallel Execution: Multi-threaded graph execution with Rayon (requires the parallel feature)
- Memory Pooling: Shape-based tensor reuse with statistics tracking
- SIMD Support: Vectorized operations via feature flags
- Batch Execution: Parallel processing for multiple inputs
Reliability
- Error Handling: Comprehensive error types (ShapeMismatch, Numerical, Device, etc.)
- Execution Tracing: Multi-level debugging (Error/Warn/Info/Debug/Trace)
- Numerical Stability: Fallback mechanisms for NaN/Inf handling
- Shape Validation: Runtime shape inference and verification
- Gradient Checking: Numeric verification for autodiff correctness
Advanced Features
- Quantization: INT8/INT4/INT2 quantization (PTQ and QAT modes)
- Fuzzy Logic: Soft/fuzzy logic operations with temperature control (soft_and, soft_or, soft_not, soft_imply, FuzzyLogic struct)
- Loss Functions: 7 differentiable loss functions (MSE, BCE, CrossEntropy, Focal, Huber, KLDiv, CosineEmbedding) with gradient support
- Geometric Deep Learning: GCN layer, graph Laplacian variants, SO(3) rotations, spherical harmonics up to degree 2
- Signal Processing: STFT/iSTFT, DCT/iDCT, DFT/iDFT, six window types, FIR filter, Mel filterbank
- GPU Readiness: CUDA device detection and GPU readiness assessment
- Custom Operations: User-defined operation trait (CustomOp, OpRegistry)
- Graph Optimizer: Constant folding, CSE, DCE, algebraic simplification
- Metrics Collection: Per-operation timing, memory tracking, throughput measurement
- Memory Profiler: Allocation tracking with configurable profiling
- Checkpoint/Resume: Full training state save/load with JSON serialization
Testing
- 669 Tests: All passing with comprehensive coverage
- Optimization Tests: DCE, CSE, and memory planning
- In-Place Tests: Zero-allocation operations
- Checkpoint Tests: Save/load/restore functionality
- Property-Based: proptest tests for mathematical properties
- Gradient Tests: Numeric gradient checking verifies autodiff accuracy
- Integration Tests: End-to-end TLExpr → Graph → Execution
- Parallel Tests: Multi-threaded execution
- Device Tests: CUDA device detection and management
Architecture
EinsumGraph (from compiler)
↓
Scirs2Exec::forward()
↓
For each EinsumNode (topological order):
- Einsum → scirs2_linalg::einsum() [tensor contraction]
- ElemUnary → ReLU/Sigmoid/OneMinus
- ElemBinary → Add/Sub/Mul/Div/Comparisons
- Reduce → Sum/Max/Min/Mean/Product over axes
↓
TensorOutput (scirs2-core ArrayD<f64>)
↓
Scirs2Exec::backward() [optional, for training]
↓
Gradients (for each input tensor)
Supported Operations
1. Einsum (Tensor Contraction)
// Matrix multiplication: C = AB
// Compiled as einsum("ik,kj->ij", A, B)
let a = from_vec?;
let b = from_vec?;
// Result via graph execution: 2x2 matrix
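For intuition, the contraction "ik,kj->ij" sums over the shared index k. The sketch below spells that out with plain loops over row-major slices. It does not call scirs2-linalg, and `matmul_einsum` is a hypothetical helper name used only for illustration.

```rust
/// Computes C[i][j] = sum_k A[i][k] * B[k][j] for row-major inputs.
/// `a` has shape [m, k], `b` has shape [k, n]; the result has shape [m, n].
/// Hypothetical helper, not part of the crate's API.
fn matmul_einsum(a: &[f64], b: &[f64], m: usize, k: usize, n: usize) -> Vec<f64> {
    assert_eq!(a.len(), m * k, "a must hold m*k elements");
    assert_eq!(b.len(), k * n, "b must hold k*n elements");
    let mut c = vec![0.0; m * n];
    for i in 0..m {
        for kk in 0..k {
            // Hoist A[i][kk] so the inner loop walks B and C contiguously.
            let a_ik = a[i * k + kk];
            for j in 0..n {
                c[i * n + j] += a_ik * b[kk * n + j];
            }
        }
    }
    c
}
```

Multiplying two 2x2 matrices this way yields a 2x2 result, matching the shape noted above.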
2. Unary Operations
// ReLU: max(0, x)
// Sigmoid: 1 / (1 + exp(-x))
// OneMinus: 1 - x
// Gradient support:
// - ReLU: grad * (input > 0)
// - Sigmoid: grad * sigmoid(x) * (1 - sigmoid(x))
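The formulas above can be sketched on scalars as follows. This is a minimal illustration of the math only; the function names are hypothetical, and the backend itself operates on whole tensors rather than single values.

```rust
/// ReLU forward: max(0, x).
fn relu(x: f64) -> f64 {
    x.max(0.0)
}

/// ReLU backward: the upstream gradient passes through only where the input was positive.
fn relu_grad(x: f64, grad_out: f64) -> f64 {
    if x > 0.0 { grad_out } else { 0.0 }
}

/// Sigmoid forward: 1 / (1 + exp(-x)).
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// Sigmoid backward: grad * sigmoid(x) * (1 - sigmoid(x)).
fn sigmoid_grad(x: f64, grad_out: f64) -> f64 {
    let s = sigmoid(x);
    grad_out * s * (1.0 - s)
}

/// OneMinus forward: 1 - x.
fn one_minus(x: f64) -> f64 {
    1.0 - x
}

/// OneMinus backward: d/dx (1 - x) = -1, so the gradient is negated.
fn one_minus_grad(grad_out: f64) -> f64 {
    -grad_out
}
```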
3. Binary Operations
// Arithmetic: Add, Subtract, Multiply, Divide
// Comparisons: Eq, Lt, Gt, Lte, Gte (return 0.0 or 1.0)
// Logical: AND (multiply), OR (max or prob_sum), XOR, NAND, NOR
// All with proper gradient computation
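A scalar sketch of the semantics listed above, assuming truth values in [0, 1]. The function names are hypothetical, and XOR, NAND, and NOR are left out because their exact definitions are not spelled out here.

```rust
/// AND as a product of truth values.
fn and_product(a: f64, b: f64) -> f64 {
    a * b
}

/// Gradient of the product AND with respect to (a, b).
fn and_product_grad(a: f64, b: f64, grad_out: f64) -> (f64, f64) {
    (grad_out * b, grad_out * a)
}

/// OR as the maximum of the two truth values.
fn or_max(a: f64, b: f64) -> f64 {
    a.max(b)
}

/// OR as the probabilistic sum: a + b - a*b.
fn or_prob_sum(a: f64, b: f64) -> f64 {
    a + b - a * b
}

/// Comparison that returns 1.0 when a < b and 0.0 otherwise.
fn lt(a: f64, b: f64) -> f64 {
    if a < b { 1.0 } else { 0.0 }
}
```

The two OR variants differ when both inputs are partially true: for a = b = 0.5, the max form gives 0.5 while the probabilistic sum gives 0.75.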
4. Reductions
// Sum, Max, Min, Mean, Product over specified axes
// With gradient broadcasting back to original shape
// Example: Sum over axis 1
// Input: [3, 4] → Output: [3]
// Gradient: [3] → broadcast to [3, 4] (each input element has local derivative 1)
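The [3, 4] → [3] example can be sketched on a row-major slice. Both helpers are hypothetical and exist only to show how the output gradient is copied back across the reduced axis.

```rust
/// Sums a row-major [rows, cols] tensor over axis 1, producing [rows].
fn sum_axis1(input: &[f64], rows: usize, cols: usize) -> Vec<f64> {
    assert!(cols > 0, "cols must be non-zero");
    assert_eq!(input.len(), rows * cols, "input must hold rows*cols elements");
    input.chunks(cols).map(|row| row.iter().sum::<f64>()).collect()
}

/// Backward pass of the sum: every element in a row receives that row's gradient,
/// because d(sum)/d(x_ij) = 1 for each j.
fn sum_axis1_grad(grad_out: &[f64], cols: usize) -> Vec<f64> {
    grad_out
        .iter()
        .flat_map(|&g| std::iter::repeat(g).take(cols))
        .collect()
}
```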
Graph Optimization
The backend includes graph optimization passes that reduce the number of executed operations and lower memory usage before execution.
Optimization Configuration
use ;
// Aggressive optimizations (all enabled)
let config = aggressive;
// Conservative optimizations (only safe passes)
let config = conservative;
// No optimizations
let config = none;
// Custom configuration
let config = OptimizationConfig ;
Compile and Optimize
use CompiledGraph;
// Automatic optimization with defaults
let compiled = compile;
// Custom optimization
let config = aggressive;
let compiled = compile_with_config;
// Access optimization statistics
let stats = compiled.stats;
println!;
println!;
println!;
println!;
println!;
// Execute the optimized graph
let result = executor.forward?;
Optimization Passes
- Dead Code Elimination (DCE)
  - Removes unused tensors and operations
  - Backward liveness analysis from outputs
  - Typical savings: 10-30% of operations
- Common Subexpression Elimination (CSE)
  - Detects and deduplicates identical subgraphs
  - Hash-based node comparison
  - Typical savings: 5-15% of operations
- Constant Folding
  - Evaluates constant expressions at compile time
  - Aggressive propagation through operations
  - Reduces runtime computation
- Operation Fusion
  - Combines element-wise operations
  - Reduces intermediate allocations
  - 2-3x speedup for operation chains
- Layout Optimization
  - Optimizes tensor memory layouts
  - Improves cache locality
  - Better SIMD utilization
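To make the DCE pass concrete, here is a minimal sketch of backward liveness analysis on a toy graph. The `Node` type and `live_nodes` function are hypothetical stand-ins; the crate's real graph types are not shown here.

```rust
use std::collections::HashSet;

/// A toy graph node: `inputs` holds the indices of the nodes it reads.
struct Node {
    inputs: Vec<usize>,
}

/// Walks backward from the output nodes and returns every node that
/// contributes to an output, in ascending index order. Anything not
/// returned is dead code and can be dropped.
fn live_nodes(nodes: &[Node], outputs: &[usize]) -> Vec<usize> {
    let mut live = HashSet::new();
    let mut stack: Vec<usize> = outputs.to_vec();
    while let Some(id) = stack.pop() {
        // `insert` returns false for nodes already visited, so each node is expanded once.
        if live.insert(id) {
            stack.extend(nodes[id].inputs.iter().copied());
        }
    }
    let mut kept: Vec<usize> = live.into_iter().collect();
    kept.sort_unstable();
    kept
}
```

In a five-node graph where node 3 reads node 1 but nothing reads node 3, requesting output 4 (which depends on 2, which depends on 0) keeps nodes 0, 2, and 4 and drops 1 and 3.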
Memory Planning
The compiler performs liveness analysis to plan memory allocation:
if let Some = compiled.memory_plan
Benefits:
- Predicts peak memory usage
- Identifies 30-50% reuse opportunities
- Enables pre-allocation strategies
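As a rough illustration of how liveness information leads to a peak-memory estimate, the sketch below assumes each tensor is live from the step that produces it to the step that last reads it. `Lifetime` and `peak_memory` are hypothetical names, not the crate's memory-plan API.

```rust
/// A tensor's lifetime in execution order: produced at step `first`,
/// last read at step `last` (inclusive), occupying `bytes` while live.
struct Lifetime {
    first: usize,
    last: usize,
    bytes: usize,
}

/// Returns the largest total size of simultaneously live tensors across all steps.
fn peak_memory(lifetimes: &[Lifetime], num_steps: usize) -> usize {
    (0..num_steps)
        .map(|step| {
            lifetimes
                .iter()
                .filter(|t| t.first <= step && step <= t.last)
                .map(|t| t.bytes)
                .sum::<usize>()
        })
        .max()
        .unwrap_or(0)
}
```

When the peak is lower than the sum of all tensor sizes, the difference is memory that buffer reuse could reclaim.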
In-Place Operations
Execute operations in-place to eliminate memory allocations and improve performance.
Basic Usage
use ;
let mut executor = new;
let mut tensor = /* ... */;
// Check if operation supports in-place execution
if can_execute_inplace
// Binary operations (modifies lhs in-place)
let mut lhs = /* ... */;
let rhs = /* ... */;
executor.execute_inplace_binary?;
// Scalar operations
executor.execute_inplace_scalar?;
Supported Operations
Unary Operations (11):
- Activation: relu, sigmoid, tanh
- Arithmetic: abs, neg, exp, log, sqrt, square
- Other: oneminus, clip
Binary Operations (6):
- add, subtract, multiply, divide, min, max
Scalar Operations (7):
- add_scalar, sub_scalar, mul_scalar, div_scalar, pow, clamp_min, clamp_max
Statistics and Monitoring
// Get execution statistics
let stats = executor.statistics;
println!;
println!;
println!;
println!;
// Output: "Memory saved: 2.50 MB"
// Reset statistics
executor.reset_stats;
Aliasing Safety
The executor tracks tensor aliasing to prevent unsafe in-place operations:
let mut executor = new;
// Mark tensor as aliased (shared ownership)
executor.mark_aliased;
// Check safety
if executor.can_execute_inplace else
// Clear aliasing information when ownership is released
executor.clear_aliasing;
Performance Benefits:
- 50-70% memory reduction for element-wise operations
- Zero allocations for in-place execution
- Better cache locality, because results overwrite the input tensor instead of a new buffer
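The aliasing check can be pictured as a set of tensor names that must not be overwritten. The sketch below is an assumption about the general idea, not the crate's implementation: `AliasTracker` and `relu_guarded` are hypothetical, and the method signatures are guesses that merely reuse the method names shown above.

```rust
use std::collections::HashSet;

/// Tracks which tensor names are shared and therefore unsafe to overwrite.
struct AliasTracker {
    aliased: HashSet<String>,
}

impl AliasTracker {
    fn new() -> Self {
        Self { aliased: HashSet::new() }
    }

    fn mark_aliased(&mut self, name: &str) {
        self.aliased.insert(name.to_string());
    }

    fn clear_aliasing(&mut self, name: &str) {
        self.aliased.remove(name);
    }

    fn can_execute_inplace(&self, name: &str) -> bool {
        !self.aliased.contains(name)
    }
}

/// Applies ReLU in place when that is safe and returns `None`;
/// otherwise leaves `data` untouched and returns a freshly allocated result.
fn relu_guarded(tracker: &AliasTracker, name: &str, data: &mut [f64]) -> Option<Vec<f64>> {
    if tracker.can_execute_inplace(name) {
        for v in data.iter_mut() {
            *v = v.max(0.0);
        }
        None
    } else {
        Some(data.iter().map(|v| v.max(0.0)).collect())
    }
}
```

The fallback path allocates, so the zero-allocation benefit applies only to tensors that are not aliased.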
Checkpoint/Resume
Save and restore executor state during training for mid-training checkpoints, recovery from failures, and incremental compilation.
Basic Usage
use ;
let mut executor = new;
// ... training loop ...
// Save checkpoint at iteration 100
let checkpoint = from_executor?;
checkpoint.save?;
// Later, restore from checkpoint
let checkpoint = load?;
let mut executor = checkpoint.restore?;
Checkpoint Configurations
// Training checkpoint (includes forward tape for gradients)
let config = for_training;
// Inference checkpoint (compressed, no tape)
let config = for_inference;
// Incremental checkpoint (only changed tensors)
let config = incremental;
// Custom configuration
let config = CheckpointConfig ;
let checkpoint = from_executor_with_config?;
Checkpoint Manager
For managing multiple checkpoints with automatic cleanup:
use CheckpointManager;
// Create manager
let mut manager = new?;
manager.set_max_checkpoints; // Keep last 5 checkpoints
// Save checkpoints during training
for iteration in 0..100
// Load the latest checkpoint
let checkpoint = manager.load_latest?;
let mut executor = checkpoint.restore?;
// List all checkpoints
for path in manager.list_checkpoints?
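The "keep the last N checkpoints" policy can be sketched with a queue. `Retention` is a hypothetical type for illustration; it tracks names only and leaves file deletion to the caller, which may differ from how CheckpointManager works internally.

```rust
use std::collections::VecDeque;

/// Remembers the most recent checkpoint names, oldest first.
struct Retention {
    max_checkpoints: usize,
    kept: VecDeque<String>,
}

impl Retention {
    fn new(max_checkpoints: usize) -> Self {
        Self { max_checkpoints, kept: VecDeque::new() }
    }

    /// Records a newly saved checkpoint and returns the names that
    /// fell out of the retention window and should be deleted.
    fn record(&mut self, name: String) -> Vec<String> {
        self.kept.push_back(name);
        let mut evicted = Vec::new();
        while self.kept.len() > self.max_checkpoints {
            if let Some(old) = self.kept.pop_front() {
                evicted.push(old);
            }
        }
        evicted
    }

    /// Name of the most recently recorded checkpoint, if any.
    fn latest(&self) -> Option<&String> {
        self.kept.back()
    }
}
```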
Advanced Features
Error Handling
use ;
// Comprehensive error types
match result
Execution Tracing
use ;
// Enable detailed tracing
let mut tracer = new;
// Operations are automatically traced
// Access trace events
for event in tracer.events
// Get statistics
let stats = tracer.stats;
println!;
println!;
Numerical Stability
use ;
// Configure fallback behavior
let config = permissive
.with_nan_replacement
.with_inf_replacement;
// Sanitize tensors before operations
let clean_tensor = sanitize_tensor?;
// Safe operations
use ;
let result = safe_div; // Avoids division by zero
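A minimal sketch of what NaN/Inf replacement and a guarded division might look like on plain slices. The signatures here are assumptions made for illustration; the crate's `sanitize_tensor` and `safe_div` may take different arguments and behave differently.

```rust
/// Replaces NaN with `nan_replacement` and ±Inf with ±`inf_replacement`.
/// Returns how many values were replaced.
fn sanitize(data: &mut [f64], nan_replacement: f64, inf_replacement: f64) -> usize {
    let mut replaced = 0;
    for v in data.iter_mut() {
        if v.is_nan() {
            *v = nan_replacement;
            replaced += 1;
        } else if v.is_infinite() {
            // Keep the sign of the original infinity.
            *v = inf_replacement.copysign(*v);
            replaced += 1;
        }
    }
    replaced
}

/// Divides `num` by `den`, substituting a denominator of magnitude `eps`
/// (with the sign of `den`) when `den` is too close to zero.
fn safe_div(num: f64, den: f64, eps: f64) -> f64 {
    if den.abs() < eps {
        num / eps.copysign(den)
    } else {
        num / den
    }
}
```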
Memory Pooling
use Scirs2Exec;
// Enable memory pooling
let mut executor = new;
executor.enable_pooling;
// Check pooling statistics
let stats = executor.pool_stats;
println!;
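Shape-based reuse can be sketched as a map from shape to a stack of free buffers, with hit and miss counters for statistics. `ShapePool` is a hypothetical type; the executor's real pool and the fields reported by its statistics are not documented here.

```rust
use std::collections::HashMap;

/// Free buffers grouped by tensor shape, plus simple reuse statistics.
#[derive(Default)]
struct ShapePool {
    free: HashMap<Vec<usize>, Vec<Vec<f64>>>,
    hits: usize,
    misses: usize,
}

impl ShapePool {
    /// Returns a zeroed buffer for `shape`, reusing a released one when available.
    fn acquire(&mut self, shape: &[usize]) -> Vec<f64> {
        if let Some(mut buf) = self.free.get_mut(shape).and_then(|bufs| bufs.pop()) {
            self.hits += 1;
            buf.iter_mut().for_each(|v| *v = 0.0);
            buf
        } else {
            self.misses += 1;
            vec![0.0; shape.iter().product::<usize>()]
        }
    }

    /// Hands a buffer back to the pool so a later `acquire` with the same shape can reuse it.
    fn release(&mut self, shape: &[usize], buf: Vec<f64>) {
        self.free.entry(shape.to_vec()).or_default().push(buf);
    }
}
```

Keying on the full shape rather than the element count keeps the sketch simple; a pool keyed on element count would reuse more buffers at the cost of reshaping.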
Gradient Verification
use ;
// Verify gradient correctness
let config = default
.with_epsilon
.with_rtol
.with_atol;
let report = check_gradients?;
if report.all_passed else
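Numeric gradient checking usually compares analytic gradients against central finite differences. The sketch below shows that technique on plain slices, using an epsilon plus relative and absolute tolerances like the configuration above. `numeric_grad` and `gradients_match` are hypothetical helpers, and the tolerance formula is a common convention rather than a confirmed detail of `check_gradients`.

```rust
/// Central-difference estimate of the gradient of `f` at `x`:
/// (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps) for each coordinate i.
fn numeric_grad<F: Fn(&[f64]) -> f64>(f: F, x: &[f64], epsilon: f64) -> Vec<f64> {
    let mut probe = x.to_vec();
    (0..x.len())
        .map(|i| {
            probe[i] = x[i] + epsilon;
            let plus = f(&probe);
            probe[i] = x[i] - epsilon;
            let minus = f(&probe);
            probe[i] = x[i]; // restore before moving to the next coordinate
            (plus - minus) / (2.0 * epsilon)
        })
        .collect()
}

/// Element-wise check: |analytic - numeric| <= atol + rtol * |numeric|.
fn gradients_match(analytic: &[f64], numeric: &[f64], rtol: f64, atol: f64) -> bool {
    analytic.len() == numeric.len()
        && analytic
            .iter()
            .zip(numeric)
            .all(|(a, n)| (a - n).abs() <= atol + rtol * n.abs())
}
```

For f(x) = sum of x_i squared, the analytic gradient is 2x, which this check accepts, while a gradient with one wrong entry is rejected.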
Parallel Execution
Requires: parallel feature flag
Multi-threaded execution automatically detects independent operations and executes them in parallel using Rayon.
[dependencies]
tensorlogic-scirs-backend = { version = "0.1", features = ["parallel"] }
Basic Usage
use ParallelScirs2Exec;
use TlAutodiff;
// Create parallel executor
let mut executor = new;
// Optional: Configure thread pool
executor.set_num_threads;
// Add input tensors
executor.add_tensor;
executor.add_tensor;
// Execute with automatic parallelization
let result = executor.forward?;
// Check parallelization statistics
if let Some = executor.execution_stats
Advanced Configuration
use ;
// Custom configuration
let config = ParallelConfig ;
let mut executor = with_config;
// Execute as normal
let result = executor.forward?;
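One common way to find independent operations is to group nodes into dependency levels: nodes in the same level do not read each other's results and could run concurrently. The sketch below shows only that grouping step, without Rayon. `parallel_levels` is a hypothetical function and may not match how the crate's dependency analyzer works.

```rust
/// Groups nodes into levels. `deps[i]` lists the nodes that node `i` reads.
/// Nodes must be in topological order (every dependency index is smaller than `i`).
/// A node's level is one more than the highest level among its dependencies.
fn parallel_levels(deps: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let mut level_of = vec![0usize; deps.len()];
    let mut levels: Vec<Vec<usize>> = Vec::new();
    for (i, inputs) in deps.iter().enumerate() {
        let level = inputs
            .iter()
            .map(|&d| {
                assert!(d < i, "nodes must be topologically ordered");
                level_of[d] + 1
            })
            .max()
            .unwrap_or(0);
        level_of[i] = level;
        if levels.len() <= level {
            levels.resize_with(level + 1, Vec::new);
        }
        levels[level].push(i);
    }
    levels
}
```

A scheduler could then execute each level's nodes in parallel and wait for the level to finish before starting the next.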
Backend Features
CPU Backend (Default)
[dependencies]
tensorlogic-scirs-backend = "0.1"
SIMD Backend (Faster)
[dependencies]
tensorlogic-scirs-backend = { version = "0.1", features = ["simd"] }
Enables vectorized operations for element-wise ops and reductions.
Parallel + SIMD (Best Performance)
[dependencies]
tensorlogic-scirs-backend = { version = "0.1", features = ["parallel", "simd"] }
Combines multi-threaded execution with SIMD vectorization for maximum performance.
GPU Backend (Future)
[dependencies]
tensorlogic-scirs-backend = { version = "0.1", features = ["gpu"] }
Note: CUDA device detection is already available. The backend can detect NVIDIA GPUs using nvidia-smi and report device information (name, memory, compute capability). Full GPU execution support will be added when scirs2-core gains GPU features.
Device Management
use ;
use ;
// Query available devices (automatically detects CUDA via nvidia-smi)
let manager = new;
println!;
// Check for GPU availability
if manager.has_gpu
// Detailed CUDA device detection
if is_cuda_available
// Select a specific device
let device = cuda; // CUDA GPU 0
let device = cpu; // CPU
let device = metal; // Apple Metal
// Check if device is available
if manager.is_available
Supported Device Types:
- CPU: Always available, default
- CUDA: NVIDIA GPUs (detection ready, execution planned)
- Metal: Apple GPUs (future)
- Vulkan: Cross-platform compute (future)
- ROCm: AMD GPUs (future)
Precision Control
use ;
// Different precision modes
let config = f32; // 32-bit (faster, less memory)
let config = f64; // 64-bit (more accurate, default)
let config = mixed_precision; // Mixed 16/32-bit
// Query precision properties
println!;
println!;
Quantization
use ;
use ;
// Calibrate for post-training quantization
let params = calibrate_quantization?;
// Quantization-aware training configuration
let qat_config = QatConfig ;
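For readers new to post-training quantization, the sketch below shows symmetric per-tensor INT8 quantization: calibrate a scale from the largest absolute value, round to integers, and map back. It is a generic illustration under that assumption; the function names are hypothetical and the crate's calibration scheme may differ (for example, asymmetric or per-channel).

```rust
/// Picks a scale so that the largest absolute value maps to 127.
fn calibrate_scale(data: &[f64]) -> f64 {
    let max_abs = data.iter().fold(0.0_f64, |m, v| m.max(v.abs()));
    if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 }
}

/// Quantizes to INT8: round(x / scale), clamped to [-127, 127].
fn quantize(data: &[f64], scale: f64) -> Vec<i8> {
    data.iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect()
}

/// Maps INT8 values back to floating point.
fn dequantize(q: &[i8], scale: f64) -> Vec<f64> {
    q.iter().map(|&v| f64::from(v) * scale).collect()
}
```

For values inside the calibrated range, the round-trip error per element is at most half the scale.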
SciRS2 Integration
This crate strictly adheres to the SciRS2 integration policy:
// Correct: Use SciRS2
use ;
use array;
use einsum;
// Wrong: Never import these directly
use Array2; // Never
use thread_rng; // Never
use Complex64; // Never
All tensor operations, linear algebra, and future autograd features use SciRS2.
Testing
# Run all tests
# Run with SIMD
# Run with parallel execution
# Run property tests
# Run benchmarks
# Run parallel benchmarks
Test Coverage
669 tests, all passing:
| Module | Coverage (test count) |
|---|---|
| autodiff, executor, ops | Core execution and gradient computation |
| parallel_executor | Multi-threaded execution (8 tests) |
| memory_pool | Tensor reuse and pooling (6 tests) |
| dependency_analyzer | Graph analysis for parallelization (8 tests) |
| gradient_ops | Advanced gradient estimators (11 tests) |
| error, tracing, fallback | Reliability features (26 tests) |
| execution_mode, device, precision | Backend features (38 tests) |
| custom_ops | User-defined operations (16 tests) |
| graph_optimizer | Optimization passes (13 tests) |
| metrics | MetricsCollector, AtomicMetrics (19 tests) |
| checkpoint | Save/load/restore (11 tests) |
| inplace_ops | Zero-allocation operations (16 tests) |
| memory_profiler | Allocation tracking (8 tests) |
| quantization | INT8/INT4/INT2 quantization (10 tests) |
| fuzzy_logic | Soft/fuzzy logic operations (21 tests) |
| gpu_readiness | GPU assessment and recommendations (8 tests) |
| cuda_detect | CUDA device detection (8 tests) |
| batch_executor | Batch processing (5 tests) |
| fusion | Operation fusion analysis (6 tests) |
| shape_inference | Shape validation (8 tests) |
| capabilities | Runtime capability detection (4 tests) |
| tensor_loss | 7 differentiable loss functions (included in total) |
| geometric_ops | GCN, Laplacian variants, SO(3), spherical harmonics (included in total) |
| signal_ops | STFT, DCT, DFT, window functions, Mel filterbank (included in total) |
| proptests | Property-based mathematical property verification (included in total) |
Property-Based Testing
Uses proptest to verify mathematical properties:
- Addition commutativity: a + b = b + a
- Multiplication associativity: (a * b) * c = a * (b * c)
- Distributivity: a * (b + c) = a*b + a*c
- Sum linearity: sum(a*x + b*y) = a*sum(x) + b*sum(y)
- Sigmoid range: 0 <= sigmoid(x) <= 1
- Identity/inverse properties
Integration Example
Full example with training:
use compile_to_einsum;
use Scirs2Exec;
use TlAutodiff;
use ;
// Define rule: knows(x,y) ∧ knows(y,z) → knows(x,z) (transitivity)
let knows_xy = pred;
let knows_yz = pred;
let premise = and;
// Compile to graph
let graph = compile_to_einsum?;
// Setup executor with input data
let mut executor = new;
let knows_matrix = from_vec?;
executor.add_tensor;
// Forward pass
let result = executor.forward?;
println!;
// Backward pass for training
let loss_grad = ones?;
let mut grads = new;
grads.insert;
let input_grads = executor.backward?;
// Access gradients
for in input_grads.tensors.iter
API Documentation
Key public types:
- Scirs2Exec: Main executor implementing the TlAutodiff trait
- TlBackendError: Comprehensive error types
- ExecutionTracer: Debug tracing with multiple levels
- FallbackConfig: Numerical stability configuration
- ForwardTape: Stores intermediate values for backward pass
- ParallelBatchExecutor: Batch processing with parallelization
- ProfiledScirs2Exec: Performance profiling wrapper
- MetricsCollector: Per-operation timing and memory tracking
- GraphOptimizer: Pre-execution optimization passes
- InplaceExecutor: Zero-allocation in-place operations
- CheckpointManager: Training state save/restore
See full API docs for details.
Limitations and Future Work
Current limitations:
- No GPU execution: CPU/SIMD only (CUDA detection ready; execution planned via scirs2 GPU features)
- No JIT compilation: Eager and graph modes supported; JIT planned
- No distributed execution: Single-device only
See TODO.md for the complete roadmap.
Contributing
When contributing:
- Follow SciRS2 integration policy strictly
- Add tests for all new features (maintain 100% pass rate)
- Use cargo clippy -- -D warnings (zero warnings policy)
- Format code with cargo fmt
- Keep files under 2000 lines (use SplitRS if needed)
- Update TODO.md with task status
License
Apache-2.0
Status: Stable (v0.1.0)
Last Updated: 2026-04-06
Tests: 669/669 passing (100%)
Part of: TensorLogic Ecosystem