tensorlogic-scirs-backend

Production-Ready SciRS2-Powered Tensor Execution Backend for TensorLogic

Overview

Production-ready execution backend that runs EinsumGraph computations using SciRS2 (Scientific Computing in Rust v2) for high-performance CPU/SIMD tensor operations.

Input: EinsumGraph from tensorlogic-compiler
Output: Computed tensor values with full autodiff support

Quick Start

use tensorlogic_scirs_backend::Scirs2Exec;
use tensorlogic_infer::TlAutodiff;
use tensorlogic_compiler::compile_to_einsum;
use tensorlogic_ir::{TLExpr, Term};

// Define a rule: knows(x, y)
let rule = TLExpr::pred("knows", vec![Term::var("x"), Term::var("y")]);

// Compile to execution graph
let graph = compile_to_einsum(&rule)?;

// Create executor and provide input tensor
let mut executor = Scirs2Exec::new();
let knows_matrix = Scirs2Exec::from_vec(
    vec![1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0],
    vec![3, 3]
)?;
executor.add_tensor("knows[ab]", knows_matrix);

// Execute forward pass
let result = executor.forward(&graph)?;

// Backward pass for training
let grad_out = Scirs2Exec::ones(result.shape().to_vec())?;
let mut grads = std::collections::HashMap::new();
grads.insert("output", grad_out);
let input_grads = executor.backward(&graph, grads)?;

Key Features

Execution Engine

  • Real Execution: Full implementation of forward pass with all operations
  • Autodiff: Production-ready backward pass with gradient computation
  • Einsum Operations: Matrix multiplication, tensor contractions via scirs2-linalg
  • Element-wise Ops: Unary (ReLU, Sigmoid, Tanh, OneMinus, Abs, Neg, Exp, Log, Sqrt, Square, Clip) and Binary (Add, Sub, Mul, Div, Comparisons)
  • Reductions: Sum, Max, Min, Mean, Product over specified axes
  • Logical Ops: AND, OR (Max/ProbSum), NAND, NOR, XOR, FORALL
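
A minimal sketch of these soft-logic semantics on truth values in [0, 1], written as scalar Rust for clarity (the backend applies them element-wise over tensors; the exact XOR/NAND definitions used by the crate are assumptions here):

/// AND as product t-norm: both inputs must be near 1.
fn soft_and(a: f64, b: f64) -> f64 {
    a * b
}

/// OR, Max variant.
fn soft_or_max(a: f64, b: f64) -> f64 {
    a.max(b)
}

/// OR, ProbSum variant: P(A or B) for independent events.
fn soft_or_prob(a: f64, b: f64) -> f64 {
    a + b - a * b
}

/// Derived gates, using 1 - x as NOT.
fn soft_nand(a: f64, b: f64) -> f64 {
    1.0 - soft_and(a, b)
}

/// Reduces to Boolean XOR at a, b in {0, 1}: a + b - 2ab.
fn soft_xor(a: f64, b: f64) -> f64 {
    soft_or_prob(a, b) - soft_and(a, b)
}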

Performance

  • Graph Optimization: Dead code elimination, CSE, constant folding, operation fusion
  • Memory Planning: Liveness analysis, peak memory estimation, reuse detection
  • In-Place Operations: 24 operations with zero-allocation execution
  • Parallel Execution: Multi-threaded graph execution with Rayon (requires parallel feature)
  • Memory Pooling: Shape-based tensor reuse with statistics tracking
  • SIMD Support: Vectorized operations via feature flags
  • Batch Execution: Parallel processing for multiple inputs

Reliability

  • Error Handling: Comprehensive error types (ShapeMismatch, Numerical, Device, etc.)
  • Execution Tracing: Multi-level debugging (Error/Warn/Info/Debug/Trace)
  • Numerical Stability: Fallback mechanisms for NaN/Inf handling
  • Shape Validation: Runtime shape inference and verification
  • Gradient Checking: Numeric verification for autodiff correctness

Advanced Features

  • Quantization: INT8/INT4/INT2 quantization (PTQ and QAT modes)
  • Fuzzy Logic: Soft/fuzzy logic operations with temperature control (soft_and, soft_or, soft_not, soft_imply, FuzzyLogic struct)
  • Loss Functions: 7 differentiable loss functions (MSE, BCE, CrossEntropy, Focal, Huber, KLDiv, CosineEmbedding) with gradient support (a minimal MSE sketch follows this list)
  • Geometric Deep Learning: GCN layer, graph Laplacian variants, SO(3) rotations, spherical harmonics up to degree 2
  • Signal Processing: STFT/iSTFT, DCT/iDCT, DFT/iDFT, six window types, FIR filter, Mel filterbank
  • GPU Readiness: CUDA device detection and GPU readiness assessment
  • Custom Operations: User-defined operation trait (CustomOp, OpRegistry)
  • Graph Optimizer: Constant folding, CSE, DCE, algebraic simplification
  • Metrics Collection: Per-operation timing, memory tracking, throughput measurement
  • Memory Profiler: Allocation tracking with configurable profiling
  • Checkpoint/Resume: Full training state save/load with JSON serialization
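
For intuition on the loss-function feature, here is what the simplest case (MSE) and its gradient compute; these free functions are illustrative only, not the crate's API:

/// Mean squared error: mean((pred - target)^2).
fn mse(pred: &[f64], target: &[f64]) -> f64 {
    let n = pred.len() as f64;
    pred.iter().zip(target).map(|(p, t)| (p - t).powi(2)).sum::<f64>() / n
}

/// d(MSE)/d(pred) = 2 * (pred - target) / n, fed into the backward pass.
fn mse_grad(pred: &[f64], target: &[f64]) -> Vec<f64> {
    let n = pred.len() as f64;
    pred.iter().zip(target).map(|(p, t)| 2.0 * (p - t) / n).collect()
}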

Testing

  • 669 Tests: All passing with comprehensive coverage
  • Optimization Tests: DCE, CSE, and memory planning
  • In-Place Tests: Zero-allocation operations
  • Checkpoint Tests: Save/load/restore functionality
  • Property-Based: proptest tests for mathematical properties
  • Gradient Tests: Numeric gradient checking verifies autodiff accuracy
  • Integration Tests: End-to-end TLExpr → Graph → Execution
  • Parallel Tests: Multi-threaded execution
  • Device Tests: CUDA device detection and management

Architecture

EinsumGraph (from compiler)
  ↓
Scirs2Exec::forward()
  ↓
For each EinsumNode (topological order):
  - Einsum → scirs2_linalg::einsum() [tensor contraction]
  - ElemUnary → ReLU/Sigmoid/OneMinus
  - ElemBinary → Add/Sub/Mul/Div/Comparisons
  - Reduce → Sum/Max/Min/Mean/Product over axes
  ↓
TensorOutput (scirs2-core ArrayD<f64>)
  ↓
Scirs2Exec::backward() [optional, for training]
  ↓
Gradients (for each input tensor)
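
The "topological order" step can be sketched with Kahn's algorithm; the node representation below (each node lists the indices of its inputs) is a hypothetical stand-in for EinsumGraph:

use std::collections::VecDeque;

/// Returns node indices in dependency order, or None if the graph has a cycle.
fn topo_order(inputs_of: &[Vec<usize>]) -> Option<Vec<usize>> {
    let n = inputs_of.len();
    let mut indegree = vec![0usize; n];
    let mut users: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (node, inputs) in inputs_of.iter().enumerate() {
        indegree[node] = inputs.len();
        for &input in inputs {
            users[input].push(node);
        }
    }
    // Start from nodes with no dependencies (the graph's input tensors)
    let mut ready: VecDeque<usize> = (0..n).filter(|&i| indegree[i] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(node) = ready.pop_front() {
        order.push(node);
        for &user in &users[node] {
            indegree[user] -= 1;
            if indegree[user] == 0 {
                ready.push_back(user);
            }
        }
    }
    (order.len() == n).then_some(order)
}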

Supported Operations

1. Einsum (Tensor Contraction)

// Matrix multiplication: C = AB
// Compiled as einsum("ik,kj->ij", A, B)
let a = Scirs2Exec::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0], vec![2, 3])?;
let b = Scirs2Exec::from_vec(vec![7.0, 8.0, 9.0, 10.0, 11.0, 12.0], vec![3, 2])?;
// Result via graph execution: 2x2 matrix
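// Expected: [[58, 64], [139, 154]]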

2. Unary Operations

// ReLU: max(0, x)
// Sigmoid: 1 / (1 + exp(-x))
// OneMinus: 1 - x

// Gradient support:
// - ReLU: grad * (input > 0)
// - Sigmoid: grad * sigmoid(x) * (1 - sigmoid(x))
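
These gradient formulas are easy to verify numerically; a self-contained central-difference check for the sigmoid case (the crate's gradient_check module performs this at tensor scale):

fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

fn main() {
    let x = 0.7;
    let s = sigmoid(x);
    // Analytic gradient from the formula above
    let analytic = s * (1.0 - s);
    // Central finite difference as an independent check
    let eps = 1e-6;
    let numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2.0 * eps);
    assert!((analytic - numeric).abs() < 1e-8);
}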

3. Binary Operations

// Arithmetic: Add, Subtract, Multiply, Divide
// Comparisons: Eq, Lt, Gt, Lte, Gte (return 0.0 or 1.0)
// Logical: AND (multiply), OR (max or prob_sum), XOR, NAND, NOR

// All with proper gradient computation

4. Reductions

// Sum, Max, Min, Mean, Product over specified axes
// With gradient broadcasting back to original shape

// Example: Sum over axis 1
// Input: [3, 4] → Output: [3]
// Gradient: [3] → broadcasted to [3, 4] (all ones)
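
A flat-slice sketch of this sum-over-axis-1 example and its broadcast gradient (row-major layout; illustrative, not the backend's internals):

/// Sum over axis 1 of a row-major [rows, cols] tensor; output shape [rows].
fn sum_axis1(data: &[f64], rows: usize, cols: usize) -> Vec<f64> {
    (0..rows)
        .map(|r| data[r * cols..(r + 1) * cols].iter().sum())
        .collect()
}

/// Backward: every element of row r contributed with weight 1, so the
/// upstream gradient for row r is copied across all cols columns.
fn sum_axis1_grad(grad_out: &[f64], rows: usize, cols: usize) -> Vec<f64> {
    let mut grad_in = vec![0.0; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            grad_in[r * cols + c] = grad_out[r];
        }
    }
    grad_in
}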

Graph Optimization

The backend includes production-ready graph optimization passes that significantly improve performance and reduce memory usage.

Optimization Configuration

use tensorlogic_scirs_backend::{CompiledGraph, OptimizationConfig};

// Aggressive optimizations (all enabled)
let config = OptimizationConfig::aggressive();

// Conservative optimizations (only safe passes)
let config = OptimizationConfig::conservative();

// No optimizations
let config = OptimizationConfig::none();

// Custom configuration
let config = OptimizationConfig {
    enable_constant_folding: true,
    enable_fusion: true,
    enable_dce: true,
    enable_cse: true,
    enable_layout_opt: false,
    enable_memory_planning: true,
};

Compile and Optimize

use tensorlogic_scirs_backend::CompiledGraph;

// Automatic optimization with defaults
let compiled = CompiledGraph::compile(graph);

// Custom optimization
let config = OptimizationConfig::aggressive();
let compiled = CompiledGraph::compile_with_config(graph, &config);

// Access optimization statistics
let stats = compiled.stats();
println!("Original ops: {}", stats.original_ops);
println!("Optimized ops: {}", stats.optimized_ops);
println!("Eliminated: {}", stats.eliminated_ops);
println!("Fused: {}", stats.fused_ops);
println!("Compilation time: {:.2}ms", stats.compilation_time_ms);

// Execute the optimized graph
let result = executor.forward(compiled.graph())?;

Optimization Passes

  1. Dead Code Elimination (DCE)

    • Removes unused tensors and operations
    • Backward liveness analysis from outputs
    • Typical savings: 10-30% of operations
  2. Common Subexpression Elimination (CSE)

    • Detects and deduplicates identical subgraphs
    • Hash-based node comparison (see the sketch after this list)
    • Typical savings: 5-15% of operations
  3. Constant Folding

    • Evaluates constant expressions at compile time
    • Aggressive propagation through operations
    • Reduces runtime computation
  4. Operation Fusion

    • Combines element-wise operations
    • Reduces intermediate allocations
    • 2-3x speedup for operation chains
  5. Layout Optimization

    • Optimizes tensor memory layouts
    • Improves cache locality
    • Better SIMD utilization
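
A toy illustration of the hash-based CSE idea from pass 2, assuming nodes arrive in topological order (the Node enum is hypothetical; real EinsumGraph nodes are richer):

use std::collections::HashMap;

#[derive(Clone, PartialEq, Eq, Hash)]
enum Node {
    Input(String),
    Add(usize, usize), // operand indices into the node list
    Mul(usize, usize),
}

/// Identical nodes (after remapping operands) collapse to one index.
/// Returns the deduplicated nodes plus an old-to-new index map.
fn cse(nodes: Vec<Node>) -> (Vec<Node>, Vec<usize>) {
    let mut seen: HashMap<Node, usize> = HashMap::new();
    let mut out: Vec<Node> = Vec::new();
    let mut remap: Vec<usize> = Vec::new();
    for node in nodes {
        // Point operands at their canonical (deduplicated) versions first
        let node = match node {
            Node::Add(a, b) => Node::Add(remap[a], remap[b]),
            Node::Mul(a, b) => Node::Mul(remap[a], remap[b]),
            other => other,
        };
        let idx = match seen.get(&node) {
            Some(&i) => i,
            None => {
                let i = out.len();
                out.push(node.clone());
                seen.insert(node, i);
                i
            }
        };
        remap.push(idx);
    }
    (out, remap)
}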

Memory Planning

The compiler performs liveness analysis to plan memory allocation:

if let Some(plan) = compiled.memory_plan {
    println!("Max live tensors: {}", plan.max_live_tensors);
    println!("Peak memory: {} bytes", plan.peak_memory_bytes);
    println!("Reuse opportunities: {}", plan.reuse_opportunities.len());

    // Reuse opportunities are (source, dest) pairs
    for (src, dest) in plan.reuse_opportunities {
        println!("Can reuse tensor {} for tensor {}", src, dest);
    }
}

Benefits:

  • Predicts peak memory usage
  • Identifies reuse opportunities for 30-50% of tensors
  • Enables pre-allocation strategies
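
The core of the liveness analysis can be sketched in a few lines; steps and tensor ids are plain usize here, and graph inputs are ignored (illustrative only, not the crate's plan structure):

use std::collections::{HashMap, HashSet};

/// Each step produces one tensor id and consumes some others. An
/// intermediate is live from its producing step to its last consuming
/// step; the peak of the live set bounds memory needed for intermediates.
fn peak_live_tensors(steps: &[(usize, Vec<usize>)]) -> usize {
    // Last step at which each tensor is consumed
    let mut last_use: HashMap<usize, usize> = HashMap::new();
    for (step, (_, consumed)) in steps.iter().enumerate() {
        for &t in consumed {
            last_use.insert(t, step);
        }
    }
    let mut live: HashSet<usize> = HashSet::new();
    let mut peak = 0;
    for (step, (produced, consumed)) in steps.iter().enumerate() {
        live.insert(*produced);
        peak = peak.max(live.len());
        for &t in consumed {
            if last_use.get(&t) == Some(&step) {
                live.remove(&t); // dead after its final use; slot reusable
            }
        }
    }
    peak
}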

In-Place Operations

Execute operations in-place to eliminate memory allocations and improve performance.

Basic Usage

use tensorlogic_scirs_backend::{InplaceExecutor, can_execute_inplace};

let mut executor = InplaceExecutor::new();
let mut tensor = /* ... */;

// Check if operation supports in-place execution
if can_execute_inplace("relu") {
    executor.execute_inplace_unary("relu", &mut tensor)?;
}

// Binary operations (modifies lhs in-place)
let mut lhs = /* ... */;
let rhs = /* ... */;
executor.execute_inplace_binary("add", &mut lhs, &rhs)?;

// Scalar operations
executor.execute_inplace_scalar("mul", &mut tensor, 2.0)?;

Supported Operations

Unary Operations (11):

  • Activation: relu, sigmoid, tanh
  • Arithmetic: abs, neg, exp, log, sqrt, square
  • Other: oneminus, clip

Binary Operations (6):

  • add, subtract, multiply, divide, min, max

Scalar Operations (7):

  • add_scalar, sub_scalar, mul_scalar, div_scalar
  • pow, clamp_min, clamp_max

Statistics and Monitoring

// Get execution statistics
let stats = executor.statistics();

println!("In-place ops: {}", stats.inplace_ops);
println!("Non-in-place ops: {}", stats.non_inplace_ops);
println!("In-place %: {:.1}%", stats.inplace_percentage());
println!("Memory saved: {}", stats.format_memory_saved());
// Output: "Memory saved: 2.50 MB"

// Reset statistics
executor.reset_stats();

Aliasing Safety

The executor tracks tensor aliasing to prevent unsafe in-place operations:

let mut executor = InplaceExecutor::new();

// Mark tensor as aliased (shared ownership)
executor.mark_aliased(tensor_id);

// Check safety
if executor.can_execute_inplace(tensor_id) {
    // Safe to modify in-place
} else {
    // Must allocate new tensor
}

// Clear aliasing information when ownership is released
executor.clear_aliasing();

Performance Benefits:

  • 50-70% memory reduction for element-wise operations
  • Zero allocations for in-place execution
  • Better cache locality, since results stay in the same buffers

Checkpoint/Resume

Save and restore executor state during training for mid-training checkpoints, recovery from failures, and incremental compilation.

Basic Usage

use tensorlogic_scirs_backend::{Checkpoint, CheckpointConfig};

let mut executor = Scirs2Exec::new();
// ... training loop ...

// Save checkpoint at iteration 100
let checkpoint = Checkpoint::from_executor(&executor, 100)?;
checkpoint.save("checkpoint_iter_100.json")?;

// Later, restore from checkpoint
let checkpoint = Checkpoint::load("checkpoint_iter_100.json")?;
let mut executor = checkpoint.restore()?;

Checkpoint Configurations

// Training checkpoint (includes forward tape for gradients)
let config = CheckpointConfig::for_training();

// Inference checkpoint (compressed, no tape)
let config = CheckpointConfig::for_inference();

// Incremental checkpoint (only changed tensors)
let config = CheckpointConfig::incremental();

// Custom configuration
let config = CheckpointConfig {
    enable_compression: true,
    include_tape: true,
    verify_checksum: true,
    incremental: false,
};

let checkpoint = Checkpoint::from_executor_with_config(&executor, iteration, &config)?;

Checkpoint Manager

For managing multiple checkpoints with automatic cleanup:

use tensorlogic_scirs_backend::CheckpointManager;

// Create manager
let mut manager = CheckpointManager::new("./checkpoints")?;
manager.set_max_checkpoints(Some(5)); // Keep last 5 checkpoints

// Save checkpoints during training
for iteration in 0..100 {
    // ... training step ...

    if iteration % 10 == 0 {
        let path = manager.save_checkpoint(&executor, iteration)?;
        println!("Saved checkpoint: {:?}", path);
    }
}

// Load the latest checkpoint
let checkpoint = manager.load_latest()?;
let mut executor = checkpoint.restore()?;

// List all checkpoints
for path in manager.list_checkpoints()? {
    println!("Checkpoint: {:?}", path);
}

Advanced Features

Error Handling

use tensorlogic_scirs_backend::{TlBackendError, TlBackendResult};

// Comprehensive error types
match result {
    Err(TlBackendError::ShapeMismatch(err)) => {
        println!("Shape error: {}", err);
    }
    Err(TlBackendError::NumericalError(err)) => {
        println!("Numerical issue: {:?}", err.kind);
    }
    Err(TlBackendError::DeviceError(err)) => {
        println!("Device error: {}", err);
    }
    Ok(value) => { /* success */ }
}

Execution Tracing

use tensorlogic_scirs_backend::{ExecutionTracer, TraceLevel};

// Enable detailed tracing
let mut tracer = ExecutionTracer::new(TraceLevel::Debug);

// Operations are automatically traced
// Access trace events
for event in tracer.events() {
    println!("{}", event);  // Shows operation, duration, inputs/outputs
}

// Get statistics
let stats = tracer.stats();
println!("Total ops: {}", stats.total_operations);
println!("Total time: {:?}", stats.total_duration);

Numerical Stability

use tensorlogic_scirs_backend::{FallbackConfig, sanitize_tensor};

// Configure fallback behavior
let config = FallbackConfig::permissive()
    .with_nan_replacement(0.0)
    .with_inf_replacement(1e10, -1e10);

// Sanitize tensors before operations
let clean_tensor = sanitize_tensor(&input, &config, "my_operation")?;

// Safe operations
use tensorlogic_scirs_backend::fallback::{safe_div, safe_log, safe_sqrt};
let result = safe_div(&a, &b, 1e-10);  // Avoids division by zero

Memory Pooling

use tensorlogic_scirs_backend::Scirs2Exec;

// Enable memory pooling
let mut executor = Scirs2Exec::new();
executor.enable_pooling();

// Check pooling statistics
let stats = executor.pool_stats();
println!("Reuse rate: {:.1}%", stats.reuse_rate * 100.0);

Gradient Verification

use tensorlogic_scirs_backend::gradient_check::{check_gradients, GradientCheckConfig};

// Verify gradient correctness
let config = GradientCheckConfig::default()
    .with_epsilon(1e-5)
    .with_rtol(1e-4)
    .with_atol(1e-6);

let report = check_gradients(&graph, &executor, &config)?;

if report.all_passed {
    println!("All gradients correct!");
} else {
    for result in &report.results {
        println!("{}: max_error = {:.2e}", result.tensor_name, result.max_abs_diff);
    }
}

Parallel Execution

Requires: parallel feature flag

Multi-threaded execution automatically detects independent operations and executes them in parallel using Rayon.

[dependencies]
tensorlogic-scirs-backend = { version = "0.1", features = ["parallel"] }

Basic Usage

use tensorlogic_scirs_backend::ParallelScirs2Exec;
use tensorlogic_infer::TlAutodiff;

// Create parallel executor
let mut executor = ParallelScirs2Exec::new();

// Optional: Configure thread pool
executor.set_num_threads(4);

// Add input tensors
executor.add_tensor("p1", tensor1);
executor.add_tensor("p2", tensor2);

// Execute with automatic parallelization
let result = executor.forward(&graph)?;

// Check parallelization statistics
if let Some(stats) = executor.execution_stats() {
    println!("Parallel ops: {}", stats.parallel_ops);
    println!("Sequential ops: {}", stats.sequential_ops);
    println!("Estimated speedup: {:.2}x", stats.estimated_speedup);
}

Advanced Configuration

use tensorlogic_scirs_backend::{ParallelConfig, ParallelScirs2Exec};

// Custom configuration
let config = ParallelConfig {
    num_threads: Some(8),          // Use 8 threads (None = all cores)
    min_parallel_ops: 3,            // Minimum ops per level for parallelization
    enable_pooling: true,           // Enable memory pooling
};

let mut executor = ParallelScirs2Exec::with_config(config);

// Execute as normal
let result = executor.forward(&graph)?;

Backend Features

CPU Backend (Default)

[dependencies]
tensorlogic-scirs-backend = "0.1"

SIMD Backend (Faster)

[dependencies]
tensorlogic-scirs-backend = { version = "0.1", features = ["simd"] }

Enables vectorized operations for element-wise ops and reductions.

Parallel + SIMD (Best Performance)

[dependencies]
tensorlogic-scirs-backend = { version = "0.1", features = ["parallel", "simd"] }

Combines multi-threaded execution with SIMD vectorization for maximum performance.

GPU Backend (Future)

[dependencies]
tensorlogic-scirs-backend = { version = "0.1", features = ["gpu"] }

Note: CUDA device detection is already available. The backend can detect NVIDIA GPUs using nvidia-smi and report device information (name, memory, compute capability). Full GPU execution support will be added when scirs2-core gains GPU features.
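
The detection approach can be illustrated with a few lines of std (the crate exposes this via detect_cuda_devices / is_cuda_available below; this sketch only shows the mechanism):

use std::process::Command;

/// Query GPU names via nvidia-smi; empty if the tool is absent or fails.
fn nvidia_gpu_names() -> Vec<String> {
    Command::new("nvidia-smi")
        .args(["--query-gpu=name", "--format=csv,noheader"])
        .output()
        .ok()
        .filter(|out| out.status.success())
        .map(|out| {
            String::from_utf8_lossy(&out.stdout)
                .lines()
                .map(|line| line.trim().to_string())
                .collect()
        })
        .unwrap_or_default()
}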

Device Management

use tensorlogic_scirs_backend::{DeviceManager, Device, DeviceType};
use tensorlogic_scirs_backend::{detect_cuda_devices, is_cuda_available};

// Query available devices (automatically detects CUDA via nvidia-smi)
let manager = DeviceManager::new();
println!("Available devices: {:?}", manager.available_devices());

// Check for GPU availability
if manager.has_gpu() {
    println!("GPU devices found: {}", manager.count_devices(DeviceType::Cuda));
}

// Detailed CUDA device detection
if is_cuda_available() {
    let cuda_devices = detect_cuda_devices();
    for device_info in cuda_devices {
        println!("GPU {}: {} ({} MB)",
                 device_info.index,
                 device_info.name,
                 device_info.memory_mb);
        if let Some((major, minor)) = device_info.compute_capability {
            println!("  Compute Capability: {}.{}", major, minor);
        }
    }
}

// Select a specific device
let device = Device::cuda(0); // CUDA GPU 0
let device = Device::cpu();    // CPU
let device = Device::metal();  // Apple Metal

// Check if device is available
if manager.is_available(&device) {
    manager.set_default_device(device)?;
}

Supported Device Types:

  • CPU: Always available, default
  • CUDA: NVIDIA GPUs (detection ready, execution planned)
  • Metal: Apple GPUs (future)
  • Vulkan: Cross-platform compute (future)
  • ROCm: AMD GPUs (future)

Precision Control

use tensorlogic_scirs_backend::{Precision, PrecisionConfig, Scalar};

// Different precision modes
let config = PrecisionConfig::f32();  // 32-bit (faster, less memory)
let config = PrecisionConfig::f64();  // 64-bit (more accurate, default)
let config = PrecisionConfig::mixed_precision(); // Mixed 16/32-bit

// Query precision properties
println!("Precision: {}", Precision::F32);
println!("Memory savings: {:.1}%", Precision::F32.memory_savings() * 100.0);

Quantization

use tensorlogic_scirs_backend::{QuantizationType, QuantizationScheme, QuantizationGranularity};
use tensorlogic_scirs_backend::{calibrate_quantization, QatConfig};

// Calibrate for post-training quantization
let params = calibrate_quantization(&tensor, QuantizationType::Int8, QuantizationScheme::Symmetric)?;

// Quantization-aware training configuration
let qat_config = QatConfig {
    quantization_type: QuantizationType::Int8,
    scheme: QuantizationScheme::Asymmetric,
    granularity: QuantizationGranularity::PerChannel,
    ..Default::default()
};
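
What symmetric INT8 calibration conceptually computes is a single scale that maps the largest magnitude to 127; a plain-Rust sketch with hypothetical helper names (not the crate's calibrate_quantization output type):

/// Calibrate and quantize: q = round(x / scale), clamped to [-127, 127].
fn quantize_symmetric_int8(xs: &[f64]) -> (Vec<i8>, f64) {
    let max_abs = xs.iter().fold(0.0_f64, |m, &x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = xs
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Inverse mapping; per-element quantization error is at most scale / 2.
fn dequantize(q: &[i8], scale: f64) -> Vec<f64> {
    q.iter().map(|&v| v as f64 * scale).collect()
}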

SciRS2 Integration

This crate strictly adheres to the SciRS2 integration policy:

// Correct: Use SciRS2
use scirs2_core::ndarray::{Array, ArrayD, Axis};
use scirs2_core::array;
use scirs2_linalg::einsum;

// Wrong: Never import these directly
use ndarray::Array2;  // Never
use rand::thread_rng;  // Never
use num_complex::Complex64;  // Never

All tensor operations, linear algebra, and autodiff features use SciRS2.

Testing

# Run all tests
cargo nextest run -p tensorlogic-scirs-backend

# Run with SIMD
cargo nextest run -p tensorlogic-scirs-backend --features simd

# Run with parallel execution
cargo nextest run -p tensorlogic-scirs-backend --features parallel

# Run property tests
cargo test -p tensorlogic-scirs-backend --test proptests

# Run benchmarks
cargo bench -p tensorlogic-scirs-backend

# Run parallel benchmarks
cargo bench -p tensorlogic-scirs-backend --bench parallel_performance --features parallel

Test Coverage

669 tests, all passing:

  • autodiff, executor, ops: Core execution and gradient computation
  • parallel_executor: Multi-threaded execution (8 tests)
  • memory_pool: Tensor reuse and pooling (6 tests)
  • dependency_analyzer: Graph analysis for parallelization (8 tests)
  • gradient_ops: Advanced gradient estimators (11 tests)
  • error, tracing, fallback: Reliability features (26 tests)
  • execution_mode, device, precision: Backend features (38 tests)
  • custom_ops: User-defined operations (16 tests)
  • graph_optimizer: Optimization passes (13 tests)
  • metrics: MetricsCollector, AtomicMetrics (19 tests)
  • checkpoint: Save/load/restore (11 tests)
  • inplace_ops: Zero-allocation operations (16 tests)
  • memory_profiler: Allocation tracking (8 tests)
  • quantization: INT8/INT4/INT2 quantization (10 tests)
  • fuzzy_logic: Soft/fuzzy logic operations (21 tests)
  • gpu_readiness: GPU assessment and recommendations (8 tests)
  • cuda_detect: CUDA device detection (8 tests)
  • batch_executor: Batch processing (5 tests)
  • fusion: Operation fusion analysis (6 tests)
  • shape_inference: Shape validation (8 tests)
  • capabilities: Runtime capability detection (4 tests)
  • tensor_loss: 7 differentiable loss functions (included in total)
  • geometric_ops: GCN, Laplacian variants, SO(3), spherical harmonics (included in total)
  • signal_ops: STFT, DCT, DFT, window functions, Mel filterbank (included in total)
  • proptests: Property-based verification of mathematical properties (included in total)

Property-Based Testing

Uses proptest to verify mathematical properties:

  • Addition commutativity: a + b = b + a
  • Multiplication associativity: (a * b) * c = a * (b * c)
  • Distributivity: a * (b + c) = a*b + a*c
  • Sum linearity: sum(a*x + b*y) = a*sum(x) + b*sum(y)
  • Sigmoid range: 0 <= sigmoid(x) <= 1
  • Identity/inverse properties
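
A minimal sketch of this style of test written directly against proptest (the crate's actual property tests operate on tensors):

use proptest::prelude::*;

proptest! {
    // Commutativity of addition: exact for finite f64 inputs
    #[test]
    fn addition_commutes(a in -1e6f64..1e6, b in -1e6f64..1e6) {
        prop_assert_eq!(a + b, b + a);
    }

    // Sigmoid range property: 0 <= sigmoid(x) <= 1
    #[test]
    fn sigmoid_in_unit_interval(x in -50.0f64..50.0) {
        let s = 1.0 / (1.0 + (-x).exp());
        prop_assert!((0.0..=1.0).contains(&s));
    }
}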

Integration Example

Full example with training:

use tensorlogic_compiler::compile_to_einsum;
use tensorlogic_scirs_backend::Scirs2Exec;
use tensorlogic_infer::TlAutodiff;
use tensorlogic_ir::{TLExpr, Term};

// Define rule: knows(x,y) ∧ knows(y,z) → knows(x,z) (transitivity)
let knows_xy = TLExpr::pred("knows", vec![Term::var("x"), Term::var("y")]);
let knows_yz = TLExpr::pred("knows", vec![Term::var("y"), Term::var("z")]);
let premise = TLExpr::and(knows_xy, knows_yz);

// Compile to graph
let graph = compile_to_einsum(&premise)?;

// Setup executor with input data
let mut executor = Scirs2Exec::new();
let knows_matrix = Scirs2Exec::from_vec(
    vec![1.0, 0.0, 1.0,
         0.0, 1.0, 0.0,
         0.0, 0.0, 1.0],
    vec![3, 3]
)?;
executor.add_tensor("knows[ab]", knows_matrix);

// Forward pass
let result = executor.forward(&graph)?;
println!("Result shape: {:?}", result.shape());

// Backward pass for training
let loss_grad = Scirs2Exec::ones(result.shape().to_vec())?;
let mut grads = std::collections::HashMap::new();
grads.insert("output", loss_grad);
let input_grads = executor.backward(&graph, grads)?;

// Access gradients
for (name, grad) in input_grads.tensors.iter() {
    println!("Gradient for {}: {:?}", name, grad.shape());
}

API Documentation

Key public types:

  • Scirs2Exec: Main executor implementing TlAutodiff trait
  • TlBackendError: Comprehensive error types
  • ExecutionTracer: Debug tracing with multiple levels
  • FallbackConfig: Numerical stability configuration
  • ForwardTape: Stores intermediate values for backward pass
  • ParallelBatchExecutor: Batch processing with parallelization
  • ProfiledScirs2Exec: Performance profiling wrapper
  • MetricsCollector: Per-operation timing and memory tracking
  • GraphOptimizer: Pre-execution optimization passes
  • InplaceExecutor: Zero-allocation in-place operations
  • CheckpointManager: Training state save/restore

See full API docs for details.

Limitations and Future Work

Current limitations:

  • No GPU execution: CPU/SIMD only (CUDA detection ready; execution planned via scirs2 GPU features)
  • No JIT compilation: Eager and graph modes supported; JIT planned
  • No distributed execution: Single-device only

See TODO.md for the complete roadmap.

Contributing

When contributing:

  1. Follow SciRS2 integration policy strictly
  2. Add tests for all new features (maintain 100% pass rate)
  3. Use cargo clippy -- -D warnings (zero warnings policy)
  4. Format code with cargo fmt
  5. Keep files under 2000 lines (use SplitRS if needed)
  6. Update TODO.md with task status

License

Apache-2.0


Status: Stable (v0.1.0)
Last Updated: 2026-04-06
Tests: 669/669 passing (100%)
Part of: TensorLogic Ecosystem