SciRS2-backed executor (CPU/SIMD/GPU via features).
Version: 0.1.0 | Status: Production Ready
This crate provides a production-ready implementation of the TensorLogic execution traits using the SciRS2 scientific computing library.
§Core Features
§Execution Engine
- Forward pass: Tensor operations (einsum, element-wise, reductions)
- Backward pass: Automatic differentiation with stored intermediate values
- Gradient checking: Numeric verification for correctness
- Batch execution: Parallel processing support for multiple inputs
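The gradient-checking feature above rests on central finite differences: perturb each input by ±ε and compare the resulting slope against the analytical gradient. The sketch below is illustrative only (it is not the crate's `gradient_check` API, whose signatures are not shown here), but it demonstrates the underlying technique.

```rust
// Illustrative sketch of central-difference gradient checking; the function
// name and signature are assumptions for this example, not the crate's API.
fn numeric_grad(f: impl Fn(&[f64]) -> f64, x: &[f64], eps: f64) -> Vec<f64> {
    (0..x.len())
        .map(|i| {
            let mut plus = x.to_vec();
            let mut minus = x.to_vec();
            plus[i] += eps;
            minus[i] -= eps;
            // Central difference: (f(x+eps) - f(x-eps)) / (2*eps)
            (f(&plus) - f(&minus)) / (2.0 * eps)
        })
        .collect()
}

fn main() {
    // f(x) = x0^2 + 3*x1; the analytic gradient is [2*x0, 3].
    let f = |x: &[f64]| x[0] * x[0] + 3.0 * x[1];
    let g = numeric_grad(f, &[2.0, 5.0], 1e-5);
    assert!((g[0] - 4.0).abs() < 1e-6);
    assert!((g[1] - 3.0).abs() < 1e-6);
}
```

An autodiff backward pass is verified by running this check on each input and asserting the numeric and analytical gradients agree within a tolerance.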
§Performance
- Memory pooling: Efficient tensor allocation with shape-based reuse
- Operation fusion: Detection of fusable operation sequences and optimization opportunities
- SIMD support: Vectorized operations via feature flags
- Profiling: Detailed performance monitoring and tracing
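Shape-based reuse, as in the memory pooling bullet above, keys freed buffers by tensor shape so a later allocation with an identical shape can be served without touching the allocator. A minimal sketch of the idea (the type and method names here are hypothetical, not the crate's `memory_pool` API):

```rust
use std::collections::HashMap;

// Hypothetical shape-keyed pool: freed buffers are stored per shape and
// reused by the next acquire() with the same shape.
struct ShapePool {
    free: HashMap<Vec<usize>, Vec<Vec<f64>>>,
}

impl ShapePool {
    fn new() -> Self {
        Self { free: HashMap::new() }
    }

    fn acquire(&mut self, shape: &[usize]) -> Vec<f64> {
        if let Some(buf) = self.free.get_mut(shape).and_then(|v| v.pop()) {
            buf // reuse: no fresh allocation
        } else {
            vec![0.0; shape.iter().product()]
        }
    }

    fn release(&mut self, shape: Vec<usize>, mut buf: Vec<f64>) {
        buf.fill(0.0); // zero before pooling so reused buffers start clean
        self.free.entry(shape).or_default().push(buf);
    }
}

fn main() {
    let mut pool = ShapePool::new();
    let a = pool.acquire(&[2, 3]);
    assert_eq!(a.len(), 6);
    pool.release(vec![2, 3], a);
    let b = pool.acquire(&[2, 3]); // served from the pool, not reallocated
    assert_eq!(b.len(), 6);
}
```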
§Reliability
- Error handling: Comprehensive error types with detailed context
- Execution tracing: Multi-level debugging and operation tracking
- Numerical stability: Fallback mechanisms for NaN/Inf handling
- Shape validation: Runtime shape inference and verification
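The NaN/Inf fallback mentioned above amounts to scanning a tensor for non-finite values and replacing them with a safe default. The crate's `fallback::sanitize_tensor` is referenced by name only; the standalone version below is an assumed illustration of the technique, not its actual signature.

```rust
// Illustrative NaN/Inf sanitization: replace non-finite values and report
// how many were fixed. Signature is an assumption for this example.
fn sanitize(data: &mut [f64], replacement: f64) -> usize {
    let mut fixed = 0;
    for v in data.iter_mut() {
        if !v.is_finite() {
            *v = replacement;
            fixed += 1;
        }
    }
    fixed
}

fn main() {
    let mut t = [1.0, f64::NAN, f64::INFINITY, -2.0];
    let fixed = sanitize(&mut t, 0.0);
    assert_eq!(fixed, 2);
    assert!(t.iter().all(|v| v.is_finite()));
}
```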
§Testing
- 104 tests: Including unit, integration, and property-based tests
- Property tests: Mathematical properties verified with proptest
- Gradient tests: Numeric gradient checking for autodiff correctness
§Module Organization
- executor: Core Scirs2Exec implementation
- autodiff: Backward pass and gradient computation
- gradient_ops: Advanced gradient operations (STE, Gumbel-Softmax, soft quantifiers)
- error: Comprehensive error types and validation
- fallback: Numerical stability and NaN/Inf handling
- tracing: Execution debugging and performance tracking
- memory_pool: Efficient tensor allocation
- fusion: Operation fusion analysis
- gradient_check: Numeric gradient verification
- shape_inference: Runtime shape validation
- batch_executor: Parallel batch processing
- profiled_executor: Performance profiling wrapper
- capabilities: Runtime capability detection
- dependency_analyzer: Graph dependency analysis for parallel execution
- parallel_executor: Multi-threaded parallel execution using Rayon
- device: Device management (CPU/GPU selection)
- execution_mode: Execution mode abstractions (Eager/Graph/JIT)
- precision: Precision control (f32/f64/mixed)
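Several of these modules (attention's `stable_softmax`, scoring's `log_sum_exp`) rely on the max-subtraction trick for numerical stability: shifting logits by their maximum before exponentiating keeps `exp` from overflowing without changing the result. A standalone sketch of that trick (an assumed illustration, not the crate's implementation):

```rust
// Illustrative numerically stable softmax via max-subtraction: softmax(x)
// equals softmax(x - max(x)), but the shifted form never overflows exp().
fn stable_softmax(x: &[f64]) -> Vec<f64> {
    let m = x.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = x.iter().map(|v| (v - m).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

fn main() {
    // Logits this large would overflow a naive exp(); the shifted form stays finite.
    let p = stable_softmax(&[1000.0, 1001.0, 1002.0]);
    assert!((p.iter().sum::<f64>() - 1.0).abs() < 1e-12);
    assert!(p.iter().all(|v| v.is_finite()));
}
```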
Re-exports§
pub use activations::{elu, gelu, gelu_approx, gelu_scalar, hardsigmoid, hardswish, leaky_relu, log_softmax, mish, prelu, relu, relu6, relu_grad, relu_scalar, selu, sigmoid, sigmoid_grad, sigmoid_scalar, silu, softmax, softplus, softsign, swish, swish_scalar, tanh_activation, tanh_grad, ActivationBenchmark, ActivationError, ActivationType};
pub use attention::{attention_entropy, chunked_attention, scaled_dot_product_attention, stable_softmax, AttentionConfig, AttentionError, AttentionOutput, MultiHeadAttention};
pub use attention_grad::{attention_backward, multihead_attention_backward, softmax_backward, AttentionGradients, MultiHeadAttentionGrad};
pub use batch_executor::ParallelBatchExecutor;
pub use blocked_sparse::{blocked_sparse_add, blocked_sparse_dense_mm, blocked_sparse_mm, blocked_sparse_scale, BlockSparsityPattern, BlockSparsityStats, BlockedSparseDynTensor, BlockedSparseError, BlockedSparseTensor};
pub use checkpoint::{Checkpoint, CheckpointConfig, CheckpointManager, CheckpointMetadata};
pub use comparison::{abs_diff, assert_tensors_close, compare_tensors, count_non_finite, is_finite, ComparisonError, ComparisonResult, Tolerance};
pub use convolution::{col2im, conv1d, conv2d, conv_transpose2d, depthwise_conv2d, im2col, ConvConfig, ConvError, ConvStats};
pub use cuda_detect::{cuda_device_count, cuda_devices_to_device_list, detect_cuda_devices, is_cuda_available, CudaDeviceInfo};
pub use custom_ops::{BinaryCustomOp, CustomOp, CustomOpContext, EluOp, GeluOp, HardSigmoidOp, HardSwishOp, LeakyReluOp, MishOp, OpRegistry, SoftplusOp, SwishOp};
pub use decomposition::{cp_als, fold, hosvd, truncated_svd, tucker1, unfold, CpDecomposition, DecompositionError, HosvdResult, TruncatedSvd, Tucker1Result};
pub use dependency_analyzer::{DependencyAnalysis, DependencyStats, OperationDependency};
pub use device::{Device, DeviceError, DeviceType, SystemDeviceManager};
pub use device_manager::{DeviceConfig, DeviceManager, DeviceSelector, HeuristicSelector, OpDescriptor, OpKind};
pub use error::{NumericalError, NumericalErrorKind, ShapeMismatchError, TlBackendError, TlBackendResult};
pub use execution_mode::{CompilationStats, CompiledGraph, ExecutionConfig, ExecutionMode, MemoryPlan, OptimizationConfig};
pub use executor_f32::{Scirs2Exec32, Scirs2Tensor32};
pub use fallback::{is_valid, sanitize_tensor, FallbackConfig};
pub use gather_scatter::{gather, gather_nd, masked_fill, masked_select, scatter_add, scatter_max, scatter_min, top_k, GatherScatterError, IndexStats};
pub use geometric_ops::{gcn_layer, graph_laplacian, mat_mul, sph_harm, spherical_harmonics, AdjacencyMatrix, GcnActivation, GeoError, LaplacianMatrix, LaplacianType, Rotation3};
pub use gpu_readiness::{assess_gpu_readiness, generate_recommendations, recommend_batch_size, GpuCapability, GpuReadinessReport, WorkloadProfile};
pub use gradient_ops::{gumbel_softmax, gumbel_softmax_backward, soft_exists, soft_exists_backward, soft_forall, soft_forall_backward, ste_threshold, ste_threshold_backward, GumbelSoftmaxConfig, QuantifierMode, SteConfig};
pub use graph_optimizer::{GraphOptimizer, GraphOptimizerBuilder, OptimizationPass, OptimizationStats};
pub use inplace_ops::{can_execute_inplace, is_shape_preserving, InplaceExecutor, InplaceStats};
pub use lazy::{EvaluationPlan, LazyEinsumGraph, LazyExecutor, LazyStats, LazyTensor, NodeMemoryEstimate};
pub use memory_profiler::{AllocationRecord, AtomicMemoryCounter, MemoryProfiler, MemoryStats as ProfilerMemoryStats};
pub use metrics::{format_bytes, AtomicMetrics, MemoryStats, MetricsCollector, MetricsConfig, MetricsSummary, OperationRecord, OperationStats, ThroughputStats};
pub use parallel_executor::{ParallelConfig, ParallelScirs2Exec, ParallelStats};
pub use pooling::{adaptive_avg_pool, avg_pool, global_avg_pool, global_max_pool, lp_pool, max_pool, max_pool_with_indices, max_unpool, PoolConfig, PoolingError, PoolingStats};
pub use precision::{ComputePrecision, Precision, PrecisionConfig, Scalar};
pub use precision_cast::{cast_f32_to_f64, cast_f64_to_f32, DualPrecisionBridge};
pub use profiled_executor::ProfiledScirs2Exec;
pub use quantization::{calibrate_quantization, QatConfig, QuantizationGranularity, QuantizationParams, QuantizationScheme, QuantizationStats, QuantizationType, QuantizedTensor};
pub use recurrent::{gru_sequence, lstm_sequence, rnn_sequence, GruCell, LstmCell, LstmState, RecurrentError, RecurrentStats, RnnCell};
pub use scoring::{log_sum_exp, weighted_soft_exists, weighted_soft_forall, LogSpaceAggregator, ScoringConfig, ScoringError, ScoringMode, WeightedQuantifier};
pub use shape_inference::{validate_tensor_shapes, Scirs2ShapeInference};
pub use signal_ops::{apply_window, dct, dft, fir_filter, hz_to_mel, idct, idft, istft, mel_filterbank, mel_to_hz, stft, window, Complex, FirFilter, SignalError, StftResult, WindowType};
pub use tensor_io::{load_tensor, load_tensors, read_header, read_tensor, save_tensor, save_tensors, write_tensor, TensorHeader, TensorIoError};
pub use tensor_loss::{LossReduction, TensorBCELoss, TensorCosineEmbeddingLoss, TensorCrossEntropyLoss, TensorFocalLoss, TensorHuberLoss, TensorKLDivLoss, TensorLoss, TensorLossConfig, TensorLossError, TensorLossOutput, TensorLossRegistry, TensorMseLoss};
pub use tracing::{ExecutionTracer, TraceEvent, TraceLevel};
Modules§
- activations - Activation functions for neural network layers.
- attention - Numerically stable attention operations for TensorLogic.
- attention_grad - Backward pass for attention operations.
- batch_executor - Batch execution support for parallel processing.
- blocked_sparse - Blocked Sparse Row (BSR) format tensor operations.
- capabilities - Backend capability detection and reporting.
- checkpoint - Checkpoint and resume functionality for training workflows.
- comparison - Tensor comparison utilities for testing and validation.
- convolution - Convolution operations for neural network tensor processing.
- cuda_detect - CUDA device detection utilities.
- custom_ops - Custom operations infrastructure with dynamic registration.
- decomposition - Tensor decomposition algorithms for the SciRS2 backend.
- dependency_analyzer - Dependency analysis for parallel execution of EinsumGraph operations.
- device - Device management for tensor computations.
- device_manager - Operation-level device selection and management.
- error - Comprehensive error types for tensorlogic-scirs-backend.
- execution_mode - Execution mode abstractions for different execution strategies.
- executor_f32 - SciRS2 f32 executor implementation.
- fallback - Fallback mechanisms for numerical stability.
- fusion - Operation fusion for improved performance.
- gather_scatter - Gather/scatter operations for tensor indexing and selection.
- geometric_ops - Geometric deep learning operations.
- gpu_readiness - GPU readiness framework.
- gradient_check - Numeric gradient checking utilities for verifying analytical gradients.
- gradient_ops - Advanced gradient operations for non-differentiable logical operations.
- graph_optimizer - Graph optimization passes for improved execution performance.
- inplace_ops - In-place operations for memory optimization.
- lazy - Lazy evaluation for large EinsumGraphs.
- memory_pool - Memory pooling for efficient tensor allocation.
- memory_profiler - Memory profiling utilities for TensorLogic.
- metrics - Comprehensive performance monitoring and metrics collection.
- parallel_executor - Parallel executor implementation using Rayon for multi-threaded execution.
- pooling - Pooling operations for neural network tensor processing.
- precision - Precision control for tensor computations.
- precision_cast - Utilities for casting between f32 and f64 tensors, and a dual-precision bridge.
- profiled_executor - Performance profiling support for execution monitoring.
- quantization - Quantization infrastructure for TensorLogic.
- recurrent - Recurrent neural network cells: RNN, LSTM, GRU.
- scoring - Log-space scoring aggregation and weighted quantifiers.
- shape_inference - Shape inference and validation support.
- signal_ops - Signal processing operations for audio and time-series data.
- tensor_io - Tensor binary serialization and deserialization.
- tensor_loss - Tensor-level loss functions operating on ArrayD<f64> with optional gradient output.
- tracing - Execution tracing and debugging support.
Structs§
- ForwardTape - Stores intermediate values from the forward pass for gradient computation
- Scirs2Exec