Crate tensorlogic_infer

Engine-agnostic traits and execution planning API.

Version: 0.1.0-beta.1 | Status: Production Ready

This crate defines the abstract execution interfaces and optimization utilities for TensorLogic:

§Core Execution Traits

  • TlExecutor: Core tensor operations (einsum, element-wise, reductions)
  • TlAutodiff: Forward/backward pass for automatic differentiation
  • TlEnhancedAutodiff: Enhanced autodiff with gradient accumulation, clipping, scaling
  • TlBatchExecutor: Batch execution support
  • TlStreamingExecutor: Streaming execution for large datasets
  • TlRecoverableExecutor: Execution with error recovery and checkpointing
  • TlCapabilities: Backend capability queries
  • TlProfiledExecutor: Execution profiling
  • TlJitExecutor: Just-In-Time compilation support
  • TlDistributedExecutor: Distributed multi-device execution
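
As a rough illustration of the pattern these traits follow, here is a self-contained sketch of an engine-agnostic executor trait with a trivial CPU backend. The `Tensor`, `Executor`, and `CpuExecutor` names and signatures are illustrative stand-ins, not the crate's actual `TlExecutor` API.

```rust
// Illustrative only: the real `TlExecutor` trait defines its own associated
// types and method set. This sketch just shows the overall shape of an
// engine-agnostic executor with a trivial CPU backend behind it.
struct Tensor {
    shape: Vec<usize>,
    data: Vec<f32>,
}

trait Executor {
    type Error;

    /// Element-wise addition of two tensors with identical shapes.
    fn elem_add(&self, a: &Tensor, b: &Tensor) -> Result<Tensor, Self::Error>;

    /// Sum-reduction over all elements.
    fn reduce_sum(&self, a: &Tensor) -> Result<f32, Self::Error>;
}

struct CpuExecutor;

impl Executor for CpuExecutor {
    type Error = String;

    fn elem_add(&self, a: &Tensor, b: &Tensor) -> Result<Tensor, Self::Error> {
        if a.shape != b.shape {
            return Err(format!("shape mismatch: {:?} vs {:?}", a.shape, b.shape));
        }
        let data = a.data.iter().zip(&b.data).map(|(x, y)| x + y).collect();
        Ok(Tensor { shape: a.shape.clone(), data })
    }

    fn reduce_sum(&self, a: &Tensor) -> Result<f32, Self::Error> {
        Ok(a.data.iter().sum())
    }
}

fn main() {
    let exec = CpuExecutor;
    let a = Tensor { shape: vec![2], data: vec![1.0, 2.0] };
    let b = Tensor { shape: vec![2], data: vec![3.0, 4.0] };
    let c = exec.elem_add(&a, &b).expect("shapes match");
    assert_eq!(exec.reduce_sum(&c).unwrap(), 10.0);
}
```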

§Optimization Utilities

  • GraphOptimizer: Fusion detection, dead node elimination, redundancy analysis
  • FusionPlanner: Planning and validation of fusion transformations
  • Scheduler: Execution scheduling with multiple strategies (sequential, parallel, cost-based)
  • PlacementOptimizer: Device placement and multi-device coordination
  • TensorCache: Result caching with LRU/FIFO/LFU eviction policies
  • MemoryPool: Tensor memory pooling for allocation reuse
  • ExecutionStrategy: Complete strategy configuration (mode, precision, memory, parallelism)
  • ExecutionContext: State management and lifecycle tracking with hooks
  • GraphCompiler: Ahead-of-time graph compilation with optimization passes
  • CompilationCache: Caching of compiled graphs to avoid recompilation
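
To make the caching side concrete, the following is a minimal LRU cache in the spirit of `TensorCache`'s LRU eviction policy; the key type, value type, and method names are assumptions for illustration only.

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical stand-in for a result cache with LRU eviction. The real
// `TensorCache` has its own key type, value type, and configuration.
struct LruCache<V> {
    capacity: usize,
    map: HashMap<u64, V>,
    order: VecDeque<u64>, // front = least recently used
}

impl<V> LruCache<V> {
    fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    fn get(&mut self, key: u64) -> Option<&V> {
        if self.map.contains_key(&key) {
            // Touch the key: move it to the most-recently-used position.
            self.order.retain(|k| *k != key);
            self.order.push_back(key);
        }
        self.map.get(&key)
    }

    fn put(&mut self, key: u64, value: V) {
        if self.map.len() >= self.capacity && !self.map.contains_key(&key) {
            if let Some(evicted) = self.order.pop_front() {
                self.map.remove(&evicted);
            }
        }
        self.order.retain(|k| *k != key);
        self.order.push_back(key);
        self.map.insert(key, value);
    }
}

fn main() {
    let mut cache: LruCache<&str> = LruCache::new(2);
    cache.put(1, "a");
    cache.put(2, "b");
    let _ = cache.get(1); // key 1 becomes most recently used
    cache.put(3, "c");    // evicts key 2, the least recently used
    assert!(cache.get(2).is_none());
    assert!(cache.get(1).is_some());
}
```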

§JIT Compilation

  • JitCompiler: Runtime compilation with hot path detection
  • JitCache: Specialized caching for JIT-compiled graphs
  • HotPathDetector: Identifies frequently executed code paths
  • AdaptiveOptimizer: Progressively optimizes based on runtime profiling
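
A minimal sketch of hot-path detection, the idea behind `HotPathDetector`: count executions per graph key and flag the path once a threshold is crossed. The counter below is illustrative; the real detector tracks richer statistics.

```rust
use std::collections::HashMap;

// Count executions per graph key and report when a path becomes "hot".
// The `HotPathCounter` name and threshold semantics are illustrative.
struct HotPathCounter {
    threshold: u64,
    counts: HashMap<u64, u64>,
}

impl HotPathCounter {
    fn new(threshold: u64) -> Self {
        Self { threshold, counts: HashMap::new() }
    }

    /// Records one execution of `graph_key`; returns true exactly once,
    /// when the path crosses the threshold and is worth JIT-compiling.
    fn record(&mut self, graph_key: u64) -> bool {
        let count = self.counts.entry(graph_key).or_insert(0);
        *count += 1;
        *count == self.threshold
    }
}

fn main() {
    let mut detector = HotPathCounter::new(3);
    let hot: Vec<bool> = (0..5).map(|_| detector.record(42)).collect();
    assert_eq!(hot, vec![false, false, true, false, false]);
}
```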

§Distributed Execution

  • DistributedExecutor: Multi-device execution coordination
  • DataParallelCoordinator: Data-parallel training across devices
  • ModelParallelCoordinator: Model-parallel execution with tensor sharding
  • PipelineParallelCoordinator: Pipeline parallelism across stages
  • CommunicationBackend: Abstract interface for device communication

§Zero-Copy Operations (Beta.1) 🆕

  • TensorView: Zero-copy tensor views and slicing
  • SliceSpec: Flexible slicing specifications
  • ViewBuilder: Ergonomic view construction
  • TensorViewable: Trait for zero-copy tensor operations
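
The following sketch shows what "zero-copy" means in practice: a view borrows an existing buffer and addresses a sub-region through an offset and strides, without copying data. The `View` type is a hypothetical simplification of `TensorView` and `SliceSpec`.

```rust
// A zero-copy view: borrow an existing row-major buffer and address a
// sub-region through an offset and strides, without copying any data.
// `View` is a hypothetical simplification of `TensorView` + `SliceSpec`.
struct View<'a> {
    data: &'a [f32],
    offset: usize,
    shape: [usize; 2],
    strides: [usize; 2],
}

impl View<'_> {
    fn get(&self, row: usize, col: usize) -> f32 {
        assert!(row < self.shape[0] && col < self.shape[1], "index out of bounds");
        self.data[self.offset + row * self.strides[0] + col * self.strides[1]]
    }
}

fn main() {
    // A 3x4 matrix stored contiguously in row-major order.
    let buf: Vec<f32> = (0..12).map(|i| i as f32).collect();
    // View of rows 1..3, columns 1..3 -- no element is copied.
    let view = View { data: &buf, offset: 4 + 1, shape: [2, 2], strides: [4, 1] };
    assert_eq!(view.get(0, 0), 5.0);
    assert_eq!(view.get(1, 1), 10.0);
}
```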

§Async Execution (Beta.1) 🆕

  • TlAsyncExecutor: Async/await-based non-blocking execution
  • TlAsyncBatchExecutor: Asynchronous batch processing
  • TlAsyncStreamExecutor: Async streaming with backpressure
  • AsyncExecutorPool: Load-balanced executor pool

§Enhanced Diagnostics (Beta.1) 🆕

  • Diagnostic: Rich error messages with suggestions
  • DiagnosticCollector: Error aggregation and reporting
  • ShapeMismatchDiagnostic: Helpful shape error messages
  • PerformanceDiagnostic: Performance issue detection
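
A sketch of the kind of message a shape-mismatch diagnostic can produce, including a fix suggestion when element counts happen to match. The struct and heuristic below are illustrative, not the crate's `ShapeMismatchDiagnostic`.

```rust
use std::fmt;

// A hypothetical shape-mismatch diagnostic: the point is that the error
// carries enough context to suggest a concrete fix, not just "shapes differ".
struct ShapeMismatch {
    op: String,
    expected: Vec<usize>,
    actual: Vec<usize>,
}

impl fmt::Display for ShapeMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(
            f,
            "shape mismatch in `{}`: expected {:?}, got {:?}",
            self.op, self.expected, self.actual
        )?;
        if self.expected.iter().product::<usize>() == self.actual.iter().product::<usize>() {
            write!(f, "help: element counts match; a reshape before `{}` may fix this", self.op)
        } else {
            write!(f, "help: check the einsum spec or the input batch dimensions")
        }
    }
}

fn main() {
    let diag = ShapeMismatch {
        op: "matmul".to_string(),
        expected: vec![4, 8],
        actual: vec![8, 4],
    };
    println!("{diag}");
}
```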

§Analysis and Validation

  • GraphValidator: Graph validation and diagnostics
  • MemoryEstimator: Memory usage estimation and lifetime analysis
  • ShapeInferenceContext: Tensor shape inference for optimization

§Debugging Utilities

  • ExecutionTracer: Record execution flow through computation graphs
  • TensorInspector: Examine intermediate tensor values and statistics
  • BreakpointManager: Pause execution at specific nodes for inspection
  • ExecutionRecorder: Record full execution history for replay and analysis

§Visualization Utilities

  • TimelineVisualizer: ASCII/DOT/JSON timeline visualization
  • GraphVisualizer: Computation graph visualization
  • TensorStatsVisualizer: Tensor statistics and histograms
  • ExportFormat: Export to various formats for external tools

§Testing and Development

  • DummyExecutor: Minimal implementation for testing and prototyping
  • DummyTensor: Simple tensor representation for tests
  • Backend Tests: Comprehensive test templates for backend validation
  • Gradient Checking: Numerical gradient verification utilities
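
Gradient checking boils down to comparing an analytic derivative against a central finite difference. The one-dimensional sketch below illustrates the idea; the crate's `gradcheck` utilities generalize it to full tensors with configurable tolerances.

```rust
// Compare an analytic derivative against a central finite difference.
// A one-dimensional simplification; the real gradient-checking utilities
// work on whole tensors with configurable tolerances.
fn numerical_gradient(f: impl Fn(f64) -> f64, x: f64, eps: f64) -> f64 {
    (f(x + eps) - f(x - eps)) / (2.0 * eps)
}

fn main() {
    let f = |x: f64| x * x * x;         // f(x) = x^3
    let analytic = 3.0 * 1.5_f64 * 1.5; // f'(1.5) = 3 * 1.5^2 = 6.75

    let numeric = numerical_gradient(f, 1.5, 1e-5);
    let rel_err = ((numeric - analytic) / analytic).abs();
    assert!(rel_err < 1e-6, "gradient check failed: rel_err = {rel_err}");
}
```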

§Eager Execution

  • TlEagerAutodiff: Eager mode automatic differentiation
  • Variable: Variables with gradient tracking
  • EagerTape: Dynamic computation graph recording
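
Eager-mode autodiff records operations on a tape as they execute, then replays the tape backwards to accumulate gradients. The scalar tape below is a deliberately tiny illustration of that idea; `EagerTape` and `Variable` operate on tensors and expose a different API.

```rust
// A tiny scalar tape: record each operation as it runs, then walk the
// recorded graph in reverse to accumulate gradients. Illustrative only.
#[derive(Clone, Copy)]
enum Op {
    Leaf,
    Add(usize, usize),
    Mul(usize, usize),
}

struct Tape {
    values: Vec<f64>,
    ops: Vec<Op>,
}

impl Tape {
    fn new() -> Self {
        Self { values: Vec::new(), ops: Vec::new() }
    }

    fn leaf(&mut self, v: f64) -> usize {
        self.values.push(v);
        self.ops.push(Op::Leaf);
        self.values.len() - 1
    }

    fn add(&mut self, a: usize, b: usize) -> usize {
        self.values.push(self.values[a] + self.values[b]);
        self.ops.push(Op::Add(a, b));
        self.values.len() - 1
    }

    fn mul(&mut self, a: usize, b: usize) -> usize {
        self.values.push(self.values[a] * self.values[b]);
        self.ops.push(Op::Mul(a, b));
        self.values.len() - 1
    }

    /// Reverse pass: returns d(output)/d(node) for every node on the tape.
    fn backward(&self, output: usize) -> Vec<f64> {
        let mut grads = vec![0.0; self.values.len()];
        grads[output] = 1.0;
        for i in (0..self.ops.len()).rev() {
            match self.ops[i] {
                Op::Leaf => {}
                Op::Add(a, b) => {
                    grads[a] += grads[i];
                    grads[b] += grads[i];
                }
                Op::Mul(a, b) => {
                    grads[a] += grads[i] * self.values[b];
                    grads[b] += grads[i] * self.values[a];
                }
            }
        }
        grads
    }
}

fn main() {
    // z = (x + y) * x  with x = 2, y = 3  =>  dz/dx = 2x + y = 7, dz/dy = x = 2
    let mut tape = Tape::new();
    let x = tape.leaf(2.0);
    let y = tape.leaf(3.0);
    let s = tape.add(x, y);
    let z = tape.mul(s, x);
    let grads = tape.backward(z);
    assert_eq!((grads[x], grads[y]), (7.0, 2.0));
}
```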

§Advanced Quantization (Beta.1) 🆕

  • Quantizer: Complete quantization pipeline (QAT/PTQ)
  • QuantizationType: INT8, INT4, INT2, FP8, Binary, Ternary support
  • CalibrationStrategy: Multiple calibration methods (MinMax, Percentile, MSE, KL-divergence)
  • FakeQuantize: Quantization simulation for training
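
The arithmetic behind min-max (asymmetric) INT8 post-training quantization fits in a few lines, sketched below; `Quantizer` layers calibration strategies, granularity options, and QAT on top of this.

```rust
// Min-max (asymmetric) INT8 quantization of a single buffer: a sketch of
// the arithmetic behind post-training quantization, not the crate's API.
fn quantize_int8(data: &[f32]) -> (Vec<i8>, f32, i32) {
    // Include 0.0 in the range so zero is exactly representable.
    let min = data.iter().cloned().fold(f32::INFINITY, f32::min).min(0.0);
    let max = data.iter().cloned().fold(f32::NEG_INFINITY, f32::max).max(0.0);
    let scale = (max - min) / 255.0;
    let zero_point = (-128.0 - min / scale).round() as i32;
    let q = data
        .iter()
        .map(|&x| ((x / scale).round() as i32 + zero_point).clamp(-128, 127) as i8)
        .collect();
    (q, scale, zero_point)
}

fn dequantize_int8(q: &[i8], scale: f32, zero_point: i32) -> Vec<f32> {
    q.iter().map(|&v| (v as i32 - zero_point) as f32 * scale).collect()
}

fn main() {
    let x = [0.0_f32, 0.1, -0.5, 1.0, 2.55];
    let (q, scale, zp) = quantize_int8(&x);
    let x_hat = dequantize_int8(&q, scale, zp);
    // Round-trip error stays within half a quantization step here.
    for (a, b) in x.iter().zip(&x_hat) {
        assert!((a - b).abs() <= scale / 2.0 + 1e-6);
    }
}
```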

§Dynamic Batching (Beta.1) 🆕

  • DynamicBatcher: Adaptive request batching with priority queues
  • RequestQueue: Priority-based queuing (Low/Normal/High/Critical)
  • AdaptiveBatcher: Automatic batch size optimization
  • BatchingStats: Comprehensive throughput and latency metrics
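
Priority-based batching can be pictured as a max-heap of pending requests drained in fixed-size batches, as in the sketch below; the field names and drain policy are illustrative, not `DynamicBatcher`'s actual behaviour.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Requests accumulate in a max-heap keyed by priority (then FIFO by id)
// and are drained in batches of a fixed maximum size. Illustrative only.
#[derive(PartialEq, Eq, PartialOrd, Ord, Clone, Copy)]
enum Priority { Low, Normal, High, Critical }

struct Request { priority: Priority, id: u64 }

struct Batcher {
    max_batch: usize,
    queue: BinaryHeap<(Priority, Reverse<u64>)>, // higher priority first, then lower id
}

impl Batcher {
    fn new(max_batch: usize) -> Self {
        Self { max_batch, queue: BinaryHeap::new() }
    }

    fn submit(&mut self, req: Request) {
        self.queue.push((req.priority, Reverse(req.id)));
    }

    /// Drains up to `max_batch` requests, highest priority first.
    fn next_batch(&mut self) -> Vec<u64> {
        let mut batch = Vec::new();
        while batch.len() < self.max_batch {
            match self.queue.pop() {
                Some((_, Reverse(id))) => batch.push(id),
                None => break,
            }
        }
        batch
    }
}

fn main() {
    let mut batcher = Batcher::new(2);
    batcher.submit(Request { priority: Priority::Low, id: 1 });
    batcher.submit(Request { priority: Priority::Critical, id: 2 });
    batcher.submit(Request { priority: Priority::Normal, id: 3 });
    assert_eq!(batcher.next_batch(), vec![2, 3]); // Critical, then Normal
    assert_eq!(batcher.next_batch(), vec![1]);
}
```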

§Advanced Kernel Fusion (Beta.1) 🆕

  • FusionOptimizer: Pattern-based fusion detection and optimization
  • FusionStrategy: Conservative/Aggressive/Balanced/Memory-aware modes
  • FusionCostModel: Memory bandwidth-aware cost modeling
  • FusionPattern: Common patterns (MatMul+Bias, MatMul+Activation, etc.)

§Workspace Management (Beta.1) 🆕

  • WorkspacePool: Memory pool with multiple allocation strategies
  • SharedWorkspacePool: Thread-safe workspace sharing
  • AllocationStrategy: BestFit/FirstFit/ExactFit/PowerOfTwo
  • WorkspaceStats: Efficiency metrics and hit rate tracking
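
Best-fit allocation from a workspace pool means reusing the smallest free buffer that still covers the request and allocating only on a miss. The sketch below illustrates that policy; it is not the crate's `WorkspacePool` implementation.

```rust
// Best-fit reuse of scratch buffers: pick the smallest free buffer that
// fits the request, fall back to a fresh allocation on a miss.
struct Pool {
    free: Vec<Vec<u8>>, // returned buffers, available for reuse
    hits: u64,
    misses: u64,
}

impl Pool {
    fn new() -> Self {
        Self { free: Vec::new(), hits: 0, misses: 0 }
    }

    fn acquire(&mut self, size: usize) -> Vec<u8> {
        // Best fit: smallest free buffer whose capacity covers `size`.
        let best = self
            .free
            .iter()
            .enumerate()
            .filter(|(_, b)| b.capacity() >= size)
            .min_by_key(|(_, b)| b.capacity())
            .map(|(i, _)| i);
        match best {
            Some(i) => {
                self.hits += 1;
                let mut buf = self.free.swap_remove(i);
                buf.clear();
                buf.resize(size, 0);
                buf
            }
            None => {
                self.misses += 1;
                vec![0; size]
            }
        }
    }

    fn release(&mut self, buf: Vec<u8>) {
        self.free.push(buf);
    }
}

fn main() {
    let mut pool = Pool::new();
    let a = pool.acquire(1024);
    pool.release(a);
    let _b = pool.acquire(512); // reuses the 1024-byte buffer (best fit)
    assert_eq!((pool.hits, pool.misses), (1, 1));
}
```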

§Multi-Model Coordination (Beta.1) 🆕

  • MultiModelCoordinator: Ensemble and multi-model management
  • EnsembleStrategy: Averaging/Voting/Stacking/Boosting
  • RoutingStrategy: Priority/Latency/Accuracy-based model selection
  • CascadeConfig: Early-exit model cascades

§Mixed Precision Training (Beta.1) 🆕

  • MixedPrecisionConfig: FP16/BF16/FP8 configuration
  • LossScaler: Automatic loss scaling with dynamic adjustment
  • PrecisionMode: Multiple precision modes (FP32/FP16/BF16/FP8/FP64)
  • GradientCheckpoint: Memory-efficient gradient checkpointing
  • MixedPrecisionState: Complete training state management
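
Dynamic loss scaling multiplies the loss by a large factor so small FP16 gradients do not underflow, backs the factor off on overflow, and grows it again after a run of stable steps. The constants below follow common convention and are not necessarily this crate's `LossScaler` defaults.

```rust
// Dynamic loss scaling: scale the loss up, halve the scale on gradient
// overflow, and grow it after a stretch of stable steps. The constants
// are a common convention, not necessarily this crate's defaults.
struct DynamicLossScaler {
    scale: f32,
    growth_factor: f32,
    backoff_factor: f32,
    growth_interval: u32,
    stable_steps: u32,
}

impl DynamicLossScaler {
    fn new() -> Self {
        Self {
            scale: 65536.0,
            growth_factor: 2.0,
            backoff_factor: 0.5,
            growth_interval: 2000,
            stable_steps: 0,
        }
    }

    fn scale_loss(&self, loss: f32) -> f32 {
        loss * self.scale
    }

    /// Call after the backward pass; skip the optimizer step when it returns false.
    fn update(&mut self, grads_have_overflow: bool) -> bool {
        if grads_have_overflow {
            self.scale *= self.backoff_factor;
            self.stable_steps = 0;
            false
        } else {
            self.stable_steps += 1;
            if self.stable_steps >= self.growth_interval {
                self.scale *= self.growth_factor;
                self.stable_steps = 0;
            }
            true
        }
    }
}

fn main() {
    let mut scaler = DynamicLossScaler::new();
    assert_eq!(scaler.scale_loss(0.0123), 0.0123 * 65536.0);
    assert!(!scaler.update(true));  // overflow: skip the step, halve the scale
    assert_eq!(scaler.scale, 32768.0);
    assert!(scaler.update(false));  // stable step: take the optimizer step
}
```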

§Sparse Tensor Support (Beta.1) 🆕

  • SparseTensor: CSR/CSC/COO sparse formats
  • SparseCSR: Compressed Sparse Row format
  • SparseCSC: Compressed Sparse Column format
  • SparseCOO: Coordinate format for construction
  • Automatic sparsity detection: Convert dense to sparse when beneficial
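
To make the CSR layout concrete, the sketch below stores a small matrix in compressed sparse row form and multiplies it by a dense vector; `SparseCSR` adds conversion helpers and additional metadata on top of this.

```rust
// Compressed Sparse Row (CSR) storage and a sparse matrix-vector product.
// A minimal sketch of the format, not the crate's `SparseCSR` type.
struct Csr {
    n_rows: usize,
    row_ptr: Vec<usize>, // length n_rows + 1
    col_idx: Vec<usize>,
    values: Vec<f64>,
}

impl Csr {
    fn matvec(&self, x: &[f64]) -> Vec<f64> {
        let mut y = vec![0.0; self.n_rows];
        for row in 0..self.n_rows {
            for k in self.row_ptr[row]..self.row_ptr[row + 1] {
                y[row] += self.values[k] * x[self.col_idx[k]];
            }
        }
        y
    }
}

fn main() {
    // Dense equivalent:
    // [ 1 0 2 ]
    // [ 0 0 3 ]
    // [ 4 5 0 ]
    let a = Csr {
        n_rows: 3,
        row_ptr: vec![0, 2, 3, 5],
        col_idx: vec![0, 2, 2, 0, 1],
        values: vec![1.0, 2.0, 3.0, 4.0, 5.0],
    };
    assert_eq!(a.matvec(&[1.0, 1.0, 1.0]), vec![3.0, 3.0, 9.0]);
}
```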

§Parallel Execution (Beta.1) 🆕

  • WorkStealingScheduler: Dynamic load balancing scheduler
  • Task: Parallel task with dependencies and priorities
  • StealStrategy: Multiple work-stealing strategies
  • NumaStrategy: NUMA-aware memory allocation
  • LoadBalanceStats: Load balancing metrics

§SIMD Optimizations (Beta.1) 🆕

  • SimdCapabilities: Platform detection (AVX2/AVX-512/NEON/SVE)
  • AlignedBuffer: SIMD-aligned memory allocations
  • SimdInstructionSet: Instruction set abstractions
  • SimdOptimizationHints: Compiler optimization hints

§Graph Rewriting (Beta.1) 🆕

  • RewriteEngine: Pattern-based graph transformations
  • Pattern: Flexible pattern matching DSL
  • RewriteRule: Custom rewrite rules
  • CommonRules: Standard optimization rules (constant folding, etc.)
  • RewriteStrategy: Application strategies (exhaustive, fixed-point, etc.)
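
Constant folding is the canonical rewrite rule: match an operation whose inputs are all constants and replace it with the precomputed result. The toy expression example below illustrates the idea; the crate's `RewriteEngine` operates on computation graphs through a pattern DSL rather than on this enum.

```rust
// Constant folding on a toy expression tree: fold `Add(Const, Const)` into
// a single `Const`, applied bottom-up over the expression. Illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Const(f64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

fn fold_constants(e: Expr) -> Expr {
    match e {
        Expr::Add(a, b) => {
            let (a, b) = (fold_constants(*a), fold_constants(*b));
            match (&a, &b) {
                (Expr::Const(x), Expr::Const(y)) => Expr::Const(x + y),
                _ => Expr::Add(Box::new(a), Box::new(b)),
            }
        }
        other => other,
    }
}

fn main() {
    // (1 + 2) + x  =>  3 + x
    let e = Expr::Add(
        Box::new(Expr::Add(Box::new(Expr::Const(1.0)), Box::new(Expr::Const(2.0)))),
        Box::new(Expr::Var("x".into())),
    );
    assert_eq!(
        fold_constants(e),
        Expr::Add(Box::new(Expr::Const(3.0)), Box::new(Expr::Var("x".into())))
    );
}
```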

§Profiling-Guided Optimization (Beta.1) 🆕

  • ProfilingOptimizer: Adaptive performance tuning
  • ExecutionProfile: Runtime performance metrics
  • Hotspot: Performance bottleneck detection
  • OptimizationGoal: Optimization objectives (latency, throughput, memory)
  • Auto-tuning: Automatic configuration selection

§Cache Optimization (Beta.1) 🆕

  • CacheOptimizer: Memory hierarchy aware optimization
  • CacheConfig: L1/L2/L3 cache configuration
  • TilingParams: Loop tiling for cache efficiency
  • CacheMetrics: Cache performance estimation
  • DataLayout: Cache-friendly data arrangements
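
Loop tiling processes data in blocks sized to fit in cache, so each block is loaded once rather than re-fetched on every pass. The tiled transpose below illustrates the transformation; `TilingParams` derives the tile size from the configured cache hierarchy, whereas the tile size here is hard-coded for illustration.

```rust
// Tiled matrix transpose: walking the matrix in small square blocks keeps
// both the source and destination regions resident in cache.
fn transpose_tiled(src: &[f64], dst: &mut [f64], n: usize, tile: usize) {
    assert_eq!(src.len(), n * n);
    assert_eq!(dst.len(), n * n);
    for bi in (0..n).step_by(tile) {
        for bj in (0..n).step_by(tile) {
            for i in bi..(bi + tile).min(n) {
                for j in bj..(bj + tile).min(n) {
                    dst[j * n + i] = src[i * n + j];
                }
            }
        }
    }
}

fn main() {
    let n = 5;
    let src: Vec<f64> = (0..n * n).map(|i| i as f64).collect();
    let mut dst = vec![0.0; n * n];
    transpose_tiled(&src, &mut dst, n, 2);
    assert_eq!(dst[2 * n + 3], src[3 * n + 2]); // dst[j][i] == src[i][j]
}
```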

§Automatic Parallelization (Experimental) 🧪

  • AutoParallelizer: Automatic detection of parallelism opportunities
  • ParallelizationAnalysis: Analysis of parallel execution potential
  • ParallelExecutionPlan: Generated parallel execution plans
  • WorkPartition: Work distribution across workers
  • Cost modeling: Estimate execution costs and communication overhead

§Speculative Execution (Experimental) 🧪

  • SpeculativeExecutor: Branch prediction and speculative execution
  • PredictionStrategy: Multiple prediction strategies
  • RollbackPolicy: Handling mispredictions
  • SpeculationStats: Track speculation success rates
  • Adaptive learning: Learn from prediction outcomes

§Learned Optimizations (Experimental) 🧪

  • LearnedOptimizer: ML-based optimization decisions
  • LearningStrategy: Supervised, reinforcement, online learning
  • CostPrediction: Learned cost models
  • FusionRecommendation: ML-based fusion decisions
  • Reinforcement learning: Q-learning for scheduling

Re-exports§

pub use auto_parallel::AutoParallelError;
pub use auto_parallel::AutoParallelizer;
pub use auto_parallel::CostModel as AutoParallelCostModel;
pub use auto_parallel::DependencyType;
pub use auto_parallel::NodeId as AutoParallelNodeId;
pub use auto_parallel::NodeInfo;
pub use auto_parallel::ParallelExecutionPlan;
pub use auto_parallel::ParallelStage;
pub use auto_parallel::ParallelizationAnalysis;
pub use auto_parallel::ParallelizationStrategy;
pub use auto_parallel::WorkPartition;
pub use autodiff::AccumulationConfig;
pub use autodiff::ClippingStrategy;
pub use autodiff::CustomGradientRegistry;
pub use autodiff::GradientAccumulationStrategy;
pub use autodiff::GradientAccumulator;
pub use autodiff::GradientClipper;
pub use autodiff::GradientConfig;
pub use autodiff::GradientScaler;
pub use autodiff::GradientScaling;
pub use autodiff::GradientStats;
pub use autodiff::TlEnhancedAutodiff;
pub use backend_tests::assert_vec_close;
pub use backend_tests::print_test_summary;
pub use backend_tests::run_all_basic_tests;
pub use backend_tests::run_all_performance_tests;
pub use backend_tests::test_backend_edge_cases;
pub use backend_tests::test_backend_einsum;
pub use backend_tests::test_backend_elem_binary;
pub use backend_tests::test_backend_elem_unary;
pub use backend_tests::test_backend_forward;
pub use backend_tests::test_backend_large_tensors;
pub use backend_tests::test_backend_memory_efficiency;
pub use backend_tests::test_backend_reduce;
pub use backend_tests::test_backend_shapes;
pub use backend_tests::BackendTestAdapter;
pub use backend_tests::TestResult;
pub use backend_tests::DEFAULT_TOLERANCE;
pub use batch::BatchResult;
pub use batch::TlBatchExecutor;
pub use cache::CacheKey;
pub use cache::CacheStats;
pub use cache::EvictionPolicy;
pub use cache::MemoryPool;
pub use cache::PoolStats;
pub use cache::TensorCache;
pub use cache_optimizer::AccessPattern;
pub use cache_optimizer::CacheConfig;
pub use cache_optimizer::CacheLevel;
pub use cache_optimizer::CacheMetrics;
pub use cache_optimizer::CacheOptimizer;
pub use cache_optimizer::CacheOptimizerError;
pub use cache_optimizer::DataLayout;
pub use cache_optimizer::OptimizationStats as CacheOptimizationStats;
pub use cache_optimizer::TilingParams;
pub use capabilities::BackendCapabilities;
pub use capabilities::DType;
pub use capabilities::DeviceType;
pub use capabilities::Feature;
pub use capabilities::TlCapabilities;
pub use compilation::CacheStats as CompilationCacheStats;
pub use compilation::CompilationCache;
pub use compilation::CompilationConfig;
pub use compilation::CompilationKey;
pub use compilation::CompilationStats;
pub use compilation::CompiledGraph;
pub use compilation::GraphCompiler;
pub use compilation::OptimizationLevel;
pub use compilation::TlCompilableExecutor;
pub use context::ExecutionContext;
pub use context::ExecutionHook;
pub use context::ExecutionPhase;
pub use context::ExecutionState;
pub use context::LoggingHook;
pub use debug::Breakpoint;
pub use debug::BreakpointHit;
pub use debug::BreakpointManager;
pub use debug::ExecutionRecorder;
pub use debug::ExecutionReport;
pub use debug::ExecutionTrace;
pub use debug::ExecutionTracer;
pub use debug::OperationHandle;
pub use debug::TensorInspector;
pub use debug::TensorStats;
pub use debug::TraceEntry as DebugTraceEntry;
pub use debug::TraceSummary;
pub use diagnostics::Diagnostic;
pub use diagnostics::DiagnosticCollector;
pub use diagnostics::MemoryDiagnostic;
pub use diagnostics::NodeExecutionDiagnostic;
pub use diagnostics::PerformanceDiagnostic;
pub use diagnostics::Severity;
pub use diagnostics::ShapeMismatchDiagnostic;
pub use diagnostics::SourceLocation;
pub use diagnostics::TypeMismatchDiagnostic;
pub use distributed::CommunicationBackend;
pub use distributed::CommunicationOp;
pub use distributed::DataParallelCoordinator;
pub use distributed::DistributedConfig;
pub use distributed::DistributedExecutor;
pub use distributed::DistributedPlacementPlan;
pub use distributed::DistributedStats;
pub use distributed::DummyCommunicationBackend;
pub use distributed::ModelParallelCoordinator;
pub use distributed::ParallelismStrategy as DistributedParallelismStrategy;
pub use distributed::PipelineParallelCoordinator;
pub use distributed::ReductionOp;
pub use distributed::ShardingSpec;
pub use distributed::TlDistributedExecutor;
pub use dynamic_batching::AdaptiveBatcher;
pub use dynamic_batching::BatchRequest;
pub use dynamic_batching::BatchingError;
pub use dynamic_batching::BatchingStats;
pub use dynamic_batching::DynamicBatchConfig;
pub use dynamic_batching::DynamicBatcher;
pub use dynamic_batching::Priority;
pub use dynamic_batching::RequestMetadata;
pub use dynamic_batching::RequestQueue;
pub use eager::EagerOp;
pub use eager::EagerOps;
pub use eager::EagerTape;
pub use eager::TlEagerAutodiff;
pub use eager::Variable;
pub use eager::VariableGrad;
pub use fusion::FusionCandidate;
pub use fusion::FusionConfig;
pub use fusion::FusionCostModel;
pub use fusion::FusionError;
pub use fusion::FusionOptimizer;
pub use fusion::FusionPattern;
pub use fusion::FusionStats;
pub use fusion::FusionStrategy;
pub use gradcheck::compare_gradients;
pub use gradcheck::numerical_gradient_central;
pub use gradcheck::numerical_gradient_forward;
pub use gradcheck::quick_check;
pub use gradcheck::GradCheckConfig;
pub use gradcheck::GradCheckResult;
pub use gradcheck::GradientChecker;
pub use gradcheck::GradientError;
pub use jit::AdaptiveOptimizationPlan;
pub use jit::AdaptiveOptimizer;
pub use jit::HotPathDetector;
pub use jit::JitCache;
pub use jit::JitCacheEntry;
pub use jit::JitCacheStats;
pub use jit::JitCompiler;
pub use jit::JitConfig;
pub use jit::JitEntryStats;
pub use jit::JitKey;
pub use jit::JitStats;
pub use jit::SpecializationContext;
pub use jit::TlJitExecutor;
pub use learned_opt::CostPrediction;
pub use learned_opt::FeatureVector;
pub use learned_opt::FusionRecommendation;
pub use learned_opt::LearnedOptError;
pub use learned_opt::LearnedOptimizer;
pub use learned_opt::LearningStats;
pub use learned_opt::LearningStrategy;
pub use learned_opt::ModelType;
pub use learned_opt::NodeId as LearnedOptNodeId;
pub use learned_opt::OptimizationAction;
pub use learned_opt::RewardSignal;
pub use learned_opt::ScheduleRecommendation;
pub use learned_opt::TrainingExample;
pub use memory::MemoryEstimate;
pub use memory::MemoryEstimator;
pub use memory::TensorMemory;
pub use mixed_precision::GradientCheckpoint;
pub use mixed_precision::LossScaler;
pub use mixed_precision::LossScalerStats;
pub use mixed_precision::LossScalingStrategy;
pub use mixed_precision::MixedPrecisionConfig;
pub use mixed_precision::MixedPrecisionError;
pub use mixed_precision::MixedPrecisionState;
pub use mixed_precision::MixedPrecisionStats;
pub use mixed_precision::PrecisionMode;
pub use multimodel::CascadeConfig;
pub use multimodel::CoordinationStats;
pub use multimodel::EnsembleConfig;
pub use multimodel::EnsembleStrategy;
pub use multimodel::ModelMetadata;
pub use multimodel::MultiModelCoordinator;
pub use multimodel::MultiModelError;
pub use multimodel::ResourceRequirements;
pub use multimodel::RoutingStrategy;
pub use multimodel::TlEnsembleExecutor;
pub use multimodel::TlModelRouter;
pub use optimization::FusionOpportunity;
pub use optimization::FusionPlanner;
pub use optimization::FusionType;
pub use optimization::GraphOptimizer;
pub use optimization::OptimizationResult;
pub use parallel::LoadBalanceStats;
pub use parallel::NumaNode;
pub use parallel::NumaStrategy;
pub use parallel::ParallelConfig;
pub use parallel::ParallelError;
pub use parallel::SchedulerStats;
pub use parallel::StealStrategy;
pub use parallel::Task;
pub use parallel::TaskId;
pub use parallel::TaskPriority;
pub use parallel::WorkStealingScheduler;
pub use perfregression::BenchmarkBaseline;
pub use perfregression::BenchmarkComparison;
pub use perfregression::BenchmarkConfig;
pub use perfregression::BenchmarkStats;
pub use perfregression::PerfRegression;
pub use perfregression::RegressionReport;
pub use placement::Device;
pub use placement::PlacementOptimizer;
pub use placement::PlacementPlan;
pub use placement::PlacementStrategy;
pub use profiling::Bottleneck;
pub use profiling::BottleneckAnalyzer;
pub use profiling::BottleneckReport;
pub use profiling::PerformanceBaseline;
pub use profiling::PerformanceComparison;
pub use profiling::ProfileData;
pub use profiling::ProfileStatistics;
pub use profiling::Profiler;
pub use profiling::ProfilerHook;
pub use profiling::TimelineProfiler;
pub use profiling::TlProfiledExecutor;
pub use profiling::TraceEntry;
pub use profiling_optimizer::ExecutionProfile;
pub use profiling_optimizer::Hotspot;
pub use profiling_optimizer::OptimizationGoal;
pub use profiling_optimizer::OptimizationReport;
pub use profiling_optimizer::OptimizationStrategy;
pub use profiling_optimizer::ProfilingOptimizer;
pub use profiling_optimizer::ProfilingOptimizerError;
pub use profiling_optimizer::TuningConfig;
pub use quantization::CalibrationStats;
pub use quantization::CalibrationStrategy;
pub use quantization::FakeQuantize;
pub use quantization::QuantizationConfig;
pub use quantization::QuantizationError;
pub use quantization::QuantizationGranularity;
pub use quantization::QuantizationMode;
pub use quantization::QuantizationParams;
pub use quantization::QuantizationSummary;
pub use quantization::QuantizationSymmetry;
pub use quantization::QuantizationType;
pub use quantization::Quantizer;
pub use recovery::Checkpoint;
pub use recovery::CheckpointManager;
pub use recovery::DegradationPolicy;
pub use recovery::FailureInfo;
pub use recovery::FallbackStrategy;
pub use recovery::RecoveryConfig;
pub use recovery::RecoveryMetadata;
pub use recovery::RecoveryResult;
pub use recovery::RecoveryStats;
pub use recovery::RecoveryStrategy;
pub use recovery::RetryPolicy;
pub use recovery::TlRecoverableExecutor;
pub use rewrite::CommonRules;
pub use rewrite::Match;
pub use rewrite::NodeId as RewriteNodeId;
pub use rewrite::Pattern;
pub use rewrite::ReplacementFn;
pub use rewrite::RewriteEngine;
pub use rewrite::RewriteError;
pub use rewrite::RewriteRule;
pub use rewrite::RewriteStats;
pub use rewrite::RewriteStrategy;
pub use scheduling::ExecutionSchedule;
pub use scheduling::NodeCost;
pub use scheduling::Scheduler;
pub use scheduling::SchedulingStrategy;
pub use shape::DimSize;
pub use shape::ShapeInferenceContext;
pub use shape::TensorShape;
pub use simd::AlignedBuffer;
pub use simd::CpuArchitecture;
pub use simd::SimdCapabilities;
pub use simd::SimdError;
pub use simd::SimdInstructionSet;
pub use simd::SimdOptimizationHints;
pub use sparse::detect_sparsity;
pub use sparse::to_sparse_if_beneficial;
pub use sparse::SparseCOO;
pub use sparse::SparseCSC;
pub use sparse::SparseCSR;
pub use sparse::SparseError;
pub use sparse::SparseFormat;
pub use sparse::SparseTensor;
pub use sparse::SparseTensorBuilder;
pub use speculative::BranchOutcome;
pub use speculative::NodeId as SpeculativeNodeId;
pub use speculative::PredictionStrategy;
pub use speculative::RollbackPolicy;
pub use speculative::SpeculationStats;
pub use speculative::SpeculativeError;
pub use speculative::SpeculativeExecutor;
pub use speculative::SpeculativeTask;
pub use strategy::ExecutionMode;
pub use strategy::ExecutionStrategy;
pub use strategy::GradientStrategy;
pub use strategy::MemoryStrategy;
pub use strategy::ParallelismStrategy;
pub use strategy::StrategyOptimizer;
pub use streaming::ChunkIterator;
pub use streaming::ChunkMetadata;
pub use streaming::StreamProcessor;
pub use streaming::StreamResult;
pub use streaming::StreamingConfig;
pub use streaming::StreamingMode;
pub use streaming::TlStreamingExecutor;
pub use tensor_view::InPlaceMode;
pub use tensor_view::InPlaceOps;
pub use tensor_view::SliceSpec;
pub use tensor_view::TensorView;
pub use tensor_view::TensorViewable;
pub use tensor_view::ViewBuilder;
pub use typesafe::BroadcastShape;
pub use typesafe::Dim;
pub use typesafe::DimMul;
pub use typesafe::DimOp;
pub use typesafe::DimSize as TypesafeDimSize;
pub use typesafe::Dyn;
pub use typesafe::EinsumSpec;
pub use typesafe::FixedShape;
pub use typesafe::Matrix;
pub use typesafe::MatrixOps;
pub use typesafe::Nat;
pub use typesafe::Scalar;
pub use typesafe::ShapeConstraint;
pub use typesafe::ShapedTensor;
pub use typesafe::Static;
pub use typesafe::Tensor3D;
pub use typesafe::Tensor4D;
pub use typesafe::TensorBuilder;
pub use typesafe::TypedBatch;
pub use typesafe::TypedInputs;
pub use typesafe::TypedOutputs;
pub use typesafe::TypedTensor;
pub use typesafe::TypedTensorOps;
pub use typesafe::Vector;
pub use typesafe::D1;
pub use typesafe::D2;
pub use typesafe::D3;
pub use typesafe::D4;
pub use typesafe::D5;
pub use typesafe::D6;
pub use typesafe::S;
pub use typesafe::Z;
pub use validation::GraphValidator;
pub use validation::ValidationResult;
pub use visualization::ExportFormat;
pub use visualization::GraphConfig;
pub use visualization::GraphVisualizer;
pub use visualization::TensorStatsVisualizer;
pub use visualization::TimelineConfig;
pub use visualization::TimelineVisualizer;
pub use visualization::VisualizationFormat;
pub use workspace::AllocationStrategy;
pub use workspace::DefragmentationResult;
pub use workspace::SharedWorkspacePool;
pub use workspace::Workspace;
pub use workspace::WorkspaceConfig;
pub use workspace::WorkspaceError;
pub use workspace::WorkspacePool;
pub use workspace::WorkspaceStats;

Modules§

async_exec
Asynchronous execution traits for concurrent tensor operations.
auto_parallel
Automatic parallelization for computation graphs.
autodiff
Autodiff enhancements for training and optimization.
backend_tests
Backend compatibility test templates.
batch
Batch execution support for processing multiple inputs efficiently.
cache
Tensor caching and memory pooling for efficient execution.
cache_optimizer
Memory hierarchy and cache-aware optimization.
capabilities
Backend capability queries and feature detection.
compilation
Graph compilation and caching infrastructure.
context
Execution context and state management for coordinated execution.
debug
Debugging utilities for execution tracing and tensor inspection.
diagnostics
Enhanced error diagnostics with helpful suggestions.
distributed
Distributed execution infrastructure for multi-device and multi-node computation.
dynamic_batching
Dynamic batching for inference serving.
eager
Eager mode automatic differentiation.
fusion
Advanced kernel fusion for optimized execution.
gradcheck
Gradient checking utilities for validating autodiff implementations.
jit
Just-In-Time (JIT) compilation infrastructure.
learned_opt
Machine learning-based optimization decisions.
memory
Memory estimation utilities for execution planning.
mixed_precision
Mixed precision training utilities.
multimodel
Multi-model coordination for ensemble and multi-task inference.
optimization
Graph optimization and fusion detection utilities.
parallel
Parallel execution utilities with work-stealing scheduler.
perfregression
Performance regression testing framework.
placement
Device placement and multi-device execution coordination.
profiling
Execution profiling and performance monitoring.
profiling_optimizer
Profiling-guided optimization for adaptive performance tuning.
quantization
Advanced quantization support for model compression and acceleration.
recovery
Error recovery and fault tolerance for execution.
rewrite
Graph rewriting engine for pattern-based optimizations.
scheduling
Execution scheduling and optimization for efficient graph execution.
shape
Tensor shape inference and validation.
simd
SIMD (Single Instruction, Multiple Data) optimization utilities.
sparse
Sparse tensor support for TensorLogic.
speculative
Speculative execution for computation graphs.
strategy
Execution strategy configuration and policies.
streaming
Streaming execution support for large graphs and datasets.
tensor_view
Zero-copy tensor views and slicing operations.
typesafe
Type-safe tensor wrappers with compile-time shape checking.
validation
Graph validation utilities for ensuring well-formed execution graphs.
visualization
Visualization utilities for execution analysis and debugging.
workspace
Workspace management for efficient memory reuse.

Structs§

DummyExecutor
Minimal executor implementation for testing and prototyping.
DummyTensor
Minimal tensor implementation for testing and prototyping.

Enums§

ElemOp
ExecutorError
ReduceOp

Traits§

TlAutodiff
Automatic differentiation interface.
TlExecutor
Core tensor execution interface.