Engine-agnostic traits and execution planning API.
Version: 0.1.0-beta.1 | Status: Production Ready
This crate defines the abstract execution interfaces and optimization utilities for TensorLogic:
§Core Execution Traits
- TlExecutor: Core tensor operations (einsum, element-wise, reductions)
- TlAutodiff: Forward/backward pass for automatic differentiation
- TlEnhancedAutodiff: Enhanced autodiff with gradient accumulation, clipping, scaling
- TlBatchExecutor: Batch execution support
- TlStreamingExecutor: Streaming execution for large datasets
- TlRecoverableExecutor: Execution with error recovery and checkpointing
- TlCapabilities: Backend capability queries
- TlProfiledExecutor: Execution profiling
- TlJitExecutor: Just-In-Time compilation support
- TlDistributedExecutor: Distributed multi-device execution
§Optimization Utilities
- GraphOptimizer: Fusion detection, dead node elimination, redundancy analysis
- FusionPlanner: Planning and validation of fusion transformations
- Scheduler: Execution scheduling with multiple strategies (sequential, parallel, cost-based)
- PlacementOptimizer: Device placement and multi-device coordination
- TensorCache: Result caching with LRU/FIFO/LFU eviction policies
- MemoryPool: Tensor memory pooling for allocation reuse
- ExecutionStrategy: Complete strategy configuration (mode, precision, memory, parallelism)
- ExecutionContext: State management and lifecycle tracking with hooks
- GraphCompiler: Ahead-of-time graph compilation with optimization passes
- CompilationCache: Caching of compiled graphs to avoid recompilation
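As a sketch of the LRU eviction policy mentioned for `TensorCache`, the following self-contained cache keeps a recency queue alongside the map and evicts the least recently used entry when full. The type and its methods are hypothetical; the crate's `TensorCache`/`EvictionPolicy` API may differ.

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical sketch of LRU eviction for a result cache.
pub struct LruCacheSketch<V> {
    capacity: usize,
    map: HashMap<u64, V>,
    order: VecDeque<u64>, // front = least recently used
}

impl<V> LruCacheSketch<V> {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    pub fn get(&mut self, key: u64) -> Option<&V> {
        if self.map.contains_key(&key) {
            self.order.retain(|k| *k != key);
            self.order.push_back(key); // mark as most recently used
        }
        self.map.get(&key)
    }

    pub fn put(&mut self, key: u64, value: V) {
        if self.map.contains_key(&key) {
            self.order.retain(|k| *k != key);
        } else if self.map.len() == self.capacity {
            if let Some(lru) = self.order.pop_front() {
                self.map.remove(&lru); // evict least recently used
            }
        }
        self.order.push_back(key);
        self.map.insert(key, value);
    }
}
```

FIFO and LFU differ only in which key is chosen for eviction: insertion order and access count, respectively.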
§JIT Compilation
- JitCompiler: Runtime compilation with hot path detection
- JitCache: Specialized caching for JIT-compiled graphs
- HotPathDetector: Identifies frequently executed code paths
- AdaptiveOptimizer: Progressively optimizes based on runtime profiling
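The core idea behind hot path detection can be sketched in a few lines: count executions per graph key and flag a path as worth compiling once it crosses a threshold. This is an illustrative sketch, not the crate's `HotPathDetector` API.

```rust
use std::collections::HashMap;

// Hypothetical sketch of hot-path detection: count executions per
// graph key and report when a path crosses a compilation threshold.
pub struct HotPathSketch {
    threshold: u32,
    counts: HashMap<u64, u32>,
}

impl HotPathSketch {
    pub fn new(threshold: u32) -> Self {
        Self { threshold, counts: HashMap::new() }
    }

    /// Record one execution; returns true once the path is "hot"
    /// and therefore worth JIT-compiling.
    pub fn record(&mut self, graph_key: u64) -> bool {
        let n = self.counts.entry(graph_key).or_insert(0);
        *n += 1;
        *n >= self.threshold
    }
}
```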
§Distributed Execution
- DistributedExecutor: Multi-device execution coordination
- DataParallelCoordinator: Data-parallel training across devices
- ModelParallelCoordinator: Model-parallel execution with tensor sharding
- PipelineParallelCoordinator: Pipeline parallelism across stages
- CommunicationBackend: Abstract interface for device communication
§Zero-Copy Operations (Beta.1)
- TensorView: Zero-copy tensor views and slicing
- SliceSpec: Flexible slicing specifications
- ViewBuilder: Ergonomic view construction
- TensorViewable: Trait for zero-copy tensor operations
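The mechanism behind zero-copy views is offset-plus-strides indexing into the parent buffer: slicing allocates nothing and copies nothing. The sketch below is illustrative; the crate's `TensorView`/`SliceSpec` types support richer slicing than this fixed 2-D case.

```rust
// Hypothetical sketch of a zero-copy 2-D view: no data is copied,
// indexing maps through an offset and strides into the parent buffer.
pub struct View2DSketch<'a> {
    data: &'a [f32],
    offset: usize,
    shape: [usize; 2],
    strides: [usize; 2], // in elements, for a row-major parent
}

impl<'a> View2DSketch<'a> {
    /// Slice rows [r0, r0+rows) and cols [c0, c0+cols) of a
    /// row-major matrix with `ncols` columns.
    pub fn slice(data: &'a [f32], ncols: usize, r0: usize, c0: usize,
                 rows: usize, cols: usize) -> Self {
        Self {
            data,
            offset: r0 * ncols + c0,
            shape: [rows, cols],
            strides: [ncols, 1],
        }
    }

    pub fn get(&self, i: usize, j: usize) -> f32 {
        assert!(i < self.shape[0] && j < self.shape[1]);
        self.data[self.offset + i * self.strides[0] + j * self.strides[1]]
    }
}
```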
§Async Execution (Beta.1)
- TlAsyncExecutor: Async/await-based non-blocking execution
- TlAsyncBatchExecutor: Asynchronous batch processing
- TlAsyncStreamExecutor: Async streaming with backpressure
- AsyncExecutorPool: Load-balanced executor pool
§Enhanced Diagnostics (Beta.1)
- Diagnostic: Rich error messages with suggestions
- DiagnosticCollector: Error aggregation and reporting
- ShapeMismatchDiagnostic: Helpful shape error messages
- PerformanceDiagnostic: Performance issue detection
§Analysis and Validation
- GraphValidator: Graph validation and diagnostics
- MemoryEstimator: Memory usage estimation and lifetime analysis
- ShapeInferenceContext: Tensor shape inference for optimization
§Debugging Utilities
- ExecutionTracer: Record execution flow through computation graphs
- TensorInspector: Examine intermediate tensor values and statistics
- BreakpointManager: Pause execution at specific nodes for inspection
- ExecutionRecorder: Record full execution history for replay and analysis
§Visualization Utilities
- TimelineVisualizer: ASCII/DOT/JSON timeline visualization
- GraphVisualizer: Computation graph visualization
- TensorStatsVisualizer: Tensor statistics and histograms
- ExportFormat: Export to various formats for external tools
§Testing and Development
- DummyExecutor: Minimal implementation for testing and prototyping
- DummyTensor: Simple tensor representation for tests
- Backend Tests: Comprehensive test templates for backend validation
- Gradient Checking: Numerical gradient verification utilities
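Numerical gradient checking compares an autodiff gradient against a finite-difference estimate. The sketch below shows the central-difference idea behind utilities like `numerical_gradient_central`; the function signature here is illustrative, not the crate's.

```rust
// Sketch of central-difference numerical gradients: perturb each
// coordinate by +/- eps and divide the function difference by 2*eps.
// Truncation error is O(eps^2), better than the forward difference's O(eps).
pub fn numerical_grad_sketch(f: impl Fn(&[f64]) -> f64, x: &[f64], eps: f64) -> Vec<f64> {
    (0..x.len())
        .map(|i| {
            let mut xp = x.to_vec();
            let mut xm = x.to_vec();
            xp[i] += eps;
            xm[i] -= eps;
            (f(&xp) - f(&xm)) / (2.0 * eps)
        })
        .collect()
}
```

A backend's analytic gradients are validated by asserting element-wise closeness to this estimate within a tolerance.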
§Eager Execution
- TlEagerAutodiff: Eager mode automatic differentiation
- Variable: Variables with gradient tracking
- EagerTape: Dynamic computation graph recording
§Advanced Quantization (Beta.1)
- Quantizer: Complete quantization pipeline (QAT/PTQ)
- QuantizationType: INT8, INT4, INT2, FP8, Binary, Ternary support
- CalibrationStrategy: Multiple calibration methods (MinMax, Percentile, MSE, KL-divergence)
- FakeQuantize: Quantization simulation for training
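To illustrate the MinMax calibration strategy for INT8, the sketch below derives an asymmetric scale and zero point from observed data, then quantizes and dequantizes through them. All names and the exact parameterization are assumptions for illustration; the crate's `Quantizer`/`QuantizationParams` may define these differently.

```rust
// Hypothetical sketch of asymmetric INT8 min-max quantization.
pub struct QParamsSketch {
    pub scale: f32,
    pub zero_point: i32,
}

pub fn minmax_params_sketch(data: &[f32]) -> QParamsSketch {
    // Include 0.0 in the range so zero is exactly representable.
    let min = data.iter().cloned().fold(f32::INFINITY, f32::min).min(0.0);
    let max = data.iter().cloned().fold(f32::NEG_INFINITY, f32::max).max(0.0);
    let scale = ((max - min) / 255.0).max(f32::EPSILON);
    let zero_point = (-min / scale).round() as i32 - 128;
    QParamsSketch { scale, zero_point }
}

pub fn quantize_sketch(x: f32, p: &QParamsSketch) -> i8 {
    ((x / p.scale).round() as i32 + p.zero_point).clamp(-128, 127) as i8
}

pub fn dequantize_sketch(q: i8, p: &QParamsSketch) -> f32 {
    (q as i32 - p.zero_point) as f32 * p.scale
}
```

`FakeQuantize` applies exactly this round trip during training so the network learns under quantization noise while gradients still flow in floating point.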
§Dynamic Batching (Beta.1)
- DynamicBatcher: Adaptive request batching with priority queues
- RequestQueue: Priority-based queuing (Low/Normal/High/Critical)
- AdaptiveBatcher: Automatic batch size optimization
- BatchingStats: Comprehensive throughput and latency metrics
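The priority-queue behavior described above can be sketched with a `BinaryHeap`: requests are ordered by priority, FIFO within a priority level, and drained up to a maximum batch size. The types below are illustrative; the crate's `RequestQueue`/`Priority` are likely richer (deadlines, metadata, backpressure).

```rust
use std::collections::BinaryHeap;

// Hypothetical sketch of priority-based request queuing for batching.
#[derive(PartialEq, Eq, PartialOrd, Ord, Clone, Copy, Debug)]
pub enum PrioritySketch { Low, Normal, High, Critical }

#[derive(PartialEq, Eq)]
pub struct RequestSketch {
    pub priority: PrioritySketch,
    pub id: u64, // monotonically increasing arrival order
}

impl Ord for RequestSketch {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        // Higher priority first; FIFO (lower id first) within a priority.
        self.priority.cmp(&other.priority).then(other.id.cmp(&self.id))
    }
}

impl PartialOrd for RequestSketch {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
}

/// Pop up to `max_batch` requests, highest priority first.
pub fn next_batch_sketch(q: &mut BinaryHeap<RequestSketch>, max_batch: usize) -> Vec<u64> {
    (0..max_batch).filter_map(|_| q.pop().map(|r| r.id)).collect()
}
```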
§Advanced Kernel Fusion (Beta.1)
- FusionOptimizer: Pattern-based fusion detection and optimization
- FusionStrategy: Conservative/Aggressive/Balanced/Memory-aware modes
- FusionCostModel: Memory bandwidth-aware cost modeling
- FusionPattern: Common patterns (MatMul+Bias, MatMul+Activation, etc.)
§Workspace Management (Beta.1)
- WorkspacePool: Memory pool with multiple allocation strategies
- SharedWorkspacePool: Thread-safe workspace sharing
- AllocationStrategy: BestFit/FirstFit/ExactFit/PowerOfTwo
- WorkspaceStats: Efficiency metrics and hit rate tracking
§Multi-Model Coordination (Beta.1)
- MultiModelCoordinator: Ensemble and multi-model management
- EnsembleStrategy: Averaging/Voting/Stacking/Boosting
- RoutingStrategy: Priority/Latency/Accuracy-based model selection
- CascadeConfig: Early-exit model cascades
§Mixed Precision Training (Beta.1)
- MixedPrecisionConfig: FP16/BF16/FP8 configuration
- LossScaler: Automatic loss scaling with dynamic adjustment
- PrecisionMode: Multiple precision modes (FP32/FP16/BF16/FP8/FP64)
- GradientCheckpoint: Memory-efficient gradient checkpointing
- MixedPrecisionState: Complete training state management
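Dynamic loss scaling follows a simple control loop: halve the scale and skip the step when scaled gradients overflow, and double the scale after a run of stable steps. The sketch below is illustrative; the crate's `LossScaler` may use different growth factors and defaults.

```rust
// Hypothetical sketch of dynamic loss scaling for mixed precision.
pub struct LossScalerSketch {
    pub scale: f32,
    growth_interval: u32,
    stable_steps: u32,
}

impl LossScalerSketch {
    pub fn new(initial_scale: f32, growth_interval: u32) -> Self {
        Self { scale: initial_scale, growth_interval, stable_steps: 0 }
    }

    /// Call once per step with whether the scaled gradients overflowed.
    /// Returns true if the optimizer step should be applied.
    pub fn update(&mut self, found_overflow: bool) -> bool {
        if found_overflow {
            self.scale /= 2.0; // back off and skip this step
            self.stable_steps = 0;
            false
        } else {
            self.stable_steps += 1;
            if self.stable_steps >= self.growth_interval {
                self.scale *= 2.0; // grow again after a stable run
                self.stable_steps = 0;
            }
            true
        }
    }
}
```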
§Sparse Tensor Support (Beta.1)
- SparseTensor: CSR/CSC/COO sparse formats
- SparseCSR: Compressed Sparse Row format
- SparseCSC: Compressed Sparse Column format
- SparseCOO: Coordinate format for construction
- Automatic sparsity detection: Convert dense to sparse when beneficial
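The "convert when beneficial" logic can be sketched as: measure the zero fraction, and only build a sparse representation when it exceeds a threshold. The COO layout below (parallel row/col/value arrays) is standard; the function and struct names are illustrative, not the crate's `detect_sparsity`/`SparseCOO` API.

```rust
// Hypothetical sketch of sparsity detection plus dense -> COO conversion.
pub struct CooSketch {
    pub rows: Vec<usize>,
    pub cols: Vec<usize>,
    pub vals: Vec<f32>,
}

/// Fraction of zero entries in a dense, row-major matrix.
pub fn sparsity_sketch(dense: &[f32]) -> f64 {
    let zeros = dense.iter().filter(|v| **v == 0.0).count();
    zeros as f64 / dense.len() as f64
}

/// Convert to COO only when sparsity beats `threshold`.
pub fn to_coo_if_sparse_sketch(dense: &[f32], ncols: usize, threshold: f64) -> Option<CooSketch> {
    if sparsity_sketch(dense) < threshold {
        return None; // the dense representation stays cheaper
    }
    let (mut rows, mut cols, mut vals) = (Vec::new(), Vec::new(), Vec::new());
    for (idx, &v) in dense.iter().enumerate() {
        if v != 0.0 {
            rows.push(idx / ncols);
            cols.push(idx % ncols);
            vals.push(v);
        }
    }
    Some(CooSketch { rows, cols, vals })
}
```

COO is convenient for construction; CSR/CSC compress one index array and suit row- or column-oriented kernels.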
§Parallel Execution (Beta.1)
- WorkStealingScheduler: Dynamic load balancing scheduler
- Task: Parallel task with dependencies and priorities
- StealStrategy: Multiple work-stealing strategies
- NumaStrategy: NUMA-aware memory allocation
- LoadBalanceStats: Load balancing metrics
§SIMD Optimizations (Beta.1)
- SimdCapabilities: Platform detection (AVX2/AVX-512/NEON/SVE)
- AlignedBuffer: SIMD-aligned memory allocations
- SimdInstructionSet: Instruction set abstractions
- SimdOptimizationHints: Compiler optimization hints
§Graph Rewriting (Beta.1)
- RewriteEngine: Pattern-based graph transformations
- Pattern: Flexible pattern matching DSL
- RewriteRule: Custom rewrite rules
- CommonRules: Standard optimization rules (constant folding, etc.)
- RewriteStrategy: Application strategies (exhaustive, fixed-point, etc.)
§Profiling-Guided Optimization (Beta.1)
- ProfilingOptimizer: Adaptive performance tuning
- ExecutionProfile: Runtime performance metrics
- Hotspot: Performance bottleneck detection
- OptimizationGoal: Optimization objectives (latency, throughput, memory)
- Auto-tuning: Automatic configuration selection
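Hotspot detection reduces to a simple rule over profiled timings: flag any node whose share of total runtime exceeds a fraction, then focus optimization effort there. The function below is an illustrative sketch, not the crate's `Hotspot` API.

```rust
// Hypothetical sketch of hotspot detection from per-node timings:
// flag nodes whose share of total runtime meets `min_fraction`.
pub fn hotspots_sketch(node_times_us: &[(String, u64)], min_fraction: f64) -> Vec<String> {
    let total: u64 = node_times_us.iter().map(|(_, t)| t).sum();
    if total == 0 {
        return Vec::new();
    }
    node_times_us
        .iter()
        .filter(|(_, t)| *t as f64 / total as f64 >= min_fraction)
        .map(|(name, _)| name.clone())
        .collect()
}
```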
§Cache Optimization (Beta.1)
- CacheOptimizer: Memory hierarchy aware optimization
- CacheConfig: L1/L2/L3 cache configuration
- TilingParams: Loop tiling for cache efficiency
- CacheMetrics: Cache performance estimation
- DataLayout: Cache-friendly data arrangements
§Automatic Parallelization (Experimental) 🧪
- AutoParallelizer: Automatic detection of parallelism opportunities
- ParallelizationAnalysis: Analysis of parallel execution potential
- ParallelExecutionPlan: Generated parallel execution plans
- WorkPartition: Work distribution across workers
- Cost modeling: Estimate execution costs and communication overhead
§Speculative Execution (Experimental) 🧪
- SpeculativeExecutor: Branch prediction and speculative execution
- PredictionStrategy: Multiple prediction strategies
- RollbackPolicy: Handling mispredictions
- SpeculationStats: Track speculation success rates
- Adaptive learning: Learn from prediction outcomes
§Learned Optimizations (Experimental) 🧪
- LearnedOptimizer: ML-based optimization decisions
- LearningStrategy: Supervised, reinforcement, online learning
- CostPrediction: Learned cost models
- FusionRecommendation: ML-based fusion decisions
- Reinforcement learning: Q-learning for scheduling
Re-exports§
pub use auto_parallel::{AutoParallelError, AutoParallelizer, CostModel as AutoParallelCostModel, DependencyType, NodeId as AutoParallelNodeId, NodeInfo, ParallelExecutionPlan, ParallelStage, ParallelizationAnalysis, ParallelizationStrategy, WorkPartition};
pub use autodiff::{AccumulationConfig, ClippingStrategy, CustomGradientRegistry, GradientAccumulationStrategy, GradientAccumulator, GradientClipper, GradientConfig, GradientScaler, GradientScaling, GradientStats, TlEnhancedAutodiff};
pub use backend_tests::{assert_vec_close, print_test_summary, run_all_basic_tests, run_all_performance_tests, test_backend_edge_cases, test_backend_einsum, test_backend_elem_binary, test_backend_elem_unary, test_backend_forward, test_backend_large_tensors, test_backend_memory_efficiency, test_backend_reduce, test_backend_shapes, BackendTestAdapter, TestResult, DEFAULT_TOLERANCE};
pub use batch::{BatchResult, TlBatchExecutor};
pub use cache::{CacheKey, CacheStats, EvictionPolicy, MemoryPool, PoolStats, TensorCache};
pub use cache_optimizer::{AccessPattern, CacheConfig, CacheLevel, CacheMetrics, CacheOptimizer, CacheOptimizerError, DataLayout, OptimizationStats as CacheOptimizationStats, TilingParams};
pub use capabilities::{BackendCapabilities, DType, DeviceType, Feature, TlCapabilities};
pub use compilation::{CacheStats as CompilationCacheStats, CompilationCache, CompilationConfig, CompilationKey, CompilationStats, CompiledGraph, GraphCompiler, OptimizationLevel, TlCompilableExecutor};
pub use context::{ExecutionContext, ExecutionHook, ExecutionPhase, ExecutionState, LoggingHook};
pub use debug::{Breakpoint, BreakpointHit, BreakpointManager, ExecutionRecorder, ExecutionReport, ExecutionTrace, ExecutionTracer, OperationHandle, TensorInspector, TensorStats, TraceEntry as DebugTraceEntry, TraceSummary};
pub use diagnostics::{Diagnostic, DiagnosticCollector, MemoryDiagnostic, NodeExecutionDiagnostic, PerformanceDiagnostic, Severity, ShapeMismatchDiagnostic, SourceLocation, TypeMismatchDiagnostic};
pub use distributed::{CommunicationBackend, CommunicationOp, DataParallelCoordinator, DistributedConfig, DistributedExecutor, DistributedPlacementPlan, DistributedStats, DummyCommunicationBackend, ModelParallelCoordinator, ParallelismStrategy as DistributedParallelismStrategy, PipelineParallelCoordinator, ReductionOp, ShardingSpec, TlDistributedExecutor};
pub use dynamic_batching::{AdaptiveBatcher, BatchRequest, BatchingError, BatchingStats, DynamicBatchConfig, DynamicBatcher, Priority, RequestMetadata, RequestQueue};
pub use eager::{EagerOp, EagerOps, EagerTape, TlEagerAutodiff, Variable, VariableGrad};
pub use fusion::{FusionCandidate, FusionConfig, FusionCostModel, FusionError, FusionOptimizer, FusionPattern, FusionStats, FusionStrategy};
pub use gradcheck::{compare_gradients, numerical_gradient_central, numerical_gradient_forward, quick_check, GradCheckConfig, GradCheckResult, GradientChecker, GradientError};
pub use jit::{AdaptiveOptimizationPlan, AdaptiveOptimizer, HotPathDetector, JitCache, JitCacheEntry, JitCacheStats, JitCompiler, JitConfig, JitEntryStats, JitKey, JitStats, SpecializationContext, TlJitExecutor};
pub use learned_opt::{CostPrediction, FeatureVector, FusionRecommendation, LearnedOptError, LearnedOptimizer, LearningStats, LearningStrategy, ModelType, NodeId as LearnedOptNodeId, OptimizationAction, RewardSignal, ScheduleRecommendation, TrainingExample};
pub use memory::{MemoryEstimate, MemoryEstimator, TensorMemory};
pub use mixed_precision::{GradientCheckpoint, LossScaler, LossScalerStats, LossScalingStrategy, MixedPrecisionConfig, MixedPrecisionError, MixedPrecisionState, MixedPrecisionStats, PrecisionMode};
pub use multimodel::{CascadeConfig, CoordinationStats, EnsembleConfig, EnsembleStrategy, ModelMetadata, MultiModelCoordinator, MultiModelError, ResourceRequirements, RoutingStrategy, TlEnsembleExecutor, TlModelRouter};
pub use optimization::{FusionOpportunity, FusionPlanner, FusionType, GraphOptimizer, OptimizationResult};
pub use parallel::{LoadBalanceStats, NumaNode, NumaStrategy, ParallelConfig, ParallelError, SchedulerStats, StealStrategy, Task, TaskId, TaskPriority, WorkStealingScheduler};
pub use perfregression::{BenchmarkBaseline, BenchmarkComparison, BenchmarkConfig, BenchmarkStats, PerfRegression, RegressionReport};
pub use placement::{Device, PlacementOptimizer, PlacementPlan, PlacementStrategy};
pub use profiling::{Bottleneck, BottleneckAnalyzer, BottleneckReport, PerformanceBaseline, PerformanceComparison, ProfileData, ProfileStatistics, Profiler, ProfilerHook, TimelineProfiler, TlProfiledExecutor, TraceEntry};
pub use profiling_optimizer::{ExecutionProfile, Hotspot, OptimizationGoal, OptimizationReport, OptimizationStrategy, ProfilingOptimizer, ProfilingOptimizerError, TuningConfig};
pub use quantization::{CalibrationStats, CalibrationStrategy, FakeQuantize, QuantizationConfig, QuantizationError, QuantizationGranularity, QuantizationMode, QuantizationParams, QuantizationSummary, QuantizationSymmetry, QuantizationType, Quantizer};
pub use recovery::{Checkpoint, CheckpointManager, DegradationPolicy, FailureInfo, FallbackStrategy, RecoveryConfig, RecoveryMetadata, RecoveryResult, RecoveryStats, RecoveryStrategy, RetryPolicy, TlRecoverableExecutor};
pub use rewrite::{CommonRules, Match, NodeId as RewriteNodeId, Pattern, ReplacementFn, RewriteEngine, RewriteError, RewriteRule, RewriteStats, RewriteStrategy};
pub use scheduling::{ExecutionSchedule, NodeCost, Scheduler, SchedulingStrategy};
pub use shape::{DimSize, ShapeInferenceContext, TensorShape};
pub use simd::{AlignedBuffer, CpuArchitecture, SimdCapabilities, SimdError, SimdInstructionSet, SimdOptimizationHints};
pub use sparse::{detect_sparsity, to_sparse_if_beneficial, SparseCOO, SparseCSC, SparseCSR, SparseError, SparseFormat, SparseTensor, SparseTensorBuilder};
pub use speculative::{BranchOutcome, NodeId as SpeculativeNodeId, PredictionStrategy, RollbackPolicy, SpeculationStats, SpeculativeError, SpeculativeExecutor, SpeculativeTask};
pub use strategy::{ExecutionMode, ExecutionStrategy, GradientStrategy, MemoryStrategy, ParallelismStrategy, StrategyOptimizer};
pub use streaming::{ChunkIterator, ChunkMetadata, StreamProcessor, StreamResult, StreamingConfig, StreamingMode, TlStreamingExecutor};
pub use tensor_view::{InPlaceMode, InPlaceOps, SliceSpec, TensorView, TensorViewable, ViewBuilder};
pub use typesafe::{BroadcastShape, Dim, DimMul, DimOp, DimSize as TypesafeDimSize, Dyn, EinsumSpec, FixedShape, Matrix, MatrixOps, Nat, Scalar, ShapeConstraint, ShapedTensor, Static, Tensor3D, Tensor4D, TensorBuilder, TypedBatch, TypedInputs, TypedOutputs, TypedTensor, TypedTensorOps, Vector, D1, D2, D3, D4, D5, D6, S, Z};
pub use validation::{GraphValidator, ValidationResult};
pub use visualization::{ExportFormat, GraphConfig, GraphVisualizer, TensorStatsVisualizer, TimelineConfig, TimelineVisualizer, VisualizationFormat};
pub use workspace::{AllocationStrategy, DefragmentationResult, Workspace, WorkspaceConfig, WorkspaceError, WorkspacePool, WorkspaceStats};
Modules§
- async_exec - Asynchronous execution traits for concurrent tensor operations.
- auto_parallel - Automatic parallelization for computation graphs.
- autodiff - Autodiff enhancements for training and optimization.
- backend_tests - Backend compatibility test templates.
- batch - Batch execution support for processing multiple inputs efficiently.
- cache - Tensor caching and memory pooling for efficient execution.
- cache_optimizer - Memory hierarchy and cache-aware optimization.
- capabilities - Backend capability queries and feature detection.
- compilation - Graph compilation and caching infrastructure.
- context - Execution context and state management for coordinated execution.
- debug - Debugging utilities for execution tracing and tensor inspection.
- diagnostics - Enhanced error diagnostics with helpful suggestions.
- distributed - Distributed execution infrastructure for multi-device and multi-node computation.
- dynamic_batching - Dynamic batching for inference serving.
- eager - Eager mode automatic differentiation.
- fusion - Advanced kernel fusion for optimized execution.
- gradcheck - Gradient checking utilities for validating autodiff implementations.
- jit - Just-In-Time (JIT) compilation infrastructure.
- learned_opt - Machine learning-based optimization decisions.
- memory - Memory estimation utilities for execution planning.
- mixed_precision - Mixed precision training utilities.
- multimodel - Multi-model coordination for ensemble and multi-task inference.
- optimization - Graph optimization and fusion detection utilities.
- parallel - Parallel execution utilities with work-stealing scheduler.
- perfregression - Performance regression testing framework.
- placement - Device placement and multi-device execution coordination.
- profiling - Execution profiling and performance monitoring.
- profiling_optimizer - Profiling-guided optimization for adaptive performance tuning.
- quantization - Advanced quantization support for model compression and acceleration.
- recovery - Error recovery and fault tolerance for execution.
- rewrite - Graph rewriting engine for pattern-based optimizations.
- scheduling - Execution scheduling and optimization for efficient graph execution.
- shape - Tensor shape inference and validation.
- simd - SIMD (Single Instruction, Multiple Data) optimization utilities.
- sparse - Sparse tensor support for TensorLogic.
- speculative - Speculative execution for computation graphs.
- strategy - Execution strategy configuration and policies.
- streaming - Streaming execution support for large graphs and datasets.
- tensor_view - Zero-copy tensor views and slicing operations.
- typesafe - Type-safe tensor wrappers with compile-time shape checking.
- validation - Graph validation utilities for ensuring well-formed execution graphs.
- visualization - Visualization utilities for execution analysis and debugging.
- workspace - Workspace management for efficient memory reuse.
Structs§
- DummyExecutor - Minimal executor implementation for testing and prototyping.
- DummyTensor - Minimal tensor implementation for testing and prototyping.
Enums§
Traits§
- TlAutodiff - Automatic differentiation interface.
- TlExecutor - Core tensor execution interface.