// tensorlogic_infer/lib.rs

#![allow(clippy::len_zero)]
#![allow(clippy::field_reassign_with_default)]
#![allow(clippy::manual_range_contains)]
#![allow(clippy::collapsible_if)]
#![allow(clippy::only_used_in_recursion)]
#![allow(clippy::needless_range_loop)]
#![allow(clippy::or_fun_call)]
#![allow(clippy::derivable_impls)]
#![allow(clippy::manual_is_multiple_of)]
#![allow(clippy::overly_complex_bool_expr)]
#![allow(clippy::unwrap_or_default)]

//! Engine-agnostic traits and execution planning API.
//!
//! **Version**: 0.1.0-beta.1 | **Status**: Production Ready
//!
//! This crate defines the abstract execution interfaces and optimization utilities for TensorLogic:
//!
//! ## Core Execution Traits
//! - **TlExecutor**: Core tensor operations (einsum, element-wise, reductions)
//! - **TlAutodiff**: Forward/backward passes for automatic differentiation
//! - **TlEnhancedAutodiff**: Enhanced autodiff with gradient accumulation, clipping, and scaling
//! - **TlBatchExecutor**: Batch execution support
//! - **TlStreamingExecutor**: Streaming execution for large datasets
//! - **TlRecoverableExecutor**: Execution with error recovery and checkpointing
//! - **TlCapabilities**: Backend capability queries
//! - **TlProfiledExecutor**: Execution profiling
//! - **TlJitExecutor**: Just-in-time compilation support
//! - **TlDistributedExecutor**: Distributed multi-device execution
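//!
//! For orientation, `einsum` here follows the usual index-contraction
//! semantics; for instance, `"ij,jk->ik"` denotes a plain matrix product.
//! A backend-agnostic sketch of what that contraction computes (illustrative
//! only — this is the math, not the `TlExecutor` method signature):
//!
//! ```
//! // c[i][k] = Σ_j a[i][j] * b[j][k], i.e. the "ij,jk->ik" contraction.
//! let a = [[1.0, 2.0], [3.0, 4.0]];
//! let b = [[5.0, 6.0], [7.0, 8.0]];
//! let mut c = [[0.0f64; 2]; 2];
//! for i in 0..2 {
//!     for k in 0..2 {
//!         for j in 0..2 {
//!             c[i][k] += a[i][j] * b[j][k];
//!         }
//!     }
//! }
//! assert_eq!(c, [[19.0, 22.0], [43.0, 50.0]]);
//! ```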
//!
//! ## Optimization Utilities
//! - **GraphOptimizer**: Fusion detection, dead-node elimination, redundancy analysis
//! - **FusionPlanner**: Planning and validation of fusion transformations
//! - **Scheduler**: Execution scheduling with multiple strategies (sequential, parallel, cost-based)
//! - **PlacementOptimizer**: Device placement and multi-device coordination
//! - **TensorCache**: Result caching with LRU/FIFO/LFU eviction policies
//! - **MemoryPool**: Tensor memory pooling for allocation reuse
//! - **ExecutionStrategy**: Complete strategy configuration (mode, precision, memory, parallelism)
//! - **ExecutionContext**: State management and lifecycle tracking with hooks
//! - **GraphCompiler**: Ahead-of-time graph compilation with optimization passes
//! - **CompilationCache**: Caching of compiled graphs to avoid recompilation
//!
//! ## JIT Compilation
//! - **JitCompiler**: Runtime compilation with hot-path detection
//! - **JitCache**: Specialized caching for JIT-compiled graphs
//! - **HotPathDetector**: Identifies frequently executed code paths
//! - **AdaptiveOptimizer**: Progressively optimizes based on runtime profiling
//!
//! ## Distributed Execution
//! - **DistributedExecutor**: Multi-device execution coordination
//! - **DataParallelCoordinator**: Data-parallel training across devices
//! - **ModelParallelCoordinator**: Model-parallel execution with tensor sharding
//! - **PipelineParallelCoordinator**: Pipeline parallelism across stages
//! - **CommunicationBackend**: Abstract interface for device communication
//!
//! ## Zero-Copy Operations (Beta.1)
//! - **TensorView**: Zero-copy tensor views and slicing
//! - **SliceSpec**: Flexible slicing specifications
//! - **ViewBuilder**: Ergonomic view construction
//! - **TensorViewable**: Trait for zero-copy tensor operations
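//!
//! A zero-copy view never moves data: it is just a shape, strides, and an
//! offset over the original buffer. In miniature (conceptual sketch,
//! independent of the actual `TensorView` representation):
//!
//! ```
//! // A 2x3 row-major tensor; element (i, j) lives at offset + i*3 + j.
//! let data = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0];
//! let (strides, offset) = ([3usize, 1usize], 0usize);
//! let at = |i: usize, j: usize| data[offset + i * strides[0] + j * strides[1]];
//! assert_eq!(at(1, 2), 5.0);
//! // "Slicing" off row 1 just changes the offset to 3 — no copy is made.
//! ```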
//!
//! ## Async Execution (Beta.1)
//! - **TlAsyncExecutor**: Async/await-based non-blocking execution
//! - **TlAsyncBatchExecutor**: Asynchronous batch processing
//! - **TlAsyncStreamExecutor**: Async streaming with backpressure
//! - **AsyncExecutorPool**: Load-balanced executor pool
//!
//! ## Enhanced Diagnostics (Beta.1)
//! - **Diagnostic**: Rich error messages with suggestions
//! - **DiagnosticCollector**: Error aggregation and reporting
//! - **ShapeMismatchDiagnostic**: Helpful shape error messages
//! - **PerformanceDiagnostic**: Performance issue detection
//!
//! ## Analysis and Validation
//! - **GraphValidator**: Graph validation and diagnostics
//! - **MemoryEstimator**: Memory usage estimation and lifetime analysis
//! - **ShapeInferenceContext**: Tensor shape inference for optimization
//!
//! ## Debugging Utilities
//! - **ExecutionTracer**: Record execution flow through computation graphs
//! - **TensorInspector**: Examine intermediate tensor values and statistics
//! - **BreakpointManager**: Pause execution at specific nodes for inspection
//! - **ExecutionRecorder**: Record full execution history for replay and analysis
//!
//! ## Visualization Utilities
//! - **TimelineVisualizer**: ASCII/DOT/JSON timeline visualization
//! - **GraphVisualizer**: Computation graph visualization
//! - **TensorStatsVisualizer**: Tensor statistics and histograms
//! - **ExportFormat**: Export to various formats for external tools
//!
//! ## Testing and Development
//! - **DummyExecutor**: Minimal implementation for testing and prototyping
//! - **DummyTensor**: Simple tensor representation for tests
//! - **Backend Tests**: Comprehensive test templates for backend validation
//! - **Gradient Checking**: Numerical gradient verification utilities
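//!
//! Gradient checking compares an analytic gradient against a
//! central-difference estimate. The underlying idea, shown here standalone
//! (not the `GradientChecker` API itself):
//!
//! ```
//! // For f(x) = x², f'(3) = 6; central differences recover it to O(h²).
//! let f = |x: f64| x * x;
//! let (x, h) = (3.0, 1e-5);
//! let numerical = (f(x + h) - f(x - h)) / (2.0 * h);
//! assert!((numerical - 6.0).abs() < 1e-6);
//! ```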
//!
//! ## Eager Execution
//! - **TlEagerAutodiff**: Eager-mode automatic differentiation
//! - **Variable**: Variables with gradient tracking
//! - **EagerTape**: Dynamic computation graph recording
//!
//! ## Advanced Quantization (Beta.1)
//! - **Quantizer**: Complete quantization pipeline (QAT/PTQ)
//! - **QuantizationType**: INT8, INT4, INT2, FP8, Binary, Ternary support
//! - **CalibrationStrategy**: Multiple calibration methods (MinMax, Percentile, MSE, KL divergence)
//! - **FakeQuantize**: Quantization simulation for training
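//!
//! At its core, affine quantization maps reals to integers through a scale
//! and a zero point; roughly (illustrative formula, not the exact
//! `QuantizationParams` layout):
//!
//! ```
//! // q = round(x / scale) + zero_point; dequantized x' = (q - zero_point) * scale.
//! let (scale, zero_point) = (0.5f32, 0i32);
//! let q = (1.7f32 / scale).round() as i32 + zero_point;
//! let x = (q - zero_point) as f32 * scale;
//! assert_eq!((q, x), (3, 1.5));
//! ```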
//!
//! ## Dynamic Batching (Beta.1)
//! - **DynamicBatcher**: Adaptive request batching with priority queues
//! - **RequestQueue**: Priority-based queuing (Low/Normal/High/Critical)
//! - **AdaptiveBatcher**: Automatic batch size optimization
//! - **BatchingStats**: Comprehensive throughput and latency metrics
//!
//! ## Advanced Kernel Fusion (Beta.1)
//! - **FusionOptimizer**: Pattern-based fusion detection and optimization
//! - **FusionStrategy**: Conservative/Aggressive/Balanced/Memory-aware modes
//! - **FusionCostModel**: Memory-bandwidth-aware cost modeling
//! - **FusionPattern**: Common patterns (MatMul+Bias, MatMul+Activation, etc.)
//!
//! ## Workspace Management (Beta.1)
//! - **WorkspacePool**: Memory pool with multiple allocation strategies
//! - **SharedWorkspacePool**: Thread-safe workspace sharing
//! - **AllocationStrategy**: BestFit/FirstFit/ExactFit/PowerOfTwo
//! - **WorkspaceStats**: Efficiency metrics and hit-rate tracking
//!
//! ## Multi-Model Coordination (Beta.1)
//! - **MultiModelCoordinator**: Ensemble and multi-model management
//! - **EnsembleStrategy**: Averaging/Voting/Stacking/Boosting
//! - **RoutingStrategy**: Priority-, latency-, or accuracy-based model selection
//! - **CascadeConfig**: Early-exit model cascades
//!
//! ## Mixed Precision Training (Beta.1)
//! - **MixedPrecisionConfig**: FP16/BF16/FP8 configuration
//! - **LossScaler**: Automatic loss scaling with dynamic adjustment
//! - **PrecisionMode**: Multiple precision modes (FP32/FP16/BF16/FP8/FP64)
//! - **GradientCheckpoint**: Memory-efficient gradient checkpointing
//! - **MixedPrecisionState**: Complete training state management
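//!
//! Dynamic loss scaling in a nutshell: shrink the scale when a step
//! overflows, and grow it back after a long streak of clean steps. The
//! constants and update rule below are illustrative, not the `LossScaler`
//! defaults:
//!
//! ```
//! let (mut scale, mut clean_steps) = (65_536.0f32, 0u32);
//! for overflowed in [false, false, true, false] {
//!     if overflowed {
//!         scale /= 2.0; // back off after an inf/NaN gradient
//!         clean_steps = 0;
//!     } else {
//!         clean_steps += 1;
//!         if clean_steps == 2_000 {
//!             scale *= 2.0; // cautiously grow again
//!             clean_steps = 0;
//!         }
//!     }
//! }
//! assert_eq!(scale, 32_768.0);
//! ```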
//!
//! ## Sparse Tensor Support (Beta.1)
//! - **SparseTensor**: CSR/CSC/COO sparse formats
//! - **SparseCSR**: Compressed Sparse Row format
//! - **SparseCSC**: Compressed Sparse Column format
//! - **SparseCOO**: Coordinate format for construction
//! - **Automatic sparsity detection**: Convert dense to sparse when beneficial
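//!
//! The CSR layout keeps only the non-zeros plus per-row bookkeeping; for the
//! 3x3 matrix `[[1,0,0],[0,0,2],[0,3,0]]` the three arrays look like this
//! (conceptual layout — the crate's `SparseCSR` fields may be named
//! differently):
//!
//! ```
//! let values = [1.0, 2.0, 3.0];
//! let col_indices = [0, 2, 1];
//! let row_ptr = [0, 1, 2, 3]; // row i spans values[row_ptr[i]..row_ptr[i + 1]]
//! // Row 1 holds a single entry: 2.0 at column 2.
//! assert_eq!((values[row_ptr[1]], col_indices[row_ptr[1]]), (2.0, 2));
//! ```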
//!
//! ## Parallel Execution (Beta.1)
//! - **WorkStealingScheduler**: Dynamic load-balancing scheduler
//! - **Task**: Parallel task with dependencies and priorities
//! - **StealStrategy**: Multiple work-stealing strategies
//! - **NumaStrategy**: NUMA-aware memory allocation
//! - **LoadBalanceStats**: Load-balancing metrics
//!
//! ## SIMD Optimizations (Beta.1)
//! - **SimdCapabilities**: Platform detection (AVX2/AVX-512/NEON/SVE)
//! - **AlignedBuffer**: SIMD-aligned memory allocations
//! - **SimdInstructionSet**: Instruction set abstractions
//! - **SimdOptimizationHints**: Compiler optimization hints
//!
//! ## Graph Rewriting (Beta.1)
//! - **RewriteEngine**: Pattern-based graph transformations
//! - **Pattern**: Flexible pattern-matching DSL
//! - **RewriteRule**: Custom rewrite rules
//! - **CommonRules**: Standard optimization rules (constant folding, etc.)
//! - **RewriteStrategy**: Application strategies (exhaustive, fixed-point, etc.)
//!
//! ## Profiling-Guided Optimization (Beta.1)
//! - **ProfilingOptimizer**: Adaptive performance tuning
//! - **ExecutionProfile**: Runtime performance metrics
//! - **Hotspot**: Performance bottleneck detection
//! - **OptimizationGoal**: Optimization objectives (latency, throughput, memory)
//! - **Auto-tuning**: Automatic configuration selection
//!
//! ## Cache Optimization (Beta.1)
//! - **CacheOptimizer**: Memory-hierarchy-aware optimization
//! - **CacheConfig**: L1/L2/L3 cache configuration
//! - **TilingParams**: Loop tiling for cache efficiency
//! - **CacheMetrics**: Cache performance estimation
//! - **DataLayout**: Cache-friendly data arrangements
//!
//! ## Automatic Parallelization (Experimental) 🧪
//! - **AutoParallelizer**: Automatic detection of parallelism opportunities
//! - **ParallelizationAnalysis**: Analysis of parallel execution potential
//! - **ParallelExecutionPlan**: Generated parallel execution plans
//! - **WorkPartition**: Work distribution across workers
//! - **Cost modeling**: Estimate execution costs and communication overhead
//!
//! ## Speculative Execution (Experimental) 🧪
//! - **SpeculativeExecutor**: Branch prediction and speculative execution
//! - **PredictionStrategy**: Multiple prediction strategies
//! - **RollbackPolicy**: Handling of mispredictions
//! - **SpeculationStats**: Track speculation success rates
//! - **Adaptive learning**: Learn from prediction outcomes
//!
//! ## Learned Optimizations (Experimental) 🧪
//! - **LearnedOptimizer**: ML-based optimization decisions
//! - **LearningStrategy**: Supervised, reinforcement, and online learning
//! - **CostPrediction**: Learned cost models
//! - **FusionRecommendation**: ML-based fusion decisions
//! - **Reinforcement learning**: Q-learning for scheduling

pub mod async_exec;
pub mod auto_parallel;
pub mod autodiff;
pub mod backend_tests;
pub mod batch;
pub mod cache;
pub mod cache_optimizer;
pub mod capabilities;
pub mod compilation;
pub mod context;
pub mod debug;
pub mod diagnostics;
pub mod distributed;
mod dummy_executor;
mod dummy_tensor;
pub mod dynamic_batching;
pub mod eager;
mod error;
pub mod fusion;
pub mod gradcheck;
pub mod jit;
pub mod learned_opt;
pub mod memory;
pub mod mixed_precision;
pub mod multimodel;
mod ops;
pub mod optimization;
pub mod parallel;
pub mod perfregression;
pub mod placement;
pub mod profiling;
pub mod profiling_optimizer;
pub mod quantization;
pub mod recovery;
pub mod rewrite;
pub mod scheduling;
pub mod shape;
pub mod simd;
pub mod sparse;
pub mod speculative;
pub mod strategy;
pub mod streaming;
pub mod tensor_view;
mod traits;
pub mod typesafe;
pub mod validation;
pub mod visualization;
pub mod workspace;

#[cfg(test)]
mod tests;

#[cfg(test)]
mod validation_tests;

#[cfg(test)]
mod memory_tests;

#[cfg(feature = "async")]
pub use async_exec::{
    AsyncConfig, AsyncExecutionError, AsyncExecutionHandle, AsyncExecutorPool, AsyncStats,
    AsyncStreamResults, BoxFuture, TlAsyncBatchExecutor, TlAsyncExecutor, TlAsyncStreamExecutor,
};
pub use auto_parallel::{
    AutoParallelError, AutoParallelizer, CostModel as AutoParallelCostModel, DependencyType,
    NodeId as AutoParallelNodeId, NodeInfo, ParallelExecutionPlan, ParallelStage,
    ParallelizationAnalysis, ParallelizationStrategy, WorkPartition,
};
pub use autodiff::{
    AccumulationConfig, ClippingStrategy, CustomGradientRegistry, GradientAccumulationStrategy,
    GradientAccumulator, GradientClipper, GradientConfig, GradientScaler, GradientScaling,
    GradientStats, TlEnhancedAutodiff,
};
pub use backend_tests::{
    assert_vec_close, print_test_summary, run_all_basic_tests, run_all_performance_tests,
    test_backend_edge_cases, test_backend_einsum, test_backend_elem_binary,
    test_backend_elem_unary, test_backend_forward, test_backend_large_tensors,
    test_backend_memory_efficiency, test_backend_reduce, test_backend_shapes, BackendTestAdapter,
    TestResult, DEFAULT_TOLERANCE,
};
pub use batch::{BatchResult, TlBatchExecutor};
pub use cache::{CacheKey, CacheStats, EvictionPolicy, MemoryPool, PoolStats, TensorCache};
pub use cache_optimizer::{
    AccessPattern, CacheConfig, CacheLevel, CacheMetrics, CacheOptimizer, CacheOptimizerError,
    DataLayout, OptimizationStats as CacheOptimizationStats, TilingParams,
};
pub use capabilities::{BackendCapabilities, DType, DeviceType, Feature, TlCapabilities};
pub use compilation::{
    CacheStats as CompilationCacheStats, CompilationCache, CompilationConfig, CompilationKey,
    CompilationStats, CompiledGraph, GraphCompiler, OptimizationLevel, TlCompilableExecutor,
};
pub use context::{ExecutionContext, ExecutionHook, ExecutionPhase, ExecutionState, LoggingHook};
pub use debug::{
    Breakpoint, BreakpointHit, BreakpointManager, ExecutionRecorder, ExecutionReport,
    ExecutionTrace, ExecutionTracer, OperationHandle, TensorInspector, TensorStats,
    TraceEntry as DebugTraceEntry, TraceSummary,
};
pub use diagnostics::{
    Diagnostic, DiagnosticCollector, MemoryDiagnostic, NodeExecutionDiagnostic,
    PerformanceDiagnostic, Severity, ShapeMismatchDiagnostic, SourceLocation,
    TypeMismatchDiagnostic,
};
pub use distributed::{
    CommunicationBackend, CommunicationOp, DataParallelCoordinator, DistributedConfig,
    DistributedExecutor, DistributedPlacementPlan, DistributedStats, DummyCommunicationBackend,
    ModelParallelCoordinator, ParallelismStrategy as DistributedParallelismStrategy,
    PipelineParallelCoordinator, ReductionOp, ShardingSpec, TlDistributedExecutor,
};
pub use dummy_executor::DummyExecutor;
pub use dummy_tensor::DummyTensor;
pub use dynamic_batching::{
    AdaptiveBatcher, BatchRequest, BatchingError, BatchingStats, DynamicBatchConfig,
    DynamicBatcher, Priority, RequestMetadata, RequestQueue,
};
pub use eager::{EagerOp, EagerOps, EagerTape, TlEagerAutodiff, Variable, VariableGrad};
pub use error::ExecutorError;
pub use fusion::{
    FusionCandidate, FusionConfig, FusionCostModel, FusionError, FusionOptimizer, FusionPattern,
    FusionStats, FusionStrategy,
};
pub use gradcheck::{
    compare_gradients, numerical_gradient_central, numerical_gradient_forward, quick_check,
    GradCheckConfig, GradCheckResult, GradientChecker, GradientError,
};
pub use jit::{
    AdaptiveOptimizationPlan, AdaptiveOptimizer, HotPathDetector, JitCache, JitCacheEntry,
    JitCacheStats, JitCompiler, JitConfig, JitEntryStats, JitKey, JitStats, SpecializationContext,
    TlJitExecutor,
};
pub use learned_opt::{
    CostPrediction, FeatureVector, FusionRecommendation, LearnedOptError, LearnedOptimizer,
    LearningStats, LearningStrategy, ModelType, NodeId as LearnedOptNodeId, OptimizationAction,
    RewardSignal, ScheduleRecommendation, TrainingExample,
};
pub use memory::{MemoryEstimate, MemoryEstimator, TensorMemory};
pub use mixed_precision::{
    GradientCheckpoint, LossScaler, LossScalerStats, LossScalingStrategy, MixedPrecisionConfig,
    MixedPrecisionError, MixedPrecisionState, MixedPrecisionStats, PrecisionMode,
};
pub use multimodel::{
    CascadeConfig, CoordinationStats, EnsembleConfig, EnsembleStrategy, ModelMetadata,
    MultiModelCoordinator, MultiModelError, ResourceRequirements, RoutingStrategy,
    TlEnsembleExecutor, TlModelRouter,
};
pub use ops::{ElemOp, ReduceOp};
pub use optimization::{
    FusionOpportunity, FusionPlanner, FusionType, GraphOptimizer, OptimizationResult,
};
pub use parallel::{
    LoadBalanceStats, NumaNode, NumaStrategy, ParallelConfig, ParallelError, SchedulerStats,
    StealStrategy, Task, TaskId, TaskPriority, WorkStealingScheduler,
};
pub use perfregression::{
    BenchmarkBaseline, BenchmarkComparison, BenchmarkConfig, BenchmarkStats, PerfRegression,
    RegressionReport,
};
pub use placement::{Device, PlacementOptimizer, PlacementPlan, PlacementStrategy};
pub use profiling::{
    Bottleneck, BottleneckAnalyzer, BottleneckReport, PerformanceBaseline, PerformanceComparison,
    ProfileData, ProfileStatistics, Profiler, ProfilerHook, TimelineProfiler, TlProfiledExecutor,
    TraceEntry,
};
pub use profiling_optimizer::{
    ExecutionProfile, Hotspot, OptimizationGoal, OptimizationReport, OptimizationStrategy,
    ProfilingOptimizer, ProfilingOptimizerError, TuningConfig,
};
pub use quantization::{
    CalibrationStats, CalibrationStrategy, FakeQuantize, QuantizationConfig, QuantizationError,
    QuantizationGranularity, QuantizationMode, QuantizationParams, QuantizationSummary,
    QuantizationSymmetry, QuantizationType, Quantizer,
};
pub use recovery::{
    Checkpoint, CheckpointManager, DegradationPolicy, FailureInfo, FallbackStrategy,
    RecoveryConfig, RecoveryMetadata, RecoveryResult, RecoveryStats, RecoveryStrategy, RetryPolicy,
    TlRecoverableExecutor,
};
pub use rewrite::{
    CommonRules, Match, NodeId as RewriteNodeId, Pattern, ReplacementFn, RewriteEngine,
    RewriteError, RewriteRule, RewriteStats, RewriteStrategy,
};
pub use scheduling::{ExecutionSchedule, NodeCost, Scheduler, SchedulingStrategy};
pub use shape::{DimSize, ShapeInferenceContext, TensorShape};
pub use simd::{
    AlignedBuffer, CpuArchitecture, SimdCapabilities, SimdError, SimdInstructionSet,
    SimdOptimizationHints,
};
pub use sparse::{
    detect_sparsity, to_sparse_if_beneficial, SparseCOO, SparseCSC, SparseCSR, SparseError,
    SparseFormat, SparseTensor, SparseTensorBuilder,
};
pub use speculative::{
    BranchOutcome, NodeId as SpeculativeNodeId, PredictionStrategy, RollbackPolicy,
    SpeculationStats, SpeculativeError, SpeculativeExecutor, SpeculativeTask,
};
pub use strategy::{
    ExecutionMode, ExecutionStrategy, GradientStrategy, MemoryStrategy, ParallelismStrategy,
    StrategyOptimizer,
};
pub use streaming::{
    ChunkIterator, ChunkMetadata, StreamProcessor, StreamResult, StreamingConfig, StreamingMode,
    TlStreamingExecutor,
};
pub use tensor_view::{
    InPlaceMode, InPlaceOps, SliceSpec, TensorView, TensorViewable, ViewBuilder,
};
pub use traits::{TlAutodiff, TlExecutor};
pub use typesafe::{
    BroadcastShape, Dim, DimMul, DimOp, DimSize as TypesafeDimSize, Dyn, EinsumSpec, FixedShape,
    Matrix, MatrixOps, Nat, Scalar, ShapeConstraint, ShapedTensor, Static, Tensor3D, Tensor4D,
    TensorBuilder, TypedBatch, TypedInputs, TypedOutputs, TypedTensor, TypedTensorOps, Vector, D1,
    D2, D3, D4, D5, D6, S, Z,
};
pub use validation::{GraphValidator, ValidationResult};
pub use visualization::{
    ExportFormat, GraphConfig, GraphVisualizer, TensorStatsVisualizer, TimelineConfig,
    TimelineVisualizer, VisualizationFormat,
};
pub use workspace::{
    AllocationStrategy, DefragmentationResult, SharedWorkspacePool, Workspace, WorkspaceConfig,
    WorkspaceError, WorkspacePool, WorkspaceStats,
};
421};