
// tensorlogic_infer/lib.rs

#![allow(clippy::len_zero)]
#![allow(clippy::field_reassign_with_default)]
#![allow(clippy::manual_range_contains)]
#![allow(clippy::collapsible_if)]
#![allow(clippy::only_used_in_recursion)]
#![allow(clippy::needless_range_loop)]
#![allow(clippy::or_fun_call)]
#![allow(clippy::derivable_impls)]
#![allow(clippy::manual_is_multiple_of)]
#![allow(clippy::overly_complex_bool_expr)]
#![allow(clippy::unwrap_or_default)]

//! Engine-agnostic traits and execution planning API.
//!
//! **Version**: 0.1.0-beta.1 | **Status**: Production Ready
//!
//! This crate defines the abstract execution interfaces and optimization utilities for TensorLogic:
//!
//! ## Core Execution Traits
//! - **TlExecutor**: Core tensor operations (einsum, element-wise, reductions)
//! - **TlAutodiff**: Forward/backward pass for automatic differentiation
//! - **TlEnhancedAutodiff**: Enhanced autodiff with gradient accumulation, clipping, scaling
//! - **TlBatchExecutor**: Batch execution support
//! - **TlStreamingExecutor**: Streaming execution for large datasets
//! - **TlRecoverableExecutor**: Execution with error recovery and checkpointing
//! - **TlCapabilities**: Backend capability queries
//! - **TlProfiledExecutor**: Execution profiling
//! - **TlJitExecutor**: Just-In-Time compilation support
//! - **TlDistributedExecutor**: Distributed multi-device execution
//!
//! ## Optimization Utilities
//! - **GraphOptimizer**: Fusion detection, dead node elimination, redundancy analysis
//! - **FusionPlanner**: Planning and validation of fusion transformations
//! - **Scheduler**: Execution scheduling with multiple strategies (sequential, parallel, cost-based)
//! - **PlacementOptimizer**: Device placement and multi-device coordination
//! - **TensorCache**: Result caching with LRU/FIFO/LFU eviction policies
//! - **MemoryPool**: Tensor memory pooling for allocation reuse
//! - **ExecutionStrategy**: Complete strategy configuration (mode, precision, memory, parallelism)
//! - **ExecutionContext**: State management and lifecycle tracking with hooks
//! - **GraphCompiler**: Ahead-of-time graph compilation with optimization passes
//! - **CompilationCache**: Caching of compiled graphs to avoid recompilation
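//!
//! As a sketch of the LRU eviction policy mentioned above, the following is a minimal,
//! self-contained illustration (the `LruCache` type is hypothetical, not this crate's
//! `TensorCache` API):
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Minimal LRU cache: on overflow, the least-recently-used key is evicted.
//! struct LruCache {
//!     capacity: usize,
//!     map: HashMap<String, Vec<f32>>,
//!     order: Vec<String>, // front = least recently used, back = most recently used
//! }
//!
//! impl LruCache {
//!     fn new(capacity: usize) -> Self {
//!         Self { capacity, map: HashMap::new(), order: Vec::new() }
//!     }
//!     fn get(&mut self, key: &str) -> Option<&Vec<f32>> {
//!         if self.map.contains_key(key) {
//!             // Touch the key: move it to the most-recently-used position.
//!             self.order.retain(|k| k != key);
//!             self.order.push(key.to_string());
//!             self.map.get(key)
//!         } else {
//!             None
//!         }
//!     }
//!     fn put(&mut self, key: &str, value: Vec<f32>) {
//!         if !self.map.contains_key(key) && self.map.len() == self.capacity {
//!             let lru = self.order.remove(0); // evict the least recently used
//!             self.map.remove(&lru);
//!         }
//!         self.order.retain(|k| k != key);
//!         self.order.push(key.to_string());
//!         self.map.insert(key.to_string(), value);
//!     }
//! }
//!
//! let mut cache = LruCache::new(2);
//! cache.put("a", vec![1.0]);
//! cache.put("b", vec![2.0]);
//! cache.get("a");            // "a" becomes most recently used
//! cache.put("c", vec![3.0]); // evicts "b", the least recently used
//! assert!(cache.get("b").is_none());
//! assert!(cache.get("a").is_some());
//! ```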
//!
//! ## JIT Compilation
//! - **JitCompiler**: Runtime compilation with hot path detection
//! - **JitCache**: Specialized caching for JIT-compiled graphs
//! - **HotPathDetector**: Identifies frequently executed code paths
//! - **AdaptiveOptimizer**: Progressively optimizes based on runtime profiling
//!
//! ## Distributed Execution
//! - **DistributedExecutor**: Multi-device execution coordination
//! - **DataParallelCoordinator**: Data-parallel training across devices
//! - **ModelParallelCoordinator**: Model-parallel execution with tensor sharding
//! - **PipelineParallelCoordinator**: Pipeline parallelism across stages
//! - **CommunicationBackend**: Abstract interface for device communication
//!
//! ## Zero-Copy Operations (Beta.1) 🆕
//! - **TensorView**: Zero-copy tensor views and slicing
//! - **SliceSpec**: Flexible slicing specifications
//! - **ViewBuilder**: Ergonomic view construction
//! - **TensorViewable**: Trait for zero-copy tensor operations
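//!
//! The zero-copy idea can be sketched with plain slices: a view stores an offset and
//! per-dimension strides and remaps indices instead of copying data (the `View` type
//! below is hypothetical, not this crate's `TensorView` API):
//!
//! ```rust
//! /// Borrowed 2-D view over a dense row-major buffer; indexing goes
//! /// through offset + strides, so no data is moved or copied.
//! struct View<'a> {
//!     data: &'a [f32],
//!     offset: usize,
//!     shape: [usize; 2],
//!     strides: [usize; 2],
//! }
//!
//! impl<'a> View<'a> {
//!     fn get(&self, i: usize, j: usize) -> f32 {
//!         assert!(i < self.shape[0] && j < self.shape[1]);
//!         self.data[self.offset + i * self.strides[0] + j * self.strides[1]]
//!     }
//! }
//!
//! // A 3x4 row-major matrix stored as one contiguous buffer.
//! let buf: Vec<f32> = (0..12).map(|x| x as f32).collect();
//! // A 2x2 slice of rows 1..3 and columns 1..3, without copying.
//! let v = View { data: &buf, offset: 1 * 4 + 1, shape: [2, 2], strides: [4, 1] };
//! assert_eq!(v.get(0, 0), 5.0);
//! assert_eq!(v.get(1, 1), 10.0);
//! ```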
//!
//! ## Async Execution (Beta.1) 🆕
//! - **TlAsyncExecutor**: Async/await-based non-blocking execution
//! - **TlAsyncBatchExecutor**: Asynchronous batch processing
//! - **TlAsyncStreamExecutor**: Async streaming with backpressure
//! - **AsyncExecutorPool**: Load-balanced executor pool
//!
//! ## Enhanced Diagnostics (Beta.1) 🆕
//! - **Diagnostic**: Rich error messages with suggestions
//! - **DiagnosticCollector**: Error aggregation and reporting
//! - **ShapeMismatchDiagnostic**: Helpful shape error messages
//! - **PerformanceDiagnostic**: Performance issue detection
//!
//! ## Analysis and Validation
//! - **GraphValidator**: Graph validation and diagnostics
//! - **MemoryEstimator**: Memory usage estimation and lifetime analysis
//! - **ShapeInferenceContext**: Tensor shape inference for optimization
//!
//! ## Debugging Utilities
//! - **ExecutionTracer**: Record execution flow through computation graphs
//! - **TensorInspector**: Examine intermediate tensor values and statistics
//! - **BreakpointManager**: Pause execution at specific nodes for inspection
//! - **ExecutionRecorder**: Record full execution history for replay and analysis
//!
//! ## Visualization Utilities
//! - **TimelineVisualizer**: ASCII/DOT/JSON timeline visualization
//! - **GraphVisualizer**: Computation graph visualization
//! - **TensorStatsVisualizer**: Tensor statistics and histograms
//! - **ExportFormat**: Export to various formats for external tools
//!
//! ## Testing and Development
//! - **DummyExecutor**: Minimal implementation for testing and prototyping
//! - **DummyTensor**: Simple tensor representation for tests
//! - **Backend Tests**: Comprehensive test templates for backend validation
//! - **Gradient Checking**: Numerical gradient verification utilities
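//!
//! Numerical gradient checking compares an analytic gradient against a central-difference
//! approximation, `(f(x + h) - f(x - h)) / (2h)`. A self-contained sketch of the idea
//! (illustrative only; not the `gradcheck` module's actual signatures):
//!
//! ```rust
//! fn numerical_gradient(f: impl Fn(f64) -> f64, x: f64, h: f64) -> f64 {
//!     // Central differences are O(h^2) accurate, versus O(h) for forward differences.
//!     (f(x + h) - f(x - h)) / (2.0 * h)
//! }
//!
//! let f = |x: f64| x * x * x;          // f(x) = x^3
//! let analytic = |x: f64| 3.0 * x * x; // f'(x) = 3x^2
//! let x = 2.0;
//! let approx = numerical_gradient(f, x, 1e-5);
//! assert!((approx - analytic(x)).abs() < 1e-6);
//! ```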
//!
//! ## Eager Execution
//! - **TlEagerAutodiff**: Eager mode automatic differentiation
//! - **Variable**: Variables with gradient tracking
//! - **EagerTape**: Dynamic computation graph recording
//!
//! ## Advanced Quantization (Beta.1) 🆕
//! - **Quantizer**: Complete quantization pipeline (QAT/PTQ)
//! - **QuantizationType**: INT8, INT4, INT2, FP8, Binary, Ternary support
//! - **CalibrationStrategy**: Multiple calibration methods (MinMax, Percentile, MSE, KL-divergence)
//! - **FakeQuantize**: Quantization simulation for training
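//!
//! As a sketch of post-training INT8 quantization with min-max calibration
//! (illustrative only; the crate's `Quantizer` API may differ), real values map to
//! integers via `real ≈ scale * (q - zero_point)`:
//!
//! ```rust
//! /// Quantize with asymmetric min-max calibration; returns (codes, scale, zero_point).
//! fn quantize(values: &[f32]) -> (Vec<u8>, f32, u8) {
//!     let min = values.iter().cloned().fold(f32::INFINITY, f32::min);
//!     let max = values.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
//!     let scale = (max - min) / 255.0; // assumes max > min
//!     let zero_point = (-min / scale).round() as u8;
//!     let q = values
//!         .iter()
//!         .map(|&v| ((v / scale).round() + zero_point as f32).clamp(0.0, 255.0) as u8)
//!         .collect();
//!     (q, scale, zero_point)
//! }
//!
//! fn dequantize(q: &[u8], scale: f32, zero_point: u8) -> Vec<f32> {
//!     q.iter().map(|&c| scale * (c as f32 - zero_point as f32)).collect()
//! }
//!
//! let values = [-1.0f32, 0.0, 0.5, 1.0];
//! let (q, scale, zp) = quantize(&values);
//! // Round-trip error is bounded by about half a quantization step per element.
//! for (a, b) in values.iter().zip(dequantize(&q, scale, zp)) {
//!     assert!((a - b).abs() <= scale / 2.0 + 1e-6);
//! }
//! ```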
//!
//! ## Dynamic Batching (Beta.1) 🆕
//! - **DynamicBatcher**: Adaptive request batching with priority queues
//! - **RequestQueue**: Priority-based queuing (Low/Normal/High/Critical)
//! - **AdaptiveBatcher**: Automatic batch size optimization
//! - **BatchingStats**: Comprehensive throughput and latency metrics
//!
//! ## Advanced Kernel Fusion (Beta.1) 🆕
//! - **FusionOptimizer**: Pattern-based fusion detection and optimization
//! - **FusionStrategy**: Conservative/Aggressive/Balanced/Memory-aware modes
//! - **FusionCostModel**: Memory bandwidth-aware cost modeling
//! - **FusionPattern**: Common patterns (MatMul+Bias, MatMul+Activation, etc.)
//!
//! ## Workspace Management (Beta.1) 🆕
//! - **WorkspacePool**: Memory pool with multiple allocation strategies
//! - **SharedWorkspacePool**: Thread-safe workspace sharing
//! - **AllocationStrategy**: BestFit/FirstFit/ExactFit/PowerOfTwo
//! - **WorkspaceStats**: Efficiency metrics and hit rate tracking
//!
//! ## Multi-Model Coordination (Beta.1) 🆕
//! - **MultiModelCoordinator**: Ensemble and multi-model management
//! - **EnsembleStrategy**: Averaging/Voting/Stacking/Boosting
//! - **RoutingStrategy**: Priority/Latency/Accuracy-based model selection
//! - **CascadeConfig**: Early-exit model cascades
//!
//! ## Mixed Precision Training (Beta.1) 🆕
//! - **MixedPrecisionConfig**: FP16/BF16/FP8 configuration
//! - **LossScaler**: Automatic loss scaling with dynamic adjustment
//! - **PrecisionMode**: Multiple precision modes (FP32/FP16/BF16/FP8/FP64)
//! - **GradientCheckpoint**: Memory-efficient gradient checkpointing
//! - **MixedPrecisionState**: Complete training state management
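//!
//! Dynamic loss scaling keeps low-precision gradients in range: grow the scale after a
//! run of stable steps, halve it and skip the update when gradients overflow. A minimal
//! sketch (the `Scaler` type is hypothetical, not this crate's `LossScaler`):
//!
//! ```rust
//! struct Scaler {
//!     scale: f32,
//!     growth_interval: u32, // stable steps required before growing
//!     stable_steps: u32,
//! }
//!
//! impl Scaler {
//!     /// Returns true when the optimizer step should be applied.
//!     fn update(&mut self, grads_finite: bool) -> bool {
//!         if grads_finite {
//!             self.stable_steps += 1;
//!             if self.stable_steps >= self.growth_interval {
//!                 self.scale *= 2.0; // grow after a stable run
//!                 self.stable_steps = 0;
//!             }
//!             true
//!         } else {
//!             self.scale *= 0.5;     // overflow: back off and skip this step
//!             self.stable_steps = 0;
//!             false
//!         }
//!     }
//! }
//!
//! let mut s = Scaler { scale: 65536.0, growth_interval: 2000, stable_steps: 0 };
//! assert!(!s.update(false));        // overflow: step skipped, scale halved
//! assert_eq!(s.scale, 32768.0);
//! for _ in 0..2000 { s.update(true); }
//! assert_eq!(s.scale, 65536.0);     // grew back after 2000 stable steps
//! ```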
//!
//! ## Sparse Tensor Support (Beta.1) 🆕
//! - **SparseTensor**: CSR/CSC/COO sparse formats
//! - **SparseCSR**: Compressed Sparse Row format
//! - **SparseCSC**: Compressed Sparse Column format
//! - **SparseCOO**: Coordinate format for construction
//! - **Automatic sparsity detection**: Convert dense to sparse when beneficial
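//!
//! For reference, the CSR layout stores only non-zeros: one `values`/`col_indices` pair
//! per non-zero, plus a `row_offsets` array with one entry per row (plus one). A
//! self-contained sketch of dense-to-CSR conversion (hypothetical types, not this
//! crate's `SparseCSR`):
//!
//! ```rust
//! struct Csr {
//!     values: Vec<f32>,
//!     col_indices: Vec<usize>,
//!     row_offsets: Vec<usize>, // row i occupies values[row_offsets[i]..row_offsets[i + 1]]
//! }
//!
//! fn to_csr(dense: &[Vec<f32>]) -> Csr {
//!     let mut csr = Csr { values: vec![], col_indices: vec![], row_offsets: vec![0] };
//!     for row in dense {
//!         for (j, &v) in row.iter().enumerate() {
//!             if v != 0.0 {
//!                 csr.values.push(v);
//!                 csr.col_indices.push(j);
//!             }
//!         }
//!         csr.row_offsets.push(csr.values.len());
//!     }
//!     csr
//! }
//!
//! let dense = vec![
//!     vec![1.0, 0.0, 0.0],
//!     vec![0.0, 0.0, 2.0],
//!     vec![0.0, 3.0, 0.0],
//! ];
//! let csr = to_csr(&dense);
//! assert_eq!(csr.values, vec![1.0, 2.0, 3.0]);
//! assert_eq!(csr.col_indices, vec![0, 2, 1]);
//! assert_eq!(csr.row_offsets, vec![0, 1, 2, 3]);
//! ```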
//!
//! ## Parallel Execution (Beta.1) 🆕
//! - **WorkStealingScheduler**: Dynamic load balancing scheduler
//! - **Task**: Parallel task with dependencies and priorities
//! - **StealStrategy**: Multiple work-stealing strategies
//! - **NumaStrategy**: NUMA-aware memory allocation
//! - **LoadBalanceStats**: Load balancing metrics
//!
//! ## SIMD Optimizations (Beta.1) 🆕
//! - **SimdCapabilities**: Platform detection (AVX2/AVX-512/NEON/SVE)
//! - **AlignedBuffer**: SIMD-aligned memory allocations
//! - **SimdInstructionSet**: Instruction set abstractions
//! - **SimdOptimizationHints**: Compiler optimization hints
//!
//! ## Graph Rewriting (Beta.1) 🆕
//! - **RewriteEngine**: Pattern-based graph transformations
//! - **Pattern**: Flexible pattern matching DSL
//! - **RewriteRule**: Custom rewrite rules
//! - **CommonRules**: Standard optimization rules (constant folding, etc.)
//! - **RewriteStrategy**: Application strategies (exhaustive, fixed-point, etc.)
//!
//! ## Profiling-Guided Optimization (Beta.1) 🆕
//! - **ProfilingOptimizer**: Adaptive performance tuning
//! - **ExecutionProfile**: Runtime performance metrics
//! - **Hotspot**: Performance bottleneck detection
//! - **OptimizationGoal**: Optimization objectives (latency, throughput, memory)
//! - **Auto-tuning**: Automatic configuration selection
//!
//! ## Cache Optimization (Beta.1) 🆕
//! - **CacheOptimizer**: Memory hierarchy aware optimization
//! - **CacheConfig**: L1/L2/L3 cache configuration
//! - **TilingParams**: Loop tiling for cache efficiency
//! - **CacheMetrics**: Cache performance estimation
//! - **DataLayout**: Cache-friendly data arrangements
//!
//! ## Automatic Parallelization (Experimental) 🧪
//! - **AutoParallelizer**: Automatic detection of parallelism opportunities
//! - **ParallelizationAnalysis**: Analysis of parallel execution potential
//! - **ParallelExecutionPlan**: Generated parallel execution plans
//! - **WorkPartition**: Work distribution across workers
//! - **Cost modeling**: Estimate execution costs and communication overhead
//!
//! ## Speculative Execution (Experimental) 🧪
//! - **SpeculativeExecutor**: Branch prediction and speculative execution
//! - **PredictionStrategy**: Multiple prediction strategies
//! - **RollbackPolicy**: Handling mispredictions
//! - **SpeculationStats**: Track speculation success rates
//! - **Adaptive learning**: Learn from prediction outcomes
//!
//! ## Learned Optimizations (Experimental) 🧪
//! - **LearnedOptimizer**: ML-based optimization decisions
//! - **LearningStrategy**: Supervised, reinforcement, online learning
//! - **CostPrediction**: Learned cost models
//! - **FusionRecommendation**: ML-based fusion decisions
//! - **Reinforcement learning**: Q-learning for scheduling

pub mod async_exec;
pub mod auto_parallel;
pub mod autodiff;
pub mod backend_tests;
pub mod batch;
pub mod cache;
pub mod cache_optimizer;
pub mod capabilities;
pub mod compilation;
pub mod context;
pub mod debug;
pub mod diagnostics;
pub mod distributed;
mod dummy_executor;
mod dummy_tensor;
pub mod dynamic_batching;
pub mod eager;
mod error;
pub mod fusion;
pub mod gradcheck;
pub mod jit;
pub mod learned_opt;
pub mod memory;
pub mod mixed_precision;
pub mod multimodel;
mod ops;
pub mod optimization;
pub mod parallel;
pub mod perfregression;
pub mod placement;
pub mod profiling;
pub mod profiling_optimizer;
pub mod quantization;
pub mod recovery;
pub mod rewrite;
pub mod scheduling;
pub mod shape;
pub mod simd;
pub mod sparse;
pub mod speculative;
pub mod strategy;
pub mod streaming;
pub mod tensor_view;
mod traits;
pub mod typesafe;
pub mod validation;
pub mod visualization;
pub mod workspace;

#[cfg(test)]
mod tests;

#[cfg(test)]
mod validation_tests;

#[cfg(test)]
mod memory_tests;

#[cfg(feature = "async")]
pub use async_exec::{
    AsyncConfig, AsyncExecutionError, AsyncExecutionHandle, AsyncExecutorPool, AsyncStats,
    AsyncStreamResults, BoxFuture, TlAsyncBatchExecutor, TlAsyncExecutor, TlAsyncStreamExecutor,
};
pub use auto_parallel::{
    AutoParallelError, AutoParallelizer, CostModel as AutoParallelCostModel, DependencyType,
    NodeId as AutoParallelNodeId, NodeInfo, ParallelExecutionPlan, ParallelStage,
    ParallelizationAnalysis, ParallelizationStrategy, WorkPartition,
};
pub use autodiff::{
    AccumulationConfig, ClippingStrategy, CustomGradientRegistry, GradientAccumulationStrategy,
    GradientAccumulator, GradientClipper, GradientConfig, GradientScaler, GradientScaling,
    GradientStats, TlEnhancedAutodiff,
};
pub use backend_tests::{
    assert_vec_close, print_test_summary, run_all_basic_tests, run_all_performance_tests,
    test_backend_edge_cases, test_backend_einsum, test_backend_elem_binary,
    test_backend_elem_unary, test_backend_forward, test_backend_large_tensors,
    test_backend_memory_efficiency, test_backend_reduce, test_backend_shapes, BackendTestAdapter,
    TestResult, DEFAULT_TOLERANCE,
};
pub use batch::{BatchResult, TlBatchExecutor};
pub use cache::{CacheKey, CacheStats, EvictionPolicy, MemoryPool, PoolStats, TensorCache};
pub use cache_optimizer::{
    AccessPattern, CacheConfig, CacheLevel, CacheMetrics, CacheOptimizer, CacheOptimizerError,
    DataLayout, OptimizationStats as CacheOptimizationStats, TilingParams,
};
pub use capabilities::{BackendCapabilities, DType, DeviceType, Feature, TlCapabilities};
pub use compilation::{
    CacheStats as CompilationCacheStats, CompilationCache, CompilationConfig, CompilationKey,
    CompilationStats, CompiledGraph, GraphCompiler, OptimizationLevel, TlCompilableExecutor,
};
pub use context::{ExecutionContext, ExecutionHook, ExecutionPhase, ExecutionState, LoggingHook};
pub use debug::{
    Breakpoint, BreakpointHit, BreakpointManager, ExecutionRecorder, ExecutionReport,
    ExecutionTrace, ExecutionTracer, OperationHandle, TensorInspector, TensorStats,
    TraceEntry as DebugTraceEntry, TraceSummary,
};
pub use diagnostics::{
    Diagnostic, DiagnosticCollector, MemoryDiagnostic, NodeExecutionDiagnostic,
    PerformanceDiagnostic, Severity, ShapeMismatchDiagnostic, SourceLocation,
    TypeMismatchDiagnostic,
};
pub use distributed::{
    CommunicationBackend, CommunicationOp, DataParallelCoordinator, DistributedConfig,
    DistributedExecutor, DistributedPlacementPlan, DistributedStats, DummyCommunicationBackend,
    ModelParallelCoordinator, ParallelismStrategy as DistributedParallelismStrategy,
    PipelineParallelCoordinator, ReductionOp, ShardingSpec, TlDistributedExecutor,
};
pub use dummy_executor::DummyExecutor;
pub use dummy_tensor::DummyTensor;
pub use dynamic_batching::{
    AdaptiveBatcher, BatchRequest, BatchingError, BatchingStats, DynamicBatchConfig,
    DynamicBatcher, Priority, RequestMetadata, RequestQueue,
};
pub use eager::{EagerOp, EagerOps, EagerTape, TlEagerAutodiff, Variable, VariableGrad};
pub use error::ExecutorError;
pub use fusion::{
    FusionCandidate, FusionConfig, FusionCostModel, FusionError, FusionOptimizer, FusionPattern,
    FusionStats, FusionStrategy,
};
pub use gradcheck::{
    compare_gradients, numerical_gradient_central, numerical_gradient_forward, quick_check,
    GradCheckConfig, GradCheckResult, GradientChecker, GradientError,
};
pub use jit::{
    AdaptiveOptimizationPlan, AdaptiveOptimizer, HotPathDetector, JitCache, JitCacheEntry,
    JitCacheStats, JitCompiler, JitConfig, JitEntryStats, JitKey, JitStats, SpecializationContext,
    TlJitExecutor,
};
pub use learned_opt::{
    CostPrediction, FeatureVector, FusionRecommendation, LearnedOptError, LearnedOptimizer,
    LearningStats, LearningStrategy, ModelType, NodeId as LearnedOptNodeId, OptimizationAction,
    RewardSignal, ScheduleRecommendation, TrainingExample,
};
pub use memory::{MemoryEstimate, MemoryEstimator, TensorMemory};
pub use mixed_precision::{
    GradientCheckpoint, LossScaler, LossScalerStats, LossScalingStrategy, MixedPrecisionConfig,
    MixedPrecisionError, MixedPrecisionState, MixedPrecisionStats, PrecisionMode,
};
pub use multimodel::{
    CascadeConfig, CoordinationStats, EnsembleConfig, EnsembleStrategy, ModelMetadata,
    MultiModelCoordinator, MultiModelError, ResourceRequirements, RoutingStrategy,
    TlEnsembleExecutor, TlModelRouter,
};
pub use ops::{ElemOp, ReduceOp};
pub use optimization::{
    FusionOpportunity, FusionPlanner, FusionType, GraphOptimizer, OptimizationResult,
};
pub use parallel::{
    LoadBalanceStats, NumaNode, NumaStrategy, ParallelConfig, ParallelError, SchedulerStats,
    StealStrategy, Task, TaskId, TaskPriority, WorkStealingScheduler,
};
pub use perfregression::{
    BenchmarkBaseline, BenchmarkComparison, BenchmarkConfig, BenchmarkStats, PerfRegression,
    RegressionReport,
};
pub use placement::{Device, PlacementOptimizer, PlacementPlan, PlacementStrategy};
pub use profiling::{
    Bottleneck, BottleneckAnalyzer, BottleneckReport, PerformanceBaseline, PerformanceComparison,
    ProfileData, ProfileStatistics, Profiler, ProfilerHook, TimelineProfiler, TlProfiledExecutor,
    TraceEntry,
};
pub use profiling_optimizer::{
    ExecutionProfile, Hotspot, OptimizationGoal, OptimizationReport, OptimizationStrategy,
    ProfilingOptimizer, ProfilingOptimizerError, TuningConfig,
};
pub use quantization::{
    CalibrationStats, CalibrationStrategy, FakeQuantize, QuantizationConfig, QuantizationError,
    QuantizationGranularity, QuantizationMode, QuantizationParams, QuantizationSummary,
    QuantizationSymmetry, QuantizationType, Quantizer,
};
pub use recovery::{
    Checkpoint, CheckpointManager, DegradationPolicy, FailureInfo, FallbackStrategy,
    RecoveryConfig, RecoveryMetadata, RecoveryResult, RecoveryStats, RecoveryStrategy, RetryPolicy,
    TlRecoverableExecutor,
};
pub use rewrite::{
    CommonRules, Match, NodeId as RewriteNodeId, Pattern, ReplacementFn, RewriteEngine,
    RewriteError, RewriteRule, RewriteStats, RewriteStrategy,
};
pub use scheduling::{ExecutionSchedule, NodeCost, Scheduler, SchedulingStrategy};
pub use shape::{DimSize, ShapeInferenceContext, TensorShape};
pub use simd::{
    AlignedBuffer, CpuArchitecture, SimdCapabilities, SimdError, SimdInstructionSet,
    SimdOptimizationHints,
};
pub use sparse::{
    detect_sparsity, to_sparse_if_beneficial, SparseCOO, SparseCSC, SparseCSR, SparseError,
    SparseFormat, SparseTensor, SparseTensorBuilder,
};
pub use speculative::{
    BranchOutcome, NodeId as SpeculativeNodeId, PredictionStrategy, RollbackPolicy,
    SpeculationStats, SpeculativeError, SpeculativeExecutor, SpeculativeTask,
};
pub use strategy::{
    ExecutionMode, ExecutionStrategy, GradientStrategy, MemoryStrategy, ParallelismStrategy,
    StrategyOptimizer,
};
pub use streaming::{
    ChunkIterator, ChunkMetadata, StreamProcessor, StreamResult, StreamingConfig, StreamingMode,
    TlStreamingExecutor,
};
pub use tensor_view::{
    InPlaceMode, InPlaceOps, SliceSpec, TensorView, TensorViewable, ViewBuilder,
};
pub use traits::{TlAutodiff, TlExecutor};
pub use typesafe::{
    BroadcastShape, Dim, DimMul, DimOp, DimSize as TypesafeDimSize, Dyn, EinsumSpec, FixedShape,
    Matrix, MatrixOps, Nat, Scalar, ShapeConstraint, ShapedTensor, Static, Tensor3D, Tensor4D,
    TensorBuilder, TypedBatch, TypedInputs, TypedOutputs, TypedTensor, TypedTensorOps, Vector, D1,
    D2, D3, D4, D5, D6, S, Z,
};
pub use validation::{GraphValidator, ValidationResult};
pub use visualization::{
    ExportFormat, GraphConfig, GraphVisualizer, TensorStatsVisualizer, TimelineConfig,
    TimelineVisualizer, VisualizationFormat,
};
pub use workspace::{
    AllocationStrategy, DefragmentationResult, SharedWorkspacePool, Workspace, WorkspaceConfig,
    WorkspaceError, WorkspacePool, WorkspaceStats,
};