§TenfloweRS Automatic Differentiation
tenflowers-autograd provides a comprehensive automatic differentiation engine for the TenfloweRS
machine learning framework. This crate implements both forward-mode and reverse-mode automatic
differentiation with support for higher-order derivatives, custom gradients, and advanced
optimization techniques.
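The essence of the reverse-mode half can be shown with a toy scalar tape, independent of this crate: record each operation during the forward pass, then replay the records backward, accumulating adjoints. All names below (`Tape`, `Op`, etc.) are illustrative only, not this crate's API.

```rust
// A toy reverse-mode tape: records scalar ops, then replays them backward.
#[derive(Clone, Copy)]
enum Op { Add(usize, usize), Mul(usize, usize) }

#[derive(Default)]
struct Tape { vals: Vec<f32>, ops: Vec<Option<Op>> }

impl Tape {
    // Register an input variable (no producing op).
    fn watch(&mut self, v: f32) -> usize {
        self.vals.push(v); self.ops.push(None); self.vals.len() - 1
    }
    fn add(&mut self, a: usize, b: usize) -> usize {
        self.vals.push(self.vals[a] + self.vals[b]);
        self.ops.push(Some(Op::Add(a, b))); self.vals.len() - 1
    }
    fn mul(&mut self, a: usize, b: usize) -> usize {
        self.vals.push(self.vals[a] * self.vals[b]);
        self.ops.push(Some(Op::Mul(a, b))); self.vals.len() - 1
    }
    // Seed d(out)/d(out) = 1, then propagate adjoints in reverse order.
    fn gradient(&self, out: usize) -> Vec<f32> {
        let mut adj = vec![0.0; self.vals.len()];
        adj[out] = 1.0;
        for i in (0..self.vals.len()).rev() {
            match self.ops[i] {
                Some(Op::Add(a, b)) => { adj[a] += adj[i]; adj[b] += adj[i]; }
                Some(Op::Mul(a, b)) => {
                    adj[a] += adj[i] * self.vals[b];
                    adj[b] += adj[i] * self.vals[a];
                }
                None => {}
            }
        }
        adj
    }
}

fn main() {
    let mut t = Tape::default();
    let x = t.watch(3.0);
    let y = t.watch(4.0);
    let z = t.mul(x, y);   // z = x * y
    let w = t.add(z, x);   // w = x*y + x
    let g = t.gradient(w);
    println!("dw/dx = {}, dw/dy = {}", g[x], g[y]); // dw/dx = y + 1 = 5, dw/dy = x = 3
}
```

The real `GradientTape` follows the same record-then-replay pattern, but over tensors, devices, and a full operation set.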
§Features
- Complete Gradient Operations: All fundamental tensor operations with mathematically correct gradients
- Higher-Order Derivatives: Efficient computation of Hessians, third-order derivatives, and beyond
- Performance Optimization: Kernel fusion, memory optimization, and distributed gradient computation
- Advanced Differentiation: Mixed-mode AD, implicit differentiation, and custom gradient functions
- Neural Network Integration: Seamless integration with tenflowers-neural for deep learning
- Distributed Training: Parameter servers, gradient compression, and cross-datacenter replication
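Forward-mode AD, one of the two modes listed above, propagates derivatives alongside values using dual numbers. A minimal standalone sketch (independent of the crate's `DualTensor` type; the `Dual` type here is hypothetical):

```rust
// Dual number: value and derivative carried together through each op.
#[derive(Clone, Copy)]
struct Dual { val: f32, eps: f32 }

impl Dual {
    // Seed the input variable with d/dx = 1.
    fn var(v: f32) -> Self { Dual { val: v, eps: 1.0 } }
    fn add(self, o: Dual) -> Dual {
        Dual { val: self.val + o.val, eps: self.eps + o.eps }
    }
    fn mul(self, o: Dual) -> Dual {
        // Product rule: (uv)' = u'v + uv'
        Dual { val: self.val * o.val, eps: self.eps * o.val + self.val * o.eps }
    }
    fn sin(self) -> Dual {
        // Chain rule: (sin u)' = u' * cos(u)
        Dual { val: self.val.sin(), eps: self.eps * self.val.cos() }
    }
}

// f(x) = x^2 + sin(x); one forward pass yields f'(x) = 2x + cos(x) as well.
fn f(x: Dual) -> Dual { x.mul(x).add(x.sin()) }

fn main() {
    let y = f(Dual::var(0.0));
    println!("f(0) = {}, f'(0) = {}", y.val, y.eps); // f'(0) = 2*0 + cos(0) = 1
}
```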
§Quick Start
use tenflowers_autograd::{GradientTape, TrackedTensor};
use tenflowers_core::{Tensor, Device};
let device = Device::Cpu;
let mut tape = GradientTape::new();
// Create tracked tensors
let x = tape.watch(Tensor::<f32>::ones(&[2, 2]));
let y = tape.watch(Tensor::<f32>::ones(&[2, 2]));
// Compute gradients with GradientTape::gradient
// (a freshly watched tensor stands in here for the x + y result)
let z = tape.watch(Tensor::<f32>::ones(&[2, 2]));
let gradients = tape.gradient(&[z], &[x, y])?;
println!("Gradient of x: {:?}", gradients[0]);
§Advanced Usage
§Custom Gradients
use tenflowers_autograd::{CustomGradientFunction, GradientTape};
use tenflowers_core::{Tensor, Result};
struct MyCustomOp;

impl CustomGradientFunction<f32> for MyCustomOp {
    fn forward(&self, inputs: &[&Tensor<f32>]) -> Result<Tensor<f32>> {
        // Custom forward implementation: y = x^2 + sin(x)
        let x = inputs[0];
        let x_squared = tenflowers_core::ops::mul(x, x)?;
        let sin_x = tenflowers_core::ops::sin(x)?;
        tenflowers_core::ops::add(&x_squared, &sin_x)
    }

    fn backward(&self, grad_output: &Tensor<f32>, inputs: &[&Tensor<f32>], output: &Tensor<f32>) -> Result<Vec<Tensor<f32>>> {
        // Custom backward implementation: dy/dx = 2x + cos(x)
        let x = inputs[0];
        let two = tenflowers_core::Tensor::from_array(scirs2_core::ndarray::arr0(2.0f32).into_dyn());
        let two_x = tenflowers_core::ops::mul(&two, x)?;
        let cos_x = tenflowers_core::ops::cos(x)?;
        let grad_x = tenflowers_core::ops::add(&two_x, &cos_x)?;
        // Scale the local derivative by the incoming gradient (chain rule).
        let final_grad = tenflowers_core::ops::mul(grad_output, &grad_x)?;
        Ok(vec![final_grad])
    }

    fn name(&self) -> &str {
        "MyCustomOp"
    }
}
§Higher-Order Derivatives
use tenflowers_autograd::{GradientTape, TrackedTensor};
use tenflowers_core::Tensor;
let mut tape = GradientTape::new();
let x = TrackedTensor::new(Tensor::<f32>::ones(&[1]));
let target = TrackedTensor::new(Tensor::<f32>::ones(&[1]));
// Compute third-order derivatives
let third_order = tape.third_derivative(&target, &x)?;
// Compute nth-order derivatives
let nth_order = tape.nth_derivative(&target, &x, 3)?;
§Performance Features
- Kernel Fusion: Automatically fuses operations to reduce memory bandwidth
- Gradient Compression: Quantization and sparsification for distributed training
- Memory Optimization: Checkpointing and in-place operations for large models
- JIT Compilation: Runtime kernel optimization for specific tensor shapes
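The gradient-compression bullet can be made concrete with a standalone top-k sparsification sketch. This is a simplified stand-in for what `GradientCompressor` does, not its actual API; the helper name is hypothetical.

```rust
// Keep only the k largest-magnitude gradient entries; drop the rest.
// Returns (index, value) pairs, the sparse form typically sent over the wire.
fn topk_sparsify(grad: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut idx: Vec<usize> = (0..grad.len()).collect();
    // Order indices by descending |g| so the most significant entries survive.
    idx.sort_by(|&a, &b| grad[b].abs().partial_cmp(&grad[a].abs()).unwrap());
    idx.into_iter().take(k).map(|i| (i, grad[i])).collect()
}

fn main() {
    let grad = [0.1, -2.0, 0.05, 0.7];
    let sparse = topk_sparsify(&grad, 2);
    println!("{:?}", sparse); // [(1, -2.0), (3, 0.7)]
}
```

In distributed training the dropped residual is usually accumulated locally and added back into the next step's gradient, so no signal is permanently lost.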
§Integration with TenfloweRS Ecosystem
This crate integrates seamlessly with:
- tenflowers-core: Core tensor operations and device management
- tenflowers-neural: Neural network layers and training loops
- tenflowers-dataset: Data loading and preprocessing
- scirs2-autograd: Static graph optimization and analysis
Re-exports§
pub use boolean_indexing::{boolean_mask_backward, integer_array_indexing_backward, where_backward};
pub use checkpointing::{checkpoint_sequence, ActivationCheckpointPolicy, ActivationCheckpointing, ActivationRecomputeManager, CheckpointManager, CheckpointStrategy, CheckpointedFunction, CheckpointedGradientTape, CheckpointingStats, LayerMetadata, RecomputationContext};
pub use context::{AutogradContext, ShapeInferenceRule, StaticShapeInference};
pub use coverage_matrix::{CategoryCoverage, CoverageMatrix, CoverageReport, OperationCategory, OperationMetadata};
pub use custom_gradients::{CustomGradientFunction, CustomGradientOp, GradientClipFunction, GradientScaleFunction, StopGradientFunction};
pub use debug::{GradientDebugInfo, GradientDebugger};
pub use deterministic::{clear_operation_seeds, get_global_seed, get_operation_seed, get_seeded_operation_count, hash_tensor_data, is_deterministic, reset_deterministic_state, set_deterministic, set_global_seed, set_operation_seed, DeterministicConfig, DeterministicContext, DeterministicOperation, ReproducibilityChecker, ReproducibilityStats, SeedManager};
pub use device_placement::{DevicePlacementConfig, DevicePlacementOptimizer, GraphOperation, PlacementDecision, PlacementResult, PlacementStrategy};
pub use ellipsis_newaxis::{ellipsis_newaxis_backward, AdvancedIndexer, IndexSpec};
pub use error_taxonomy::{utils as error_utils, AutogradErrorBuilder, ErrorPatternValidator, GradientContext, ValidationResult};
pub use forward_ad::{forward_ops, DualTensor, ForwardADContext, ForwardMode};
pub use forward_reverse::{ComplexityEstimate, DifferentiationMode, ForwardReverseConfig, ForwardReverseDifferentiator};
pub use global_pooling::{adaptive_avg_pool2d_backward, adaptive_max_pool2d_backward, fractional_adaptive_avg_pool2d_backward, global_avg_pool2d_backward, global_max_pool2d_backward};
pub use gpu_gradient_expansion::{GpuCategoryCoverage, GpuCoverageAnalysis, GpuGradInfo, GpuGradStatus, GpuGradientPlanner, ImplementationPlan, ImplementationTask, Priority};
pub use grad_ops::{batch_fused_activations_forward_backward, fused_gelu_forward_backward, fused_log_softmax_forward_backward, fused_tanh_forward_backward};
pub use gradient_accumulation::{accumulate_gradients_over_batch, GradientAccumulator};
pub use gradient_buffer_manager_simple::{global_gradient_buffer_manager, AllocationMetrics, EfficiencyMetrics, GradientBufferAllocation, GradientBufferConfig, GradientBufferManager, GradientMemoryStatistics, MemoryPressureStatistics};
pub use gradient_compression::{CompressedGradient, CompressionConfig, CompressionMethod, CompressionStats, GradientCompressor};
pub use gradient_ops::{accumulate_gradients, add_gradient_noise, average_gradients, clip_by_global_norm, clip_by_value, compute_gradient_statistics, has_invalid_gradients, scale_gradients, zero_gradients, GradientPipeline, GradientStatistics, NamedGradientAccumulator};
pub use gradient_visualization::{ColorScheme, EdgeType, GradientFlowAnalysis, GradientFlowEdge, GradientFlowIssue, GradientFlowNode, GradientFlowVisualizer, GradientStats, IssueType, LayoutAlgorithm, NodeType, OutputFormat, Severity, ValueStats, VisualizationSettings};
pub use graph_optimization::{CommunicationPlan, EnhancedGraphOptimizer, GradientFusion, GraphOptimizationConfig, GraphOptimizationResult, MemoryOptimization};
pub use hybrid_scheduler::{ExecutionStats, ExecutionSummary, GraphAnalysis, HybridScheduler, SchedulerConfig, StrategyCost};
pub use implicit_differentiation::{FixedPointFunction, GradientInfo, ImplicitDiffConfig, ImplicitDifferentiator, ImplicitFunction, OptimizationLayer};
pub use inplace_ops::{InPlaceOptimizer, InPlaceSequenceOptimizer};
pub use jit_compiler::{CompiledKernel, DeviceFeatures, GradientKernelTemplate, JitCompiler, KernelPerformance, KernelSignature, OptimizationLevel};
pub use jit_integration::{utils as jit_utils, JitConfig, JitGradientContext, JitGradientTapeExt};
pub use kernel_fusion::{FusableOp, FusedKernel, FusionStats, KernelFusionOptimizer, OpSequence};
pub use memory_diff_reporter::{MemoryDiff, MemoryDiffReporter, MemorySnapshot};
pub use memory_profiler::{get_global_profiler, GradientMemoryProfiler, MemoryReport, MemoryStats};
pub use neural_integration::{AutogradLayer, AutogradOptimizer, AutogradTrainer, OptimizerType, TrainingMetrics};
pub use no_grad::{enable_grad, is_grad_enabled, no_grad, set_grad_enabled, EnableGradGuard, NoGradGuard};
pub use numerical_checker::{CheckerConfig, ErrorAnalysis, FiniteDifferenceMethod, GradientCheckResult, NumericalChecker};
pub use parameter_server::{FaultToleranceMode, LoadBalancingStrategy, ParameterServer, ParameterServerClient, ParameterServerConfig, ParameterServerStats};
pub use performance_benchmark::{BenchmarkConfig, BenchmarkReport, BenchmarkResult, BenchmarkStatistics, BenchmarkSummary, ComparisonResult, PerformanceBenchmark, RegressionReport, RegressionSeverity, ThroughputMetrics};
pub use simd_grad_ops_simple::{global_simd_grad_ops, SimdGradConfig, SimdGradOps, SimdPerformanceMetrics};
pub use special_functions::{bessel_j0_backward, bessel_j1_backward, beta_backward, digamma_backward, erf_backward, erfc_backward, gamma_backward, lgamma_backward};
pub use subgraph_extraction::{ExtractionStrategy, Subgraph, SubgraphConfig, SubgraphExtractionResult, SubgraphExtractor, SubgraphOperation};
pub use tape::{GradientTape, Operation, TapeNode, TrackedTensor};
pub use tape_optimization::{TapeOptimizationConfig, TapeOptimizationStats, TapeOptimizer};
pub use tensor_ext::TensorAutograd;
pub use tensor_networks::{ContractionEdge, ContractionPath, ContractionStep, ContractionStrategy, TensorNetwork, TensorNetworkGradient, TensorNetworkNode, TensorNetworkOptimizer};
pub use ultra_gradient_engine_simple::{global_ultra_gradient_engine, GradientMemoryStats, GradientPerformanceMetrics, OptimizationInsights, UltraGradientConfig, UltraGradientEngine, UltraGradientResult, UltraGradientTapeExt};
pub use advanced_grad_ops::{gradient_clipping, higher_order as advanced_higher_order, jacobian, optimization, AdaptiveGradientAccumulator};
pub use amp_policy::{AMPConfig, AMPPolicy, AMPStabilityMetrics, ScaleAdjustment, ScaleAdjustmentReason};
pub use efficient_memory::{AggregationStats, CheckpointStats, GradientCheckpointer, GradientMemoryManager, GradientMemoryPool, LazyGradient, MemoryManagerStats, MemoryPoolStats, StreamingGradientAggregator};
pub use gradient_analyzer::{AnalysisConfig, GradientAnalysisReport, GradientAnalyzer, GradientFlowAnalysis as AdvancedGradientFlowAnalysis, GradientIssue, GradientStatistics as AnalyzerGradientStatistics, PerformanceMetrics};
pub use second_order_utils::{compute_hessian, compute_hessian_diagonal, compute_jacobian, compute_laplacian, directional_second_derivative, hessian_vector_product};
Modules§
- advanced_grad_ops - Advanced Gradient Operations
- amp_policy - Automatic Mixed Precision (AMP) Policy for Gradient Computation
- boolean_indexing
- checkpointing
- context
- coverage_matrix - Gradient Coverage Matrix Generator
- custom_gradients
- debug
- deterministic - Deterministic Execution Mode
- device_placement - Device Placement Optimization
- efficient_memory - Memory-Efficient Gradient Computation
- ellipsis_newaxis
- error_taxonomy - Error Taxonomy Alignment for Autograd
- forward_ad
- forward_reverse - Forward-Reverse Mode Integration
- global_pooling - Global pooling gradient operations
- gpu_gradient_expansion - GPU Gradient Expansion Strategy
- grad_ops - Gradient operations module
- gradient_accumulation
- gradient_analyzer - Gradient Analysis and Debugging Tools
- gradient_buffer_manager_simple - Simplified Gradient Buffer Manager for Ultra-High-Performance Memory Management
- gradient_compression
- gradient_compression_advanced - Advanced Gradient Compression for Distributed Training
- gradient_ops - Gradient Operation Utilities
- gradient_utils - Gradient Computation Utilities
- gradient_visualization - Gradient Flow Visualization
- graph_optimization - Enhanced Graph Optimization Framework
- higher_order - Higher-order derivative computation
- hybrid_scheduler - Hybrid Differentiation Scheduler
- implicit_differentiation - Implicit Differentiation
- inplace_ops
- jit_compiler - JIT Compilation System for Gradient Kernels
- jit_integration - JIT Compiler Integration with Autograd System
- kernel_fusion
- memory_diff_reporter - Memory Diff Reporter
- memory_profiler
- neural_integration - Neural network integration layer for autograd
- no_grad - No-gradient context for disabling gradient computation
- numerical_checker - Numerical Gradient Checker
- ops - Gradient operation modules
- parameter_server - Parameter server for distributed training
- performance_benchmark - Performance Benchmarking Framework for Autograd
- second_order - Second-Order Gradient Methods
- second_order_utils - Second-Order Derivative Utilities
- simd_grad_ops_simple - Simplified SIMD-Accelerated Gradient Operations for Ultra-High-Performance
- special_functions - Gradient operations for special mathematical functions
- subgraph_extraction - Subgraph Extraction and Parallelization
- tape - Automatic differentiation tape for gradient computation
- tape_optimization
- tensor_ext
- tensor_networks - Tensor Network Gradients
- ultra_gradient - Ultra-High-Performance Gradient Computation Engine
- ultra_gradient_engine_simple - Simplified Ultra-High-Performance Gradient Computation Engine
Macros§
- enable_grad - Macro for enable-grad context (convenience)
- no_grad - Macro for no-grad context (convenience)
- profile_gradient_op - Convenience macro for profiling gradient operations