
Crate tenflowers_autograd


§TenfloweRS Automatic Differentiation

tenflowers-autograd provides a comprehensive automatic differentiation engine for the TenfloweRS machine learning framework. This crate implements both forward-mode and reverse-mode automatic differentiation with support for higher-order derivatives, custom gradients, and advanced optimization techniques.

§Features

  • Complete Gradient Operations: All fundamental tensor operations with mathematically correct gradients
  • Higher-Order Derivatives: Efficient computation of Hessians, third-order derivatives, and beyond
  • Performance Optimization: Kernel fusion, memory optimization, and distributed gradient computation
  • Advanced Differentiation: Mixed-mode AD, implicit differentiation, and custom gradient functions
  • Neural Network Integration: Seamless integration with tenflowers-neural for deep learning
  • Distributed Training: Parameter servers, gradient compression, and cross-datacenter replication
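
The reverse-mode engine described above can be illustrated with a toy tape over scalars, written in plain Rust and independent of this crate's API: each operation records its inputs and local derivatives, and a single backward sweep accumulates gradients in reverse order. (The real GradientTape operates on tensors; this is only a conceptual sketch.)

```rust
// A toy reverse-mode tape over scalars. Each node stores, per input,
// the local derivative d(output)/d(input).
struct Tape {
    nodes: Vec<Vec<(usize, f64)>>,
}

impl Tape {
    fn new() -> Self {
        Tape { nodes: vec![] }
    }

    // A leaf variable has no parents.
    fn var(&mut self) -> usize {
        self.nodes.push(vec![]);
        self.nodes.len() - 1
    }

    // d(a + b)/da = 1, d(a + b)/db = 1
    fn add(&mut self, a: usize, b: usize) -> usize {
        self.nodes.push(vec![(a, 1.0), (b, 1.0)]);
        self.nodes.len() - 1
    }

    // d(a * b)/da = b, d(a * b)/db = a (values passed in explicitly)
    fn mul(&mut self, a: usize, b: usize, va: f64, vb: f64) -> usize {
        self.nodes.push(vec![(a, vb), (b, va)]);
        self.nodes.len() - 1
    }

    // Reverse pass: seed d(output)/d(output) = 1, then propagate
    // backwards through the recorded nodes.
    fn gradients(&self, output: usize) -> Vec<f64> {
        let mut grads = vec![0.0; self.nodes.len()];
        grads[output] = 1.0;
        for i in (0..self.nodes.len()).rev() {
            for &(parent, local) in &self.nodes[i] {
                grads[parent] += local * grads[i];
            }
        }
        grads
    }
}

fn main() {
    // z = x * y + x, with x = 3, y = 4
    let mut tape = Tape::new();
    let (x, y) = (3.0, 4.0);
    let xi = tape.var();
    let yi = tape.var();
    let prod = tape.mul(xi, yi, x, y);
    let z = tape.add(prod, xi);
    let g = tape.gradients(z);
    // dz/dx = y + 1 = 5, dz/dy = x = 3
    assert_eq!(g[xi], 5.0);
    assert_eq!(g[yi], 3.0);
}
```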

§Quick Start

use tenflowers_autograd::{GradientTape, TrackedTensor};
use tenflowers_core::{Tensor, Device};

let device = Device::Cpu;
let mut tape = GradientTape::new();

// Create tracked tensors
let x = tape.watch(Tensor::<f32>::ones(&[2, 2]));
let y = tape.watch(Tensor::<f32>::ones(&[2, 2]));

// Compute gradients using GradientTape::gradient. In a real program `z`
// would be produced by tensor ops on `x` and `y`; a fresh tensor stands
// in for that result here.
let z = tape.watch(Tensor::<f32>::ones(&[2, 2]));
let gradients = tape.gradient(&[z], &[x, y])?;
println!("Gradient of x: {:?}", gradients[0]);

§Advanced Usage

§Custom Gradients

use tenflowers_autograd::{CustomGradientFunction, GradientTape};
use tenflowers_core::{Tensor, Result};

struct MyCustomOp;

impl CustomGradientFunction<f32> for MyCustomOp {
    fn forward(&self, inputs: &[&Tensor<f32>]) -> Result<Tensor<f32>> {
        // Custom forward implementation: y = x^2 + sin(x)
        let x = inputs[0];
        let x_squared = tenflowers_core::ops::mul(x, x)?;
        let sin_x = tenflowers_core::ops::sin(x)?;
        tenflowers_core::ops::add(&x_squared, &sin_x)
    }

    fn backward(&self, grad_output: &Tensor<f32>, inputs: &[&Tensor<f32>], output: &Tensor<f32>) -> Result<Vec<Tensor<f32>>> {
        // Custom backward implementation: dy/dx = 2x + cos(x)
        let x = inputs[0];
        let two = tenflowers_core::Tensor::from_array(scirs2_core::ndarray::arr0(2.0f32).into_dyn());
        let two_x = tenflowers_core::ops::mul(&two, x)?;
        let cos_x = tenflowers_core::ops::cos(x)?;
        let grad_x = tenflowers_core::ops::add(&two_x, &cos_x)?;
        let final_grad = tenflowers_core::ops::mul(grad_output, &grad_x)?;
        Ok(vec![final_grad])
    }

    fn name(&self) -> &str {
        "MyCustomOp"
    }
}
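
The gradient pair above (forward y = x² + sin(x), backward dy/dx = 2x + cos(x)) can be sanity-checked against a central finite difference, in the spirit of the crate's numerical_checker module. A plain-Rust scalar sketch, independent of the tensor API:

```rust
// Forward function from the example: y = x^2 + sin(x)
fn f(x: f64) -> f64 {
    x * x + x.sin()
}

// Analytic gradient implemented in MyCustomOp::backward: dy/dx = 2x + cos(x)
fn grad_f(x: f64) -> f64 {
    2.0 * x + x.cos()
}

// Central finite difference: (f(x + h) - f(x - h)) / (2h)
fn numerical_grad(x: f64, h: f64) -> f64 {
    (f(x + h) - f(x - h)) / (2.0 * h)
}

fn main() {
    let x = 0.7;
    let analytic = grad_f(x);
    let numeric = numerical_grad(x, 1e-5);
    // The two should agree to well under 1e-6 for this smooth function.
    assert!((analytic - numeric).abs() < 1e-6);
    println!("analytic = {analytic:.6}, numeric = {numeric:.6}");
}
```

This is the same check NumericalChecker automates over whole tensors: compare each analytic gradient entry against a finite-difference estimate and flag mismatches.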

§Higher-Order Derivatives

use tenflowers_autograd::{GradientTape, TrackedTensor};
use tenflowers_core::Tensor;

let mut tape = GradientTape::new();
let x = TrackedTensor::new(Tensor::<f32>::ones(&[1]));
let target = TrackedTensor::new(Tensor::<f32>::ones(&[1]));

// Compute third-order derivatives
let third_order = tape.third_derivative(&target, &x)?;

// Compute nth-order derivatives
let nth_order = tape.nth_derivative(&target, &x, 3)?;
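
Results from third_derivative or nth_derivative can be cross-checked on scalars by repeated central differencing, which is exact (up to rounding) for low-degree polynomials. A plain-Rust sketch, not using the crate's API:

```rust
// nth-order derivative estimate by recursively applying the central
// difference (f(x + h) - f(x - h)) / (2h).
fn nth_derivative<F: Fn(f64) -> f64 + Copy>(f: F, x: f64, n: u32, h: f64) -> f64 {
    if n == 0 {
        f(x)
    } else {
        (nth_derivative(f, x + h, n - 1, h) - nth_derivative(f, x - h, n - 1, h)) / (2.0 * h)
    }
}

fn main() {
    let f = |x: f64| x.powi(4);
    // d^3/dx^3 of x^4 is 24x; at x = 1.0 the exact value is 24.
    let third = nth_derivative(f, 1.0, 3, 1e-3);
    assert!((third - 24.0).abs() < 1e-3);
    println!("third derivative at 1.0 ≈ {third:.4}");
}
```

Note that each extra differencing order amplifies rounding error, which is one reason an AD engine's exact higher-order derivatives are preferable to stacked finite differences in practice.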

§Performance Features

  • Kernel Fusion: Automatically fuses operations to reduce memory bandwidth
  • Gradient Compression: Quantization and sparsification for distributed training
  • Memory Optimization: Checkpointing and in-place operations for large models
  • JIT Compilation: Runtime kernel optimization for specific tensor shapes
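
As one concrete instance of the sparsification mentioned under Gradient Compression, top-k sparsification keeps only the k largest-magnitude gradient entries and transmits them as (index, value) pairs. A plain-Rust sketch of the idea (not this crate's CompressionMethod API):

```rust
// Keep the k largest-magnitude gradient entries as (index, value) pairs;
// everything else is implicitly zero on the wire.
fn top_k_sparsify(grad: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut indexed: Vec<(usize, f32)> = grad.iter().copied().enumerate().collect();
    // Sort by descending magnitude, then truncate to the k largest.
    indexed.sort_by(|a, b| b.1.abs().partial_cmp(&a.1.abs()).unwrap());
    indexed.truncate(k);
    indexed
}

// Reconstruct a dense gradient on the receiving side.
fn densify(sparse: &[(usize, f32)], len: usize) -> Vec<f32> {
    let mut out = vec![0.0; len];
    for &(i, v) in sparse {
        out[i] = v;
    }
    out
}

fn main() {
    let grad = [0.01, -2.5, 0.3, 0.0, 1.7, -0.02];
    let sparse = top_k_sparsify(&grad, 2);
    let dense = densify(&sparse, grad.len());
    // Only the two largest-magnitude entries survive; the rest are zeroed.
    assert_eq!(dense, vec![0.0, -2.5, 0.0, 0.0, 1.7, 0.0]);
}
```

Production schemes typically add error feedback (accumulating the dropped residual into the next step's gradient) so the discarded mass is not lost.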

§Integration with TenfloweRS Ecosystem

This crate integrates seamlessly with:

  • tenflowers-core: Core tensor operations and device management
  • tenflowers-neural: Neural network layers and training loops
  • tenflowers-dataset: Data loading and preprocessing
  • scirs2-autograd: Static graph optimization and analysis

§Re-exports

pub use boolean_indexing::boolean_mask_backward;
pub use boolean_indexing::integer_array_indexing_backward;
pub use boolean_indexing::where_backward;
pub use checkpointing::checkpoint_sequence;
pub use checkpointing::ActivationCheckpointPolicy;
pub use checkpointing::ActivationCheckpointing;
pub use checkpointing::ActivationRecomputeManager;
pub use checkpointing::CheckpointManager;
pub use checkpointing::CheckpointStrategy;
pub use checkpointing::CheckpointedFunction;
pub use checkpointing::CheckpointedGradientTape;
pub use checkpointing::CheckpointingStats;
pub use checkpointing::LayerMetadata;
pub use checkpointing::RecomputationContext;
pub use context::AutogradContext;
pub use context::ShapeInferenceRule;
pub use context::StaticShapeInference;
pub use coverage_matrix::CategoryCoverage;
pub use coverage_matrix::CoverageMatrix;
pub use coverage_matrix::CoverageReport;
pub use coverage_matrix::OperationCategory;
pub use coverage_matrix::OperationMetadata;
pub use custom_gradients::CustomGradientFunction;
pub use custom_gradients::CustomGradientOp;
pub use custom_gradients::GradientClipFunction;
pub use custom_gradients::GradientScaleFunction;
pub use custom_gradients::StopGradientFunction;
pub use debug::GradientDebugInfo;
pub use debug::GradientDebugger;
pub use deterministic::clear_operation_seeds;
pub use deterministic::get_global_seed;
pub use deterministic::get_operation_seed;
pub use deterministic::get_seeded_operation_count;
pub use deterministic::hash_tensor_data;
pub use deterministic::is_deterministic;
pub use deterministic::reset_deterministic_state;
pub use deterministic::set_deterministic;
pub use deterministic::set_global_seed;
pub use deterministic::set_operation_seed;
pub use deterministic::DeterministicConfig;
pub use deterministic::DeterministicContext;
pub use deterministic::DeterministicOperation;
pub use deterministic::ReproducibilityChecker;
pub use deterministic::ReproducibilityStats;
pub use deterministic::SeedManager;
pub use device_placement::DevicePlacementConfig;
pub use device_placement::DevicePlacementOptimizer;
pub use device_placement::GraphOperation;
pub use device_placement::PlacementDecision;
pub use device_placement::PlacementResult;
pub use device_placement::PlacementStrategy;
pub use ellipsis_newaxis::ellipsis_newaxis_backward;
pub use ellipsis_newaxis::AdvancedIndexer;
pub use ellipsis_newaxis::IndexSpec;
pub use error_taxonomy::utils as error_utils;
pub use error_taxonomy::AutogradErrorBuilder;
pub use error_taxonomy::ErrorPatternValidator;
pub use error_taxonomy::GradientContext;
pub use error_taxonomy::ValidationResult;
pub use forward_ad::forward_ops;
pub use forward_ad::DualTensor;
pub use forward_ad::ForwardADContext;
pub use forward_ad::ForwardMode;
pub use forward_reverse::ComplexityEstimate;
pub use forward_reverse::DifferentiationMode;
pub use forward_reverse::ForwardReverseConfig;
pub use forward_reverse::ForwardReverseDifferentiator;
pub use global_pooling::adaptive_avg_pool2d_backward;
pub use global_pooling::adaptive_max_pool2d_backward;
pub use global_pooling::fractional_adaptive_avg_pool2d_backward;
pub use global_pooling::global_avg_pool2d_backward;
pub use global_pooling::global_max_pool2d_backward;
pub use gpu_gradient_expansion::GpuCategoryCoverage;
pub use gpu_gradient_expansion::GpuCoverageAnalysis;
pub use gpu_gradient_expansion::GpuGradInfo;
pub use gpu_gradient_expansion::GpuGradStatus;
pub use gpu_gradient_expansion::GpuGradientPlanner;
pub use gpu_gradient_expansion::ImplementationPlan;
pub use gpu_gradient_expansion::ImplementationTask;
pub use gpu_gradient_expansion::Priority;
pub use grad_ops::batch_fused_activations_forward_backward;
pub use grad_ops::fused_gelu_forward_backward;
pub use grad_ops::fused_log_softmax_forward_backward;
pub use grad_ops::fused_tanh_forward_backward;
pub use gradient_accumulation::accumulate_gradients_over_batch;
pub use gradient_accumulation::GradientAccumulator;
pub use gradient_buffer_manager_simple::global_gradient_buffer_manager;
pub use gradient_buffer_manager_simple::AllocationMetrics;
pub use gradient_buffer_manager_simple::EfficiencyMetrics;
pub use gradient_buffer_manager_simple::GradientBufferAllocation;
pub use gradient_buffer_manager_simple::GradientBufferConfig;
pub use gradient_buffer_manager_simple::GradientBufferManager;
pub use gradient_buffer_manager_simple::GradientMemoryStatistics;
pub use gradient_buffer_manager_simple::MemoryPressureStatistics;
pub use gradient_compression::CompressedGradient;
pub use gradient_compression::CompressionConfig;
pub use gradient_compression::CompressionMethod;
pub use gradient_compression::CompressionStats;
pub use gradient_compression::GradientCompressor;
pub use gradient_ops::accumulate_gradients;
pub use gradient_ops::add_gradient_noise;
pub use gradient_ops::average_gradients;
pub use gradient_ops::clip_by_global_norm;
pub use gradient_ops::clip_by_value;
pub use gradient_ops::compute_gradient_statistics;
pub use gradient_ops::has_invalid_gradients;
pub use gradient_ops::scale_gradients;
pub use gradient_ops::zero_gradients;
pub use gradient_ops::GradientPipeline;
pub use gradient_ops::GradientStatistics;
pub use gradient_ops::NamedGradientAccumulator;
pub use gradient_visualization::ColorScheme;
pub use gradient_visualization::EdgeType;
pub use gradient_visualization::GradientFlowAnalysis;
pub use gradient_visualization::GradientFlowEdge;
pub use gradient_visualization::GradientFlowIssue;
pub use gradient_visualization::GradientFlowNode;
pub use gradient_visualization::GradientFlowVisualizer;
pub use gradient_visualization::GradientStats;
pub use gradient_visualization::IssueType;
pub use gradient_visualization::LayoutAlgorithm;
pub use gradient_visualization::NodeType;
pub use gradient_visualization::OutputFormat;
pub use gradient_visualization::Severity;
pub use gradient_visualization::ValueStats;
pub use gradient_visualization::VisualizationSettings;
pub use graph_optimization::CommunicationPlan;
pub use graph_optimization::EnhancedGraphOptimizer;
pub use graph_optimization::GradientFusion;
pub use graph_optimization::GraphOptimizationConfig;
pub use graph_optimization::GraphOptimizationResult;
pub use graph_optimization::MemoryOptimization;
pub use hybrid_scheduler::ExecutionStats;
pub use hybrid_scheduler::ExecutionSummary;
pub use hybrid_scheduler::GraphAnalysis;
pub use hybrid_scheduler::HybridScheduler;
pub use hybrid_scheduler::SchedulerConfig;
pub use hybrid_scheduler::StrategyCost;
pub use implicit_differentiation::FixedPointFunction;
pub use implicit_differentiation::GradientInfo;
pub use implicit_differentiation::ImplicitDiffConfig;
pub use implicit_differentiation::ImplicitDifferentiator;
pub use implicit_differentiation::ImplicitFunction;
pub use implicit_differentiation::OptimizationLayer;
pub use inplace_ops::InPlaceOptimizer;
pub use inplace_ops::InPlaceSequenceOptimizer;
pub use jit_compiler::CompiledKernel;
pub use jit_compiler::DeviceFeatures;
pub use jit_compiler::GradientKernelTemplate;
pub use jit_compiler::JitCompiler;
pub use jit_compiler::KernelPerformance;
pub use jit_compiler::KernelSignature;
pub use jit_compiler::OptimizationLevel;
pub use jit_integration::utils as jit_utils;
pub use jit_integration::JitConfig;
pub use jit_integration::JitGradientContext;
pub use jit_integration::JitGradientTapeExt;
pub use kernel_fusion::FusableOp;
pub use kernel_fusion::FusedKernel;
pub use kernel_fusion::FusionStats;
pub use kernel_fusion::KernelFusionOptimizer;
pub use kernel_fusion::OpSequence;
pub use memory_diff_reporter::MemoryDiff;
pub use memory_diff_reporter::MemoryDiffReporter;
pub use memory_diff_reporter::MemorySnapshot;
pub use memory_profiler::get_global_profiler;
pub use memory_profiler::GradientMemoryProfiler;
pub use memory_profiler::MemoryReport;
pub use memory_profiler::MemoryStats;
pub use neural_integration::AutogradLayer;
pub use neural_integration::AutogradOptimizer;
pub use neural_integration::AutogradTrainer;
pub use neural_integration::OptimizerType;
pub use neural_integration::TrainingMetrics;
pub use no_grad::enable_grad;
pub use no_grad::is_grad_enabled;
pub use no_grad::no_grad;
pub use no_grad::set_grad_enabled;
pub use no_grad::EnableGradGuard;
pub use no_grad::NoGradGuard;
pub use numerical_checker::CheckerConfig;
pub use numerical_checker::ErrorAnalysis;
pub use numerical_checker::FiniteDifferenceMethod;
pub use numerical_checker::GradientCheckResult;
pub use numerical_checker::NumericalChecker;
pub use parameter_server::FaultToleranceMode;
pub use parameter_server::LoadBalancingStrategy;
pub use parameter_server::ParameterServer;
pub use parameter_server::ParameterServerClient;
pub use parameter_server::ParameterServerConfig;
pub use parameter_server::ParameterServerStats;
pub use performance_benchmark::BenchmarkConfig;
pub use performance_benchmark::BenchmarkReport;
pub use performance_benchmark::BenchmarkResult;
pub use performance_benchmark::BenchmarkStatistics;
pub use performance_benchmark::BenchmarkSummary;
pub use performance_benchmark::ComparisonResult;
pub use performance_benchmark::PerformanceBenchmark;
pub use performance_benchmark::RegressionReport;
pub use performance_benchmark::RegressionSeverity;
pub use performance_benchmark::ThroughputMetrics;
pub use simd_grad_ops_simple::global_simd_grad_ops;
pub use simd_grad_ops_simple::SimdGradConfig;
pub use simd_grad_ops_simple::SimdGradOps;
pub use simd_grad_ops_simple::SimdPerformanceMetrics;
pub use special_functions::bessel_j0_backward;
pub use special_functions::bessel_j1_backward;
pub use special_functions::beta_backward;
pub use special_functions::digamma_backward;
pub use special_functions::erf_backward;
pub use special_functions::erfc_backward;
pub use special_functions::gamma_backward;
pub use special_functions::lgamma_backward;
pub use subgraph_extraction::ExtractionStrategy;
pub use subgraph_extraction::Subgraph;
pub use subgraph_extraction::SubgraphConfig;
pub use subgraph_extraction::SubgraphExtractionResult;
pub use subgraph_extraction::SubgraphExtractor;
pub use subgraph_extraction::SubgraphOperation;
pub use tape::GradientTape;
pub use tape::Operation;
pub use tape::TapeNode;
pub use tape::TrackedTensor;
pub use tape_optimization::TapeOptimizationConfig;
pub use tape_optimization::TapeOptimizationStats;
pub use tape_optimization::TapeOptimizer;
pub use tensor_ext::TensorAutograd;
pub use tensor_networks::ContractionEdge;
pub use tensor_networks::ContractionPath;
pub use tensor_networks::ContractionStep;
pub use tensor_networks::ContractionStrategy;
pub use tensor_networks::TensorNetwork;
pub use tensor_networks::TensorNetworkGradient;
pub use tensor_networks::TensorNetworkNode;
pub use tensor_networks::TensorNetworkOptimizer;
pub use ultra_gradient_engine_simple::global_ultra_gradient_engine;
pub use ultra_gradient_engine_simple::GradientMemoryStats;
pub use ultra_gradient_engine_simple::GradientPerformanceMetrics;
pub use ultra_gradient_engine_simple::OptimizationInsights;
pub use ultra_gradient_engine_simple::UltraGradientConfig;
pub use ultra_gradient_engine_simple::UltraGradientEngine;
pub use ultra_gradient_engine_simple::UltraGradientResult;
pub use ultra_gradient_engine_simple::UltraGradientTapeExt;
pub use advanced_grad_ops::gradient_clipping;
pub use advanced_grad_ops::higher_order as advanced_higher_order;
pub use advanced_grad_ops::jacobian;
pub use advanced_grad_ops::optimization;
pub use advanced_grad_ops::AdaptiveGradientAccumulator;
pub use amp_policy::AMPConfig;
pub use amp_policy::AMPPolicy;
pub use amp_policy::AMPStabilityMetrics;
pub use amp_policy::ScaleAdjustment;
pub use amp_policy::ScaleAdjustmentReason;
pub use efficient_memory::AggregationStats;
pub use efficient_memory::CheckpointStats;
pub use efficient_memory::GradientCheckpointer;
pub use efficient_memory::GradientMemoryManager;
pub use efficient_memory::GradientMemoryPool;
pub use efficient_memory::LazyGradient;
pub use efficient_memory::MemoryManagerStats;
pub use efficient_memory::MemoryPoolStats;
pub use efficient_memory::StreamingGradientAggregator;
pub use gradient_analyzer::AnalysisConfig;
pub use gradient_analyzer::GradientAnalysisReport;
pub use gradient_analyzer::GradientAnalyzer;
pub use gradient_analyzer::GradientFlowAnalysis as AdvancedGradientFlowAnalysis;
pub use gradient_analyzer::GradientIssue;
pub use gradient_analyzer::GradientStatistics as AnalyzerGradientStatistics;
pub use gradient_analyzer::PerformanceMetrics;
pub use second_order_utils::compute_hessian;
pub use second_order_utils::compute_hessian_diagonal;
pub use second_order_utils::compute_jacobian;
pub use second_order_utils::compute_laplacian;
pub use second_order_utils::directional_second_derivative;
pub use second_order_utils::hessian_vector_product;

§Modules

advanced_grad_ops
Advanced Gradient Operations
amp_policy
Automatic Mixed Precision (AMP) Policy for Gradient Computation
boolean_indexing
checkpointing
context
coverage_matrix
Gradient Coverage Matrix Generator
custom_gradients
debug
deterministic
Deterministic Execution Mode
device_placement
Device Placement Optimization
efficient_memory
Memory-Efficient Gradient Computation
ellipsis_newaxis
error_taxonomy
Error Taxonomy Alignment for Autograd
forward_ad
forward_reverse
Forward-Reverse Mode Integration
global_pooling
Global pooling gradient operations
gpu_gradient_expansion
GPU Gradient Expansion Strategy
grad_ops
Gradient operations module
gradient_accumulation
gradient_analyzer
Gradient Analysis and Debugging Tools
gradient_buffer_manager_simple
Simplified Gradient Buffer Manager for Ultra-High-Performance Memory Management
gradient_compression
gradient_compression_advanced
Advanced Gradient Compression for Distributed Training
gradient_ops
Gradient Operation Utilities
gradient_utils
Gradient Computation Utilities
gradient_visualization
Gradient Flow Visualization
graph_optimization
Enhanced Graph Optimization Framework
higher_order
Higher-order derivative computation
hybrid_scheduler
Hybrid Differentiation Scheduler
implicit_differentiation
Implicit Differentiation
inplace_ops
jit_compiler
JIT Compilation System for Gradient Kernels
jit_integration
JIT Compiler Integration with Autograd System
kernel_fusion
memory_diff_reporter
Memory Diff Reporter
memory_profiler
neural_integration
Neural network integration layer for autograd
no_grad
No-gradient context for disabling gradient computation
numerical_checker
Numerical Gradient Checker
ops
Gradient operation modules
parameter_server
Parameter server for distributed training.
performance_benchmark
Performance Benchmarking Framework for Autograd
second_order
Second-Order Gradient Methods
second_order_utils
Second-Order Derivative Utilities
simd_grad_ops_simple
Simplified SIMD-Accelerated Gradient Operations for Ultra-High-Performance
special_functions
Gradient operations for special mathematical functions
subgraph_extraction
Subgraph Extraction and Parallelization
tape
Automatic differentiation tape for gradient computation
tape_optimization
tensor_ext
tensor_networks
Tensor Network Gradients
ultra_gradient
Ultra-High-Performance Gradient Computation Engine
ultra_gradient_engine_simple
Simplified Ultra-High-Performance Gradient Computation Engine

§Macros

enable_grad
Macro for enable-grad context (convenience)
no_grad
Macro for no-grad context (convenience)
profile_gradient_op
Convenience macro for profiling gradient operations

§Traits

Differentiable