Module tensor_cores


Advanced GPU Tensor Core utilization for spatial algorithms

This module provides spatial-algorithm implementations that run on the matrix units of modern GPUs (NVIDIA Tensor Cores, AMD Matrix Cores, Intel XMX units). It includes mixed-precision operations, automatic tensor layout optimization, and hardware-specific kernel selection.

§Features

  • Tensor Core acceleration for matrix operations in spatial algorithms
  • Mixed-precision computing (FP16, BF16, INT8, INT4) for maximum throughput
  • Automatic tensor layout optimization for memory coalescing
  • Hierarchical tiling strategies for large datasets
  • Multi-GPU tensor parallelism for distributed spatial computation
  • Dynamic precision selection based on numerical stability requirements
  • Fused kernel operations to reduce memory traffic
  • Async execution pipelines for maximum GPU utilization
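
As an illustration of the mixed-precision idea in the list above, the following is a minimal CPU-only sketch and not this module's implementation. It assumes the common formulation in which a squared-distance matrix is computed from dot products (the part that maps onto a matrix unit), with inputs reduced to BF16 precision and accumulation kept in `f32`. The names `to_bf16` and `pairwise_sq_dist` are hypothetical helpers introduced only for this example.

```rust
/// Reduce an f32 to bfloat16 precision by clearing the low 16 bits
/// (keeps the sign, 8 exponent bits and 7 mantissa bits).
/// This truncates toward zero; hardware typically rounds to nearest.
/// Hypothetical helper, not part of this module.
fn to_bf16(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0xFFFF_0000)
}

/// Squared Euclidean distance matrix using
/// ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * (a . b).
/// When `low_precision` is true, the dot-product inputs are reduced to
/// bf16 precision while the accumulation stays in f32.
/// Hypothetical helper, not part of this module.
fn pairwise_sq_dist(points: &[Vec<f32>], low_precision: bool) -> Vec<Vec<f32>> {
    let n = points.len();
    let cast = |x: f32| if low_precision { to_bf16(x) } else { x };

    // Squared norms are computed in full f32 precision.
    let norms: Vec<f32> = points
        .iter()
        .map(|p| p.iter().map(|x| x * x).sum::<f32>())
        .collect();

    let mut out = vec![vec![0.0f32; n]; n];
    for i in 0..n {
        for j in 0..n {
            // Low-precision multiplicands, f32 accumulator.
            let dot = points[i]
                .iter()
                .zip(&points[j])
                .map(|(a, b)| cast(*a) * cast(*b))
                .sum::<f32>();
            // Clamp small negative values caused by rounding.
            out[i][j] = (norms[i] + norms[j] - 2.0 * dot).max(0.0);
        }
    }
    out
}
```

Calling `pairwise_sq_dist(&points, true)` and `pairwise_sq_dist(&points, false)` on the same data gives a quick estimate of how much accuracy the reduced-precision path costs for that input.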

§Supported Hardware

  • NVIDIA: V100, A100, H100, RTX 30/40 series (Tensor Cores)
  • AMD: MI250X, MI300 series (Matrix Cores)
  • Intel: Ponte Vecchio, Arc GPUs (XMX units)
  • Automatic fallback to standard compute units when tensor cores are unavailable
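
The fallback rule in the last item can be pictured with a short sketch. Everything here is hypothetical and local to the example (`Backend`, `DeviceInfo`, `select_backend`); the module's own types, such as `GpuArchitecture` and `TensorCoreType`, may be organised differently.

```rust
/// Hypothetical execution backends, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    TensorCores,     // NVIDIA
    MatrixCores,     // AMD
    Xmx,             // Intel
    StandardCompute, // fallback path without matrix units
}

/// Hypothetical device description, for illustration only.
struct DeviceInfo {
    vendor: &'static str,
    has_matrix_units: bool,
}

/// Pick a vendor-specific backend when matrix units are present,
/// otherwise fall back to standard compute units.
fn select_backend(device: Option<&DeviceInfo>) -> Backend {
    match device {
        Some(d) if d.has_matrix_units => match d.vendor {
            "NVIDIA" => Backend::TensorCores,
            "AMD" => Backend::MatrixCores,
            "Intel" => Backend::Xmx,
            // Unknown vendor: do not assume a matrix-unit code path exists.
            _ => Backend::StandardCompute,
        },
        // No device, or a device without matrix units.
        _ => Backend::StandardCompute,
    }
}
```

Keeping the fallback as the default arm means any unrecognised or missing device still yields a usable backend.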

§Examples

use scirs2_spatial::tensor_cores::{TensorCoreDistanceMatrix, TensorCoreClustering, PrecisionMode};
use scirs2_core::ndarray::array;

// `?` and `.await` need an enclosing async function that returns a `Result`.
// The boxed error type is an assumption; use the crate's own error type if preferred.
async fn run() -> Result<(), Box<dyn std::error::Error>> {
    // Tensor core distance matrix computation
    let points = array![[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]];

    let mut tensor_matrix = TensorCoreDistanceMatrix::new()?
        .with_precision_mode(PrecisionMode::Mixed16)
        .with_tensor_layout_optimization(true)
        .with_hierarchical_tiling(true);

    let distances = tensor_matrix.compute_parallel(&points.view()).await?;
    println!("Tensor core distance matrix: {:?}", distances);

    // Tensor core k-means clustering with k = 2
    let mut tensor_kmeans = TensorCoreClustering::new(2)?
        .with_tensor_cores(true)
        .with_mixed_precision(true)
        .with_dynamic_precision_scaling(true);

    let (centroids, assignments) = tensor_kmeans.fit(&points.view()).await?;
    println!("Tensor core centroids: {:?}", centroids);
    println!("Tensor core assignments: {:?}", assignments);

    Ok(())
}

Structs§

AdvancedTensorCoreDistanceMatrix
Tensor core distance matrix computer with advanced stability monitoring
DynamicPrecisionConfig
Dynamic precision scaling configuration
ErrorRecoverySystem
Advanced error recovery system
NumericalStabilityMonitor
Real-time numerical stability monitor
PerformanceAccuracyAnalyzer
Performance-accuracy trade-off analyzer
RecoveryAttempt
Recovery attempt record
StabilityMetrics
Numerical stability metrics
TensorCoreCapabilities
Tensor core capabilities
TensorCoreClustering
Tensor core clustering algorithm
TensorCoreDistanceMatrix
Tensor core distance matrix computer
TradeOffParams
Trade-off optimization parameters

Enums§

GpuArchitecture
GPU architecture types
NumericalErrorType
Numerical error types
OptimizationObjective
Optimization objectives
PrecisionMode
Precision modes for tensor core operations
RecoveryAction
Recovery action types
ScalingStrategy
Dynamic precision scaling strategy
StabilityLevel
Numerical stability level
TensorCoreType
Tensor core types
TensorLayout
Tensor layout optimization strategies
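
To make the relationship between precision modes and stability levels more concrete, here is a minimal sketch of one possible selection rule. It is an assumption for illustration, not the behaviour of `ScalingStrategy` or `NumericalStabilityMonitor`: it measures the worst relative error introduced by reducing the inputs to BF16 precision and keeps the low-precision path only if that error is within a tolerance. `ChosenPrecision`, `to_bf16` and `choose_precision` are hypothetical names.

```rust
/// Hypothetical result of the selection, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ChosenPrecision {
    Low,  // reduced-precision inputs are acceptable
    Full, // stay in f32
}

/// Reduce an f32 to bfloat16 precision by clearing the low 16 bits.
/// Truncates toward zero. Hypothetical helper.
fn to_bf16(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0xFFFF_0000)
}

/// Choose a precision from the worst relative truncation error
/// over the finite, non-zero input values.
fn choose_precision(values: &[f32], rel_tol: f32) -> ChosenPrecision {
    let worst = values
        .iter()
        .copied()
        .filter(|v| *v != 0.0 && v.is_finite())
        .map(|v| ((v - to_bf16(v)) / v).abs())
        .fold(0.0f32, f32::max);

    if worst <= rel_tol {
        ChosenPrecision::Low
    } else {
        ChosenPrecision::Full
    }
}
```

A production monitor would likely also track overflow, underflow and error growth across iterations; this sketch only covers input rounding error.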

Functions§

detect_tensor_core_capabilities
Detect tensor core capabilities of available GPU hardware