Advanced GPU Tensor Core utilization for spatial algorithms
This module provides implementations that use the matrix units of modern GPUs (NVIDIA’s Tensor Cores, AMD’s Matrix Cores, Intel’s XMX units) to accelerate spatial computing. It includes mixed-precision operations, automatic layout optimization, and hardware-specific kernel selection.
§Features
- Tensor Core acceleration for matrix operations in spatial algorithms
- Mixed-precision computing (FP16, BF16, INT8, INT4) for maximum throughput
- Automatic tensor layout optimization for memory coalescing
- Hierarchical tiling strategies for large datasets
- Multi-GPU tensor parallelism for distributed spatial computation
- Dynamic precision selection based on numerical stability requirements
- Fused kernel operations to minimize memory bandwidth
- Async execution pipelines for maximum GPU utilization
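To make the mixed-precision and dynamic-precision items above more concrete, here is a minimal CPU-only sketch using only the standard library. It is not this module's implementation. The helper names (`to_bf16`, `sq_dist_matrix`, `max_abs_diff`, `sq_dist_dynamic`), the probe size of 8 points, and the tolerance parameter are all assumptions made for illustration. It relies on the identity d²(i, j) = |pᵢ|² + |pⱼ|² − 2 pᵢ·pⱼ, which turns the pairwise dot products into a matrix product, the kind of operation that matrix units accelerate. BF16 is emulated by truncating an `f32`, and accumulation stays in FP32.

```rust
/// Emulate BF16 by clearing the low 16 bits of an f32 (round toward zero).
/// Assumption: real hardware usually rounds to nearest even instead.
fn to_bf16(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0xFFFF_0000)
}

/// Squared Euclidean distance matrix via
/// d2[i][j] = |p_i|^2 + |p_j|^2 - 2 * (p_i . p_j).
/// With `low_precision`, inputs are rounded to BF16; sums stay in FP32.
fn sq_dist_matrix(points: &[Vec<f32>], low_precision: bool) -> Vec<Vec<f32>> {
    let p: Vec<Vec<f32>> = points
        .iter()
        .map(|row| {
            row.iter()
                .map(|&v| if low_precision { to_bf16(v) } else { v })
                .collect()
        })
        .collect();
    let norms: Vec<f32> = p
        .iter()
        .map(|r| r.iter().map(|v| v * v).sum::<f32>())
        .collect();
    let n = p.len();
    let mut d = vec![vec![0.0f32; n]; n];
    for i in 0..n {
        for j in 0..n {
            // This dot product is one entry of P * P^T (the GEMM part).
            let dot = p[i].iter().zip(&p[j]).map(|(a, b)| a * b).sum::<f32>();
            // Clamp tiny negative values caused by cancellation.
            d[i][j] = (norms[i] + norms[j] - 2.0 * dot).max(0.0);
        }
    }
    d
}

/// Largest absolute element-wise difference between two matrices.
fn max_abs_diff(a: &[Vec<f32>], b: &[Vec<f32>]) -> f32 {
    a.iter()
        .flatten()
        .zip(b.iter().flatten())
        .map(|(x, y)| (x - y).abs())
        .fold(0.0, f32::max)
}

/// Hypothetical dynamic precision selection: compare low and full precision
/// on a small probe (the first 8 points) and keep low precision only if the
/// probe error stays within `tol`. Returns the matrix and whether low
/// precision was used.
fn sq_dist_dynamic(points: &[Vec<f32>], tol: f32) -> (Vec<Vec<f32>>, bool) {
    let probe = &points[..points.len().min(8)];
    let err = max_abs_diff(&sq_dist_matrix(probe, true), &sq_dist_matrix(probe, false));
    let use_low = err <= tol;
    (sq_dist_matrix(points, use_low), use_low)
}
```

Calling `sq_dist_dynamic(&points, tol)` on coordinates that BF16 represents exactly (such as a unit grid) keeps the low-precision path. Coordinates such as 100.3 lose their fractional part in BF16, so a tight tolerance sends them to full precision. A real implementation would more likely monitor stability during the computation than rerun a probe.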
§Supported Hardware
- NVIDIA: V100, A100, H100, RTX 30/40 series (Tensor Cores)
- AMD: MI250X, MI300 series (Matrix Cores)
- Intel: Ponte Vecchio, Arc GPUs (XMX units)
- Automatic fallback to standard compute units when tensor cores are unavailable
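The fallback behavior in the list above can be pictured with a small sketch. The types `MatrixUnit` and `KernelPath` and the functions `classify_device` and `select_kernel_path` are hypothetical and are not the crate's `TensorCoreType`, `GpuArchitecture`, or `detect_tensor_core_capabilities`. Matching on device-name substrings is an assumption made to keep the example self-contained. A real detector would query the driver.

```rust
/// Hypothetical matrix-unit families, mirroring the hardware list above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MatrixUnit {
    NvidiaTensorCore,
    AmdMatrixCore,
    IntelXmx,
}

/// Hypothetical kernel choice: a matrix-unit kernel or the standard path.
#[derive(Debug, Clone, Copy, PartialEq)]
enum KernelPath {
    MatrixUnit(MatrixUnit),
    StandardCompute,
}

/// Classify a device by name substring (illustrative only).
fn classify_device(name: &str) -> Option<MatrixUnit> {
    const NVIDIA: [&str; 5] = ["V100", "A100", "H100", "RTX 30", "RTX 40"];
    const AMD: [&str; 2] = ["MI250X", "MI300"];
    const INTEL: [&str; 2] = ["Ponte Vecchio", "Arc"];
    if NVIDIA.iter().any(|s| name.contains(s)) {
        Some(MatrixUnit::NvidiaTensorCore)
    } else if AMD.iter().any(|s| name.contains(s)) {
        Some(MatrixUnit::AmdMatrixCore)
    } else if INTEL.iter().any(|s| name.contains(s)) {
        Some(MatrixUnit::IntelXmx)
    } else {
        None
    }
}

/// Use the matrix-unit kernel when one is detected, otherwise fall back.
fn select_kernel_path(device_name: &str) -> KernelPath {
    match classify_device(device_name) {
        Some(unit) => KernelPath::MatrixUnit(unit),
        None => KernelPath::StandardCompute,
    }
}
```

Representing "no matrix unit" as `None` and mapping it to an explicit `StandardCompute` variant keeps the fallback visible at every call site, so it is not hidden inside the kernel.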
§Examples
use scirs2_spatial::tensor_cores::{TensorCoreDistanceMatrix, TensorCoreClustering, PrecisionMode};
use scirs2_core::ndarray::array;

// Tensor core distance matrix computation
let points = array![[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]];
let mut tensor_matrix = TensorCoreDistanceMatrix::new()?
    .with_precision_mode(PrecisionMode::Mixed16)
    .with_tensor_layout_optimization(true)
    .with_hierarchical_tiling(true);
let distances = tensor_matrix.compute_parallel(&points.view()).await?;
println!("Tensor core distance matrix: {:?}", distances);

// Tensor core k-means clustering
let mut tensor_kmeans = TensorCoreClustering::new(2)?
    .with_tensor_cores(true)
    .with_mixed_precision(true)
    .with_dynamic_precision_scaling(true);
let (centroids, assignments) = tensor_kmeans.fit(&points.view()).await?;
println!("Tensor core centroids: {:?}", centroids);
Structs§
- AdvancedTensorCoreDistanceMatrix - Tensor core distance matrix computer with advanced stability monitoring
- DynamicPrecisionConfig - Dynamic precision scaling configuration
- ErrorRecoverySystem - Advanced error recovery system
- NumericalStabilityMonitor - Real-time numerical stability monitor
- PerformanceAccuracyAnalyzer - Performance-accuracy trade-off analyzer
- RecoveryAttempt - Recovery attempt record
- StabilityMetrics - Numerical stability metrics
- TensorCoreCapabilities - Tensor core capabilities
- TensorCoreClustering - Tensor core clustering algorithm
- TensorCoreDistanceMatrix - Tensor core distance matrix computer
- TradeOffParams - Trade-off optimization parameters
Enums§
- GpuArchitecture - GPU architecture types
- NumericalErrorType - Numerical error types
- OptimizationObjective - Optimization objectives
- PrecisionMode - Precision modes for tensor core operations
- RecoveryAction - Recovery action types
- ScalingStrategy - Dynamic precision scaling strategy
- StabilityLevel - Numerical stability level
- TensorCoreType - Tensor core types
- TensorLayout - Tensor layout optimization strategies
Functions§
- detect_tensor_core_capabilities - Detect tensor core capabilities of available GPU hardware