Automatic kernel tuning for hardware adaptation
This module provides automatic performance tuning for kernel operations across different hardware backends. It profiles kernel execution times and adaptively selects optimal parameters (block sizes, thread counts, memory layouts) for the specific hardware being used.
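A minimal sketch of the benchmark-and-select idea in plain Rust, not the module's actual implementation. The names `pick_fastest` and `blocked_sum` are hypothetical. Each candidate block size gets one warm-up run and then several timed runs, and the candidate with the lowest median time wins. The median is used instead of the mean so that a single slow outlier, such as a context switch or a cold page, is less likely to skew the choice.

```rust
use std::time::{Duration, Instant};

/// Hypothetical workload: sums `data` in chunks of `block` elements.
fn blocked_sum(data: &[f32], block: usize) -> f32 {
    data.chunks(block).map(|chunk| chunk.iter().sum::<f32>()).sum()
}

/// Runs `run(candidate)` once as a warm-up, then `iters` timed runs, and returns
/// the candidate whose median time is lowest. Panics if `candidates` is empty.
fn pick_fastest<F: FnMut(usize)>(candidates: &[usize], iters: usize, mut run: F) -> usize {
    assert!(!candidates.is_empty() && iters > 0, "need candidates and iterations");
    let mut best = (candidates[0], Duration::MAX);
    for &candidate in candidates {
        run(candidate); // warm-up, e.g. to touch memory and warm caches
        let mut samples: Vec<Duration> = (0..iters)
            .map(|_| {
                let start = Instant::now();
                run(candidate);
                start.elapsed()
            })
            .collect();
        samples.sort();
        let median = samples[samples.len() / 2];
        if median < best.1 {
            best = (candidate, median);
        }
    }
    best.0
}

fn main() {
    let data: Vec<f32> = (0..(1 << 20)).map(|i| (i % 7) as f32).collect();
    let candidates = [64, 256, 1024, 4096];
    let mut sink = 0.0f32; // result is printed, so the summing work is not optimized away
    let best = pick_fastest(&candidates, 5, |block| sink += blocked_sum(&data, block));
    println!("fastest block size here: {best} (checksum {sink})");
}
```

The winner depends on the machine it runs on, which is the point of tuning empirically rather than hard-coding a block size.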
Features
- Auto-tuning: Automatic parameter selection through benchmarking
- Hardware Detection: Platform capability detection and profiling
- Caching: Persistent tuning results for faster subsequent runs
- Multi-Backend: Support for CUDA, ROCm, Metal, CPU, and more
- Adaptive: Dynamic adjustment based on tensor sizes and operations
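To illustrate how caching and size-adaptive lookup can fit together, here is a sketch under an assumed scheme: the cache key is an operation name plus the shape rounded up to the next power of two, and results are persisted as one text line per entry. `TuneKey`, `save`, and `load` are hypothetical names, and the block size `128` is a made-up value; the module's real cache format and keying are not described on this page.

```rust
use std::collections::HashMap;

/// Hypothetical cache key: operation name plus the shape rounded up to the
/// next power of two, so nearby sizes share one tuning result.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct TuneKey {
    op: String,
    m: usize,
    n: usize,
    k: usize,
}

impl TuneKey {
    fn matmul(m: usize, n: usize, k: usize) -> Self {
        TuneKey {
            op: "matmul".to_string(),
            m: m.next_power_of_two(),
            n: n.next_power_of_two(),
            k: k.next_power_of_two(),
        }
    }
}

/// Serializes the cache as one `op m n k block` line per entry.
fn save(cache: &HashMap<TuneKey, usize>) -> String {
    let mut lines: Vec<String> = cache
        .iter()
        .map(|(key, block)| format!("{} {} {} {} {}", key.op, key.m, key.n, key.k, block))
        .collect();
    lines.sort(); // deterministic output regardless of HashMap iteration order
    lines.join("\n")
}

/// Parses the format written by `save`, skipping malformed lines.
fn load(text: &str) -> HashMap<TuneKey, usize> {
    text.lines()
        .filter_map(|line| {
            let fields: Vec<&str> = line.split_whitespace().collect();
            if fields.len() != 5 {
                return None;
            }
            let key = TuneKey {
                op: fields[0].to_string(),
                m: fields[1].parse().ok()?,
                n: fields[2].parse().ok()?,
                k: fields[3].parse().ok()?,
            };
            let block: usize = fields[4].parse().ok()?;
            Some((key, block))
        })
        .collect()
}

fn main() {
    let mut cache: HashMap<TuneKey, usize> = HashMap::new();
    cache.insert(TuneKey::matmul(1024, 512, 768), 128); // 128: made-up tuned value
    // 1000x500x700 rounds to the same bucket (1024, 512, 1024), so it is a cache hit.
    assert_eq!(cache.get(&TuneKey::matmul(1000, 500, 700)), Some(&128));
    let text = save(&cache);
    println!("{text}"); // matmul 1024 512 1024 128
    assert_eq!(load(&text), cache);
}
```

Bucketing trades precision for hit rate: a result tuned for k = 768 is reused for k = 700, which may be acceptable when performance varies smoothly with size but can miss sharp performance cliffs at particular dimensions. A persistent cache would likely also need the device or platform in its key, since results measured on one GPU do not necessarily transfer to another.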
Examples
```rust
use trustformers_core::kernel_tuning::{KernelTuner, TuningConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a tuner with the default configuration
    let mut tuner = KernelTuner::new(TuningConfig::default())?;

    // Auto-tune matrix multiplication for a (1024 x 768) * (768 x 512) product;
    // the arguments read as (m, n, k) = (1024, 512, 768)
    let params = tuner.tune_matmul(1024, 512, 768)?;
    println!("Optimal block size: {:?}", params.block_size);
    Ok(())
}
```
Structs
- KernelParams - Tuned kernel parameters
- KernelTuner - Automatic kernel tuner
- PlatformInfo - Platform characteristics for tuning decisions
- TuningConfig - Tuning configuration
- TuningStatistics - Statistics about tuning results
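`PlatformInfo` is described only as platform characteristics for tuning decisions. As a hedged illustration of host detection using nothing but the Rust standard library, the hypothetical `HostInfo` below records the OS, CPU architecture, and logical core count, and derives a starting thread count that benchmarking could then refine. GPU backends such as CUDA or Metal would need their own driver queries, which this sketch does not attempt.

```rust
use std::env::consts::{ARCH, OS};
use std::thread;

/// Hypothetical stand-in for the platform characteristics a tuner might record.
struct HostInfo {
    os: &'static str,
    arch: &'static str,
    logical_cores: usize,
}

impl HostInfo {
    fn detect() -> Self {
        HostInfo {
            os: OS,
            arch: ARCH,
            // available_parallelism can fail (e.g. in restricted sandboxes); fall back to 1.
            logical_cores: thread::available_parallelism().map(|n| n.get()).unwrap_or(1),
        }
    }

    /// Heuristic starting point before any benchmarking: use every logical core,
    /// but never more threads than work items, and always at least one thread.
    fn default_threads(&self, work_items: usize) -> usize {
        self.logical_cores.min(work_items).max(1)
    }
}

fn main() {
    let info = HostInfo::detect();
    println!("{} on {}, {} logical cores", info.arch, info.os, info.logical_cores);
    println!("threads for 3 rows: {}", info.default_threads(3));
}
```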
Enums
Functions
- get_kernel_tuner - Get or initialize the global kernel tuner
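One common way to implement a "get or initialize" global in Rust is `std::sync::OnceLock` (stable since Rust 1.70) wrapped around a `Mutex`. The sketch below assumes that pattern; `Tuner` and `global_tuner` are hypothetical stand-ins and may not match how `get_kernel_tuner` is actually built.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical minimal tuner state: tuned block sizes keyed by (m, n, k).
#[derive(Default)]
struct Tuner {
    results: HashMap<(usize, usize, usize), usize>,
}

/// Returns the process-wide tuner, creating it on first use.
/// OnceLock runs the initializer at most once, even if several threads race;
/// the Mutex serializes later mutation of the shared results.
fn global_tuner() -> &'static Mutex<Tuner> {
    static TUNER: OnceLock<Mutex<Tuner>> = OnceLock::new();
    TUNER.get_or_init(|| Mutex::new(Tuner::default()))
}

fn main() {
    global_tuner().lock().unwrap().results.insert((1024, 512, 768), 128);
    // A second call returns the same instance, so the entry is still there.
    let hit = global_tuner().lock().unwrap().results.get(&(1024, 512, 768)).copied();
    println!("{hit:?}"); // Some(128)
}
```

Holding the lock only for the duration of each statement, as above, keeps contention low; a real tuner that benchmarks while holding the lock would block other threads for the whole measurement.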