Advanced GPU profiling and kernel optimization tools
This module provides comprehensive GPU memory analysis, kernel optimization suggestions, and advanced profiling capabilities for CUDA/ROCm/OpenCL kernels.
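The item listing for this module includes a roofline model and arithmetic-intensity types, but this page does not show their fields or methods. As a rough illustration of the underlying idea only, the sketch below classifies a kernel as memory-bound or compute-bound from its arithmetic intensity. Every name in it (`RooflineSketch`, `Bound`, `arithmetic_intensity`) is hypothetical and is not this module's API, and the hardware figures in the usage note are made-up round numbers.

```rust
/// Hypothetical stand-in for illustration; NOT this module's roofline type.
#[derive(Debug, Clone, Copy)]
pub struct RooflineSketch {
    /// Peak compute throughput in GFLOP/s (assumed device figure).
    pub peak_gflops: f64,
    /// Peak memory bandwidth in GB/s (assumed device figure).
    pub peak_bandwidth_gbs: f64,
}

/// Which roof limits the kernel (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Bound {
    Memory,
    Compute,
}

/// Arithmetic intensity in FLOP per byte moved to or from device memory.
pub fn arithmetic_intensity(flops: f64, bytes: f64) -> f64 {
    flops / bytes
}

impl RooflineSketch {
    /// Intensity (FLOP/byte) at which the memory roof meets the compute roof.
    pub fn ridge_point(&self) -> f64 {
        self.peak_gflops / self.peak_bandwidth_gbs
    }

    /// Attainable throughput: min(peak compute, intensity * peak bandwidth).
    pub fn attainable_gflops(&self, intensity: f64) -> f64 {
        (intensity * self.peak_bandwidth_gbs).min(self.peak_gflops)
    }

    /// Kernels left of the ridge point are limited by memory bandwidth.
    pub fn bound(&self, intensity: f64) -> Bound {
        if intensity < self.ridge_point() {
            Bound::Memory
        } else {
            Bound::Compute
        }
    }
}
```

With an assumed device of 10000 GFLOP/s and 500 GB/s, the ridge point is 20 FLOP/byte, so a kernel at 2 FLOP/byte is memory-bound and can reach at most 1000 GFLOP/s under this model.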
Structs
- AccessLocalityMetrics
- AdvancedGpuMemoryProfiler - Advanced GPU memory profiler with fragmentation analysis
- AdvancedGpuProfilingConfig - Configuration for advanced GPU profiling
- AllocationContext
- AllocationHotSpot
- AllocationPatternSummary
- ArithmeticIntensityAnalyzer
- ArithmeticIntensityProfile
- BalancingStrategy
- BandwidthSample
- BandwidthSummary
- BankConflictAnalyzer
- BankConflictPattern
- BottleneckFactor
- CacheOptimization
- CachePerformanceAnalysis
- CoalescingAnalysis
- CoalescingImprovement
- ComputeBottleneckAnalysis
- ComputeOptimizationOpportunity
- ComputeUtilizationAnalyzer - Compute utilization analysis
- ComputeUtilizationProfile
- ConfigPerformanceMeasurement
- ConflictResolutionStrategy
- CrossDeviceTransfer - Cross-device memory transfer tracking
- CrossDeviceTransferSummary
- DetectedStride
- ExpectedBenefit
- ExpectedImprovement
- FragmentationSummary
- GpuBandwidthMonitor - GPU bandwidth monitoring
- GpuMemoryAllocation - GPU memory allocation with detailed tracking
- HighImpactOptimization
- InstructionMixAnalysis
- KernelExecutionProfile
- KernelOptimization
- KernelOptimizationSummaryReport - Summary report for kernel optimization
- LaunchConfigAnalyzer - Launch configuration analysis
- LaunchConfigSearchSpace
- MemoryAccessAnalysis
- MemoryAccessAnalyzer - Memory access pattern analysis
- MemoryAccessPattern
- MemoryAnalysisReport
- MemoryFragmentationSnapshot - Memory fragmentation analysis
- MemoryOptimizationRecommendation
- MemoryPressureMonitor - Memory pressure monitoring
- MemoryPressureSnapshot
- MemoryPressureSummary
- MemoryPressureThresholds
- MemoryUsageStats
- OptimalLaunchConfig
- ResourceBalancer
- ResourceProfile
- ResourceUtilizationMetrics
- RooflineModel
- StrideAnalysisResult
- SustainedBandwidthMeasurement
- TransferBottleneck
- UncoalescedRegion
Enums
- AllocationSource
- CacheOptimizationType
- CoalescingImprovementType
- ComputeBottleneckType
- ComputeOptimizationType
- ConflictSeverity
- CrossDeviceTransferType
- FragmentationTrend
- GpuMemoryType
- ImplementationDifficulty
- LaunchConstraint
- LimitingFactor
- MemoryOperationType
- MemoryOptimizationType
- MemoryPressureLevel
- OptimizationPriority
- OptimizationType
- OptimizationValue
- PressureTrend
- ResolutionStrategyType
- StridePattern
- TransferBottleneckType