pub struct KernelStats {
pub total_instructions: usize,
pub ld_global: usize,
pub st_global: usize,
pub ld_shared: usize,
pub st_shared: usize,
pub bar_sync: usize,
pub mma: usize,
pub cp_async: usize,
pub cp_async_commit: usize,
pub cp_async_wait: usize,
pub fma: usize,
pub arith_other: usize,
pub mov: usize,
pub cvt: usize,
pub branches: usize,
pub setp: usize,
pub registers_r: u32,
pub registers_rd: u32,
pub registers_f: u32,
pub registers_fd: u32,
pub registers_p: u32,
pub registers_h: u32,
pub registers_hb: u32,
pub shared_bytes: u32,
}
Structural statistics about a compiled kernel’s emitted PTX.
These describe the instruction mix and declared resource usage in KAIO’s generated PTX — useful for inspection and comparison between kernel variants, but not a substitute for runtime profiling. Final hardware register allocation and occupancy may differ from these counts after the CUDA driver’s backend compilation (PTX → SASS).
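As a rough sketch of the kind of variant-to-variant comparison described above, the example below works on a local stand-in struct that mirrors only a handful of the fields. `StatsSubset`, `global_mem_fraction`, and `delta` are hypothetical names introduced for illustration and are not part of KAIO, and the numbers are made up. How a real `KernelStats` value is obtained is not shown here.

```rust
/// Hypothetical local mirror of a few `KernelStats` fields (not the real type).
#[derive(Debug, Clone, Copy, Default)]
struct StatsSubset {
    total_instructions: usize,
    ld_global: usize,
    st_global: usize,
    fma: usize,
    mma: usize,
}

/// Fraction of emitted instructions that are global loads or stores.
/// Returns 0.0 for an empty kernel to avoid dividing by zero.
fn global_mem_fraction(s: &StatsSubset) -> f64 {
    if s.total_instructions == 0 {
        return 0.0;
    }
    (s.ld_global + s.st_global) as f64 / s.total_instructions as f64
}

/// Signed change in a counter from `before` to `after`.
fn delta(before: usize, after: usize) -> i64 {
    after as i64 - before as i64
}

fn main() {
    // Illustrative values only; these are not measurements.
    let baseline = StatsSubset {
        total_instructions: 200,
        ld_global: 16,
        st_global: 4,
        fma: 64,
        mma: 0,
    };
    let variant = StatsSubset {
        total_instructions: 160,
        ld_global: 8,
        st_global: 4,
        fma: 0,
        mma: 8,
    };

    println!(
        "total instructions: {:+}",
        delta(baseline.total_instructions, variant.total_instructions)
    );
    println!("fma: {:+}", delta(baseline.fma, variant.fma));
    println!("mma: {:+}", delta(baseline.mma, variant.mma));
    println!(
        "global memory fraction: {:.3} -> {:.3}",
        global_mem_fraction(&baseline),
        global_mem_fraction(&variant)
    );
}
```

A smaller instruction count or a different mix only describes the emitted PTX; whether the variant is actually faster still has to be measured at runtime.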
Fields
total_instructions: usize
Total instructions (excludes labels and comments).
ld_global: usize
ld.global count.
st_global: usize
st.global count.
ld_shared: usize
ld.shared count.
st_shared: usize
st.shared count.
bar_sync: usize
bar.sync count.
mma: usize
mma.sync instruction count (all tensor-core shapes).
cp_async: usize
cp.async.ca.shared.global instruction count.
cp_async_commit: usize
cp.async.commit_group instruction count.
cp_async_wait: usize
cp.async.wait_group instruction count.
fma: usize
fma instruction count.
arith_other: usize
Non-FMA arithmetic instructions (add, mul, sub, etc.).
mov: usize
mov instruction count.
cvt: usize
cvt instruction count.
branches: usize
Branch instructions (bra, @pred bra).
setp: usize
setp comparison-to-predicate instructions.
registers_r: u32
%r registers (32-bit integer).
registers_rd: u32
%rd registers (64-bit integer).
registers_f: u32
%f registers (f32).
registers_fd: u32
%fd registers (f64).
registers_p: u32
%p registers (predicate).
registers_h: u32
%h registers (f16).
registers_hb: u32
%hb registers (bf16).
shared_bytes: u32
Total declared shared memory in bytes.
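The per-class register fields can be folded into a single declared-register figure when summarizing a kernel. The sketch below assumes a hypothetical `DeclaredRegisters` struct that copies the seven `registers_*` counts; it is not a KAIO type. The result counts PTX virtual register declarations only and says nothing about the final hardware allocation after PTX → SASS compilation.

```rust
/// Hypothetical holder for the seven `registers_*` counts (not a KAIO type).
#[derive(Debug, Clone, Copy, Default)]
struct DeclaredRegisters {
    r: u32,  // %r  (32-bit integer)
    rd: u32, // %rd (64-bit integer)
    f: u32,  // %f  (f32)
    fd: u32, // %fd (f64)
    p: u32,  // %p  (predicate)
    h: u32,  // %h  (f16)
    hb: u32, // %hb (bf16)
}

impl DeclaredRegisters {
    /// Sum of declared registers across all classes.
    fn total(&self) -> u32 {
        self.r + self.rd + self.f + self.fd + self.p + self.h + self.hb
    }
}

fn main() {
    // Illustrative values only.
    let regs = DeclaredRegisters { r: 10, rd: 4, f: 8, fd: 0, p: 3, h: 2, hb: 0 };
    println!("declared PTX registers: {}", regs.total());
}
```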