Struct KernelStats 

pub struct KernelStats {
    pub total_instructions: usize,
    pub ld_global: usize,
    pub st_global: usize,
    pub ld_shared: usize,
    pub st_shared: usize,
    pub bar_sync: usize,
    pub mma: usize,
    pub ldmatrix: usize,
    pub cp_async: usize,
    pub cp_async_commit: usize,
    pub cp_async_wait: usize,
    pub fma: usize,
    pub arith_other: usize,
    pub mov: usize,
    pub cvt: usize,
    pub branches: usize,
    pub setp: usize,
    pub registers_r: u32,
    pub registers_rd: u32,
    pub registers_f: u32,
    pub registers_fd: u32,
    pub registers_p: u32,
    pub registers_h: u32,
    pub registers_hb: u32,
    pub shared_bytes: u32,
}

Structural statistics about a compiled kernel’s emitted PTX.

These describe the instruction mix and declared resource usage in KAIO’s generated PTX — useful for inspection and comparison between kernel variants, but not a substitute for runtime profiling. Final hardware register allocation and occupancy may differ from these counts after the CUDA driver’s backend compilation (PTX → SASS).
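
As a rough illustration of comparing kernel variants, the sketch below reports which counters changed between two sets of statistics. The `KernelStats` defined here is a hypothetical stand-in that mirrors only a few of the documented fields so the example compiles on its own. The helper `changed_fields` and all counts are made up for illustration and are not part of the crate.

```rust
// Hypothetical stand-in mirroring a few documented fields; real code would
// use the crate's own KernelStats instead of redefining it.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct KernelStats {
    pub total_instructions: usize,
    pub ld_shared: usize,
    pub ldmatrix: usize,
    pub bar_sync: usize,
    pub shared_bytes: u32,
}

/// Signed per-field deltas (variant minus baseline), keeping only the
/// fields whose values differ. This helper is an assumption of the sketch.
fn changed_fields(baseline: &KernelStats, variant: &KernelStats) -> Vec<(&'static str, i64)> {
    let pairs = [
        ("total_instructions", baseline.total_instructions as i64, variant.total_instructions as i64),
        ("ld_shared", baseline.ld_shared as i64, variant.ld_shared as i64),
        ("ldmatrix", baseline.ldmatrix as i64, variant.ldmatrix as i64),
        ("bar_sync", baseline.bar_sync as i64, variant.bar_sync as i64),
        ("shared_bytes", baseline.shared_bytes as i64, variant.shared_bytes as i64),
    ];
    let mut out = Vec::new();
    for (name, before, after) in pairs {
        if before != after {
            out.push((name, after - before));
        }
    }
    out
}

fn main() {
    // Made-up counts, for illustration only.
    let baseline = KernelStats {
        total_instructions: 120,
        ld_shared: 16,
        ldmatrix: 0,
        bar_sync: 2,
        shared_bytes: 4096,
    };
    let variant = KernelStats {
        total_instructions: 108,
        ld_shared: 0,
        ldmatrix: 4,
        bar_sync: 2,
        shared_bytes: 4096,
    };
    for (name, delta) in changed_fields(&baseline, &variant) {
        println!("{name}: {delta:+}");
    }
}
```

A shift like this shows how the emitted PTX changed, not whether the variant runs faster. That question still needs runtime profiling.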

Fields§

§total_instructions: usize

Total instructions (excludes labels and comments).

§ld_global: usize

ld.global count.

§st_global: usize

st.global count.

§ld_shared: usize

ld.shared count.

§st_shared: usize

st.shared count.

§bar_sync: usize

bar.sync count.

§mma: usize

mma.sync instruction count (all tensor-core shapes).

§ldmatrix: usize

ldmatrix instruction count. These are warp-collective fragment loads, tracked separately from ld_shared so that instruction-mix shifts caused by rewiring the loader stay visible.

§cp_async: usize

cp.async.ca.shared.global instruction count.

§cp_async_commit: usize

cp.async.commit_group instruction count.

§cp_async_wait: usize

cp.async.wait_group instruction count.

§fma: usize

fma instruction count.

§arith_other: usize

Non-FMA arithmetic instructions (add, mul, sub, etc.).

§mov: usize

mov instruction count.

§cvt: usize

cvt instruction count.

§branches: usize

Branch instructions (bra, @pred bra).

§setp: usize

setp comparison-to-predicate instructions.

§registers_r: u32

%r registers (32-bit integer).

§registers_rd: u32

%rd registers (64-bit integer).

§registers_f: u32

%f registers (f32).

§registers_fd: u32

%fd registers (f64).

§registers_p: u32

%p registers (predicate).

§registers_h: u32

%h registers (f16).

§registers_hb: u32

%hb registers (bf16).

§shared_bytes: u32

Total declared shared memory in bytes.
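
The fields above can be combined into simple derived figures. The sketch below sums the declared virtual registers across all classes and computes the share of instructions that touch global memory. The `KernelStats` here is again a hypothetical stand-in covering only the fields the example needs, and the helpers `declared_registers` and `global_memory_share` are assumptions of this sketch, not crate API. The register sum counts PTX virtual registers of different widths, so it is only a rough size indicator, not a hardware register count.

```rust
// Hypothetical stand-in mirroring a subset of the documented fields.
#[derive(Debug, Default)]
pub struct KernelStats {
    pub total_instructions: usize,
    pub ld_global: usize,
    pub st_global: usize,
    pub registers_r: u32,
    pub registers_rd: u32,
    pub registers_f: u32,
    pub registers_fd: u32,
    pub registers_p: u32,
    pub registers_h: u32,
    pub registers_hb: u32,
}

/// Sum of declared virtual registers across all seven register classes.
fn declared_registers(s: &KernelStats) -> u32 {
    s.registers_r
        + s.registers_rd
        + s.registers_f
        + s.registers_fd
        + s.registers_p
        + s.registers_h
        + s.registers_hb
}

/// Fraction of instructions that are ld.global or st.global.
/// Returns None for an empty kernel to avoid dividing by zero.
fn global_memory_share(s: &KernelStats) -> Option<f64> {
    if s.total_instructions == 0 {
        return None;
    }
    Some((s.ld_global + s.st_global) as f64 / s.total_instructions as f64)
}

fn main() {
    // Made-up counts, for illustration only.
    let stats = KernelStats {
        total_instructions: 200,
        ld_global: 6,
        st_global: 2,
        registers_r: 24,
        registers_rd: 10,
        registers_f: 32,
        registers_p: 6,
        registers_h: 8,
        ..Default::default()
    };
    println!("declared virtual registers: {}", declared_registers(&stats));
    if let Some(share) = global_memory_share(&stats) {
        println!("global memory share: {:.1}%", share * 100.0);
    }
}
```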

Trait Implementations§

impl Debug for KernelStats

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for KernelStats

fn default() -> KernelStats

Returns the “default value” for a type.

impl Eq for KernelStats

impl PartialEq for KernelStats

fn eq(&self, other: &KernelStats) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl StructuralPartialEq for KernelStats
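
Taken together, these traits make it convenient to write expected statistics in a test and compare them with ==. The sketch below uses a hypothetical three-field stand-in for `KernelStats` and assumes that `Default` yields all-zero counts, which holds for a derived implementation but is not stated on this page. The function `expected_tile_kernel` and its values are invented for illustration.

```rust
// Hypothetical stand-in; the derives match the traits listed for the real type.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct KernelStats {
    pub mma: usize,
    pub ldmatrix: usize,
    pub shared_bytes: u32,
}

/// Expected statistics for an imaginary kernel: only the interesting
/// counters are spelled out, and the rest come from Default.
fn expected_tile_kernel() -> KernelStats {
    KernelStats {
        mma: 8,
        ldmatrix: 4,
        ..Default::default()
    }
}

fn main() {
    let expected = expected_tile_kernel();
    // In real code this value would come from inspecting a compiled kernel.
    let observed = KernelStats {
        mma: 8,
        ldmatrix: 4,
        shared_bytes: 0,
    };
    // PartialEq and Eq compare every field; Debug prints both sides on failure.
    assert_eq!(expected, observed);
    println!("{:?}", observed);
}
```

Because equality covers every field, a test written this way fails when any counter drifts. That is useful for catching unintended codegen changes, but it also means intentional changes require the expected values to be updated.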
