Struct KernelStats 

pub struct KernelStats {
    pub total_instructions: usize,
    pub ld_global: usize,
    pub st_global: usize,
    pub ld_shared: usize,
    pub st_shared: usize,
    pub bar_sync: usize,
    pub mma: usize,
    pub cp_async: usize,
    pub cp_async_commit: usize,
    pub cp_async_wait: usize,
    pub fma: usize,
    pub arith_other: usize,
    pub mov: usize,
    pub cvt: usize,
    pub branches: usize,
    pub setp: usize,
    pub registers_r: u32,
    pub registers_rd: u32,
    pub registers_f: u32,
    pub registers_fd: u32,
    pub registers_p: u32,
    pub registers_h: u32,
    pub registers_hb: u32,
    pub shared_bytes: u32,
}

Structural statistics about a compiled kernel’s emitted PTX.

These describe the instruction mix and declared resource usage in KAIO’s generated PTX. They are useful for inspection and for comparing kernel variants, but they are not a substitute for runtime profiling. After the CUDA driver’s backend compilation (PTX → SASS), final hardware register allocation and occupancy may differ from these counts.
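
As a minimal sketch of comparing kernel variants, the snippet below diffs a few counters between two stats values. It does not depend on KAIO: the `KernelStats` here is a local stand-in that mirrors a handful of the documented fields, the `delta` and `compare` helpers are hypothetical, and the numbers are made up for illustration. How a real `KernelStats` value is obtained is not covered on this page.

```rust
// Local stand-in mirroring a few documented fields; the real type lives in KAIO.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct KernelStats {
    pub total_instructions: usize,
    pub ld_global: usize,
    pub ld_shared: usize,
    pub bar_sync: usize,
    pub fma: usize,
}

/// Signed change in one counter from `before` to `after`.
fn delta(before: usize, after: usize) -> i64 {
    after as i64 - before as i64
}

/// Hypothetical helper: per-field deltas between two kernel variants.
fn compare(a: &KernelStats, b: &KernelStats) -> Vec<(&'static str, i64)> {
    vec![
        ("total_instructions", delta(a.total_instructions, b.total_instructions)),
        ("ld_global", delta(a.ld_global, b.ld_global)),
        ("ld_shared", delta(a.ld_shared, b.ld_shared)),
        ("bar_sync", delta(a.bar_sync, b.bar_sync)),
        ("fma", delta(a.fma, b.fma)),
    ]
}

fn main() {
    // Illustrative numbers only, not measurements from a real kernel.
    let baseline = KernelStats {
        total_instructions: 120,
        ld_global: 16,
        ld_shared: 0,
        bar_sync: 0,
        fma: 32,
    };
    let tiled = KernelStats {
        total_instructions: 150,
        ld_global: 4,
        ld_shared: 16,
        bar_sync: 2,
        fma: 32,
    };

    // Print only the counters that changed between the two variants.
    for (name, d) in compare(&baseline, &tiled) {
        if d != 0 {
            println!("{name}: {d:+}");
        }
    }
}
```

A diff like this shows how the emitted PTX changed between variants; whether the change helps at runtime still has to be measured with a profiler.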

Fields§

§total_instructions: usize

Total instructions (excludes labels and comments).

§ld_global: usize

ld.global count.

§st_global: usize

st.global count.

§ld_shared: usize

ld.shared count.

§st_shared: usize

st.shared count.

§bar_sync: usize

bar.sync count.

§mma: usize

mma.sync instruction count (all tensor-core shapes).

§cp_async: usize

cp.async.ca.shared.global instruction count.

§cp_async_commit: usize

cp.async.commit_group instruction count.

§cp_async_wait: usize

cp.async.wait_group instruction count.
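
The three `cp.async` counters above can be read together. The sketch below is a heuristic lint, assuming a local stand-in struct with just those three fields; `cp_async_warnings` is a hypothetical helper, not a KAIO API. Because the counts are static (an instruction inside a loop is counted once), the numbers need not match each other; the check only looks for a counter that is zero while a related one is not. It also cannot see other completion mechanisms that these fields do not track.

```rust
// Local stand-in with only the cp.async-related fields documented above.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct KernelStats {
    pub cp_async: usize,
    pub cp_async_commit: usize,
    pub cp_async_wait: usize,
}

/// Hypothetical heuristic: flag combinations of counters that look suspicious.
fn cp_async_warnings(s: &KernelStats) -> Vec<&'static str> {
    let mut warnings = Vec::new();
    if s.cp_async > 0 && s.cp_async_commit == 0 {
        warnings.push("cp.async issued but no cp.async.commit_group emitted");
    }
    if s.cp_async_commit > 0 && s.cp_async_wait == 0 {
        warnings.push("groups committed but no cp.async.wait_group emitted");
    }
    if s.cp_async == 0 && (s.cp_async_commit > 0 || s.cp_async_wait > 0) {
        warnings.push("commit/wait emitted without any cp.async copy");
    }
    warnings
}

fn main() {
    // Illustrative values only.
    let stats = KernelStats {
        cp_async: 8,
        cp_async_commit: 2,
        cp_async_wait: 0,
    };
    for w in cp_async_warnings(&stats) {
        println!("warning: {w}");
    }
}
```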

§fma: usize

fma instruction count.

§arith_other: usize

Non-FMA arithmetic instructions (add, mul, sub, etc.).

§mov: usize

mov instruction count.

§cvt: usize

cvt instruction count.

§branches: usize

Branch instructions (bra, @pred bra).

§setp: usize

setp comparison-to-predicate instructions.
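
To make the instruction counters above concrete, here is a rough sketch of how lines of PTX text could be bucketed into them. This is an assumption for illustration, not KAIO's actual implementation: `classify` and `tally` are hypothetical, the parsing is line-based and naive, and only `add`, `sub`, and `mul` are treated as `arith_other` because the full set behind "etc." is not specified here. Instructions that match no documented counter are reported under a made-up `uncategorized` key and still count toward the total.

```rust
use std::collections::BTreeMap;

/// Map one line of PTX text to the name of the counter it would bump.
/// Returns `None` for lines that are not instructions.
fn classify(line: &str) -> Option<&'static str> {
    let mut s = line.trim();
    // Skip blanks, comments, directives (`.reg`, `.entry`, ...), braces, labels.
    if s.is_empty()
        || s.starts_with("//")
        || s.starts_with('.')
        || matches!(s, "{" | "}")
        || s.ends_with(':')
    {
        return None;
    }
    // Drop a predicate guard such as `@%p1` or `@!%p1`.
    if s.starts_with('@') {
        s = s.split_once(char::is_whitespace)?.1.trim_start();
    }
    let opcode = s.split_whitespace().next()?.trim_end_matches(';');

    // Counters that are keyed on a multi-part opcode prefix.
    const PREFIXES: [(&str, &str); 9] = [
        ("ld.global", "ld_global"),
        ("st.global", "st_global"),
        ("ld.shared", "ld_shared"),
        ("st.shared", "st_shared"),
        ("bar.sync", "bar_sync"),
        ("mma.sync", "mma"),
        ("cp.async.commit_group", "cp_async_commit"),
        ("cp.async.wait_group", "cp_async_wait"),
        ("cp.async.ca.shared.global", "cp_async"),
    ];
    for (prefix, field) in PREFIXES {
        if opcode.starts_with(prefix) {
            return Some(field);
        }
    }

    // Remaining counters are keyed on the base mnemonic before the first dot.
    let base = opcode.split('.').next().unwrap_or(opcode);
    Some(match base {
        "fma" => "fma",
        "add" | "sub" | "mul" => "arith_other", // assumed subset
        "mov" => "mov",
        "cvt" => "cvt",
        "bra" => "branches",
        "setp" => "setp",
        _ => "uncategorized", // hypothetical bucket, not a documented field
    })
}

/// Count classified instructions per counter name, plus a running total.
fn tally(ptx: &str) -> BTreeMap<&'static str, usize> {
    let mut counts = BTreeMap::new();
    for field in ptx.lines().filter_map(classify) {
        *counts.entry(field).or_insert(0) += 1;
        *counts.entry("total_instructions").or_insert(0) += 1;
    }
    counts
}

fn main() {
    // Toy kernel body, written by hand for this example.
    let ptx = "\
// toy kernel body
LOOP:
    ld.global.f32 %f1, [%rd1];
    fma.rn.f32 %f2, %f1, %f1, %f2;
    add.s32 %r1, %r1, 1;
    setp.lt.s32 %p1, %r1, %r2;
    @%p1 bra LOOP;
    st.global.f32 [%rd2], %f2;
    ret;
";
    for (field, n) in tally(ptx) {
        println!("{field}: {n}");
    }
}
```

The predicated branch is counted once under `branches`, matching the `bra, @pred bra` description, and the label and comment lines do not contribute to `total_instructions`.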

§registers_r: u32

%r registers (32-bit integer).

§registers_rd: u32

%rd registers (64-bit integer).

§registers_f: u32

%f registers (f32).

§registers_fd: u32

%fd registers (f64).

§registers_p: u32

%p registers (predicate).

§registers_h: u32

%h registers (f16).

§registers_hb: u32

%hb registers (bf16).

§shared_bytes: u32

Total declared shared memory in bytes.
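
The declared-resource fields above lend themselves to a quick summary. The sketch below assumes a local stand-in struct with only those fields; `declared_registers` and `fits_shared_limit` are hypothetical helpers, and the 48 KiB figure is an assumed budget for statically declared shared memory per block, so pass the limit that applies to your target. The register total counts declared virtual registers in the PTX, not the hardware registers allocated after PTX → SASS compilation.

```rust
// Local stand-in with only the declared-resource fields documented above.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct KernelStats {
    pub registers_r: u32,
    pub registers_rd: u32,
    pub registers_f: u32,
    pub registers_fd: u32,
    pub registers_p: u32,
    pub registers_h: u32,
    pub registers_hb: u32,
    pub shared_bytes: u32,
}

/// Assumed shared-memory budget in bytes (48 KiB); adjust for your device.
const ASSUMED_SHARED_LIMIT_BYTES: u32 = 48 * 1024;

/// Total declared virtual registers across all register classes.
fn declared_registers(s: &KernelStats) -> u32 {
    s.registers_r
        + s.registers_rd
        + s.registers_f
        + s.registers_fd
        + s.registers_p
        + s.registers_h
        + s.registers_hb
}

/// Whether the declared shared memory stays within the given budget.
fn fits_shared_limit(s: &KernelStats, limit_bytes: u32) -> bool {
    s.shared_bytes <= limit_bytes
}

fn main() {
    // Illustrative values only.
    let stats = KernelStats {
        registers_r: 24,
        registers_rd: 8,
        registers_f: 40,
        registers_p: 4,
        shared_bytes: 16 * 1024,
        ..KernelStats::default()
    };
    println!("declared registers: {}", declared_registers(&stats));
    println!(
        "within assumed shared-memory budget: {}",
        fits_shared_limit(&stats, ASSUMED_SHARED_LIMIT_BYTES)
    );
}
```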

Trait Implementations§

impl Debug for KernelStats

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for KernelStats

fn default() -> KernelStats

Returns the “default value” for a type.

impl PartialEq for KernelStats

fn eq(&self, other: &KernelStats) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool (stable since 1.0.0)

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Eq for KernelStats

impl StructuralPartialEq for KernelStats
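
Taken together, `Default`, `PartialEq`, and `Debug` make stats values convenient for snapshot-style tests: build an expected value with struct update syntax, compare with `==`, and get a readable diff on failure. The sketch below uses a local stand-in struct with three of the documented fields and the same derives listed above; `expected_snapshot` and its numbers are hypothetical. No `Clone` impl is listed on this page, so the example avoids cloning.

```rust
// Local stand-in with the same derives as the trait impls listed above.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct KernelStats {
    pub total_instructions: usize,
    pub fma: usize,
    pub bar_sync: usize,
}

/// Hypothetical expected snapshot for a kernel under test.
fn expected_snapshot() -> KernelStats {
    KernelStats {
        total_instructions: 42,
        fma: 8,
        // Every field not named here takes its Default value (zero).
        ..KernelStats::default()
    }
}

fn main() {
    // Default yields all-zero counters.
    assert_eq!(KernelStats::default().total_instructions, 0);

    // Stand-in for stats produced by a build; values are illustrative.
    let observed = KernelStats {
        total_instructions: 42,
        fma: 8,
        bar_sync: 0,
    };

    // PartialEq compares field by field; Debug lets assert_eq! print both sides.
    assert_eq!(observed, expected_snapshot());

    // A variant that gained a barrier no longer matches the snapshot.
    let changed = KernelStats {
        bar_sync: 1,
        ..expected_snapshot()
    };
    assert_ne!(changed, expected_snapshot());

    println!("{observed:?}");
}
```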

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.