pub struct RooflineModel {
    pub target: String,
    pub peak_compute: HashMap<Precision, f64>,
    pub peak_bandwidth: HashMap<MemoryLevel, f64>,
}
Roofline model for a specific hardware target. Implements the Empirical Roofline Toolkit (ERT) methodology [6].
Fields§
target: String
Hardware target name (e.g., “RTX 4090”, “AMD EPYC AVX2”)
peak_compute: HashMap<Precision, f64>
Peak compute throughput (FLOP/s) per precision
peak_bandwidth: HashMap<MemoryLevel, f64>
Peak memory bandwidth (bytes/s) per memory level
Implementations§
impl RooflineModel
pub fn ridge_point(&self, precision: Precision, mem_level: MemoryLevel) -> Option<f64>
Compute the ridge point for a given precision and memory level. Ridge = peak_compute / peak_bandwidth (FLOP/byte). This is the arithmetic intensity where the kernel transitions from memory-bound to compute-bound.
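The ridge-point formula above can be sketched as a standalone function. This is a minimal illustration, not the crate's implementation: the `Precision` and `MemoryLevel` variants below are hypothetical stand-ins (the real ones are not listed on this page), and returning `None` for a missing map entry or a non-positive bandwidth is an assumption about how the `Option<f64>` return type is used.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: the real `Precision` and `MemoryLevel` variants
// are not listed on this page.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Precision {
    Fp32,
    Fp64,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum MemoryLevel {
    Dram,
}

/// Ridge point in FLOP/byte: peak_compute / peak_bandwidth.
/// Returns `None` when either peak is missing from its map (assumed
/// behaviour) or when the bandwidth is not strictly positive.
pub fn ridge_point(
    peak_compute: &HashMap<Precision, f64>,
    peak_bandwidth: &HashMap<MemoryLevel, f64>,
    precision: Precision,
    mem_level: MemoryLevel,
) -> Option<f64> {
    let compute = *peak_compute.get(&precision)?;
    let bandwidth = *peak_bandwidth.get(&mem_level)?;
    if bandwidth > 0.0 {
        Some(compute / bandwidth)
    } else {
        None
    }
}
```

With a peak of 1e12 FLOP/s and a bandwidth of 1e11 bytes/s, the ridge sits at about 10 FLOP/byte: kernels with lower arithmetic intensity are limited by memory traffic, kernels with higher intensity by compute.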
pub fn theoretical_peak(&self, arithmetic_intensity: f64, precision: Precision, mem_level: MemoryLevel) -> Option<f64>
Compute the theoretical peak throughput (FLOP/s) at a given arithmetic intensity (AI, in FLOP/byte): throughput = min(peak_compute, AI * peak_bandwidth).
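The `min` in this formula is what gives the roofline its shape. A small free-function sketch, with `attainable_throughput` as a hypothetical name and plain `f64` peaks in place of the struct's maps, might look like this:

```rust
/// Attainable throughput in FLOP/s: min(peak_compute, AI * peak_bandwidth).
///
/// `peak_compute` is in FLOP/s, `peak_bandwidth` in bytes/s and
/// `arithmetic_intensity` in FLOP/byte.
pub fn attainable_throughput(
    peak_compute: f64,
    peak_bandwidth: f64,
    arithmetic_intensity: f64,
) -> f64 {
    // Below the ridge point the bandwidth term is the smaller one
    // (memory-bound slope); above it the flat compute ceiling wins.
    peak_compute.min(arithmetic_intensity * peak_bandwidth)
}
```

For a peak of 1e12 FLOP/s and 1e11 bytes/s, an intensity of 2 FLOP/byte is capped by bandwidth at roughly 2e11 FLOP/s, while an intensity of 50 FLOP/byte is capped by compute at 1e12 FLOP/s.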
pub fn classify(&self, arithmetic_intensity: f64, achieved_throughput: f64, precision: Precision, mem_level: MemoryLevel) -> Option<KernelRooflinePoint>
Classify a kernel as compute-bound or memory-bound.
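One plausible way to perform such a classification is to compare the kernel's arithmetic intensity with the ridge point and report how close the achieved throughput comes to the roof. The sketch below is an assumption-laden illustration: the fields of `KernelRooflinePoint` are not shown on this page, so `Bound` and `Classified` are hypothetical types, and the validity checks that yield `None` are guesses.

```rust
/// Hypothetical classification result; not the crate's `KernelRooflinePoint`.
#[derive(Debug, PartialEq)]
pub enum Bound {
    MemoryBound,
    ComputeBound,
}

#[derive(Debug)]
pub struct Classified {
    pub bound: Bound,
    /// Roofline ceiling at this arithmetic intensity (FLOP/s).
    pub roof: f64,
    /// Achieved throughput as a fraction of the ceiling.
    pub efficiency: f64,
}

pub fn classify(
    peak_compute: f64,
    peak_bandwidth: f64,
    arithmetic_intensity: f64,
    achieved_throughput: f64,
) -> Option<Classified> {
    // Assumed validity checks: all model inputs must be strictly positive.
    if !(peak_compute > 0.0 && peak_bandwidth > 0.0 && arithmetic_intensity > 0.0) {
        return None;
    }
    let ridge = peak_compute / peak_bandwidth;
    let roof = peak_compute.min(arithmetic_intensity * peak_bandwidth);
    // Left of the ridge the bandwidth slope limits the kernel.
    let bound = if arithmetic_intensity < ridge {
        Bound::MemoryBound
    } else {
        Bound::ComputeBound
    };
    Some(Classified {
        bound,
        roof,
        efficiency: achieved_throughput / roof,
    })
}
```

Reporting efficiency against the roof at the kernel's own intensity, rather than against the raw compute peak, keeps memory-bound kernels from looking worse than they are.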
pub fn cpu_avx2(freq_ghz: f64, cores: usize, mem_bandwidth_gbps: f64) -> Self
Create a CPU AVX2+FMA roofline model. Assumes dual 256-bit FMA units (e.g., AMD EPYC / Intel Skylake).
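Under the stated assumption of two 256-bit FMA units per core, the FP32 compute peak can be worked out as follows. This is a sketch, not the constructor's source: `avx2_peak_fp32_flops` is a hypothetical helper, and the formula is inferred by analogy with the AVX-512 one documented for `cpu_avx512`.

```rust
/// Peak FP32 throughput in FLOP/s for a CPU with two 256-bit FMA units
/// per core (assumed formula: units * lanes * 2 * freq * cores).
pub fn avx2_peak_fp32_flops(freq_ghz: f64, cores: usize) -> f64 {
    const FMA_UNITS: f64 = 2.0; // dual FMA pipes, as the description assumes
    const F32_LANES: f64 = 8.0; // 256 bits / 32 bits per float
    const FLOPS_PER_FMA: f64 = 2.0; // one multiply plus one add
    FMA_UNITS * F32_LANES * FLOPS_PER_FMA * freq_ghz * 1.0e9 * cores as f64
}
```

That is 32 FLOP per cycle per core, so an 8-core part at 3.0 GHz would peak at about 768 GFLOP/s in FP32.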
pub fn cpu_avx512(freq_ghz: f64, cores: usize, mem_bandwidth_gbps: f64) -> Self
Create a CPU AVX-512 roofline model. Peak compute is modeled as 2 FMA units * 16 floats per 512-bit vector * 2 FLOPs per FMA * freq * cores.
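To see how the constructor arguments might translate into the model's units, the sketch below applies the AVX-512 formula and converts the bandwidth argument to bytes/s. Two assumptions are made here: `mem_bandwidth_gbps` is read as decimal gigabytes per second (the page does not say whether it means bytes or bits), and `avx512_roofline` is a hypothetical helper rather than part of the crate.

```rust
/// Returns (peak FP32 FLOP/s, memory bandwidth in bytes/s, ridge in FLOP/byte).
/// The bandwidth argument must be strictly positive.
pub fn avx512_roofline(
    freq_ghz: f64,
    cores: usize,
    mem_bandwidth_gb_per_s: f64, // assumed unit: GB/s (1 GB = 1e9 bytes)
) -> (f64, f64, f64) {
    // 2 FMA units * 16 f32 lanes * 2 FLOPs per FMA = 64 FLOP/cycle/core.
    let flops_per_cycle = 2.0 * 16.0 * 2.0;
    let peak_compute = flops_per_cycle * freq_ghz * 1.0e9 * cores as f64;
    let peak_bandwidth = mem_bandwidth_gb_per_s * 1.0e9;
    (peak_compute, peak_bandwidth, peak_compute / peak_bandwidth)
}
```

A 16-core part at 2.0 GHz with 204.8 GB/s of memory bandwidth gives roughly 2.048 TFLOP/s and a ridge point near 10 FLOP/byte under these assumptions.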
Trait Implementations§
impl Clone for RooflineModel
fn clone(&self) -> RooflineModel
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for RooflineModel
impl<'de> Deserialize<'de> for RooflineModel
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
Auto Trait Implementations§
impl Freeze for RooflineModel
impl RefUnwindSafe for RooflineModel
impl Send for RooflineModel
impl Sync for RooflineModel
impl Unpin for RooflineModel
impl UnsafeUnpin for RooflineModel
impl UnwindSafe for RooflineModel
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.