pub struct CpuCacheStats { /* private fields */ }
CPU cache statistics

These statistics can be used to perform simple cache locality optimizations when your performance requirements do not call for full locality-aware scheduling with manual task and memory pinning.

Implementations

impl CpuCacheStats

pub fn new(topology: &Topology) -> Option<Self>

Compute CPU cache statistics, if cache sizes are known

Returns None if the size of at least one CPU cache on the system is unknown.

pub fn smallest_data_cache_sizes(&self) -> &[u64]

Smallest CPU data cache capacity at each cache level

This tells you how many cache levels there are in the deepest cache hierarchy on this system, and the smallest cache capacity at each level.

You should tune sequential algorithms so that they fit this effective cache hierarchy: the first layer of loop blocking has a working set that stays within the first reported cache capacity, the second layer has a working set that fits in the second reported capacity, and so on.
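
As a minimal sketch of this advice, assume a blocked kernel that keeps a fixed number of square tiles resident at once (for example three tiles for a matrix multiplication). The tile edge for each blocking layer could then be derived from the reported capacities as shown below. The helpers tile_edge and tile_edges and the cache sizes in main are hypothetical and not part of this crate; in real code the slice would come from smallest_data_cache_sizes().

```rust
/// Largest square tile edge (in elements) such that `tiles_in_flight` tiles
/// of `elem_size`-byte elements fit together in `cache_bytes`.
/// Assumes `elem_size` and `tiles_in_flight` are both non-zero.
fn tile_edge(cache_bytes: u64, elem_size: u64, tiles_in_flight: u64) -> u64 {
    let max_elems_per_tile = cache_bytes / (elem_size * tiles_in_flight);
    // Integer square root: start from the floating-point estimate, then
    // correct any rounding error in either direction.
    let mut edge = (max_elems_per_tile as f64).sqrt() as u64;
    while edge * edge > max_elems_per_tile {
        edge -= 1;
    }
    while (edge + 1) * (edge + 1) <= max_elems_per_tile {
        edge += 1;
    }
    edge
}

/// One tile edge per cache level, innermost blocking layer first.
fn tile_edges(cache_sizes: &[u64], elem_size: u64, tiles_in_flight: u64) -> Vec<u64> {
    cache_sizes
        .iter()
        .map(|&capacity| tile_edge(capacity, elem_size, tiles_in_flight))
        .collect()
}

fn main() {
    // Hypothetical capacities standing in for `stats.smallest_data_cache_sizes()`:
    // 32 KiB L1, 1 MiB L2, 32 MiB L3.
    let sizes: [u64; 3] = [32 * 1024, 1024 * 1024, 32 * 1024 * 1024];
    // f64 elements (8 bytes), three tiles resident at each blocking layer.
    let edges = tile_edges(&sizes, 8, 3);
    println!("tile edges per cache level: {edges:?}");
}
```

In practice you would likely target a fraction of each capacity rather than all of it, since other data (stack, code-adjacent data, prefetched lines) competes for the same cache.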

pub fn smallest_data_cache_sizes_per_thread(&self) -> &[u64]

Smallest CPU data cache capacity at each cache level, per thread

This tells you how many cache levels there are in the deepest cache hierarchy on this system, and the smallest cache capacity per thread sharing a cache at each level.

In parallel algorithms where all CPU threads are potentially used and threads effectively share no common data, you should tune the private working set of each thread so that it fits this effective cache hierarchy: the first layer of loop blocking has a working set that stays within the first reported cache capacity, the second layer has a working set that fits in the second reported capacity, and so on.
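
The following sketch illustrates one way to apply this, assuming each thread processes its own contiguous part of a slice in blocks sized from the per-thread capacity. The helpers chunk_len and parallel_sum_squares, the 50% occupancy target, and the 32 KiB figure are hypothetical; in real code the capacity would come from smallest_data_cache_sizes_per_thread(). The kernel is a placeholder: blocking only pays off when each block is traversed more than once.

```rust
use std::thread;

/// Number of `elem_size`-byte elements that fit in `percent`% of one thread's
/// cache share. Assumes `elem_size > 0`; always returns at least 1.
fn chunk_len(per_thread_cache_bytes: u64, elem_size: u64, percent: u64) -> usize {
    let budget = per_thread_cache_bytes.saturating_mul(percent) / 100;
    ((budget / elem_size) as usize).max(1)
}

/// Sum of squares where each thread owns one contiguous part of `data` and
/// walks it in blocks of `chunk` elements, so that its private working set
/// stays within the cache budget used to compute `chunk`.
fn parallel_sum_squares(data: &[f64], num_threads: usize, chunk: usize) -> f64 {
    let threads = num_threads.max(1);
    let chunk = chunk.max(1);
    // Ceiling division, so that at most `threads` parts are produced.
    let part_len = ((data.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(part_len)
            .map(|part| {
                s.spawn(move || {
                    part.chunks(chunk)
                        .map(|block| block.iter().map(|x| x * x).sum::<f64>())
                        .sum::<f64>()
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<f64>()
    })
}

fn main() {
    // Hypothetical first entry of `stats.smallest_data_cache_sizes_per_thread()`.
    let l1_per_thread: u64 = 32 * 1024;
    // Target half of the per-thread capacity, with 8-byte f64 elements.
    let chunk = chunk_len(l1_per_thread, 8, 50);
    let data: Vec<f64> = (1..=1000).map(|i| i as f64).collect();
    let total = parallel_sum_squares(&data, 4, chunk);
    println!("chunk = {chunk} elements, sum of squares = {total}");
}
```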

pub fn total_data_cache_sizes(&self) -> &[u64]

Total CPU data cache capacity at each cache level

This tells you how many cache levels there are in the deepest cache hierarchy on this system, and the total cache capacity at each level.

You should tune parallel algorithms such that the total working set (summed across all threads without double-counting shared resources) fits in the reported aggregated cache capacities.

Beware that this is only a minimal requirement for cache locality, and programs honoring this criterion might still not achieve good cache performance due to CPU core heterogeneity or Non-Uniform Cache Access (NUCA) effects. To correctly handle these, you need to move to a fully locality-aware design with threads pinned to CPU cores and tree-like synchronization following the shape of the topology tree.

That being said, you may manage to reduce NUCA effects at the cost of using a smaller fraction of your CPU cache capacity by making your parallel algorithm collectively fit into the smallest last-level cache.
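
The two criteria above can be sketched as simple capacity checks. The helpers first_level_that_fits and fits_smallest_last_level and all sizes in main are hypothetical; in real code the slices would come from total_data_cache_sizes() and smallest_data_cache_sizes(). Passing these checks is only the minimal requirement described above and does not account for core heterogeneity or NUCA effects.

```rust
/// Index of the first cache level whose capacity can hold the working set,
/// or `None` if it does not fit at any level.
fn first_level_that_fits(capacities: &[u64], working_set_bytes: u64) -> Option<usize> {
    capacities
        .iter()
        .position(|&capacity| working_set_bytes <= capacity)
}

/// Conservative check: does the whole working set fit in the smallest
/// last-level cache? Returns `false` when no cache level is reported.
fn fits_smallest_last_level(smallest_sizes: &[u64], working_set_bytes: u64) -> bool {
    smallest_sizes
        .last()
        .map_or(false, |&last_level| working_set_bytes <= last_level)
}

fn main() {
    const KIB: u64 = 1024;
    const MIB: u64 = 1024 * KIB;

    // Hypothetical values standing in for `stats.total_data_cache_sizes()`
    // and `stats.smallest_data_cache_sizes()` on a machine with two L3 caches.
    let total = [512 * KIB, 16 * MIB, 64 * MIB];
    let smallest = [32 * KIB, MIB, 32 * MIB];

    // Total working set, summed across threads without double-counting.
    let working_set = 48 * MIB;

    println!(
        "first aggregated level that fits: {:?}",
        first_level_that_fits(&total, working_set)
    );
    println!(
        "fits in smallest last-level cache: {}",
        fits_smallest_last_level(&smallest, working_set)
    );
}
```

With these example numbers the working set fits in the aggregated last-level capacity but not in a single last-level cache, which is the situation where NUCA effects are most likely to matter.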

Trait Implementations

impl Clone for CpuCacheStats

fn clone(&self) -> CpuCacheStats

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for CpuCacheStats

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Hash for CpuCacheStats

fn hash<__H: Hasher>(&self, state: &mut __H)

Feeds this value into the given Hasher.

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

Feeds a slice of this type into the given Hasher.

impl PartialEq for CpuCacheStats

fn eq(&self, other: &CpuCacheStats) -> bool

This method tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

This method tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Eq for CpuCacheStats

impl StructuralPartialEq for CpuCacheStats

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V