Struct hwlocality::cpu::cache::CpuCacheStats
pub struct CpuCacheStats { /* private fields */ }
CPU cache statistics
These statistics can be used to perform simple cache locality optimizations when your performance requirements do not call for full locality-aware scheduling with manual task and memory pinning.
Implementations
impl CpuCacheStats
pub fn new(topology: &Topology) -> Option<Self>
Compute CPU cache statistics, if cache sizes are known
Returns None if cache size information is unavailable for at least some of the CPU caches on the system.
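Since cache size information may be missing (e.g. in some virtualized environments), callers typically fall back to a conservative guess when `new()` returns `None`. A minimal sketch of that pattern, with the hwlocality calls elided; the 32 KiB fallback is an assumption for illustration, not a constant from the library:

```rust
/// Fallback pattern for absent cache statistics: use the reported
/// smallest L1 data cache size when available, otherwise a
/// conservative 32 KiB guess.
fn l1_capacity_or_default(smallest_sizes: Option<&[u64]>) -> u64 {
    smallest_sizes
        .and_then(|sizes| sizes.first().copied())
        .unwrap_or(32 * 1024)
}

fn main() {
    // In real code, `smallest_sizes` would come from
    // `CpuCacheStats::new(&topology)` followed by
    // `smallest_data_cache_sizes()`.
    assert_eq!(l1_capacity_or_default(None), 32 * 1024);
    assert_eq!(l1_capacity_or_default(Some(&[49_152, 1_048_576])), 49_152);
    println!("ok");
}
```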
pub fn smallest_data_cache_sizes(&self) -> &[u64]
Smallest CPU data cache capacity at each cache level
This tells you how many cache levels there are in the deepest cache hierarchy on this system, and the minimum cache capacity at each level.
You should tune sequential algorithms such that they fit this effective cache hierarchy (first layer of loop blocking has a working set that can stay in the first reported cache capacity, second layer of loop blocking has a working set that can fit in the second reported capacity, etc.)
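As an illustration of the first layer of loop blocking, here is a minimal sketch that sizes a tile to fit the first-level capacity; the 32 KiB value is a placeholder assumption standing in for `smallest_data_cache_sizes()[0]`, and the half-cache headroom is a rule of thumb, not a library recommendation:

```rust
/// Pick a tile length (in elements) so that one tile of
/// `bytes_per_elem`-sized items fits in the given cache capacity,
/// keeping half the cache as headroom for other data.
fn block_len(cache_bytes: u64, bytes_per_elem: u64) -> usize {
    ((cache_bytes / 2) / bytes_per_elem) as usize
}

/// Sum a slice tile by tile, with tiles sized to the L1 data cache.
fn blocked_sum(data: &[f64], l1_bytes: u64) -> f64 {
    let block = block_len(l1_bytes, std::mem::size_of::<f64>() as u64).max(1);
    data.chunks(block).map(|tile| tile.iter().sum::<f64>()).sum()
}

fn main() {
    let data: Vec<f64> = (0..10_000).map(|i| i as f64).collect();
    // 32 KiB is a placeholder for smallest_data_cache_sizes()[0].
    println!("{}", blocked_sum(&data, 32 * 1024)); // 49995000
}
```

Deeper blocking layers repeat the same computation with the second, third, etc. reported capacities.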
pub fn smallest_data_cache_sizes_per_thread(&self) -> &[u64]
Smallest CPU data cache capacity at each cache level, per thread
This tells you how many cache levels there are in the deepest cache hierarchy on this system, and the minimum cache capacity available per thread sharing a cache at each level.
In parallel algorithms where all CPU threads are potentially used, and threads effectively share no common data, you should tune the private working set of each thread such that it fits this effective cache hierarchy (first layer of loop blocking has a working set that can stay in the first reported cache capacity, second layer of loop blocking has a working set that can fit in the second reported capacity, etc.).
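A minimal sketch of per-thread working set sizing; the 16 KiB figure (two hyperthreads sharing a 32 KiB L1) is an illustrative assumption standing in for `smallest_data_cache_sizes_per_thread()[0]`:

```rust
/// Given the per-thread capacity of some cache level, compute how many
/// `f64` elements each thread's private working set may hold, keeping
/// half the capacity as headroom.
fn private_elems(per_thread_bytes: u64) -> usize {
    ((per_thread_bytes / 2) as usize) / std::mem::size_of::<f64>()
}

fn main() {
    // Placeholder: 16 KiB of L1 per thread, i.e. two hyperthreads
    // sharing a 32 KiB L1 data cache.
    let per_thread_l1 = 16 * 1024;
    println!("{} f64 elements per thread", private_elems(per_thread_l1));
}
```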
pub fn total_data_cache_sizes(&self) -> &[u64]
Total CPU data cache capacity at each cache level
This tells you how many cache levels there are in the deepest cache hierarchy on this system, and the total cache capacity at each level.
You should tune parallel algorithms such that the total working set (summed across all threads without double-counting shared resources) fits in the reported aggregated cache capacities.
Beware that this is only a minimal requirement for cache locality, and programs honoring this criterion might still not achieve good cache performance due to CPU core heterogeneity or Non-Uniform Cache Access (NUCA) effects. To correctly handle these, you need to move to a fully locality-aware design with threads pinned to CPU cores and tree-like synchronization following the shape of the topology tree.
That being said, you may manage to reduce NUCA effects at the cost of using a smaller fraction of your CPU cache capacity by making your parallel algorithm collectively fit into the smallest last-level cache.
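A minimal sketch of the aggregate capacity check described above; the capacity figures (512 KiB L1, 8 MiB L2, 32 MiB L3 totals) are assumptions for a hypothetical system, standing in for `total_data_cache_sizes()`:

```rust
/// Check whether a parallel algorithm's total working set (summed over
/// threads, with shared data counted once) fits the aggregated
/// capacity of a given cache level (0 = L1, 1 = L2, ...).
fn fits_level(total_sizes: &[u64], level: usize, working_set_bytes: u64) -> bool {
    total_sizes
        .get(level)
        .map_or(false, |&capacity| working_set_bytes <= capacity)
}

fn main() {
    // Placeholder totals for a hypothetical system:
    // 512 KiB of L1, 8 MiB of L2, 32 MiB of L3 across all cores.
    let totals = [512 * 1024, 8 * 1024 * 1024, 32 * 1024 * 1024];
    // A 16 MiB collective working set fits the combined L3 but not L2.
    println!("{}", fits_level(&totals, 2, 16 * 1024 * 1024)); // true
    println!("{}", fits_level(&totals, 1, 16 * 1024 * 1024)); // false
}
```

To follow the NUCA-avoidance advice from the last paragraph, one would instead compare against the smallest last-level cache from `smallest_data_cache_sizes()`.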
Trait Implementations
impl Clone for CpuCacheStats
fn clone(&self) -> CpuCacheStats
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from `source`.
impl Debug for CpuCacheStats
impl Hash for CpuCacheStats
impl PartialEq for CpuCacheStats
fn eq(&self, other: &CpuCacheStats) -> bool
Tests for `self` and `other` values to be equal, and is used by `==`.