Cross-platform prefetch intrinsics for x86 and ARM architectures.
This module provides a unified API for prefetching memory addresses across different architectures and cache levels. It uses compile-time generics to eliminate runtime branches and provide direct intrinsic calls.
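As an illustration of the compile-time dispatch described above, the following is a minimal sketch, not the crate's actual implementation. The names `Hint`, `ToL1`, `NonTemporal`, and `prefetch_ref` are hypothetical. It assumes x86_64 for the real intrinsic call and falls back to a no-op on other targets. Each marker type selects its intrinsic through a trait impl, so the generic function is monomorphized per hint and contains no runtime branch.

```rust
// Real intrinsic wrappers on x86_64; no-op stand-ins on other targets.
#[cfg(target_arch = "x86_64")]
mod arch {
    use std::arch::x86_64::{_mm_prefetch, _MM_HINT_NTA, _MM_HINT_T0};

    #[inline(always)]
    pub fn t0(p: *const i8) {
        // SAFETY: a prefetch is only a hint; it does not dereference `p`.
        unsafe { _mm_prefetch::<_MM_HINT_T0>(p) }
    }

    #[inline(always)]
    pub fn nta(p: *const i8) {
        // SAFETY: as above.
        unsafe { _mm_prefetch::<_MM_HINT_NTA>(p) }
    }
}

#[cfg(not(target_arch = "x86_64"))]
mod arch {
    pub fn t0(_p: *const i8) {}
    pub fn nta(_p: *const i8) {}
}

/// Hypothetical marker trait (not the crate's `CacheLevel` definition).
pub trait Hint {
    /// Name of the hint, exposed only to make the dispatch observable.
    const NAME: &'static str;
    /// Issue a prefetch hint for the cache line containing `ptr`.
    fn prefetch(ptr: *const i8);
}

/// Hypothetical zero-sized markers, one per hint.
pub struct ToL1;
pub struct NonTemporal;

impl Hint for ToL1 {
    const NAME: &'static str = "T0";
    #[inline(always)]
    fn prefetch(ptr: *const i8) {
        arch::t0(ptr)
    }
}

impl Hint for NonTemporal {
    const NAME: &'static str = "NTA";
    #[inline(always)]
    fn prefetch(ptr: *const i8) {
        arch::nta(ptr)
    }
}

/// The hint is chosen by the type parameter `H`, resolved at compile time.
pub fn prefetch_ref<H: Hint, T>(value: &T) -> &'static str {
    H::prefetch(value as *const T as *const i8);
    H::NAME
}
```

A call such as `prefetch_ref::<ToL1, _>(&data[0])` compiles down to the single intrinsic selected by `ToL1`. The marker types carry no data, so they cost nothing at runtime.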
§Why u8/i8 pointers?
Prefetch instructions operate at the cache line level (typically 64 bytes) and don’t care about the actual data type being prefetched. They only need a memory address. Using u8/i8 pointers is the standard convention because:
- Prefetch works on cache lines, not individual data elements
- The CPU prefetches entire cache lines regardless of data type
- u8 provides byte-level addressing, which is what the hardware expects
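The points above can be made concrete with a small sketch. It assumes a 64-byte cache line, and the names `CACHE_LINE`, `Aligned`, `as_byte_ptr`, and `cache_line_index` are hypothetical helpers, not part of this module. The sketch shows that only the address determines which cache line is touched, so erasing the element type to a byte pointer loses nothing.

```rust
/// Assumed cache-line size in bytes; 64 is typical but not universal.
const CACHE_LINE: usize = 64;

/// Storage aligned to a cache-line boundary, so line membership is predictable.
#[repr(align(64))]
struct Aligned([u32; 32]);

/// Erase the element type: a prefetch only needs the address.
fn as_byte_ptr<T>(value: &T) -> *const u8 {
    value as *const T as *const u8
}

/// Index of the cache line containing `ptr`, under the 64-byte assumption.
fn cache_line_index(ptr: *const u8) -> usize {
    (ptr as usize) / CACHE_LINE
}
```

With 4-byte `u32` elements and a 64-byte-aligned array, elements 0 through 15 share one cache line and element 16 starts the next. Prefetching any one of the first sixteen therefore brings in all of them.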
§Examples
use simd_lookup::prefetch::{prefetch_eight_offsets, prefetch_eight_masked, L1, NTA};
let data = vec![0u32; 1000];
let offsets = [10, 20, 30, 40, 50, 60, 70, 80];
// Prefetch 8 addresses for L1 cache
prefetch_eight_offsets::<_, L1>(&data, &offsets);
// Prefetch with mask - only prefetch where mask bit is 1
let mask = 0b10101010; // prefetch offsets[1], [3], [5], [7]
prefetch_eight_masked::<_, L1>(&data, offsets, mask);
Structs§
- L1
- L1 cache prefetch
- L2
- L2 cache prefetch
- L3
- L3 cache prefetch
- NTA
- Non-temporal access - hint that the data will not be reused soon, minimizing cache pollution
Traits§
- CacheLevel
- Cache level marker traits for compile-time dispatch
Functions§
- prefetch_address
- Prefetch a single memory address for the specified cache level
- prefetch_eight_masked
- Prefetch eight addresses with a bitmask to control which addresses to prefetch
- prefetch_eight_offsets
- Prefetch eight memory addresses at once using offsets from a base pointer
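
The bitmask convention used by the masked variant (bit i enables offsets[i], as in the module example) can be sketched as follows. `selected_offsets` is a hypothetical helper written for illustration. It only computes which offsets a mask selects and does not issue any prefetch; the offset element type `usize` is also an assumption.

```rust
/// Return the offsets selected by `mask`, where bit `i` enables `offsets[i]`.
fn selected_offsets(offsets: &[usize; 8], mask: u8) -> Vec<usize> {
    offsets
        .iter()
        .enumerate()
        // Keep entry `i` only when bit `i` of the mask is set.
        .filter(|&(i, _)| (mask & (1u8 << i)) != 0)
        .map(|(_, &off)| off)
        .collect()
}
```

For the mask `0b10101010` and the offsets `[10, 20, 30, 40, 50, 60, 70, 80]`, this selects `[20, 40, 60, 80]`, which are the entries at indices 1, 3, 5, and 7.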