Module prefetch

Cross-platform prefetch intrinsics for x86 and ARM architectures.

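This module provides a unified API for prefetching memory addresses across different architectures and cache levels. The cache level is chosen through a generic type parameter, so dispatch is resolved at compile time: there is no runtime branch on the level, and each call maps directly to the corresponding intrinsic.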
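As a rough illustration of this kind of compile-time dispatch, the sketch below uses zero-sized marker types and a trait with a static method. The names `Level`, `Near`, `Stream`, and `prefetch_ptr` are hypothetical and are not this crate's API, and the path for targets other than x86_64 is a no-op placeholder rather than a real ARM implementation.

```rust
// Architecture-specific hint wrappers. Only x86_64 is wired up in this
// sketch; every other target gets a no-op placeholder.
#[cfg(target_arch = "x86_64")]
mod arch {
    use std::arch::x86_64::{_mm_prefetch, _MM_HINT_NTA, _MM_HINT_T0};

    #[inline(always)]
    pub fn t0(p: *const i8) {
        // SAFETY: a prefetch is only a hint; it does not fault on any address.
        unsafe { _mm_prefetch::<_MM_HINT_T0>(p) }
    }

    #[inline(always)]
    pub fn nta(p: *const i8) {
        // SAFETY: same reasoning as `t0`.
        unsafe { _mm_prefetch::<_MM_HINT_NTA>(p) }
    }
}

#[cfg(not(target_arch = "x86_64"))]
mod arch {
    pub fn t0(_p: *const i8) {}
    pub fn nta(_p: *const i8) {}
}

/// Hypothetical marker trait: one implementor per prefetch hint.
pub trait Level {
    /// Hint name, included only so the sketch has something observable.
    const NAME: &'static str;
    /// Issue the hint for the cache line containing `p`.
    fn hint(p: *const i8);
}

/// Hypothetical marker: keep the line close to the core.
pub struct Near;
/// Hypothetical marker: non-temporal hint.
pub struct Stream;

impl Level for Near {
    const NAME: &'static str = "T0";
    #[inline(always)]
    fn hint(p: *const i8) {
        arch::t0(p)
    }
}

impl Level for Stream {
    const NAME: &'static str = "NTA";
    #[inline(always)]
    fn hint(p: *const i8) {
        arch::nta(p)
    }
}

/// The level is a type parameter, so `L::hint` is resolved during
/// monomorphization and no runtime `match` on the level is needed.
#[inline(always)]
pub fn prefetch_ptr<L: Level, T>(r: &T) {
    L::hint((r as *const T).cast::<i8>());
}
```

A call site would then look like `prefetch_ptr::<Near, _>(&v[3])`, with the hint fixed by the type argument rather than by a runtime value.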

§Why u8/i8 pointers?

Prefetch instructions operate on whole cache lines (typically 64 bytes) and are indifferent to the type of the data being prefetched; all they need is a memory address. Using u8/i8 pointers is the standard convention (the x86 intrinsic `_mm_prefetch`, for example, takes a `*const i8`) because:

  • Prefetch works on cache lines, not on individual data elements
  • The CPU fetches the entire cache line containing the address, regardless of data type
  • u8 provides byte-level addressing, which is all the hardware needs
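To make the cache-line argument concrete, here is a minimal sketch that erases the element type and shows several `u32` values mapping to the same line. The helpers `as_prefetch_ptr` and `line_of` and the `Aligned` wrapper are hypothetical, and the 64-byte line size is an assumption that does not hold on every CPU.

```rust
/// Assumed cache line size in bytes; 64 is typical but not universal.
const LINE: usize = 64;

/// Erase the element type: a prefetch hint only needs the address.
fn as_prefetch_ptr<T>(r: &T) -> *const i8 {
    (r as *const T).cast::<i8>()
}

/// Index of the cache line containing `p`, under the `LINE` assumption.
fn line_of(p: *const i8) -> usize {
    p as usize / LINE
}

/// Two cache lines' worth of `u32` values, aligned to a line boundary
/// so that element 0 starts exactly at the beginning of a line.
#[repr(align(64))]
struct Aligned([u32; 32]);
```

With this layout, elements 0 through 15 occupy one line and element 16 starts the next, so a hint for any of the first sixteen elements covers all of them.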

§Examples

use simd_lookup::prefetch::{prefetch_eight_offsets, prefetch_eight_masked, L1, NTA};

let data = vec![0u32; 1000];
let offsets = [10, 20, 30, 40, 50, 60, 70, 80];

// Prefetch 8 addresses for L1 cache
prefetch_eight_offsets::<_, L1>(&data, &offsets);

// Prefetch with mask - only prefetch where mask bit is 1
let mask = 0b10101010; // bits 1, 3, 5, 7 set: prefetch offsets[1], offsets[3], offsets[5], offsets[7]
prefetch_eight_masked::<_, L1>(&data, offsets, mask);
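The mask comment in the example implies that bit `i`, counted from the least significant bit, controls entry `i`. A small sketch of that decoding follows; `selected_indices` is a hypothetical helper written for illustration, and the bit order is inferred from the comment rather than from the crate's source.

```rust
/// Indices selected by `mask`, assuming bit `i` (least significant bit
/// first) controls entry `i` of an eight-element offsets array.
fn selected_indices(mask: u8) -> Vec<usize> {
    (0..8).filter(|&i| (mask >> i) & 1 == 1).collect()
}
```

Under that assumption, `0b10101010` selects the odd indices, which matches the comment in the example.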

Structs§

L1
L1 cache prefetch
L2
L2 cache prefetch
L3
L3 cache prefetch
NTA
Non-temporal access - hint that the data will not be reused, minimizing cache pollution

Traits§

CacheLevel
Cache level marker trait for compile-time dispatch

Functions§

prefetch_address
Prefetch a single memory address for the specified cache level
prefetch_eight_masked
Prefetch eight addresses with a bitmask to control which addresses to prefetch
prefetch_eight_offsets
Prefetch eight memory addresses at once using offsets from a base pointer
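For readers wondering how offsets from a base pointer turn into prefetch addresses, the sketch below shows one plausible approach. It is not the crate's implementation: `hint_line`, `element_ptr`, and `prefetch_eight_sketch` are hypothetical names, offsets are assumed to be element indices (not byte offsets), and targets other than x86_64 get a no-op.

```rust
/// Issue one hint; a no-op placeholder on targets other than x86_64.
#[cfg(target_arch = "x86_64")]
fn hint_line(p: *const i8) {
    use std::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};
    // SAFETY: a prefetch is only a hint and does not fault on any address.
    unsafe { _mm_prefetch::<_MM_HINT_T0>(p) }
}

#[cfg(not(target_arch = "x86_64"))]
fn hint_line(_p: *const i8) {}

/// Byte address of `data[offset]`, computed without dereferencing.
/// `wrapping_add` keeps the arithmetic defined even for out-of-range
/// offsets, which is acceptable here because the pointer is never read.
fn element_ptr<T>(data: &[T], offset: usize) -> *const i8 {
    data.as_ptr().wrapping_add(offset).cast::<i8>()
}

/// Hint all eight element addresses and return them for inspection.
fn prefetch_eight_sketch<T>(data: &[T], offsets: &[usize; 8]) -> [*const i8; 8] {
    let ptrs = (*offsets).map(|off| element_ptr(data, off));
    for p in ptrs {
        hint_line(p);
    }
    ptrs
}
```

Because the offsets are scaled by `size_of::<T>()`, offset 10 into a `u32` slice corresponds to a byte address 40 bytes past the base. The fixed-size array lets the compiler unroll the loop.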