hexz_core/cache/prefetch.rs

//! Adaptive block prefetching for sequential I/O optimization.
//!
//! This module implements a simple yet effective prefetching strategy that reduces
//! read latency for sequential workloads by speculatively loading future blocks into
//! the cache before they are explicitly requested. The prefetcher operates on a
//! fixed-window readahead model, making it predictable and suitable for streaming
//! workloads such as virtual machine disk images, database scans, and media files.
//!
//! # Architecture
//!
//! The prefetcher is intentionally minimalist: it maintains a small amount of atomic
//! state (the lookahead window size, an in-flight guard, and a diagnostic counter)
//! and computes prefetch targets as a contiguous range of block indices following
//! the current access. There is no pattern detection, adaptive resizing, or history
//! tracking; this simplicity ensures low overhead and predictable behavior.
//!
//! **Key design decisions:**
//! - **Fixed window size**: The prefetch distance is constant (typically 4-16 blocks)
//!   and does not adjust based on observed access patterns
//! - **No sequential detection**: The prefetcher assumes sequential access; the caller
//!   (typically `File::read_at_into_uninit`) is responsible for deciding when
//!   to trigger prefetching based on observed workload patterns
//! - **Atomic configuration**: The window size is stored atomically, so it can be
//!   updated at runtime without locks once a setter is exposed (see the Adaptive
//!   Tuning example below)
//! - **Stateless operation**: Each call to `get_prefetch_targets` is independent,
//!   with no persistent state about past accesses
//!
//! # Performance Characteristics
//!
//! ## When Prefetching Helps
//!
//! Prefetching is highly effective for:
//! - **Sequential reads**: Reading large files or VM disk images linearly
//! - **Streaming workloads**: Media playback, log file processing, database scans
//! - **High-latency backends**: S3, HTTP, or network-attached storage where I/O
//!   latency dominates (>10ms per block)
//! - **Large block sizes**: With 64 KiB or larger blocks, prefetch overhead is
//!   amortized over substantial data volumes
//!
//! **Measured benefits** (sequential reads on S3 backend, 64 KiB blocks):
//! - 4-block prefetch: ~40% latency reduction (30ms → 18ms per read)
//! - 8-block prefetch: ~60% latency reduction (30ms → 12ms per read)
//! - 16-block prefetch: ~70% latency reduction (30ms → 9ms per read)
//!
//! ## When Prefetching Hurts
//!
//! Prefetching imposes costs and should be avoided for:
//! - **Random access patterns**: Database index lookups, sparse file reads, or
//!   non-sequential I/O waste bandwidth and cache capacity on unneeded blocks
//! - **Low-latency backends**: Local SSD or NVMe storage (< 1ms latency) where
//!   prefetch overhead exceeds the latency benefit
//! - **Small block sizes**: With 4 KiB blocks, prefetch overhead (thread spawning,
//!   cache contention) can exceed the data transfer time
//! - **Memory-constrained systems**: Prefetched blocks evict useful cached data,
//!   potentially reducing overall cache hit rate
//!
//! **Measured overheads** (random reads on local SSD, 64 KiB blocks):
//! - 4-block prefetch: ~15% throughput loss (wasted reads evict useful cache entries)
//! - 8-block prefetch: ~30% throughput loss
//! - Recommendation: Disable prefetching (window_size = 0) for random workloads
//!
//! # Resource Usage
//!
//! ## Memory Overhead
//!
//! The prefetcher itself is extremely lightweight: 16 bytes of atomic state (the
//! window size, an in-flight flag, and a spawn counter, padded to alignment).
//! However, prefetched blocks consume cache capacity:
//! - **Per-block overhead**: Typically block_size + ~100 bytes for cache metadata
//! - **Worst-case prefetch buffer**: `window_size × block_size` bytes
//! - **Example**: 8-block window with 64 KiB blocks = 512 KiB prefetch buffer
//!
//! ## CPU Overhead
//!
//! - **Per-access cost**: O(window_size) to compute prefetch targets (typically < 1 µs)
//! - **Background I/O**: Prefetch requests spawn lightweight threads or async tasks
//!   (implementation-dependent, managed by `File`)
//!
//! ## Bandwidth Overhead
//!
//! - **Sequential access**: Near-zero waste (prefetched blocks are used)
//! - **Random access**: Up to 100% waste (prefetched blocks are never accessed)
//! - **Mixed workloads**: Proportional to sequential vs. random ratio
//!
//! # Configuration Guidelines
//!
//! ## Choosing Window Size
//!
//! | Backend Type       | Latency   | Recommended Window | Rationale                          |
//! |--------------------|-----------|--------------------|------------------------------------|
//! | Local SSD/NVMe     | < 1ms     | 0-2 blocks         | Prefetch overhead exceeds benefit  |
//! | Local HDD          | 5-10ms    | 2-4 blocks         | Moderate latency hiding            |
//! | S3/Cloud Storage   | 20-100ms  | 8-16 blocks        | High latency hiding critical       |
//! | HTTP/Remote        | 50-200ms  | 16-32 blocks       | Maximum latency hiding needed      |
//!
//! ## Tuning Heuristics
//!
//! A reasonable starting formula:
//! ```text
//! window_size = ceil(backend_latency_ms / (block_size_kb / throughput_mbps))
//! ```
//!
//! **Example**: S3 backend with 50ms latency, 64 KiB blocks, 100 MB/s throughput:
//! ```text
//! window_size = ceil(50 / (64 / 100)) = ceil(78.125) = 79 blocks
//! ```
//! (In practice, 16-32 blocks is sufficient due to concurrent I/O.)
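//!
//! A minimal sketch of this heuristic as a helper function (the name
//! `suggest_window_size` is illustrative, not part of this module's API):
//!
//! ```
//! /// Hypothetical helper: estimate a prefetch window from backend characteristics.
//! fn suggest_window_size(latency_ms: f64, block_size_kb: f64, throughput_mbps: f64) -> u32 {
//!     // 1 MB/s transfers 1 KB per ms, so KB / (MB/s) yields milliseconds per block.
//!     let transfer_ms = block_size_kb / throughput_mbps;
//!     (latency_ms / transfer_ms).ceil() as u32
//! }
//!
//! // The worked example above: 50ms latency, 64 KiB blocks, 100 MB/s throughput.
//! assert_eq!(suggest_window_size(50.0, 64.0, 100.0), 79);
//! ```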
//!
//! ## Disabling Prefetch
//!
//! To disable prefetching entirely:
//! - Construct the prefetcher with `Prefetcher::new(0)`; a zero window yields empty target lists
//! - Or pass `prefetch_window_size = None` to `File::with_options()`
//!
//! # Examples
//!
//! ## Basic Usage
//!
//! ```
//! use hexz_core::cache::prefetch::Prefetcher;
//!
//! // Configure 8-block readahead for streaming workloads
//! let prefetcher = Prefetcher::new(8);
//!
//! // Application reads block 100
//! let targets = prefetcher.get_prefetch_targets(100);
//! assert_eq!(targets, vec![101, 102, 103, 104, 105, 106, 107, 108]);
//!
//! // Background task: fetch blocks 101-108 into cache
//! // (actual prefetch execution is handled by File)
//! ```
//!
//! ## Adaptive Tuning
//!
//! ```
//! use hexz_core::cache::prefetch::Prefetcher;
//! use std::sync::Arc;
//!
//! let prefetcher = Arc::new(Prefetcher::new(4));
//!
//! // Detect high-latency backend (e.g., S3) and increase window
//! // (Note: window_size update method would need to be added)
//! // prefetcher.set_window_size(16);
//!
//! // Detect random access pattern and disable prefetching
//! // prefetcher.set_window_size(0);
//! ```
//!
//! ## Integration with File
//!
//! ```no_run
//! # use hexz_core::File;
//! # use hexz_core::store::local::FileBackend;
//! # use hexz_core::algo::compression::lz4::Lz4Compressor;
//! # use std::sync::Arc;
//! # fn main() -> anyhow::Result<()> {
//! let backend = Arc::new(FileBackend::new("snapshot.hxz".as_ref())?);
//! let compressor = Box::new(Lz4Compressor::new());
//!
//! // Enable prefetching at File creation
//! let snapshot = File::with_cache(
//!     backend,
//!     compressor,
//!     None, // encryptor
//!     Some(512 * 1024 * 1024), // cache_capacity_bytes (512 MiB)
//!     Some(8), // prefetch_window_size (8 blocks)
//! )?;
//!
//! // Prefetching happens automatically during sequential reads
//! // (triggered internally by File::read_at_into_uninit)
//! # Ok(())
//! # }
//! ```
//!
//! # Implementation Notes
//!
//! ## Why No Pattern Detection?
//!
//! This implementation deliberately omits sequential pattern detection (unlike Linux
//! kernel readahead or database buffer managers) for several reasons:
//! - **Caller context**: `File` has full visibility into access patterns and
//!   can make better decisions about when to prefetch
//! - **Simplicity**: No history tracking, no stride detection, no state machine overhead
//! - **Predictability**: Fixed behavior is easier to reason about and debug
//! - **Composability**: Higher-level policies (in `File` or applications) can
//!   layer sophisticated heuristics atop this simple primitive, as sketched below
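//!
//! A minimal caller-side sketch of such layering; the `SequentialGate` type is
//! hypothetical, and `File`'s real policy may differ:
//!
//! ```
//! use hexz_core::cache::prefetch::Prefetcher;
//!
//! /// Hypothetical wrapper that only prefetches after a sequential pair of reads.
//! struct SequentialGate {
//!     prefetcher: Prefetcher,
//!     last_block: Option<u64>,
//! }
//!
//! impl SequentialGate {
//!     fn on_read(&mut self, block: u64) -> Vec<u64> {
//!         // Sequential if this block immediately follows the previous one.
//!         let sequential = self.last_block == Some(block.wrapping_sub(1));
//!         self.last_block = Some(block);
//!         if sequential {
//!             self.prefetcher.get_prefetch_targets(block)
//!         } else {
//!             Vec::new() // random-looking access: skip prefetch
//!         }
//!     }
//! }
//!
//! let mut gate = SequentialGate { prefetcher: Prefetcher::new(4), last_block: None };
//! assert!(gate.on_read(10).is_empty());               // first access: pattern unknown
//! assert_eq!(gate.on_read(11), vec![12, 13, 14, 15]); // sequential: prefetch ahead
//! ```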
//!
//! ## Thread Safety
//!
//! The prefetcher is fully thread-safe via atomic operations. Multiple threads can
//! concurrently call `get_prefetch_targets` without contention. If runtime window
//! size adjustment is needed, `AtomicU32` supports lock-free updates via
//! `store(Ordering::Relaxed)`.
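//!
//! Such a setter is not currently exposed; a hypothetical one-liner (matching the
//! note in the Adaptive Tuning example above) would look like this:
//!
//! ```ignore
//! impl Prefetcher {
//!     /// Hypothetical: update the lookahead window at runtime, lock-free.
//!     pub fn set_window_size(&self, size: u32) {
//!         self.window_size.store(size, std::sync::atomic::Ordering::Relaxed);
//!     }
//! }
//! ```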
//!
//! ## Future Extensions
//!
//! Potential enhancements (not currently implemented):
//! - **Adaptive window sizing**: Adjust window based on cache hit rate or latency
//! - **Stride detection**: Support strided access patterns (e.g., reading every Nth block)
//! - **Multi-stream prefetch**: Prefetch from multiple streams (Disk + Memory) simultaneously
//! - **Priority hints**: Mark prefetch requests as low-priority to avoid evicting hot data

use std::sync::atomic::{AtomicBool, AtomicU32, AtomicU64, Ordering};

/// Thread-safe prefetch manager with configurable lookahead window.
///
/// This struct encapsulates the prefetch configuration and computes which blocks
/// should be speculatively loaded following a given access. It maintains minimal
/// state (the window size, an in-flight guard, and a spawn counter) and relies on
/// atomic operations for thread safety.
///
/// # Thread Safety
///
/// `Prefetcher` is fully thread-safe and can be shared across threads via `Arc<Prefetcher>`.
/// The window size is stored as an `AtomicU32`, allowing lock-free reads and updates.
/// Multiple threads can concurrently call `get_prefetch_targets` without contention.
///
/// # Performance
///
/// - **Memory footprint**: 16 bytes (one `AtomicU32`, one `AtomicBool`, and one
///   `AtomicU64`, padded to alignment)
/// - **Computation cost**: O(window_size) per call to `get_prefetch_targets`
/// - **Contention**: None (lock-free atomic operations)
///
/// # Examples
///
/// ```
/// use hexz_core::cache::prefetch::Prefetcher;
///
/// let prefetcher = Prefetcher::new(4);
/// let targets = prefetcher.get_prefetch_targets(10);
/// assert_eq!(targets, vec![11, 12, 13, 14]);
/// ```
#[derive(Debug)]
pub struct Prefetcher {
    /// The number of blocks to fetch ahead of the current request.
    ///
    /// Stored as `AtomicU32` to enable lock-free runtime updates. A value of 0
    /// disables prefetching (returns empty target vectors).
    window_size: AtomicU32,

    /// Guards against concurrent prefetch threads. Only one prefetch operation
    /// can be in flight at a time to prevent unbounded thread spawning.
    in_flight: AtomicBool,

    /// Counts the total number of prefetch operations that were successfully started
    /// via [`try_start`](Self::try_start). Used for testing and diagnostics.
    spawn_count: AtomicU64,
}

impl Prefetcher {
    /// Constructs a new prefetcher with a fixed lookahead window.
    ///
    /// The window size determines how many blocks ahead of the current access should
    /// be prefetched. A value of 0 disables prefetching entirely, which is appropriate
    /// for random access workloads or low-latency storage backends.
    ///
    /// # Arguments
    ///
    /// * `window_size` - Number of blocks to prefetch ahead of the current access.
    ///   Typical values range from 4 (local storage) to 32 (high-latency cloud backends).
    ///   Must fit in `u32` (practical limit is much lower, usually < 256).
    ///
    /// # Returns
    ///
    /// Returns a new `Prefetcher` instance configured with the specified window size.
    ///
    /// # Performance
    ///
    /// - **Time complexity**: O(1)
    /// - **Memory allocation**: none; the 16-byte struct is constructed in place
    ///
    /// # Examples
    ///
    /// ```
    /// use hexz_core::cache::prefetch::Prefetcher;
    ///
    /// // Aggressive prefetching for high-latency S3 backend
    /// let s3_prefetcher = Prefetcher::new(16);
    ///
    /// // Conservative prefetching for local SSD
    /// let ssd_prefetcher = Prefetcher::new(2);
    ///
    /// // Disable prefetching for random access workload
    /// let no_prefetch = Prefetcher::new(0);
    /// assert!(no_prefetch.get_prefetch_targets(100).is_empty());
    /// ```
    pub fn new(window_size: u32) -> Self {
        Self {
            window_size: AtomicU32::new(window_size),
            in_flight: AtomicBool::new(false),
            spawn_count: AtomicU64::new(0),
        }
    }

    /// Attempts to acquire the in-flight guard. Returns `true` if this caller
    /// won the race and should spawn a prefetch thread. The caller **must** call
    /// [`clear_in_flight`](Self::clear_in_flight) when the prefetch completes.
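    ///
    /// # Examples
    ///
    /// A sketch of the intended guard pattern; the commented line stands in for
    /// the background I/O a real caller would schedule:
    ///
    /// ```
    /// use hexz_core::cache::prefetch::Prefetcher;
    ///
    /// let prefetcher = Prefetcher::new(8);
    /// if prefetcher.try_start() {
    ///     // ... spawn background I/O for prefetcher.get_prefetch_targets(...) ...
    ///     prefetcher.clear_in_flight(); // release the guard when done
    /// }
    /// assert_eq!(prefetcher.spawn_count(), 1);
    /// ```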
    pub fn try_start(&self) -> bool {
        let acquired = self
            .in_flight
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Relaxed)
            .is_ok();
        if acquired {
            self.spawn_count.fetch_add(1, Ordering::Relaxed);
        }
        acquired
    }

    /// Returns the total number of prefetch operations started since construction.
    pub fn spawn_count(&self) -> u64 {
        self.spawn_count.load(Ordering::Relaxed)
    }

    /// Clears the in-flight flag, allowing the next read to spawn a prefetch.
    pub fn clear_in_flight(&self) {
        self.in_flight.store(false, Ordering::Release);
    }

    /// Computes the block indices that should be speculatively prefetched.
    ///
    /// Given the index of a block currently being accessed, this method returns a
    /// contiguous sequence of block indices that immediately follow it. The caller
    /// (typically `File`) is responsible for scheduling the actual I/O operations
    /// to load these blocks into the cache.
    ///
    /// The returned vector contains exactly `window_size` consecutive block indices,
    /// starting from `current_block + 1` and ending at `current_block + window_size`.
    /// If prefetching is disabled (`window_size == 0`), returns an empty vector.
    ///
    /// # Arguments
    ///
    /// * `current_block` - Zero-based index of the block being actively read. This
    ///   value is used as the anchor point for computing the prefetch range.
    ///
    /// # Returns
    ///
    /// Returns `Vec<u64>` containing the block indices to prefetch:
    /// - **Non-empty**: `[current_block + 1, current_block + 2, ..., current_block + window_size]`
    /// - **Empty**: If `window_size == 0` (prefetching disabled)
    ///
    /// # Performance
    ///
    /// - **Time complexity**: O(window_size) to allocate and populate the vector
    /// - **Space complexity**: O(window_size) for the returned vector
    /// - **Typical cost**: < 1 µs for window_size ≤ 32
    ///
    /// # Integer Overflow
    ///
    /// This method performs saturating addition to avoid panics if `current_block`
    /// is near `u64::MAX`; targets that would exceed `u64::MAX` clamp to it, so
    /// duplicate indices can appear at that extreme boundary. In practice, block
    /// indices are bounded by the snapshot's logical size, which is far below `u64::MAX`.
    ///
    /// # Examples
    ///
    /// ## Standard Prefetch
    ///
    /// ```
    /// use hexz_core::cache::prefetch::Prefetcher;
    ///
    /// let prefetcher = Prefetcher::new(5);
    /// let targets = prefetcher.get_prefetch_targets(42);
    /// assert_eq!(targets, vec![43, 44, 45, 46, 47]);
    /// ```
    ///
    /// ## Disabled Prefetch
    ///
    /// ```
    /// use hexz_core::cache::prefetch::Prefetcher;
    ///
    /// let prefetcher = Prefetcher::new(0);
    /// let targets = prefetcher.get_prefetch_targets(100);
    /// assert!(targets.is_empty());
    /// ```
    ///
    /// ## Boundary Case
    ///
    /// ```
    /// use hexz_core::cache::prefetch::Prefetcher;
    ///
    /// let prefetcher = Prefetcher::new(1);
    /// let targets = prefetcher.get_prefetch_targets(999);
    /// assert_eq!(targets, vec![1000]);
    /// ```
    ///
    /// # Integration Notes
    ///
    /// The caller must handle:
    /// - **Bounds checking**: Ensure prefetch targets do not exceed the stream's
    ///   logical size, i.e. the number of blocks in the snapshot (see the sketch below)
    /// - **Cache lookup**: Skip prefetching blocks already present in the cache
    /// - **I/O scheduling**: Issue async or background reads for the target blocks
    /// - **Error handling**: Prefetch failures should not impact the foreground read
    ///
    /// See `File::read_at_into_uninit` for a reference implementation.
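    ///
    /// A sketch of the bounds-checking step; `total_blocks` is an illustrative
    /// value, not something this API provides:
    ///
    /// ```
    /// use hexz_core::cache::prefetch::Prefetcher;
    ///
    /// let prefetcher = Prefetcher::new(8);
    /// let total_blocks = 104; // e.g., logical size divided by block size
    ///
    /// let targets: Vec<u64> = prefetcher
    ///     .get_prefetch_targets(100)
    ///     .into_iter()
    ///     .filter(|&b| b < total_blocks) // drop targets past the end of the stream
    ///     .collect();
    /// assert_eq!(targets, vec![101, 102, 103]);
    /// ```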
    pub fn get_prefetch_targets(&self, current_block: u64) -> Vec<u64> {
        let size = self.window_size.load(Ordering::Relaxed);
        if size == 0 {
            return Vec::new();
        }

        // Saturating addition upholds the "Integer Overflow" contract above:
        // targets clamp at u64::MAX instead of panicking in debug builds.
        (1..=u64::from(size))
            .map(|offset| current_block.saturating_add(offset))
            .collect()
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::sync::Arc;
    use std::thread;

    #[test]
    fn test_new_with_zero_window() {
        let prefetcher = Prefetcher::new(0);
        let targets = prefetcher.get_prefetch_targets(100);
        assert!(targets.is_empty());
    }

    #[test]
    fn test_new_with_small_window() {
        let prefetcher = Prefetcher::new(4);
        let targets = prefetcher.get_prefetch_targets(10);
        assert_eq!(targets, vec![11, 12, 13, 14]);
    }

    #[test]
    fn test_new_with_large_window() {
        let prefetcher = Prefetcher::new(16);
        let targets = prefetcher.get_prefetch_targets(100);
        assert_eq!(targets.len(), 16);
        assert_eq!(targets[0], 101);
        assert_eq!(targets[15], 116);
    }

    #[test]
    fn test_get_prefetch_targets_sequential() {
        let prefetcher = Prefetcher::new(5);

        let targets1 = prefetcher.get_prefetch_targets(42);
        assert_eq!(targets1, vec![43, 44, 45, 46, 47]);

        let targets2 = prefetcher.get_prefetch_targets(43);
        assert_eq!(targets2, vec![44, 45, 46, 47, 48]);
    }

    #[test]
    fn test_get_prefetch_targets_single_block() {
        let prefetcher = Prefetcher::new(1);
        let targets = prefetcher.get_prefetch_targets(999);
        assert_eq!(targets, vec![1000]);
    }

    #[test]
    fn test_get_prefetch_targets_from_zero() {
        let prefetcher = Prefetcher::new(3);
        let targets = prefetcher.get_prefetch_targets(0);
        assert_eq!(targets, vec![1, 2, 3]);
    }

    #[test]
    fn test_get_prefetch_targets_large_current_block() {
        let prefetcher = Prefetcher::new(4);
        let targets = prefetcher.get_prefetch_targets(u64::MAX - 10);

        // Should handle a near-maximal index gracefully (no overflow panic)
        assert_eq!(targets.len(), 4);
        assert_eq!(targets[0], u64::MAX - 9);
    }
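
    // New coverage for the saturating-add behavior documented on
    // get_prefetch_targets: at u64::MAX every offset clamps, so all
    // targets collapse to u64::MAX.
    #[test]
    fn test_get_prefetch_targets_saturates_at_max() {
        let prefetcher = Prefetcher::new(3);
        let targets = prefetcher.get_prefetch_targets(u64::MAX);
        assert_eq!(targets, vec![u64::MAX, u64::MAX, u64::MAX]);
    }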

    #[test]
    fn test_get_prefetch_targets_consistency() {
        let prefetcher = Prefetcher::new(8);

        // Calling multiple times with same input should give same result
        let targets1 = prefetcher.get_prefetch_targets(50);
        let targets2 = prefetcher.get_prefetch_targets(50);

        assert_eq!(targets1, targets2);
    }

    #[test]
    fn test_window_size_32() {
        let prefetcher = Prefetcher::new(32);
        let targets = prefetcher.get_prefetch_targets(1000);

        assert_eq!(targets.len(), 32);
        assert_eq!(targets[0], 1001);
        assert_eq!(targets[31], 1032);
    }

    #[test]
    fn test_debug_format() {
        let prefetcher = Prefetcher::new(8);
        let debug_str = format!("{:?}", prefetcher);

        assert!(debug_str.contains("Prefetcher"));
        assert!(debug_str.contains("window_size"));
    }

    #[test]
    fn test_thread_safety() {
        let prefetcher = Arc::new(Prefetcher::new(4));
        let mut handles = Vec::new();

        // Spawn multiple threads that call get_prefetch_targets concurrently
        for i in 0..4 {
            let prefetcher_clone = Arc::clone(&prefetcher);
            let handle = thread::spawn(move || {
                let block_idx = i * 100;
                let targets = prefetcher_clone.get_prefetch_targets(block_idx);

                assert_eq!(targets.len(), 4);
                assert_eq!(targets[0], block_idx + 1);
                assert_eq!(targets[3], block_idx + 4);
            });
            handles.push(handle);
        }

        // Wait for all threads
        for handle in handles {
            handle.join().unwrap();
        }
    }
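
    // The in-flight guard introduced by try_start/clear_in_flight has no
    // coverage above; a minimal test of its contract:
    #[test]
    fn test_try_start_guard() {
        let prefetcher = Prefetcher::new(4);

        // First caller wins; subsequent callers are rejected until cleared.
        assert!(prefetcher.try_start());
        assert!(!prefetcher.try_start());
        assert_eq!(prefetcher.spawn_count(), 1);

        prefetcher.clear_in_flight();
        assert!(prefetcher.try_start());
        assert_eq!(prefetcher.spawn_count(), 2);
    }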

    #[test]
    fn test_multiple_prefetchers() {
        let pref1 = Prefetcher::new(2);
        let pref2 = Prefetcher::new(8);

        let targets1 = pref1.get_prefetch_targets(10);
        let targets2 = pref2.get_prefetch_targets(10);

        assert_eq!(targets1.len(), 2);
        assert_eq!(targets2.len(), 8);
    }

    #[test]
    fn test_prefetch_targets_contiguous() {
        let prefetcher = Prefetcher::new(10);
        let targets = prefetcher.get_prefetch_targets(100);

        // Verify targets are contiguous
        for i in 0..targets.len() - 1 {
            assert_eq!(targets[i] + 1, targets[i + 1]);
        }
    }

    #[test]
    fn test_very_large_window() {
        let prefetcher = Prefetcher::new(256);
        let targets = prefetcher.get_prefetch_targets(0);

        assert_eq!(targets.len(), 256);
        assert_eq!(targets[0], 1);
        assert_eq!(targets[255], 256);
    }

    #[test]
    #[ignore = "materializing u32::MAX targets allocates ~32 GiB"]
    fn test_edge_case_max_u32_window() {
        // Window size can be set to u32::MAX, but collecting the target vector
        // would allocate u32::MAX * 8 bytes (~32 GiB), so this test is ignored
        // by default; when run explicitly it only verifies absence of panics.
        let prefetcher = Prefetcher::new(u32::MAX);

        let _ = prefetcher.get_prefetch_targets(0);
    }

    #[test]
    fn test_window_size_stored_atomically() {
        let prefetcher = Arc::new(Prefetcher::new(4));

        // Access from multiple threads to verify atomic storage works
        let p1 = Arc::clone(&prefetcher);
        let p2 = Arc::clone(&prefetcher);

        let h1 = thread::spawn(move || p1.get_prefetch_targets(0));

        let h2 = thread::spawn(move || p2.get_prefetch_targets(100));

        let t1 = h1.join().unwrap();
        let t2 = h2.join().unwrap();

        assert_eq!(t1.len(), 4);
        assert_eq!(t2.len(), 4);
    }

    #[test]
    fn test_different_starting_blocks() {
        let prefetcher = Prefetcher::new(3);

        // Test various starting points
        for start in [0, 10, 100, 1000, 10000] {
            let targets = prefetcher.get_prefetch_targets(start);
            assert_eq!(targets, vec![start + 1, start + 2, start + 3]);
        }
    }
}