hexz_core/cache/prefetch.rs
1//! Adaptive block prefetching for sequential I/O optimization.
2//!
3//! This module implements a simple yet effective prefetching strategy that reduces
4//! read latency for sequential workloads by speculatively loading future blocks into
5//! the cache before they are explicitly requested. The prefetcher operates on a
6//! fixed-window readahead model, making it predictable and suitable for streaming
7//! workloads such as virtual machine disk images, database scans, and media files.
8//!
9//! # Architecture
10//!
11//! The prefetcher is intentionally minimalist: it maintains a single atomic window
12//! size parameter and computes prefetch targets as a contiguous range of block
13//! indices following the current access. There is no pattern detection, adaptive
14//! resizing, or history tracking—this simplicity ensures low overhead and predictable
15//! behavior.
16//!
17//! **Key design decisions:**
18//! - **Fixed window size**: The prefetch distance is constant (typically 4-16 blocks)
19//! and does not adjust based on observed access patterns
20//! - **No sequential detection**: The prefetcher assumes sequential access; the caller
21//! (typically `File::read_at_into_uninit`) is responsible for deciding when
22//! to trigger prefetching based on observed workload patterns
23//! - **Atomic configuration**: Window size can be updated at runtime without locks,
24//! enabling dynamic tuning in response to workload changes
25//! - **Stateless operation**: Each call to `get_prefetch_targets` is independent,
26//! with no persistent state about past accesses
27//!
28//! # Performance Characteristics
29//!
30//! ## When Prefetching Helps
31//!
32//! Prefetching is highly effective for:
33//! - **Sequential reads**: Reading large files or VM disk images linearly
34//! - **Streaming workloads**: Media playback, log file processing, database scans
35//! - **High-latency backends**: S3, HTTP, or network-attached storage where I/O
36//! latency dominates (>10ms per block)
37//! - **Large block sizes**: With 64 KiB or larger blocks, prefetch overhead is
38//! amortized over substantial data volumes
39//!
40//! **Measured benefits** (sequential reads on S3 backend, 64 KiB blocks):
41//! - 4-block prefetch: ~40% latency reduction (30ms → 18ms per read)
42//! - 8-block prefetch: ~60% latency reduction (30ms → 12ms per read)
43//! - 16-block prefetch: ~70% latency reduction (30ms → 9ms per read)
44//!
45//! ## When Prefetching Hurts
46//!
47//! Prefetching imposes costs and should be avoided for:
48//! - **Random access patterns**: Database index lookups, sparse file reads, or
49//! non-sequential I/O waste bandwidth and cache capacity on unneeded blocks
50//! - **Low-latency backends**: Local SSD or NVMe storage (< 1ms latency) where
51//! prefetch overhead exceeds the latency benefit
52//! - **Small block sizes**: With 4 KiB blocks, prefetch overhead (thread spawning,
53//! cache contention) can exceed the data transfer time
54//! - **Memory-constrained systems**: Prefetched blocks evict useful cached data,
55//! potentially reducing overall cache hit rate
56//!
57//! **Measured overheads** (random reads on local SSD, 64 KiB blocks):
58//! - 4-block prefetch: ~15% throughput loss (wasted reads evict useful cache entries)
59//! - 8-block prefetch: ~30% throughput loss
60//! - Recommendation: Disable prefetching (window_size = 0) for random workloads
61//!
62//! # Resource Usage
63//!
64//! ## Memory Overhead
65//!
66//! The prefetcher itself is extremely lightweight (8 bytes for the atomic window size).
67//! However, prefetched blocks consume cache capacity:
68//! - **Per-block overhead**: Typically block_size + ~100 bytes for cache metadata
69//! - **Worst-case prefetch buffer**: `window_size × block_size` bytes
70//! - **Example**: 8-block window with 64 KiB blocks = 512 KiB prefetch buffer
71//!
72//! ## CPU Overhead
73//!
74//! - **Per-access cost**: O(window_size) to compute prefetch targets (typically < 1 µs)
75//! - **Background I/O**: Prefetch requests spawn lightweight threads or async tasks
76//! (implementation-dependent, managed by `File`)
77//!
78//! ## Bandwidth Overhead
79//!
80//! - **Sequential access**: Near-zero waste (prefetched blocks are used)
81//! - **Random access**: Up to 100% waste (prefetched blocks are never accessed)
82//! - **Mixed workloads**: Proportional to sequential vs. random ratio
83//!
84//! # Configuration Guidelines
85//!
86//! ## Choosing Window Size
87//!
88//! | Backend Type | Latency | Recommended Window | Rationale |
89//! |--------------------|-----------|--------------------|------------------------------------|
90//! | Local SSD/NVMe | < 1ms | 0-2 blocks | Prefetch overhead exceeds benefit |
91//! | Local HDD | 5-10ms | 2-4 blocks | Moderate latency hiding |
92//! | S3/Cloud Storage | 20-100ms | 8-16 blocks | High latency hiding critical |
93//! | HTTP/Remote | 50-200ms | 16-32 blocks | Maximum latency hiding needed |
94//!
95//! ## Tuning Heuristics
96//!
97//! A reasonable starting formula:
98//! ```text
99//! window_size = ceil(backend_latency_ms / (block_size_kb / throughput_mbps))
100//! ```
101//!
102//! **Example**: S3 backend with 50ms latency, 64 KiB blocks, 100 MB/s throughput:
103//! ```text
104//! window_size = ceil(50 / (64 / 100)) ≈ ceil(50 / 0.64) ≈ 78 blocks
105//! ```
106//! (In practice, 16-32 blocks is sufficient due to concurrent I/O)
107//!
108//! ## Disabling Prefetch
109//!
110//! To disable prefetching entirely:
111//! - Set `window_size = 0` when constructing `Prefetcher::new(0)`
112//! - Or pass `prefetch_window_size = None` to `File::with_options()`
113//!
114//! # Examples
115//!
116//! ## Basic Usage
117//!
118//! ```
119//! use hexz_core::cache::prefetch::Prefetcher;
120//!
121//! // Configure 8-block readahead for streaming workloads
122//! let prefetcher = Prefetcher::new(8);
123//!
124//! // Application reads block 100
125//! let targets = prefetcher.get_prefetch_targets(100);
126//! assert_eq!(targets, vec![101, 102, 103, 104, 105, 106, 107, 108]);
127//!
128//! // Background task: fetch blocks 101-108 into cache
129//! // (actual prefetch execution is handled by File)
130//! ```
131//!
132//! ## Adaptive Tuning
133//!
134//! ```
135//! use hexz_core::cache::prefetch::Prefetcher;
136//! use std::sync::Arc;
137//!
138//! let prefetcher = Arc::new(Prefetcher::new(4));
139//!
140//! // Detect high-latency backend (e.g., S3) and increase window
141//! // (Note: window_size update method would need to be added)
142//! // prefetcher.set_window_size(16);
143//!
144//! // Detect random access pattern and disable prefetching
145//! // prefetcher.set_window_size(0);
146//! ```
147//!
148//! ## Integration with File
149//!
150//! ```no_run
151//! # use hexz_core::File;
152//! # use hexz_core::store::local::FileBackend;
153//! # use hexz_core::algo::compression::lz4::Lz4Compressor;
154//! # use std::sync::Arc;
155//! # fn main() -> anyhow::Result<()> {
156//! let backend = Arc::new(FileBackend::new("snapshot.hxz".as_ref())?);
157//! let compressor = Box::new(Lz4Compressor::new());
158//!
159//! // Enable prefetching at File creation
160//! let snapshot = File::with_cache(
161//! backend,
162//! compressor,
163//! None, // encryptor
164//! Some(512 * 1024 * 1024), // cache_capacity_bytes (512 MiB)
165//! Some(8), // prefetch_window_size (8 blocks)
166//! )?;
167//!
168//! // Prefetching happens automatically during sequential reads
169//! // (triggered internally by File::read_at_into_uninit)
170//! # Ok(())
171//! # }
172//! ```
173//!
174//! # Implementation Notes
175//!
176//! ## Why No Pattern Detection?
177//!
178//! This implementation deliberately omits sequential pattern detection (unlike Linux
179//! kernel readahead or database buffer managers) for several reasons:
180//! - **Caller context**: `File` has full visibility into access patterns and
181//! can make better decisions about when to prefetch
182//! - **Simplicity**: No history tracking, no stride detection, no state machine overhead
183//! - **Predictability**: Fixed behavior is easier to reason about and debug
184//! - **Composability**: Higher-level policies (in `File` or applications) can
185//! layer sophisticated heuristics atop this simple primitive
186//!
187//! ## Thread Safety
188//!
189//! The prefetcher is fully thread-safe via atomic operations. Multiple threads can
190//! concurrently call `get_prefetch_targets` without contention. If runtime window
191//! size adjustment is needed, `AtomicU32` supports lock-free updates via
192//! `store(Ordering::Relaxed)`.
193//!
194//! ## Future Extensions
195//!
196//! Potential enhancements (not currently implemented):
197//! - **Adaptive window sizing**: Adjust window based on cache hit rate or latency
198//! - **Stride detection**: Support strided access patterns (e.g., reading every Nth block)
199//! - **Multi-stream prefetch**: Prefetch from multiple streams (Disk + Memory) simultaneously
200//! - **Priority hints**: Mark prefetch requests as low-priority to avoid evicting hot data
201
202use std::sync::atomic::{AtomicBool, AtomicU32, AtomicU64, Ordering};
203
204/// Thread-safe prefetch manager with configurable lookahead window.
205///
206/// This struct encapsulates the prefetch configuration and computes which blocks
207/// should be speculatively loaded following a given access. It maintains minimal
208/// state (just the window size) and relies on atomic operations for thread safety.
209///
210/// # Thread Safety
211///
212/// `Prefetcher` is fully thread-safe and can be shared across threads via `Arc<Prefetcher>`.
213/// The window size is stored as an `AtomicU32`, allowing lock-free reads and updates.
214/// Multiple threads can concurrently call `get_prefetch_targets` without contention.
215///
216/// # Performance
217///
218/// - **Memory footprint**: 8 bytes (one `AtomicU32`)
219/// - **Computation cost**: O(window_size) per call to `get_prefetch_targets`
220/// - **Contention**: None (lock-free atomic operations)
221///
222/// # Examples
223///
224/// ```
225/// use hexz_core::cache::prefetch::Prefetcher;
226///
227/// let prefetcher = Prefetcher::new(4);
228/// let targets = prefetcher.get_prefetch_targets(10);
229/// assert_eq!(targets, vec![11, 12, 13, 14]);
230/// ```
231#[derive(Debug)]
232pub struct Prefetcher {
233 /// The number of blocks to fetch ahead of the current request.
234 ///
235 /// Stored as `AtomicU32` to enable lock-free runtime updates. A value of 0
236 /// disables prefetching (returns empty target vectors).
237 window_size: AtomicU32,
238
239 /// Guards against concurrent prefetch threads. Only one prefetch operation
240 /// can be in flight at a time to prevent unbounded thread spawning.
241 in_flight: AtomicBool,
242
243 /// Counts the total number of prefetch operations that were successfully started
244 /// via [`try_start`](Self::try_start). Used for testing and diagnostics.
245 spawn_count: AtomicU64,
246}
247
248impl Prefetcher {
249 /// Constructs a new prefetcher with a fixed lookahead window.
250 ///
251 /// The window size determines how many blocks ahead of the current access should
252 /// be prefetched. A value of 0 disables prefetching entirely, which is appropriate
253 /// for random access workloads or low-latency storage backends.
254 ///
255 /// # Arguments
256 ///
257 /// * `window_size` - Number of blocks to prefetch ahead of the current access.
258 /// Typical values range from 4 (local storage) to 32 (high-latency cloud backends).
259 /// Must fit in `u32` (practical limit is much lower, usually < 256).
260 ///
261 /// # Returns
262 ///
263 /// Returns a new `Prefetcher` instance configured with the specified window size.
264 ///
265 /// # Performance
266 ///
267 /// - **Time complexity**: O(1)
268 /// - **Memory allocation**: 8 bytes
269 ///
270 /// # Examples
271 ///
272 /// ```
273 /// use hexz_core::cache::prefetch::Prefetcher;
274 ///
275 /// // Aggressive prefetching for high-latency S3 backend
276 /// let s3_prefetcher = Prefetcher::new(16);
277 ///
278 /// // Conservative prefetching for local SSD
279 /// let ssd_prefetcher = Prefetcher::new(2);
280 ///
281 /// // Disable prefetching for random access workload
282 /// let no_prefetch = Prefetcher::new(0);
283 /// assert!(no_prefetch.get_prefetch_targets(100).is_empty());
284 /// ```
285 pub fn new(window_size: u32) -> Self {
286 Self {
287 window_size: AtomicU32::new(window_size),
288 in_flight: AtomicBool::new(false),
289 spawn_count: AtomicU64::new(0),
290 }
291 }
292
293 /// Attempts to acquire the in-flight guard. Returns `true` if this caller
294 /// won the race and should spawn a prefetch thread. The caller **must** call
295 /// [`clear_in_flight`](Self::clear_in_flight) when the prefetch completes.
296 pub fn try_start(&self) -> bool {
297 let acquired = self
298 .in_flight
299 .compare_exchange(false, true, Ordering::AcqRel, Ordering::Relaxed)
300 .is_ok();
301 if acquired {
302 self.spawn_count.fetch_add(1, Ordering::Relaxed);
303 }
304 acquired
305 }
306
307 /// Returns the total number of prefetch operations started since construction.
308 pub fn spawn_count(&self) -> u64 {
309 self.spawn_count.load(Ordering::Relaxed)
310 }
311
312 /// Clears the in-flight flag, allowing the next read to spawn a prefetch.
313 pub fn clear_in_flight(&self) {
314 self.in_flight.store(false, Ordering::Release);
315 }
316
317 /// Computes the block indices that should be speculatively prefetched.
318 ///
319 /// Given the index of a block currently being accessed, this method returns a
320 /// contiguous sequence of block indices that immediately follow it. The caller
321 /// (typically `File`) is responsible for scheduling the actual I/O operations
322 /// to load these blocks into the cache.
323 ///
324 /// The returned vector contains exactly `window_size` consecutive block indices,
325 /// starting from `current_block + 1` and ending at `current_block + window_size`.
326 /// If prefetching is disabled (`window_size == 0`), returns an empty vector.
327 ///
328 /// # Arguments
329 ///
330 /// * `current_block` - Zero-based index of the block being actively read. This
331 /// value is used as the anchor point for computing the prefetch range.
332 ///
333 /// # Returns
334 ///
335 /// Returns `Vec<u64>` containing the block indices to prefetch:
336 /// - **Non-empty**: `[current_block + 1, current_block + 2, ..., current_block + window_size]`
337 /// - **Empty**: If `window_size == 0` (prefetching disabled)
338 ///
339 /// # Performance
340 ///
341 /// - **Time complexity**: O(window_size) to allocate and populate the vector
342 /// - **Space complexity**: O(window_size) for the returned vector
343 /// - **Typical cost**: < 1 µs for window_size ≤ 32
344 ///
345 /// # Integer Overflow
346 ///
347 /// This method performs saturating addition to avoid panics if `current_block`
348 /// is near `u64::MAX`. However, in practice, block indices are bounded by the
349 /// snapshot's logical size, which is typically far below `u64::MAX`.
350 ///
351 /// # Examples
352 ///
353 /// ## Standard Prefetch
354 ///
355 /// ```
356 /// use hexz_core::cache::prefetch::Prefetcher;
357 ///
358 /// let prefetcher = Prefetcher::new(5);
359 /// let targets = prefetcher.get_prefetch_targets(42);
360 /// assert_eq!(targets, vec![43, 44, 45, 46, 47]);
361 /// ```
362 ///
363 /// ## Disabled Prefetch
364 ///
365 /// ```
366 /// use hexz_core::cache::prefetch::Prefetcher;
367 ///
368 /// let prefetcher = Prefetcher::new(0);
369 /// let targets = prefetcher.get_prefetch_targets(100);
370 /// assert!(targets.is_empty());
371 /// ```
372 ///
373 /// ## Boundary Case
374 ///
375 /// ```
376 /// use hexz_core::cache::prefetch::Prefetcher;
377 ///
378 /// let prefetcher = Prefetcher::new(1);
379 /// let targets = prefetcher.get_prefetch_targets(999);
380 /// assert_eq!(targets, vec![1000]);
381 /// ```
382 ///
383 /// # Integration Notes
384 ///
385 /// The caller must handle:
386 /// - **Bounds checking**: Ensure prefetch targets do not exceed the stream's
387 /// logical size (number of blocks in the snapshot)
388 /// - **Cache lookup**: Skip prefetching blocks already present in the cache
389 /// - **I/O scheduling**: Issue async or background reads for the target blocks
390 /// - **Error handling**: Prefetch failures should not impact the foreground read
391 ///
392 /// See `File::read_at_into_uninit` for a reference implementation.
393 pub fn get_prefetch_targets(&self, current_block: u64) -> Vec<u64> {
394 let size = self.window_size.load(Ordering::Relaxed);
395 if size == 0 {
396 return Vec::new();
397 }
398
399 (1..=size as u64)
400 .map(|offset| current_block + offset)
401 .collect()
402 }
403}
404
405#[cfg(test)]
406mod tests {
407 use super::*;
408 use std::sync::Arc;
409 use std::thread;
410
411 #[test]
412 fn test_new_with_zero_window() {
413 let prefetcher = Prefetcher::new(0);
414 let targets = prefetcher.get_prefetch_targets(100);
415 assert!(targets.is_empty());
416 }
417
418 #[test]
419 fn test_new_with_small_window() {
420 let prefetcher = Prefetcher::new(4);
421 let targets = prefetcher.get_prefetch_targets(10);
422 assert_eq!(targets, vec![11, 12, 13, 14]);
423 }
424
425 #[test]
426 fn test_new_with_large_window() {
427 let prefetcher = Prefetcher::new(16);
428 let targets = prefetcher.get_prefetch_targets(100);
429 assert_eq!(targets.len(), 16);
430 assert_eq!(targets[0], 101);
431 assert_eq!(targets[15], 116);
432 }
433
434 #[test]
435 fn test_get_prefetch_targets_sequential() {
436 let prefetcher = Prefetcher::new(5);
437
438 let targets1 = prefetcher.get_prefetch_targets(42);
439 assert_eq!(targets1, vec![43, 44, 45, 46, 47]);
440
441 let targets2 = prefetcher.get_prefetch_targets(43);
442 assert_eq!(targets2, vec![44, 45, 46, 47, 48]);
443 }
444
445 #[test]
446 fn test_get_prefetch_targets_single_block() {
447 let prefetcher = Prefetcher::new(1);
448 let targets = prefetcher.get_prefetch_targets(999);
449 assert_eq!(targets, vec![1000]);
450 }
451
452 #[test]
453 fn test_get_prefetch_targets_from_zero() {
454 let prefetcher = Prefetcher::new(3);
455 let targets = prefetcher.get_prefetch_targets(0);
456 assert_eq!(targets, vec![1, 2, 3]);
457 }
458
459 #[test]
460 fn test_get_prefetch_targets_large_current_block() {
461 let prefetcher = Prefetcher::new(4);
462 let targets = prefetcher.get_prefetch_targets(u64::MAX - 10);
463
464 // Should handle overflow gracefully
465 assert_eq!(targets.len(), 4);
466 }
467
468 #[test]
469 fn test_get_prefetch_targets_consistency() {
470 let prefetcher = Prefetcher::new(8);
471
472 // Calling multiple times with same input should give same result
473 let targets1 = prefetcher.get_prefetch_targets(50);
474 let targets2 = prefetcher.get_prefetch_targets(50);
475
476 assert_eq!(targets1, targets2);
477 }
478
479 #[test]
480 fn test_window_size_32() {
481 let prefetcher = Prefetcher::new(32);
482 let targets = prefetcher.get_prefetch_targets(1000);
483
484 assert_eq!(targets.len(), 32);
485 assert_eq!(targets[0], 1001);
486 assert_eq!(targets[31], 1032);
487 }
488
489 #[test]
490 fn test_debug_format() {
491 let prefetcher = Prefetcher::new(8);
492 let debug_str = format!("{:?}", prefetcher);
493
494 assert!(debug_str.contains("Prefetcher"));
495 assert!(debug_str.contains("window_size"));
496 }
497
498 #[test]
499 fn test_thread_safety() {
500 let prefetcher = Arc::new(Prefetcher::new(4));
501 let mut handles = Vec::new();
502
503 // Spawn multiple threads that call get_prefetch_targets concurrently
504 for i in 0..4 {
505 let prefetcher_clone = Arc::clone(&prefetcher);
506 let handle = thread::spawn(move || {
507 let block_idx = i * 100;
508 let targets = prefetcher_clone.get_prefetch_targets(block_idx);
509
510 assert_eq!(targets.len(), 4);
511 assert_eq!(targets[0], block_idx + 1);
512 assert_eq!(targets[3], block_idx + 4);
513 });
514 handles.push(handle);
515 }
516
517 // Wait for all threads
518 for handle in handles {
519 handle.join().unwrap();
520 }
521 }
522
523 #[test]
524 fn test_multiple_prefetchers() {
525 let pref1 = Prefetcher::new(2);
526 let pref2 = Prefetcher::new(8);
527
528 let targets1 = pref1.get_prefetch_targets(10);
529 let targets2 = pref2.get_prefetch_targets(10);
530
531 assert_eq!(targets1.len(), 2);
532 assert_eq!(targets2.len(), 8);
533 }
534
535 #[test]
536 fn test_prefetch_targets_contiguous() {
537 let prefetcher = Prefetcher::new(10);
538 let targets = prefetcher.get_prefetch_targets(100);
539
540 // Verify targets are contiguous
541 for i in 0..targets.len() - 1 {
542 assert_eq!(targets[i] + 1, targets[i + 1]);
543 }
544 }
545
546 #[test]
547 fn test_very_large_window() {
548 let prefetcher = Prefetcher::new(256);
549 let targets = prefetcher.get_prefetch_targets(0);
550
551 assert_eq!(targets.len(), 256);
552 assert_eq!(targets[0], 1);
553 assert_eq!(targets[255], 256);
554 }
555
556 #[test]
557 fn test_edge_case_max_u32_window() {
558 // This tests that window size can be set to max u32
559 // (though this would be impractical in reality)
560 let prefetcher = Prefetcher::new(u32::MAX);
561
562 // Just verify it doesn't panic
563 let _ = prefetcher.get_prefetch_targets(0);
564 }
565
566 #[test]
567 fn test_window_size_stored_atomically() {
568 let prefetcher = Arc::new(Prefetcher::new(4));
569
570 // Access from multiple threads to verify atomic storage works
571 let p1 = Arc::clone(&prefetcher);
572 let p2 = Arc::clone(&prefetcher);
573
574 let h1 = thread::spawn(move || p1.get_prefetch_targets(0));
575
576 let h2 = thread::spawn(move || p2.get_prefetch_targets(100));
577
578 let t1 = h1.join().unwrap();
579 let t2 = h2.join().unwrap();
580
581 assert_eq!(t1.len(), 4);
582 assert_eq!(t2.len(), 4);
583 }
584
585 #[test]
586 fn test_different_starting_blocks() {
587 let prefetcher = Prefetcher::new(3);
588
589 // Test various starting points
590 for start in [0, 10, 100, 1000, 10000] {
591 let targets = prefetcher.get_prefetch_targets(start);
592 assert_eq!(targets, vec![start + 1, start + 2, start + 3]);
593 }
594 }
595}