SparseIO is a Rust library for sparse, out-of-order materialization of large byte objects.
Instead of eagerly copying an entire object from source to destination, SparseIO allows you to fetch only the chunks you ask for. It tracks what is already present for efficient caching, and deduplicates concurrent reads for the same chunk.
Core Premise
Certain large data objects, such as multimedia files, system logs, columnar storage files used in AI and ML workloads, and archival records, are often accessed non-sequentially. In these scenarios, applications typically retrieve only specific byte ranges rather than reading the entire object. Loading all bytes upfront results in unnecessary I/O, increased latency, and inefficient bandwidth utilization. Selective or partial reads improve performance by reducing data transfer, accelerating processing, and optimizing resource consumption.
SparseIO models this as:
- A Reader that can fetch bytes from an upstream source at an offset.
- A Writer that stores data extents in a local (or closer) destination that represents the object sparsely.
- A coordinator (SparseIO) that:
  - checks existing coverage so already-cached ranges are served without refetching,
  - deduplicates in-flight fetches so concurrent callers do not duplicate work,
  - manages coverage metadata and cache.
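
To make the coverage check concrete, here is a minimal sketch of an extent map that records which byte ranges are present and reports the gaps that still need fetching. The `Coverage` type and its `insert` and `missing` methods are hypothetical names chosen for illustration; they are not SparseIO's API, and the library may track coverage differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical coverage map (not SparseIO's type): maps the start offset
/// of each present extent to its exclusive end offset.
#[derive(Default)]
struct Coverage {
    extents: BTreeMap<u64, u64>,
}

impl Coverage {
    /// Record that [start, end) is present, merging with any extent that
    /// overlaps or is directly adjacent to it.
    fn insert(&mut self, mut start: u64, mut end: u64) {
        if start >= end {
            return;
        }
        // Extents that begin at or before `end` and finish at or after `start`.
        let touching: Vec<(u64, u64)> = self
            .extents
            .range(..=end)
            .filter(|(_, e)| **e >= start)
            .map(|(&s, &e)| (s, e))
            .collect();
        for (s, e) in touching {
            self.extents.remove(&s);
            start = start.min(s);
            end = end.max(e);
        }
        self.extents.insert(start, end);
    }

    /// Return the sub-ranges of [start, end) that are not yet present.
    fn missing(&self, start: u64, end: u64) -> Vec<(u64, u64)> {
        let mut gaps = Vec::new();
        let mut cursor = start;
        for (&s, &e) in self.extents.range(..end) {
            if e <= cursor {
                continue; // extent lies entirely before the cursor
            }
            if s > cursor {
                gaps.push((cursor, s));
            }
            cursor = cursor.max(e);
            if cursor >= end {
                break;
            }
        }
        if cursor < end {
            gaps.push((cursor, end));
        }
        gaps
    }
}
```

In this sketch a read for a range would first call `missing`, fetch only the returned gaps through the Reader, store them through the Writer, and then call `insert` for each gap. Merging adjacent extents keeps the map small as the object fills in.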
What You Get
- On-demand chunk materialization.
- Coverage-aware reads from an extent store.
- In-flight deduplication for concurrent requests.
- Pluggable backends via Reader and Writer traits.
- Optional source implementations in sources (feature-gated).
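
In-flight deduplication can be illustrated with the standard library alone. The sketch below is an assumption about how such a mechanism might look, not SparseIO's implementation: `InFlight`, `get_or_fetch`, and `fetches_for_same_chunk` are hypothetical names, and it uses blocking threads rather than async tasks to stay dependency-free.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};
use std::thread;

/// Hypothetical deduplicator (not SparseIO's type): one shared slot per chunk index.
struct InFlight {
    slots: Mutex<HashMap<u64, Arc<OnceLock<Vec<u8>>>>>,
}

impl InFlight {
    fn new() -> Self {
        Self { slots: Mutex::new(HashMap::new()) }
    }

    /// Return the bytes for `chunk`, running `fetch` at most once per chunk index.
    fn get_or_fetch<F>(&self, chunk: u64, fetch: F) -> Vec<u8>
    where
        F: FnOnce() -> Vec<u8>,
    {
        // Hold the map lock only long enough to find or create the slot.
        let slot = {
            let mut slots = self.slots.lock().unwrap();
            Arc::clone(slots.entry(chunk).or_default())
        };
        // The first caller runs `fetch`; concurrent callers block until it finishes.
        slot.get_or_init(fetch).clone()
    }
}

/// Spawn `callers` threads that all request chunk 7 and return how many
/// times the upstream fetch closure actually ran.
fn fetches_for_same_chunk(callers: usize) -> usize {
    let inflight = InFlight::new();
    let fetches = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..callers {
            s.spawn(|| {
                let bytes = inflight.get_or_fetch(7, || {
                    fetches.fetch_add(1, Ordering::SeqCst);
                    vec![0xAB; 4] // stand-in for an upstream read
                });
                assert_eq!(bytes, vec![0xAB; 4]);
            });
        }
    });
    fetches.load(Ordering::SeqCst)
}
```

Note that this sketch keeps every slot in memory indefinitely, so it doubles as a cache. A real coordinator would more likely drop the slot once the chunk has been written to the destination and rely on coverage metadata for later reads.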
Current Feature Flags
- file: file-backed Reader/Writer implementations.
- http: reqwest-backed HTTP range-based Reader implementation.
Quickstart
Run the file-to-file sparse example:
The example intentionally materializes randomized chunk offsets first, then verifies:
- full fill => destination matches source byte-for-byte,
- partial fill => written chunks match source and unwritten regions remain zeroed.
See: examples/file_to_file.rs and examples/file_to_file.md.
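
The partial-fill check described above can be sketched as a small in-memory helper. This is an illustrative assumption about what such a verification involves, not the code in examples/file_to_file.rs; `verify_partial` and its parameters are hypothetical names.

```rust
/// Hypothetical check: every written range in `dest` must equal the same
/// range in `source`, and every byte outside the written ranges must be zero.
/// Ranges are half-open: (start, end) covers start..end.
fn verify_partial(source: &[u8], dest: &[u8], written: &[(usize, usize)]) -> bool {
    if source.len() != dest.len() {
        return false;
    }
    let mut covered = vec![false; dest.len()];
    for &(start, end) in written {
        if start > end || end > dest.len() {
            return false; // malformed or out-of-bounds range
        }
        if source[start..end] != dest[start..end] {
            return false; // written chunk differs from source
        }
        covered[start..end].fill(true);
    }
    // Unwritten bytes must still be zero.
    dest.iter()
        .zip(&covered)
        .all(|(&byte, &is_written)| is_written || byte == 0)
}
```

Passing a single range that spans the whole object reduces this to the full-fill case, where the destination must match the source byte-for-byte.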
Minimal API Shape
use Arc;
use Builder;
async