pub struct MmapDataset { /* private fields */ }
A memory-mapped dataset backed by a Parquet file.
This dataset type memory-maps the underlying file, allowing efficient access to large datasets without loading everything into memory. The OS handles paging data in and out as needed.
§Performance Characteristics
- Memory efficient: Only pages accessed are loaded into RAM
- Fast startup: No need to read entire file upfront
- Random access: Efficient access to any row
- OS-managed caching: Leverages OS page cache
§Limitations
- File must remain accessible during dataset lifetime
- Not available on WASM targets
- Requires file to be on a seekable filesystem
Implementations§
impl MmapDataset
pub fn open(path: impl AsRef<Path>) -> Result<Self>
Opens a Parquet file as a memory-mapped dataset.
§Arguments
- path: Path to the Parquet file
§Errors
Returns an error if:
- The file cannot be opened
- The file is not valid Parquet
- Memory mapping fails
- The file is empty
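Two of the failure cases listed above (an empty file, a file that is not Parquet) can be detected cheaply before attempting to open the dataset. The following is a minimal sketch using only the standard library. It is not how alimentar implements `open`; the `Preflight` enum and the `preflight` function are hypothetical names introduced for illustration. It relies on the Parquet file layout, in which a file starts and ends with the 4-byte magic `PAR1`. A passing check only means the magic bytes match, not that the file is fully valid.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical pre-flight result; not part of alimentar.
#[derive(Debug, PartialEq)]
pub enum Preflight {
    /// Magic bytes are present at both ends of the file.
    Ok,
    /// The file has zero length.
    Empty,
    /// The file is too short or lacks the `PAR1` magic.
    NotParquet,
}

/// Hypothetical helper: checks the cheap failure cases up front.
/// An unreadable or missing file surfaces as `Err`.
pub fn preflight(path: impl AsRef<Path>) -> io::Result<Preflight> {
    const MAGIC: &[u8; 4] = b"PAR1";

    let mut file = File::open(path)?;
    let len = file.metadata()?.len();
    if len == 0 {
        return Ok(Preflight::Empty);
    }
    // Smallest possible layout: head magic + 4-byte footer length + tail magic.
    if len < 12 {
        return Ok(Preflight::NotParquet);
    }

    // Read the first four bytes.
    let mut head = [0u8; 4];
    file.read_exact(&mut head)?;

    // Seeking works only on a seekable filesystem; read the last four bytes.
    let mut tail = [0u8; 4];
    file.seek(SeekFrom::End(-4))?;
    file.read_exact(&mut tail)?;

    if &head == MAGIC && &tail == MAGIC {
        Ok(Preflight::Ok)
    } else {
        Ok(Preflight::NotParquet)
    }
}
```

A caller could run `preflight(path)?` first and report a precise message for `Empty` or `NotParquet`, then fall through to `MmapDataset::open(path)` for full validation.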
§Example
use alimentar::MmapDataset;
let dataset = MmapDataset::open("data.parquet").unwrap();
pub fn to_arrow_dataset(&self) -> Result<ArrowDataset>
Converts this memory-mapped dataset to an in-memory ArrowDataset.
This is useful when you need to modify the data or when you want to ensure all data is in memory for faster access.
§Errors
Returns an error if the conversion fails.
Trait Implementations§
impl Dataset for MmapDataset
fn get(&self, index: usize) -> Option<RecordBatch>
Returns a single row as a one-row RecordBatch.
fn iter(&self) -> Box<dyn Iterator<Item = RecordBatch> + Send + '_>
Returns an iterator over all RecordBatches in the dataset.
fn num_batches(&self) -> usize
Returns the number of batches in the dataset.
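To make the relationship between rows and batches concrete, here is a small in-memory stand-in with the same three method shapes. It is a sketch, not alimentar's implementation: `VecDataset` and the `Batch` alias are hypothetical, a `Vec<i64>` stands in for a `RecordBatch`, and it assumes that `get` counts rows across all batches and returns `None` for an out-of-range index (the source only states that `get` returns an `Option` holding a one-row batch).

```rust
/// Stand-in for a RecordBatch: a single column of values.
pub type Batch = Vec<i64>;

/// Hypothetical in-memory dataset mirroring `get`, `iter`, and `num_batches`.
pub struct VecDataset {
    batches: Vec<Batch>,
}

impl VecDataset {
    pub fn new(batches: Vec<Batch>) -> Self {
        Self { batches }
    }

    /// Returns row `index`, counted across all batches, as a one-row batch.
    /// Assumption: out-of-range indices yield `None`.
    pub fn get(&self, index: usize) -> Option<Batch> {
        let mut remaining = index;
        for batch in &self.batches {
            if remaining < batch.len() {
                return Some(vec![batch[remaining]]);
            }
            // Skip this batch and keep looking in the next one.
            remaining -= batch.len();
        }
        None
    }

    /// Iterates over whole batches, cloning each one.
    pub fn iter(&self) -> Box<dyn Iterator<Item = Batch> + Send + '_> {
        Box::new(self.batches.iter().cloned())
    }

    /// Number of batches, not number of rows.
    pub fn num_batches(&self) -> usize {
        self.batches.len()
    }
}
```

Note the distinction this sketch highlights: `num_batches` counts batches, while `get` is indexed by row, so the total row count is the sum of the batch lengths produced by `iter`.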
Auto Trait Implementations§
impl Freeze for MmapDataset
impl !RefUnwindSafe for MmapDataset
impl Send for MmapDataset
impl Sync for MmapDataset
impl Unpin for MmapDataset
impl UnsafeUnpin for MmapDataset
impl !UnwindSafe for MmapDataset
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
Creates a shared type from an unshared type.