Struct MmapDataset 

pub struct MmapDataset { /* private fields */ }

A memory-mapped dataset backed by a Parquet file.

This dataset type memory-maps the underlying file, allowing efficient access to large datasets without loading everything into memory. The OS handles paging data in and out as needed.

§Performance Characteristics

  • Memory efficient: Only the pages that are actually accessed are loaded into RAM
  • Fast startup: The file does not have to be read in full up front
  • Random access: Efficient access to any row
  • OS-managed caching: Leverages the OS page cache

§Limitations

  • The file must remain accessible for the lifetime of the dataset
  • Not available on WASM targets
  • Requires the file to be on a seekable filesystem
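
Because memory mapping is not available on WASM targets, portable callers may need a fallback path. The sketch below shows one way a caller could make that choice; `LoadStrategy` and `pick_strategy` are hypothetical names for illustration and are not part of this crate.

```rust
/// Loading strategies a caller might choose between.
/// Hypothetical enum for illustration; not part of the crate's API.
#[derive(Debug, PartialEq, Eq)]
pub enum LoadStrategy {
    /// Map the file and let the OS page data in on demand.
    MemoryMapped,
    /// Read everything into memory up front.
    InMemory,
}

/// Picks a strategy based on the compilation target.
/// `cfg!` evaluates to a compile-time boolean constant.
pub fn pick_strategy() -> LoadStrategy {
    if cfg!(target_arch = "wasm32") {
        LoadStrategy::InMemory
    } else {
        LoadStrategy::MemoryMapped
    }
}
```

Note that `cfg!` still compiles both branches. Code that names a type missing on some targets would need `#[cfg(...)]` attributes on the items themselves instead.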

Implementations§

impl MmapDataset

pub fn open(path: impl AsRef<Path>) -> Result<Self>

Opens a Parquet file as a memory-mapped dataset.

§Arguments

  • path - Path to the Parquet file

§Errors

Returns an error if:

  • The file cannot be opened
  • The file is not valid Parquet
  • Memory mapping fails
  • The file is empty

§Example

use alimentar::MmapDataset;

let dataset = MmapDataset::open("data.parquet").unwrap();
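
Two of the error cases above (an empty file, or a file that is not valid Parquet) can be caught cheaply before calling `open`. The sketch below uses only the standard library and relies on the Parquet format's `PAR1` magic bytes at the start and end of a file with an unencrypted footer. `looks_like_parquet` is a hypothetical helper, not part of this crate, and it is not a statement of how `open` validates its input.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Magic bytes that begin and end a Parquet file with an unencrypted footer.
const PARQUET_MAGIC: &[u8; 4] = b"PAR1";

/// Hypothetical pre-flight check: returns `Ok(true)` if the file starts and
/// ends with the Parquet magic bytes. This only rules out obviously wrong
/// files; it does not prove that the footer metadata is valid.
pub fn looks_like_parquet(path: impl AsRef<Path>) -> io::Result<bool> {
    let mut file = File::open(path)?;
    let len = file.metadata()?.len();

    // Leading magic (4) + footer length (4) + trailing magic (4) is the
    // smallest conceivable layout, so anything shorter cannot be Parquet.
    if len < 12 {
        return Ok(false);
    }

    let mut head = [0u8; 4];
    file.read_exact(&mut head)?;

    let mut tail = [0u8; 4];
    file.seek(SeekFrom::End(-4))?;
    file.read_exact(&mut tail)?;

    Ok(&head == PARQUET_MAGIC && &tail == PARQUET_MAGIC)
}
```

A caller could run this check first to produce a friendlier message, then still handle the `Result` from `MmapDataset::open`, since the file can change between the two calls.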

pub fn open_with_batch_size(path: impl AsRef<Path>, batch_size: usize) -> Result<Self>

Opens a Parquet file with a specified batch size.

§Arguments

  • path - Path to the Parquet file
  • batch_size - Number of rows per batch

§Errors

Returns an error if opening or parsing fails.
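
To make the effect of `batch_size` concrete, here is a minimal arithmetic sketch. It assumes rows are grouped into fixed-size batches with a possibly shorter final batch. This page does not confirm that layout, and `batch_count` and `locate_row` are hypothetical helpers rather than crate functions.

```rust
/// Number of batches needed to hold `total_rows` rows, assuming every batch
/// except possibly the last holds exactly `batch_size` rows.
/// Returns `None` for a zero batch size, which has no meaningful answer.
pub fn batch_count(total_rows: usize, batch_size: usize) -> Option<usize> {
    if batch_size == 0 {
        return None;
    }
    // Ceiling division without risking overflow in `total_rows + batch_size`.
    let full = total_rows / batch_size;
    let partial = usize::from(total_rows % batch_size != 0);
    Some(full + partial)
}

/// Maps a global row index to `(batch index, offset within that batch)`.
/// Returns `None` if the row is out of range or the batch size is zero.
pub fn locate_row(row: usize, total_rows: usize, batch_size: usize) -> Option<(usize, usize)> {
    if batch_size == 0 || row >= total_rows {
        return None;
    }
    Some((row / batch_size, row % batch_size))
}
```

Under this assumption, 10 rows with a batch size of 4 give three batches, and row 9 is the second row (offset 1) of the third batch (index 2). Smaller batches mean finer-grained reads at the cost of more per-batch overhead.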

pub fn path(&self) -> &Path

Returns the path to the underlying file.

pub fn mmap_size(&self) -> usize

Returns the size of the memory-mapped region in bytes.

pub fn to_arrow_dataset(&self) -> Result<ArrowDataset>

Converts this memory-mapped dataset to an in-memory ArrowDataset.

This is useful when you need to modify the data or when you want to ensure all data is in memory for faster access.

§Errors

Returns an error if the conversion fails.

impl MmapDataset

pub fn try_clone(&self) -> Result<Self>

Tries to clone this dataset by re-opening the underlying file.

This can fail if the file has been deleted or is no longer accessible.
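
The re-open pattern can be illustrated with the standard library alone. `FileBacked` below is a hypothetical stand-in, not the real `MmapDataset`. It stores the path it was opened from and clones by opening that path again, which is why the clone fails once the file is gone.

```rust
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a file-backed dataset; not the real `MmapDataset`.
#[derive(Debug)]
pub struct FileBacked {
    path: PathBuf,
    file: File,
}

impl FileBacked {
    /// Opens the file and remembers the path for later re-opening.
    pub fn open(path: impl AsRef<Path>) -> io::Result<Self> {
        let path = path.as_ref().to_path_buf();
        let file = File::open(&path)?;
        Ok(Self { path, file })
    }

    /// Clones by re-opening the stored path instead of duplicating the handle.
    /// Fails if the path no longer resolves to a readable file.
    pub fn try_clone(&self) -> io::Result<Self> {
        Self::open(&self.path)
    }

    /// Path this value was opened from.
    pub fn path(&self) -> &Path {
        &self.path
    }

    /// Size of the underlying file in bytes, read through the open handle.
    pub fn len_bytes(&self) -> io::Result<u64> {
        Ok(self.file.metadata()?.len())
    }
}
```

This differs from `std::fs::File::try_clone`, which duplicates the existing OS handle and does not consult the path again. With the re-open approach, each clone gets an independent handle, at the cost of failing when the file has been moved or deleted.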

Trait Implementations§

impl Dataset for MmapDataset

fn len(&self) -> usize

Returns the total number of rows in the dataset.

fn get(&self, index: usize) -> Option<RecordBatch>

Returns a single row as a RecordBatch with one row.

fn schema(&self) -> SchemaRef

Returns the schema of the dataset.

fn iter(&self) -> Box<dyn Iterator<Item = RecordBatch> + Send + '_>

Returns an iterator over all RecordBatches in the dataset.

fn num_batches(&self) -> usize

Returns the number of batches in the dataset.

fn get_batch(&self, index: usize) -> Option<&RecordBatch>

Returns a specific batch by index.

fn is_empty(&self) -> bool

Returns true if the dataset contains no rows.

impl Debug for MmapDataset

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<Unshared, Shared> IntoShared<Shared> for Unshared
where Shared: FromUnshared<Unshared>,

fn into_shared(self) -> Shared

Creates a shared type from an unshared type.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T> Same for T

type Output = T

Should always be Self

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.