pub struct FileFragment { /* private fields */ }
A Fragment of a Lance Dataset.
The interface is modeled after pyarrow.dataset.Fragment.
Implementations
impl FileFragment
pub async fn create(
    dataset_uri: &str,
    id: usize,
    source: impl StreamingWriteSource,
    params: Option<WriteParams>,
) -> Result<Fragment>
Create a new FileFragment from a StreamingWriteSource.
This method can be used before a Dataset is created. For example,
fragments can be created in a distributed fashion first, and a central
machine can then commit the dataset with these fragments.
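The distributed workflow described above can be sketched without the Lance API. This is a minimal illustration only: `FragmentMeta`, `write_fragment`, and `create_distributed` are hypothetical names, threads stand in for remote workers, and no data files are written. It shows workers producing fragment metadata under pre-assigned ids, and a coordinator collecting that metadata before a single commit step.

```rust
use std::thread;

/// Hypothetical stand-in for the fragment metadata a worker returns.
#[derive(Debug, Clone, PartialEq)]
struct FragmentMeta {
    id: usize,
    num_rows: usize,
}

/// Hypothetical worker step: "write" one batch as a fragment with a
/// pre-assigned id. A real worker would write data files here; this
/// sketch only records metadata.
fn write_fragment(id: usize, rows: Vec<i64>) -> FragmentMeta {
    FragmentMeta { id, num_rows: rows.len() }
}

/// Hypothetical coordinator step: run one worker per batch, then gather
/// the fragment metadata, ordered by id, ready for a single commit.
fn create_distributed(batches: Vec<Vec<i64>>) -> Vec<FragmentMeta> {
    // Spawn all workers first so they run concurrently.
    let handles: Vec<_> = batches
        .into_iter()
        .enumerate()
        .map(|(id, rows)| thread::spawn(move || write_fragment(id, rows)))
        .collect();

    let mut fragments: Vec<FragmentMeta> = handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect();
    fragments.sort_by_key(|f| f.id);
    fragments
}
```

Assigning ids up front, as the `id` parameter of `create` suggests, lets workers proceed without coordinating with each other; only the final commit is centralized.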
pub async fn create_fragments(
    dataset_uri: &str,
    source: impl StreamingWriteSource,
    params: Option<WriteParams>,
) -> Result<Vec<Fragment>>
Create a list of FileFragments from a StreamingWriteSource.
pub async fn create_from_file(
    filename: &str,
    dataset: &Dataset,
    fragment_id: usize,
    physical_rows: Option<usize>,
) -> Result<Fragment>
pub fn dataset(&self) -> &Dataset
pub fn schema(&self) -> &Schema
pub fn id(&self) -> usize
The id of this FileFragment.
pub fn num_data_files(&self) -> usize
The number of data files in this fragment.
pub fn data_file_for_field(&self, field_id: u32) -> Option<&DataFile>
Gets the data file for a given field
pub async fn open(
    &self,
    projection: &Schema,
    read_config: FragReadConfig,
) -> Result<FragmentReader>
Open a FileFragment with a given default projection.
All read operations (other than read_projected) will use the supplied
default projection. For read_projected, the projection must be a subset
of the default projection.
Parameters
- projection: The projection schema.
- read_config: Controls what columns are included in the output.
- scan_scheduler: The scheduler to use for reading data files. If not supplied and the data is v2 data, then a new scheduler will be created.
projection may be an empty schema only if with_row_id is true. In that
case, the reader will only be generating row ids.
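The two projection rules above can be expressed as plain checks on column names. This is a sketch under assumptions: `is_subset` and `projection_is_valid` are hypothetical helpers, and schemas are reduced to lists of column names rather than Lance `Schema` values.

```rust
use std::collections::HashSet;

/// Hypothetical check: every column requested by a projected read must be
/// part of the default projection the fragment was opened with.
fn is_subset(default_projection: &[&str], requested: &[&str]) -> bool {
    let default: HashSet<&str> = default_projection.iter().copied().collect();
    requested.iter().all(|c| default.contains(c))
}

/// Hypothetical check: an empty default projection is only acceptable when
/// row ids are requested, since otherwise the reader would produce nothing.
fn projection_is_valid(default_projection: &[&str], with_row_id: bool) -> bool {
    !default_projection.is_empty() || with_row_id
}
```

For example, with a default projection of `["a", "b", "c"]`, a projected read of `["a", "c"]` passes the subset check while a read of `["d"]` does not.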
pub async fn count_rows(&self, filter: Option<String>) -> Result<usize>
Count the rows in this fragment.
pub async fn count_deletions(&self) -> Result<usize>
Get the number of rows that have been deleted in this fragment.
pub fn fast_physical_rows(&self) -> Result<usize>
Get the number of physical rows in the fragment synchronously
Fails if the fragment does not have the physical row count in the metadata. This method should only be called in new workflows which are not run on old versions of Lance.
pub fn fast_num_deletions(&self) -> Result<usize>
Get the number of deleted rows in the fragment synchronously
Fails if the fragment does not have deletion count in the metadata. This method should only be called in new workflows which are not run on old versions of Lance.
pub fn fast_logical_rows(&self) -> Result<usize>
Get the number of logical rows (physical rows - deleted rows) in the fragment synchronously
Fails if the fragment does not have the physical row count or deletion count in the metadata. This method should only be called in new workflows which are not run on old versions of Lance.
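The relationship between the three synchronous counts can be sketched as follows. `FragmentCounts` and `logical_rows_from_metadata` are hypothetical names, and the `String` error stands in for the crate's own error type; the sketch only assumes what is stated above, namely that either count may be absent from the metadata and that logical rows equal physical rows minus deleted rows.

```rust
/// Hypothetical fragment metadata. Either count may be missing in
/// fragments written by older versions.
struct FragmentCounts {
    physical_rows: Option<usize>,
    num_deleted_rows: Option<usize>,
}

/// Compute logical rows from metadata alone, failing if either count
/// is not recorded.
fn logical_rows_from_metadata(meta: &FragmentCounts) -> Result<usize, String> {
    let physical = meta
        .physical_rows
        .ok_or_else(|| "physical row count missing from metadata".to_string())?;
    let deleted = meta
        .num_deleted_rows
        .ok_or_else(|| "deletion count missing from metadata".to_string())?;
    // Guard against inconsistent metadata instead of underflowing.
    physical
        .checked_sub(deleted)
        .ok_or_else(|| "more deleted rows than physical rows".to_string())
}
```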
pub async fn physical_rows(&self) -> Result<usize>
Get the number of physical rows in the fragment. This includes deleted rows.
If there are no deleted rows, this is equal to the number of rows in the fragment.
pub async fn validate(&self) -> Result<()>
Validate the fragment
Verifies:
- All field ids in the fragment are distinct
- Within each data file, field ids are in increasing order
- All data files exist and have the same length
- Field ids are distinct between data files.
- Deletion file exists and has rowids in the correct range
- Fragment.physical_rows matches length of file
- DeletionFile.num_deleted_rows matches length of deletion vector
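Several of the checks listed above can be illustrated on a simplified in-memory model. This is not the crate's implementation: `FragmentLayout` and `validate_layout` are hypothetical, file existence and file lengths are not modeled, and errors are plain strings. The sketch covers the field id rules and the deletion range rule.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of a fragment.
struct FragmentLayout {
    /// Field ids stored in each data file.
    data_files: Vec<Vec<u32>>,
    /// Number of physical rows in the fragment.
    physical_rows: usize,
    /// Row offsets recorded as deleted.
    deleted_rows: Vec<u32>,
}

fn validate_layout(frag: &FragmentLayout) -> Result<(), String> {
    let mut seen = HashSet::new();
    for (i, fields) in frag.data_files.iter().enumerate() {
        // Within each data file, field ids must be in increasing order.
        if fields.windows(2).any(|w| w[0] >= w[1]) {
            return Err(format!("data file {}: field ids are not increasing", i));
        }
        // Field ids must be distinct across all data files.
        for &id in fields {
            if !seen.insert(id) {
                return Err(format!("field id {} appears more than once", id));
            }
        }
    }
    // Deleted row offsets must fall inside the fragment.
    if let Some(&row) = frag
        .deleted_rows
        .iter()
        .find(|&&r| r as usize >= frag.physical_rows)
    {
        return Err(format!("deleted row {} is out of range", row));
    }
    Ok(())
}
```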
pub async fn open_session(
    &self,
    projection: &Schema,
    with_row_address: bool,
) -> Result<FragmentSession>
Open a FragmentSession, which manages a short-lived session of FileFragment.
This API works well for users making repeated requests over the same projected schema.
pub async fn take(
    &self,
    indices: &[u32],
    projection: &Schema,
) -> Result<RecordBatch>
Take rows from this fragment based on the offset in the file.
This will always return the same number of rows as the input indices. If indices are out-of-bounds, this will return an error.
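The contract described above (one output row per input index, and an error rather than a shorter result when an index is out of bounds) can be sketched over an in-memory column. `take_rows` is a hypothetical helper operating on a slice, not the Lance reader, and the `String` error is a placeholder.

```rust
/// Hypothetical take over an in-memory column: returns exactly one value
/// per index, in index order, or an error if any index is out of bounds.
fn take_rows(column: &[i64], indices: &[u32]) -> Result<Vec<i64>, String> {
    indices
        .iter()
        .map(|&i| {
            column
                .get(i as usize)
                .copied()
                .ok_or_else(|| format!("index {} out of bounds for {} rows", i, column.len()))
        })
        // Collecting into Result stops at the first out-of-bounds index.
        .collect()
}
```

Repeated and unordered indices are allowed in this sketch, which is what makes the "same number of rows as the input indices" guarantee meaningful.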
pub async fn get_deletion_vector(&self) -> Result<Option<Arc<DeletionVector>>>
Get the deletion vector for this fragment, using the cache if available.
pub fn scan(&self) -> Scanner
Scan this FileFragment.
See Dataset::scan.
pub async fn merge_columns(
    &mut self,
    stream: impl RecordBatchReader + Send + 'static,
    left_on: &str,
    right_on: &str,
    max_field_id: i32,
) -> Result<(Fragment, Schema)>
pub async fn update_columns(
    &mut self,
    right_stream: impl RecordBatchReader + Send + 'static,
    left_on: &str,
    right_on: &str,
) -> Result<(Fragment, Vec<u32>)>
pub async fn add_columns(
    &self,
    transforms: NewColumnTransform,
    read_columns: Option<Vec<String>>,
    batch_size: Option<u32>,
) -> Result<(Fragment, Schema)>
Append new columns to the fragment
This is the fragment-level version of Dataset::add_columns.
pub async fn delete(self, predicate: &str) -> Result<Option<Self>>
Delete rows from the fragment.
If all rows are deleted, returns Ok(None). Otherwise, returns a new
fragment with the updated deletion vector. This must be persisted to
the manifest.
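The return contract of `delete` can be illustrated with a small model. Everything here is an assumption made for illustration: `FragmentState` and `delete_where` are hypothetical, the deletion vector is a `BTreeSet` of row offsets, the predicate is a closure over a single integer column instead of a filter string, and `values` is assumed to hold one entry per physical row.

```rust
use std::collections::BTreeSet;

/// Hypothetical fragment state: a physical row count plus the set of
/// row offsets that have been deleted.
#[derive(Debug, Clone, PartialEq)]
struct FragmentState {
    physical_rows: u32,
    deleted: BTreeSet<u32>,
}

impl FragmentState {
    fn new(physical_rows: u32) -> Self {
        FragmentState { physical_rows, deleted: BTreeSet::new() }
    }
}

/// Consume the fragment, mark every row matching `predicate` as deleted,
/// and return `None` if no live rows remain. `values` is assumed to have
/// exactly `physical_rows` entries.
fn delete_where(
    mut frag: FragmentState,
    values: &[i64],
    predicate: impl Fn(i64) -> bool,
) -> Option<FragmentState> {
    for (offset, &v) in values.iter().enumerate() {
        if predicate(v) {
            frag.deleted.insert(offset as u32);
        }
    }
    if frag.deleted.len() as u32 == frag.physical_rows {
        None // every row is deleted; the caller can drop the fragment
    } else {
        Some(frag) // updated state that still has to be persisted
    }
}
```

Taking `self` by value, as `delete` does, makes it hard to keep using the stale pre-deletion state by accident: the caller either holds the updated fragment or nothing.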
pub async fn extend_deletions(
    self,
    new_deletions: impl IntoIterator<Item = u32>,
) -> Result<Option<Self>>
Trait Implementations
impl Clone for FileFragment
fn clone(&self) -> FileFragment
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for FileFragment
impl From<FileFragment> for Fragment
fn from(fragment: FileFragment) -> Self
Auto Trait Implementations
impl Freeze for FileFragment
impl !RefUnwindSafe for FileFragment
impl Send for FileFragment
impl Sync for FileFragment
impl Unpin for FileFragment
impl UnsafeUnpin for FileFragment
impl !UnwindSafe for FileFragment
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Downcast for T
where
    T: Any,
fn into_any(self: Box<T>) -> Box<dyn Any>
Convert Box<dyn Trait> (where Trait: Downcast) to Box<dyn Any>, which can then be downcast into Box<dyn ConcreteType> where ConcreteType implements Trait.
fn into_any_rc(self: Rc<T>) -> Rc<dyn Any>
Convert Rc<Trait> (where Trait: Downcast) to Rc<Any>, which can then be further downcast into Rc<ConcreteType> where ConcreteType implements Trait.
fn as_any(&self) -> &(dyn Any + 'static)
Convert &Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot generate &Any’s vtable from &Trait’s.
fn as_any_mut(&mut self) -> &mut (dyn Any + 'static)
Convert &mut Trait (where Trait: Downcast) to &mut Any. This is needed since Rust cannot generate &mut Any’s vtable from &mut Trait’s.
impl<T> DowncastSend for T
impl<T> DowncastSync for T
impl<T> FmtForward for T
fn fmt_binary(self) -> FmtBinary<Self>
where
    Self: Binary,
Causes self to use its Binary implementation when Debug-formatted.
fn fmt_display(self) -> FmtDisplay<Self>
where
    Self: Display,
Causes self to use its Display implementation when Debug-formatted.
fn fmt_lower_exp(self) -> FmtLowerExp<Self>
where
    Self: LowerExp,
Causes self to use its LowerExp implementation when Debug-formatted.
fn fmt_lower_hex(self) -> FmtLowerHex<Self>
where
    Self: LowerHex,
Causes self to use its LowerHex implementation when Debug-formatted.
fn fmt_octal(self) -> FmtOctal<Self>
where
    Self: Octal,
Causes self to use its Octal implementation when Debug-formatted.
fn fmt_pointer(self) -> FmtPointer<Self>
where
    Self: Pointer,
Causes self to use its Pointer implementation when Debug-formatted.
fn fmt_upper_exp(self) -> FmtUpperExp<Self>
where
    Self: UpperExp,
Causes self to use its UpperExp implementation when Debug-formatted.
fn fmt_upper_hex(self) -> FmtUpperHex<Self>
where
    Self: UpperHex,
Causes self to use its UpperHex implementation when Debug-formatted.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pipe for T
where
    T: ?Sized,
fn pipe<R>(self, func: impl FnOnce(Self) -> R) -> R
where
    Self: Sized,
fn pipe_ref<'a, R>(&'a self, func: impl FnOnce(&'a Self) -> R) -> R
where
    R: 'a,
Borrows self and passes that borrow into the pipe function.
fn pipe_ref_mut<'a, R>(&'a mut self, func: impl FnOnce(&'a mut Self) -> R) -> R
where
    R: 'a,
Mutably borrows self and passes that borrow into the pipe function.
fn pipe_borrow<'a, B, R>(&'a self, func: impl FnOnce(&'a B) -> R) -> R
fn pipe_borrow_mut<'a, B, R>(
    &'a mut self,
    func: impl FnOnce(&'a mut B) -> R,
) -> R
fn pipe_as_ref<'a, U, R>(&'a self, func: impl FnOnce(&'a U) -> R) -> R
Borrows self, then passes self.as_ref() into the pipe function.
fn pipe_as_mut<'a, U, R>(&'a mut self, func: impl FnOnce(&'a mut U) -> R) -> R
Mutably borrows self, then passes self.as_mut() into the pipe function.
fn pipe_deref<'a, T, R>(&'a self, func: impl FnOnce(&'a T) -> R) -> R
Borrows self, then passes self.deref() into the pipe function.
impl<T> Pointable for T
impl<T> PolicyExt for T
where
    T: ?Sized,
impl<T> Tap for T
fn tap_borrow<B>(self, func: impl FnOnce(&B)) -> Self
Immutable access to the Borrow<B> of a value.
fn tap_borrow_mut<B>(self, func: impl FnOnce(&mut B)) -> Self
Mutable access to the BorrowMut<B> of a value.
fn tap_ref<R>(self, func: impl FnOnce(&R)) -> Self
Immutable access to the AsRef<R> view of a value.
fn tap_ref_mut<R>(self, func: impl FnOnce(&mut R)) -> Self
Mutable access to the AsMut<R> view of a value.
fn tap_deref<T>(self, func: impl FnOnce(&T)) -> Self
Immutable access to the Deref::Target of a value.
fn tap_deref_mut<T>(self, func: impl FnOnce(&mut T)) -> Self
Mutable access to the Deref::Target of a value.
fn tap_dbg(self, func: impl FnOnce(&Self)) -> Self
Calls .tap() only in debug builds, and is erased in release builds.
fn tap_mut_dbg(self, func: impl FnOnce(&mut Self)) -> Self
Calls .tap_mut() only in debug builds, and is erased in release builds.
fn tap_borrow_dbg<B>(self, func: impl FnOnce(&B)) -> Self
Calls .tap_borrow() only in debug builds, and is erased in release builds.
fn tap_borrow_mut_dbg<B>(self, func: impl FnOnce(&mut B)) -> Self
Calls .tap_borrow_mut() only in debug builds, and is erased in release builds.
fn tap_ref_dbg<R>(self, func: impl FnOnce(&R)) -> Self
Calls .tap_ref() only in debug builds, and is erased in release builds.
fn tap_ref_mut_dbg<R>(self, func: impl FnOnce(&mut R)) -> Self
Calls .tap_ref_mut() only in debug builds, and is erased in release builds.
fn tap_deref_dbg<T>(self, func: impl FnOnce(&T)) -> Self
Calls .tap_deref() only in debug builds, and is erased in release builds.