Skip to main content

StreamingDeletionVectorWriter

Struct StreamingDeletionVectorWriter 

Source
pub struct StreamingDeletionVectorWriter<'a, W: Write> { /* private fields */ }
Expand description

A streaming writer for deletion vectors.

This writer allows for writing multiple deletion vectors to a single file in a streaming fashion, which is memory-efficient for distributed workloads where deletion vectors are generated on executors.

§Format

The writer produces deletion vector files in the Delta Lake format:

  • The first byte of the file is a version byte (currently 1)
  • Each DV is prefixed with a 4-byte size (big-endian) of the serialized data
  • Followed by a 4-byte magic number (0x64485871, little-endian)
  • Followed by the serialized 64-bit Roaring Bitmap
  • Followed by a 4-byte CRC32 checksum (big-endian) of the serialized data

§Examples

use delta_kernel::actions::deletion_vector_writer::{StreamingDeletionVectorWriter, KernelDeletionVector};

let mut buffer = Vec::new();
let mut writer = StreamingDeletionVectorWriter::new(&mut buffer);

let mut dv = KernelDeletionVector::new();
dv.add_deleted_row_indexes([1, 5, 10]);

let descriptor = writer.write_deletion_vector(dv)?;
writer.finalize()?;

Implementations§

Source§

impl<'a, W: Write> StreamingDeletionVectorWriter<'a, W>

Source

pub fn new(writer: &'a mut W) -> Self

Create a new streaming deletion vector writer.

§Arguments
  • writer - A mutable reference to any type implementing std::io::Write.
§Examples
let mut buffer = Vec::new();
let writer = StreamingDeletionVectorWriter::new(&mut buffer);
Source

pub fn write_deletion_vector( &mut self, deletion_vector: impl DeletionVector, ) -> DeltaResult<DeletionVectorWriteResult>

Write a deletion vector to the underlying writer.

This method can be called multiple times to write multiple deletion vectors to the same writer. The caller is responsible for keeping track of which deletion vector corresponds to which data file.

§Arguments
  • deletion_vector - The deletion vector to write
§Returns

A DeletionVectorWriteResult containing the offset, size, and cardinality of the written deletion vector.

§Errors

Returns an error if:

  • The writer fails to write data
  • The deletion vector cannot be serialized
  • The offset or size would overflow an i32
§Examples
let mut dv = KernelDeletionVector::new();
dv.add_deleted_row_indexes([1, 5, 10]);

let descriptor = writer.write_deletion_vector(dv)?;
println!("Written DV at offset {} with size {}", descriptor.offset, descriptor.size_in_bytes);
Source

pub fn finalize(self) -> DeltaResult<()>

Finalize all writes and flush the underlying writer.

This method should be called after all deletion vectors have been written. After calling this method, the writer should not be used anymore.

§Errors

Returns an error if flushing the writer fails.

§Examples
writer.write_deletion_vector(dv1)?;
writer.write_deletion_vector(dv2)?;
writer.finalize()?;

Auto Trait Implementations§

§

impl<'a, W> Freeze for StreamingDeletionVectorWriter<'a, W>

§

impl<'a, W> RefUnwindSafe for StreamingDeletionVectorWriter<'a, W>
where W: RefUnwindSafe,

§

impl<'a, W> Send for StreamingDeletionVectorWriter<'a, W>
where W: Send,

§

impl<'a, W> Sync for StreamingDeletionVectorWriter<'a, W>
where W: Sync,

§

impl<'a, W> Unpin for StreamingDeletionVectorWriter<'a, W>

§

impl<'a, W> UnsafeUnpin for StreamingDeletionVectorWriter<'a, W>

§

impl<'a, W> !UnwindSafe for StreamingDeletionVectorWriter<'a, W>

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> AsAny for T
where T: Any + Send + Sync,

Source§

fn any_ref(&self) -> &(dyn Any + Sync + Send + 'static)

Obtains a dyn Any reference to the object: Read more
Source§

fn as_any(self: Arc<T>) -> Arc<dyn Any + Sync + Send>

Obtains an Arc<dyn Any> reference to the object: Read more
Source§

fn into_any(self: Box<T>) -> Box<dyn Any + Sync + Send>

Converts the object to Box<dyn Any>: Read more
Source§

fn type_name(&self) -> &'static str

Convenient wrapper for std::any::type_name, since Any does not provide it and Any::type_id is useless as a debugging aid (its Debug is just a mess of hex digits).
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> PolicyExt for T
where T: ?Sized,

Source§

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow. Read more
Source§

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow. Read more
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<KernelType, ArrowType> TryIntoArrow<ArrowType> for KernelType
where ArrowType: TryFromKernel<KernelType>,

Source§

fn try_into_arrow(self) -> Result<ArrowType, ArrowError>

Available on (crate features default-engine-native-tls or default-engine-rustls or arrow-conversion) and crate feature arrow-conversion only.
Source§

impl<KernelType, ArrowType> TryIntoKernel<KernelType> for ArrowType
where KernelType: TryFromArrow<ArrowType>,

Source§

fn try_into_kernel(self) -> Result<KernelType, ArrowError>

Available on (crate features default-engine-native-tls or default-engine-rustls or arrow-conversion) and crate feature arrow-conversion only.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,