

Struct Snapshot 

pub struct Snapshot { /* private fields */ }

In-memory representation of a specific snapshot of a Delta table. While a DeltaTable exists throughout time, a Snapshot represents a view of the table at a specific point in time: it has a defined schema (which may change over time for any given table), a specific version, and a frozen log segment.

Implementations§

impl Snapshot

pub fn builder_for(table_root: impl AsRef<str>) -> SnapshotBuilder

Create a new SnapshotBuilder to build a new Snapshot for a given table root. If you already have a Snapshot and want to update it with minimal work, use Snapshot::builder_from instead.

pub fn builder_from(existing_snapshot: SnapshotRef) -> SnapshotBuilder

Create a new SnapshotBuilder to incrementally update an existing Snapshot to a more recent version.

See Snapshot::try_new_from for the case-by-case behavior.

pub fn new(log_segment: LogSegment, table_configuration: TableConfiguration) -> DeltaResult<Self>

Available on crate feature internal-api only.

Create a new Snapshot from a LogSegment and TableConfiguration.

pub fn log_segment(&self) -> &LogSegment

Available on crate feature internal-api only.

The log segment this snapshot uses.

pub fn crc(&self) -> Option<&Arc<Crc>>

Available on crate feature internal-api only.

Returns the CRC for this snapshot, if one is resolved.

When this returns Some(crc), crc.version == self.version(), and queries backed by the CRC are answered from the cache with zero I/O.

pub fn table_root(&self) -> &Url

pub fn version(&self) -> Version

Version of this Snapshot in the table.

pub fn schema(&self) -> SchemaRef

The table schema at this Snapshot’s version.

pub fn estimated_owned_heap_size_bytes(&self) -> usize

Estimated owned heap size in bytes for this snapshot. Best-effort estimate for capacity tracking, not authoritative.

Counts only the dominant per-snapshot heap contributors, normally > 70% of the snapshot’s owned heap size:

  • For every listed log path (commit, compaction, checkpoint, latest CRC, latest commit): the filename / extension / Url string heap.
  • Vec buffer capacity (capacity * size_of::<ParsedLogPath>()) for the three Vec fields on LogSegmentFiles.
  • The log root Url string.
  • The raw schemaString JSON on table metadata.

Arc-shared fields (e.g. the logical/physical schemas and crc) are not counted, because they can be shared among multiple snapshots and are not owned by any single one.

The remaining fields contribute relatively little heap, so they are not counted either.

Runs in O(n) over listed log files.
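As a rough illustration of the accounting described above, here is a std-only model. LogPathModel and estimate_heap_bytes are hypothetical stand-ins, not delta_kernel types; the real ParsedLogPath and LogSegmentFiles have different fields. The model counts allocated capacity rather than length, mirroring the "capacity * size_of" rule, and never counts Arc-shared data.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a parsed log path: only its three owned strings
// matter for this estimate. This is NOT delta_kernel's ParsedLogPath.
struct LogPathModel {
    filename: String,
    extension: String,
    url: String,
}

impl LogPathModel {
    fn new(filename: &str, extension: &str, url: &str) -> Self {
        Self {
            filename: filename.to_string(),
            extension: extension.to_string(),
            url: url.to_string(),
        }
    }

    // Heap bytes owned by the three strings (allocated capacity, not length).
    fn string_heap(&self) -> usize {
        self.filename.capacity() + self.extension.capacity() + self.url.capacity()
    }
}

// Hypothetical estimate: each Vec buffer (capacity * element size), every
// listed path's string heap, the log root URL string, and the schema JSON.
fn estimate_heap_bytes(
    path_lists: &[&Vec<LogPathModel>], // e.g. commits, compactions, checkpoint parts
    single_paths: &[&LogPathModel],    // e.g. latest CRC, latest commit
    log_root: &String,
    schema_string: &String,
) -> usize {
    let mut total = 0;
    for list in path_lists {
        total += list.capacity() * size_of::<LogPathModel>();
        total += list.iter().map(LogPathModel::string_heap).sum::<usize>();
    }
    total += single_paths.iter().map(|p| p.string_heap()).sum::<usize>();
    // A single pass over the listed paths: O(n) in the number of log files.
    total + log_root.capacity() + schema_string.capacity()
}
```

Counting capacity instead of length makes the estimate track what the allocator actually holds, at the cost of depending on how each buffer was grown.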

pub fn table_properties(&self) -> &TableProperties

Get the TableProperties for this Snapshot.

pub fn get_protocol_derived_properties(&self) -> HashMap<String, String>

Available on crate feature internal-api only.

Returns the protocol-derived table properties as a map of key-value pairs.

This includes:

  • delta.minReaderVersion and delta.minWriterVersion
  • delta.feature.<name> = "supported" for each reader and writer feature (when the table uses the table features protocol, i.e. reader version 3 / writer version 7)
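Here is a minimal sketch of how such a map could be assembled. It assumes the caller already has the protocol versions and feature lists. protocol_derived_properties is a hypothetical helper, not part of delta_kernel. The version checks (reader >= 3 for reader features, writer >= 7 for writer features) are also an assumption about when explicit feature lists apply.

```rust
use std::collections::HashMap;

// Hypothetical helper producing the documented key/value shape.
fn protocol_derived_properties(
    min_reader_version: i32,
    min_writer_version: i32,
    reader_features: &[&str],
    writer_features: &[&str],
) -> HashMap<String, String> {
    let mut props = HashMap::new();
    props.insert("delta.minReaderVersion".to_string(), min_reader_version.to_string());
    props.insert("delta.minWriterVersion".to_string(), min_writer_version.to_string());

    // Assumption: explicit feature lists only exist under the table features
    // protocol (reader version 3 / writer version 7).
    let mut features: Vec<&str> = Vec::new();
    if min_reader_version >= 3 {
        features.extend_from_slice(reader_features);
    }
    if min_writer_version >= 7 {
        features.extend_from_slice(writer_features);
    }
    for name in features {
        // A feature listed for both readers and writers maps to the same key,
        // so it appears only once in the result.
        props.insert(format!("delta.feature.{name}"), "supported".to_string());
    }
    props
}
```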

pub fn metadata_configuration(&self) -> &HashMap<String, String>

Available on crate feature internal-api only.

Get the raw metadata configuration for this table.

This returns the Metadata.configuration map as stored in the Delta log, containing user-defined properties, delta table properties (e.g., delta.enableInCommitTimestamps), and application-specific properties (e.g., io.unitycatalog.tableId).

pub fn table_configuration(&self) -> &TableConfiguration

Available on crate feature internal-api only.

Get the TableConfiguration for this Snapshot.

pub fn get_app_id_version(&self, application_id: &str, engine: &dyn Engine) -> DeltaResult<Option<i64>>

Fetch the latest transaction version recorded for the given application_id in this snapshot. Transaction (txn) actions are filtered by their lastUpdated timestamp according to the delta.setTransactionRetentionDuration table property.

Uses the CRC fast path when available, otherwise falls back to log replay.

Reports metrics: SetTransactionLoaded.
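The retention filtering can be pictured with the following model. Txn and app_id_version are hypothetical. The model makes three assumptions that this page does not confirm:

- Actions are ordered newest-first, so the first match is the latest txn for the application.
- The retention duration is given in milliseconds.
- A txn without lastUpdated never expires.

```rust
// Hypothetical model of a SetTransaction (txn) action.
struct Txn {
    app_id: String,
    version: i64,
    last_updated: Option<i64>, // milliseconds since the Unix epoch
}

// Returns the latest txn version for `app_id`, or None if there is none or it
// has aged out of the retention window (`retention_ms`, if the property is set).
fn app_id_version(
    txns_newest_first: &[Txn],
    app_id: &str,
    now_ms: i64,
    retention_ms: Option<i64>,
) -> Option<i64> {
    // Log replay: the newest txn action for an application wins.
    let latest = txns_newest_first.iter().find(|t| t.app_id == app_id)?;
    match (retention_ms, latest.last_updated) {
        // Older than the retention cutoff: treat as expired.
        (Some(r), Some(ts)) if ts < now_ms - r => None,
        // No retention configured, or no lastUpdated (assumed never to expire).
        _ => Some(latest.version),
    }
}
```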

pub fn get_domain_metadata(&self, domain: &str, engine: &dyn Engine) -> DeltaResult<Option<String>>

Fetch the domainMetadata for a specific domain in this snapshot. This returns the latest configuration for the domain, or None if the domain does not exist.

Note that this method performs log replay (fetches and processes metadata from storage).
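The "latest configuration wins" rule can be sketched as a newest-first scan over replayed actions. DomainAction and latest_domain_config are hypothetical. The model treats a removed domain as absent, which is how this page describes get_domain_metadata_internal.

```rust
// Hypothetical model of a domainMetadata action seen during log replay.
struct DomainAction {
    domain: String,
    configuration: String,
    removed: bool,
}

// Scans actions newest-first: the first action for `domain` is the latest one.
// A removed (tombstoned) domain is reported as None, like a missing one.
fn latest_domain_config(actions_newest_first: &[DomainAction], domain: &str) -> Option<String> {
    actions_newest_first
        .iter()
        .find(|a| a.domain == domain)
        .filter(|a| !a.removed)
        .map(|a| a.configuration.clone())
}
```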

pub fn get_logical_clustering_columns(&self, engine: &dyn Engine) -> DeltaResult<Option<Vec<ColumnName>>>

Available on crate feature internal-api only.

Get the logical clustering columns for this snapshot, if clustering is enabled.

Returns Ok(Some(columns)) if the ClusteredTable feature is enabled and clustering columns are defined, Ok(None) if clustering is not enabled, or an error if the clustering metadata is malformed.

The columns are returned as logical ColumnNames. When column mapping is enabled, this converts the physical names stored in domain metadata back to logical names using the table schema.

Note that this method performs log replay (fetches and processes metadata from storage).

§Errors

Returns an error if the clustering domain metadata is malformed, or if a physical column name cannot be resolved to a logical name in the schema.
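The physical-to-logical conversion amounts to a lookup that fails on unknown names. In this sketch, physical_to_logical is a hypothetical map standing in for the column-mapping metadata in the table schema. The sketch only handles top-level columns, whereas real ColumnNames can be nested paths. With column mapping disabled, physical and logical names coincide, which corresponds to an identity map.

```rust
use std::collections::HashMap;

// Converts physical clustering column names (as stored in domain metadata)
// back to logical names, failing if any name is not found in the mapping.
fn to_logical_columns(
    physical: &[&str],
    physical_to_logical: &HashMap<&str, &str>,
) -> Result<Vec<String>, String> {
    physical
        .iter()
        .map(|p| {
            physical_to_logical
                .get(p)
                .map(|l| l.to_string())
                .ok_or_else(|| format!("physical column {p} not found in schema"))
        })
        // Collecting into Result stops at the first unresolved name.
        .collect()
}
```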

pub fn get_physical_clustering_columns(&self, engine: &dyn Engine) -> DeltaResult<Option<Vec<ColumnName>>>

Available on crate feature internal-api only.

Get the physical clustering columns for this snapshot, if clustering is enabled.

Returns Ok(Some(columns)) if the ClusteredTable feature is enabled and clustering columns are defined, Ok(None) if clustering is not enabled, or an error if the clustering metadata is malformed.

The columns are returned as physical column names, respecting the column mapping mode. Note that this method performs log replay (fetches and processes metadata from storage).

pub fn get_domain_metadatas_internal(&self, engine: &dyn Engine, domains: Option<&HashSet<&str>>) -> DeltaResult<HashMap<String, DomainMetadata>>

Available on crate feature internal-api only.

Load domain metadata from the cheapest available source. If the CRC's domain metadata is Complete, answer from the cache. If it is Partial and every requested domain is present in it, also answer from the cache. Otherwise, fall back to full log replay. Passing domains = None loads all domains.

Reports metrics: DomainMetadataLoaded.
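The source-selection rule above can be modeled as a small decision function. DomainCache, Source, and choose_source are hypothetical names used for illustration; the real CRC types differ.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical model of the CRC's domain-metadata cache.
enum DomainCache {
    Complete(HashMap<String, String>), // every domain is cached
    Partial(HashMap<String, String>),  // only some domains are cached
    Absent,                            // no CRC, or no domain metadata in it
}

#[derive(Debug, PartialEq)]
enum Source {
    Cache,
    LogReplay,
}

// Chooses where to answer a request for `domains` (None = all domains).
fn choose_source(cache: &DomainCache, domains: Option<&HashSet<&str>>) -> Source {
    match (cache, domains) {
        // A complete cache can answer any request, including "load all".
        (DomainCache::Complete(_), _) => Source::Cache,
        // A partial cache helps only if every requested domain is in it.
        (DomainCache::Partial(map), Some(wanted))
            if wanted.iter().all(|d| map.contains_key(*d)) =>
        {
            Source::Cache
        }
        // "Load all" from a partial cache, or no cache at all.
        _ => Source::LogReplay,
    }
}
```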

pub fn get_domain_metadata_internal(&self, domain: &str, engine: &dyn Engine) -> DeltaResult<Option<String>>

Available on crate feature internal-api only.

Fetch both user-controlled and system-controlled domain metadata for a specific domain in this snapshot.

Returns the latest configuration for the domain, or None if the domain does not exist (or was removed). Unlike Snapshot::get_domain_metadata, this does not reject delta.* domains.

pub fn get_all_domain_metadata(&self, engine: &dyn Engine) -> DeltaResult<Vec<DomainMetadata>>

Available on crate feature internal-api only.

Fetch all non-internal domain metadata for this snapshot as a Vec.

Internal (delta.*) domains are filtered out.
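Two behaviors reduce to a prefix check: the delta.* filter here, and the rejection performed by get_domain_metadata (modeled here as an error). The functions below are a hypothetical model. They assume system-controlled domains are exactly those whose names start with "delta.", which is the prefix the Delta protocol reserves.

```rust
// Hypothetical model: system-controlled domains are named "delta.*".
fn is_system_domain(domain: &str) -> bool {
    domain.starts_with("delta.")
}

// Like get_all_domain_metadata: keep only user-controlled domains.
fn non_internal_domains<'a>(domains: &[&'a str]) -> Vec<&'a str> {
    domains.iter().copied().filter(|d| !is_system_domain(d)).collect()
}

// Like get_domain_metadata: refuse to read a system-controlled domain.
fn check_user_domain(domain: &str) -> Result<&str, String> {
    if is_system_domain(domain) {
        Err(format!("{domain} is a system-controlled domain"))
    } else {
        Ok(domain)
    }
}
```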

pub fn get_file_stats_if_present(&self) -> Option<FileStats>

Returns file-level statistics, or None if this snapshot has no CRC or its CRC does not have Complete file stats. Performs no I/O, because the CRC is resolved at construction.

pub fn get_in_commit_timestamp(&self, engine: &dyn Engine) -> DeltaResult<Option<i64>>

Available on crate feature internal-api only.

Get the In-Commit Timestamp (ICT) for this snapshot.

Returns the inCommitTimestamp from the CommitInfo action of the commit that created this snapshot.

§Returns
  • Ok(Some(timestamp)) - ICT is enabled and available for this version
  • Ok(None) - ICT is not enabled
  • Err(...) - ICT is enabled but cannot be read, or enablement version is invalid

pub fn get_timestamp(&self, engine: &dyn Engine) -> DeltaResult<i64>

Get the timestamp for this snapshot’s version, in milliseconds since the Unix epoch.

When In-Commit Timestamps (ICT) are enabled, returns the In-Commit Timestamp value. Otherwise, falls back to the filesystem last-modified time of the latest commit file.

Returns an error if the commit file is missing, the ICT configuration is invalid, or the ICT value cannot be read.

See also get_in_commit_timestamp for ICT-only semantics.
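The fallback order can be modeled as follows. snapshot_timestamp is hypothetical. Its ict argument stands in for the result of get_in_commit_timestamp, and commit_file_mtime_ms stands in for the latest commit file's last-modified time (None meaning the file is missing).

```rust
// Hypothetical model of the timestamp resolution order.
fn snapshot_timestamp(
    ict: Result<Option<i64>, String>,
    commit_file_mtime_ms: Option<i64>,
) -> Result<i64, String> {
    // An ICT error (invalid config, unreadable value) is propagated as-is.
    match ict? {
        // ICT enabled: the In-Commit Timestamp is the answer.
        Some(ts) => Ok(ts),
        // ICT not enabled: use the commit file's last-modified time.
        None => commit_file_mtime_ms.ok_or_else(|| "commit file is missing".to_string()),
    }
}
```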

pub fn scan_builder(self: Arc<Self>) -> ScanBuilder

Create a ScanBuilder for a SnapshotRef.

pub fn incremental_scan_builder(self: Arc<Self>, base_version: Version) -> IncrementalScanBuilder

Create an IncrementalScanBuilder for the range (base_version, self.version()].

Use this to advance a cached file listing from base_version to this snapshot’s version without doing a full scan. See IncrementalScanBuilder for details.
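The half-open range (base_version, self.version()] excludes the base version, which the cached listing already reflects, and includes the snapshot's own version. A minimal model of which commit versions that covers, assuming Version is a u64:

```rust
use std::ops::RangeInclusive;

// Commit versions covered by an incremental scan from `base_version` to
// `snapshot_version`: base is excluded, snapshot version is included.
fn incremental_commit_versions(base_version: u64, snapshot_version: u64) -> RangeInclusive<u64> {
    (base_version + 1)..=snapshot_version
}
```

When base_version equals the snapshot version, the range is empty: there is nothing new to apply.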

pub fn transaction(self: Arc<Self>, committer: Box<dyn Committer>, engine: &dyn Engine) -> DeltaResult<Transaction>

Create a Transaction for this SnapshotRef that uses the specified Committer.

Note: For tables with clustering enabled, this performs log replay to read clustering columns from domain metadata, which may have a performance cost.

pub fn alter_table(self: Arc<Self>) -> AlterTableTransactionBuilder

Creates a builder for altering this table’s metadata. Currently supports schema change operations.

The returned builder allows chaining operations before building an AlterTableTransaction that can be committed.

pub fn create_checkpoint_writer(self: Arc<Self>) -> DeltaResult<CheckpointWriter>

Creates a CheckpointWriter for generating a checkpoint from this snapshot.

See the crate::checkpoint module documentation for more details on checkpoint types and the overall checkpoint process.

pub fn log_compaction_writer(self: Arc<Self>, start_version: Version, end_version: Version) -> DeltaResult<LogCompactionWriter>

Creates a LogCompactionWriter for generating a log compaction file.

Log compaction aggregates commit files in a version range into a single compacted file, improving performance by reducing the number of files to process during log replay.

§Parameters
  • start_version: The first version to include in the compaction (inclusive)
  • end_version: The last version to include in the compaction (inclusive)
§Returns

A LogCompactionWriter that can be used to generate the compaction file.

NOTE: This method is currently a no-op because log compaction is disabled (#2337)
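Because both bounds are inclusive, compacting versions 4 through 6 covers three commits. The sketch below builds the file name the Delta protocol specifies for log compaction files: <x>.<y>.compacted.json, with both versions zero-padded to 20 digits. compaction_file_name, its start <= end check, and commits_covered are hypothetical and not part of delta_kernel's API.

```rust
// Hypothetical helper: validates an inclusive [start_version, end_version]
// range and builds the protocol's compaction file name for it.
fn compaction_file_name(start_version: u64, end_version: u64) -> Result<String, String> {
    if start_version > end_version {
        return Err(format!("start_version {start_version} > end_version {end_version}"));
    }
    Ok(format!("{start_version:020}.{end_version:020}.compacted.json"))
}

// Both bounds are inclusive, so the range covers end - start + 1 commits.
fn commits_covered(start_version: u64, end_version: u64) -> u64 {
    end_version - start_version + 1
}
```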

pub fn write_checksum(self: &SnapshotRef, engine: &dyn Engine) -> DeltaResult<(ChecksumWriteResult, SnapshotRef)>

Writes a version checksum (CRC) file for this snapshot. Writers should call this after every commit because checksums enable faster snapshot loading and table state validation.

Currently only supports writing from a post-commit snapshot that has pre-computed CRC information in memory (i.e. the snapshot returned by CommittedTransaction::post_commit_snapshot).

Returns a tuple of ChecksumWriteResult and a SnapshotRef. On ChecksumWriteResult::Written, the returned snapshot has the CRC file recorded in its log segment. On ChecksumWriteResult::AlreadyExists, the original snapshot is returned unchanged.

§Errors
  • Error::ChecksumWriteUnsupported if no in-memory CRC is available at this snapshot’s version (e.g. a snapshot loaded from disk that has no CRC file), if the CRC’s file_stats_state is Indeterminate (a non-incremental operation like ANALYZE STATS was encountered, or a file action had a missing size; recoverable with a full state reconstruction in the future), or if delta.enableInCommitTimestamps is true but inCommitTimestampOpt is absent.
  • I/O errors from the engine’s storage handler if the write fails.
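The documented outcomes can be modeled with an in-memory set of log file names. Everything in this sketch is hypothetical (CrcModel, FileStatsState, write_checksum_model). The precondition checks mirror only the errors listed above, and checking them before the existence check is an assumption. The <version>.crc file name, zero-padded to 20 digits, follows the Delta protocol's naming for version checksum files.

```rust
use std::collections::HashSet;

// Hypothetical subset of the CRC's file-stats tracking states.
#[derive(Clone, Copy, Debug, PartialEq)]
enum FileStatsState {
    Complete,
    Indeterminate,
}

// Hypothetical in-memory CRC carried by a post-commit snapshot.
struct CrcModel {
    file_stats_state: FileStatsState,
    in_commit_timestamp: Option<i64>,
}

#[derive(Debug, PartialEq)]
enum WriteOutcome {
    Written,
    AlreadyExists,
}

// `log_files` stands in for the _delta_log directory contents.
fn write_checksum_model(
    log_files: &mut HashSet<String>,
    version: u64,
    crc: Option<&CrcModel>,
    ict_enabled: bool,
) -> Result<WriteOutcome, String> {
    let crc = crc.ok_or("no in-memory CRC at this version")?;
    if crc.file_stats_state == FileStatsState::Indeterminate {
        return Err("file stats are Indeterminate".to_string());
    }
    if ict_enabled && crc.in_commit_timestamp.is_none() {
        return Err("ICT is enabled but inCommitTimestamp is absent".to_string());
    }
    // HashSet::insert returns false if the file name is already present.
    if log_files.insert(format!("{version:020}.crc")) {
        Ok(WriteOutcome::Written)
    } else {
        Ok(WriteOutcome::AlreadyExists)
    }
}
```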

pub fn checkpoint(self: &SnapshotRef, engine: &dyn Engine, spec: Option<&CheckpointSpec>) -> DeltaResult<(CheckpointWriteResult, SnapshotRef)>

Performs a complete checkpoint of this snapshot using the provided engine.

If a checkpoint already exists at this version, returns CheckpointWriteResult::AlreadyExists with the original snapshot unchanged. Otherwise, writes a checkpoint parquet file and the _last_checkpoint file and returns CheckpointWriteResult::Written with an updated SnapshotRef whose log segment reflects the new checkpoint. Commits and compaction files subsumed by the checkpoint are dropped from the returned snapshot.

§Parameters
  • engine: Engine for data processing and I/O
  • spec: Checkpoint format specification. None uses the default checkpoint settings (auto-detecting V1/V2 from table features). For V2 checkpoints, the default is to not write sidecar files.
§Errors
  • If CheckpointSpec::V2 is used but the table does not support the v2Checkpoint feature.
  • If CheckpointSpec::V1 is used but the table supports the v2Checkpoint feature. Note: the Delta protocol permits writing V1 checkpoints to such tables; this is a kernel limitation.
  • If file_actions_per_sidecar_hint is Some(0).
  • If the checkpoint write fails (e.g. I/O, parquet write). A FileAlreadyExists error is not propagated; it returns CheckpointWriteResult::AlreadyExists instead. Note: this also fires on the (unlikely) case of a sidecar UUID filename collision, where it should ideally surface as an error. Tracked in https://github.com/delta-io/delta-kernel-rs/issues/2503.
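The spec-related error cases above can be summarized as a validation function. SpecModel and validate_spec are hypothetical. In particular, placing file_actions_per_sidecar_hint on the V2 variant is an assumption about where that setting lives.

```rust
// Hypothetical mirror of the checkpoint format choice.
enum SpecModel {
    V1,
    V2 { file_actions_per_sidecar_hint: Option<usize> },
}

// Returns the format that would be written, or the documented error case.
fn validate_spec(spec: Option<&SpecModel>, table_supports_v2: bool) -> Result<&'static str, String> {
    match spec {
        // No spec: auto-detect the format from table features.
        None => Ok(if table_supports_v2 { "V2" } else { "V1" }),
        // Kernel limitation: V1 is refused on v2Checkpoint tables.
        Some(SpecModel::V1) if table_supports_v2 => {
            Err("V1 requested but the table supports v2Checkpoint".into())
        }
        Some(SpecModel::V1) => Ok("V1"),
        Some(SpecModel::V2 { .. }) if !table_supports_v2 => {
            Err("the table does not support v2Checkpoint".into())
        }
        Some(SpecModel::V2 { file_actions_per_sidecar_hint: Some(0) }) => {
            Err("file_actions_per_sidecar_hint must not be Some(0)".into())
        }
        Some(SpecModel::V2 { .. }) => Ok("V2"),
    }
}
```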

Note:
  • It is still possible that an existing checkpoint gets overwritten if that checkpoint was written by a concurrent writer.
  • This function uses crate::ParquetHandler::write_parquet_file and crate::StorageHandler::head, which may not be implemented by all engines. If you are using the default engine, make sure to build it with the multi-threaded executor if you want to use this method.

Note: There is currently no public api for callers to determine whether a table supports V2 checkpoints directly. Tracked in https://github.com/delta-io/delta-kernel-rs/issues/2450.

pub fn publish(self: &SnapshotRef, engine: &dyn Engine, committer: &dyn Committer) -> DeltaResult<SnapshotRef>

Publishes all catalog commits at this table version. Applicable only to catalog-managed tables. This method is a no-op for filesystem-managed tables or if there are no catalog commits to publish.

Publishing copies ratified catalog commits to the Delta log as published Delta files, reducing catalog storage requirements and enabling some table maintenance operations, like checkpointing.

§Parameters
  • engine: The engine to use for publishing commits
  • committer: The Committer used to publish the catalog commits
§Errors

Returns an error if the publish operation fails, or if there are catalog commits that need publishing but the table or the committer does not support publishing.
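The decision logic described above can be sketched as follows. PublishOutcome and publish_model are hypothetical. For brevity, supports_publishing collapses the table's and the committer's capabilities into one flag.

```rust
#[derive(Debug, PartialEq)]
enum PublishOutcome {
    NoOp,
    Published(Vec<u64>),
}

// Hypothetical model of publish: returns the catalog commit versions that
// would be copied into the Delta log as published Delta files.
fn publish_model(
    catalog_managed: bool,
    unpublished_catalog_commits: &[u64],
    supports_publishing: bool,
) -> Result<PublishOutcome, String> {
    // Filesystem-managed table, or nothing to publish: no-op.
    if !catalog_managed || unpublished_catalog_commits.is_empty() {
        return Ok(PublishOutcome::NoOp);
    }
    // There is work to do, so publishing must be supported.
    if !supports_publishing {
        return Err("catalog commits need publishing, but publishing is unsupported".to_string());
    }
    Ok(PublishOutcome::Published(unpublished_catalog_commits.to_vec()))
}
```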

§See Also

Trait Implementations§

impl Debug for Snapshot

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Drop for Snapshot

fn drop(&mut self)

Executes the destructor for this type.

fn pin_drop(self: Pin<&mut Self>)

🔬This is a nightly-only experimental API. (pin_ergonomics)
Executes the destructor for this type, but, unlike Drop::drop, it requires self to be pinned.

impl Eq for Snapshot

impl PartialEq for Snapshot

fn eq(&self, other: &Self) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 (const: unstable)

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> AsAny for T
where T: Any + Send + Sync,

fn any_ref(&self) -> &(dyn Any + Sync + Send + 'static)

Obtains a dyn Any reference to the object.

fn as_any(self: Arc<T>) -> Arc<dyn Any + Sync + Send>

Obtains an Arc<dyn Any> reference to the object.

fn into_any(self: Box<T>) -> Box<dyn Any + Sync + Send>

Converts the object to Box<dyn Any>.

fn type_name(&self) -> &'static str

Convenient wrapper for std::any::type_name, since Any does not provide it and Any::type_id is useless as a debugging aid (its Debug is just a mess of hex digits).

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> DynPartialEq for T
where T: PartialEq + AsAny,

fn dyn_eq(&self, other: &(dyn Any + 'static)) -> bool

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

Checks if this value is equivalent to the given key.

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

Compare self to key and return true if they are equal.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Sized + Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Sized + Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T> Same for T

type Output = T

Should always be Self

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<KernelType, ArrowType> TryIntoArrow<ArrowType> for KernelType
where ArrowType: TryFromKernel<KernelType>,

fn try_into_arrow(self) -> Result<ArrowType, ArrowError>

Available on crate feature arrow-conversion and (crate features arrow-conversion or declarative-plans or default-engine-native-tls or default-engine-rustls) only.

impl<KernelType, ArrowType> TryIntoKernel<KernelType> for ArrowType
where KernelType: TryFromArrow<ArrowType>,

fn try_into_kernel(self) -> Result<KernelType, ArrowError>

Available on crate feature arrow-conversion and (crate features arrow-conversion or declarative-plans or default-engine-native-tls or default-engine-rustls) only.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.