pub struct Snapshot { /* private fields */ }
In-memory representation of a specific snapshot of a Delta table. While a DeltaTable exists
throughout time, Snapshots represent a view of a table at a specific point in time; they
have a defined schema (which may change over time for any given table), specific version, and
frozen log segment.
Implementations§
impl Snapshot
pub fn builder_for(table_root: impl AsRef<str>) -> SnapshotBuilder
Create a new SnapshotBuilder to build a new Snapshot for a given table root. If you
instead have an existing Snapshot you would like to do minimal work to update, consider
using Snapshot::builder_from instead.
pub fn builder_from(existing_snapshot: SnapshotRef) -> SnapshotBuilder
Create a new SnapshotBuilder to incrementally update an existing Snapshot to a
more recent version.
See Snapshot::try_new_from for the case-by-case behavior.
pub fn new(
    log_segment: LogSegment,
    table_configuration: TableConfiguration,
) -> DeltaResult<Self>
Available on crate feature internal-api only.
Create a new Snapshot from a LogSegment and TableConfiguration.
pub fn log_segment(&self) -> &LogSegment
Available on crate feature internal-api only.
The log segment this snapshot uses.
pub fn crc(&self) -> Option<&Arc<Crc>>
Available on crate feature internal-api only.
Returns the CRC for this snapshot, if one is resolved.
When this returns Some(crc), crc.version == self.version(), and queries backed by the CRC
are served from the cache with no I/O.
pub fn table_root(&self) -> &Url
pub fn estimated_owned_heap_size_bytes(&self) -> usize
Estimated owned heap size in bytes for this snapshot. Best-effort estimate for capacity tracking, not authoritative.
Counts only the dominant per-snapshot heap contributors, normally > 70% of the snapshot’s owned heap size:
- For every listed log path (commit, compaction, checkpoint, latest CRC, latest commit): the filename / extension / Url string heap.
- Vec buffer capacity (capacity * size_of::<ParsedLogPath>()) for the three Vec fields on LogSegmentFiles.
- The log root Url string.
- The raw schemaString JSON on table metadata.
The Arc-shared variables (e.g. logical/physical schemas, crc) are not counted,
as they can be shared between multiple snapshots and are not owned by a single snapshot.
Other variables’ contributions to heap size are relatively small, so they are not counted here.
Runs in O(n) over listed log files.
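As a rough illustration of the accounting described above, the sketch below adds a Vec's buffer capacity to the string heap of each listed path. LogPathModel and estimated_heap_bytes are hypothetical names invented for this example; they are not kernel types, and the real method covers more fields than this model does.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a parsed log path; NOT the kernel's ParsedLogPath.
struct LogPathModel {
    url: String,
    filename: String,
    extension: String,
}

impl LogPathModel {
    // Heap bytes owned by the three strings (allocated capacity, not length).
    fn string_heap_bytes(&self) -> usize {
        self.url.capacity() + self.filename.capacity() + self.extension.capacity()
    }
}

// Vec buffer capacity plus the string heap of every listed path.
// Takes &Vec (not a slice) because the estimate needs the allocated capacity.
// Runs in O(n) over the listed paths.
fn estimated_heap_bytes(paths: &Vec<LogPathModel>) -> usize {
    let buffer = paths.capacity() * size_of::<LogPathModel>();
    let strings: usize = paths.iter().map(LogPathModel::string_heap_bytes).sum();
    buffer + strings
}
```

The estimate uses capacity rather than length because unused but allocated slots still occupy heap.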
pub fn table_properties(&self) -> &TableProperties
Get the TableProperties for this Snapshot.
pub fn get_protocol_derived_properties(&self) -> HashMap<String, String>
Available on crate feature internal-api only.
Returns the protocol-derived table properties as a map of key-value pairs.
This includes:
- delta.minReaderVersion and delta.minWriterVersion
- delta.feature.<name> = "supported" for each reader and writer feature (when using the table features protocol, i.e. reader version 3 / writer version 7)
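The key layout listed above can be sketched with a plain HashMap. The function name and its parameters are hypothetical and exist only for illustration; the sketch also assumes that reader features apply only at reader version 3 and writer features only at writer version 7.

```rust
use std::collections::HashMap;

// Hypothetical helper illustrating the key layout; not the kernel's implementation.
fn protocol_derived_properties(
    min_reader_version: i32,
    min_writer_version: i32,
    reader_features: &[&str],
    writer_features: &[&str],
) -> HashMap<String, String> {
    let mut props = HashMap::new();
    props.insert("delta.minReaderVersion".to_string(), min_reader_version.to_string());
    props.insert("delta.minWriterVersion".to_string(), min_writer_version.to_string());

    // Assumption: feature lists only apply under the table features protocol.
    let reader: &[&str] = if min_reader_version == 3 { reader_features } else { &[] };
    let writer: &[&str] = if min_writer_version == 7 { writer_features } else { &[] };
    for name in reader.iter().chain(writer.iter()) {
        // A feature listed as both reader and writer feature yields a single key.
        props.insert(format!("delta.feature.{name}"), "supported".to_string());
    }
    props
}
```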
pub fn metadata_configuration(&self) -> &HashMap<String, String>
Available on crate feature internal-api only.
Get the raw metadata configuration for this table.
This returns the Metadata.configuration map as stored in the Delta log, containing
user-defined properties, delta table properties (e.g., delta.enableInCommitTimestamps),
and application-specific properties (e.g., io.unitycatalog.tableId).
pub fn table_configuration(&self) -> &TableConfiguration
Available on crate feature internal-api only.
Get the TableConfiguration for this Snapshot.
pub fn get_app_id_version(
    &self,
    application_id: &str,
    engine: &dyn Engine,
) -> DeltaResult<Option<i64>>
Fetch the latest version recorded for the provided application_id in this snapshot. Expired
txn actions are filtered out based on the delta.setTransactionRetentionDuration property and each action’s lastUpdated value.
Uses the CRC fast path when available, otherwise falls back to log replay.
Reports metrics: SetTransactionLoaded.
pub fn get_domain_metadata(
    &self,
    domain: &str,
    engine: &dyn Engine,
) -> DeltaResult<Option<String>>
Fetch the domainMetadata for a specific domain in this snapshot. This returns the latest configuration for the domain, or None if the domain does not exist.
Note that this method performs log replay (fetches and processes metadata from storage).
pub fn get_logical_clustering_columns(
    &self,
    engine: &dyn Engine,
) -> DeltaResult<Option<Vec<ColumnName>>>
Available on crate feature internal-api only.
Get the logical clustering columns for this snapshot, if clustering is enabled.
Returns Ok(Some(columns)) if the ClusteredTable feature is enabled and clustering
columns are defined, Ok(None) if clustering is not enabled, or an error if the
clustering metadata is malformed.
The columns are returned as logical ColumnNames. When column mapping is enabled,
this converts the physical names stored in domain metadata back to logical names using
the table schema.
Note that this method performs log replay (fetches and processes metadata from storage).
§Errors
Returns an error if the clustering domain metadata is malformed, or if a physical column name cannot be resolved to a logical name in the schema.
pub fn get_physical_clustering_columns(
    &self,
    engine: &dyn Engine,
) -> DeltaResult<Option<Vec<ColumnName>>>
Available on crate feature internal-api only.
Get the physical clustering columns for this snapshot, if the table has clustering enabled.
Returns Ok(Some(columns)) if the ClusteredTable feature is enabled and clustering
columns are defined, Ok(None) if clustering is not enabled, or an error if the
clustering metadata is malformed.
The columns are returned as physical column names, respecting the column mapping mode. Note that this method performs log replay (fetches and processes metadata from storage).
pub fn get_domain_metadatas_internal(
    &self,
    engine: &dyn Engine,
    domains: Option<&HashSet<&str>>,
) -> DeltaResult<HashMap<String, DomainMetadata>>
Available on crate feature internal-api only.
Load domain metadata. If the CRC’s domain metadata is Complete, the answer comes from the
cache. Otherwise, if every requested domain is present in a Partial cache, the answer also
comes from the cache. Otherwise this falls back to full log replay. domains == None means load all domains.
Reports metrics: DomainMetadataLoaded.
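The cache-versus-replay decision described above can be modeled as a small match. DomainCache, Source, and choose_source are hypothetical names for this sketch only. Treating a Partial cache with domains == None as requiring log replay is an assumption of the model, not something the documentation states.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical model of the CRC's domain-metadata cache state.
enum DomainCache {
    Complete(HashMap<String, String>),
    Partial(HashMap<String, String>),
    Missing,
}

#[derive(Debug, PartialEq)]
enum Source {
    Cache,
    LogReplay,
}

// Decide where a request would be answered from under the rules described above.
fn choose_source(cache: &DomainCache, domains: Option<&HashSet<&str>>) -> Source {
    match (cache, domains) {
        // A complete cache can answer any request, including "load all".
        (DomainCache::Complete(_), _) => Source::Cache,
        // A partial cache is enough only if it holds every requested domain.
        (DomainCache::Partial(known), Some(wanted))
            if wanted.iter().all(|d| known.contains_key(*d)) =>
        {
            Source::Cache
        }
        // Everything else (including Partial with None) needs log replay.
        _ => Source::LogReplay,
    }
}
```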
pub fn get_domain_metadata_internal(
    &self,
    domain: &str,
    engine: &dyn Engine,
) -> DeltaResult<Option<String>>
Available on crate feature internal-api only.
Fetch both user-controlled and system-controlled domain metadata for a specific domain in this snapshot.
Returns the latest configuration for the domain, or None if the domain does not exist
(or was removed). Unlike Snapshot::get_domain_metadata, this does not reject delta.*
domains.
pub fn get_all_domain_metadata(
    &self,
    engine: &dyn Engine,
) -> DeltaResult<Vec<DomainMetadata>>
Available on crate feature internal-api only.
Fetch all non-internal domain metadata for this snapshot as a Vec.
Internal (delta.*) domains are filtered out.
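A minimal sketch of that filtering rule, assuming "internal" means a domain name starting with the delta. prefix. user_visible_domains is a hypothetical helper that works on names only, whereas the real method returns DomainMetadata values.

```rust
// Hypothetical filter over domain names; not the kernel's implementation.
fn user_visible_domains<'a>(domains: &[&'a str]) -> Vec<&'a str> {
    domains
        .iter()
        .copied()
        // Assumption: internal domains are exactly those prefixed with "delta.".
        .filter(|name| !name.starts_with("delta."))
        .collect()
}
```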
pub fn get_file_stats_if_present(&self) -> Option<FileStats>
Returns file-level statistics, or None if this snapshot has no CRC, or its CRC does
not have Complete file stats. Performs no I/O (the CRC is resolved at construction).
pub fn get_in_commit_timestamp(
    &self,
    engine: &dyn Engine,
) -> DeltaResult<Option<i64>>
Available on crate feature internal-api only.
Get the In-Commit Timestamp (ICT) for this snapshot.
Returns the inCommitTimestamp from the CommitInfo action of the commit that created this
snapshot.
§Returns
- Ok(Some(timestamp)): ICT is enabled and available for this version
- Ok(None): ICT is not enabled
- Err(...): ICT is enabled but cannot be read, or the enablement version is invalid
pub fn get_timestamp(&self, engine: &dyn Engine) -> DeltaResult<i64>
Get the timestamp for this snapshot’s version, in milliseconds since the Unix epoch.
When In-Commit Timestamps (ICT) are enabled, returns the In-Commit Timestamp value. Otherwise, falls back to the filesystem last-modified time of the latest commit file.
Returns an error if the commit file is missing, the ICT configuration is invalid, or the ICT value cannot be read.
See also get_in_commit_timestamp for ICT-only semantics.
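The fallback order described above can be sketched as a pure function. The function name, its parameters, and the String error type are hypothetical stand-ins for values the kernel reads from the log and from storage.

```rust
// Hypothetical model of the timestamp selection; inputs stand in for values
// the kernel would read from the commit log and the filesystem.
fn snapshot_timestamp_ms(
    ict_enabled: bool,
    in_commit_timestamp: Option<i64>,
    commit_file_mtime_ms: Option<i64>,
) -> Result<i64, String> {
    if ict_enabled {
        // With ICT enabled, the in-commit value is the only acceptable source.
        in_commit_timestamp.ok_or_else(|| "ICT is enabled but its value cannot be read".to_string())
    } else {
        // Otherwise fall back to the last-modified time of the latest commit file.
        commit_file_mtime_ms.ok_or_else(|| "commit file is missing".to_string())
    }
}
```

Note that in this model an enabled ICT never falls back to the file time; a missing ICT value is an error rather than a silent downgrade.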
pub fn scan_builder(self: Arc<Self>) -> ScanBuilder
Create a ScanBuilder for a SnapshotRef.
pub fn incremental_scan_builder(
    self: Arc<Self>,
    base_version: Version,
) -> IncrementalScanBuilder
Create an IncrementalScanBuilder for the range (base_version, self.version()].
Use this to advance a cached file listing from base_version to this snapshot’s
version without doing a full scan. See IncrementalScanBuilder for details.
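To make the half-open range (base_version, self.version()] concrete, the sketch below lists the versions such a range covers. versions_to_apply is a hypothetical helper, and the Version alias is declared locally as an assumption for the example.

```rust
use std::ops::RangeInclusive;

// Assumption for this sketch: versions are unsigned 64-bit integers.
type Version = u64;

// Versions covered by (base_version, snapshot_version]: the base is excluded
// because the cached listing already reflects it; the snapshot version is included.
fn versions_to_apply(base_version: Version, snapshot_version: Version) -> RangeInclusive<Version> {
    (base_version + 1)..=snapshot_version
}
```

For example, a base of 3 and a snapshot at version 6 covers versions 4, 5, and 6, and a base equal to the snapshot version yields an empty range.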
pub fn transaction(
    self: Arc<Self>,
    committer: Box<dyn Committer>,
    engine: &dyn Engine,
) -> DeltaResult<Transaction>
Create a Transaction for this SnapshotRef with the specified Committer.
Note: For tables with clustering enabled, this performs log replay to read clustering columns from domain metadata, which may have a performance cost.
pub fn alter_table(self: Arc<Self>) -> AlterTableTransactionBuilder
Creates a builder for altering this table’s metadata. Currently supports schema change operations.
The returned builder allows chaining operations before building an
AlterTableTransaction that can be committed.
pub fn create_checkpoint_writer(
    self: Arc<Self>,
) -> DeltaResult<CheckpointWriter>
Creates a CheckpointWriter for generating a checkpoint from this snapshot.
See the crate::checkpoint module documentation for more details on checkpoint types
and the overall checkpoint process.
pub fn log_compaction_writer(
    self: Arc<Self>,
    start_version: Version,
    end_version: Version,
) -> DeltaResult<LogCompactionWriter>
Creates a LogCompactionWriter for generating a log compaction file.
Log compaction aggregates commit files in a version range into a single compacted file, improving performance by reducing the number of files to process during log replay.
§Parameters
- start_version: The first version to include in the compaction (inclusive)
- end_version: The last version to include in the compaction (inclusive)
§Returns
A LogCompactionWriter that can be used to generate the compaction file.
NOTE: This method is currently a no-op because log compaction is disabled (#2337)
pub fn write_checksum(
    self: &SnapshotRef,
    engine: &dyn Engine,
) -> DeltaResult<(ChecksumWriteResult, SnapshotRef)>
Writes a version checksum (CRC) file for this snapshot. Writers should call this after every commit because checksums enable faster snapshot loading and table state validation.
Currently only supports writing from a post-commit snapshot that has pre-computed CRC
information in memory (i.e. the snapshot returned by
CommittedTransaction::post_commit_snapshot).
Returns a tuple of ChecksumWriteResult and a SnapshotRef. On
ChecksumWriteResult::Written, the returned snapshot has the CRC file recorded in
its log segment. On ChecksumWriteResult::AlreadyExists, the original snapshot is
returned unchanged.
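The return contract above can be modeled with std types only. WriteResult, SnapshotModel, and snapshot_after_write are hypothetical stand-ins for ChecksumWriteResult, SnapshotRef, and the kernel's internal logic; the has_crc_file flag is an invented simplification of "the CRC file recorded in its log segment".

```rust
use std::sync::Arc;

// Hypothetical stand-in for ChecksumWriteResult.
#[derive(Debug, PartialEq)]
enum WriteResult {
    Written,
    AlreadyExists,
}

// Hypothetical stand-in for a snapshot; the real type holds a log segment.
struct SnapshotModel {
    version: u64,
    has_crc_file: bool,
}

// Which snapshot the caller gets back for each outcome.
fn snapshot_after_write(result: &WriteResult, original: &Arc<SnapshotModel>) -> Arc<SnapshotModel> {
    match result {
        // A new snapshot at the same version that now records the CRC file.
        WriteResult::Written => Arc::new(SnapshotModel {
            version: original.version,
            has_crc_file: true,
        }),
        // The very same snapshot, shared rather than rebuilt.
        WriteResult::AlreadyExists => Arc::clone(original),
    }
}
```

Callers that keep a snapshot around for later reads would, under this model, want to hold on to the returned reference rather than the original.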
§Errors
- Error::ChecksumWriteUnsupported if no in-memory CRC is available at this snapshot’s version (e.g. a snapshot loaded from disk that has no CRC file), if the CRC’s file_stats_state is Indeterminate (a non-incremental operation like ANALYZE STATS was encountered, or a file action had a missing size; recoverable with a full state reconstruction in the future), or if delta.enableInCommitTimestamps is true but inCommitTimestampOpt is absent.
- I/O errors from the engine’s storage handler if the write fails.
pub fn checkpoint(
    self: &SnapshotRef,
    engine: &dyn Engine,
    spec: Option<&CheckpointSpec>,
) -> DeltaResult<(CheckpointWriteResult, SnapshotRef)>
Performs a complete checkpoint of this snapshot using the provided engine.
If a checkpoint already exists at this version, returns
CheckpointWriteResult::AlreadyExists with the original snapshot unchanged.
Otherwise, writes a checkpoint parquet file and the _last_checkpoint file and returns
CheckpointWriteResult::Written with an updated SnapshotRef whose log segment
reflects the new checkpoint. Commits and compaction files subsumed by the checkpoint are
dropped from the returned snapshot.
§Parameters
- engine: Engine for data processing and I/O
- spec: Checkpoint format specification. None uses the default checkpoint settings (auto-detecting V1/V2 from table features). For V2 checkpoints, the default is to not write sidecar files.
§Errors
- If CheckpointSpec::V2 is used but the table does not support the v2Checkpoint feature.
- If CheckpointSpec::V1 is used but the table supports the v2Checkpoint feature. Note: the Delta protocol permits writing V1 checkpoints to such tables; this is a kernel limitation.
- If file_actions_per_sidecar_hint is Some(0).
- If the checkpoint write fails (e.g. I/O, parquet write). A FileAlreadyExists error is not propagated; it returns CheckpointWriteResult::AlreadyExists instead. Note: this also fires on the (unlikely) case of a sidecar UUID filename collision, where it should ideally surface as an error. Tracked in https://github.com/delta-io/delta-kernel-rs/issues/2503.
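The spec-related error conditions above can be sketched as a validation function. SpecModel and validate_spec are hypothetical names, and placing file_actions_per_sidecar_hint on the V2 variant is an assumption of this model rather than a statement about CheckpointSpec's actual shape.

```rust
// Hypothetical model of a checkpoint spec; field placement is an assumption.
enum SpecModel {
    V1,
    V2 { file_actions_per_sidecar_hint: Option<usize> },
}

// Returns Err for the spec-related error conditions listed above.
fn validate_spec(spec: Option<&SpecModel>, table_supports_v2: bool) -> Result<(), String> {
    match spec {
        // None: the format is chosen from table features, so nothing to reject.
        None => Ok(()),
        Some(SpecModel::V1) if table_supports_v2 => {
            Err("V1 spec on a table that supports v2Checkpoint".to_string())
        }
        Some(SpecModel::V2 { .. }) if !table_supports_v2 => {
            Err("V2 spec on a table without v2Checkpoint".to_string())
        }
        Some(SpecModel::V2 { file_actions_per_sidecar_hint: Some(0) }) => {
            Err("file_actions_per_sidecar_hint must be non-zero".to_string())
        }
        Some(_) => Ok(()),
    }
}
```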
Note:
- It is still possible that an existing checkpoint gets overwritten if that checkpoint
was written by a concurrent writer.
- This function uses crate::ParquetHandler::write_parquet_file and
crate::StorageHandler::head, which may not be implemented by all engines. If you
are using the default engine, make sure to build it with the multi-threaded executor
if you want to use this method.
Note: There is currently no public api for callers to determine whether a table supports V2 checkpoints directly. Tracked in https://github.com/delta-io/delta-kernel-rs/issues/2450.
pub fn publish(
    self: &SnapshotRef,
    engine: &dyn Engine,
    committer: &dyn Committer,
) -> DeltaResult<SnapshotRef>
Publishes all catalog commits at this table version. Applicable only to catalog-managed tables. This method is a no-op for filesystem-managed tables or if there are no catalog commits to publish.
Publishing copies ratified catalog commits to the Delta log as published Delta files, reducing catalog storage requirements and enabling some table maintenance operations, like checkpointing.
§Parameters
engine: The engine to use for publishing commits
§Errors
Returns an error if the publish operation fails, or if there are catalog commits that need publishing but the table or committer don’t support publishing.
Trait Implementations§
impl Eq for Snapshot
Auto Trait Implementations§
impl !RefUnwindSafe for Snapshot
impl !UnwindSafe for Snapshot
impl Freeze for Snapshot
impl Send for Snapshot
impl Sync for Snapshot
impl Unpin for Snapshot
impl UnsafeUnpin for Snapshot
Blanket Implementations§
impl<T> AsAny for T
fn any_ref(&self) -> &(dyn Any + Sync + Send + 'static)
A dyn Any reference to the object.
fn as_any(self: Arc<T>) -> Arc<dyn Any + Sync + Send>
An Arc<dyn Any> reference to the object.
fn into_any(self: Box<T>) -> Box<dyn Any + Sync + Send>
The object as a Box<dyn Any>.
fn type_name(&self) -> &'static str
A wrapper for std::any::type_name, since Any does not provide it and
Any::type_id is useless as a debugging aid (its Debug is just a mess of hex digits).
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> DynPartialEq for T
impl<Q, K> Equivalent<K> for Q
impl<Q, K> Equivalent<K> for Q
fn equivalent(&self, key: &K) -> bool
Compare self to key and return true if they are equal.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
impl<T> PolicyExt for T
where
    T: ?Sized,
impl<KernelType, ArrowType> TryIntoArrow<ArrowType> for KernelType
where
    ArrowType: TryFromKernel<KernelType>,
fn try_into_arrow(self) -> Result<ArrowType, ArrowError>
Available on crate feature arrow-conversion and (crate features arrow-conversion or declarative-plans or default-engine-native-tls or default-engine-rustls) only.
impl<KernelType, ArrowType> TryIntoKernel<KernelType> for ArrowType
where
    KernelType: TryFromArrow<ArrowType>,
fn try_into_kernel(self) -> Result<KernelType, ArrowError>
Available on crate feature arrow-conversion and (crate features arrow-conversion or declarative-plans or default-engine-native-tls or default-engine-rustls) only.