//! Internal synthetic file identifier utilities.
//!
//! `file_id` is an internal correlation mechanism used by the DataFusion integration to associate
//! rows with their source file (e.g. per-file transforms, deletion vectors, and matched-file DML
//! planning). It is intentionally centralized in this module so that every producer and consumer
//! of the column agrees on its name and Arrow type, avoiding schema/type drift.
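//!
//! As a rough, std-only sketch of that correlation, consider grouping rows by their source file,
//! which is what lets per-file logic run on each group. All names here are hypothetical, and a
//! plain `Vec<String>` stands in for the real Arrow `file_id` column:
//!
//! ```rust
//! use std::collections::BTreeMap;
//!
//! // Hypothetical stand-in for a scanned batch: `file_ids[i]` is the source file of row `i`,
//! // playing the role of the synthetic `file_id` column.
//! struct TaggedBatch {
//!     file_ids: Vec<String>,
//! }
//!
//! // Groups row indices by source file (BTreeMap keeps the group order deterministic).
//! fn rows_by_file(batch: &TaggedBatch) -> BTreeMap<&str, Vec<usize>> {
//!     let mut groups: BTreeMap<&str, Vec<usize>> = BTreeMap::new();
//!     for (row, file) in batch.file_ids.iter().enumerate() {
//!         groups.entry(file.as_str()).or_default().push(row);
//!     }
//!     groups
//! }
//! ```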
//!
//! TODO(delta-io/delta-rs#4115): When ParquetAccessPlans can carry per-file transforms and DV
//! filtering directly into the Parquet scan (and DV semantics become order-insensitive), this
//! synthetic column should become unnecessary and can be removed.
use Arc;
use ;
use ScalarValue;
use ;
/// Default column name for the synthetic file identifier.
///
/// This column is used internally to correlate rows back to their source file, so that we can
/// apply per-file transforms (e.g. column mapping, deletion vectors) and support DML rewrite scans.
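///
/// A minimal sketch of why a per-row file identifier matters for deletion vectors. It assumes each
/// row also carries its position inside its source file. This is a simplification: per the TODO in
/// the module docs, the real DV handling is currently order-sensitive. `ScannedRow`,
/// `DeletionVectors`, and `apply_deletion_vectors` are hypothetical names, not delta-rs APIs:
///
/// ```rust
/// use std::collections::{HashMap, HashSet};
///
/// // Hypothetical row shape: the source file (what `file_id` would carry) plus the row's
/// // position inside that file.
/// struct ScannedRow {
///     file_id: String,
///     position: u64,
/// }
///
/// // Hypothetical deletion vectors: file path -> positions marked as deleted in that file.
/// type DeletionVectors = HashMap<String, HashSet<u64>>;
///
/// // Keeps a row unless its position is deleted *in its own file*; without the file id,
/// // position 0 of one file would be indistinguishable from position 0 of another.
/// fn apply_deletion_vectors(rows: Vec<ScannedRow>, dvs: &DeletionVectors) -> Vec<ScannedRow> {
///     rows.into_iter()
///         .filter(|row| {
///             dvs.get(&row.file_id)
///                 .map_or(true, |deleted| !deleted.contains(&row.position))
///         })
///         .collect()
/// }
/// ```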
pub const FILE_ID_COLUMN_DEFAULT: &str = "__delta_rs_file_id__";
/// Canonical Arrow type for the synthetic file-id column.
///
/// We keep this aligned with DataFusion's recommended dictionary encoding for partition values
/// (`wrap_partition_type_in_dict`) so that both partition materialization and literal construction
/// (`wrap_partition_value_in_dict`) agree on the dictionary key type (currently `UInt16`).
///
/// Note: we intentionally use `Utf8` (not `Utf8View`) because Arrow dictionary packing does not
/// support view types.
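///
/// To illustrate why dictionary encoding suits this column and what the `UInt16` key type implies,
/// here is a std-only sketch. Every row from the same file repeats the same path, so the encoding
/// stores each path once, and a 16-bit key can address at most 65,536 distinct paths. `DictColumn`
/// is a hypothetical model, not Arrow's actual array type:
///
/// ```rust
/// use std::collections::HashMap;
///
/// // Hypothetical model of a dictionary-encoded string column: each distinct string is stored
/// // once in `values`, and every row stores a `u16` key into it (like a `UInt16` key array).
/// #[derive(Debug, Default)]
/// struct DictColumn {
///     keys: Vec<u16>,
///     values: Vec<String>,
///     lookup: HashMap<String, u16>,
/// }
///
/// impl DictColumn {
///     // Appends one row, reusing the existing key when the string was seen before. Fails once
///     // a new distinct value would need key 65536, which a `u16` cannot represent.
///     fn push(&mut self, value: &str) -> Result<(), String> {
///         let key = match self.lookup.get(value).copied() {
///             Some(k) => k,
///             None => {
///                 if self.values.len() > u16::MAX as usize {
///                     return Err(format!(
///                         "a u16 key can address at most {} distinct values",
///                         u32::from(u16::MAX) + 1
///                     ));
///                 }
///                 let k = self.values.len() as u16;
///                 self.values.push(value.to_string());
///                 self.lookup.insert(value.to_string(), k);
///                 k
///             }
///         };
///         self.keys.push(key);
///         Ok(())
///     }
///
///     // Decodes the string stored for `row`.
///     fn get(&self, row: usize) -> &str {
///         &self.values[self.keys[row] as usize]
///     }
/// }
/// ```
///
/// In arrow-rs, the real counterpart is a `DictionaryArray<UInt16Type>` with `Utf8` values.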
pub
/// Construct the canonical `file_id` field.
pub
/// Wrap a file path in the canonical dictionary encoding used for `file_id` values.
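///
/// A std-only sketch of the type/value agreement described for the canonical data type. It assumes
/// that `wrap_partition_type_in_dict` and `wrap_partition_value_in_dict` both wrap their argument
/// with a `UInt16` key. `ModelType`, `ModelScalar`, and `file_id_value` are hypothetical mirrors of
/// arrow's `DataType`, DataFusion's `ScalarValue`, and the helper below, not the real API:
///
/// ```rust
/// // Hypothetical mirror of the relevant `DataType` variants.
/// #[derive(Debug, Clone, PartialEq)]
/// enum ModelType {
///     UInt16,
///     Utf8,
///     Dictionary(Box<ModelType>, Box<ModelType>),
/// }
///
/// // Hypothetical mirror of the relevant `ScalarValue` variants.
/// #[derive(Debug, Clone, PartialEq)]
/// enum ModelScalar {
///     Utf8(Option<String>),
///     Dictionary(Box<ModelType>, Box<ModelScalar>),
/// }
///
/// // Assumed behavior of `wrap_partition_type_in_dict`: wrap with a `UInt16` key.
/// fn wrap_type_in_dict(value_type: ModelType) -> ModelType {
///     ModelType::Dictionary(Box::new(ModelType::UInt16), Box::new(value_type))
/// }
///
/// // Assumed behavior of `wrap_partition_value_in_dict`: the same `UInt16` key.
/// fn wrap_value_in_dict(value: ModelScalar) -> ModelScalar {
///     ModelScalar::Dictionary(Box::new(ModelType::UInt16), Box::new(value))
/// }
///
/// // The data type a scalar literal would be materialized as.
/// fn data_type_of(value: &ModelScalar) -> ModelType {
///     match value {
///         ModelScalar::Utf8(_) => ModelType::Utf8,
///         ModelScalar::Dictionary(key, inner) => {
///             ModelType::Dictionary(key.clone(), Box::new(data_type_of(inner)))
///         }
///     }
/// }
///
/// // Hypothetical helper: wrap a file path the way a `file_id` literal would be wrapped.
/// fn file_id_value(path: &str) -> ModelScalar {
///     wrap_value_in_dict(ModelScalar::Utf8(Some(path.to_string())))
/// }
/// ```
///
/// If the two helpers ever disagreed on the key type, a `file_id` literal would no longer match
/// the column's type, which is the drift this module is meant to prevent.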
pub