Struct MigrationJob 

pub struct MigrationJob {
    pub table: TableIdent,
    pub old_column: String,
    pub new_column: String,
    pub text_column: String,
    pub embed_fn: EmbedFn,
    pub strategy: MigrationStrategy,
    pub batch_size: usize,
    pub new_model: Option<EmbeddingModelInfo>,
    pub on_progress: Option<ProgressFn>,
}
Migrates embedding columns in an AI-Lake table to a new model.

Usage:

let job = MigrationJob {
    table: TableIdent::new("default", "docs"),
    old_column: "embedding".to_string(),
    new_column: "embedding_v2".to_string(),
    text_column: "chunk_text".to_string(),
    embed_fn: Arc::new(|texts| Ok(my_model.encode(texts))),
    strategy: MigrationStrategy::DualWriteThenCutover,
    batch_size: 10_000,
    new_model: Some(EmbeddingModelInfo::new("my-model-v2")),
    on_progress: None,
};
job.run(catalog, store).await?;

Fields§

§table: TableIdent

§old_column: String

Name of the embedding column to replace (e.g., “embedding”).

§new_column: String

Name of the new embedding column (e.g., “embedding_v2”). May be the same as old_column to perform an in-place model upgrade.

§text_column: String

Column in the Parquet files that holds the text to re-embed. Defaults to chunk_text (the LlmContextSchema canonical name).

§embed_fn: EmbedFn

Callable that converts a slice of texts to embeddings. Must return exactly texts.len() vectors, all of the same dimension.
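The contract above (one vector per input text, all of one dimension) can be checked before anything is written. The following is a minimal sketch, not part of this crate: `check_embeddings` is a hypothetical helper, and it assumes embeddings are represented as `Vec<f32>`, which this page does not confirm.

```rust
/// Hypothetical helper (not part of the crate): verifies that an embedding
/// batch has one vector per input text and a single shared dimension.
/// Returns the shared dimension on success.
fn check_embeddings(texts_len: usize, vectors: &[Vec<f32>]) -> Result<usize, String> {
    if vectors.len() != texts_len {
        return Err(format!(
            "expected {} vectors, got {}",
            texts_len,
            vectors.len()
        ));
    }
    // An empty batch has no dimension to compare; report it as 0.
    let dim = match vectors.first() {
        Some(v) => v.len(),
        None => return Ok(0),
    };
    // Find the first vector whose length differs from the first one.
    if let Some((i, v)) = vectors.iter().enumerate().find(|(_, v)| v.len() != dim) {
        return Err(format!(
            "vector {} has dimension {}, expected {}",
            i,
            v.len(),
            dim
        ));
    }
    Ok(dim)
}
```

Wrapping an embedding closure with a check like this turns a silently malformed batch into an early error.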

§strategy: MigrationStrategy

§batch_size: usize

Number of rows passed to each embed_fn call. Tune it to the batch size your model handles efficiently.
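To illustrate how a batch size bounds each embedding call, here is a minimal sketch of chunked embedding. It is an assumption about the general technique, not the crate's implementation: `embed_in_batches` is a hypothetical function, and the closure signature stands in for `EmbedFn`, whose real definition is not shown on this page.

```rust
/// Hypothetical sketch (not the crate's code): embeds `texts` in chunks of at
/// most `batch_size`, returning the vectors in input order.
fn embed_in_batches<F>(
    texts: &[String],
    batch_size: usize,
    embed: F,
) -> Result<Vec<Vec<f32>>, String>
where
    F: Fn(&[String]) -> Result<Vec<Vec<f32>>, String>,
{
    // `chunks(0)` would panic, so reject a zero batch size up front.
    if batch_size == 0 {
        return Err("batch_size must be greater than zero".to_string());
    }
    let mut out = Vec::with_capacity(texts.len());
    for chunk in texts.chunks(batch_size) {
        let vectors = embed(chunk)?;
        // Enforce the one-vector-per-text contract for every batch.
        if vectors.len() != chunk.len() {
            return Err(format!(
                "embed returned {} vectors for {} texts",
                vectors.len(),
                chunk.len()
            ));
        }
        out.extend(vectors);
    }
    Ok(out)
}
```

With five texts and a batch size of 2, the closure is called three times, and the last call receives a single text. A larger batch size means fewer calls but more memory per call.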

§new_model: Option<EmbeddingModelInfo>

Metadata for the new embedding model, stored in Iceberg properties after migration.

§on_progress: Option<ProgressFn>

Optional callback invoked after each file completes.
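A per-file callback is commonly used to update a shared counter that another task can poll. The sketch below is illustrative only: the real `ProgressFn` signature is not shown on this page, so `SketchProgressFn` and its `(files_done, files_total)` arguments are assumptions, and `counting_progress` is a hypothetical helper.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for `ProgressFn`; the crate's actual signature may differ.
type SketchProgressFn = Arc<dyn Fn(usize, usize) + Send + Sync>;

/// Builds a callback that records the number of completed files in a shared
/// atomic counter, so another thread or task can read progress.
fn counting_progress(done: Arc<AtomicUsize>) -> SketchProgressFn {
    Arc::new(move |files_done: usize, _files_total: usize| {
        done.store(files_done, Ordering::Relaxed);
    })
}
```

Keeping the callback cheap and non-blocking avoids slowing the migration loop that invokes it.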

Implementations§

impl MigrationJob

pub async fn run(self, catalog: Arc<dyn CatalogProvider>, store: Arc<dyn Store>) -> AilakeResult<()>

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> Read<Exclusive, BecauseExclusive> for T
where T: ?Sized,

impl<T> Same for T

type Output = T

Should always be Self

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.