Struct DuckDbEngine 

pub struct DuckDbEngine { /* private fields */ }
DuckDB-backed OLAP engine with a single write connection and a concurrent read pool.

§Connection pool

Connection | Count         | Protected by   | Used for
write_conn | 1             | Mutex          | DDL + DML (INSERT/UPDATE/DELETE/CREATE/ALTER)
read_pool  | N (default 4) | per-slot Mutex | Concurrent SELECT via round-robin

All read-pool connections are obtained with Connection::try_clone() from the initial write connection; they share the same underlying DuckDB database and benefit from DuckDB’s MVCC for isolated concurrent reads.
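
The round-robin selection over per-slot mutexes could look roughly like the sketch below. It is illustrative only: `ReadPool`, `next_index`, and `next_slot` are hypothetical names, the generic `C` stands in for `duckdb::Connection`, and only the `read_idx` atomic is taken from the Safety note on this page.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the read pool; `C` would be the connection type.
pub struct ReadPool<C> {
    /// One mutex per slot, so readers on different slots never block each other.
    slots: Vec<Arc<Mutex<C>>>,
    /// Monotonic counter used to pick the next slot.
    read_idx: AtomicUsize,
}

impl<C> ReadPool<C> {
    /// Wrap each connection in its own `Arc<Mutex<_>>` slot.
    /// Panics on an empty vector (an assumption of this sketch).
    pub fn new(conns: Vec<C>) -> Self {
        assert!(!conns.is_empty(), "pool needs at least one connection");
        Self {
            slots: conns.into_iter().map(|c| Arc::new(Mutex::new(c))).collect(),
            read_idx: AtomicUsize::new(0),
        }
    }

    /// Index of the slot chosen for the next read: counter modulo pool size.
    pub fn next_index(&self) -> usize {
        self.read_idx.fetch_add(1, Ordering::Relaxed) % self.slots.len()
    }

    /// Clone the `Arc` of the next slot so the caller can lock it elsewhere.
    pub fn next_slot(&self) -> Arc<Mutex<C>> {
        Arc::clone(&self.slots[self.next_index()])
    }
}
```

With three slots, successive calls to `next_index` cycle through 0, 1, 2 and then wrap back to 0.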

§Dispatching

Every operation is dispatched to a tokio::task::spawn_blocking closure so that blocking DuckDB calls never stall the async executor. The relevant Arc<Mutex<…>> handle is cloned before the closure is spawned, so the closure owns everything it captures and satisfies the 'static bound.
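
The clone-then-move pattern can be sketched without any dependencies. This example swaps in `std::thread::spawn` for `tokio::task::spawn_blocking` (both require a `'static` closure), and `FakeConn` and `dispatch` are hypothetical stand-ins rather than the engine's actual code.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a blocking database connection.
pub struct FakeConn {
    pub queries_run: u32,
}

impl FakeConn {
    /// Pretend to run a blocking query and report what happened.
    pub fn run(&mut self, sql: &str) -> String {
        self.queries_run += 1;
        format!("ran `{}` (query #{})", sql, self.queries_run)
    }
}

/// Run `sql` on a worker thread and wait for the result.
pub fn dispatch(conn: &Arc<Mutex<FakeConn>>, sql: &str) -> String {
    // Clone the Arc and copy the SQL text so the closure owns all of its
    // captures; nothing borrowed from the caller crosses the thread boundary.
    let conn = Arc::clone(conn);
    let sql = sql.to_owned();
    let handle = thread::spawn(move || {
        // The mutex guard lives only inside the worker closure.
        let mut guard = conn.lock().expect("connection mutex poisoned");
        guard.run(&sql)
    });
    handle.join().expect("worker thread panicked")
}
```

In an async context the `thread::spawn(...).join()` pair would be replaced by awaiting the handle returned by `spawn_blocking`; the ownership handling stays the same.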

§Arrow ingestion

DuckDbEngine::load_arrow uses DuckDB’s native Appender API (Appender::append_record_batch) for zero-copy bulk loading. This preserves all Arrow types without SQL literal serialization.

§Thread-safety

duckdb::Connection is !Send. The unsafe impl Send + Sync on this type is sound because every connection is accessed only inside a Mutex guard, and the Mutex enforces the single-thread-at-a-time invariant that DuckDB’s C++ layer requires. See the crate-level safety note for the full rationale.

Implementations§

impl DuckDbEngine

pub fn in_memory() -> Result<Self, DuckDbError>

Create a pure in-memory DuckDB instance with the default read pool size of 4 connections.

In-memory databases are ephemeral: all data is lost when the engine is dropped. Useful for tests and short-lived analytical sessions.

pub fn in_memory_with_pool(read_pool_size: usize) -> Result<Self, DuckDbError>

Create a pure in-memory DuckDB instance with a custom read pool size.

read_pool_size is clamped to a minimum of 1 — a pool with zero readers would deadlock on the first query.
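
A minimal sketch of the clamping rule, assuming it amounts to taking the maximum of the requested size and 1. The names `DEFAULT_READ_POOL_SIZE` and `effective_pool_size` are hypothetical and chosen for illustration.

```rust
/// Default read pool size, as stated for `in_memory()` and `persistent()`.
pub const DEFAULT_READ_POOL_SIZE: usize = 4;

/// Clamp a requested pool size so the pool always has at least one reader.
pub fn effective_pool_size(requested: usize) -> usize {
    requested.max(1)
}
```

Under this rule a request for 0 readers yields a pool of 1, and any larger request is passed through unchanged.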

pub fn persistent(path: &str) -> Result<Self, DuckDbError>

Create a file-backed DuckDB instance at path with the default read pool size of 4 connections.

The file is created if it does not exist.

pub fn persistent_with_pool( path: &str, read_pool_size: usize, ) -> Result<Self, DuckDbError>

Create a file-backed DuckDB instance at path with a custom read pool size.

read_pool_size is clamped to a minimum of 1.

pub fn read_pool_size(&self) -> usize

Return the number of read connections in the pool.

Always at least 1 (pool size is clamped at construction time).

Trait Implementations§

impl OlapEngine for DuckDbEngine

async fn query(&self, sql: &str) -> Result<Vec<RecordBatch>, Self::Error>

Execute a read-only SQL statement and return results as Arrow RecordBatches.

Dispatches to a round-robin read-pool connection via tokio::task::spawn_blocking. Results are collected in full before returning (collect-then-stream fallback; use query_stream for large result sets once streaming is implemented).

async fn execute(&self, sql: &str) -> Result<u64, Self::Error>

Execute a write SQL statement (INSERT / UPDATE / DELETE / DDL) and return the number of rows affected.

Always dispatched to the dedicated write connection.

async fn load_arrow( &self, table: &str, batches: &[RecordBatch], ) -> Result<u64, Self::Error>

Bulk-ingest Arrow RecordBatches into table using DuckDB’s native Appender API.

§Zero-copy ingestion

This method calls Appender::append_record_batch directly. Because both the workspace (arrow 58) and DuckDB’s internal binding (arrow 57) share the same in-memory column layout, the conversion is a schema header reconstruction plus an Arc-clone of column arrays — no data is copied and no values are serialised to SQL literals.

§Supported Arrow types

All Arrow types that DuckDB’s Appender API accepts are preserved, including: Boolean, integer variants (Int8–Int64, UInt8–UInt64), Float32/Float64, Utf8/LargeUtf8, Binary/LargeBinary, Date32/Date64, Timestamp (all time units), Decimal128, and fixed-size / nested types.

§Return value

Returns the total number of rows ingested across all batches. Returns 0 immediately when batches is empty.

async fn create_table( &self, table_name: &str, schema: &SchemaRef, primary_key: &[String], ) -> Result<(), Self::Error>

Create a new table from an Arrow schema, optionally with a composite primary key.

Issues CREATE TABLE IF NOT EXISTS so calling this method on an already-existing table is a no-op. All table and column identifiers are validated against [A-Za-z0-9_] before the DDL is sent to DuckDB.
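
The identifier check described above might be implemented along these lines. The function name `is_valid_identifier` is hypothetical, and rejecting the empty string is an assumption of this sketch, since the documentation only names the allowed character class.

```rust
/// Return true if `name` is non-empty and consists solely of
/// ASCII letters, ASCII digits, and underscores.
pub fn is_valid_identifier(name: &str) -> bool {
    // Assumption: an empty identifier is rejected.
    !name.is_empty()
        && name
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'_')
}
```

Because quotes, semicolons, and whitespace all fall outside the allowed set, an identifier that passes this check can be interpolated into DDL without quoting concerns.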

Arrow types are mapped to DuckDB SQL types via the internal arrow_type_to_duckdb_sql helper; unknown types fall back to VARCHAR.
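
The shape of such a mapping, including the VARCHAR fallback, is sketched below. The real arrow_type_to_duckdb_sql helper is internal and its exact table is not shown on this page; `ArrowType` here is a hypothetical stand-in for arrow's `DataType`, and the individual pairings are assumptions based on standard DuckDB type names.

```rust
/// Hypothetical, reduced stand-in for arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
pub enum ArrowType {
    Boolean,
    Int32,
    Int64,
    Float64,
    Utf8,
    Date32,
    /// Any type this sketch does not know about.
    Other(String),
}

/// Map an Arrow type to a DuckDB SQL type name (illustrative pairings).
pub fn to_duckdb_sql(ty: &ArrowType) -> &'static str {
    match ty {
        ArrowType::Boolean => "BOOLEAN",
        ArrowType::Int32 => "INTEGER",
        ArrowType::Int64 => "BIGINT",
        ArrowType::Float64 => "DOUBLE",
        ArrowType::Utf8 => "VARCHAR",
        ArrowType::Date32 => "DATE",
        // Unknown types fall back to VARCHAR, as the documentation states.
        ArrowType::Other(_) => "VARCHAR",
    }
}
```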

async fn table_exists(&self, table_name: &str) -> Result<bool, Self::Error>

Check whether a table with the given name exists in the DuckDB catalog.

Queries information_schema.tables via a read-pool connection.

async fn add_column( &self, table_name: &str, column_name: &str, data_type: &DataType, ) -> Result<(), Self::Error>

Add a new nullable column to an existing table.

Issues ALTER TABLE … ADD COLUMN. The column type is derived from data_type using the same Arrow-to-DuckDB type mapping used by create_table. Both table_name and column_name must match [A-Za-z0-9_].

fn supports_transactions(&self) -> bool

Returns true — DuckDB supports full ACID transactions.

The sync engine uses this to decide whether to wrap a CDC cycle in a BEGIN … COMMIT block. For DuckDB this is always safe; DataFusion treats transactions as no-ops.
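
How a caller might act on this flag is sketched below. The function `plan_cycle` is hypothetical and not part of the sync engine; it only shows the decision to bracket a cycle's statements with BEGIN and COMMIT when the engine reports transaction support.

```rust
/// Build the ordered list of statements for one cycle, wrapping them in a
/// transaction only when the engine supports it.
pub fn plan_cycle(supports_transactions: bool, statements: &[&str]) -> Vec<String> {
    let mut plan = Vec::with_capacity(statements.len() + 2);
    if supports_transactions {
        plan.push("BEGIN".to_owned());
    }
    plan.extend(statements.iter().map(|s| (*s).to_owned()));
    if supports_transactions {
        plan.push("COMMIT".to_owned());
    }
    plan
}
```

A real caller would also need to issue ROLLBACK when a statement inside the block fails; that error path is omitted here.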

async fn drop_column( &self, table_name: &str, column_name: &str, ) -> Result<(), Self::Error>

Drop a column from an existing table.

Issues ALTER TABLE … DROP COLUMN. Both identifiers are validated before use.

type Error = DuckDbError

Engine-specific error type returned by all fallible methods.

fn query_stream( &self, sql: &str, ) -> impl Future<Output = Result<Pin<Box<dyn Stream<Item = Result<RecordBatch, Box<dyn Error + Send + Sync>>> + Send>>, Self::Error>> + Send

Execute a SQL query and return a lazy RecordBatchBoxStream.

impl Send for DuckDbEngine

impl Sync for DuckDbEngine

§Safety

See the unsafe impl Send block above. Shared references (&DuckDbEngine) only expose the read_idx atomic (inherently Sync) and Arc<Mutex<…>> fields (both Sync). No unsynchronised access to duckdb::Connection is possible through &DuckDbEngine.
