pub struct DuckDbEngine { /* private fields */ }
DuckDB-backed OLAP engine with a single write connection and a concurrent read pool.
§Connection pool
| Connection | Count | Protected by | Used for |
|---|---|---|---|
| write_conn | 1 | Mutex | DDL + DML (INSERT/UPDATE/DELETE/CREATE/ALTER) |
| read_pool | N (default 4) | per-slot Mutex | Concurrent SELECT via round-robin |
All read-pool connections are obtained with Connection::try_clone() from
the initial write connection; they share the same underlying DuckDB database
and benefit from DuckDB’s MVCC for isolated concurrent reads.
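The round-robin selection over the read pool can be sketched with an atomic counter (a minimal sketch: the `read_idx` name is taken from the safety note below, the rest of the shape is illustrative):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Minimal sketch of a round-robin read pool. `read_idx` is the only
/// state a shared reference needs to mutate, so a relaxed atomic suffices.
/// `slots` is never empty because the pool size is clamped to >= 1.
struct ReadPool<C> {
    slots: Vec<Arc<Mutex<C>>>,
    read_idx: AtomicUsize,
}

impl<C> ReadPool<C> {
    fn next_slot(&self) -> Arc<Mutex<C>> {
        // fetch_add wraps on overflow; the modulo keeps the index in range.
        let i = self.read_idx.fetch_add(1, Ordering::Relaxed) % self.slots.len();
        Arc::clone(&self.slots[i])
    }
}
```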
§Dispatching
Every operation is dispatched to a tokio::task::spawn_blocking closure so
that blocking DuckDB calls never stall the async executor. The Arc<Mutex>
wrapper is cloned before the closure is spawned, keeping all lifetimes
'static.
§Arrow ingestion
DuckDbEngine::load_arrow uses DuckDB’s native Appender API
(Appender::append_record_batch) for zero-copy bulk loading. This preserves
all Arrow types without SQL literal serialization.
§Thread-safety
duckdb::Connection is !Send. The unsafe impl Send + Sync on this type
is sound because every connection is accessed only inside a Mutex
guard, and the Mutex enforces the single-thread-at-a-time invariant that
DuckDB’s C++ layer requires. See the crate-level safety note for
the full rationale.
Implementations§
impl DuckDbEngine
pub fn in_memory() -> Result<Self, DuckDbError>
Create a pure in-memory DuckDB instance with the default read pool size of 4 connections.
In-memory databases are ephemeral: all data is lost when the engine is dropped. Useful for tests and short-lived analytical sessions.
pub fn in_memory_with_pool(read_pool_size: usize) -> Result<Self, DuckDbError>
Create a pure in-memory DuckDB instance with a custom read pool size.
read_pool_size is clamped to a minimum of 1 — a pool with zero readers
would deadlock on the first query.
pub fn persistent(path: &str) -> Result<Self, DuckDbError>
Create a file-backed DuckDB instance at path with the default read
pool size of 4 connections.
The file is created if it does not exist.
pub fn persistent_with_pool(
    path: &str,
    read_pool_size: usize,
) -> Result<Self, DuckDbError>
Create a file-backed DuckDB instance at path with a custom read pool
size.
read_pool_size is clamped to a minimum of 1.
pub fn read_pool_size(&self) -> usize
Return the number of read connections in the pool.
Always at least 1 (pool size is clamped at construction time).
Trait Implementations§
impl OlapEngine for DuckDbEngine
async fn query(&self, sql: &str) -> Result<Vec<RecordBatch>, Self::Error>
Execute a read-only SQL statement and return results as Arrow
RecordBatches.
Dispatches to a round-robin read-pool connection via
tokio::task::spawn_blocking. Results are collected in full before
returning (collect-then-stream fallback; use query_stream for large
result sets once streaming is implemented).
async fn execute(&self, sql: &str) -> Result<u64, Self::Error>
Execute a write SQL statement (INSERT / UPDATE / DELETE / DDL) and return the number of rows affected.
Always dispatched to the dedicated write connection.
async fn load_arrow(
    &self,
    table: &str,
    batches: &[RecordBatch],
) -> Result<u64, Self::Error>
Bulk-ingest Arrow RecordBatches into table using DuckDB’s native
Appender API.
§Zero-copy ingestion
This method calls Appender::append_record_batch directly. Because both
the workspace (arrow 58) and DuckDB’s internal binding (arrow 57)
share the same in-memory column layout, the conversion is a schema
header reconstruction plus an Arc-clone of column arrays — no data is
copied and no values are serialised to SQL literals.
§Supported Arrow types
All Arrow types that DuckDB’s Appender API accepts are preserved,
including: Boolean, integer variants (Int8–Int64, UInt8–
UInt64), Float32/Float64, Utf8/LargeUtf8, Binary/
LargeBinary, Date32/Date64, Timestamp (all time units),
Decimal128, and fixed-size / nested types.
§Return value
Returns the total number of rows ingested across all batches. Returns
0 immediately when batches is empty.
async fn create_table(
    &self,
    table_name: &str,
    schema: &SchemaRef,
    primary_key: &[String],
) -> Result<(), Self::Error>
Create a new table from an Arrow schema, optionally with a composite primary key.
Issues CREATE TABLE IF NOT EXISTS so calling this method on an
already-existing table is a no-op. All table and column identifiers are
validated against [A-Za-z0-9_] before the DDL is sent to DuckDB.
Arrow types are mapped to DuckDB SQL types via the internal
arrow_type_to_duckdb_sql helper; unknown types fall back to VARCHAR.
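The shape of that mapping, with its VARCHAR fallback, can be sketched like this (a simplified stand-in enum rather than the real `arrow::datatypes::DataType`; the actual table in `arrow_type_to_duckdb_sql` may cover more variants):

```rust
/// Simplified stand-in for a handful of arrow DataType variants.
enum ArrowType {
    Boolean,
    Int32,
    Int64,
    Float64,
    Utf8,
    TimestampMicros,
    Other(&'static str),
}

fn arrow_type_to_sql(ty: &ArrowType) -> &'static str {
    match ty {
        ArrowType::Boolean => "BOOLEAN",
        ArrowType::Int32 => "INTEGER",
        ArrowType::Int64 => "BIGINT",
        ArrowType::Float64 => "DOUBLE",
        ArrowType::Utf8 => "VARCHAR",
        ArrowType::TimestampMicros => "TIMESTAMP",
        // Unknown types fall back to VARCHAR, matching the documented behaviour.
        ArrowType::Other(_) => "VARCHAR",
    }
}
```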
async fn table_exists(&self, table_name: &str) -> Result<bool, Self::Error>
Check whether a table with the given name exists in the DuckDB catalog.
Queries information_schema.tables via a read-pool connection.
async fn add_column(
    &self,
    table_name: &str,
    column_name: &str,
    data_type: &DataType,
) -> Result<(), Self::Error>
Add a new nullable column to an existing table.
Issues ALTER TABLE … ADD COLUMN. The column type is derived from
data_type using the same Arrow-to-DuckDB type mapping used by
create_table. Both table_name and column_name must match
[A-Za-z0-9_].
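The identifier check mentioned here (and in create_table) amounts to a character whitelist, since identifier positions in DDL cannot be parameterised; a sketch:

```rust
/// Accepts only [A-Za-z0-9_], rejecting empty names, so table and column
/// names can be spliced into DDL without risking SQL injection.
fn is_valid_identifier(name: &str) -> bool {
    !name.is_empty() && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```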
fn supports_transactions(&self) -> bool
Returns true — DuckDB supports full ACID transactions.
The sync engine uses this to decide whether to wrap a CDC cycle in a
BEGIN … COMMIT block. For DuckDB this is always safe; DataFusion
treats transactions as no-ops.
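That decision can be sketched against a minimal engine trait (the names here are illustrative, not the real OlapEngine surface):

```rust
/// Minimal stand-in for the engine surface the sync loop relies on.
trait Engine {
    fn supports_transactions(&self) -> bool;
    fn execute(&mut self, sql: &str);
}

/// Wrap the cycle's statements in BEGIN/COMMIT only when supported;
/// otherwise run them bare (the DataFusion-style no-op case).
fn run_cdc_cycle<E: Engine>(engine: &mut E, body: &[&str]) {
    let txn = engine.supports_transactions();
    if txn {
        engine.execute("BEGIN");
    }
    for sql in body {
        engine.execute(sql);
    }
    if txn {
        engine.execute("COMMIT");
    }
}
```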
async fn drop_column(
    &self,
    table_name: &str,
    column_name: &str,
) -> Result<(), Self::Error>
Drop a column from an existing table.
Issues ALTER TABLE … DROP COLUMN. Both identifiers are validated
before use.
type Error = DuckDbError
impl Send for DuckDbEngine
impl Sync for DuckDbEngine
§Safety
See the unsafe impl Send block above. Shared references (&DuckDbEngine)
only expose the read_idx atomic (inherently Sync) and Arc<Mutex<…>>
fields (both Sync). No unsynchronised access to duckdb::Connection is
possible through &DuckDbEngine.