
Executor

Struct Executor 

Source
pub struct Executor { /* private fields */ }

SQL Query Executor

The executor is the main entry point for executing SQL statements. It coordinates between the parser, storage engine, and function registry.

Implementations§

Source§

impl Executor

Source

pub fn try_storage_aggregation( &self, table: &dyn Table, stmt: &SelectStatement, all_columns: &[String], classification: &QueryClassification, ) -> Option<Box<dyn QueryResult>>

Try to use storage-level aggregation for GROUP BY queries.

This optimization bypasses row materialization by computing aggregates directly from arena storage using Arc::clone for group keys.

Returns None if the optimization cannot be applied.

Currently only applies to simple queries with:

  • GROUP BY columns that match SELECT identifiers exactly (same order)
  • Simple aggregates (COUNT, SUM, AVG, MIN, MAX) on column references
  • No WHERE, HAVING, ROLLUP, CUBE, or GROUPING SETS
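As a rough illustration of the idea, the sketch below groups a key column and a value column without ever building row objects. The column types (`Arc<str>` keys, `i64` values) and the names `GroupAcc` and `group_count_sum` are assumptions made for this example; the actual arena storage layout is not described here.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Running state for one group: enough for COUNT and SUM (AVG = sum / count).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct GroupAcc {
    pub count: u64,
    pub sum: i64,
}

/// Aggregate directly over two parallel columns.
/// Group keys are shared `Arc<str>` values, so taking a key for the map
/// bumps a reference count instead of copying the string bytes.
pub fn group_count_sum(keys: &[Arc<str>], values: &[i64]) -> HashMap<Arc<str>, GroupAcc> {
    let mut groups: HashMap<Arc<str>, GroupAcc> = HashMap::new();
    for (key, value) in keys.iter().zip(values) {
        let acc = groups.entry(Arc::clone(key)).or_default();
        acc.count += 1;
        acc.sum += *value;
    }
    groups
}
```

MIN and MAX would fit the same accumulator pattern with two extra fields.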
Source§

impl Executor

Source

pub fn try_extract_semi_join_info( exists: &ExistsExpression, is_negated: bool, outer_tables: &[String], ) -> Option<SemiJoinInfo>

Try to extract semi-join information from a correlated EXISTS subquery.

For semi-join optimization, we need:

  1. A simple table source (no joins in subquery)
  2. A WHERE clause with inner.col = outer.col equality
  3. Optional additional non-correlated predicates

Returns None if the subquery cannot be optimized as a semi-join.

Source

pub fn should_use_index_nested_loop_for_anti_join( &self, _info: &SemiJoinInfo, outer_limit: Option<i64>, ) -> bool

Check if index-nested-loop should be preferred over anti-join for NOT EXISTS.

For NOT EXISTS, anti-join using HashJoinOperator is almost always more efficient than both index-nested-loop and InHashSet because:

  1. HashJoinOperator does bulk hash table build/probe (cache-efficient)
  2. No per-row expression evaluation overhead
  3. Even with LIMIT, the bulk operation is faster than per-row checking

The only case where we might prefer index-nested-loop is for VERY small LIMIT (e.g., LIMIT 10) with a highly selective index, but benchmarks show hash join is still faster in most cases.

Source

pub fn execute_semi_join_optimization( &self, info: &SemiJoinInfo, ctx: &ExecutionContext, ) -> Result<CompactArc<ValueSet>>

Execute the semi-join optimization for an EXISTS subquery.

Instead of executing the subquery for each outer row, we:

  1. Execute the inner query once with non-correlated predicates
  2. Collect all distinct values of the inner correlation column
  3. Return them in a hash set (FxHashSet) for O(1) expected-time lookups

Results are cached to avoid re-execution for the same query within a single top-level query execution.
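The three steps above can be sketched with the standard library alone. This is a minimal model, not the executor's code: it uses `std::collections::HashSet` in place of FxHashSet, represents inner rows as hypothetical `(customer_id, amount)` pairs, and leaves out the result caching.

```rust
use std::collections::HashSet;

/// Steps 1 and 2: scan the inner side once, apply the non-correlated
/// predicate, and keep the distinct values of the correlation column.
pub fn build_inner_keys<F>(inner: &[(i64, i64)], keep: F) -> HashSet<i64>
where
    F: Fn(i64) -> bool,
{
    inner
        .iter()
        .filter(|(_, amount)| keep(*amount))
        .map(|(customer_id, _)| *customer_id)
        .collect()
}

/// Step 3 in use: EXISTS becomes one set-membership probe per outer row.
pub fn semi_join(outer_ids: &[i64], inner_keys: &HashSet<i64>) -> Vec<i64> {
    outer_ids
        .iter()
        .copied()
        .filter(|id| inner_keys.contains(id))
        .collect()
}
```

The inner table is read once regardless of how many outer rows are probed, which is where the saving over per-row subquery execution comes from.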

Source

pub fn execute_anti_join( &self, info: &SemiJoinInfo, outer_rows: CompactArc<Vec<Row>>, outer_columns: &[String], _ctx: &ExecutionContext, ) -> Result<RowVec>

Execute NOT EXISTS as a true anti-join using HashJoinOperator.

This is more efficient than the InHashSet approach because:

  1. HashJoinOperator builds hash table once and probes in bulk
  2. No per-row expression evaluation overhead
  3. Better cache efficiency due to batch processing
  4. Direct table access without going through full query pipeline
§Arguments
  • info - SemiJoinInfo extracted from the NOT EXISTS subquery
  • outer_rows - Pre-materialized outer table rows
  • outer_columns - Column names for outer table
  • _ctx - Execution context (not used but kept for API consistency)
§Returns

Rows from outer table that have NO match in inner table (anti-join result)
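A hash anti-join can be sketched in a few lines. The version below is an illustration only: it does not use HashJoinOperator, it assumes a single `i64` join key, and it ignores SQL NULL semantics. The function name `hash_anti_join` is hypothetical.

```rust
use std::collections::HashSet;

/// Return the outer rows whose join key has no match on the inner side.
/// `outer` holds (key, payload) pairs; `inner_keys` is the inner join column.
pub fn hash_anti_join(outer: Vec<(i64, String)>, inner_keys: &[i64]) -> Vec<(i64, String)> {
    // Build phase: one pass over the inner side.
    let built: HashSet<i64> = inner_keys.iter().copied().collect();
    // Probe phase: one lookup per outer row, keeping only the misses.
    outer
        .into_iter()
        .filter(|(key, _)| !built.contains(key))
        .collect()
}
```

Flipping the `!` in the probe phase turns the same build/probe structure into a semi-join.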

Source

pub fn try_extract_not_exists_info( expr: &Expression, outer_tables: &[String], ) -> Option<SemiJoinInfo>

Try to extract SemiJoinInfo from a NOT EXISTS expression. Returns None if the expression is not a valid NOT EXISTS pattern.

Source

pub fn transform_exists_to_in_list( info: &SemiJoinInfo, hash_set: CompactArc<ValueSet>, ) -> Expression

Transform a WHERE clause with EXISTS into one using a pre-computed hash set.

Replaces EXISTS (SELECT …) with outer_col IN (hash_set_values).

Source

pub fn try_optimize_exists_to_semi_join( &self, expr: &Expression, ctx: &ExecutionContext, outer_tables: &[String], outer_limit: Option<i64>, ) -> Result<Option<Expression>>

Try to optimize correlated EXISTS subqueries to semi-join. Returns Some(optimized_expression) if successful, None if not applicable.

Note: This function now checks if index-nested-loop would be more efficient and skips the semi-join transformation in that case, allowing per-row index probing.

The outer_limit parameter helps decide between strategies:

  • With small LIMIT + index: prefer index-nested-loop (per-row probing with early termination)
  • Without LIMIT: prefer semi-join (scan inner once, hash lookup per outer row)
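The decision described above might look roughly like the following. Everything here is an assumption made for illustration: the enum, the `inner_has_index` flag, and especially the `SMALL_LIMIT` cutoff, whose real value (if one exists) is not documented on this page.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ExistsStrategy {
    /// Probe the inner index per outer row and stop early once LIMIT is met.
    IndexNestedLoop,
    /// Scan the inner side once, then do a hash lookup per outer row.
    SemiJoin,
}

/// Hypothetical cutoff for what counts as a "small" LIMIT.
pub const SMALL_LIMIT: i64 = 100;

pub fn choose_exists_strategy(outer_limit: Option<i64>, inner_has_index: bool) -> ExistsStrategy {
    match outer_limit {
        Some(limit) if inner_has_index && limit <= SMALL_LIMIT => ExistsStrategy::IndexNestedLoop,
        // No LIMIT, a large LIMIT, or no usable index: scan once and hash.
        _ => ExistsStrategy::SemiJoin,
    }
}
```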
Source

pub fn try_optimize_in_to_semi_join( &self, expr: &Expression, ctx: &ExecutionContext, outer_tables: &[String], ) -> Result<Option<Expression>>

Try to optimize IN subqueries to semi-join (execute once, hash lookup per row).

This transforms:

WHERE outer.col IN (SELECT inner_col FROM t WHERE non_correlated_pred)

Into:

WHERE outer.col IN (hash_set_of_inner_col_values)
§Optimization Criteria
  1. IN right side must be a scalar subquery
  2. Subquery must SELECT exactly one column
  3. Subquery must have a simple table source (no joins)
  4. Subquery WHERE clause must NOT reference outer tables (non-correlated)
§Performance Impact
  • Before: O(N×M) - executes subquery for each outer row
  • After: O(N+M) - executes subquery once, O(1) hash lookup per row
Source

pub fn collect_outer_table_names( table_expr: &Option<Box<Expression>>, ) -> Vec<String>

Get outer table names from a table expression (for semi-join optimization).

Source§

impl Executor

Source

pub fn execute_select_with_window_functions_lazy_partition( &self, stmt: &SelectStatement, ctx: &ExecutionContext, table: &dyn Table, base_columns: &[String], partition_col: &str, limit: usize, ) -> Result<Box<dyn QueryResult>>

Lazy partition fetching for window functions with LIMIT pushdown. Fetches partitions one at a time from the index and stops when the LIMIT is reached. This is the key optimization for PARTITION BY + LIMIT queries.
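The early-termination behaviour can be modelled with a plain iterator of partitions. This is a simplified sketch: partitions are `Vec<i64>` values, the per-partition window computation is omitted, and `take_rows_lazily` is a name invented for the example.

```rust
/// Pull partitions one at a time and stop once `limit` rows are produced.
/// Returns the rows plus the number of partitions that were actually fetched.
pub fn take_rows_lazily<I>(partitions: I, limit: usize) -> (Vec<i64>, usize)
where
    I: IntoIterator<Item = Vec<i64>>,
{
    let mut iter = partitions.into_iter();
    let mut out = Vec::new();
    let mut fetched = 0;
    // Check the limit before asking for the next partition, so no
    // partition is fetched after the limit has been satisfied.
    while out.len() < limit {
        let partition = match iter.next() {
            Some(p) => p,
            None => break,
        };
        fetched += 1;
        // A real executor would evaluate the window function over
        // `partition` here, since a partition is a complete window frame.
        let room = limit - out.len();
        out.extend(partition.into_iter().take(room));
    }
    (out, fetched)
}
```

With three partitions of two rows each and a limit of 3, only the first two partitions are touched.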

Source§

impl Executor

Source

pub fn new(engine: Arc<MVCCEngine>) -> Self

Create a new executor with the given storage engine

Source

pub fn with_function_registry( engine: Arc<MVCCEngine>, function_registry: Arc<FunctionRegistry>, ) -> Self

Create a new executor with a custom function registry

Source

pub fn with_cache_size(engine: Arc<MVCCEngine>, cache_size: usize) -> Self

Create a new executor with a custom cache size

Source

pub fn has_active_transaction(&self) -> bool

Check if there is an active explicit transaction

Source

pub fn set_default_isolation_level(&mut self, level: IsolationLevel)

Set the default isolation level for new transactions

Source

pub fn engine(&self) -> &Arc<MVCCEngine>

Get the storage engine

Source

pub fn function_registry(&self) -> &Arc<FunctionRegistry>

Get the function registry

Source

pub fn execute(&self, sql: &str) -> Result<Box<dyn QueryResult>>

Execute a SQL query string

This is the main entry point for executing SQL statements. It parses the query and executes each statement in order. Uses the query cache to avoid re-parsing identical queries.

Source

pub fn execute_with_params( &self, sql: &str, params: ParamVec, ) -> Result<Box<dyn QueryResult>>

Execute a SQL query with positional parameters

Parameters are substituted for the $1, $2, etc. placeholders in the query. Uses the query cache for efficient re-execution of parameterized queries. Note: callers should try try_fast_path_with_params() before calling this.

Source

pub fn try_fast_path_with_params( &self, sql: &str, params: &[Value], ) -> Option<Result<Box<dyn QueryResult>>>

Try fast-path execution with a borrowed params slice. Returns None if the fast path doesn’t apply, Some(result) otherwise.
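The calling pattern implied by the `Option<Result<…>>` return type is "try the fast path, fall back if it returns None". The sketch below shows that pattern against a stand-in type, because constructing a real `Executor` requires a storage engine. `MockExecutor`, its `String` results, and the rule deciding when the fast path applies are all assumptions for this example.

```rust
/// Stand-in executor used only to demonstrate the calling pattern.
pub struct MockExecutor;

impl MockExecutor {
    /// Mirrors the documented shape: `None` means "fast path does not apply".
    pub fn try_fast_path_with_params(
        &self,
        sql: &str,
        params: &[i64],
    ) -> Option<Result<String, String>> {
        // Assumed rule for this sketch: single-parameter point lookups qualify.
        if sql.ends_with("WHERE id = $1") && params.len() == 1 {
            Some(Ok(format!("fast:{}", params[0])))
        } else {
            None
        }
    }

    /// Slow path: takes ownership of the parameters.
    pub fn execute_with_params(&self, _sql: &str, params: Vec<i64>) -> Result<String, String> {
        Ok(format!("full:{}", params.len()))
    }
}

/// Try the borrowed-slice fast path first; fall back to the owning call.
pub fn run(exec: &MockExecutor, sql: &str, params: Vec<i64>) -> Result<String, String> {
    if let Some(result) = exec.try_fast_path_with_params(sql, &params) {
        return result;
    }
    exec.execute_with_params(sql, params)
}
```

Because the fast path only borrows the parameters, the caller still owns them and can hand them to the slow path without cloning.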

Source

pub fn execute_with_named_params( &self, sql: &str, params: FxHashMap<String, Value>, ) -> Result<Box<dyn QueryResult>>

Execute a SQL query with named parameters

Parameters are substituted for :name placeholders in the query. Uses the query cache for efficient re-execution of parameterized queries.

Source

pub fn execute_with_context( &self, sql: &str, ctx: &ExecutionContext, ) -> Result<Box<dyn QueryResult>>

Execute a SQL query with a full execution context. Uses the query cache for efficient re-execution.

Source

pub fn query_cache(&self) -> &QueryCache

Get the query cache

Source

pub fn cache_stats(&self) -> CacheStats

Get query cache statistics

Source

pub fn clear_cache(&self)

Clear the query cache

Source

pub fn semantic_cache(&self) -> &SemanticCache

Get the semantic cache

Source

pub fn semantic_cache_stats(&self) -> SemanticCacheStatsSnapshot

Get semantic cache statistics

Source

pub fn clear_semantic_cache(&self)

Clear the semantic cache

Source

pub fn invalidate_semantic_cache(&self, table_name: &str)

Invalidate semantic cache for a specific table

Call this after INSERT, UPDATE, DELETE, or TRUNCATE on a table.
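To show why per-table invalidation is useful, here is a toy cache keyed first by table and then by query text. It is not the SemanticCache implementation; `TableResultCache` and its methods are hypothetical, and cached results are reduced to a row count.

```rust
use std::collections::HashMap;

/// Toy result cache: table name -> (query text -> cached row count).
#[derive(Default)]
pub struct TableResultCache {
    by_table: HashMap<String, HashMap<String, usize>>,
}

impl TableResultCache {
    pub fn insert(&mut self, table: &str, query: &str, rows: usize) {
        self.by_table
            .entry(table.to_string())
            .or_default()
            .insert(query.to_string(), rows);
    }

    pub fn get(&self, table: &str, query: &str) -> Option<usize> {
        self.by_table.get(table)?.get(query).copied()
    }

    /// Drop every cached result for one table after it has been modified.
    /// Entries for other tables stay valid and are left untouched.
    pub fn invalidate(&mut self, table: &str) {
        self.by_table.remove(table);
    }
}
```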

Source

pub fn execute_program(&self, program: &Program) -> Result<Box<dyn QueryResult>>

Execute a parsed program

Source

pub fn execute_program_with_context( &self, program: &Program, ctx: &ExecutionContext, ) -> Result<Box<dyn QueryResult>>

Execute a parsed program with context

Source

pub fn execute_statement( &self, statement: &Statement, ctx: &ExecutionContext, ) -> Result<Box<dyn QueryResult>>

Execute a single statement

Source

pub fn install_transaction(&self, tx: Box<dyn Transaction>)

Install an external storage transaction as the active transaction.

Used by the programmatic Transaction API to delegate SELECT queries to the full executor pipeline (aggregates, JOINs, window functions, etc.) while keeping the transaction’s uncommitted changes visible.

Source

pub fn take_transaction(&self) -> Option<Box<dyn Transaction>>

Take back the storage transaction from the active transaction slot.

Returns the transaction so the caller can continue using it for further DML operations after the SELECT delegation completes.
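The install/take pair behaves like a slot that temporarily owns the transaction. A minimal model, assuming a `Mutex<Option<…>>` slot (the executor's actual internals are private) and a hypothetical `Tx` trait standing in for `Transaction`:

```rust
use std::sync::Mutex;

/// Stand-in for the storage `Transaction` trait.
pub trait Tx: Send {
    fn id(&self) -> u64;
}

pub struct SimpleTx(pub u64);

impl Tx for SimpleTx {
    fn id(&self) -> u64 {
        self.0
    }
}

/// Holds at most one active transaction behind `&self`.
#[derive(Default)]
pub struct TxSlot {
    active: Mutex<Option<Box<dyn Tx>>>,
}

impl TxSlot {
    /// Hand the transaction to the slot for the duration of a query.
    pub fn install_transaction(&self, tx: Box<dyn Tx>) {
        *self.active.lock().unwrap() = Some(tx);
    }

    pub fn has_active_transaction(&self) -> bool {
        self.active.lock().unwrap().is_some()
    }

    /// Take ownership back; the slot is empty afterwards.
    pub fn take_transaction(&self) -> Option<Box<dyn Tx>> {
        self.active.lock().unwrap().take()
    }
}
```

The caller installs the transaction, runs the SELECT, then takes the transaction back to continue with further DML.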

Source

pub fn begin_transaction(&self) -> Result<Box<dyn Transaction>>

Begin a new transaction

Source

pub fn begin_transaction_with_isolation( &self, isolation: IsolationLevel, ) -> Result<Box<dyn Transaction>>

Begin a new transaction with a specific isolation level

Source

pub fn get_or_create_plan(&self, sql: &str) -> Result<CachedPlanRef>

Get or create a cached plan for a SQL statement.

Parses the SQL and caches the plan if not already cached. Returns a lightweight CachedPlanRef that can be stored and reused for repeated execution without re-parsing or cache lookup overhead.

Source

pub fn execute_with_cached_plan( &self, plan: &CachedPlanRef, ctx: &ExecutionContext, ) -> Result<Box<dyn QueryResult>>

Execute a pre-cached plan directly, skipping cache lookup.

This is the fast path for prepared statements: the caller holds a CachedPlanRef obtained from get_or_create_plan() and passes it here on every execution, avoiding normalize + hash + RwLock read per call.
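The get-once, execute-many pattern can be illustrated with a small plan cache. This sketch is an assumption-laden model: `Plan`, `PlanRef`, and `PlanCache` are stand-ins, "parsing" is reduced to whitespace normalization, and a counter records how often a plan is actually built.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};

/// Stand-in for a parsed statement.
#[derive(Debug)]
pub struct Plan {
    pub normalized_sql: String,
}

/// Cheap handle the caller keeps and reuses, analogous to a prepared statement.
pub type PlanRef = Arc<Plan>;

#[derive(Default)]
pub struct PlanCache {
    plans: RwLock<HashMap<String, PlanRef>>,
    parses: AtomicUsize,
}

impl PlanCache {
    pub fn get_or_create_plan(&self, sql: &str) -> PlanRef {
        // Normalize, then look up under the read lock.
        let key = sql.split_whitespace().collect::<Vec<_>>().join(" ");
        let cached = self.plans.read().unwrap().get(&key).cloned();
        if let Some(plan) = cached {
            return plan;
        }
        // Miss: take the write lock and build the plan at most once.
        let mut plans = self.plans.write().unwrap();
        let plan = plans
            .entry(key.clone())
            .or_insert_with(|| {
                self.parses.fetch_add(1, Ordering::Relaxed);
                Arc::new(Plan { normalized_sql: key })
            })
            .clone();
        plan
    }

    pub fn parse_count(&self) -> usize {
        self.parses.load(Ordering::Relaxed)
    }
}
```

A caller that stores the returned `PlanRef` skips the normalize, hash, and lock steps on every later execution, which is the saving the paragraph above describes.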

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> CompactArcDrop for T

Source§

unsafe fn drop_and_dealloc(ptr: *mut u8)

Drop the contained data and deallocate the header+data allocation.
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V