Memory management utilities for efficient query execution
This module provides memory-bounded execution for SQL query operators, enabling processing of datasets larger than available memory through disk spilling.
§Components
- Memory Controller (MemoryController): Budget management and tracking
- Memory Reservation (MemoryReservation): Per-operator memory tracking
- External Sort (ExternalSort): Disk-spilling merge sort
- External Aggregate (ExternalAggregate): Partition-based GROUP BY
- External Hash Join (ExternalHashJoin): Grace hash join with spilling
- Spill Files (SpillFile): Temporary file management with auto-cleanup
- Arena Allocator (QueryArena): Fast bump-pointer allocator
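As an illustration of what "bump-pointer allocator" means in the last item, here is a minimal sketch in safe Rust. `BumpArena` and all of its methods are hypothetical names invented for this example; they are not the `QueryArena` API, which this page does not describe. The sketch hands out byte ranges inside one pre-allocated buffer, and alignment is relative to the start of that buffer rather than to an absolute address.

```rust
use std::ops::Range;

/// Hypothetical bump allocator over a fixed buffer (not vibesql's QueryArena).
pub struct BumpArena {
    buf: Vec<u8>,
    offset: usize, // next free byte; only ever moves forward until reset
}

impl BumpArena {
    pub fn with_capacity(capacity: usize) -> Self {
        BumpArena { buf: vec![0; capacity], offset: 0 }
    }

    /// Reserve `len` bytes aligned to `align` (a power of two).
    /// Returns the byte range, or None if the arena is full.
    pub fn alloc(&mut self, len: usize, align: usize) -> Option<Range<usize>> {
        debug_assert!(align.is_power_of_two());
        // Round the current offset up to the next multiple of `align`.
        let start = self.offset.checked_add(align - 1)? & !(align - 1);
        let end = start.checked_add(len)?;
        if end > self.buf.len() {
            return None;
        }
        self.offset = end; // the "bump": allocation is a single addition
        Some(start..end)
    }

    /// Access the bytes of a previously allocated range.
    pub fn bytes_mut(&mut self, range: Range<usize>) -> &mut [u8] {
        &mut self.buf[range]
    }

    /// Free everything at once, e.g. at the end of a query.
    pub fn reset(&mut self) {
        self.offset = 0;
    }

    pub fn used(&self) -> usize {
        self.offset
    }
}
```

Individual allocations are never freed; the whole arena is released in one step, which is what makes this style of allocator cheap for per-query scratch memory.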
§Architecture
┌─────────────────────────────────────────────────────────────────┐
│                        MemoryController                         │
│ ┌────────────────┐ ┌────────────────┐ ┌──────────────────────┐  │
│ │  Budget Pool   │ │    Tracking    │ │       Metrics        │  │
│ │ (configurable) │ │ (per-operator) │ │ (spills, peak, etc.) │  │
│ └────────────────┘ └────────────────┘ └──────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
  ┌──────────────┐       ┌──────────────┐       ┌──────────────┐
  │   External   │       │   External   │       │   External   │
  │     Sort     │       │  Aggregate   │       │  Hash Join   │
  │ (merge sort) │       │ (partitioned)│       │ (grace join) │
  └──────────────┘       └──────────────┘       └──────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────────────────────────────────────────────────────┐
│                     SpillFile (temp files)                      │
│           Auto-cleanup on drop, buffered I/O, seeking           │
└─────────────────────────────────────────────────────────────────┘
§Memory-Bounded Execution
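Before the API example, it may help to see the idea behind a budget and its reservations in isolation. The following is a minimal, self-contained model: `Budget` and `Reservation` are hypothetical stand-ins written for this sketch, not the real `MemoryController` and `MemoryReservation` types, and only the `try_grow`/`shrink` method names are borrowed from the example in this section. It assumes a single shared byte counter that reservations draw from.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical shared budget: a byte limit plus an atomic usage counter.
pub struct Budget {
    limit: usize,
    used: AtomicUsize,
}

impl Budget {
    pub fn new(limit: usize) -> Arc<Self> {
        Arc::new(Budget { limit, used: AtomicUsize::new(0) })
    }

    pub fn used(&self) -> usize {
        self.used.load(Ordering::Acquire)
    }
}

/// Hypothetical per-operator reservation; returns its bytes when dropped.
pub struct Reservation {
    budget: Arc<Budget>,
    held: usize,
}

impl Reservation {
    pub fn new(budget: Arc<Budget>) -> Self {
        Reservation { budget, held: 0 }
    }

    /// Try to reserve `bytes` more. Returns false, reserving nothing,
    /// if that would push the shared counter past the limit.
    pub fn try_grow(&mut self, bytes: usize) -> bool {
        let limit = self.budget.limit;
        let granted = self
            .budget
            .used
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |used| {
                used.checked_add(bytes).filter(|&total| total <= limit)
            })
            .is_ok();
        if granted {
            self.held += bytes;
        }
        granted
    }

    /// Give back up to `bytes`, e.g. after spilling data to disk.
    pub fn shrink(&mut self, bytes: usize) {
        let released = bytes.min(self.held);
        self.held -= released;
        self.budget.used.fetch_sub(released, Ordering::AcqRel);
    }
}

impl Drop for Reservation {
    fn drop(&mut self) {
        let held = self.held;
        self.shrink(held);
    }
}
```

A failed `try_grow` is the operator's signal to spill: it writes buffered data to disk, calls `shrink` for the bytes it freed, and carries on.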
use std::sync::Arc;
use vibesql_executor::memory::{MemoryController, MemoryConfig};
// Create controller with 1GB budget
let controller = Arc::new(MemoryController::with_budget(1024 * 1024 * 1024));
// Operators create reservations to track their memory
let mut reservation = controller.create_reservation();
// When memory is exhausted, spill to disk
if !reservation.try_grow(batch_size) {
    spill_to_disk(&data);
    reservation.shrink(data.size());
}
// Check statistics after execution
let stats = controller.stats();
println!("{}", stats); // "Memory: 512MB/1GB (50%), peak: 950MB, spilled: 2GB (3 ops)"
§External Operators
§External Sort
Two-phase external merge sort:
- Run generation: Sort in-memory chunks, spill as sorted runs
- K-way merge: Merge runs using a tournament tree
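The two phases can be sketched on plain integers as follows. This is an illustration only: `generate_runs` and `merge_runs` are hypothetical helpers, the runs stay in memory instead of being written to spill files, and the merge uses the standard library's `BinaryHeap` as a simpler stand-in for the tournament tree mentioned above.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Phase 1: cut the input into chunks of at most `run_len` values and sort
/// each chunk. A real operator would spill each sorted run to disk.
pub fn generate_runs(input: &[i64], run_len: usize) -> Vec<Vec<i64>> {
    input
        .chunks(run_len.max(1))
        .map(|chunk| {
            let mut run = chunk.to_vec();
            run.sort_unstable();
            run
        })
        .collect()
}

/// Phase 2: k-way merge. The heap holds one candidate per run,
/// keyed by (value, run index, position within the run).
pub fn merge_runs(runs: &[Vec<i64>]) -> Vec<i64> {
    let mut heap = BinaryHeap::new();
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(&first) = run.first() {
            // Reverse turns the max-heap into a min-heap.
            heap.push(Reverse((first, run_idx, 0usize)));
        }
    }

    let mut merged = Vec::with_capacity(runs.iter().map(Vec::len).sum::<usize>());
    while let Some(Reverse((value, run_idx, pos))) = heap.pop() {
        merged.push(value);
        // Refill the heap from the run the smallest value came from.
        if let Some(&next) = runs[run_idx].get(pos + 1) {
            heap.push(Reverse((next, run_idx, pos + 1)));
        }
    }
    merged
}
```

Only one value per run needs to be resident during the merge, which is why the second phase fits in a small, fixed amount of memory regardless of input size.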
let mut sort = ExternalSort::new(controller, config, sort_keys);
for row in input {
    sort.add_row(&row)?; // Automatically spills when needed
}
for result in sort.finish()? {
    // Rows come out in sorted order
}
§External Aggregate
Partition-based aggregation for GROUP BY:
- Hash rows to partitions
- Spill partitions when memory exhausted
- Process each partition’s groups
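The steps above can be sketched for a `SUM ... GROUP BY` over `(key, value)` pairs. `partition_of` and `partitioned_sum` are hypothetical helpers for this illustration; the partitions are kept as in-memory vectors here, whereas a spilling operator would write some of them to disk and read them back one at a time.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Map a group key to a partition index.
fn partition_of(key: &str, partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % partitions as u64) as usize
}

/// SUM(value) GROUP BY key, computed one partition at a time.
pub fn partitioned_sum(rows: &[(&str, i64)], partitions: usize) -> HashMap<String, i64> {
    let partitions = partitions.max(1);

    // Step 1: hash rows to partitions. Every row of a given group lands in
    // the same partition, so partitions can be aggregated independently.
    let mut parts: Vec<Vec<(&str, i64)>> = vec![Vec::new(); partitions];
    for &(key, value) in rows {
        parts[partition_of(key, partitions)].push((key, value));
    }

    // Step 3: process each partition's groups with a small hash table
    // that only needs to hold that partition's keys.
    let mut result = HashMap::new();
    for part in parts {
        let mut groups: HashMap<&str, i64> = HashMap::new();
        for (key, value) in part {
            *groups.entry(key).or_insert(0) += value;
        }
        for (key, sum) in groups {
            result.insert(key.to_string(), sum);
        }
    }
    result
}
```

Because groups never straddle partitions, peak memory is bounded by the largest partition rather than by the total number of groups.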
let specs = vec![AggregateSpec { function_name: "SUM".into(), .. }];
let mut agg = ExternalAggregate::new(controller, config, specs, 2);
for row in input {
    agg.add_row(&row)?;
}
for result in agg.finish()? {
    // (group_key..., aggregate_values...)
}
§External Hash Join
Grace hash join with partition-based spilling:
- Partition both build and probe sides by join key hash
- Spill partitions when memory exhausted
- Process matching partitions together
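As a rough sketch of the partitioning idea, the following inner join works on `(key, payload)` rows. The names `Row`, `partition`, and `grace_hash_join` are hypothetical, the key modulo the partition count stands in for a real hash function, and both sides are kept in memory rather than spilled.

```rust
use std::collections::HashMap;

/// A row is a join key plus a payload (hypothetical shape for this sketch).
type Row<'a> = (u64, &'a str);

/// Split rows into `n` partitions by join key.
fn partition<'a>(rows: &[Row<'a>], n: usize) -> Vec<Vec<Row<'a>>> {
    let mut parts = vec![Vec::new(); n];
    for &(key, payload) in rows {
        parts[(key % n as u64) as usize].push((key, payload));
    }
    parts
}

/// Inner join: returns (key, build payload, probe payload) for every match.
pub fn grace_hash_join<'a>(
    build: &[Row<'a>],
    probe: &[Row<'a>],
    n: usize,
) -> Vec<(u64, &'a str, &'a str)> {
    let n = n.max(1);
    // Both sides use the same partitioning function, so matching keys
    // always end up in partitions with the same index.
    let build_parts = partition(build, n);
    let probe_parts = partition(probe, n);

    let mut out = Vec::new();
    for (build_part, probe_part) in build_parts.iter().zip(probe_parts.iter()) {
        // The hash table covers only one build partition at a time.
        let mut table: HashMap<u64, Vec<&'a str>> = HashMap::new();
        for &(key, payload) in build_part {
            table.entry(key).or_default().push(payload);
        }
        for &(key, probe_payload) in probe_part {
            if let Some(matches) = table.get(&key) {
                for &build_payload in matches {
                    out.push((key, build_payload, probe_payload));
                }
            }
        }
    }
    out
}
```

Only one pair of partitions has to be resident at a time, so the hash table size is governed by the largest build partition instead of the whole build side.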
let mut join = ExternalHashJoin::new(
    controller, config,
    vec![0], // build key columns
    vec![0], // probe key columns
    JoinType::Inner,
);
for row in build_side { join.add_build_row(&row)?; }
for row in probe_side { join.add_probe_row(&row)?; }
for result in join.finish()? {
    // Joined rows
}
§Configuration
Environment variables:
| Variable | Description | Default |
|---|---|---|
| VIBESQL_MEMORY_LIMIT | Total memory budget (e.g., “4GB”) | 1GB |
| VIBESQL_TEMP_DIR | Directory for spill files | system temp |
| VIBESQL_SPILL_THRESHOLD | When to start spilling, as a fraction of the budget (0.0-1.0) | 0.8 |
| VIBESQL_PARTITION_SIZE | Target partition size | 64MB |
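How the crate parses these values is not shown on this page. The sketch below is one plausible reading, under the assumptions that size suffixes are binary units (1GB = 2^30 bytes), that suffixes are case-insensitive, and that unparsable values fall back to the documented default. `parse_size`, `memory_limit_from`, and `spill_trigger_bytes` are hypothetical helpers, not functions of this module.

```rust
/// Parse "4GB", "64MB", "512KB", or a bare byte count into bytes.
/// Assumes binary units; returns None for malformed input or overflow.
pub fn parse_size(text: &str) -> Option<u64> {
    let upper = text.trim().to_ascii_uppercase();
    let (digits, multiplier) = if let Some(n) = upper.strip_suffix("GB") {
        (n, 1u64 << 30)
    } else if let Some(n) = upper.strip_suffix("MB") {
        (n, 1u64 << 20)
    } else if let Some(n) = upper.strip_suffix("KB") {
        (n, 1u64 << 10)
    } else {
        (upper.as_str(), 1)
    };
    digits.trim().parse::<u64>().ok()?.checked_mul(multiplier)
}

/// Resolve the memory limit from an optional setting, falling back to the
/// documented 1GB default when it is unset or unparsable.
pub fn memory_limit_from(value: Option<&str>) -> u64 {
    value.and_then(parse_size).unwrap_or(1 << 30)
}

/// Number of used bytes at which spilling would start for a given budget
/// and threshold fraction (clamped to 0.0-1.0).
pub fn spill_trigger_bytes(budget: u64, threshold: f64) -> u64 {
    (budget as f64 * threshold.clamp(0.0, 1.0)) as u64
}
```

With these helpers, the environment could be read as `memory_limit_from(std::env::var("VIBESQL_MEMORY_LIMIT").ok().as_deref())`. Under the documented defaults, a 1GB budget with a 0.8 threshold would start spilling at roughly 819MB of tracked usage.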
Modules§
- row_serialization: Row serialization for disk spilling
Structs§
- AggregateResultIterator: Iterator over aggregate results
- AggregateSpec: Specification for an aggregate function
- ExternalAggregate: External aggregate operator
- ExternalAggregateConfig: Configuration for external aggregate
- ExternalHashJoin: External hash join operator
- ExternalHashJoinConfig: Configuration for external hash join
- ExternalSort: External sort operator
- ExternalSortConfig: Configuration for external sort
- HashJoinResultIterator: Iterator over hash join results
- MemoryConfig: Configuration for memory-bounded execution
- MemoryController: Global memory controller for query execution
- MemoryReservation: A memory reservation for a single operator
- MemoryStats: Statistics snapshot from the memory controller
- QueryArena
- SpillFile: A handle to a temporary spill file
- SpillFileSet: A collection of spill files for managing multiple sorted runs
Enums§
- JoinType: Join type for the external hash join
- SortedIterator: Iterator over sorted results
Constants§
- DEFAULT_MEMORY_BUDGET: Default memory budget (1GB). Conservative default that works on most systems.
- DEFAULT_SPILL_THRESHOLD: Default spill threshold (80%). Start spilling when 80% of the budget is used.
- DEFAULT_TARGET_PARTITION_BYTES: Default target partition size for external operators (64MB). Tuned for good I/O efficiency while limiting memory per partition.
- MIN_OPERATOR_MEMORY: Minimum memory for an operator (4MB). Below this, operators may not function correctly.
Type Aliases§
- SortKey: Sort key for a row, i.e. the evaluated ORDER BY values with their directions