Struct Database

pub struct Database { /* private fields */ }

The SochDB Database Kernel

This is the shared core used by both embedded (SochConnection) and server (sochdb-server) modes. It owns all storage, catalog, and indexing components.

§Thread Safety

The Database is fully thread-safe via internal synchronization:

Multiple readers can operate concurrently (MVCC snapshots)
Writers coordinate through WAL and group commit
All state is behind Arc/RwLock for shared access
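A minimal sketch of the "Arc/RwLock for shared access" pattern, assuming a hypothetical `SharedState` map in place of the database's private fields: one writer takes the write lock for exclusive access, then several reader threads hold the read lock at the same time. It shows the general pattern only, not SochDB's internals.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for the database's internal state (the real fields are private).
type SharedState = Arc<RwLock<HashMap<Vec<u8>, Vec<u8>>>>;

/// One writer stores a key, then `readers` threads read it concurrently.
/// Returns what each reader observed.
fn share_across_threads(readers: usize) -> Vec<Option<Vec<u8>>> {
    let state: SharedState = Arc::new(RwLock::new(HashMap::new()));

    // Writer: the write lock gives exclusive access.
    let writer_state = Arc::clone(&state);
    thread::spawn(move || {
        writer_state
            .write()
            .unwrap()
            .insert(b"user:1:name".to_vec(), b"Alice".to_vec());
    })
    .join()
    .unwrap();

    // Readers: any number of threads may hold the read lock simultaneously.
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let reader_state = Arc::clone(&state);
            thread::spawn(move || {
                let map = reader_state.read().unwrap();
                map.get(&b"user:1:name"[..]).cloned()
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```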

§Concurrency Modes

§Standard Mode (Single Process)

Uses exclusive file lock (flock(LOCK_EX))
Best for: Scripts, notebooks, CLI tools
Open with: Database::open(path)

§Concurrent Mode (Multi-Process/Web Apps)

Uses lock-free MVCC for reads, single-writer coordination for writes
Best for: Web servers, Flask/FastAPI apps, hot reloading
Open with: Database::open_concurrent(path)

§Example

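// Standard mode (single process, exclusive file lock)
let db = Database::open("./my_data")?;

// Concurrent mode (multi-reader, single-writer) is the alternative;
// open a given database in one mode or the other, not both at once:
// let db = Database::open_concurrent("./my_data")?;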

// Begin a transaction
let txn = db.begin_transaction()?;

// Write data
db.put(txn, b"user:1:name", b"Alice")?;

// Commit
db.commit(txn)?;

Implementations§

impl Database

pub const MIN_SCAN_PREFIX_LEN: usize = 2

Minimum prefix length for scan operations. Prevents expensive full-table scans by requiring a meaningful prefix.

pub fn open<P: AsRef<Path>>(path: P) -> Result<Arc<Self>>

Open or create a database at the given path.

This is the primary entry point, similar to sqlite3_open(). If the database exists, it will be opened and WAL recovery performed. If it doesn’t exist, a new database will be created.

§Arguments

path - Directory path for the database files

§Returns

An Arc<Database> that can be shared across threads and connections.

pub fn open_with_config<P: AsRef<Path>>(
    path: P,
    config: DatabaseConfig,
) -> Result<Arc<Self>>

Open with custom configuration

pub fn open_concurrent<P: AsRef<Path>>(path: P) -> Result<Arc<Self>>

Open database in concurrent mode (multi-reader, single-writer)

This mode allows multiple processes to access the database simultaneously:

Readers: Lock-free, concurrent access via MVCC snapshots
Writers: Single-writer coordination through atomic locks

§Use Cases

Web applications (Flask, FastAPI, Django)
Hot reloading development servers
Multi-process worker pools
Any scenario with concurrent read access

§Performance

Read latency: ~100ns (lock-free atomic operations)
Write latency: ~60μs amortized (with group commit)
Concurrent readers: Up to 1024 (configurable)
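The single-writer coordination described above can be sketched with an atomic flag. This is an in-process illustration only, assuming a hypothetical `WriterGate` type; how SochDB shares the lock across processes is not documented here, and readers in this model never touch the gate at all.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical single-writer gate: at most one writer holds it at a time.
pub struct WriterGate {
    held: AtomicBool,
}

/// RAII guard that releases the gate when dropped.
pub struct WriterGuard<'a> {
    gate: &'a WriterGate,
}

impl WriterGate {
    pub const fn new() -> Self {
        WriterGate { held: AtomicBool::new(false) }
    }

    /// Try to become the single writer without blocking.
    pub fn try_acquire(&self) -> Option<WriterGuard<'_>> {
        self.held
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .ok()
            .map(|_| WriterGuard { gate: self })
    }
}

impl Drop for WriterGuard<'_> {
    fn drop(&mut self) {
        // Release ordering: the next writer observes everything written under this guard.
        self.gate.held.store(false, Ordering::Release);
    }
}
```

A writer that fails `try_acquire` would retry or wait; a real cross-process lock also has to handle a holder that crashes, which this sketch ignores.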

§Example

// Multiple processes can open the same database
let db = Database::open_concurrent("./my_data")?;

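// Reads are lock-free; `get` takes a transaction handle
let read_txn = db.begin_read_only()?;
let value = db.get(read_txn, b"key")?;
db.abort(read_txn)?;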

// Writes coordinate automatically
let txn = db.begin_transaction()?;
db.put(txn, b"key", b"value")?;
db.commit(txn)?;

pub fn open_concurrent_with_config<P: AsRef<Path>>(
    path: P,
    config: DatabaseConfig,
) -> Result<Arc<Self>>

Open database in concurrent mode with custom configuration

pub fn is_concurrent(&self) -> bool

Check if database is in concurrent mode

pub fn path(&self) -> &Path

Get database path

pub fn begin_transaction(&self) -> Result<TxnHandle>

Begin a new transaction

pub fn begin_read_only(&self) -> Result<TxnHandle>

Begin a read-only transaction (optimized: no SSI tracking)

Read-only transactions skip SSI read tracking, reducing overhead from ~82ns to ~32ns per read (2.6x faster).

Use this for:

SELECT queries that don’t modify data
Analytics and reporting queries
Snapshot reads for backup
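To make the saved work concrete, here is a sketch assuming a hypothetical `Txn` type: a read-write transaction records every key it reads in a read set (the input to commit-time SSI conflict checks), while a read-only transaction skips that allocation and insert. The names are illustrative, not SochDB's API.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical transaction state used to show what SSI read tracking costs.
pub struct Txn {
    read_only: bool,
    read_set: HashSet<Vec<u8>>,
}

impl Txn {
    pub fn read_write() -> Self {
        Txn { read_only: false, read_set: HashSet::new() }
    }

    pub fn read_only() -> Self {
        Txn { read_only: true, read_set: HashSet::new() }
    }

    /// Read-write reads copy the key into the read set so commit can check for
    /// read-write conflicts; read-only reads skip that work entirely.
    pub fn get(&mut self, store: &BTreeMap<Vec<u8>, Vec<u8>>, key: &[u8]) -> Option<Vec<u8>> {
        if !self.read_only {
            self.read_set.insert(key.to_vec());
        }
        store.get(key).cloned()
    }

    pub fn tracked_reads(&self) -> usize {
        self.read_set.len()
    }
}
```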

pub fn begin_read_only_fast(&self) -> TxnHandle

Begin a lightweight read-only transaction (no WAL overhead).

Eliminates WAL mutex acquisitions entirely for read operations. The txn_id is allocated atomically and MVCC snapshot state is created, but NO WAL records are written (no TxnBegin, no TxnAbort).

~5-10x faster per-operation than begin_read_only() because it avoids:

2 WAL mutex lock/unlock cycles per transaction
2 WAL BufWriter serializations per transaction

Callers MUST use abort_read_only_fast() to clean up — NOT commit() or abort().
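A sketch of the fast path's shape, assuming a hypothetical `TxnManager`: the transaction id comes from an atomic counter, the snapshot is a single atomic load, and cleanup is an O(1) removal. No WAL appears anywhere in the begin/abort pair, which is the point of the design. The `Mutex<HashSet>` is a simple stand-in for the sharded active-transaction map.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Hypothetical handle returned by the fast read-only path.
#[derive(Clone, Copy, Debug)]
pub struct FastReadTxn {
    pub id: u64,
    pub snapshot_ts: u64,
}

/// Hypothetical transaction manager; `active` stands in for the active-txn map.
pub struct TxnManager {
    next_txn_id: AtomicU64,
    commit_ts: AtomicU64,
    active: Mutex<HashSet<u64>>,
}

impl TxnManager {
    pub fn new() -> Self {
        TxnManager {
            next_txn_id: AtomicU64::new(1),
            commit_ts: AtomicU64::new(0),
            active: Mutex::new(HashSet::new()),
        }
    }

    /// Allocate an id atomically and capture the snapshot; no WAL record is written.
    pub fn begin_read_only_fast(&self) -> FastReadTxn {
        let id = self.next_txn_id.fetch_add(1, Ordering::Relaxed);
        let snapshot_ts = self.commit_ts.load(Ordering::Acquire);
        self.active.lock().unwrap().insert(id);
        FastReadTxn { id, snapshot_ts }
    }

    /// O(1) cleanup: unregister the snapshot; again no WAL record.
    pub fn abort_read_only_fast(&self, txn: FastReadTxn) {
        self.active.lock().unwrap().remove(&txn.id);
    }

    pub fn active_count(&self) -> usize {
        self.active.lock().unwrap().len()
    }
}
```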

pub fn abort_read_only_fast(&self, txn: TxnHandle)

Abort a fast read-only transaction — O(1), no WAL, no memtable scan.

pub fn get_raw_read(&self, key: &[u8]) -> Option<Vec<u8>>

Read a key WITHOUT any MVCC transaction tracking.

Uses the current global timestamp to see all committed writes. Bypasses: begin/abort, active_txns DashMap, record_read, stats. Only safe for single-threaded access with no concurrent writers.
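"Uses the current global timestamp to see all committed writes" follows from standard MVCC visibility: a reader sees the newest version whose commit timestamp is at or below its read timestamp. A sketch, assuming a hypothetical `VersionedStore` where `None` marks a delete:

```rust
use std::collections::BTreeMap;

/// Hypothetical multi-version store: key -> (commit_ts -> value), `None` = tombstone.
pub struct VersionedStore {
    versions: BTreeMap<Vec<u8>, BTreeMap<u64, Option<Vec<u8>>>>,
    global_ts: u64,
}

impl VersionedStore {
    pub fn new() -> Self {
        VersionedStore { versions: BTreeMap::new(), global_ts: 0 }
    }

    /// Record a committed write at the next timestamp and return that timestamp.
    pub fn commit_write(&mut self, key: &[u8], value: Option<&[u8]>) -> u64 {
        self.global_ts += 1;
        let ts = self.global_ts;
        self.versions
            .entry(key.to_vec())
            .or_default()
            .insert(ts, value.map(|v| v.to_vec()));
        ts
    }

    /// Newest version with commit_ts <= read_ts (a snapshot read).
    pub fn get_at(&self, key: &[u8], read_ts: u64) -> Option<Vec<u8>> {
        self.versions
            .get(key)?
            .range(..=read_ts)
            .next_back()
            .and_then(|(_, v)| v.clone())
    }

    /// "Raw" read: use the current global timestamp, so every committed write is visible.
    pub fn get_raw_read(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.get_at(key, self.global_ts)
    }
}
```

Because nothing registers the raw reader, a concurrent garbage collector could discard versions it still needs, which is one reason the method is only safe without concurrent activity.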

pub fn scan_raw(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>

Scan by prefix WITHOUT any MVCC transaction tracking.

Uses the current global timestamp. Only safe for single-threaded access.

pub fn begin_write_only(&self) -> Result<TxnHandle>

Begin a write-only transaction (optimized: no read tracking)

Write-only transactions skip read tracking, improving insert throughput for bulk loading scenarios.

Use this for:

Bulk data imports
Append-only logging tables
ETL pipelines

pub fn commit(&self, txn: TxnHandle) -> Result<u64>

Commit a transaction

In concurrent mode, acquires the shared writer lock to ensure WAL writes are serialized across processes, and forces a flush+sync so that subsequent processes see the committed data.
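A sketch of the "lock, append, flush, sync" sequence, assuming a hypothetical `Wal` type whose mutex stands in for the writer lock (it only serializes writers inside one process). `flush` moves buffered bytes to the OS, where other processes can read them; `sync_data` then makes them durable on disk.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, BufWriter, Write};
use std::path::Path;
use std::sync::Mutex;

/// Hypothetical WAL writer guarded by a mutex.
pub struct Wal {
    writer: Mutex<BufWriter<File>>,
}

impl Wal {
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Wal { writer: Mutex::new(BufWriter::new(file)) })
    }

    /// Append a commit record while holding the lock, then flush and fsync.
    pub fn commit(&self, txn_id: u64) -> io::Result<()> {
        let mut w = self.writer.lock().unwrap();
        writeln!(w, "COMMIT {txn_id}")?;
        w.flush()?; // BufWriter -> OS: now visible to other processes
        w.get_ref().sync_data() // OS -> disk: now durable across crashes
    }
}
```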

pub fn abort(&self, txn: TxnHandle) -> Result<()>

Abort a transaction

pub fn set_table_index_policy(&self, table: &str, policy: IndexPolicy)

Configure index policy for a table

This allows fine-grained control over write/scan trade-offs per table:

Policy	Insert Cost	Scan Cost	Use Case
WriteOptimized	O(1)	O(N)	High-write, rare scan
Balanced	O(1) amortized	O(output + log K)	Mixed OLTP
ScanOptimized	O(log N)	O(log N + K)	Analytics, range queries
AppendOnly	O(1)	O(N)	Time-series logs

§Example

// Fast inserts for logs table (no ordered index overhead)
db.set_table_index_policy("logs", IndexPolicy::WriteOptimized);

// Efficient range scans for analytics table
db.set_table_index_policy("analytics", IndexPolicy::ScanOptimized);

// Balanced for OLTP tables
db.set_table_index_policy("users", IndexPolicy::Balanced);
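Since `get_table_index_policy` returns an `IndexPolicy` rather than an `Option`, unconfigured tables must fall back to some default. A sketch of such a per-table registry, assuming a hypothetical `PolicyRegistry` and a local `IndexPolicy` stand-in; the `Balanced` fallback used in the usage below is an assumption, as these docs do not state the real default.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Local stand-in mirroring the four policies in the table above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IndexPolicy {
    WriteOptimized,
    Balanced,
    ScanOptimized,
    AppendOnly,
}

/// Hypothetical registry; takes `&self` like set_table_index_policy, using interior mutability.
pub struct PolicyRegistry {
    policies: RwLock<HashMap<String, IndexPolicy>>,
    default: IndexPolicy,
}

impl PolicyRegistry {
    pub fn new(default: IndexPolicy) -> Self {
        PolicyRegistry { policies: RwLock::new(HashMap::new()), default }
    }

    pub fn set(&self, table: &str, policy: IndexPolicy) {
        self.policies.write().unwrap().insert(table.to_string(), policy);
    }

    /// Tables without an explicit policy fall back to the registry default.
    pub fn get(&self, table: &str) -> IndexPolicy {
        self.policies.read().unwrap().get(table).copied().unwrap_or(self.default)
    }
}
```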

pub fn get_table_index_policy(&self, table: &str) -> IndexPolicy

Get the index policy for a table

pub fn index_registry(&self) -> &Arc<TableIndexRegistry>

Get the index registry for advanced configuration

pub fn put(&self, txn: TxnHandle, key: &[u8], value: &[u8]) -> Result<()>

Put a key-value pair

In concurrent mode, acquires the shared writer lock to ensure WAL writes are serialized across processes.

pub fn put_batch(&self, txn: TxnHandle, writes: &[(&[u8], &[u8])]) -> Result<()>

Batch put multiple key-value pairs with reduced overhead

This amortizes per-operation costs over the entire batch:

Single DashMap lookup
Batch MVCC tracking
Batch memtable writes

For 100+ entries, this is 2-3x faster than individual puts.
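The amortization idea can be shown with a hypothetical `Memtable` that counts its lock acquisitions: individual puts pay the fixed per-call cost once per key, while a batch pays it once in total. The counter is purely illustrative; the real per-operation costs (DashMap lookup, MVCC bookkeeping) are not modeled.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical memtable that counts how often its lock is taken.
pub struct Memtable {
    map: Mutex<BTreeMap<Vec<u8>, Vec<u8>>>,
    lock_acquisitions: AtomicUsize,
}

impl Memtable {
    pub fn new() -> Self {
        Memtable { map: Mutex::new(BTreeMap::new()), lock_acquisitions: AtomicUsize::new(0) }
    }

    /// One lock acquisition per key.
    pub fn put(&self, key: &[u8], value: &[u8]) {
        self.lock_acquisitions.fetch_add(1, Ordering::Relaxed);
        self.map.lock().unwrap().insert(key.to_vec(), value.to_vec());
    }

    /// One lock acquisition for the whole batch.
    pub fn put_batch(&self, writes: &[(&[u8], &[u8])]) {
        self.lock_acquisitions.fetch_add(1, Ordering::Relaxed);
        let mut map = self.map.lock().unwrap();
        for (key, value) in writes {
            map.insert(key.to_vec(), value.to_vec());
        }
    }

    pub fn len(&self) -> usize {
        self.map.lock().unwrap().len()
    }

    pub fn lock_acquisitions(&self) -> usize {
        self.lock_acquisitions.load(Ordering::Relaxed)
    }
}
```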

§Example

let txn = db.begin_transaction()?;
let writes: Vec<(&[u8], &[u8])> = vec![
    (b"key1", b"value1"),
    (b"key2", b"value2"),
    (b"key3", b"value3"),
];
db.put_batch(txn, &writes)?;
db.commit(txn)?;

pub fn get(&self, txn: TxnHandle, key: &[u8]) -> Result<Option<Vec<u8>>>

Get a value by key

pub fn delete(&self, txn: TxnHandle, key: &[u8]) -> Result<()>

Delete a key

pub fn scan(
    &self,
    txn: TxnHandle,
    prefix: &[u8],
) -> Result<Vec<(Vec<u8>, Vec<u8>)>>

Scan keys with a prefix (enforces minimum prefix length for safety).

§Prefix Safety

To prevent accidental full-table scans, this method requires a minimum prefix length of 2 bytes. Use scan_unchecked for internal operations that need empty/short prefixes.

§Errors

Returns SochDBError::InvalidInput if prefix is too short.
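A sketch of the prefix guard plus a sorted prefix scan, assuming a `BTreeMap` in place of the real storage and a hypothetical `ScanError` in place of `SochDBError::InvalidInput`. Because keys are ordered, the scan starts at the prefix and stops at the first key that no longer matches instead of visiting the whole table.

```rust
use std::collections::BTreeMap;

/// Same value as Database::MIN_SCAN_PREFIX_LEN above.
pub const MIN_SCAN_PREFIX_LEN: usize = 2;

/// Hypothetical stand-in for SochDBError::InvalidInput.
#[derive(Debug, PartialEq)]
pub enum ScanError {
    InvalidInput(String),
}

pub fn scan_prefix(
    map: &BTreeMap<Vec<u8>, Vec<u8>>,
    prefix: &[u8],
) -> Result<Vec<(Vec<u8>, Vec<u8>)>, ScanError> {
    if prefix.len() < MIN_SCAN_PREFIX_LEN {
        return Err(ScanError::InvalidInput(format!(
            "scan prefix must be at least {MIN_SCAN_PREFIX_LEN} bytes, got {}",
            prefix.len()
        )));
    }
    // Seek to the first key >= prefix, then stop as soon as keys stop matching.
    Ok(map
        .range(prefix.to_vec()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect())
}
```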

pub fn scan_unchecked(
    &self,
    txn: TxnHandle,
    prefix: &[u8],
) -> Result<Vec<(Vec<u8>, Vec<u8>)>>

Scan keys with a prefix without length validation.

§Warning

This method allows empty/short prefixes which can cause expensive full-table scans. Use scan() unless you specifically need unrestricted prefix access for internal operations.

pub fn scan_range(
    &self,
    txn: TxnHandle,
    start: &[u8],
    end: &[u8],
) -> Result<Vec<(Vec<u8>, Vec<u8>)>>

Scan keys in range

pub fn scan_range_iter<'a>(
    &'a self,
    txn: TxnHandle,
    start: &'a [u8],
    end: &'a [u8],
) -> impl Iterator<Item = Result<(Vec<u8>, Vec<u8>)>> + 'a

Streaming scan for very large result sets

Returns an iterator that yields (key, value) pairs without materializing the entire result set. Use this for large scans where memory efficiency is important.

§Performance

Memory: O(1) per iteration vs O(N) for scan_range
Latency: First result available immediately vs waiting for all results
Throughput: Slightly lower due to per-item overhead

§Usage

for result in db.scan_range_iter(txn, b"start", b"end") {
    let (key, value) = result?;
    // Process immediately - no need to wait for all results
}
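The streaming behavior described above can be sketched over a `BTreeMap`: the returned iterator walks the sorted map lazily, so memory stays constant and the first pair is available immediately, at the cost of a clone per item. The function name is hypothetical, and the half-open `[start, end)` bound is an assumption; these docs do not say whether `end` is inclusive.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Lazily yields (key, value) pairs with start <= key < end.
/// Note: BTreeMap::range panics if start > end.
pub fn range_iter<'a>(
    map: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    start: &'a [u8],
    end: &'a [u8],
) -> impl Iterator<Item = (Vec<u8>, Vec<u8>)> + 'a {
    map.range::<[u8], _>((Bound::Included(start), Bound::Excluded(end)))
        // Each next() clones one pair: the "per-item overhead" mentioned above.
        .map(|(k, v)| (k.clone(), v.clone()))
}
```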

pub fn flush(&self) -> Result<()>

Flush the memtable to the WAL/disk

pub fn storage_stats(&self) -> StorageStats

Get storage statistics

pub fn put_path(&self, txn: TxnHandle, path: &str, value: &[u8]) -> Result<()>

Put a value at a path

Path format: “collection/doc_id/field” or “table.row_id.column”. Resolution is O(|path|), not O(log N) as in a B-tree.

pub fn get_path(&self, txn: TxnHandle, path: &str) -> Result<Option<Vec<u8>>>

Get a value at a path
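The O(|path|) resolution claimed for put_path is what a trie keyed by path segments gives: each segment costs one hash probe, independent of how many paths are stored. A sketch assuming a hypothetical `PathTrie`; treating both '/' and '.' as separators is an assumption made to cover the two formats listed for put_path.

```rust
use std::collections::HashMap;

/// Hypothetical path trie: one level per path segment.
#[derive(Default)]
pub struct PathTrie {
    children: HashMap<String, PathTrie>,
    value: Option<Vec<u8>>,
}

impl PathTrie {
    fn segments(path: &str) -> impl Iterator<Item = &str> {
        path.split(|c: char| c == '/' || c == '.').filter(|s| !s.is_empty())
    }

    /// Walk (creating as needed) one node per segment, then store the value.
    pub fn put(&mut self, path: &str, value: &[u8]) {
        let mut node = self;
        for seg in Self::segments(path) {
            node = node.children.entry(seg.to_string()).or_default();
        }
        node.value = Some(value.to_vec());
    }

    /// One hash lookup per segment: O(|path|) regardless of trie size.
    pub fn get(&self, path: &str) -> Option<&[u8]> {
        let mut node = self;
        for seg in Self::segments(path) {
            node = node.children.get(seg)?;
        }
        node.value.as_deref()
    }
}
```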

pub fn delete_path(&self, txn: TxnHandle, path: &str) -> Result<()>

Delete at a path

pub fn scan_path(
    &self,
    txn: TxnHandle,
    prefix: &str,
) -> Result<Vec<(String, Vec<u8>)>>

Scan a path prefix

Returns all key-value pairs whose key starts with prefix. For example, “users/123/” returns all fields of user 123.

pub fn query(&self, txn: TxnHandle, path_prefix: &str) -> QueryBuilder<'_>

Start a path query, returning a QueryBuilder that configures and executes it

This is the main query interface for LLM context retrieval. Supports:

Path prefix matching
Column projection (for I/O reduction)
Limit/offset
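The three features listed above compose naturally as a builder. The sketch below assumes a hypothetical `PathQuery` type (not the real QueryBuilder API, whose methods are not shown here) applied to rows already sorted by path; interpreting projection as "keep rows whose last path segment is a requested column" is also an assumption.

```rust
/// Hypothetical builder: prefix match, column projection, limit/offset.
#[derive(Debug, Default)]
pub struct PathQuery {
    prefix: String,
    columns: Option<Vec<String>>,
    limit: Option<usize>,
    offset: usize,
}

impl PathQuery {
    pub fn new(prefix: &str) -> Self {
        PathQuery { prefix: prefix.to_string(), ..Default::default() }
    }

    pub fn columns(mut self, cols: &[&str]) -> Self {
        self.columns = Some(cols.iter().map(|c| c.to_string()).collect());
        self
    }

    pub fn limit(mut self, n: usize) -> Self {
        self.limit = Some(n);
        self
    }

    pub fn offset(mut self, n: usize) -> Self {
        self.offset = n;
        self
    }

    /// Apply prefix match, then projection, then offset and limit.
    pub fn run<'r>(&self, rows: &'r [(String, Vec<u8>)]) -> Vec<&'r (String, Vec<u8>)> {
        rows.iter()
            .filter(|(path, _)| path.starts_with(&self.prefix))
            .filter(|(path, _)| match &self.columns {
                Some(cols) => path
                    .rsplit('/')
                    .next()
                    .map_or(false, |last| cols.iter().any(|c| c == last)),
                None => true,
            })
            .skip(self.offset)
            .take(self.limit.unwrap_or(usize::MAX))
            .collect()
    }
}
```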

pub fn register_table(&self, schema: TableSchema) -> Result<()>

Register a table schema

pub fn get_table_schema(&self, name: &str) -> Option<TableSchema>

Get table schema

pub fn update_table_schema(
    &self,
    old_name: &str,
    schema: TableSchema,
) -> Result<()>

Update the schema for an existing table (used by ALTER TABLE).

Replaces the schema in both the tables DashMap and the packed schema cache atomically (per-key). The caller is responsible for validating the new schema.

pub fn list_tables(&self) -> Vec<String>

List all tables

pub fn enable_cdc(&mut self, config: CdcConfig) -> Arc<CdcLog>

Enable CDC on this database, returning the CDC log handle.

Subsequent mutations emitted via the SQL execution layer will be recorded in the CDC log for subscriber consumption.
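A change log with subscriber-held offsets is one common way to structure CDC. The sketch assumes hypothetical `Change` and `ChangeLog` types (the real `CdcLog` record type and methods are not shown in these docs): the write path appends, and each subscriber polls for everything since the offset it last saw.

```rust
use std::sync::Mutex;

/// Hypothetical change event.
#[derive(Clone, Debug, PartialEq)]
pub enum Change {
    Insert { table: String, row_id: u64 },
    Delete { table: String, row_id: u64 },
}

/// Append-only log; subscribers track their own read offsets.
#[derive(Default)]
pub struct ChangeLog {
    entries: Mutex<Vec<Change>>,
}

impl ChangeLog {
    /// Called by the write path for each mutation.
    pub fn record(&self, change: Change) {
        self.entries.lock().unwrap().push(change);
    }

    /// Return the changes at or after `offset` and the offset to resume from.
    pub fn read_from(&self, offset: usize) -> (Vec<Change>, usize) {
        let entries = self.entries.lock().unwrap();
        let start = offset.min(entries.len());
        (entries[start..].to_vec(), entries.len())
    }
}
```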

pub fn cdc_log(&self) -> Option<&Arc<CdcLog>>

Get the CDC log handle, if CDC is enabled.

pub fn insert_row(
    &self,
    txn: TxnHandle,
    table: &str,
    row_id: u64,
    values: &HashMap<String, SochValue>,
) -> Result<()>

Insert a row into a table

Uses packed row format: stores entire row as single key-value pair. This reduces write amplification from 4× to 1× for a 4-column table.

§Performance

Before: 4 columns × (WAL entry + MVCC version) = 4 writes
After: 1 packed row = 1 write
Improvement: ~4× fewer WAL entries, ~48% less I/O overhead
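A sketch of a packed-row codec shows why one row becomes one write: all columns are serialized, in schema order, into a single value. The layout here (a 1-byte presence flag, then a little-endian u32 length and the bytes) is an assumption for illustration, not SochDB's actual format, and `pack_row`/`unpack_row` are hypothetical names.

```rust
use std::collections::HashMap;

/// Encode every column of a row into ONE value, in schema order.
pub fn pack_row(schema: &[&str], values: &HashMap<String, Vec<u8>>) -> Vec<u8> {
    let mut out = Vec::new();
    for col in schema {
        match values.get(*col) {
            Some(v) => {
                out.push(1u8);
                out.extend_from_slice(&(v.len() as u32).to_le_bytes());
                out.extend_from_slice(v);
            }
            None => out.push(0u8), // NULL: flag only, no payload
        }
    }
    out
}

/// Inverse of `pack_row`. Panics on truncated input (a sketch, not a validator).
pub fn unpack_row(schema: &[&str], mut bytes: &[u8]) -> HashMap<String, Vec<u8>> {
    let mut row = HashMap::new();
    for col in schema {
        let (&flag, rest) = bytes.split_first().expect("truncated row");
        bytes = rest;
        if flag == 1 {
            let len = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
            row.insert(col.to_string(), bytes[4..4 + len].to_vec());
            bytes = &bytes[4 + len..];
        }
    }
    row
}
```

Reading a row then means decoding the whole value and keeping only the requested columns, which matches the in-memory projection described for read_row.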

pub fn read_row(
    &self,
    txn: TxnHandle,
    table: &str,
    row_id: u64,
    columns: Option<&[&str]>,
) -> Result<Option<HashMap<String, SochValue>>>

Read a row from a table

Reads the packed row and extracts the requested columns in O(k) time. Column projection happens in memory, not in storage: all columns are fetched.

pub fn insert_rows_batch(
    &self,
    txn: TxnHandle,
    table: &str,
    rows: &[(u64, HashMap<String, SochValue>)],
) -> Result<usize>

Insert multiple rows efficiently in a batch

This method accumulates all rows and writes them with fewer WAL syncs. Ideal for bulk loading scenarios.

§Performance

Uses group commit to batch fsync operations
Expected throughput: 500K-1M rows/sec depending on row size

pub fn put_raw(&self, txn: TxnHandle, key: &[u8], value: &[u8]) -> Result<()>

Ultra-fast raw put - bypasses all validation

Use when you’ve already validated the data and just need speed. This is ~10× faster than insert_row() for bulk inserts.

pub fn insert_row_slice(
    &self,
    txn: TxnHandle,
    table: &str,
    row_id: u64,
    values: &[Option<&SochValue>],
) -> Result<()>

Zero-allocation insert - fastest path for bulk inserts

Takes values as a slice in schema column order, avoiding HashMap overhead.

§Arguments

txn - Transaction handle
table - Table name
row_id - Row identifier
values - Values in schema column order (None = NULL)

§Performance

Eliminates ~6 allocations per row vs insert_row()
Expected: 1.2M-1.5M inserts/sec

§Example

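// Bind the values first so the borrowed references outlive the slice
let id = SochValue::Int(1);
let name = SochValue::Text("Alice".into());
let values: &[Option<&SochValue>] = &[
    Some(&id),
    Some(&name),
    None, // NULL
];
db.insert_row_slice(txn, "users", 1, values)?;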

pub fn fsync(&self) -> Result<()>

Force fsync to disk

pub fn checkpoint(&self) -> Result<u64>

Create a checkpoint

pub fn truncate_wal(&self) -> Result<()>

Truncate the WAL file after a checkpoint.

See DurableStorage::truncate_wal for safety notes.

pub fn gc(&self) -> usize

Run garbage collection

pub fn stats(&self) -> Stats

Get database statistics

pub fn shutdown(&self) -> Result<()>

Shutdown the database gracefully

Trait Implementations§

impl Drop for Database

fn drop(&mut self)

Executes the destructor for this type.

fn pin_drop(self: Pin<&mut Self>)

🔬This is a nightly-only experimental API. (pin_ergonomics)

Execute the destructor for this type, but unlike Drop::drop, it requires self to be pinned.

Auto Trait Implementations§

impl UnsafeUnpin for Database

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.