Struct Database 

pub struct Database { /* private fields */ }

The SochDB Database Kernel

This is the shared core used by both embedded (SochConnection) and server (sochdb-server) modes. It owns all storage, catalog, and indexing components.

§Thread Safety

The Database is fully thread-safe via internal synchronization:

  • Multiple readers can operate concurrently (MVCC snapshots)
  • Writers coordinate through WAL and group commit
  • All state is behind Arc/RwLock for shared access
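
The Arc/RwLock pattern named above can be illustrated with the standard library alone. This is a minimal sketch, not SochDB's internals: the `HashMap` store and the helper `read_from_threads` are hypothetical stand-ins that only show how several threads share one handle and read concurrently.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical shared store: a plain map behind Arc<RwLock<..>>.
type SharedStore = Arc<RwLock<HashMap<String, String>>>;

/// Spawn `readers` threads that each take a read lock and look up `key`.
/// Read locks do not exclude each other, so the lookups can overlap.
fn read_from_threads(shared: SharedStore, key: &str, readers: usize) -> Vec<Option<String>> {
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let store = Arc::clone(&shared); // cheap handle clone, same data
            let key = key.to_string();
            thread::spawn(move || {
                let guard = store.read().unwrap();
                guard.get(&key).cloned()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

/// Writers take the exclusive lock for the duration of the update.
fn write_value(shared: &SharedStore, key: &str, value: &str) {
    shared.write().unwrap().insert(key.to_string(), value.to_string());
}
```

Cloning the `Arc` is what lets each thread (or connection) hold its own handle, which matches the `Arc<Database>` returned by the open functions.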

§Concurrency Modes

§Standard Mode (Single Process)

  • Uses an exclusive file lock (flock(LOCK_EX))
  • Best for: Scripts, notebooks, CLI tools
  • Open with: Database::open(path)

§Concurrent Mode (Multi-Process/Web Apps)

  • Uses lock-free MVCC for reads, single-writer coordination for writes
  • Best for: Web servers, Flask/FastAPI apps, hot reloading
  • Open with: Database::open_concurrent(path)

§Example

// Standard mode (single process); takes an exclusive file lock
let db = Database::open("./my_data")?;

// Alternatively, concurrent mode (multi-reader, single-writer).
// Open a given path in one mode or the other:
// let db = Database::open_concurrent("./my_data")?;

// Begin a transaction
let txn = db.begin_transaction()?;

// Write data
db.put(txn, b"user:1:name", b"Alice")?;

// Commit
db.commit(txn)?;

Implementations§

impl Database

pub const MIN_SCAN_PREFIX_LEN: usize = 2

Minimum prefix length for scan operations. Prevents expensive full-table scans by requiring a meaningful prefix.

pub fn open<P: AsRef<Path>>(path: P) -> Result<Arc<Self>>

Open or create a database at the given path.

This is the primary entry point, similar to sqlite3_open(). If the database exists, it will be opened and WAL recovery performed. If it doesn’t exist, a new database will be created.

§Arguments

  • path - Directory path for the database files

§Returns

An Arc<Database> that can be shared across threads and connections.

pub fn open_with_config<P: AsRef<Path>>( path: P, config: DatabaseConfig, ) -> Result<Arc<Self>>

Open with custom configuration

pub fn open_concurrent<P: AsRef<Path>>(path: P) -> Result<Arc<Self>>

Open database in concurrent mode (multi-reader, single-writer)

This mode allows multiple processes to access the database simultaneously:

  • Readers: Lock-free, concurrent access via MVCC snapshots
  • Writers: Single-writer coordination through atomic locks

§Use Cases

  • Web applications (Flask, FastAPI, Django)
  • Hot reloading development servers
  • Multi-process worker pools
  • Any scenario with concurrent read access

§Performance

  • Read latency: ~100ns (lock-free atomic operations)
  • Write latency: ~60μs amortized (with group commit)
  • Concurrent readers: Up to 1024 (configurable)

§Example

// Multiple processes can open the same database
let db = Database::open_concurrent("./my_data")?;

// Reads are lock-free; get() takes a transaction handle
let txn = db.begin_transaction()?;
let value = db.get(txn, b"key")?;

// Writes coordinate automatically
db.put(txn, b"key", b"value")?;
db.commit(txn)?;

pub fn open_concurrent_with_config<P: AsRef<Path>>( path: P, config: DatabaseConfig, ) -> Result<Arc<Self>>

Open database in concurrent mode with custom configuration

pub fn is_concurrent(&self) -> bool

Check if database is in concurrent mode

pub fn path(&self) -> &Path

Get database path

pub fn begin_transaction(&self) -> Result<TxnHandle>

Begin a new transaction

pub fn begin_read_only(&self) -> Result<TxnHandle>

Begin a read-only transaction (optimized: no SSI tracking)

Read-only transactions skip serializable snapshot isolation (SSI) read tracking, reducing overhead from ~82ns to ~32ns per read (about 2.6x faster).

Use this for:

  • SELECT queries that don’t modify data
  • Analytics and reporting queries
  • Snapshot reads for backup

pub fn begin_read_only_fast(&self) -> TxnHandle

Begin a lightweight read-only transaction (no WAL overhead).

Eliminates WAL mutex acquisitions entirely for read operations. The txn_id is allocated atomically and MVCC snapshot state is created, but NO WAL records are written (no TxnBegin, no TxnAbort).

~5-10x faster per-operation than begin_read_only() because it avoids:

  • 2 WAL mutex lock/unlock cycles per transaction
  • 2 WAL BufWriter serializations per transaction

Callers MUST use abort_read_only_fast() to clean up — NOT commit() or abort().

pub fn abort_read_only_fast(&self, txn: TxnHandle)

Abort a fast read-only transaction — O(1), no WAL, no memtable scan.
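
Because a fast read-only transaction must be closed with abort_read_only_fast() and nothing else, one way to make the pairing hard to forget is a scope guard. The sketch below is an illustration under assumptions, not part of the SochDB API: the `FastReadOnly` trait, `FastReadGuard`, and `MockDb` are hypothetical, and the handle is assumed to be `Copy` (the examples above pass `txn` by value several times, which suggests it).

```rust
use std::cell::Cell;

/// Hypothetical trait capturing only the two documented calls.
trait FastReadOnly {
    type Txn: Copy; // assumption: the handle is a small Copy value
    fn begin_read_only_fast(&self) -> Self::Txn;
    fn abort_read_only_fast(&self, txn: Self::Txn);
}

/// Guard that begins a fast read-only transaction and always
/// cleans it up with abort_read_only_fast() when dropped.
struct FastReadGuard<'a, D: FastReadOnly> {
    db: &'a D,
    txn: D::Txn,
}

impl<'a, D: FastReadOnly> FastReadGuard<'a, D> {
    fn begin(db: &'a D) -> Self {
        FastReadGuard { db, txn: db.begin_read_only_fast() }
    }

    /// Handle to pass to read calls while the guard is alive.
    fn txn(&self) -> D::Txn {
        self.txn
    }
}

impl<'a, D: FastReadOnly> Drop for FastReadGuard<'a, D> {
    fn drop(&mut self) {
        self.db.abort_read_only_fast(self.txn);
    }
}

/// Mock database that only counts open fast transactions.
struct MockDb {
    next_id: Cell<u64>,
    active: Cell<usize>,
}

impl FastReadOnly for MockDb {
    type Txn = u64;

    fn begin_read_only_fast(&self) -> u64 {
        let id = self.next_id.get();
        self.next_id.set(id + 1);
        self.active.set(self.active.get() + 1);
        id
    }

    fn abort_read_only_fast(&self, _txn: u64) {
        self.active.set(self.active.get() - 1);
    }
}
```

With a guard like this, early returns and `?` propagation inside the scope still run the cleanup, since `Drop` fires on every exit path.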

pub fn get_raw_read(&self, key: &[u8]) -> Option<Vec<u8>>

Read a key WITHOUT any MVCC transaction tracking.

Uses the current global timestamp to see all committed writes. Bypasses: begin/abort, active_txns DashMap, record_read, stats. Only safe for single-threaded access with no concurrent writers.

pub fn scan_raw(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>

Scan by prefix WITHOUT any MVCC transaction tracking.

Uses the current global timestamp. Only safe for single-threaded access.

pub fn begin_write_only(&self) -> Result<TxnHandle>

Begin a write-only transaction (optimized: no read tracking)

Write-only transactions skip read tracking, improving insert throughput for bulk loading scenarios.

Use this for:

  • Bulk data imports
  • Append-only logging tables
  • ETL pipelines

pub fn commit(&self, txn: TxnHandle) -> Result<u64>

Commit a transaction

In concurrent mode, acquires the shared writer lock to ensure WAL writes are serialized across processes, and forces a flush+sync so that subsequent processes see the committed data.

pub fn abort(&self, txn: TxnHandle) -> Result<()>

Abort a transaction

pub fn set_table_index_policy(&self, table: &str, policy: IndexPolicy)

Configure index policy for a table

This allows fine-grained control over write/scan trade-offs per table:

Policy          Insert Cost      Scan Cost           Use Case
WriteOptimized  O(1)             O(N)                High-write, rare scan
Balanced        O(1) amortized   O(output + log K)   Mixed OLTP
ScanOptimized   O(log N)         O(log N + K)        Analytics, range query
AppendOnly      O(1)             O(N)                Time-series logs

§Example

// Fast inserts for logs table (no ordered index overhead)
db.set_table_index_policy("logs", IndexPolicy::WriteOptimized);

// Efficient range scans for analytics table
db.set_table_index_policy("analytics", IndexPolicy::ScanOptimized);

// Balanced for OLTP tables
db.set_table_index_policy("users", IndexPolicy::Balanced);

pub fn get_table_index_policy(&self, table: &str) -> IndexPolicy

Get the index policy for a table

pub fn index_registry(&self) -> &Arc<TableIndexRegistry>

Get the index registry for advanced configuration

pub fn put(&self, txn: TxnHandle, key: &[u8], value: &[u8]) -> Result<()>

Put a key-value pair

In concurrent mode, acquires the shared writer lock to ensure WAL writes are serialized across processes.

pub fn put_batch(&self, txn: TxnHandle, writes: &[(&[u8], &[u8])]) -> Result<()>

Batch put multiple key-value pairs with reduced overhead

This amortizes per-operation costs over the entire batch:

  • Single DashMap lookup
  • Batch MVCC tracking
  • Batch memtable writes

For 100+ entries, this is 2-3x faster than individual puts.

§Example

let writes: Vec<(&[u8], &[u8])> = vec![
    (b"key1", b"value1"),
    (b"key2", b"value2"),
    (b"key3", b"value3"),
];
db.put_batch(txn, &writes)?;
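
The amortization idea can be shown with a toy store that counts how often it pays its fixed per-call cost. This is a sketch under assumptions: `ToyStore` and its lock counter are hypothetical, and a single mutex acquisition stands in for the per-operation overhead (lookup, tracking) that the real put_batch is described as paying once per batch.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical store; the counter models fixed per-call overhead.
struct ToyStore {
    map: Mutex<BTreeMap<Vec<u8>, Vec<u8>>>,
    lock_acquisitions: AtomicUsize,
}

impl ToyStore {
    fn new() -> Self {
        ToyStore {
            map: Mutex::new(BTreeMap::new()),
            lock_acquisitions: AtomicUsize::new(0),
        }
    }

    /// One lock acquisition per key-value pair.
    fn put(&self, key: &[u8], value: &[u8]) {
        self.lock_acquisitions.fetch_add(1, Ordering::Relaxed);
        self.map.lock().unwrap().insert(key.to_vec(), value.to_vec());
    }

    /// One lock acquisition for the whole batch.
    fn put_batch(&self, writes: &[(&[u8], &[u8])]) {
        self.lock_acquisitions.fetch_add(1, Ordering::Relaxed);
        let mut map = self.map.lock().unwrap();
        for (key, value) in writes {
            map.insert(key.to_vec(), value.to_vec());
        }
    }

    fn len(&self) -> usize {
        self.map.lock().unwrap().len()
    }

    fn acquisitions(&self) -> usize {
        self.lock_acquisitions.load(Ordering::Relaxed)
    }
}
```

Three individual puts pay the fixed cost three times; one batch of three pays it once, which is the effect the batch API is built around.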

pub fn get(&self, txn: TxnHandle, key: &[u8]) -> Result<Option<Vec<u8>>>

Get a value by key

pub fn delete(&self, txn: TxnHandle, key: &[u8]) -> Result<()>

Delete a key

pub fn scan( &self, txn: TxnHandle, prefix: &[u8], ) -> Result<Vec<(Vec<u8>, Vec<u8>)>>

Scan keys with a prefix (enforces minimum prefix length for safety).

§Prefix Safety

To prevent accidental full-table scans, this method requires a minimum prefix length of 2 bytes. Use scan_unchecked for internal operations that need empty/short prefixes.

§Errors

Returns SochDBError::InvalidInput if prefix is too short.
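
The prefix check described above amounts to a length comparison against MIN_SCAN_PREFIX_LEN. The following is a minimal sketch, assuming a hypothetical `ScanError` enum and `check_scan_prefix` helper; the real error type is SochDBError::InvalidInput and its payload is not shown in this page.

```rust
/// Same value as Database::MIN_SCAN_PREFIX_LEN documented above.
const MIN_SCAN_PREFIX_LEN: usize = 2;

/// Hypothetical stand-in for SochDBError::InvalidInput.
#[derive(Debug, PartialEq)]
enum ScanError {
    InvalidInput(String),
}

/// Reject prefixes that would match too much of the keyspace.
fn check_scan_prefix(prefix: &[u8]) -> Result<(), ScanError> {
    if prefix.len() < MIN_SCAN_PREFIX_LEN {
        return Err(ScanError::InvalidInput(format!(
            "scan prefix must be at least {} bytes, got {}",
            MIN_SCAN_PREFIX_LEN,
            prefix.len()
        )));
    }
    Ok(())
}
```

An empty prefix or a one-byte prefix such as b"u" is rejected, while b"us" or b"user:" passes.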

pub fn scan_unchecked( &self, txn: TxnHandle, prefix: &[u8], ) -> Result<Vec<(Vec<u8>, Vec<u8>)>>

Scan keys with a prefix without length validation.

§Warning

This method allows empty/short prefixes which can cause expensive full-table scans. Use scan() unless you specifically need unrestricted prefix access for internal operations.

pub fn scan_range( &self, txn: TxnHandle, start: &[u8], end: &[u8], ) -> Result<Vec<(Vec<u8>, Vec<u8>)>>

Scan keys in range

pub fn scan_range_iter<'a>( &'a self, txn: TxnHandle, start: &'a [u8], end: &'a [u8], ) -> impl Iterator<Item = Result<(Vec<u8>, Vec<u8>)>> + 'a

Streaming scan for very large result sets

Returns an iterator that yields (key, value) pairs without materializing the entire result set. Use this for large scans where memory efficiency is important.

§Performance

  • Memory: O(1) per iteration vs O(N) for scan_range
  • Latency: First result available immediately vs waiting for all results
  • Throughput: Slightly lower due to per-item overhead

§Usage

for result in db.scan_range_iter(txn, b"start", b"end") {
    let (key, value) = result?;
    // Process immediately - no need to wait for all results
}
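
The difference between the materializing and streaming forms can be sketched over a standard `BTreeMap`. This is illustrative only: the `Store` alias is hypothetical, and the half-open range `[start, end)` is an assumption, since this page does not say whether `end` is inclusive.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical ordered store standing in for the database.
type Store = BTreeMap<Vec<u8>, Vec<u8>>;

/// Streaming form: yields one pair at a time, nothing is buffered.
/// Assumes start <= end and treats the range as [start, end).
fn scan_range_iter<'a>(
    store: &'a Store,
    start: &'a [u8],
    end: &'a [u8],
) -> impl Iterator<Item = (Vec<u8>, Vec<u8>)> + 'a {
    store
        .range::<[u8], _>((Bound::Included(start), Bound::Excluded(end)))
        .map(|(key, value)| (key.clone(), value.clone()))
}

/// Materializing form: collects every match before returning,
/// so memory grows with the number of results.
fn scan_range(store: &Store, start: &[u8], end: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
    scan_range_iter(store, start, end).collect()
}
```

A caller that only needs the first few matches can stop the iterator early, which is where the streaming form saves both memory and time.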

pub fn flush(&self) -> Result<()>

Flush memtable to WAL/Disk

pub fn storage_stats(&self) -> StorageStats

Get storage statistics

pub fn put_path(&self, txn: TxnHandle, path: &str, value: &[u8]) -> Result<()>

Put a value at a path

Path format: “collection/doc_id/field” or “table.row_id.column”. Resolution is O(|path|), in contrast to the O(log N) lookup of a B-tree.
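
A lookup cost that depends on path length rather than on the number of stored keys is the characteristic behavior of a trie. The sketch below assumes a trie keyed by `/`-separated segments; `PathTrie` is hypothetical, and this page does not state which structure SochDB actually uses.

```rust
use std::collections::HashMap;

/// Hypothetical segment trie: one node per path segment.
#[derive(Default)]
struct PathTrie {
    children: HashMap<String, PathTrie>,
    value: Option<Vec<u8>>,
}

impl PathTrie {
    /// Walk (and create) one node per segment, then store the value.
    fn put(&mut self, path: &str, value: &[u8]) {
        let mut node = self;
        for segment in path.split('/') {
            node = node.children.entry(segment.to_string()).or_default();
        }
        node.value = Some(value.to_vec());
    }

    /// Walk one node per segment; work is proportional to the path
    /// length, independent of how many other paths are stored.
    fn get(&self, path: &str) -> Option<&[u8]> {
        let mut node = self;
        for segment in path.split('/') {
            node = node.children.get(segment)?;
        }
        node.value.as_deref()
    }
}
```

Each step hashes one segment, so the total work is bounded by the bytes in the path.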

pub fn get_path(&self, txn: TxnHandle, path: &str) -> Result<Option<Vec<u8>>>

Get a value at a path

pub fn delete_path(&self, txn: TxnHandle, path: &str) -> Result<()>

Delete at a path

pub fn scan_path( &self, txn: TxnHandle, prefix: &str, ) -> Result<Vec<(String, Vec<u8>)>>

Scan a path prefix

Returns all key-value pairs whose key starts with prefix. For example, the prefix “users/123/” returns all fields of user 123.
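
Prefix matching over ordered keys can be sketched with a `BTreeMap`: seek to the first key at or after the prefix, then stop at the first key that no longer starts with it. The free function `scan_path` and the map-based store here are hypothetical stand-ins, not the SochDB implementation.

```rust
use std::collections::BTreeMap;

/// Return every (path, value) pair whose path starts with `prefix`,
/// in key order. Keys sharing a prefix are contiguous in a sorted map,
/// so the scan can stop at the first non-matching key.
fn scan_path(store: &BTreeMap<String, Vec<u8>>, prefix: &str) -> Vec<(String, Vec<u8>)> {
    store
        .range(prefix.to_string()..)
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect()
}
```

Note that the trailing slash matters: “users/123/” excludes “users/1234/name”, while “users/123” would include it.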

pub fn query(&self, txn: TxnHandle, path_prefix: &str) -> QueryBuilder<'_>

Execute a path query and return results

This is the main query interface for LLM context retrieval. Supports:

  • Path prefix matching
  • Column projection (for I/O reduction)
  • Limit/offset

pub fn register_table(&self, schema: TableSchema) -> Result<()>

Register a table schema

pub fn get_table_schema(&self, name: &str) -> Option<TableSchema>

Get table schema

pub fn update_table_schema( &self, old_name: &str, schema: TableSchema, ) -> Result<()>

Update the schema for an existing table (used by ALTER TABLE).

Replaces the schema in both the tables DashMap and the packed schema cache atomically (per-key). The caller is responsible for validating the new schema.

pub fn list_tables(&self) -> Vec<String>

List all tables

pub fn enable_cdc(&mut self, config: CdcConfig) -> Arc<CdcLog>

Enable CDC on this database, returning the CDC log handle.

Subsequent mutations emitted via the SQL execution layer will be recorded in the CDC log for subscriber consumption.

pub fn cdc_log(&self) -> Option<&Arc<CdcLog>>

Get the CDC log handle, if CDC is enabled.

pub fn insert_row( &self, txn: TxnHandle, table: &str, row_id: u64, values: &HashMap<String, SochValue>, ) -> Result<()>

Insert a row into a table

Uses packed row format: stores entire row as single key-value pair. This reduces write amplification from 4× to 1× for a 4-column table.

§Performance

  • Before: 4 columns × (WAL entry + MVCC version) = 4 writes
  • After: 1 packed row = 1 write
  • Improvement: ~4× fewer WAL entries, ~48% less I/O overhead
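
The one-write-per-row idea can be sketched by serializing all columns into a single buffer that is then stored under one key. The encoding below (a one-byte tag followed by a little-endian payload) and the `Value` enum are assumptions made for illustration; SochDB's actual packed row format and SochValue variants are not described on this page.

```rust
/// Hypothetical value type standing in for SochValue.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Int(i64),
    Text(String),
}

/// Pack every column of a row into one buffer (one key-value write).
fn pack_row(values: &[Value]) -> Vec<u8> {
    let mut out = Vec::new();
    for value in values {
        match value {
            Value::Null => out.push(0),
            Value::Int(i) => {
                out.push(1);
                out.extend_from_slice(&i.to_le_bytes());
            }
            Value::Text(s) => {
                out.push(2);
                out.extend_from_slice(&(s.len() as u32).to_le_bytes());
                out.extend_from_slice(s.as_bytes());
            }
        }
    }
    out
}

/// Decode a packed row; returns None on truncated or unknown input.
fn unpack_row(mut buf: &[u8]) -> Option<Vec<Value>> {
    let mut values = Vec::new();
    while let Some((&tag, rest)) = buf.split_first() {
        buf = rest;
        match tag {
            0 => values.push(Value::Null),
            1 => {
                if buf.len() < 8 {
                    return None;
                }
                let (head, tail) = buf.split_at(8);
                let mut raw = [0u8; 8];
                raw.copy_from_slice(head);
                values.push(Value::Int(i64::from_le_bytes(raw)));
                buf = tail;
            }
            2 => {
                if buf.len() < 4 {
                    return None;
                }
                let (head, tail) = buf.split_at(4);
                let mut raw = [0u8; 4];
                raw.copy_from_slice(head);
                let len = u32::from_le_bytes(raw) as usize;
                if tail.len() < len {
                    return None;
                }
                let (text, tail) = tail.split_at(len);
                values.push(Value::Text(String::from_utf8(text.to_vec()).ok()?));
                buf = tail;
            }
            _ => return None,
        }
    }
    Some(values)
}
```

A four-column row becomes one buffer and therefore one WAL entry, instead of one entry per column. Reading a subset of columns still decodes from the full buffer, which is consistent with projection happening in memory.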

pub fn read_row( &self, txn: TxnHandle, table: &str, row_id: u64, columns: Option<&[&str]>, ) -> Result<Option<HashMap<String, SochValue>>>

Read a row from a table

Reads packed row and extracts requested columns in O(k) time. Column projection happens in memory, not storage - all columns are fetched.

pub fn insert_rows_batch( &self, txn: TxnHandle, table: &str, rows: &[(u64, HashMap<String, SochValue>)], ) -> Result<usize>

Insert multiple rows efficiently in a batch

This method accumulates all rows and writes them with fewer WAL syncs. Ideal for bulk loading scenarios.

§Performance

  • Uses group commit to batch fsync operations
  • Expected throughput: 500K-1M rows/sec depending on row size

pub fn put_raw(&self, txn: TxnHandle, key: &[u8], value: &[u8]) -> Result<()>

Ultra-fast raw put - bypasses all validation

Use when you’ve already validated the data and just need speed. This is ~10× faster than insert_row() for bulk inserts.

pub fn insert_row_slice( &self, txn: TxnHandle, table: &str, row_id: u64, values: &[Option<&SochValue>], ) -> Result<()>

Zero-allocation insert - fastest path for bulk inserts

Takes values as a slice in schema column order, avoiding HashMap overhead.

§Arguments

  • txn - Transaction handle
  • table - Table name
  • row_id - Row identifier
  • values - Values in schema column order (None = NULL)

§Performance

  • Eliminates ~6 allocations per row vs insert_row()
  • Expected: 1.2M-1.5M inserts/sec

§Example

let values: &[Option<&SochValue>] = &[
    Some(&SochValue::Int(1)),
    Some(&SochValue::Text("Alice".into())),
    None, // NULL
];
db.insert_row_slice(txn, "users", 1, values)?;

pub fn fsync(&self) -> Result<()>

Force fsync to disk

pub fn checkpoint(&self) -> Result<u64>

Create a checkpoint

pub fn truncate_wal(&self) -> Result<()>

Truncate the WAL file after a checkpoint.

See DurableStorage::truncate_wal for safety notes.

pub fn gc(&self) -> usize

Run garbage collection

pub fn stats(&self) -> Stats

Get database statistics

pub fn shutdown(&self) -> Result<()>

Shutdown the database gracefully

Trait Implementations§

impl Drop for Database

fn drop(&mut self)

Executes the destructor for this type.

fn pin_drop(self: Pin<&mut Self>)

This is a nightly-only experimental API (pin_ergonomics).
Executes the destructor for this type. Unlike Drop::drop, it requires self to be pinned.
