featherdb-core 1.0.0

Core types, errors, and configuration for FeatherDB

FeatherDB Core

Core types, traits, and utilities shared across all FeatherDB crates.

Overview

This crate provides the foundational types that all other FeatherDB components depend on. It keeps its own dependencies minimal and sits at the bottom of the workspace dependency graph, which rules out circular dependencies.

Components

1. Value Types (value.rs)

The fundamental data type for all database values.

pub enum Value {
    Null,
    Boolean(bool),
    Integer(i64),
    Real(f64),
    Text(String),
    Blob(Vec<u8>),
    Timestamp(i64),  // Unix milliseconds
}

Capabilities:

  • Type conversions (as_bool(), as_i64(), as_f64(), as_str(), as_bytes())
  • Serialization/deserialization to bytes
  • Ordering and comparison (implements Ord)
  • Hashing (implements Hash)
  • Display formatting
  • Memory size estimation for GC tracking

Column Types:

pub enum ColumnType {
    Boolean,
    Integer,
    Real,
    Text { max_len: Option<usize> },
    Blob { max_len: Option<usize> },
    Timestamp,
}

Type Compatibility and Coercion:

// Check if a value is compatible with a column type
let text_type = ColumnType::Text { max_len: Some(100) };
assert!(text_type.is_compatible(&Value::Text("hello".into())));
assert!(text_type.is_compatible(&Value::Null));  // NULL compatible with any type

// Coerce values between types
let int_type = ColumnType::Integer;
let coerced = int_type.coerce(Value::Real(3.7))?;  // Value::Integer(3)
let coerced = int_type.coerce(Value::Boolean(true))?;  // Value::Integer(1)

let text_type = ColumnType::Text { max_len: None };
let coerced = text_type.coerce(Value::Integer(42))?;  // Value::Text("42")

Supported Coercions:

From           To          Result
Integer        Boolean     0 = false, non-zero = true
Real           Integer     Truncates decimal
Boolean        Integer     false = 0, true = 1
Integer        Real        Exact conversion
Integer/Real   Text        String representation
Integer        Timestamp   Unix milliseconds
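
The coercion rules in the table can be sketched as a single match. This is a minimal illustration, not the crate's source: the Value and ColumnType enums below are local stand-ins covering a subset of the documented variants (Text carries no max_len here), and the free function coerce with a String error type is an assumption made to keep the example self-contained.

```rust
// Local stand-ins that mirror a subset of the documented types.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Boolean(bool),
    Integer(i64),
    Real(f64),
    Text(String),
    Timestamp(i64),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ColumnType {
    Boolean,
    Integer,
    Real,
    Text,
    Timestamp,
}

/// Column type that a non-NULL value already has.
fn type_of(value: &Value) -> Option<ColumnType> {
    match value {
        Value::Null => None,
        Value::Boolean(_) => Some(ColumnType::Boolean),
        Value::Integer(_) => Some(ColumnType::Integer),
        Value::Real(_) => Some(ColumnType::Real),
        Value::Text(_) => Some(ColumnType::Text),
        Value::Timestamp(_) => Some(ColumnType::Timestamp),
    }
}

/// Hypothetical free-function version of `ColumnType::coerce`.
pub fn coerce(target: ColumnType, value: Value) -> Result<Value, String> {
    match (target, value) {
        // NULL is compatible with every column type.
        (_, Value::Null) => Ok(Value::Null),
        // The value already has the target type.
        (t, v) if type_of(&v) == Some(t) => Ok(v),
        // Rows of the coercion table.
        (ColumnType::Boolean, Value::Integer(i)) => Ok(Value::Boolean(i != 0)),
        // `as` truncates toward zero (and saturates at the i64 range).
        (ColumnType::Integer, Value::Real(r)) => Ok(Value::Integer(r as i64)),
        (ColumnType::Integer, Value::Boolean(b)) => Ok(Value::Integer(i64::from(b))),
        (ColumnType::Real, Value::Integer(i)) => Ok(Value::Real(i as f64)),
        (ColumnType::Text, Value::Integer(i)) => Ok(Value::Text(i.to_string())),
        (ColumnType::Text, Value::Real(r)) => Ok(Value::Text(r.to_string())),
        (ColumnType::Timestamp, Value::Integer(ms)) => Ok(Value::Timestamp(ms)),
        // Anything not listed in the table is rejected.
        (t, v) => Err(format!("cannot coerce {:?} to {:?}", v, t)),
    }
}
```

Pairs that the table does not list fall through to the final arm and produce an error in this sketch; how the real crate reports them is not shown in this document.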

2. Error Types (error.rs)

Comprehensive error handling with context and intelligent suggestions.

pub enum Error {
    // User-facing errors (actionable)
    TableNotFound { table: String, suggestion: Option<String> },
    ColumnNotFound { column: String, table: String, suggestion: Option<String> },
    AmbiguousColumn { column: String, tables: String },
    IndexNotFound { index: String, table: String },
    SyntaxError { message: String, line: usize, column: usize },
    TypeError { expected: String, actual: String },
    UniqueViolation { table: String, column: String },
    PrimaryKeyViolation { table: String },
    NotNullViolation { table: String, column: String },
    ForeignKeyViolation { details: String },
    TransactionConflict,
    TransactionEnded,
    SavepointNotFound { name: String },
    TableAlreadyExists { table: String },
    IndexAlreadyExists { index: String },
    ReadOnly,
    DatabaseLocked,
    InvalidQuery { message: String },
    Query(QueryError),
    Unsupported { feature: String },

    // Internal errors
    CorruptedPage(PageId),
    CorruptedWal { message: String },
    InvalidPageType { page_id: PageId, page_type: u8 },
    PageNotFound(PageId),
    DoubleFree(PageId),
    BufferPoolFull,
    InvalidDatabaseFile { message: String },
    VersionMismatch { file_version: u32, expected: u32 },
    Internal(String),

    // I/O errors
    Io(std::io::Error),
    FileNotFound { path: String },
    Serialization(String),

    // Compression errors
    CompressionError { message: String },
    DecompressionError { message: String, expected_size: usize },
}

Error Helpers:

// Check if error is retriable (TransactionConflict, DatabaseLocked)
if error.is_retriable() {
    // Retry the operation
}

// Check if user error vs system error
if error.is_user_error() {
    // Show to user
} else {
    // Log and handle internally
}

// Access rich query error context
if let Some(query_err) = error.as_query_error() {
    println!("{}", query_err.format_with_context());
}
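
One way to use is_retriable() is a bounded retry loop. The sketch below is illustrative only: the three-variant Error enum is a local stand-in, and with_retry is a hypothetical helper that is not part of the crate. A production version would probably also back off between attempts.

```rust
// Local stand-in with just enough variants for the example.
#[derive(Debug, PartialEq)]
pub enum Error {
    TransactionConflict,
    DatabaseLocked,
    TableNotFound { table: String },
}

impl Error {
    /// Mirrors the documented rule: only conflicts and lock contention are retriable.
    pub fn is_retriable(&self) -> bool {
        matches!(self, Error::TransactionConflict | Error::DatabaseLocked)
    }
}

/// Hypothetical helper: run `op` up to `max_attempts` times,
/// retrying only when the error reports itself as retriable.
pub fn with_retry<T>(
    max_attempts: usize,
    mut op: impl FnMut() -> Result<T, Error>,
) -> Result<T, Error> {
    let mut attempt = 1;
    loop {
        match op() {
            // Retriable failure with attempts left: go around again.
            Err(e) if e.is_retriable() && attempt < max_attempts => attempt += 1,
            // Success, a non-retriable error, or attempts exhausted.
            other => return other,
        }
    }
}
```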

Rich Query Errors with Context:

pub struct QueryError {
    pub message: String,
    pub sql: String,
    pub line: usize,
    pub column: usize,
    pub suggestion: Option<String>,
    pub help: Option<String>,
    pub span_length: usize,
}

Query errors produce formatted output similar to Rust compiler errors:

Error: Column 'naem' not found
  --> query:1:8
   |
 1 | SELECT naem FROM users
   |        ^^^^
   |
   = suggestion: Did you mean 'name'?
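
To make the layout concrete, here is a rough sketch of how such a report could be rendered from the documented fields. It is not the crate's format_with_context implementation: the free function, the exact gutter widths, and the assumption that column is a 1-based character offset into ASCII SQL are all choices made for this example, and the help field is left out.

```rust
// Local stand-in with the documented fields that the renderer needs.
pub struct QueryError {
    pub message: String,
    pub sql: String,
    pub line: usize,        // 1-based line number (assumed)
    pub column: usize,      // 1-based column (assumed)
    pub suggestion: Option<String>,
    pub span_length: usize, // number of characters to underline
}

/// Hypothetical renderer producing a compiler-style report.
pub fn format_with_context(err: &QueryError) -> String {
    let line_no = err.line.to_string();
    // Blank gutter as wide as the line number, so the bars line up.
    let gutter = " ".repeat(line_no.len());
    let mut out = String::new();
    out.push_str(&format!("Error: {}\n", err.message));
    out.push_str(&format!("  --> query:{}:{}\n", err.line, err.column));
    out.push_str(&format!(" {} |\n", gutter));
    if let Some(src) = err.sql.lines().nth(err.line.saturating_sub(1)) {
        // Pad up to the offending column, then underline the span.
        let pad = " ".repeat(err.column.saturating_sub(1));
        let carets = "^".repeat(err.span_length.max(1));
        out.push_str(&format!(" {} | {}\n", line_no, src));
        out.push_str(&format!(" {} | {}{}\n", gutter, pad, carets));
    }
    out.push_str(&format!(" {} |\n", gutter));
    if let Some(s) = &err.suggestion {
        out.push_str(&format!(" {} = suggestion: {}\n", gutter, s));
    }
    out
}
```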

Levenshtein Distance for Suggestions:

use featherdb_core::{levenshtein_distance, find_best_match, suggest_keyword};

// Calculate edit distance between strings
let distance = levenshtein_distance("naem", "name");  // 2

// Find best match from candidates
let columns = vec!["name", "email", "age", "id"];
let suggestion = find_best_match("naem", &columns, 2);  // Some("name")

// Suggest corrections for SQL keyword typos
let keyword = suggest_keyword("SELEKT");  // Some("SELECT")
let keyword = suggest_keyword("FORM");    // Some("FROM")
let keyword = suggest_keyword("WHRE");    // Some("WHERE")

The suggest_keyword function recognizes more than 100 common misspellings of SQL keywords, including typos of:

  • SELECT, FROM, WHERE, INSERT, UPDATE, DELETE
  • CREATE, TABLE, INDEX, DROP, ALTER
  • JOIN, LEFT, RIGHT, INNER, OUTER
  • ORDER, GROUP, HAVING, LIMIT, OFFSET
  • PRIMARY, FOREIGN, UNIQUE, CONSTRAINT
  • BEGIN, COMMIT, ROLLBACK, TRANSACTION
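
For readers unfamiliar with the technique, the sketch below shows a classic two-row Levenshtein distance and a best-match search built on it. The function names and argument order follow the calls shown earlier in this section, but the bodies are an independent illustration; the crate's own implementation may differ, for example in case handling or tie-breaking.

```rust
/// Classic Levenshtein edit distance (insert, delete, substitute; each costs 1).
pub fn levenshtein_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost) // substitute or match
                .min(prev[j + 1] + 1)   // delete from `a`
                .min(curr[j] + 1);      // insert into `a`
            curr.push(best);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Closest candidate within `max_distance`, or None if nothing is close enough.
/// On ties, the earliest candidate wins (an assumption of this sketch).
pub fn find_best_match<'a>(
    input: &str,
    candidates: &[&'a str],
    max_distance: usize,
) -> Option<&'a str> {
    candidates
        .iter()
        .map(|c| (levenshtein_distance(input, c), *c))
        .filter(|(d, _)| *d <= max_distance)
        .min_by_key(|(d, _)| *d)
        .map(|(_, c)| c)
}
```

A swapped pair of letters such as "naem" for "name" counts as two substitutions under plain Levenshtein distance, which is why a threshold of 2 is needed to catch transposition typos.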

3. Configuration (lib.rs)

Database configuration with builder pattern.

pub struct Config {
    pub page_size: usize,           // Default: 4096
    pub buffer_pool_pages: usize,   // Default: 16384 (64MB with 4KB pages)
    pub path: PathBuf,
    pub create_if_missing: bool,    // Default: true
    pub enable_wal: bool,           // Default: true
    pub sync_on_commit: bool,       // Default: true (deprecated, use wal_config)
    pub encryption: EncryptionConfig,
    pub wal_config: WalGroupCommitConfig,
    pub compression: CompressionConfig,
    pub eviction_policy: EvictionPolicyType,
}
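
The 64MB figure for the default buffer pool follows from 16384 pages of 4096 bytes each. The helper below is hypothetical and only shows the arithmetic that a size-in-megabytes setter would presumably perform; the rounding-down behavior is an assumption.

```rust
/// Hypothetical helper: number of whole pages that fit in `size_mb` mebibytes.
pub fn buffer_pool_pages(size_mb: usize, page_size: usize) -> usize {
    // Integer division rounds down, so a partial page is never counted.
    size_mb * 1024 * 1024 / page_size
}
```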

Encryption Configuration:

pub enum EncryptionConfig {
    None,
    Password(String),      // Derives key using Argon2id
    Key([u8; 32]),         // Direct 256-bit key
}

WAL Sync Modes:

pub enum WalSyncMode {
    Immediate,    // Sync on every commit (safest, slowest) - default
    GroupCommit,  // Batch commits into single fsync (2-5x faster)
    NoSync,       // No sync - data loss on crash (fastest)
}

pub struct WalGroupCommitConfig {
    pub sync_mode: WalSyncMode,
    pub group_commit_interval_ms: u64,   // Default: 10ms
    pub group_commit_max_batch: usize,   // Default: 1000 records
}
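
The two group-commit settings suggest a flush rule of the form "sync when the batch is full or the interval has elapsed". The sketch below models that rule with a hypothetical GroupCommitBatcher type; it is an assumption about how the parameters interact, not a description of the crate's WAL code. Timestamps are passed in explicitly so the logic stays deterministic.

```rust
/// Hypothetical model of the group-commit flush decision.
pub struct GroupCommitBatcher {
    interval_ms: u64,
    max_batch: usize,
    pending: usize,
    batch_started_ms: Option<u64>,
}

impl GroupCommitBatcher {
    pub fn new(interval_ms: u64, max_batch: usize) -> Self {
        GroupCommitBatcher { interval_ms, max_batch, pending: 0, batch_started_ms: None }
    }

    /// Record one commit at `now_ms`. Returns true when the batch should be
    /// synced now, either because it is full or because it is old enough.
    pub fn on_commit(&mut self, now_ms: u64) -> bool {
        // The first commit of a batch starts the interval clock.
        let started = *self.batch_started_ms.get_or_insert(now_ms);
        self.pending += 1;
        let full = self.pending >= self.max_batch;
        let expired = now_ms.saturating_sub(started) >= self.interval_ms;
        if full || expired {
            // One fsync covers every pending commit; start a fresh batch.
            self.pending = 0;
            self.batch_started_ms = None;
            true
        } else {
            false
        }
    }
}
```

A real implementation would also need a timer so that a lone commit is flushed once the interval passes even if no further commit arrives; this model only reacts to commits.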

Compression Configuration:

pub enum CompressionType {
    None,
    Lz4,                    // Fast compression
    Zstd { level: i32 },    // Level 1-22 (higher = better compression)
}

pub struct CompressionConfig {
    pub compression_type: CompressionType,
    pub threshold: usize,   // Minimum size to compress (default: 512 bytes)
}
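
The threshold field implies a simple gate in front of the compressor. The sketch below uses local copies of the two documented types and a hypothetical should_compress function; whether the crate compares with >= or > at the boundary is not stated here, so the inclusive comparison is an assumption.

```rust
// Local copies of the documented configuration types.
pub enum CompressionType {
    None,
    Lz4,
    Zstd { level: i32 },
}

pub struct CompressionConfig {
    pub compression_type: CompressionType,
    pub threshold: usize,
}

/// Hypothetical gate: compress only when compression is enabled and the
/// payload is at least `threshold` bytes long.
pub fn should_compress(config: &CompressionConfig, len: usize) -> bool {
    !matches!(config.compression_type, CompressionType::None) && len >= config.threshold
}
```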

Eviction Policy Types:

pub enum EvictionPolicyType {
    Clock,  // Second-chance algorithm - simple and fast (default)
    Lru2,   // Tracks last 2 access times - 5-15% higher hit rate
    Lirs,   // Adapts to workload patterns
}
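
As background for the default policy, here is a compact sketch of the textbook Clock (second-chance) algorithm. ClockCache is a hypothetical type written for this README; it tracks page ids only and says nothing about how FeatherDB's buffer pool handles pinning or dirty pages.

```rust
/// Textbook Clock (second-chance) replacement over a fixed set of frames.
pub struct ClockCache {
    frames: Vec<Option<u64>>, // page id held by each frame, if any
    referenced: Vec<bool>,    // reference bit per frame
    hand: usize,              // next frame the clock hand inspects
}

impl ClockCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        ClockCache {
            frames: vec![None; capacity],
            referenced: vec![false; capacity],
            hand: 0,
        }
    }

    /// Touch `page`. On a miss the page is loaded into a frame; the page id
    /// that was evicted to make room (if any) is returned.
    pub fn access(&mut self, page: u64) -> Option<u64> {
        // Hit: set the reference bit and leave the hand alone.
        if let Some(i) = self.frames.iter().position(|f| *f == Some(page)) {
            self.referenced[i] = true;
            return None;
        }
        // Miss: advance the hand until a frame with a clear bit is found.
        loop {
            let i = self.hand;
            self.hand = (self.hand + 1) % self.frames.len();
            if self.referenced[i] {
                // Second chance: clear the bit and move on.
                self.referenced[i] = false;
                continue;
            }
            let evicted = self.frames[i].replace(page);
            self.referenced[i] = true;
            return evicted;
        }
    }
}
```

With three frames holding pages 1, 2, 3, loading page 4 evicts page 1; if page 2 is then touched, loading page 5 skips page 2 (clearing its bit) and evicts page 3 instead.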

Builder Pattern:

let config = Config::new("mydb.db")
    .create_if_missing(true)
    .page_size(8192)
    .buffer_pool_size_mb(256)
    .with_password("secret")
    // WAL configuration
    .with_group_commit()
    .group_commit_interval_ms(20)
    .group_commit_max_batch(2000)
    // Compression
    .with_zstd_compression_level(9)
    .compression_threshold(256)
    // Eviction policy
    .with_lru2_eviction();

// Or use a compression preset (one preset per configuration)
let lz4_config = Config::new("mydb.db")
    .with_lz4_compression();      // Fast compression
let zstd_config = Config::new("mydb.db")
    .with_zstd_compression();     // Default level (3)

// Check configuration (`config` is the fully configured value built above)
assert!(config.is_encrypted());
assert!(config.is_compressed());

4. Core Identifiers

Type-safe identifiers for database objects:

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct PageId(pub u64);

impl PageId {
    pub const INVALID: PageId = PageId(u64::MAX);
    pub fn is_valid(&self) -> bool;
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct TransactionId(pub u64);

impl TransactionId {
    pub const NONE: TransactionId = TransactionId(0);
    pub fn next(&self) -> TransactionId;
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord, Default)]
pub struct Lsn(pub u64);  // Log Sequence Number

impl Lsn {
    pub const ZERO: Lsn = Lsn(0);
    pub fn next(&self) -> Lsn;
}

All identifiers implement From<u64> and Into<u64> for easy conversion.
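
The newtype pattern behind these identifiers can be sketched as follows. The struct definitions and constants are copied from above; the bodies of is_valid and next and the two From impls are assumptions about the obvious implementation, shown here for two of the three types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct PageId(pub u64);

impl PageId {
    pub const INVALID: PageId = PageId(u64::MAX);

    /// Assumed body: every id except the sentinel is valid.
    pub fn is_valid(&self) -> bool {
        *self != Self::INVALID
    }
}

// `From<u64>` in one direction, and `From<PageId> for u64` in the other,
// which also provides `Into<u64>` for `PageId`.
impl From<u64> for PageId {
    fn from(raw: u64) -> Self {
        PageId(raw)
    }
}

impl From<PageId> for u64 {
    fn from(id: PageId) -> Self {
        id.0
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct TransactionId(pub u64);

impl TransactionId {
    pub const NONE: TransactionId = TransactionId(0);

    /// Assumed body: the following id in sequence.
    pub fn next(&self) -> TransactionId {
        TransactionId(self.0 + 1)
    }
}

/// Hypothetical consumer: accepts only a `PageId`. Passing a `TransactionId`
/// or a bare `u64` here is rejected at compile time.
pub fn page_offset(id: PageId, page_size: u64) -> u64 {
    id.0 * page_size
}
```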

5. Row Traits (row.rs)

Traits for converting between Rust types and database rows.

/// Convert a single Rust value to/from database Value
pub trait ToValue {
    fn to_value(&self) -> Value;
    fn column_type() -> ColumnType;
}

pub trait FromValue: Sized {
    fn from_value(value: &Value) -> Result<Self>;
}

/// Convert a Rust struct to/from database rows
pub trait ToRow {
    fn to_values(&self) -> Vec<Value>;
    fn column_names() -> &'static [&'static str];
}

pub trait FromRow: Sized {
    fn from_values(values: &[Value]) -> Result<Self>;
}

/// Table schema information
pub trait TableSchema {
    fn table_name() -> &'static str;
    fn columns() -> Vec<ColumnDef>;
    fn primary_key() -> &'static [&'static str];
}

/// Column definition for schema generation
pub struct ColumnDef {
    pub name: &'static str,
    pub column_type: ColumnType,
    pub primary_key: bool,
    pub not_null: bool,
    pub unique: bool,
    pub default: Option<Value>,
}

Built-in ToValue/FromValue Implementations:

  • bool
  • i32, i64
  • f64
  • String, &str
  • Vec<u8>
  • Option<T> where T: ToValue/FromValue

Usage with derive macro:

use featherdb::Table;
use featherdb_core::{FromRow, ToRow};  // traits must be in scope to call their methods

#[derive(Table)]
struct User {
    #[primary_key]
    id: i64,
    name: String,
    email: Option<String>,
}

// Automatically implements ToRow, FromRow, TableSchema
let user = User { id: 1, name: "Alice".into(), email: None };
let values = user.to_values();
let restored = User::from_values(&values)?;
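
To show what the derive saves, here is a hand-written equivalent for the same User struct. It is a sketch under stated assumptions: Value is reduced to three variants, the traits are local copies with a String error type, and the code the real macro generates may look different.

```rust
// Reduced local stand-ins for the crate's types and traits.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Integer(i64),
    Text(String),
}

pub trait ToRow {
    fn to_values(&self) -> Vec<Value>;
    fn column_names() -> &'static [&'static str];
}

pub trait FromRow: Sized {
    fn from_values(values: &[Value]) -> Result<Self, String>;
}

#[derive(Debug, PartialEq)]
pub struct User {
    pub id: i64,
    pub name: String,
    pub email: Option<String>,
}

impl ToRow for User {
    fn to_values(&self) -> Vec<Value> {
        vec![
            Value::Integer(self.id),
            Value::Text(self.name.clone()),
            // A missing optional field is stored as NULL.
            self.email.clone().map_or(Value::Null, Value::Text),
        ]
    }

    fn column_names() -> &'static [&'static str] {
        &["id", "name", "email"]
    }
}

impl FromRow for User {
    fn from_values(values: &[Value]) -> Result<Self, String> {
        match values {
            [Value::Integer(id), Value::Text(name), email] => {
                let email = match email {
                    Value::Null => None,
                    Value::Text(s) => Some(s.clone()),
                    other => return Err(format!("email: unexpected {:?}", other)),
                };
                Ok(User { id: *id, name: name.clone(), email })
            }
            _ => Err(format!("expected 3 columns (id, name, email), got {:?}", values)),
        }
    }
}
```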

6. Serialization Format

Values are serialized with a type tag followed by the data:

Type        Tag   Format
Null        0     (no data)
Boolean     1     1 byte (0 or 1)
Integer     2     8 bytes little-endian i64
Real        3     8 bytes little-endian f64
Text        4     4 bytes length (LE u32) + UTF-8 bytes
Blob        5     4 bytes length (LE u32) + raw bytes
Timestamp   6     8 bytes little-endian i64

Serialized Sizes:

  • Null: 1 byte
  • Boolean: 2 bytes
  • Integer/Real/Timestamp: 9 bytes
  • Text/Blob: 5 + content length
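
The tag table translates fairly directly into code. The sketch below encodes and decodes the documented layout using free functions and a local Value enum; the signatures (notably deserialize returning the number of bytes consumed, and None on malformed input) are assumptions of this example rather than the crate's API.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Boolean(bool),
    Integer(i64),
    Real(f64),
    Text(String),
    Blob(Vec<u8>),
    Timestamp(i64),
}

/// Append the tagged encoding of `v` to `out`.
pub fn serialize(v: &Value, out: &mut Vec<u8>) {
    match v {
        Value::Null => out.push(0),
        Value::Boolean(b) => out.extend_from_slice(&[1, *b as u8]),
        Value::Integer(i) => { out.push(2); out.extend_from_slice(&i.to_le_bytes()); }
        Value::Real(r) => { out.push(3); out.extend_from_slice(&r.to_le_bytes()); }
        Value::Text(s) => {
            out.push(4);
            out.extend_from_slice(&(s.len() as u32).to_le_bytes());
            out.extend_from_slice(s.as_bytes());
        }
        Value::Blob(b) => {
            out.push(5);
            out.extend_from_slice(&(b.len() as u32).to_le_bytes());
            out.extend_from_slice(b);
        }
        Value::Timestamp(t) => { out.push(6); out.extend_from_slice(&t.to_le_bytes()); }
    }
}

/// Copy the first 8 bytes of `r`, or None if `r` is too short.
fn read8(r: &[u8]) -> Option<[u8; 8]> {
    let mut a = [0u8; 8];
    a.copy_from_slice(r.get(..8)?);
    Some(a)
}

/// Decode one value from the front of `buf`.
/// Returns the value and the number of bytes consumed, or None on bad input.
pub fn deserialize(buf: &[u8]) -> Option<(Value, usize)> {
    let (&tag, rest) = buf.split_first()?;
    match tag {
        0 => Some((Value::Null, 1)),
        1 => Some((Value::Boolean(*rest.first()? != 0), 2)),
        2 => Some((Value::Integer(i64::from_le_bytes(read8(rest)?)), 9)),
        3 => Some((Value::Real(f64::from_le_bytes(read8(rest)?)), 9)),
        6 => Some((Value::Timestamp(i64::from_le_bytes(read8(rest)?)), 9)),
        4 | 5 => {
            // 4-byte little-endian length prefix, then the payload.
            let mut len_bytes = [0u8; 4];
            len_bytes.copy_from_slice(rest.get(..4)?);
            let len = u32::from_le_bytes(len_bytes) as usize;
            let data = rest.get(4..4 + len)?.to_vec();
            let value = if tag == 4 {
                Value::Text(String::from_utf8(data).ok()?)
            } else {
                Value::Blob(data)
            };
            Some((value, 5 + len))
        }
        _ => None,
    }
}
```

The byte counts returned by deserialize match the serialized sizes listed above: 1 for Null, 2 for Boolean, 9 for the fixed-width numeric types, and 5 plus the content length for Text and Blob.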

7. Constants

System-wide constants:

pub mod constants {
    pub const MAGIC: &[u8; 10] = b"FEATHERDB\0";
    pub const FORMAT_VERSION: u32 = 1;
    pub const DEFAULT_PAGE_SIZE: usize = 4096;
    pub const SUPERBLOCK_SIZE: usize = 512;
    pub const PAGE_HEADER_SIZE: usize = 64;
    pub const SLOT_SIZE: usize = 6;  // offset:u16, len:u16, flags:u8, reserved:u8
    pub const SCHEMA_ROOT_PAGE: u64 = 1;
    pub const FREELIST_ROOT_PAGE: u64 = 2;
    pub const FIRST_DATA_PAGE: u64 = 3;
}

Usage Examples

Working with Values

use featherdb_core::Value;

// Create values
let int_val = Value::Integer(42);
let text_val = Value::Text("hello".into());
let null_val = Value::Null;

// Type checking
assert!(!int_val.is_null());
assert_eq!(int_val.as_i64(), Some(42));
assert_eq!(text_val.as_str(), Some("hello"));

// Comparison (NULL is less than everything)
assert!(Value::Integer(1) < Value::Integer(2));
assert!(Value::Null < Value::Integer(0));

// Cross-type numeric comparison
assert!(Value::Integer(3) < Value::Real(3.5));

// Serialization
let mut buf = Vec::new();
int_val.serialize(&mut buf);
let restored = Value::deserialize(&mut &buf[..])?;
assert_eq!(int_val, restored);

// Memory estimation for GC
let size = text_val.size_estimate();  // 24 + string length

Error Handling with Suggestions

use featherdb_core::{Error, Result, find_best_match, QueryError};

fn lookup_column(name: &str, columns: &[&str], table: &str) -> Result<usize> {
    for (i, col) in columns.iter().enumerate() {
        if *col == name {
            return Ok(i);
        }
    }

    // Find suggestion using Levenshtein distance
    let suggestion = find_best_match(name, columns, 2).map(String::from);

    Err(Error::ColumnNotFound {
        column: name.into(),
        table: table.into(),
        suggestion,
    })
}

// Rich query errors with formatting
let err = QueryError::new("Column 'naem' not found", "SELECT naem FROM users", 1, 8)
    .with_span(4)
    .with_suggestion("Did you mean 'name'?")
    .with_help("Available columns: id, name, email");

println!("{}", err.format_with_context());

Configuration

use featherdb_core::{Config, WalGroupCommitConfig, WalSyncMode, EvictionPolicyType};

// Simple configuration
let config = Config::new("app.db");

// Full configuration with all options
let config = Config::new("app.db")
    .create_if_missing(true)
    .page_size(8192)
    .buffer_pool_size_mb(256)
    .with_password("secret")
    .with_group_commit()
    .group_commit_interval_ms(20)
    .with_zstd_compression_level(9)
    .with_lru2_eviction();

// Custom WAL configuration
let wal_config = WalGroupCommitConfig::new()
    .sync_mode(WalSyncMode::GroupCommit)
    .group_commit_interval_ms(15)
    .group_commit_max_batch(500);

let config = Config::new("app.db")
    .wal_config(wal_config);

Testing

# Run all core tests (using Make)
make test-crate CRATE=featherdb-core

# Or with cargo directly
cargo test -p featherdb-core

# Run with output
cargo test -p featherdb-core -- --nocapture

# Run coverage (from project root)
make coverage  # or: cargo llvm-cov --workspace

Design Principles

  1. Minimal Dependencies: Only essential crates (thiserror, serde, bytes)
  2. No Circular Dependencies: Core is at the bottom of the dependency graph
  3. Type Safety: Strong typing for identifiers prevents mixing
  4. Serialization: All types can round-trip to bytes
  5. Error Context: Errors include enough context for debugging and suggestions
  6. Builder Pattern: Configuration uses fluent builder API

Public Exports

// From lib.rs
pub use error::{Error, QueryContext, QueryError, Result, find_best_match, levenshtein_distance, suggest_keyword};
pub use row::{ColumnDef, FromRow, FromValue, TableSchema, ToRow, ToValue};
pub use value::{ColumnType, Value};

pub struct PageId(pub u64);
pub struct TransactionId(pub u64);
pub struct Lsn(pub u64);

pub enum EncryptionConfig { None, Password(String), Key([u8; 32]) }
pub enum CompressionType { None, Lz4, Zstd { level: i32 } }
pub struct CompressionConfig { ... }
pub enum WalSyncMode { Immediate, GroupCommit, NoSync }
pub struct WalGroupCommitConfig { ... }
pub enum EvictionPolicyType { Clock, Lru2, Lirs }
pub struct Config { ... }

pub mod constants { ... }

Future Improvements

  • JSON value type
  • Decimal/numeric type for financial data
  • Array/nested types