Crate obj


obj — embedded document database (public crate).

This crate is the user-facing surface of the obj storage engine. It wraps the obj-core building blocks (pager, WAL, B+tree, codec, catalog, transaction layer) into the typed Db / Collection<T> API described in design.md.

Worked examples for every topic live next to the relevant item in this crate’s rustdoc:

§Quick start

use obj::Db;
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize, obj::Document)]
struct Order { customer_id: u64, total_cents: u64 }

fn run() -> obj::Result<()> {
    let db = Db::open("app.obj")?;
    let id = db.insert(Order { customer_id: 1, total_cents: 100 })?;
    let back: Option<Order> = db.get(id)?;
    assert!(back.is_some());
    Ok(())
}

§Core CRUD and the Document derive

Open a database with one of three constructors:

Each Db is Send + Sync. Share across threads via Arc<Db> for the concurrent-reader / single-writer workload documented in docs/concurrency.md.

Implement the Document trait on every type you want to persist. The obj::Document re-export is a proc-macro that fills in the trait’s associated constants from optional #[obj(...)] attributes:

  • #[obj(collection = "...")] — sets Document::COLLECTION. Default: the type name.
  • #[obj(version = N)] — sets Document::VERSION. Default: 1.
  • #[obj(index)], #[obj(index = unique)], #[obj(index = each)] on a field — declare secondary indexes (see § “Queries and indexes” below).
  • #[obj(index_composite(fields = ("a", "b")))] at struct level — declare a composite index.
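As a rough illustration of what the derive fills in, the sketch below hand-writes a stand-in trait carrying the two associated constants named above. The trait name DocumentSketch, the constant types, and the way the defaults are expressed are assumptions made so the example is self-contained; the real Document trait has further items (such as indexes and migrate) and may differ in detail.

```rust
// Stand-in for illustration only; NOT the real obj::Document trait.
// The constant types (&'static str, u32) are assumptions.
trait DocumentSketch {
    /// Collection name; the derive defaults this to the type name.
    const COLLECTION: &'static str;
    /// Schema version; the derive defaults this to 1.
    const VERSION: u32 = 1;
}

struct Order;

// Roughly what `#[obj(collection = "orders")]` plus `#[obj(version = 2)]` ask for.
impl DocumentSketch for Order {
    const COLLECTION: &'static str = "orders";
    const VERSION: u32 = 2;
}

struct Customer;

// No attributes: the collection falls back to the type name, the version to 1.
impl DocumentSketch for Customer {
    const COLLECTION: &'static str = "Customer";
}
```

Keeping both values as associated constants means the collection name and version are fixed per type at compile time, which is what lets a typed handle find its collection without any runtime registration call.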

The one-shot API runs each call inside a private transaction and is the typical entry point for ad-hoc work:

§Transactions and iteration

For multi-document atomicity, Db::transaction runs a closure with a &mut WriteTxn. The closure returns Result<R>; commit on Ok, rollback on Err, rollback-via-Drop on panic. Inside the closure, WriteTxn::collection yields a typed Collection<T> handle whose methods compose with the parent txn — every write rides one WAL transaction.
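The commit-on-Ok / rollback-on-Err contract can be modelled with a toy in-memory store. This is a std-only sketch of the closure shape, not the obj implementation: TxnDb, StagedTxn, and put are hypothetical names, and the real engine stages writes in the WAL rather than cloning a map.

```rust
use std::collections::BTreeMap;

// Toy in-memory store; a stand-in, NOT the obj pager or WAL.
#[derive(Default)]
struct TxnDb {
    committed: BTreeMap<u64, String>,
}

// Toy write transaction: stages writes on a private copy of the data.
struct StagedTxn {
    staged: BTreeMap<u64, String>,
}

impl StagedTxn {
    fn put(&mut self, id: u64, value: &str) {
        self.staged.insert(id, value.to_string());
    }
}

impl TxnDb {
    // Run `body`; publish the staged state only if it returns Ok.
    fn transaction<R, F>(&mut self, body: F) -> Result<R, String>
    where
        F: FnOnce(&mut StagedTxn) -> Result<R, String>,
    {
        let mut txn = StagedTxn { staged: self.committed.clone() };
        // On Err the `?` returns early: `txn` is dropped and nothing is published.
        let out = body(&mut txn)?;
        // On Ok every staged write becomes visible together.
        self.committed = txn.staged;
        Ok(out)
    }
}
```

A panic inside the closure unwinds through the same path as the early return: the staged copy is dropped and the committed state is never touched, which is the toy analogue of rollback-via-Drop.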

For read-only consistency across multiple reads, Db::read_transaction runs a closure with a &ReadTxn. The closure observes one consistent snapshot of the database; concurrent writers do not affect what it sees.
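One way to picture the snapshot guarantee is a reader that pins an immutable version of the data while writers publish new versions beside it. The sketch below is a std-only model under that assumption; SnapDb and its methods are hypothetical, and the real engine pins a WAL position rather than cloning a map.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};

type Version = Arc<BTreeMap<u64, String>>;

// Toy store: the lock guards only the pointer to the current version.
struct SnapDb {
    current: RwLock<Version>,
}

impl SnapDb {
    fn new() -> Self {
        Self { current: RwLock::new(Arc::new(BTreeMap::new())) }
    }

    // A "read transaction" pins whatever version is current right now.
    fn snapshot(&self) -> Version {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    // A write builds a new version and swaps the pointer; pinned readers are unaffected.
    fn write(&self, id: u64, value: &str) {
        let mut guard = self.current.write().unwrap();
        let mut next = (**guard).clone();
        next.insert(id, value.to_string());
        *guard = Arc::new(next);
    }
}
```

A snapshot taken before a write keeps returning the old contents for as long as it is held; only a fresh snapshot observes the new document.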

For full-collection iteration there are two shapes:

  • Db::iter_all — streaming iterator over Result<(Id, T)>. Peak memory is bounded at a small constant (256 entries per refill, power-of-ten Rule 3) regardless of collection size.
  • Db::all — one-line shim that drives iter_all to exhaustion and collects into Vec<T>. Pays memory proportional to the collection.
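The bounded-memory claim for the streaming shape comes from refilling a fixed-size buffer instead of loading the whole collection. The following is a minimal sketch of that pattern over a BTreeMap standing in for the primary tree. Only the batch size of 256 comes from the text; BatchedIter, its fields, and the resume-by-last-id strategy are assumptions.

```rust
use std::collections::{BTreeMap, VecDeque};
use std::ops::Bound;

// Refill size taken from the text; the rest of this type is hypothetical.
const BATCH: usize = 256;

// Streams (id, value) pairs from a BTreeMap standing in for the primary tree.
struct BatchedIter<'a> {
    source: &'a BTreeMap<u64, String>,
    buffer: VecDeque<(u64, String)>,
    last_id: Option<u64>, // resume point for the next refill
    refills: usize,       // counted only so the behaviour is observable
}

impl<'a> BatchedIter<'a> {
    fn new(source: &'a BTreeMap<u64, String>) -> Self {
        Self { source, buffer: VecDeque::new(), last_id: None, refills: 0 }
    }

    // Pull at most BATCH entries that come after the last id handed out.
    fn refill(&mut self) {
        let lower: Bound<u64> = match self.last_id {
            Some(id) => Bound::Excluded(id),
            None => Bound::Unbounded,
        };
        let range = self.source.range((lower, Bound::<u64>::Unbounded));
        self.buffer.extend(range.take(BATCH).map(|(k, v)| (*k, v.clone())));
        self.refills += 1;
    }
}

impl<'a> Iterator for BatchedIter<'a> {
    type Item = (u64, String);

    fn next(&mut self) -> Option<Self::Item> {
        if self.buffer.is_empty() {
            self.refill();
        }
        let item = self.buffer.pop_front()?;
        self.last_id = Some(item.0);
        Some(item)
    }
}
```

Collecting this iterator into a Vec is the equivalent of the Db::all shim: the buffer still never holds more than one batch, but the resulting vector grows with the collection.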

See docs/concurrency.md for the lock-acquisition contract and Db::transaction / Db::read_transaction for worked examples of the closure shape.

§Queries and indexes

Db::query::<T>() constructs a Query builder. Compose with Query::filter, Query::limit, Query::sort_by, Query::index_range; terminate with Query::fetch (materialised Vec<T>) or Query::count (count alone, without decoding documents on the fast path).

The query layer has two sources: a full primary-tree scan (default) or an index-range slice (Query::index_range). No cost-based planner — the caller picks. Source order is by primary Id for the full scan and by encoded index-key bytes for the index range.

Query::sort_by materialises every surviving candidate into a sort buffer before applying Query::limit. The buffer is capped at MAX_SORT_BUFFER (100 000 documents); overflowing the cap surfaces Error::SortBufferExceeded. Override the cap with Query::sort_buffer_limit when the workload genuinely needs more.
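The ordering of the steps (filter, buffer under a cap, sort, then limit) is the part that tends to surprise callers, so here is a small std-only sketch of it. The cap value mirrors MAX_SORT_BUFFER from the text; sorted_limited, SketchError, and the exact point at which the overflow is detected are assumptions rather than the crate's code.

```rust
// Cap value taken from the text (MAX_SORT_BUFFER); everything else is hypothetical.
const SORT_CAP: usize = 100_000;

#[derive(Debug, PartialEq)]
enum SketchError {
    SortBufferExceeded,
}

// Filter, buffer (bounded), sort, and only then apply the limit.
fn sorted_limited<I, F>(
    candidates: I,
    keep: F,
    cap: usize,
    limit: usize,
) -> Result<Vec<u64>, SketchError>
where
    I: IntoIterator<Item = u64>,
    F: Fn(&u64) -> bool,
{
    let mut buffer = Vec::new();
    for doc in candidates.into_iter().filter(|d| keep(d)) {
        if buffer.len() == cap {
            // One more survivor would overflow the buffer: fail instead of truncating.
            return Err(SketchError::SortBufferExceeded);
        }
        buffer.push(doc);
    }
    buffer.sort_unstable();
    buffer.truncate(limit); // the limit is applied after the sort
    Ok(buffer)
}
```

Because the limit is applied last, a small limit does not protect a sorted query from the cap; only a tighter filter or a narrower source range reduces the number of candidates that reach the buffer.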

Indexes are declared on the document type via Document::indexes (or the derive’s #[obj(index ...)] attributes). The catalog reconciler runs on the first WriteTxn::collection::<T>() call per process per collection: it declares missing specs, marks stale active descriptors DroppedPending, and is idempotent. Reconciliation rides the caller’s WAL transaction, so a rolled-back insert leaves no half-created index behind.

Four IndexKinds are exposed: Standard, Unique, Each, Composite. Construct typed IndexSpecs via IndexSpec::standard / ::unique / ::each / ::composite when hand-implementing Document::indexes.

§Schema evolution

Bump Document::VERSION on every breaking change. Register a DynamicSchema for each prior version in Document::historical_schemas, and provide a Document::migrate body that lifts the structured obj_core::codec::Dynamic view into the current Self.

Migration is lazy: a stored record whose type_version is older than Self::VERSION is migrated on read but the on-disk bytes are NOT rewritten until the next Collection::update / Collection::upsert for that id. The collection therefore scales to billions of documents without a stop-the-world rebuild on schema bumps.
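The read-migrates, write-rewrites cycle can be sketched with a toy collection. Everything here is hypothetical: the one-byte-dollars to two-byte-cents change, the Stored record, and LazyCollection are invented for illustration, and the real codec is postcard with a Dynamic view rather than this hand-rolled encoding.

```rust
use std::collections::HashMap;

const CURRENT_VERSION: u32 = 2; // stand-in for Self::VERSION

// Stored record: version stamp plus raw bytes. A toy encoding, NOT postcard.
#[derive(Clone, Debug, PartialEq)]
struct Stored {
    type_version: u32,
    bytes: Vec<u8>,
}

// Hypothetical change: v1 stored whole dollars in one byte, v2 stores cents as u16 LE.
fn decode(rec: &Stored) -> u16 {
    if rec.type_version < CURRENT_VERSION {
        u16::from(rec.bytes[0]) * 100 // migrated in memory only
    } else {
        u16::from_le_bytes([rec.bytes[0], rec.bytes[1]])
    }
}

struct LazyCollection {
    disk: HashMap<u64, Stored>,
}

impl LazyCollection {
    // Reads migrate on the fly and leave the stored bytes untouched.
    fn get(&self, id: u64) -> Option<u16> {
        self.disk.get(&id).map(decode)
    }

    // Writes always store the current version; this is when old bytes are replaced.
    fn update(&mut self, id: u64, cents: u16) {
        let rec = Stored { type_version: CURRENT_VERSION, bytes: cents.to_le_bytes().to_vec() };
        self.disk.insert(id, rec);
    }
}
```

The cost of a schema bump is therefore paid per document, on first read and then on first write, instead of once for the whole collection.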

Worked recipes for the four common patterns — single-version migration, multi-version chains, tombstoned fields, enum-variant migration — live on Document::migrate and in the integration tests: historical_schemas.rs, tombstone_migration.rs, enum_migration.rs, and lazy_migration.rs. The lazy-rewrite cycle itself is documented on Collection::get.

§Attach, backup, integrity

Db::attach registers a read-only second .obj file under a caller-chosen namespace. Any Document whose COLLECTION is of the form <namespace>.<name> dispatches reads against the attached file; writes against a namespaced collection return Error::AttachedDatabaseIsReadOnly. Each attached database gets its own snapshot pinned at read-transaction begin; Db::detach removes the registry entry but in-flight reads complete against their pinned snapshot.
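The namespace dispatch described above amounts to routing on the collection name. Below is a minimal sketch of that routing, assuming the name is split at the first dot and ignoring whether the namespace is actually registered; Target, route, and check_write are hypothetical names, and the error is a plain string standing in for Error::AttachedDatabaseIsReadOnly.

```rust
#[derive(Debug, PartialEq)]
enum Target<'a> {
    Main { collection: &'a str },
    Attached { namespace: &'a str, collection: &'a str },
}

// Split "<namespace>.<name>" at the first dot; names without a dot stay on the main file.
fn route(collection: &str) -> Target<'_> {
    match collection.split_once('.') {
        Some((namespace, name)) => Target::Attached { namespace, collection: name },
        None => Target::Main { collection },
    }
}

// Writes are only allowed against the main file in this model.
fn check_write(collection: &str) -> Result<(), &'static str> {
    match route(collection) {
        Target::Attached { .. } => Err("AttachedDatabaseIsReadOnly"),
        Target::Main { .. } => Ok(()),
    }
}
```

In this model a document type declared with the collection name "archive.orders" would read from the file attached as "archive" and fail on any write.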

Db::backup_to writes a self-contained .obj file at the LSN of an internally-taken reader snapshot. Writers continue against the source; post-snapshot writes are NOT in the destination. The algorithm is documented in docs/format.md § “Hot backup”. Two failure modes: Error::BackupDestinationExists (refuses to overwrite) and Error::BackupNotSupportedForMemoryPager (in-memory dbs have no file backend to copy from).

Db::integrity_check runs a full bidirectional walk: every active collection’s primary + index B-trees, freelist sweep, orphan-page detection, primary↔index cross-reference. Returns IntegrityReport with a failures list and a pages_checked count. The lightweight subset that Db::open runs at open time is obj_core::integrity::quick_check; opt out of the open-time walk via Config::skip_open_check.
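The primary↔index cross-reference is the bidirectional part of the walk. A toy version over in-memory maps might look like the sketch below. The Failure variants and cross_check are hypothetical and do not correspond to the real IntegrityFailure variants; the primary tree is modelled as id → indexed key and the index as a set of (key, id) pairs.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq)]
enum Failure {
    MissingIndexEntry { id: u64 },  // document present, index entry absent
    DanglingIndexEntry { id: u64 }, // index entry present, matching document absent
}

fn cross_check(primary: &BTreeMap<u64, u32>, index: &BTreeSet<(u32, u64)>) -> Vec<Failure> {
    let mut failures = Vec::new();
    // Forward: every document must appear in the index under its key.
    for (&id, &key) in primary {
        if !index.contains(&(key, id)) {
            failures.push(Failure::MissingIndexEntry { id });
        }
    }
    // Backward: every index entry must point at a live document with that key.
    for &(key, id) in index {
        if primary.get(&id) != Some(&key) {
            failures.push(Failure::DanglingIndexEntry { id });
        }
    }
    failures
}
```

Walking in only one direction would miss one of the two failure classes, which is why the check visits both the primary tree and each index.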

§Configuration

Config is a Clone builder. Defaults match the “production-safe” posture documented in design.md:

  • Config::cache_size — bytes for the pager’s LRU. Default 256 KiB (64 frames). Larger for read-heavy workloads on large databases; smaller on memory-constrained targets.
  • Config::sync_mode — durability mode for every WAL commit. Default SyncMode::Full (system-wide power loss survivable). SyncMode::Normal for fsync-only durability; SyncMode::Off only for tests and benchmarks.
  • Config::busy_timeout — max wait when acquiring the reader / writer lock. Default 5 seconds. Beyond the budget, the txn returns Err(Error::Busy) rather than blocking indefinitely.
  • Config::skip_open_check — opt out of the open-time catalog walk. Default false (run the walk). Production callers should leave it on.
  • Config::cross_process_lock — toggle OS-level byte-range locking. Default true (on). Off only when every accessor shares one Db inside one process (in-process stress tests).
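To make the defaults above concrete, here is a toy builder that mirrors them. The option names and default values come from the list; the field types, the ToyConfig and ToySyncMode names, and the consuming-builder style are assumptions, so treat the signatures as illustrative rather than as the crate's API.

```rust
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq)]
enum ToySyncMode {
    Full,
    Normal,
    Off,
}

// Mirrors the documented defaults; the field types are assumptions.
#[derive(Clone, Debug, PartialEq)]
struct ToyConfig {
    cache_size: usize,
    sync_mode: ToySyncMode,
    busy_timeout: Duration,
    skip_open_check: bool,
    cross_process_lock: bool,
}

impl Default for ToyConfig {
    fn default() -> Self {
        Self {
            cache_size: 256 * 1024, // 256 KiB
            sync_mode: ToySyncMode::Full,
            busy_timeout: Duration::from_secs(5),
            skip_open_check: false,
            cross_process_lock: true,
        }
    }
}

impl ToyConfig {
    // Consuming builder methods, chained off `ToyConfig::default()`.
    fn cache_size(mut self, bytes: usize) -> Self { self.cache_size = bytes; self }
    fn sync_mode(mut self, mode: ToySyncMode) -> Self { self.sync_mode = mode; self }
    fn busy_timeout(mut self, wait: Duration) -> Self { self.busy_timeout = wait; self }

    // Number of cache frames for a given frame size in bytes.
    fn frames(&self, frame_bytes: usize) -> usize { self.cache_size / frame_bytes }
}
```

The stated default of 256 KiB across 64 frames works out to 4096 bytes per frame, so raising the cache to 1 MiB at the same frame size would yield 256 frames.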

§Cargo features

  • serde (off by default) — derive serde::Serialize and serde::Deserialize on the public types in this crate (Config, DbStat, CollectionStat, DumpRecord, IntegrityReport, IntegrityFailure, plus the obj-core re-exports Id, SyncMode, LockKind, IndexKind, IndexSpec). When the feature is on, Serialize and Deserialize are also re-exported from the crate root, so downstream callers do not need a separate serde dependency. Pure additive surface — no on-disk format byte changes.
  • tracing (off by default) — emit structured spans around the observability surface: db.open, db.transaction, db.read_transaction, db.integrity_check, query.execute, and the obj-core pager.checkpoint span (propagated via the obj-core/tracing sub-feature). The feature gates the optional tracing dependency on both crates so the default build has zero new transitive deps and zero span overhead. tracing is intentionally NOT re-exported from this crate — downstream subscribers add tracing-subscriber (or another subscriber crate) directly, mirroring the idiom used by tokio and axum.
  • compression (off by default) — LZ4 per-page compression at the pager layer (Phase 3, issue #8). Propagates to obj-core. Every v1.0 writer stamps format_minor = 2 regardless of which codecs are enabled; whether a file uses compression is recorded by feature_flags bit 0, not by the minor. A build WITHOUT this feature opens any file whose bit 0 is clear, and refuses (with Error::FormatFeatureUnsupported) only a file that actually has the compression flag set.
  • encryption (off by default) — XChaCha20-Poly1305 per-page at-rest encryption (Phase 4, issue #9). Propagates to obj-core. As with compression, the file’s minor is always 2; feature_flags bit 1 records whether the file is encrypted. A build WITHOUT this feature opens any file whose bit 1 is clear, and refuses (with Error::FormatFeatureUnsupported) a file whose bit 1 is set — the refusal keys off the feature flag, not the minor version.
  • async (off by default) — runtime-agnostic async surface mirroring the blocking Db / Collection / Query API behind a new obj::asynchronous module (Phase 5, issue #10). Work is routed through the blocking crate’s process-wide thread pool, so the wrapper composes with Tokio, async-std, smol, and any other async runtime — no per-runtime sub-features. With the feature off the baseline build adds no new transitive dependencies and no async overhead.
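The compression and encryption entries describe the same open-time rule: a file is refused only if it sets a feature bit the build cannot handle. A minimal sketch of that check follows. The bit positions come from the text; the u32 width, check_features, and the OpenError shape are assumptions.

```rust
// Bit positions taken from the text; the flag width and function shape are hypothetical.
const FLAG_COMPRESSION: u32 = 1 << 0;
const FLAG_ENCRYPTION: u32 = 1 << 1;

#[derive(Debug, PartialEq)]
enum OpenError {
    FormatFeatureUnsupported { missing: u32 },
}

// `file_flags`: feature_flags as read from the file header.
// `built_with`: bits for the codecs compiled into this build.
fn check_features(file_flags: u32, built_with: u32) -> Result<(), OpenError> {
    // Bits the file requires that this build does not provide.
    let missing = file_flags & !built_with;
    if missing == 0 {
        Ok(())
    } else {
        Err(OpenError::FormatFeatureUnsupported { missing })
    }
}
```

Note that the minor version never enters the decision: a build without either codec still opens any file whose flags are clear, and a build with both codecs opens every combination.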

§Observability

Enable the tracing feature to emit spans around database operations; spans are gated and free when the feature is off. The span set is small and stable: one info-level span at every transaction boundary, one debug-level span at every query execution and pager checkpoint. No span field captures user payload bytes — the only string-ish field is path on db.open, which is a filesystem path rather than user content.

§unsafe policy

This crate is #![forbid(unsafe_code)]. All unsafe lives in obj-core::platform and carries a documented safety contract per docs/unsafe-audit.md.

Modules§

asynchronous (feature: async)
Runtime-agnostic async surface — Phase 5 (issue #10).

Structs§

Collection
Typed handle to a collection.
CollectionStat
Per-collection summary inside DbStat.
Config
Db open-time configuration. Construct via Config::default and modify with the builder methods.
Db
The embedded document database.
DbStat
One-shot snapshot of a database’s header + catalog summary.
DumpIter
Streaming iterator returned by Db::dump_raw.
DumpRecord
One raw record yielded by DumpIter.
EnumVariantSchema
One variant of a DynamicSchema::Enum description.
Id
Per-collection document identifier.
IndexSpec
A runtime index declaration.
IntegrityReport
Structured result of an integrity check.
IterAll
Streaming iterator returned by Db::iter_all.
IterIndexRange
Streaming iterator returned by Collection::iter_range. Yields Result<(user_key_bytes, T)> one row at a time; internally refills a fixed-size (user_key, Id) buffer in batches of ITER_INDEX_RANGE_BATCH = 256 so the per-step pager-lock cost amortises. Memory stays bounded at O(batch × small_bytes + distinct_ids) regardless of the range’s total size.
Query
The M8 query builder.
ReadTxn
Public read transaction. Acquired by crate::Db::read_transaction.
WriteTxn
Public write transaction.

Enums§

CompressionMode
Phase 3 (issue #8): per-pager compression knob. Selects whether newly-created files use the transparent LZ4 page-compression layer (format_minor = 1, feature_flags bit 0 set) or stay at the original uncompressed format_minor = 0 layout.
DynamicSchema
Describes the byte-stream shape of a postcard-encoded payload at one version. See module docs for the variant ↔ wire-format mapping.
Error
The pager-level error type.
IndexKind
What kind of secondary index a given IndexSpec declares.
IntegrityFailure
Categorical reasons an integrity walk records a failure. Every variant carries the locus of the problem (page id, collection, index name, document id) so an operator can root-cause without re-running the check.
LockKind
Lock category for Error::Busy. Three variants because the three categories of contention produce different operator guidance: a contended cross-process WRITER_LOCK means another process is writing; a contended WriterInProcess means another thread of the same process is writing; a contended reader lock is unusual (31 slots, shared) and indicates either a saturated 31+-process workload or a stale lock left by a frozen process.
SyncMode
Durability mode for FileHandle::sync_data.

Constants§

MAX_DISTINCT_IDS
Per-call cap on the bounded HashSet<Id> used by Collection::count_distinct_ids_in_range to count unique document Ids under an Each index. Power-of-ten Rule 3: the distinct set is allocation-bounded; exceeding the cap surfaces obj_core::Error::DistinctCountExceeded rather than chewing arbitrary memory. The user can narrow the range via .index_range(...) to fit inside the budget.
MAX_SORT_BUFFER
Default cap on the in-memory sort buffer. The query layer reads at most this many surviving documents into RAM before sorting; a scan that produces more candidates surfaces Error::SortBufferExceeded. M8 #66.

Traits§

Deserialize (feature: serde)
Re-export of serde::Serialize + serde::Deserialize under the opt-in serde feature (issue #6). Lets downstream code write use obj::{Serialize, Deserialize} without a separate serde dependency — the same convention tokio and axum use. A data structure that can be deserialized from any data format supported by Serde.
Document
The trait every user document type implements.
Schema
A type whose postcard wire shape is describable by a DynamicSchema.
Serialize (feature: serde)
Re-export of serde::Serialize + serde::Deserialize under the opt-in serde feature (issue #6). Lets downstream code write use obj::{Serialize, Deserialize} without a separate serde dependency — the same convention tokio and axum use. A data structure that can be serialized into any data format supported by Serde.

Type Aliases§

Result
Crate-local Result alias. Use this in new code unless an explicit std::result::Result is required for trait-impl reasons.

Derive Macros§

Deserialize (feature: serde)
Re-export of serde::Serialize + serde::Deserialize under the opt-in serde feature (issue #6). Lets downstream code write use obj::{Serialize, Deserialize} without a separate serde dependency — the same convention tokio and axum use.
Document
#[derive(obj::Document)] proc-macro re-export.
Serialize (feature: serde)
Re-export of serde::Serialize + serde::Deserialize under the opt-in serde feature (issue #6). Lets downstream code write use obj::{Serialize, Deserialize} without a separate serde dependency — the same convention tokio and axum use.