Crate solo_storage

Solo storage: SQLite + SQLCipher persistence layer.

Concurrency invariants (per ADR-0003)

  • Writes go through WriteHandle; reads go through ReaderPool. Direct connection access is an anti-pattern outside the actor + pool.
  • The writer connection opens once and is owned by the writer thread for the daemon’s lifetime.
  • The read pool’s post_create hook binds the raw SQLCipher key on each new connection.
  • pending_index ordering is always SQL COMMIT → HNSW.add → drain row. Never reverse.
  • Arc<dyn VectorIndex + Send + Sync> is shared between the writer and the read pool; concurrency is provided by the impl (e.g., hnsw_rs’s internal parking_lot::RwLock), not by application-level locks.
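
The ordering invariant for pending_index can be illustrated with in-memory stand-ins. This is a minimal sketch, not the crate's code: `FakeSql`, `FakeIndex`, and `index_pending` are hypothetical names, and the maps merely play the roles of the SQLCipher database and the HNSW index. The point it shows is that a row is drained from pending_index only after the vector has been added to the index, so a crash between the two steps leaves a pending row that can be replayed.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical stand-in for the SQL side: committed rows plus a
/// pending_index queue of row ids whose vectors are not yet indexed.
#[derive(Default)]
struct FakeSql {
    rows: BTreeMap<u64, Vec<f32>>,
    pending_index: VecDeque<u64>,
}

/// Hypothetical stand-in for the vector index.
#[derive(Default)]
struct FakeIndex {
    vectors: BTreeMap<u64, Vec<f32>>,
}

impl FakeSql {
    /// Step 1 (SQL COMMIT): the row and its pending marker are recorded together.
    fn commit(&mut self, id: u64, vector: Vec<f32>) {
        self.rows.insert(id, vector);
        self.pending_index.push_back(id);
    }
}

/// Steps 2 and 3: add each pending vector to the index, then drain its row.
/// Returns how many pending rows were drained.
fn index_pending(sql: &mut FakeSql, index: &mut FakeIndex) -> usize {
    let mut drained = 0;
    while let Some(&id) = sql.pending_index.front() {
        let vector = sql.rows[&id].clone();
        index.vectors.insert(id, vector); // step 2: index add (idempotent here)
        sql.pending_index.pop_front(); // step 3: drain only after the add
        drained += 1;
    }
    drained
}

fn main() {
    let mut sql = FakeSql::default();
    let mut index = FakeIndex::default();
    sql.commit(1, vec![0.1, 0.2]);
    sql.commit(2, vec![0.3, 0.4]);
    let drained = index_pending(&mut sql, &mut index);
    println!("drained {drained} pending rows, index holds {}", index.vectors.len());
}
```

Reversing steps 2 and 3 would allow a crash to lose a vector with no record that it still needs indexing, which is why the order is fixed.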

Module layout

Commit 1.1 — solo init building blocks:

  • path_validation — refuse cloud-sync data dirs.
  • key_material — Argon2id passphrase → 32-byte SQLCipher key.
  • config — solo.config.toml (salt + embedder identity).
  • migration — runner + the v0 schema (migrations/0001_initial.sql).
  • lockfile — RAII solo.lock to serialize concurrent runs.
  • init — orchestrator: solo_storage::init(params).
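
As an illustration of the RAII lock-file idea in the list above, here is a minimal sketch using only the standard library. `DirLock` is a hypothetical type, not the crate's `Lockfile`; the real implementation may record different contents and handle stale locks, which this sketch does not.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical RAII guard: holds the lock for as long as the value lives.
struct DirLock {
    path: PathBuf,
}

impl DirLock {
    /// Creates `solo.lock` inside `data_dir`, failing if it already exists.
    fn acquire(data_dir: &Path) -> io::Result<DirLock> {
        let path = data_dir.join("solo.lock");
        // create_new(true) is an exclusive create: it fails with
        // ErrorKind::AlreadyExists when another holder got there first.
        let mut file = OpenOptions::new().write(true).create_new(true).open(&path)?;
        // Recording the pid is an assumption made for this sketch.
        writeln!(file, "{}", std::process::id())?;
        Ok(DirLock { path })
    }
}

impl Drop for DirLock {
    fn drop(&mut self) {
        // Best-effort release; errors cannot be reported from drop.
        let _ = fs::remove_file(&self.path);
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("solo_lock_demo_{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    {
        let _lock = DirLock::acquire(&dir)?;
        // A second acquire while the guard is alive is refused.
        assert!(DirLock::acquire(&dir).is_err());
    } // guard dropped here, lock file removed
    assert!(DirLock::acquire(&dir).is_ok());
    fs::remove_dir_all(&dir)
}
```

A lock file left behind by a crashed process would block later runs in this sketch; a production lock needs some way to detect and clear that case.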

Commit 1.2 — single-writer actor + read pool:

  • writer — WriterActor, WriteHandle, WriteCommand.
  • reader — ReaderPool (deadpool-sqlite + post_create raw-key).
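
The single-writer actor pattern named above can be sketched with a bounded channel and one dedicated thread. Everything here is illustrative: `Command`, `Handle`, and `spawn_writer` are hypothetical names, and a `Vec<String>` stands in for the writer's SQLCipher connection. The crate's actual `WriterActor`, `WriteHandle`, and `WriteCommand` have their own shapes.

```rust
use std::sync::mpsc::{self, Sender, SyncSender};
use std::thread::{self, JoinHandle};

/// Hypothetical command set; each write carries its own reply channel.
enum Command {
    Append { text: String, reply: Sender<usize> },
    Shutdown,
}

/// Cloneable handle: the only way callers reach the writer thread.
#[derive(Clone)]
struct Handle {
    tx: SyncSender<Command>,
}

impl Handle {
    /// Sends a write and waits for the writer's reply (the new row count).
    fn append(&self, text: &str) -> Option<usize> {
        let (reply, rx) = mpsc::channel();
        self.tx
            .send(Command::Append { text: text.to_string(), reply })
            .ok()?;
        rx.recv().ok()
    }

    fn shutdown(&self) {
        let _ = self.tx.send(Command::Shutdown);
    }
}

/// Spawns the writer thread, which owns the mutable state for its lifetime.
fn spawn_writer(capacity: usize) -> (Handle, JoinHandle<Vec<String>>) {
    let (tx, rx) = mpsc::sync_channel(capacity);
    let join = thread::spawn(move || {
        let mut log: Vec<String> = Vec::new(); // stand-in for the writer connection
        for cmd in rx {
            match cmd {
                Command::Append { text, reply } => {
                    log.push(text);
                    let _ = reply.send(log.len());
                }
                Command::Shutdown => break,
            }
        }
        log
    });
    (Handle { tx }, join)
}

fn main() {
    let (handle, join) = spawn_writer(8);
    let n = handle.append("first write");
    println!("rows after write: {n:?}");
    handle.shutdown();
    let log = join.join().expect("writer thread panicked");
    println!("writer saw {} commands", log.len());
}
```

Because all mutation funnels through one thread, writes are serialized without an application-level lock around the connection, and the bounded channel gives callers backpressure when the writer falls behind.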

Commit 1.3 — HNSW backing for solo_core::VectorIndex + snapshot I/O:

  • vector_index — HnswIndex (hnsw_rs wrapper), HnswFactory.
  • snapshot — atomic two-file save (live/_bak/_tmp basenames) + load/load_bak per ADR-0003 §“Startup file-existence decision tree”.
  • recovery — replay_pending_index, detect_drift. Used by the daemon-main startup chain (commit 1.5).
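
The tmp/bak/live save scheme can be sketched for a single file as follows. This is a simplified illustration under stated assumptions: the basenames `index.live`, `index.bak`, and `index.tmp` are made up for the example (the crate exports its own `LIVE_BASENAME`, `BAK_BASENAME`, and `TMP_BASENAME`), the real snapshot is a pair of files, and the fallback in `load_snapshot` is a stand-in for the ADR's decision tree rather than a reproduction of it.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

// Hypothetical basenames chosen for this sketch only.
const LIVE: &str = "index.live";
const BAK: &str = "index.bak";
const TMP: &str = "index.tmp";

/// Writes `bytes` to the tmp file, syncs it, rotates live to bak,
/// then renames tmp into place as the new live file.
fn save_snapshot(dir: &Path, bytes: &[u8]) -> io::Result<()> {
    let (live, bak, tmp) = (dir.join(LIVE), dir.join(BAK), dir.join(TMP));
    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?; // flush data to disk before any rename
    drop(file);
    if live.exists() {
        fs::rename(&live, &bak)?; // keep the previous version as a backup
    }
    fs::rename(&tmp, &live)
    // Note: this sketch does not fsync the directory itself.
}

/// Reads the live file, falling back to the backup if live is unreadable.
fn load_snapshot(dir: &Path) -> io::Result<Vec<u8>> {
    fs::read(dir.join(LIVE)).or_else(|_| fs::read(dir.join(BAK)))
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("solo_snap_demo_{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    save_snapshot(&dir, b"version 1")?;
    save_snapshot(&dir, b"version 2")?;
    let loaded = load_snapshot(&dir)?;
    println!("loaded {} bytes", loaded.len());
    fs::remove_dir_all(&dir)
}
```

At every point in the save, at least one complete earlier version (live or bak) remains on disk, which is what makes a startup-time existence check on the three basenames sufficient to pick a recovery path.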

Embedder impls:

  • embedder::stub — StubEmbedder, deterministic hash-based F32 embedder for tests + offline development.
  • embedder::ollama — OllamaEmbedder, real semantic embeddings via a local Ollama daemon (/api/embeddings). The recommended production backend since v0.5.1; default for new deployments.

(v0.5.x also shipped a BGE-M3 / candle-transformers backend; it was deprecated in v0.5.0 and removed in v0.6.0. The replacement is OllamaEmbedder.)
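
To make "deterministic hash-based F32 embedder" concrete, here is one way such a stub could work. It is a sketch under assumptions, not `StubEmbedder`'s actual algorithm: the FNV-1a hash, the per-dimension index prefix, and the unit-length normalization are all choices made for this example, and `stub_embed` is a hypothetical function.

```rust
/// FNV-1a over a sequence of byte chunks (64-bit variant).
fn fnv1a(chunks: &[&[u8]]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for chunk in chunks {
        for &byte in *chunk {
            hash ^= byte as u64;
            hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }
    hash
}

/// Maps `text` to a `dim`-dimensional f32 vector. The same input always
/// yields the same output; the values carry no semantic meaning.
fn stub_embed(text: &str, dim: usize) -> Vec<f32> {
    let mut vector: Vec<f32> = (0..dim)
        .map(|i| {
            // Hash the dimension index together with the text.
            let index_bytes = (i as u64).to_le_bytes();
            let hash = fnv1a(&[&index_bytes[..], text.as_bytes()]);
            // Map the top 24 bits of the hash into [-1, 1).
            (hash >> 40) as f32 / (1u64 << 23) as f32 - 1.0
        })
        .collect();
    // Normalize to unit length so cosine and dot-product scores agree.
    let norm = vector.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut vector {
            *x /= norm;
        }
    }
    vector
}

fn main() {
    let a = stub_embed("hello", 8);
    let b = stub_embed("hello", 8);
    println!("same input gives same vector: {}", a == b);
}
```

A stub like this keeps tests reproducible and avoids a network dependency, at the cost that nearest-neighbour results say nothing about meaning.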

Commit 1.5+ (daemon main + signal handlers) lands in subsequent files; the surfaces here are stable for that wiring.

Re-exports

pub use audit::AuditEvent;
pub use audit::AuditOperation;
pub use audit::AuditResult;
pub use audit::AuditWriter;
pub use audit::AuditWriterShutdown;
pub use audit::insert_audit_admin_row;
pub use audit::insert_audit_row_in_tx;
pub use audit::purge_older_than;
pub use backup::DEFAULT_BACKUP_PAGES_PER_STEP;
pub use backup::backup_database;
pub use backup::backup_from_connection;
pub use backup::paths_refer_to_same_file;
pub use config::AuditSettings;
pub use config::AuthSettings;
pub use config::CustomRedactionPattern;
pub use config::DocumentConfig;
pub use config::EmbedderConfig;
pub use config::IdentityConfig;
pub use config::LlmSettings;
pub use config::RedactionConfig;
pub use config::SamplingConfig;
pub use config::SamplingConfigDiagnostic;
pub use config::SoloConfig;
pub use config::StewardSettings;
pub use config::TriplesConfig;
pub use gdpr::ForgetReport;
pub use gdpr::estimate_forget_scope;
pub use gdpr::forget_principal;
pub use redaction::RedactionMatch;
pub use redaction::RedactionRegistry;
pub use redaction::RedactionResult;
pub use steward_factory::McpSamplingStewardFactory;
pub use steward_factory::StaticStewardFactory;
pub use steward_factory::StewardFactory;
pub use tenant_backup::BackupReport;
pub use tenant_backup::RestoreReport;
pub use tenant_backup::backup_tenant;
pub use tenant_backup::restore_tenant;
pub use document::ChunkConfig;
pub use document::ChunkSpec;
pub use document::ParseError;
pub use document::ParsedDocument;
pub use document::chunk_text;
pub use document::parse_file;
pub use embedder::OllamaEmbedder;
pub use embedder::StubEmbedder;
pub use embedder::build_embedder_from_env;
pub use embedder::probe_embedder_config_from_env;
pub use embedder::BUNDLED_EMBEDDER_DIM;
pub use embedder::BUNDLED_EMBEDDER_NAME;
pub use embedder::BUNDLED_EMBEDDER_VERSION;
pub use embedder::BundledEmbedder;
pub use embedder_registry::EmbedderIdentity;
pub use embedder_registry::get_or_insert_embedder_id;
pub use hnsw_id::HNSW_CHUNK_BIT;
pub use hnsw_id::HnswIdKind;
pub use hnsw_id::chunk_hnsw_id;
pub use hnsw_id::decode_hnsw_id;
pub use hnsw_id::episode_hnsw_id;
pub use init::InitOutcome;
pub use init::InitParams;
pub use init::default_data_dir;
pub use init::default_embedder;
pub use init::init;
pub use init::open_sqlcipher;
pub use key_material::KeyMaterial;
pub use lockfile::Lockfile;
pub use merge_candidates::MergeCandidateStats;
pub use merge_candidates::count_existing_merge_candidates;
pub use migration::current_tenants_index_version;
pub use migration::current_version;
pub use migration::run_migrations;
pub use migration::run_tenants_index_migrations;
pub use path_validation::validate_data_dir;
pub use reader::DEFAULT_POOL_SIZE;
pub use reader::ReaderPool;
pub use recovery::DriftReport;
pub use recovery::RebuildReport;
pub use recovery::ReplayReport;
pub use recovery::detect_drift;
pub use recovery::rebuild_hnsw_from_sql;
pub use recovery::replay_pending_index;
pub use snapshot::BAK_BASENAME;
pub use snapshot::LIVE_BASENAME;
pub use snapshot::TMP_BASENAME;
pub use startup::StartupOutcome;
pub use startup::StartupParams;
pub use startup::run as startup_run;
pub use tenants::TENANTS_INDEX_FILENAME;
pub use tenants::TENANTS_SUBDIR;
pub use tenants::TenantCostNumbers;
pub use tenants::TenantHandle;
pub use tenants::TenantOpenParams;
pub use tenants::TenantRecord;
pub use tenants::TenantRegistry;
pub use tenants::TenantRegistryParams;
pub use tenants::TenantStatus;
pub use tenants::TenantsIndex;
pub use tenants::migrate_v071_to_v080;
pub use triples_batch::TriplesBatchSignal;
pub use vector_index::HnswFactory;
pub use vector_index::HnswIndex;
pub use vector_index::HnswParams;
pub use writer::AttachAbstractionBatchReport;
pub use writer::DEFAULT_CHANNEL_CAPACITY;
pub use writer::DEFAULT_INGEST_MAX_BYTES;
pub use writer::ConsolidationReport;
pub use writer::ConsolidationScope;
pub use writer::ForgetDocumentReport;
pub use writer::IngestReport;
pub use writer::MAX_REMEMBER_BATCH_SIZE;
pub use writer::NormalizeReport;
pub use writer::ReembedReport;
pub use writer::ReembedScope;
pub use writer::ResolveContradictionReport;
pub use writer::WriteCommand;
pub use writer::WriteHandle;
pub use writer::WriterActor;
pub use writer::WriterSpawn;
pub use writer::resolve_ingest_max_bytes;

Modules

audit
Per-tenant audit log infrastructure (v0.8.0 P4).
backup
Online SQLCipher backup.
config
solo.config.toml reader/writer.
document
Document parsing + chunking for v0.7.0 RAG/document memory.
embedder
Embedder implementations behind the solo_core::Embedder trait.
embedder_registry
embedders table registry. Every embedder model that produces vectors in this database has a row keyed by (name, version) with its dim + dtype + first-seen timestamp.
gdpr
GDPR right-to-erasure (v0.8.0 P6) — hard-delete every row tied to a principal subject in one tenant.
hnsw_id
Kind-discriminated rowid encoding for the shared HNSW namespace.
hnsw_rebuild
Shared HNSW-tombstone-rebuild helpers.
init
solo init: create a fresh Solo data directory.
key_material
KeyMaterial: holds the raw 32-byte SQLCipher key derived once at startup from the user passphrase via Argon2id.
llm
Production LlmClient backends.
lockfile
solo.lock: O_EXCL-style mutex that prevents two daemons (or two solo init invocations) from racing on the same data dir.
merge_candidates
Read-side helper for solo doctor: count existing-cluster pairs that the existing-vs-existing merge pass would coalesce on the next consolidate --force-merge (or --force-merge-on-timer daemon cycle).
migration
SQL schema migrations. Runs once at startup against the SQLCipher database after PRAGMA key has been bound.
path_validation
Refuse to initialize Solo inside a cloud-sync folder.
reader
ReaderPool: pool of read-only SQLite connections backed by deadpool-sqlite. Each newly-created connection has its raw SQLCipher key bound via a post_create hook (PBKDF2 cost paid once per connection, not per query). See ADR-0003 §“Trait shapes” and §P8-A/P8-B.
recovery
Startup recovery for the HNSW index. Two pieces:
redaction
Opt-in PII redaction registry (v0.8.0 P5).
snapshot
HNSW snapshot save/load. ADR-0003 §P8-C: hnsw_rs writes a pair of files (*.hnsw.data + *.hnsw.graph); we drive an atomic two-step save with fsync and a previous-version backup.
startup
Daemon startup orchestration. Per ADR-0003 §O6 (“Startup ordering: linear await chain in main()”) and §“Startup file-existence decision tree”.
steward_factory
StewardFactory trait — abstracts how a per-tenant Arc<Steward> is built at registry-open time.
tenant_backup
Per-tenant SQLCipher backup + restore (v0.8.0 P6).
tenants
Tenant registry for v0.8.0 multi-tenancy.
triples_batch
v0.9.0 P4c: daemon-side background batch driver for triple extraction.
vector_index
HnswIndex — solo_core::VectorIndex implementation backed by hnsw_rs.
writer
WriterActor, WriteCommand, WriteHandle — single-writer actor on a dedicated OS thread. See ADR-0003 §“Trait shapes” and §“Operational invariants”.
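
The hnsw_id entry above describes a kind-discriminated rowid encoding for a shared HNSW namespace. The sketch below shows one way such an encoding could look. The bit position, the names `CHUNK_BIT`, `IdKind`, `episode_id`, `chunk_id`, and `decode`, and the use of `u64` are all assumptions for illustration; the crate's `HNSW_CHUNK_BIT`, `HnswIdKind`, `episode_hnsw_id`, `chunk_hnsw_id`, and `decode_hnsw_id` may differ in every detail.

```rust
/// Assumed discriminator bit for this sketch; the real value is not
/// stated in the documentation above.
const CHUNK_BIT: u64 = 1 << 62;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IdKind {
    Episode,
    Chunk,
}

/// Episode rowids are used as-is and must not collide with the chunk bit.
fn episode_id(rowid: u64) -> u64 {
    debug_assert!(rowid & CHUNK_BIT == 0, "rowid overlaps the discriminator bit");
    rowid
}

/// Chunk rowids get the discriminator bit set.
fn chunk_id(rowid: u64) -> u64 {
    debug_assert!(rowid & CHUNK_BIT == 0, "rowid overlaps the discriminator bit");
    rowid | CHUNK_BIT
}

/// Recovers the kind and the original rowid from an index id.
fn decode(id: u64) -> (IdKind, u64) {
    if id & CHUNK_BIT != 0 {
        (IdKind::Chunk, id & !CHUNK_BIT)
    } else {
        (IdKind::Episode, id)
    }
}

fn main() {
    let id = chunk_id(7);
    println!("{:?}", decode(id));
    println!("{:?}", decode(episode_id(7)));
}
```

Packing the kind into the id lets rows from two SQL tables share one vector index without a separate lookup table, as long as real rowids stay below the discriminator bit.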