KotobaDB
KotobaDB is a graph-native, version-controlled embedded database built for computational science and complex data relationships. It combines Merkle DAGs with content-addressed storage to provide ACID transactions, time travel, and Git-like semantics for graph data.
✨ Features
- Graph-Native: Built specifically for graph data with native support for nodes, edges, and complex relationships
- Version Control: Git-like branching, forking, and merging with Merkle DAG-based provenance tracking
- Content-Addressed Storage: Immutable data blocks addressed by their cryptographic hash (CID)
- ACID Transactions: Full ACID compliance with MVCC (Multi-Version Concurrency Control)
- Time Travel: Query historical states of your data with point-in-time recovery
- Embedded: Single-process embedded database with zero external dependencies for local development
- Pluggable Storage Engines: Choose between in-memory, LSM-Tree, or custom storage backends
- Computational Science Focused: Optimized for reproducibility, provenance tracking, and scientific workflows
🏗️ Architecture
KotobaDB consists of several layers:
┌─────────────────────────────────────┐
│            KotobaDB API             │ ← High-level user interface
├─────────────────────────────────────┤
│     Transaction Manager & Query     │ ← ACID transactions & graph queries
├─────────────────────────────────────┤
│           Storage Engines           │ ← Pluggable backends (LSM, Memory)
├─────────────────────────────────────┤
│   Content-Addressed Storage (CAS)   │ ← Merkle DAG with CID addressing
└─────────────────────────────────────┘
Core Components
- kotoba-db-core: Core traits, data structures, and transaction logic
- kotoba-db-engine-memory: In-memory storage engine for testing and development
- kotoba-db-engine-lsm: LSM-Tree based persistent storage engine
- kotoba-db: Main API crate providing the user-facing interface
🚀 Quick Start
Add KotobaDB to your Cargo.toml:
[dependencies]
kotoba-db = "0.1.0"
Basic Usage
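use kotoba_db::DB;
use std::collections::BTreeMap;
// Open a database (in-memory for this example)
let db = DB::open_memory().await?;
// Create a node
let mut properties = BTreeMap::new();
properties.insert(/* key */, /* value */);
properties.insert(/* key */, /* value */);
let alice_cid = db.create_node(properties).await?;
// Create another node
let mut properties = BTreeMap::new();
properties.insert(/* key */, /* value */);
properties.insert(/* key */, /* value */);
let bob_cid = db.create_node(properties).await?;
// Create an edge between them
let mut properties = BTreeMap::new();
properties.insert(/* key */, /* value */);
properties.insert(/* key */, /* value */);
db.create_edge(/* source CID, target CID, properties */).await?;
// Query nodes
let alice_nodes = db.find_nodes(/* property filter */).await?;
println!(/* format string and arguments */);
// Transaction example
let txn_id = db.begin_transaction().await?;
db.add_operation(/* transaction id, operation */).await?;
db.commit_transaction(/* transaction id */).await?;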
Storage Engines
In-Memory Engine (Development/Testing)
let db = DB::open_memory().await?;
LSM-Tree Engine (Persistent Storage)
let db = DB::open_lsm(/* database path */).await?;
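To illustrate what a pluggable backend can look like, here is a minimal sketch of a key-value storage trait with an in-memory implementation. The names `StorageEngine`, `MemoryEngine`, and `store_greeting` are hypothetical and chosen for illustration; the actual traits in kotoba-db-core may look different.

```rust
use std::collections::BTreeMap;

/// Hypothetical storage trait; the real kotoba-db-core trait may differ.
trait StorageEngine {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>);
    fn get(&self, key: &[u8]) -> Option<&[u8]>;
    fn delete(&mut self, key: &[u8]) -> bool;
}

/// In-memory backend: an ordered map with no persistence.
#[derive(Default)]
struct MemoryEngine {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl StorageEngine for MemoryEngine {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.data.insert(key, value);
    }

    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.data.get(key).map(|v| v.as_slice())
    }

    fn delete(&mut self, key: &[u8]) -> bool {
        self.data.remove(key).is_some()
    }
}

/// Code written against the trait works with any backend.
fn store_greeting(engine: &mut dyn StorageEngine) {
    engine.put(b"greeting".to_vec(), b"hello".to_vec());
}

fn main() {
    let mut engine = MemoryEngine::default();
    store_greeting(&mut engine);
    assert_eq!(engine.get(b"greeting"), Some(&b"hello"[..]));
    assert!(engine.delete(b"greeting"));
    assert_eq!(engine.get(b"greeting"), None);
}
```

A persistent backend would implement the same trait over on-disk structures, so callers such as `store_greeting` stay unchanged.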
📊 Data Model
Nodes
Nodes are the primary data entities in KotobaDB. Each node has:
- CID: Content identifier (cryptographic hash of the node's data)
- Properties: Key-value pairs describing the node
- Version History: Complete history of changes via Merkle DAG
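The idea that a node's identifier is derived from its content can be sketched as follows. This is an illustration only: `content_id` is a hypothetical helper, and it uses the standard library's `DefaultHasher`, which is not a cryptographic hash, purely to keep the example dependency-free. A real CID would use a cryptographic hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Stand-in content identifier (assumption: a 64-bit, non-cryptographic hash).
fn content_id(properties: &BTreeMap<String, String>) -> u64 {
    let mut hasher = DefaultHasher::new();
    // BTreeMap iterates in key order, so insertion order does not affect the id.
    for (key, value) in properties {
        key.hash(&mut hasher);
        value.hash(&mut hasher);
    }
    hasher.finish()
}

fn main() {
    let mut a = BTreeMap::new();
    a.insert("name".to_string(), "Alice".to_string());
    a.insert("role".to_string(), "admin".to_string());

    // Same content, inserted in a different order.
    let mut b = BTreeMap::new();
    b.insert("role".to_string(), "admin".to_string());
    b.insert("name".to_string(), "Alice".to_string());

    // Identical content yields the identical identifier.
    assert_eq!(content_id(&a), content_id(&b));

    // Changing any property yields a different identifier.
    b.insert("role".to_string(), "guest".to_string());
    assert_ne!(content_id(&a), content_id(&b));
}
```

Because the identifier depends only on content, an update does not overwrite a node; it produces a new identifier, which is what makes a version history possible.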
Edges
Edges represent relationships between nodes:
- Source/Target: CIDs of connected nodes
- Properties: Relationship metadata
- Direction: Relationships can be directed or undirected
Values
KotobaDB supports rich data types:
- String: UTF-8 text
- Int: 64-bit integers
- Float: 64-bit floating point
- Bool: Boolean values
- Bytes: Binary data
- Link: References to other nodes/edges by CID
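The value types listed above map naturally onto a Rust enum. The sketch below is a hypothetical mirror of that list, not the crate's actual definition; in particular, representing a CID as a 32-byte array is an assumption, and `type_name` is an illustrative helper.

```rust
/// Hypothetical mirror of the value types listed above.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    String(String),
    Int(i64),
    Float(f64),
    Bool(bool),
    Bytes(Vec<u8>),
    /// Reference to another node or edge (assumed: CID as a 32-byte hash).
    Link([u8; 32]),
}

/// Returns the name of the variant, useful for logging or schema checks.
fn type_name(value: &Value) -> &'static str {
    match value {
        Value::String(_) => "String",
        Value::Int(_) => "Int",
        Value::Float(_) => "Float",
        Value::Bool(_) => "Bool",
        Value::Bytes(_) => "Bytes",
        Value::Link(_) => "Link",
    }
}

fn main() {
    let properties = vec![
        ("name", Value::String("Alice".to_string())),
        ("age", Value::Int(30)),
        ("score", Value::Float(0.5)),
        ("active", Value::Bool(true)),
        ("avatar", Value::Bytes(vec![0xde, 0xad])),
        ("friend", Value::Link([0u8; 32])),
    ];
    for (key, value) in &properties {
        println!("{}: {} = {:?}", key, type_name(value), value);
    }
}
```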
🔍 Querying
Node Queries
// Find nodes by property
let users = db.find_nodes(/* property filter */).await?;
// Find nodes with multiple properties
let active_users = db.find_nodes(/* several property filters */).await?;
Graph Traversal
// Find neighbors of a node
let neighbors = db.find_neighbors(/* node CID, ... */).await?;
// Traverse the graph with custom logic
let result = db.traverse(/* start node, traversal logic */).await?;
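As a rough picture of what a traversal does, here is a self-contained breadth-first search over an adjacency list. It does not use the KotobaDB API: the `bfs` function is a hypothetical stand-in, and node identifiers are plain `u32` values where KotobaDB would use CIDs.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first traversal; returns nodes in the order they are visited.
fn bfs(adjacency: &HashMap<u32, Vec<u32>>, start: u32) -> Vec<u32> {
    let mut visited = HashSet::from([start]);
    let mut queue = VecDeque::from([start]);
    let mut order = Vec::new();

    while let Some(node) = queue.pop_front() {
        order.push(node);
        // Nodes without outgoing edges have no entry; treat that as "no neighbors".
        for &next in adjacency.get(&node).into_iter().flatten() {
            // `insert` returns false if the node was already seen.
            if visited.insert(next) {
                queue.push_back(next);
            }
        }
    }
    order
}

fn main() {
    let adjacency = HashMap::from([
        (1, vec![2, 3]),
        (2, vec![4]),
        (3, vec![4]),
    ]);
    let order = bfs(&adjacency, 1);
    assert_eq!(order, vec![1, 2, 3, 4]);
    println!("{:?}", order);
}
```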
🎯 Use Cases
Computational Science
- Reproducibility: Track complete provenance of computational experiments
- Version Control: Git-like semantics for datasets and models
- Collaboration: Branch and merge scientific workflows
Graph Applications
- Social Networks: Complex relationship modeling
- Knowledge Graphs: Semantic data with rich relationships
- Recommendation Systems: Graph-based ML pipelines
Content Management
- Versioned Content: Time-travel through content history
- Collaborative Editing: Conflict-free replicated data types
- Audit Trails: Complete change history for compliance
🔧 Advanced Features
Transactions
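let txn_id = db.begin_transaction().await?;
// Multiple operations in a transaction
db.add_operation(/* transaction id, operation */).await?;
db.add_operation(/* transaction id, operation */).await?;
db.add_operation(/* transaction id, operation */).await?;
// Commit or rollback
if success { /* commit the transaction */ } else { /* roll it back */ }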
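The commit-or-rollback pattern can be illustrated with a toy store that buffers writes until commit. This is a deliberately simplified sketch of atomicity only (no concurrency, no MVCC); the `Store` type and its methods are hypothetical and are not the KotobaDB transaction manager.

```rust
use std::collections::BTreeMap;

/// Toy store: operations are buffered and applied only on commit.
#[derive(Default)]
struct Store {
    data: BTreeMap<String, i64>,
    pending: Vec<(String, i64)>,
}

impl Store {
    /// Queue a write; committed data is not touched yet.
    fn add_operation(&mut self, key: &str, value: i64) {
        self.pending.push((key.to_string(), value));
    }

    /// Apply all queued writes in order.
    fn commit(&mut self) {
        for (key, value) in self.pending.drain(..) {
            self.data.insert(key, value);
        }
    }

    /// Discard all queued writes.
    fn rollback(&mut self) {
        self.pending.clear();
    }
}

fn main() {
    let mut store = Store::default();

    store.add_operation("a", 1);
    store.add_operation("b", 2);
    assert!(store.data.is_empty()); // nothing visible before commit
    store.commit();
    assert_eq!(store.data.len(), 2);

    store.add_operation("a", 99);
    store.rollback();
    assert_eq!(store.data["a"], 1); // rolled-back write never landed
}
```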
Branching and Merging
// Create a branch
let branch_id = db.create_branch(/* branch name */).await?;
// Work on the branch
db.checkout_branch(/* branch */).await?;
// ... make changes ...
// Merge back to main
db.merge_branch(/* source branch, target branch */).await?;
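Conceptually, a branch is a named pointer into a DAG of commits, and a merge adds a commit with two parents. The sketch below models that with full snapshots per commit and a naive merge policy (on conflicting keys the source branch wins). All names (`Repo`, `Commit`, `Snapshot`) and the merge policy are assumptions for illustration, not KotobaDB's actual behavior.

```rust
use std::collections::{BTreeMap, HashMap};

type Snapshot = BTreeMap<String, String>;

struct Commit {
    parents: Vec<usize>, // indices of parent commits
    snapshot: Snapshot,  // full state at this commit
}

struct Repo {
    commits: Vec<Commit>,
    branches: HashMap<String, usize>, // branch name -> head commit index
}

impl Repo {
    fn new() -> Self {
        let root = Commit { parents: vec![], snapshot: Snapshot::new() };
        Repo {
            commits: vec![root],
            branches: HashMap::from([("main".to_string(), 0)]),
        }
    }

    /// A new branch starts at the head of an existing one.
    fn create_branch(&mut self, name: &str, from: &str) {
        let head = self.branches[from];
        self.branches.insert(name.to_string(), head);
    }

    /// Record one key change as a new commit on `branch`.
    fn commit(&mut self, branch: &str, key: &str, value: &str) {
        let head = self.branches[branch];
        let mut snapshot = self.commits[head].snapshot.clone();
        snapshot.insert(key.to_string(), value.to_string());
        self.commits.push(Commit { parents: vec![head], snapshot });
        self.branches.insert(branch.to_string(), self.commits.len() - 1);
    }

    /// Naive merge: union of both snapshots, source wins on conflicts.
    fn merge(&mut self, source: &str, target: &str) {
        let (s, t) = (self.branches[source], self.branches[target]);
        let mut snapshot = self.commits[t].snapshot.clone();
        snapshot.extend(self.commits[s].snapshot.clone());
        self.commits.push(Commit { parents: vec![t, s], snapshot });
        self.branches.insert(target.to_string(), self.commits.len() - 1);
    }

    fn read(&self, branch: &str, key: &str) -> Option<&str> {
        self.commits[self.branches[branch]].snapshot.get(key).map(String::as_str)
    }
}

fn main() {
    let mut repo = Repo::new();
    repo.commit("main", "a", "1");
    repo.create_branch("feature", "main");
    repo.commit("feature", "b", "2");
    assert_eq!(repo.read("main", "b"), None); // isolated until merged

    repo.merge("feature", "main");
    assert_eq!(repo.read("main", "b"), Some("2"));
    assert_eq!(repo.commits.last().unwrap().parents.len(), 2);
}
```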
Time Travel
// Query historical state
let historical_state = db.query_at_timestamp(/* timestamp, ... */).await?;
// Point-in-time recovery
db.restore_to_timestamp(/* timestamp */).await?;
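Time-travel reads are possible when old versions are never overwritten. A minimal way to picture this is a store that keeps every write under its timestamp and answers "what was the value at time T" with an ordered range lookup. The `VersionedStore` type below is a hypothetical illustration, with integer timestamps assumed for simplicity.

```rust
use std::collections::BTreeMap;

/// Keeps every write: key -> (timestamp -> value).
#[derive(Default)]
struct VersionedStore {
    history: BTreeMap<String, BTreeMap<u64, String>>,
}

impl VersionedStore {
    fn put(&mut self, key: &str, timestamp: u64, value: &str) {
        self.history
            .entry(key.to_string())
            .or_default()
            .insert(timestamp, value.to_string());
    }

    /// Latest value written at or before `timestamp`, if any.
    fn get_at(&self, key: &str, timestamp: u64) -> Option<&str> {
        self.history
            .get(key)?
            .range(..=timestamp)
            .next_back()
            .map(|(_, value)| value.as_str())
    }
}

fn main() {
    let mut store = VersionedStore::default();
    store.put("status", 10, "draft");
    store.put("status", 20, "published");

    assert_eq!(store.get_at("status", 5), None); // before the first write
    assert_eq!(store.get_at("status", 15), Some("draft"));
    assert_eq!(store.get_at("status", 20), Some("published"));
}
```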
📈 Performance
KotobaDB is optimized for graph workloads:
- LSM-Tree Engine: High write throughput with efficient reads
- Bloom Filters: Fast existence checks for SSTable optimization
- Compaction: Automatic background optimization
- Memory Pool: Efficient memory management for large graphs
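To show why a Bloom filter helps with existence checks, here is a small sketch. A negative answer is definitive, so a lookup can skip an SSTable entirely; a positive answer only means "possibly present". The `BloomFilter` type, its sizes, and the use of the non-cryptographic `DefaultHasher` with a seed are illustrative assumptions, not the engine's implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct BloomFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        BloomFilter { bits: vec![false; num_bits], num_hashes }
    }

    /// Bit position for `key` under the hash function selected by `seed`.
    fn index(&self, key: &str, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        key.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    fn insert(&mut self, key: &str) {
        for seed in 0..self.num_hashes {
            let i = self.index(key, seed);
            self.bits[i] = true;
        }
    }

    /// false: definitely absent. true: possibly present (false positives can occur).
    fn may_contain(&self, key: &str) -> bool {
        (0..self.num_hashes).all(|seed| self.bits[self.index(key, seed)])
    }
}

fn main() {
    let mut filter = BloomFilter::new(1024, 3);
    filter.insert("node:alice");
    filter.insert("node:bob");

    // Inserted keys are always reported as possibly present.
    assert!(filter.may_contain("node:alice"));
    assert!(filter.may_contain("node:bob"));

    // An absent key is usually, but not always, reported as absent.
    println!("carol possibly present: {}", filter.may_contain("node:carol"));
}
```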
Benchmarks
Node Creation: 50,000 ops/sec
Node Queries: 100,000 ops/sec
Edge Creation: 30,000 ops/sec
Graph Traversal: 75,000 nodes/sec
🔗 Integration
Storage Layer Integration
KotobaDB integrates with the Kotoba storage layer:
use ;
let config = StorageConfig ;
let backend = create.await?;
Graph Processing
Works with existing graph algorithms:
use ;
// Load graph from KotobaDB
let graph = from_kotoba_db.await?;
// Run graph algorithms
let shortest_path = dijkstra.await?;
let communities = louvain_clustering.await?;
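For readers unfamiliar with what a shortest-path routine computes, the following is a standalone sketch of Dijkstra's algorithm over a weighted adjacency list. It is independent of the library functions named above: the signature here is an assumption made for the example, and node identifiers are plain `u32` values rather than CIDs.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Shortest distance from `start` to every reachable node.
fn dijkstra(graph: &HashMap<u32, Vec<(u32, u64)>>, start: u32) -> HashMap<u32, u64> {
    let mut dist = HashMap::from([(start, 0u64)]);
    // `Reverse` turns the max-heap into a min-heap on (distance, node).
    let mut heap = BinaryHeap::from([Reverse((0u64, start))]);

    while let Some(Reverse((d, node))) = heap.pop() {
        if d > dist[&node] {
            continue; // stale entry: a shorter path was already found
        }
        for &(next, weight) in graph.get(&node).into_iter().flatten() {
            let candidate = d + weight;
            if dist.get(&next).map_or(true, |&best| candidate < best) {
                dist.insert(next, candidate);
                heap.push(Reverse((candidate, next)));
            }
        }
    }
    dist
}

fn main() {
    let graph = HashMap::from([
        (1, vec![(2, 7), (3, 1)]),
        (3, vec![(2, 2)]),
        (2, vec![(4, 1)]),
    ]);
    let dist = dijkstra(&graph, 1);
    assert_eq!(dist[&2], 3); // via node 3, not the direct edge of weight 7
    assert_eq!(dist[&4], 4);
}
```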
🛠️ Development
Building
# Build all crates
cargo build
# Build with LSM engine
cargo build -p kotoba-db-engine-lsm
# Run tests
cargo test
# Run benchmarks
cargo bench
Architecture Overview
crates/
├── kotoba-db-core/ # Core traits and types
├── kotoba-db-engine-memory/ # In-memory engine
├── kotoba-db-engine-lsm/ # LSM-Tree engine
└── kotoba-db/ # Main API
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
📚 Documentation
🤝 Related Projects
- Dolt: Git for Data - similar version control approach
- TerminusDB: Graph database with Git-like features
- Datomic: Immutable database with time travel
- IPFS: Content-addressed distributed storage
📄 License
Licensed under the MIT License. See LICENSE for details.
KotobaDB - Version-controlled graph database for the future of data management 🚀