# SynaDB
<p align="center">
<img src="assets/full-logo.png" alt="SynaDB Logo" width="300"/>
</p>
[CI](https://github.com/gtava5813/SynaDB/actions/workflows/ci.yml) |
[Code Quality](https://app.codacy.com/gh/gtava5813/SynaDB/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade) |
[PyPI](https://pypi.org/project/synadb/) |
[crates.io](https://crates.io/crates/synadb) |
[License](https://github.com/gtava5813/SynaDB/blob/main/LICENSE)
> An AI-native embedded database.
An embedded, log-structured, columnar-mapped database engine written in Rust. Syna combines the embedded simplicity of SQLite, the columnar analytical speed of DuckDB, and the schema flexibility of MongoDB.
## Features
- **Append-only log structure** - Fast sequential writes, immutable history
- **Schema-free** - Store heterogeneous data types without migrations
- **AI/ML optimized** - Extract time-series data as contiguous tensors for PyTorch/TensorFlow
- **Vector Store** - Native embedding storage with HNSW index for similarity search
- **MmapVectorStore** - Ultra-high-throughput vector storage (7x faster than VectorStore)
- **HNSW Index** - O(log N) approximate nearest neighbor search
- **Gravity Well Index** - Novel O(N) build time index (faster build than HNSW)
- **Cascade Index** - Three-stage hybrid index (LSH + bucket tree + graph) (Experimental)
- **Sparse Vector Store** - Inverted index for lexical embeddings (SPLADE, BM25, TF-IDF)
- **Tensor Engine** - Batch tensor operations with chunked storage
- **Model Registry** - Version models with SHA-256 checksum verification
- **Experiment Tracking** - Log parameters, metrics, and artifacts
- **LLM Integrations** - LangChain, LlamaIndex, Haystack support
- **ML Integrations** - PyTorch Dataset/DataLoader, TensorFlow tf.data
- **CLI Tool** - Command-line database inspection and management
- **Studio Web UI** - Visual database explorer with 3D embedding clusters
- **GPU Direct** - CUDA tensor loading (optional feature)
- **FAISS Integration** - Billion-scale vector search (optional feature)
- **C-ABI interface** - Use from Python, Node.js, C++, or any FFI-capable language
- **Delta & LZ4 compression** - Minimize storage for time-series data
- **Crash recovery** - Automatic index rebuild on open
- **Thread-safe** - Concurrent read/write access with mutex-protected writes
## Installation
### Rust
```toml
[dependencies]
synadb = "1.1.1"
```
### Python
```bash
pip install synadb
```
See [Python Package](https://pypi.org/project/synadb/) for full Python documentation.
### Building from Source
```bash
# Clone the repository
git clone https://github.com/gtava5813/SynaDB.git
cd SynaDB
# Build release version
cargo build --release
# Run tests
cargo test
```
The compiled library will be at:
- Linux: `target/release/libsynadb.so`
- macOS: `target/release/libsynadb.dylib`
- Windows: `target/release/synadb.dll`
## Quick Start
### Rust Usage
```rust
use synadb::{SynaDB, Atom, Result};
fn main() -> Result<()> {
    // Open or create a database
    let mut db = SynaDB::new("my_data.db")?;
    // Write different data types
    db.append("temperature", Atom::Float(23.5))?;
    db.append("count", Atom::Int(42))?;
    db.append("name", Atom::Text("sensor-1".to_string()))?;
    db.append("raw_data", Atom::Bytes(vec![0x01, 0x02, 0x03]))?;
    // Read values back
    if let Some(temp) = db.get("temperature")? {
        println!("Temperature: {:?}", temp);
    }
    // Append more values to build history
    db.append("temperature", Atom::Float(24.1))?;
    db.append("temperature", Atom::Float(24.8))?;
    // Extract history as tensor for ML
    let history = db.get_history_floats("temperature")?;
    println!("Temperature history: {:?}", history); // [23.5, 24.1, 24.8]
    // Delete a key
    db.delete("count")?;
    assert!(db.get("count")?.is_none());
    // List all keys
    let keys = db.keys();
    println!("Keys: {:?}", keys);
    // Compact to reclaim space
    db.compact()?;
    // Close (optional - happens on drop)
    db.close()?;
    Ok(())
}
```
### Python Usage (ctypes)
```python
import ctypes
from ctypes import c_char_p, c_double, c_int64, c_int32, c_size_t, POINTER, byref
# Load the library
lib = ctypes.CDLL("./target/release/libsynadb.so") # or .dylib/.dll
# Define function signatures
lib.syna_open.argtypes = [c_char_p]
lib.syna_open.restype = c_int32
lib.syna_close.argtypes = [c_char_p]
lib.syna_close.restype = c_int32
lib.syna_put_float.argtypes = [c_char_p, c_char_p, c_double]
lib.syna_put_float.restype = c_int64
lib.syna_get_float.argtypes = [c_char_p, c_char_p, POINTER(c_double)]
lib.syna_get_float.restype = c_int32
lib.syna_get_history_tensor.argtypes = [c_char_p, c_char_p, POINTER(c_size_t)]
lib.syna_get_history_tensor.restype = POINTER(c_double)
lib.syna_free_tensor.argtypes = [POINTER(c_double), c_size_t]
lib.syna_free_tensor.restype = None
lib.syna_delete.argtypes = [c_char_p, c_char_p]
lib.syna_delete.restype = c_int32
# Usage
db_path = b"my_data.db"
# Open database
result = lib.syna_open(db_path)
assert result == 1, f"Failed to open database: {result}"
# Write float values
lib.syna_put_float(db_path, b"temperature", 23.5)
lib.syna_put_float(db_path, b"temperature", 24.1)
lib.syna_put_float(db_path, b"temperature", 24.8)
# Read latest value
value = c_double()
result = lib.syna_get_float(db_path, b"temperature", byref(value))
if result == 1:
    print(f"Temperature: {value.value}")
# Get history as numpy-compatible array
length = c_size_t()
ptr = lib.syna_get_history_tensor(db_path, b"temperature", byref(length))
if ptr:
    # Convert to Python list (or use numpy.ctypeslib for zero-copy)
    history = [ptr[i] for i in range(length.value)]
    print(f"History: {history}")
    # Free the tensor memory (must happen after the values are copied out)
    lib.syna_free_tensor(ptr, length)
# Close database
lib.syna_close(db_path)
```
### C/C++ Usage
```c
#include "synadb.h"
#include <stdio.h>
int main(void) {
    const char* db_path = "my_data.db";
    // Open database
    int result = syna_open(db_path);
    if (result != 1) {
        fprintf(stderr, "Failed to open database: %d\n", result);
        return 1;
    }
    // Write values
    syna_put_float(db_path, "temperature", 23.5);
    syna_put_float(db_path, "temperature", 24.1);
    syna_put_int(db_path, "count", 42);
    syna_put_text(db_path, "name", "sensor-1");
    // Read float value
    double temp;
    if (syna_get_float(db_path, "temperature", &temp) == 1) {
        printf("Temperature: %f\n", temp);
    }
    // Get history tensor for ML
    size_t len;
    double* tensor = syna_get_history_tensor(db_path, "temperature", &len);
    if (tensor) {
        printf("History (%zu values):", len);
        for (size_t i = 0; i < len; i++) {
            printf(" %f", tensor[i]);
        }
        printf("\n");
        // Free tensor memory
        syna_free_tensor(tensor, len);
    }
    // Delete a key
    syna_delete(db_path, "count");
    // Compact database
    syna_compact(db_path);
    // Close database
    syna_close(db_path);
    return 0;
}
```
Compile with:
```bash
gcc -o myapp myapp.c -L./target/release -lsynadb -Wl,-rpath,./target/release
```
## Vector Store
Store and search embeddings for RAG applications:
```python
from synadb import VectorStore
import numpy as np
# Create store with 768 dimensions (BERT-sized)
store = VectorStore("vectors.db", dimensions=768)
# Insert embeddings
embedding1 = np.random.randn(768).astype(np.float32)
embedding2 = np.random.randn(768).astype(np.float32)
store.insert("doc1", embedding1)
store.insert("doc2", embedding2)
# Search for similar vectors
query_embedding = np.random.randn(768).astype(np.float32)
results = store.search(query_embedding, k=5)
for r in results:
    print(f"{r.key}: {r.score:.4f}")
```
### Distance Metrics
The VectorStore supports three distance metrics:
| Metric | Description | Use Case |
|--------|-------------|----------|
| `cosine` (default) | Cosine distance (1 - cosine_similarity) | Text embeddings, normalized vectors |
| `euclidean` | Euclidean (L2) distance | Image embeddings, spatial data |
| `dot_product` | Negative dot product | Maximum inner product search |
```python
# Use euclidean distance
store = VectorStore("vectors.db", dimensions=768, metric="euclidean")
# Use dot product
store = VectorStore("vectors.db", dimensions=768, metric="dot_product")
```
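To make the three definitions in the table concrete, here is a minimal pure-Python sketch of each metric. It is illustrative only and is not SynaDB's implementation; the function names are hypothetical, and the zero-vector check is an assumption about how such an edge case could be handled.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine_similarity: 0 when the vectors point the same way."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot(a, b) / norm

def euclidean_distance(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product_distance(a, b):
    """Negated so that a larger inner product means a smaller distance."""
    return -dot(a, b)

a = [1.0, 0.0]
b = [0.0, 1.0]
c = [2.0, 0.0]
print(cosine_distance(a, c), euclidean_distance(a, b), dot_product_distance(a, c))
```

Note that `a` and `c` have cosine distance 0 even though their lengths differ, which is why cosine suits normalized text embeddings while Euclidean distance is sensitive to magnitude.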
### Supported Dimensions
Vector dimensions from 64 to 8192 are supported, covering all common embedding models:
| Model | Dimensions |
|-------|------------|
| MiniLM | 384 |
| BERT base | 768 |
| BERT large | 1024 |
| OpenAI ada-002 | 1536 |
| OpenAI text-embedding-3-large | 3072 |
| NVIDIA NV-Embed-v2 | 4096 |
| Maximum supported | 8192 |
### HNSW Index
For large-scale vector search (>10,000 vectors), SynaDB uses HNSW (Hierarchical Navigable Small World) indexing for approximate nearest neighbor search with O(log N) complexity.
```python
from synadb import VectorStore
import numpy as np
# HNSW is automatically enabled when vector count exceeds threshold
store = VectorStore("vectors.db", dimensions=768)
# Insert many vectors - HNSW index builds automatically
for i in range(100000):
    embedding = np.random.randn(768).astype(np.float32)
    store.insert(f"doc{i}", embedding)
# Search is now O(log N) instead of O(N)
query_embedding = np.random.randn(768).astype(np.float32)
results = store.search(query_embedding, k=10) # <10ms for 1M vectors
```
HNSW Configuration (Rust API):
```rust
use synadb::hnsw::{HnswIndex, HnswConfig};
use synadb::distance::DistanceMetric;
// Custom HNSW configuration
let config = HnswConfig::with_m(32) // More connections = better recall
    .ef_construction(200)           // Higher = better index quality
    .ef_search(100);                // Higher = better search recall
let mut index = HnswIndex::new(768, DistanceMetric::Cosine, config);
```
| Parameter | Default | Description |
|-----------|---------|-------------|
| `m` | 16 | Max connections per node (8-64 typical) |
| `m_max` | 32 | Max connections at higher layers (2×M) |
| `ef_construction` | 200 | Build quality (100-500 typical) |
| `ef_search` | 100 | Search quality (50-500 typical) |
### MmapVectorStore
For ultra-high-throughput vector ingestion (7x faster than VectorStore), use MmapVectorStore:
```python
from synadb import MmapVectorStore
import numpy as np
# Create store with pre-allocated capacity
store = MmapVectorStore("vectors.mmap", dimensions=768, initial_capacity=100000)
# Batch insert - 7x faster than VectorStore
keys = [f"doc_{i}" for i in range(10000)]
vectors = np.random.randn(10000, 768).astype(np.float32)
store.insert_batch(keys, vectors) # 7x faster than VectorStore
# Build HNSW index
store.build_index()
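# Search (the query must have the same dimensionality as the stored vectors)
query_embedding = np.random.randn(768).astype(np.float32)
results = store.search(query_embedding, k=10)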
# Checkpoint to persist (not per-write like VectorStore)
store.checkpoint()
store.close()
```
| Aspect | VectorStore | MmapVectorStore |
|--------|-------------|-----------------|
| Write speed | ~67K/sec | ~490K/sec |
| Durability | Per-write | Checkpoint |
| Capacity | Dynamic | Pre-allocated |
### Gravity Well Index (GWI)
For scenarios where index build time is critical, GWI provides O(N) build time (faster than HNSW):
```python
from synadb import GravityWellIndex
import numpy as np
# Create index
gwi = GravityWellIndex("vectors.gwi", dimensions=768)
# Initialize with sample vectors (required)
sample = np.random.randn(1000, 768).astype(np.float32)
gwi.initialize(sample)
# Insert vectors - O(N) total build time
keys = [f"doc_{i}" for i in range(50000)]
vectors = np.random.randn(50000, 768).astype(np.float32)
gwi.insert_batch(keys, vectors)
# Search with tunable recall (nprobe=50 gives 98% recall)
query_embedding = np.random.randn(768).astype(np.float32)
results = gwi.search(query_embedding, k=10, nprobe=50)
```
**GWI vs HNSW Build Time:**
| Dataset | GWI | HNSW | Speedup |
|---------|-----|------|---------|
| 10K × 768 | 2.1s | 18.4s | 8.9x |
| 50K × 768 | 3.0s | 504s | 168x |
**When to use which:**
- **VectorStore**: General use, good all-around
- **MmapVectorStore**: High-throughput ingestion, large datasets
- **GWI**: Build time critical, streaming/real-time data
- **Cascade**: Balanced build/search, tunable recall
- **FAISS**: Billion-scale, GPU acceleration
### Cascade Index (Experimental)
For balanced performance with tunable recall/latency trade-off:
```python
from synadb import CascadeIndex
import numpy as np
# Create with preset configuration
index = CascadeIndex("vectors.cascade", dimensions=768, preset="large")
# Or custom configuration
index = CascadeIndex("vectors.cascade", dimensions=768,
                     num_hyperplanes=16, bucket_capacity=128, nprobe=8)
# Insert vectors - no initialization required
keys = [f"doc_{i}" for i in range(50000)]
vectors = np.random.randn(50000, 768).astype(np.float32)
index.insert_batch(keys, vectors)
# Search
query_embedding = np.random.randn(768).astype(np.float32)
results = index.search(query_embedding, k=10)
# Save and close
index.save()
index.close()
```
**Configuration Presets:**
| Preset | Use Case | Build | Search | Recall |
|--------|----------|-------|--------|--------|
| `small` | <100K vectors | Fast | Fast | 95%+ |
| `large` | 1M+ vectors | Medium | Fast | 95%+ |
| `high_recall` | Accuracy critical | Slow | Medium | 99%+ |
| `fast_search` | Latency critical | Fast | Very Fast | 90%+ |
**Architecture:**
1. **LSH Layer** - Hyperplane-based locality-sensitive hashing with multi-probe
2. **Bucket Tree** - Adaptive splitting when buckets exceed threshold
3. **Sparse Graph** - Local neighbor connections for search refinement
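The first stage above can be sketched in a few lines. This is a generic illustration of hyperplane LSH with single-bit-flip multi-probe, not the Cascade Index source; `make_hyperplanes`, `lsh_bucket`, and `probe_buckets` are hypothetical names, and the real index may choose probes differently.

```python
import random

def make_hyperplanes(num_hyperplanes, dimensions, seed=0):
    """Draw random Gaussian normal vectors, one per hyperplane."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dimensions)]
            for _ in range(num_hyperplanes)]

def lsh_bucket(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its positive side."""
    bucket = 0
    for plane in hyperplanes:
        side = sum(p * v for p, v in zip(plane, vector)) >= 0.0
        bucket = (bucket << 1) | int(side)
    return bucket

def probe_buckets(bucket, num_hyperplanes):
    """Multi-probe: the home bucket plus every bucket one bit-flip away."""
    return [bucket] + [bucket ^ (1 << i) for i in range(num_hyperplanes)]

planes = make_hyperplanes(num_hyperplanes=16, dimensions=8)
v = [0.5, -1.0, 2.0, 0.1, -0.3, 0.9, -2.2, 1.4]
home = lsh_bucket(v, planes)
print(home, len(probe_buckets(home, 16)))
```

Because only the sign of each projection matters, vectors pointing in similar directions tend to share a bucket, and probing neighbouring buckets recovers candidates that fell just across a hyperplane.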
### Sparse Vector Store (SVS)
For lexical embeddings from sparse encoders like SPLADE, BM25, or TF-IDF:
```python
from synadb import SparseVectorStore
# Create store with vocabulary size
store = SparseVectorStore("lexical.svs", vocab_size=30522)
# Index sparse vectors (from any encoder)
# Example: SPLADE output for "machine learning"
store.index("doc1", indices=[101, 2054, 3000, 4521], values=[0.8, 0.5, 0.3, 0.2])
store.index("doc2", indices=[101, 5678, 9012], values=[0.9, 0.4, 0.1])
# Search with sparse query
query_indices = [101, 2054]
query_values = [0.7, 0.6]
results = store.search(query_indices, query_values, k=10)
for r in results:
    print(f"{r.key}: {r.score:.4f}")
# Get statistics
stats = store.stats()
print(f"Documents: {stats.num_vectors}, Vocab: {stats.vocab_size}")
# Persistence
store.save()
store.close()
# Reopen existing store
store = SparseVectorStore.open("lexical.svs")
```
**Rust API:**
```rust
use synadb::sparse_vector_store::{SparseVectorStore, SparseVector};
// Create store
let mut store = SparseVectorStore::new("lexical.svs", 30522)?;
// Index sparse vectors
let vec = SparseVector::new(vec![101, 2054, 3000], vec![0.8, 0.5, 0.3])?;
store.index("doc1", vec)?;
// Search
let query = SparseVector::new(vec![101, 2054], vec![0.7, 0.6])?;
let results = store.search(&query, 10)?;
// Persistence
store.save()?;
```
**When to use SVS:**
- Lexical/keyword search (BM25, TF-IDF)
- Learned sparse representations (SPLADE, SPLADE++)
- Hybrid search (combine with dense vectors)
- High-dimensional sparse data
**Architecture:**
- Inverted index maps vocabulary terms to document postings
- O(min(nnz)) search complexity (nnz = non-zero elements in query)
- Exact search (100% recall)
- Efficient for high-dimensional sparse vectors
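The inverted-index idea can be sketched in plain Python. The class below is a hypothetical toy, not the `SparseVectorStore` implementation, and it assumes documents are scored by the sparse dot product between query and document weights.

```python
from collections import defaultdict

class TinyInvertedIndex:
    """Toy inverted index: term id -> list of (document key, weight)."""

    def __init__(self):
        self.postings = defaultdict(list)

    def index(self, key, indices, values):
        # Add one posting per non-zero term of the document.
        for term, weight in zip(indices, values):
            self.postings[term].append((key, weight))

    def search(self, query_indices, query_values, k=10):
        # Only postings of terms present in the query are visited.
        scores = defaultdict(float)
        for term, q_weight in zip(query_indices, query_values):
            for key, d_weight in self.postings.get(term, []):
                scores[key] += q_weight * d_weight
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

idx = TinyInvertedIndex()
idx.index("doc1", [101, 2054, 3000, 4521], [0.8, 0.5, 0.3, 0.2])
idx.index("doc2", [101, 5678, 9012], [0.9, 0.4, 0.1])
results = idx.search([101, 2054], [0.7, 0.6], k=10)
for key, score in results:
    print(f"{key}: {score:.4f}")
```

With the same vectors as the example above, `doc1` matches both query terms and ranks ahead of `doc2`, which matches only term 101. Every matching posting is scored, which is why this kind of search is exact rather than approximate.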
### Hybrid Vector Store (Hot/Cold Architecture)
For production workloads that need both real-time ingestion AND fast search, use HybridVectorStore which combines GWI (hot layer) with Cascade (cold layer):
```rust
use synadb::arch::{HybridVectorStore, HybridConfig};
// Create hybrid store
let config = HybridConfig {
    hot: GwiConfig::default(),
    cold: CascadeConfig::preset_large(768),
};
let mut store = HybridVectorStore::new("hot.gwi", "cold.cascade", config)?;
// Initialize hot layer with sample vectors
store.initialize_hot(&sample_vectors)?;
// Real-time ingestion to hot layer (O(1) per insert)
store.ingest("doc1", &embedding)?;
// Search both layers (results merged automatically)
let results = store.search(&query, 10)?;
// Periodically promote hot → cold for better search performance
let promoted = store.promote_to_cold()?;
```
**Architecture:**
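| Layer | Index | Purpose | Writes | Search Role |
|-------|-------|---------|--------|-------------|
| Hot | GWI | Real-time buffer | O(1) sync | Fallback |
| Cold | Cascade | Historical storage | Batch | Primary |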
**When to use:**
- Streaming data with real-time search requirements
- High-throughput ingestion with periodic batch optimization
- Production systems needing both speed and quality
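One way to picture the merge step of a two-layer search is the sketch below. It is an assumption-laden illustration, not the `HybridVectorStore` logic: it assumes each layer returns `(key, distance)` pairs where lower is better, and that a key present in both layers keeps its best distance. `merge_results` is a hypothetical helper.

```python
import heapq

def merge_results(hot, cold, k):
    """Combine two result lists, de-duplicate by key, return the k closest."""
    best = {}
    for key, distance in list(hot) + list(cold):
        if key not in best or distance < best[key]:
            best[key] = distance
    return heapq.nsmallest(k, best.items(), key=lambda kv: kv[1])

hot = [("doc9", 0.10), ("doc7", 0.40)]
cold = [("doc1", 0.05), ("doc7", 0.35), ("doc3", 0.50)]
top = merge_results(hot, cold, k=3)
print(top)
```

De-duplication matters during promotion windows, when a vector could briefly be visible in both the hot and the cold layer.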
## Tensor Engine
The TensorEngine provides efficient batch operations for ML data loading.
**Key Semantics:** When storing tensors, the first parameter is a **key prefix**, not a full key. Elements are stored with auto-generated keys like `{prefix}0000`, `{prefix}0001`, etc. When loading, use glob patterns like `{prefix}*` to retrieve all elements.
```python
from synadb import TensorEngine
import numpy as np
# Create tensor engine
engine = TensorEngine("training_data.db")
# Store training data (prefix "train/" generates keys: train/0000, train/0001, ...)
X_train = np.random.randn(10000, 784).astype(np.float32)
engine.put_tensor("train/", X_train) # Note: prefix ends with /
# Load as tensor (pattern matching with glob)
X = engine.get_tensor("train/*", dtype=np.float32)
# Load with specific shape
X = engine.get_tensor("train/*", shape=(10000, 784), dtype=np.float32)
# For large tensors, use chunked storage (more efficient)
engine.put_tensor_chunked("model/weights/", large_tensor, chunk_size=10000)
X = engine.get_tensor_chunked("model/weights/chunk_*", dtype=np.float32)
# Stream in batches for training
for batch in engine.stream("train/*", batch_size=32):
    model.train_step(batch)
```
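The prefix and glob semantics described above can be mimicked with the standard library. This sketch only illustrates the naming scheme; `element_keys` and `match_keys` are hypothetical helpers, the four-digit width is taken from the `{prefix}0000` example, and SynaDB's own pattern matcher may differ from `fnmatch` in edge cases.

```python
from fnmatch import fnmatchcase

def element_keys(prefix, count, width=4):
    """Keys generated for `count` elements stored under a prefix."""
    return [f"{prefix}{i:0{width}d}" for i in range(count)]

def match_keys(keys, pattern):
    """Keys matching a glob pattern, in sorted (element) order."""
    return sorted(k for k in keys if fnmatchcase(k, pattern))

keys = element_keys("train/", 3) + element_keys("labels/", 2)
print(match_keys(keys, "train/*"))
```

Zero-padding is what makes lexicographic key order agree with element order, so a prefix that omits the trailing `/` (for example `train`) would produce keys like `train0000` that `train/*` no longer matches.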
### PyTorch Integration
```python
# Load directly as PyTorch tensor
X = engine.get_tensor_torch("train/*", device="cuda")
# Or use with DataLoader
from torch.utils.data import TensorDataset, DataLoader
X = engine.get_tensor_torch("train/*")
y = engine.get_tensor_torch("labels/*")
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, shuffle=True)
```
### TensorFlow Integration
```python
# Load directly as TensorFlow tensor
X = engine.get_tensor_tf("train/*")
# Use with tf.data
import tensorflow as tf
dataset = tf.data.Dataset.from_tensor_slices(X).batch(32)
```
### Rust API
```rust
use synadb::{SynaDB, Atom};
use synadb::tensor::{TensorEngine, DType};
// Create database and populate with data
let mut db = SynaDB::new("data.db")?;
for i in 0..100 {
    db.append(&format!("sensor/{:04}", i), Atom::Float(i as f64 * 0.1))?;
}
// Create tensor engine
let mut engine = TensorEngine::new(db);
// Load all sensor data as a tensor
let (data, shape) = engine.get_tensor("sensor/*", DType::Float64)?;
assert_eq!(shape[0], 100);
// Store tensor with auto-generated keys
let values: Vec<u8> = vec![1.0f64, 2.0, 3.0, 4.0]
    .iter()
    .flat_map(|f| f.to_le_bytes())
    .collect();
let count = engine.put_tensor("values/", &values, &[4], DType::Float64)?;
```
### Supported Data Types
| DType | Size | Description |
|-------|------|-------------|
| `Float32` | 4 bytes | 32-bit floating point |
| `Float64` | 8 bytes | 64-bit floating point |
| `Int32` | 4 bytes | 32-bit signed integer |
| `Int64` | 8 bytes | 64-bit signed integer |
## Model Registry
Store and version ML models with automatic checksum verification:
### Python Usage
```python
from synadb import ModelRegistry
# Create a model registry
registry = ModelRegistry("models.db")
# Save a model with metadata
model_data = open("model.pt", "rb").read()
metadata = {"accuracy": "0.95", "framework": "pytorch"}
version = registry.save_model("classifier", model_data, metadata)
print(f"Saved version {version.version} with checksum {version.checksum}")
# Load the latest version (with automatic checksum verification)
data, info = registry.load_model("classifier")
print(f"Loaded {info.size_bytes} bytes, stage: {info.stage}")
# Load a specific version
data, info = registry.load_model("classifier", version=1)
# List all versions
versions = registry.list_versions("classifier")
for v in versions:
    print(f"v{v.version}: {v.stage} ({v.size_bytes} bytes)")
# Promote to production
registry.set_stage("classifier", version.version, "Production")
# Get the production model
prod = registry.get_production("classifier")
if prod:
    print(f"Production version: {prod.version}")
```
### Rust Usage
```rust
use synadb::model_registry::{ModelRegistry, ModelStage};
use std::collections::HashMap;
// Create a model registry
let mut registry = ModelRegistry::new("models.db")?;
// Save a model with metadata
let model_data = vec![0u8; 1024]; // Your model bytes
let mut metadata = HashMap::new();
metadata.insert("accuracy".to_string(), "0.95".to_string());
metadata.insert("framework".to_string(), "pytorch".to_string());
let version = registry.save_model("classifier", &model_data, metadata)?;
println!("Saved version {} with checksum {}", version.version, version.checksum);
// Load the latest version (with automatic checksum verification)
let (data, info) = registry.load_model("classifier", None)?;
println!("Loaded {} bytes", data.len());
// Load a specific version
let (data, info) = registry.load_model("classifier", Some(1))?;
// List all versions
let versions = registry.list_versions("classifier")?;
for v in versions {
    println!("v{}: {} ({} bytes)", v.version, v.stage, v.size_bytes);
}
// Promote to production
registry.set_stage("classifier", version.version, ModelStage::Production)?;
// Get the production model
if let Some(prod) = registry.get_production("classifier")? {
    println!("Production version: {}", prod.version);
}
```
### Model Stages
Models progress through deployment stages:
| Stage | Description |
|-------|-------------|
| `Development` | Initial stage for new models (default) |
| `Staging` | Models being tested before production |
| `Production` | Models actively serving predictions |
| `Archived` | Retired models kept for reference |
### Checksum Verification
Every model is stored with a SHA-256 checksum. When loading, the checksum is automatically verified to detect corruption:
```python
# If the model data is corrupted, load_model raises an error
try:
    data, info = registry.load_model("classifier")
except SynaError as e:
    print(f"Checksum mismatch: {e}")
```
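The verify-on-load idea can be reproduced with `hashlib` alone. The sketch below is a generic illustration, not the registry's storage format; `save_with_checksum` and `load_verified` are hypothetical helpers and the record is a plain dict.

```python
import hashlib

def save_with_checksum(data: bytes) -> dict:
    """Store the payload next to its SHA-256 hex digest."""
    return {"data": data, "checksum": hashlib.sha256(data).hexdigest()}

def load_verified(record: dict) -> bytes:
    """Recompute the digest and refuse to return corrupted data."""
    actual = hashlib.sha256(record["data"]).hexdigest()
    if actual != record["checksum"]:
        raise ValueError(
            f"checksum mismatch: expected {record['checksum']}, got {actual}"
        )
    return record["data"]

record = save_with_checksum(b"model weights")
print(record["checksum"][:16], len(load_verified(record)))
```

A single flipped byte changes the digest, so corruption is caught at load time rather than surfacing later as a model that silently misbehaves.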
## Experiment Tracking
Track ML experiments with parameters, metrics, and artifacts:
### Python Usage
```python
import json
from synadb import Experiment
# Create an experiment
exp = Experiment("mnist_classifier", "experiments.db")
# Start a run with tags
with exp.start_run(tags=["baseline", "v1"]) as run:
    # Log hyperparameters
    run.log_params({
        "learning_rate": 0.001,
        "batch_size": 32,
        "epochs": 100,
        "optimizer": "adam"
    })
    # Log metrics during training
    for epoch in range(100):
        loss = 1.0 / (epoch + 1)
        accuracy = 0.5 + 0.005 * epoch
        run.log_metrics({"loss": loss, "accuracy": accuracy}, step=epoch)
    # Log artifacts (`model` and `config` come from your own training code)
    run.log_artifact("model.pt", model.state_dict())
    run.log_artifact("config.json", json.dumps(config).encode())
# Query runs
completed_runs = exp.query(filter={"status": "completed"})
best_runs = exp.query(sort_by="accuracy", ascending=False)
# Get metrics as numpy array for plotting
loss_history = exp.get_metric_tensor(run.id, "loss")
import matplotlib.pyplot as plt
plt.plot(loss_history)
plt.title("Training Loss")
plt.show()
# Compare runs
comparison = exp.compare_runs([run1.id, run2.id])
print(comparison)
```
### Rust Usage
```rust
use synadb::experiment::{ExperimentTracker, RunStatus};
// Create an experiment tracker
let mut tracker = ExperimentTracker::new("experiments.db")?;
// Start a run with tags
let run_id = tracker.start_run("mnist_classifier", vec!["baseline".to_string()])?;
// Log hyperparameters
tracker.log_param(&run_id, "learning_rate", "0.001")?;
tracker.log_param(&run_id, "batch_size", "32")?;
tracker.log_param(&run_id, "epochs", "100")?;
// Log metrics during training
for epoch in 0..100 {
    let loss = 1.0 / (epoch + 1) as f64;
    let accuracy = 0.5 + 0.005 * epoch as f64;
    tracker.log_metric(&run_id, "loss", loss, Some(epoch as u64))?;
    tracker.log_metric(&run_id, "accuracy", accuracy, Some(epoch as u64))?;
}
// Log artifacts
let model_data = vec![0u8; 1024]; // Your model bytes
tracker.log_artifact(&run_id, "model.pt", &model_data)?;
// End the run
tracker.end_run(&run_id, RunStatus::Completed)?;
// Query runs
let runs = tracker.list_runs("mnist_classifier")?;
for run in runs {
    println!("Run {}: status={}", run.id, run.status);
}
// Get metrics
let loss_values = tracker.get_metric(&run_id, "loss")?;
for (step, value) in loss_values {
    println!("Step {}: loss = {:.4}", step, value);
}
```
### Run Status
Runs progress through states:
| Status | Description |
|--------|-------------|
| `Running` | Run is currently in progress |
| `Completed` | Run finished successfully |
| `Failed` | Run encountered an error |
| `Killed` | Run was manually terminated |
### Context Manager Support
The Python API supports context managers for automatic run completion:
```python
# Automatic completion on success
with exp.start_run() as run:
    run.log_param("lr", 0.001)
    # ... training code ...
# Run automatically marked as "completed"
# Automatic failure on exception
with exp.start_run() as run:
    run.log_param("lr", 0.001)
    raise ValueError("Training failed!")
# Run automatically marked as "failed"
```
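The behaviour above follows from Python's context manager protocol. The class below is a hypothetical stand-in, not the `synadb` run object; it shows how `__exit__` can set the final status from whether an exception escaped the `with` block.

```python
class Run:
    """Hypothetical stand-in for a tracked run."""

    def __init__(self):
        self.status = "running"
        self.params = {}

    def log_param(self, name, value):
        self.params[name] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # exc_type is None only when the block finished without raising.
        self.status = "completed" if exc_type is None else "failed"
        return False  # do not swallow the exception

with Run() as ok_run:
    ok_run.log_param("lr", 0.001)

failed_run = Run()
try:
    with failed_run:
        raise ValueError("Training failed!")
except ValueError:
    pass

print(ok_run.status, failed_run.status)
```

Returning `False` from `__exit__` lets the original exception propagate, so the caller still sees the training error while the run record reflects the failure.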
### Querying and Filtering
```python
# Filter by status
completed = exp.query(filter={"status": "completed"})
# Filter by tags
baseline_runs = exp.query(filter={"tags": ["baseline"]})
# Filter by parameter value
lr_runs = exp.query(filter={"learning_rate": "0.001"})
# Sort by metric (descending for best first)
best_runs = exp.query(sort_by="accuracy", ascending=False)
# Combine filters
best_baseline = exp.query(
    filter={"status": "completed", "tags": ["baseline"]},
    sort_by="accuracy",
    ascending=False
)
```
## Data Types
Syna supports six atomic data types:
| Type | Rust | C/FFI | Description |
|------|------|-------|-------------|
| Null | `Atom::Null` | N/A | Absence of value |
| Float | `Atom::Float(f64)` | `syna_put_float` | 64-bit floating point |
| Int | `Atom::Int(i64)` | `syna_put_int` | 64-bit signed integer |
| Text | `Atom::Text(String)` | `syna_put_text` | UTF-8 string |
| Bytes | `Atom::Bytes(Vec<u8>)` | `syna_put_bytes` | Raw byte array |
| Vector | `Atom::Vector(Vec<f32>, u16)` | `syna_put_vector` | Embedding vector (64-8192 dims) |
## Configuration
```rust
use synadb::{SynaDB, DbConfig};
let config = DbConfig {
    enable_compression: true, // LZ4 compression for large values
    enable_delta: true,       // Delta encoding for float sequences
    sync_on_write: true,      // fsync after each write (safer but slower)
};
let db = SynaDB::with_config("my_data.db", config)?;
```
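To give a feel for why delta encoding helps time-series data, here is one lossless scheme sketched in Python: each value's 64-bit pattern is XORed with its predecessor's. This is an assumption chosen for illustration; the README does not specify which delta scheme `enable_delta` uses, and all function names are hypothetical.

```python
import struct

def float_bits(x):
    """Reinterpret a float as its 64-bit IEEE 754 pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bits_float(b):
    """Reinterpret a 64-bit pattern as a float."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def delta_encode(values):
    """First value is stored as-is; each later value is XORed with the previous."""
    prev = 0
    out = []
    for v in values:
        bits = float_bits(v)
        out.append(bits ^ prev)
        prev = bits
    return out

def delta_decode(deltas):
    """Undo delta_encode by XOR-accumulating the deltas."""
    prev = 0
    out = []
    for d in deltas:
        prev ^= d
        out.append(bits_float(prev))
    return out

history = [23.5, 23.5, 24.1, 24.8]
encoded = delta_encode(history)
print(delta_decode(encoded))
```

Repeated or slowly changing readings yield deltas that are zero or mostly zero bits, which a general-purpose compressor such as LZ4 can then shrink further.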
## Error Codes (FFI)
| Code | Name | Description |
|------|------|-------------|
| 1 | `ERR_SUCCESS` | Operation successful |
| 0 | `ERR_GENERIC` | Generic error |
| -1 | `ERR_DB_NOT_FOUND` | Database not in registry |
| -2 | `ERR_INVALID_PATH` | Invalid path or UTF-8 |
| -3 | `ERR_IO` | I/O error |
| -4 | `ERR_SERIALIZATION` | Serialization error |
| -5 | `ERR_KEY_NOT_FOUND` | Key not found |
| -6 | `ERR_TYPE_MISMATCH` | Type mismatch on read |
| -100 | `ERR_INTERNAL_PANIC` | Internal panic |
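When calling the FFI from Python, a small helper can turn these codes into exceptions. The sketch below is not part of the `synadb` package: `SynaFfiError` and `check` are hypothetical names, and the helper is meant only for calls that return a status code (such as `syna_open` in the ctypes example), not for calls that return pointers or other values.

```python
SYNA_ERRORS = {
    1: "ERR_SUCCESS",
    0: "ERR_GENERIC",
    -1: "ERR_DB_NOT_FOUND",
    -2: "ERR_INVALID_PATH",
    -3: "ERR_IO",
    -4: "ERR_SERIALIZATION",
    -5: "ERR_KEY_NOT_FOUND",
    -6: "ERR_TYPE_MISMATCH",
    -100: "ERR_INTERNAL_PANIC",
}

class SynaFfiError(Exception):
    """Hypothetical exception carrying the numeric code and its name."""

    def __init__(self, code):
        self.code = code
        self.name = SYNA_ERRORS.get(code, "UNKNOWN")
        super().__init__(f"{self.name} ({code})")

def check(code):
    """Return the code on success (1); raise SynaFfiError otherwise."""
    if code != 1:
        raise SynaFfiError(code)
    return code

try:
    check(-5)
except SynaFfiError as err:
    print(err)
```

Wrapping a call as `check(lib.syna_open(db_path))` keeps the success value of 1 (rather than the more common 0) from being misread as a failure.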
## Benchmark Results
SynaDB is designed for high-performance AI/ML workloads. Here are benchmark results from our test suite:
### System Configuration
- **CPU**: Intel Core i9-14900KF (24 cores, 32 threads)
- **RAM**: 64 GB
- **OS**: Windows 11
- **Benchmark**: 10,000 iterations per test
### Write Performance
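| Value Size | Throughput | p50 Latency | p99 Latency | Disk Usage |
|------------|------------|-------------|-------------|------------|
| 64 B | **139,346 ops/sec** | 5.6 μs | 16.9 μs | 1.06 MB |
| 1 KB | 98,269 ops/sec | 6.8 μs | 62.7 μs | 11.1 MB |
| 64 KB | 11,475 ops/sec | 71.9 μs | 238.4 μs | 688 MB |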
### Read Performance
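| Threads | Throughput | p50 Latency | p99 Latency |
|---------|------------|-------------|-------------|
| 1 | **134,725 ops/sec** | 6.2 μs | 18.0 μs |
| 4 | 106,489 ops/sec | 6.9 μs | 28.2 μs |
| 8 | 95,341 ops/sec | 8.1 μs | 39.3 μs |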
### Mixed Workloads (YCSB)
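| Workload | Description | Throughput | Latency |
|----------|-------------|------------|---------|
| YCSB-A | 50% read, 50% update | 97,405 ops/sec | 7.3 μs |
| YCSB-B | 95% read, 5% update | 111,487 ops/sec | 8.5 μs |
| YCSB-C | 100% read | **121,197 ops/sec** | 3.2 μs |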
### Performance Characteristics
SynaDB uses relative benchmarks to ensure claims are hardware-independent:
| Comparison | Expected Result | Reason |
|------------|-----------------|--------|
| Read vs Write | Read faster than write | In-memory index lookup |
| MmapVectorStore vs VectorStore | 7x faster batch insert | Memory-mapped I/O |
| GWI vs HNSW | Faster index build | O(N) vs O(N log N) |
| HNSW vs Brute Force | Faster search | O(log N) vs O(N) |
Run benchmarks on your own hardware:
- **Google Colab**: [SynaDB Playground Notebook](https://colab.research.google.com/github/gtava5813/SynaDB/blob/main/demos/notebooks/SynaDB_Playground.ipynb)
- **PythonAnywhere**: [Live Demo](https://gtava5813.pythonanywhere.com/)
### FAISS vs HNSW Comparison
SynaDB includes benchmarks comparing its native HNSW index against FAISS:
```bash
cd benchmarks
# Quick comparison (10K vectors)
cargo run --release -- faiss --quick
# Full comparison (100K and 1M vectors)
cargo run --release -- faiss --full
# With FAISS enabled (requires FAISS library installed)
cargo run --release --features faiss -- faiss --quick
```
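| Index | Insert Rate (vectors/sec) | Search Latency | Memory | Recall |
|-------|---------------------------|----------------|--------|--------|
| HNSW | 50K | 0.5ms | 80 MB | 95% |
| FAISS-Flat | 100K | 10ms | 60 MB | 100% |
| FAISS-IVF | 80K | 1ms | 65 MB | 92% |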
### Running Benchmarks
```bash
cd benchmarks
cargo bench
```
See [benchmarks/README.md](benchmarks/README.md) for detailed benchmark configuration.
## Syna Studio
Syna Studio is a web-based UI for exploring and managing SynaDB databases.
### Features
- **Keys Explorer** - Search, filter by type, hex viewer for binary data
- **Model Registry** - View ML models, versions, stages, metadata
- **3D Clusters** - PCA visualization of embedding vectors
- **Statistics** - Treemap, pie charts, dynamic widgets
- **Integrations** - Auto-discover integration scripts
- **Custom Suite** - Compact DB, export JSON, integrity check
### Quick Start
```bash
cd demos/python/synadb
# Launch with test data
python run_ui.py --test
# Launch with HuggingFace embeddings
python run_ui.py --test --use-hf --samples 200
# Open existing database
python run_ui.py path/to/database.db
```
Access the dashboard at `http://localhost:8501`.
See [STUDIO_DOCS.md](demos/python/synadb/STUDIO_DOCS.md) for full documentation.
## Architecture Philosophy
SynaDB uses a **modular architecture** where each component is a specialized class optimized for its specific workload:
| Component | Purpose | Use Case |
|-----------|---------|----------|
| `SynaDB` | Core key-value store with history | Time-series, config, metadata |
| `VectorStore` | Embedding storage with HNSW search | RAG, semantic search |
| `MmapVectorStore` | High-throughput vector ingestion | Bulk embedding pipelines |
| `GravityWellIndex` | Fast-build vector index | Streaming/real-time data |
| `CascadeIndex` | Hybrid three-stage index | Balanced build/search (Experimental) |
| `SparseVectorStore` | Inverted index for sparse vectors | Lexical search (SPLADE, BM25) |
| `TensorEngine` | Batch tensor operations | ML data loading |
| `ModelRegistry` | Model versioning with checksums | Model management |
| `Experiment` | Experiment tracking | MLOps workflows |
**Why modular?** This design follows the Unix philosophy of "do one thing well":
- **Independent usage** - Use only what you need
- **Isolation** - Each component manages its own storage file
- **Performance** - Optimized for specific workloads
- **Composability** - Combine components as needed
**Typed API:** SynaDB uses typed methods (`put_float`, `put_int`, `put_text`) rather than a generic `set()` for:
- **Type safety** - Prevents accidental type mismatches
- **Performance** - No runtime type detection overhead
- **FFI compatibility** - Maps directly to C-ABI functions
## Storage Architecture
Syna uses an append-only log structure inspired by the "physics of time" principle: entries are only ever added at the end of the file and are never modified in place.
```
┌─────────────────────────────────────────────────────────────┐
│ Entry 0 │
├──────────────┬──────────────────┬───────────────────────────┤
│ LogHeader │ Key (UTF-8) │ Value (bincode) │
│ (15 bytes) │ (key_len bytes) │ (val_len bytes) │
├──────────────┴──────────────────┴───────────────────────────┤
│ Entry 1 ... │
└─────────────────────────────────────────────────────────────┘
```
- **Writes**: Always append to end of file (sequential I/O)
- **Reads**: Use in-memory index for O(1) key lookup
- **Recovery**: Scan file on open to rebuild index
- **Compaction**: Rewrite file with only latest values
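The four operations above fit in a short toy model. This sketch is not SynaDB's on-disk format: it uses an in-memory `io.BytesIO` instead of a file, and a hypothetical 6-byte header (key length, value length) rather than the 15-byte `LogHeader`. `TinyLog` is an illustrative name.

```python
import io
import struct

# Hypothetical header: key_len (u16) + val_len (u32), little-endian.
HEADER = struct.Struct("<HI")

class TinyLog:
    def __init__(self, file=None):
        self.file = file if file is not None else io.BytesIO()
        self.index = {}  # key -> (value offset, value length)
        self._rebuild_index()

    def _rebuild_index(self):
        """Recovery: scan every entry from the start; later entries win."""
        self.index.clear()
        data = self.file.getvalue()
        pos = 0
        while pos + HEADER.size <= len(data):
            key_len, val_len = HEADER.unpack_from(data, pos)
            key_start = pos + HEADER.size
            val_start = key_start + key_len
            key = data[key_start:val_start].decode("utf-8")
            self.index[key] = (val_start, val_len)
            pos = val_start + val_len

    def append(self, key, value: bytes):
        """Write: always append at the end, then point the index at it."""
        raw_key = key.encode("utf-8")
        self.file.seek(0, io.SEEK_END)
        start = self.file.tell()
        self.file.write(HEADER.pack(len(raw_key), len(value)) + raw_key + value)
        self.index[key] = (start + HEADER.size + len(raw_key), len(value))

    def get(self, key):
        """Read: one dictionary lookup, then one positioned read."""
        if key not in self.index:
            return None
        offset, length = self.index[key]
        self.file.seek(offset)
        return self.file.read(length)

    def compact(self):
        """Compaction: rewrite the log keeping only the latest value per key."""
        latest = {k: self.get(k) for k in self.index}
        self.file = io.BytesIO()
        self.index = {}
        for k, v in latest.items():
            self.append(k, v)

log = TinyLog()
log.append("temperature", b"23.5")
log.append("temperature", b"24.1")
log.append("name", b"sensor-1")
size_before = len(log.file.getvalue())
recovered = TinyLog(io.BytesIO(log.file.getvalue()))
log.compact()
print(size_before, len(log.file.getvalue()), recovered.get("temperature"))
```

The trade-off is visible in the sizes: the log keeps the superseded `23.5` entry until compaction, which is what makes history queries possible, at the cost of space that only `compact()` reclaims.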
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
## License
SynaDB License - Free for personal use and companies under $10M ARR / 1M MAUs. See [LICENSE](LICENSE) for details.