# SynaDB

An AI-native embedded database.

An embedded, log-structured, columnar-mapped database engine written in Rust. Syna combines the embedded simplicity of SQLite, the columnar analytical speed of DuckDB, and the schema flexibility of MongoDB.
## Features
- **Append-only log structure** - Fast sequential writes, immutable history
- **Schema-free** - Store heterogeneous data types without migrations
- **AI/ML optimized** - Extract time-series data as contiguous tensors for PyTorch/TensorFlow
- **Vector Store** - Native embedding storage with HNSW index for similarity search
- **MmapVectorStore** - Ultra-high-throughput vector storage (490K vectors/sec)
- **HNSW Index** - O(log N) approximate nearest neighbor search
- **Gravity Well Index** - Novel index with O(N) build time (168x faster than HNSW)
- **Cascade Index** - Three-stage hybrid index (LSH + bucket tree + graph) (Experimental)
- **Tensor Engine** - Batch tensor operations with chunked storage
- **Model Registry** - Version models with SHA-256 checksum verification
- **Experiment Tracking** - Log parameters, metrics, and artifacts
- **LLM Integrations** - LangChain, LlamaIndex, Haystack support
- **ML Integrations** - PyTorch Dataset/DataLoader, TensorFlow tf.data
- **CLI Tool** - Command-line database inspection and management
- **Studio Web UI** - Visual database explorer with 3D embedding clusters
- **GPU Direct** - CUDA tensor loading (optional feature)
- **FAISS Integration** - Billion-scale vector search (optional feature)
- **C-ABI interface** - Use from Python, Node.js, C++, or any FFI-capable language
- **Delta & LZ4 compression** - Minimize storage for time-series data
- **Crash recovery** - Automatic index rebuild on open
- **Thread-safe** - Concurrent read/write access with mutex-protected writes
## Installation

### Rust

Add SynaDB to your `Cargo.toml` (the crate name is inferred from the library name `libsynadb`):

```toml
[dependencies]
synadb = "1.0.6"
```

### Python

See the Python package documentation for full Python usage.
### Building from Source

```bash
# Clone the repository
git clone <repository-url>
cd synadb

# Build release version
cargo build --release

# Run tests
cargo test
```

The compiled library will be at:

- Linux: `target/release/libsynadb.so`
- macOS: `target/release/libsynadb.dylib`
- Windows: `target/release/synadb.dll`
## Quick Start

### Rust Usage

A minimal sketch. `put_float`/`put_int`/`put_text` follow the typed API described under Architecture Philosophy; the read method name is an assumption:

```rust
use synadb::SynaDB;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open (or create) a database file
    let mut db = SynaDB::new("quickstart.syna")?;

    // Write a few typed values
    db.put_float("sensor/temp", 23.5)?;
    db.put_int("sensor/count", 42)?;
    db.put_text("sensor/label", "kitchen")?;

    // Read the latest value back (method name assumed)
    let temp = db.get_float("sensor/temp")?;
    println!("temp = {temp}");

    Ok(())
}
```
### Python Usage (ctypes)

A sketch of the ctypes bindings. Only `SYNA_put_float` and the error codes appear elsewhere in this README; the remaining function names and signatures (`SYNA_open`, `SYNA_get_float`, `SYNA_get_history`, `SYNA_free_tensor`, `SYNA_close`) are illustrative assumptions:

```python
import ctypes

# Load the library
lib = ctypes.CDLL("./target/release/libsynadb.so")  # or .dylib/.dll

# Define function signatures (names other than SYNA_put_float are assumed)
lib.SYNA_open.argtypes = [ctypes.c_char_p]
lib.SYNA_open.restype = ctypes.c_int
lib.SYNA_put_float.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_double]
lib.SYNA_put_float.restype = ctypes.c_int
lib.SYNA_get_float.argtypes = [ctypes.c_char_p, ctypes.c_char_p,
                               ctypes.POINTER(ctypes.c_double)]
lib.SYNA_get_float.restype = ctypes.c_int
lib.SYNA_get_history.argtypes = [ctypes.c_char_p, ctypes.c_char_p,
                                 ctypes.POINTER(ctypes.POINTER(ctypes.c_double)),
                                 ctypes.POINTER(ctypes.c_size_t)]
lib.SYNA_get_history.restype = ctypes.c_int
lib.SYNA_free_tensor.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_size_t]
lib.SYNA_free_tensor.restype = None
lib.SYNA_close.argtypes = [ctypes.c_char_p]
lib.SYNA_close.restype = ctypes.c_int

# Usage
path = b"data.syna"

# Open database
rc = lib.SYNA_open(path)
assert rc == 1, f"open failed with code {rc}"  # 1 == ERR_SUCCESS

# Write float values
lib.SYNA_put_float(path, b"sensor/temp", 23.5)

# Read latest value
out = ctypes.c_double()
lib.SYNA_get_float(path, b"sensor/temp", ctypes.byref(out))

# Get history as numpy-compatible array
buf = ctypes.POINTER(ctypes.c_double)()
n = ctypes.c_size_t()
lib.SYNA_get_history(path, b"sensor/temp", ctypes.byref(buf), ctypes.byref(n))

# Convert to Python list (or use numpy.ctypeslib for zero-copy)
history = [buf[i] for i in range(n.value)]

# Free the tensor memory
lib.SYNA_free_tensor(buf, n)

# Close database
lib.SYNA_close(path)
```
### C/C++ Usage

A minimal sketch (the header name `synadb.h` and function names other than `SYNA_put_float` are assumptions):

```c
#include <stdio.h>
#include "synadb.h"

int main(void) {
    if (SYNA_open("data.syna") != 1) {  /* 1 == ERR_SUCCESS */
        fprintf(stderr, "failed to open database\n");
        return 1;
    }
    SYNA_put_float("data.syna", "sensor/temp", 23.5);
    SYNA_close("data.syna");
    return 0;
}
```

Compile with:

```bash
gcc example.c -Ltarget/release -lsynadb -o example
```
## Vector Store

Store and search embeddings for RAG applications. The Python sketch below assumes a `synadb` package with a `VectorStore` class; method names are assumptions based on this README's component descriptions:

```python
import random
from synadb import VectorStore  # assumed Python package layout

# Create store with 768 dimensions (BERT-sized)
store = VectorStore("embeddings.syna", dim=768)

# Insert embeddings
embedding = [random.random() for _ in range(768)]
store.insert("doc/1", embedding)

# Search for similar vectors
query = [random.random() for _ in range(768)]
results = store.search(query, k=5)  # [(key, distance), ...]
```
### Distance Metrics

The VectorStore supports three distance metrics:

| Metric | Description | Use Case |
|---|---|---|
| `cosine` (default) | Cosine distance (1 - cosine_similarity) | Text embeddings, normalized vectors |
| `euclidean` | Euclidean (L2) distance | Image embeddings, spatial data |
| `dot_product` | Negative dot product | Maximum inner product search |

Selecting a metric (the constructor keyword is an assumption):

```python
# Use euclidean distance
store = VectorStore("embeddings.syna", dim=768, metric="euclidean")

# Use dot product
store = VectorStore("embeddings.syna", dim=768, metric="dot_product")
```
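For reference, the three metrics reduce to a few lines of arithmetic. This standalone sketch (not SynaDB's implementation) shows exactly what each one computes:

```python
import math

def cosine_distance(a, b):
    """1 - cosine_similarity, as used by the default `cosine` metric."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def euclidean_distance(a, b):
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product_distance(a, b):
    """Negative dot product: smaller value = more similar (for MIPS)."""
    return -sum(x * y for x, y in zip(a, b))
```

Note that `dot_product` is negated so that all three metrics share the same convention: the nearest neighbor is the one with the smallest distance.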
### Supported Dimensions
Vector dimensions from 64 to 4096 are supported, covering all common embedding models:
| Model | Dimensions |
|---|---|
| MiniLM | 384 |
| BERT base | 768 |
| BERT large | 1024 |
| OpenAI ada-002 | 1536 |
| OpenAI text-embedding-3-large | 3072 |
## HNSW Index

For large-scale vector search (>10,000 vectors), SynaDB uses HNSW (Hierarchical Navigable Small World) indexing for approximate nearest neighbor search with O(log N) complexity.

A Python sketch (method names are assumptions):

```python
# HNSW is automatically enabled when vector count exceeds threshold
store = VectorStore("embeddings.syna", dim=768)

# Insert many vectors - HNSW index builds automatically
for i, vec in enumerate(vectors):
    store.insert(f"doc/{i}", vec)

# Search is now O(log N) instead of O(N)
results = store.search(query, k=10)  # <10ms for 1M vectors
```
HNSW Configuration (Rust API). The builder method names follow the surviving fragments; module paths and the `HnswIndex::new` signature are assumptions:

```rust
use synadb::hnsw::{HnswConfig, HnswIndex};
use synadb::DistanceMetric;

// Custom HNSW configuration
let config = HnswConfig::default()
    .with_m(16)            // More connections = better recall
    .ef_construction(200)  // Higher = better index quality
    .ef_search(100);       // Higher = better search recall

let mut index = HnswIndex::new(768, DistanceMetric::Cosine, config);
```
| Parameter | Default | Description |
|---|---|---|
| `m` | 16 | Max connections per node (8-64 typical) |
| `m_max` | 32 | Max connections at higher layers (2×m) |
| `ef_construction` | 200 | Build quality (100-500 typical) |
| `ef_search` | 100 | Search quality (50-500 typical) |
## MmapVectorStore

For ultra-high-throughput vector ingestion (490K vectors/sec), use MmapVectorStore. A Python sketch (constructor and method names are assumptions):

```python
from synadb import MmapVectorStore  # assumed import

# Create store with pre-allocated capacity
store = MmapVectorStore("vectors.mmap", dim=768, capacity=1_000_000)

# Batch insert - 7x faster than VectorStore
keys = [f"doc/{i}" for i in range(len(vectors))]
store.insert_batch(keys, vectors)  # 490K vectors/sec

# Build HNSW index
store.build_index()

# Search
results = store.search(query, k=10)

# Checkpoint to persist (not per-write like VectorStore)
store.checkpoint()
```
| Aspect | VectorStore | MmapVectorStore |
|---|---|---|
| Write speed | ~67K/sec | ~490K/sec |
| Durability | Per-write | Checkpoint |
| Capacity | Dynamic | Pre-allocated |
## Gravity Well Index (GWI)

For scenarios where index build time is critical, GWI provides O(N) build time (168x faster than HNSW at 50K vectors). A Python sketch (names are assumptions):

```python
from synadb import GravityWellIndex  # assumed import

# Create index
index = GravityWellIndex(dim=768)

# Initialize with sample vectors (required)
index.initialize(sample_vectors)

# Insert vectors - O(N) total build time
for i, vec in enumerate(vectors):
    index.insert(f"doc/{i}", vec)

# Search with tunable recall (nprobe=50 gives 98% recall)
results = index.search(query, k=10, nprobe=50)
```
GWI vs HNSW Build Time:
| Dataset | GWI | HNSW | Speedup |
|---|---|---|---|
| 10K × 768 | 2.1s | 18.4s | 8.9x |
| 50K × 768 | 3.0s | 504s | 168x |
When to use which:
- VectorStore: General use, good all-around
- MmapVectorStore: High-throughput ingestion, large datasets
- GWI: Build time critical, streaming/real-time data
- Cascade: Balanced build/search, tunable recall
- FAISS: Billion-scale, GPU acceleration
## Cascade Index (Experimental)

For balanced performance with a tunable recall/latency trade-off. A Python sketch (names and the custom-configuration parameters are assumptions):

```python
from synadb import CascadeIndex  # assumed import

# Create with preset configuration
index = CascadeIndex(dim=768, preset="small")

# Or custom configuration (parameter names are illustrative)
index = CascadeIndex(dim=768, n_hash_bits=16, bucket_threshold=256)

# Insert vectors - no initialization required
for i, vec in enumerate(vectors):
    index.insert(f"doc/{i}", vec)

# Search
results = index.search(query, k=10)

# Save and close
index.close()
```
Configuration Presets:

| Preset | Use Case | Build Speed | Search Speed | Recall |
|---|---|---|---|---|
| `small` | <100K vectors | Fast | Fast | 95%+ |
| `large` | 1M+ vectors | Medium | Fast | 95%+ |
| `high_recall` | Accuracy critical | Slow | Medium | 99%+ |
| `fast_search` | Latency critical | Fast | Very Fast | 90%+ |
Architecture:
- LSH Layer - Hyperplane-based locality-sensitive hashing with multi-probe
- Bucket Tree - Adaptive splitting when buckets exceed threshold
- Sparse Graph - Local neighbor connections for search refinement
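The first stage can be sketched in a few lines. This hyperplane-LSH toy (not SynaDB's implementation) shows how a vector maps to a bucket ID: each random hyperplane contributes one bit, set by which side of the plane the vector falls on:

```python
import random

def make_hyperplanes(dim, n_bits, seed=42):
    """Generate n_bits random hyperplanes (Gaussian normals) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def lsh_hash(vec, planes):
    """One bit per hyperplane: 1 if the vector is on the positive side."""
    bits = 0
    for plane in planes:
        dot = sum(p * v for p, v in zip(plane, vec))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits
```

Nearby vectors tend to land on the same side of most hyperplanes, so they collide into the same (or adjacent) buckets; multi-probe search then also visits buckets whose hashes differ in a few bits.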
## Tensor Engine

The TensorEngine provides efficient batch operations for ML data loading.

Key Semantics: When storing tensors, the first parameter is a key prefix, not a full key. Elements are stored with auto-generated keys like `{prefix}0000`, `{prefix}0001`, etc. When loading, use glob patterns like `{prefix}*` to retrieve all elements.

A Python sketch (method and keyword names are assumptions):

```python
from synadb import SynaDB, TensorEngine  # assumed imports

# Create tensor engine
db = SynaDB("data.syna")
engine = TensorEngine(db)

# Store training data (prefix "train/" generates keys: train/0000, train/0001, ...)
count = engine.put_tensor("train/", values)  # Note: prefix ends with /

# Load as tensor (pattern matching with glob)
data = engine.get_tensor("train/*")

# Load with specific shape
data = engine.get_tensor("train/*", shape=(100, 784))

# For large tensors, use chunked storage (more efficient)
count = engine.put_tensor_chunked("train/", values, chunk_size=10_000)

# Stream in batches for training
for batch in engine.stream_batches("train/*", batch_size=32):
    ...
```
### PyTorch Integration

A sketch (the integration module and class names are assumptions):

```python
from torch.utils.data import DataLoader
from synadb.torch import load_tensor, SynaDataset  # assumed module

# Load directly as PyTorch tensor
x = load_tensor("data.syna", "train/*")

# Or use with DataLoader
dataset = SynaDataset("data.syna", "train/*")
loader = DataLoader(dataset, batch_size=32, shuffle=True)
for batch in loader:
    ...
```
### TensorFlow Integration

A sketch (the integration module names are assumptions):

```python
from synadb.tensorflow import load_tensor, as_dataset  # assumed module

# Load directly as TensorFlow tensor
x = load_tensor("data.syna", "train/*")

# Use with tf.data
ds = as_dataset("data.syna", "train/*").batch(32)
```
### Rust API

A sketch following the surviving fragments (module paths and some signatures are assumptions):

```rust
use synadb::SynaDB;
use synadb::tensor::{DType, TensorEngine};

// Create database and populate with data
let mut db = SynaDB::new("data.syna")?;
for i in 0..100 {
    db.put_float(&format!("sensor/{i:04}"), i as f64)?;
}

// Create tensor engine
let mut engine = TensorEngine::new(&mut db);

// Load all sensor data as a tensor
let (data, shape) = engine.get_tensor("sensor/*", DType::Float64)?;
assert_eq!(shape, vec![100]);

// Store tensor with auto-generated keys
let values: Vec<f64> = vec![[1.0, 2.0], [3.0, 4.0]]
    .iter()
    .flat_map(|row| row.iter().copied())
    .collect();
let count = engine.put_tensor("train/", &values, DType::Float64)?;
```
### Supported Data Types

| DType | Size | Description |
|---|---|---|
| `Float32` | 4 bytes | 32-bit floating point |
| `Float64` | 8 bytes | 64-bit floating point |
| `Int32` | 4 bytes | 32-bit signed integer |
| `Int64` | 8 bytes | 64-bit signed integer |
## Model Registry

Store and version ML models with automatic checksum verification.

### Python Usage

A sketch (method names follow the Rust fragments below; exact signatures are assumptions):

```python
from synadb import ModelRegistry  # assumed import

# Create a model registry
registry = ModelRegistry("models.syna")

# Save a model with metadata
with open("model.pt", "rb") as f:
    model_data = f.read()
metadata = {"framework": "pytorch", "accuracy": "0.95"}
version = registry.save_model("my-model", model_data, metadata)

# Load the latest version (with automatic checksum verification)
data, meta = registry.load_model("my-model")

# Load a specific version
data, meta = registry.load_model("my-model", version=1)

# List all versions
versions = registry.list_versions("my-model")

# Promote to production
registry.set_stage("my-model", version, "production")

# Get the production model
prod = registry.get_production("my-model")
```
### Rust Usage

A sketch following the surviving fragments (module paths and some signatures are assumptions):

```rust
use synadb::registry::{ModelRegistry, ModelStage};
use std::collections::HashMap;

// Create a model registry
let mut registry = ModelRegistry::new("models.syna")?;

// Save a model with metadata
let model_data = vec![0u8; 1024]; // Your model bytes
let mut metadata = HashMap::new();
metadata.insert("framework".to_string(), "pytorch".to_string());
metadata.insert("accuracy".to_string(), "0.95".to_string());
let version = registry.save_model("my-model", &model_data, metadata)?;
println!("Saved version {version}");

// Load the latest version (with automatic checksum verification)
let (data, meta) = registry.load_model("my-model", None)?;
println!("Loaded {} bytes", data.len());

// Load a specific version
let (data, meta) = registry.load_model("my-model", Some(1))?;

// List all versions
let versions = registry.list_versions("my-model")?;
for v in versions {
    println!("v{}: {:?}", v.version, v.stage);
}

// Promote to production
registry.set_stage("my-model", version, ModelStage::Production)?;

// Get the production model
if let Some((data, meta)) = registry.get_production("my-model")? {
    println!("Production model: {} bytes", data.len());
}
```
### Model Stages

Models progress through deployment stages:

| Stage | Description |
|---|---|
| `Development` | Initial stage for new models (default) |
| `Staging` | Models being tested before production |
| `Production` | Models actively serving predictions |
| `Archived` | Retired models kept for reference |
### Checksum Verification

Every model is stored with a SHA-256 checksum. When loading, the checksum is automatically verified to detect corruption:

```python
# If the model data is corrupted, load_model raises an error
data, meta = registry.load_model("my-model")  # raises on checksum mismatch
```
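Conceptually, the store/verify cycle is straightforward. A toy sketch using Python's `hashlib` (not SynaDB's implementation):

```python
import hashlib

def store_with_checksum(data: bytes) -> dict:
    """Store model bytes alongside their SHA-256 digest."""
    return {"data": data, "sha256": hashlib.sha256(data).hexdigest()}

def load_verified(record: dict) -> bytes:
    """Recompute the digest on load and fail loudly on any corruption."""
    if hashlib.sha256(record["data"]).hexdigest() != record["sha256"]:
        raise ValueError("checksum mismatch: model data is corrupted")
    return record["data"]
```

Because the digest is recomputed from the stored bytes on every load, any bit flip in the model payload is detected before the model reaches your serving code.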
## Experiment Tracking

Track ML experiments with parameters, metrics, and artifacts.

### Python Usage

A sketch (method names follow the Rust fragments below; signatures are assumptions):

```python
from synadb import Experiment  # assumed import

# Create an experiment
exp = Experiment("experiments.syna", "mnist-tuning")

# Start a run with tags
run_id = exp.start_run(tags={"model": "cnn"})

# Log hyperparameters
exp.log_param("learning_rate", 0.001)
exp.log_param("batch_size", 32)

# Log metrics during training
for epoch in range(100):
    loss = 1.0 / (epoch + 1)
    accuracy = 0.5 + 0.005 * epoch
    exp.log_metric("loss", loss, step=epoch)
    exp.log_metric("accuracy", accuracy, step=epoch)

# Log artifacts
exp.log_artifact("model.pt", model_bytes)
exp.end_run()

# Query runs
runs = exp.list_runs()
best = exp.list_runs(sort_by="accuracy", descending=True)[0]

# Get metrics as numpy array for plotting
loss_history = exp.get_metric(run_id, "loss")

# Compare runs
comparison = exp.compare_runs([r.run_id for r in runs])
```
### Rust Usage

A sketch following the surviving fragments (module paths and some signatures are assumptions):

```rust
use synadb::experiment::ExperimentTracker;

// Create an experiment tracker
let mut tracker = ExperimentTracker::new("experiments.syna", "mnist-tuning")?;

// Start a run with tags
let run_id = tracker.start_run(vec![("model", "cnn")])?;

// Log hyperparameters
tracker.log_param(&run_id, "learning_rate", "0.001")?;
tracker.log_param(&run_id, "batch_size", "32")?;
tracker.log_param(&run_id, "optimizer", "adam")?;

// Log metrics during training
for epoch in 0..100 {
    let loss = 1.0 / (epoch as f64 + 1.0);
    tracker.log_metric(&run_id, "loss", loss, epoch)?;
}

// Log artifacts
let model_data = vec![0u8; 1024]; // Your model bytes
tracker.log_artifact(&run_id, "model.pt", &model_data)?;

// End the run
tracker.end_run(&run_id)?;

// Query runs
let runs = tracker.list_runs()?;
for run in runs {
    println!("{}: {:?}", run.run_id, run.status);
}

// Get metrics
let loss_values = tracker.get_metric(&run_id, "loss")?;
for (step, value) in loss_values {
    println!("step {step}: {value}");
}
```
### Run Status

Runs progress through states:

| Status | Description |
|---|---|
| `Running` | Run is currently in progress |
| `Completed` | Run finished successfully |
| `Failed` | Run encountered an error |
| `Killed` | Run was manually terminated |
### Context Manager Support

The Python API supports context managers for automatic run completion (the `run()` method name is an assumption):

```python
# Automatic completion on success
with exp.run(tags={"model": "cnn"}):
    ...  # training code
# Run automatically marked as "completed"

# Automatic failure on exception
with exp.run():
    raise RuntimeError("training diverged")
# Run automatically marked as "failed"
```
### Querying and Filtering

A sketch (keyword argument names are assumptions):

```python
# Filter by status
runs = exp.list_runs(status="completed")

# Filter by tags
runs = exp.list_runs(tags={"model": "cnn"})

# Filter by parameter value
runs = exp.list_runs(params={"batch_size": 32})

# Sort by metric (descending for best first)
runs = exp.list_runs(sort_by="accuracy", descending=True)

# Combine filters
runs = exp.list_runs(status="completed", tags={"model": "cnn"}, sort_by="accuracy")
```
## Data Types

Syna supports six atomic data types:

| Type | Rust | C/FFI | Description |
|---|---|---|---|
| Null | `Atom::Null` | N/A | Absence of value |
| Float | `Atom::Float(f64)` | `SYNA_put_float` | 64-bit floating point |
| Int | `Atom::Int(i64)` | `SYNA_put_int` | 64-bit signed integer |
| Text | `Atom::Text(String)` | `SYNA_put_text` | UTF-8 string |
| Bytes | `Atom::Bytes(Vec<u8>)` | `SYNA_put_bytes` | Raw byte array |
| Vector | `Atom::Vector(Vec<f32>, u16)` | `SYNA_put_vector` | Embedding vector (64-4096 dims) |
## Configuration

A sketch. The fields of `DbConfig` were lost in this revision, so `Default` is used here; `with_config` follows the surviving fragment:

```rust
use synadb::{DbConfig, SynaDB};

let config = DbConfig::default();
let db = SynaDB::with_config("data.syna", config)?;
```
## Error Codes (FFI)

| Code | Constant | Meaning |
|---|---|---|
| 1 | `ERR_SUCCESS` | Operation successful |
| 0 | `ERR_GENERIC` | Generic error |
| -1 | `ERR_DB_NOT_FOUND` | Database not in registry |
| -2 | `ERR_INVALID_PATH` | Invalid path or UTF-8 |
| -3 | `ERR_IO` | I/O error |
| -4 | `ERR_SERIALIZATION` | Serialization error |
| -5 | `ERR_KEY_NOT_FOUND` | Key not found |
| -6 | `ERR_TYPE_MISMATCH` | Type mismatch on read |
| -100 | `ERR_INTERNAL_PANIC` | Internal panic |
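When wrapping the FFI from a host language, it helps to centralize status-code checking. A minimal sketch built from the table above (the `check` helper itself is not part of SynaDB):

```python
# Error codes from the FFI table
ERR_SUCCESS = 1
ERR_GENERIC = 0
ERR_DB_NOT_FOUND = -1
ERR_INVALID_PATH = -2
ERR_IO = -3
ERR_SERIALIZATION = -4
ERR_KEY_NOT_FOUND = -5
ERR_TYPE_MISMATCH = -6
ERR_INTERNAL_PANIC = -100

_MESSAGES = {
    ERR_GENERIC: "Generic error",
    ERR_DB_NOT_FOUND: "Database not in registry",
    ERR_INVALID_PATH: "Invalid path or UTF-8",
    ERR_IO: "I/O error",
    ERR_SERIALIZATION: "Serialization error",
    ERR_KEY_NOT_FOUND: "Key not found",
    ERR_TYPE_MISMATCH: "Type mismatch on read",
    ERR_INTERNAL_PANIC: "Internal panic",
}

def check(code: int) -> int:
    """Raise on any non-success FFI return code; pass ERR_SUCCESS through."""
    if code != ERR_SUCCESS:
        raise RuntimeError(_MESSAGES.get(code, f"unknown error code {code}"))
    return code
```

Wrapping every FFI call in `check(...)` converts C-style status codes into ordinary exceptions at the language boundary.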
## Benchmark Results

SynaDB is designed for high-performance AI/ML workloads. Here are benchmark results from our test suite:
### System Configuration
- CPU: Intel Core i9-14900KF (32 cores)
- RAM: 64 GB
- OS: Windows 11
- Benchmark: 10,000 iterations per test
### Write Performance
| Value Size | Throughput | p50 Latency | p99 Latency | Storage |
|---|---|---|---|---|
| 64 B | 139,346 ops/sec | 5.6 μs | 16.9 μs | 1.06 MB |
| 1 KB | 98,269 ops/sec | 6.8 μs | 62.7 μs | 11.1 MB |
| 64 KB | 11,475 ops/sec | 71.9 μs | 238.4 μs | 688 MB |
### Read Performance
| Threads | Throughput | p50 Latency | p99 Latency |
|---|---|---|---|
| 1 | 134,725 ops/sec | 6.2 μs | 18.0 μs |
| 4 | 106,489 ops/sec | 6.9 μs | 28.2 μs |
| 8 | 95,341 ops/sec | 8.1 μs | 39.3 μs |
### Mixed Workloads (YCSB)
| Workload | Description | Throughput | p50 Latency |
|---|---|---|---|
| YCSB-A | 50% read, 50% update | 97,405 ops/sec | 7.3 μs |
| YCSB-B | 95% read, 5% update | 111,487 ops/sec | 8.5 μs |
| YCSB-C | 100% read | 121,197 ops/sec | 3.2 μs |
### Performance Targets
| Operation | Target | Achieved |
|---|---|---|
| Write throughput | 100K+ ops/sec | ✅ 139K ops/sec |
| Read throughput | 100K+ ops/sec | ✅ 135K ops/sec |
| Read latency (p50) | <10 μs | ✅ 3.2-8.1 μs |
| Vector search (1M) | <10 ms | ✅ O(log N) with HNSW |
### FAISS vs HNSW Comparison

SynaDB includes benchmarks comparing its native HNSW index against FAISS. The bench name and flags below are assumptions; see `benchmarks/README.md` for the exact invocations:

```bash
# Quick comparison (10K vectors)
cargo bench --bench faiss_comparison

# Full comparison (100K and 1M vectors)
cargo bench --bench faiss_comparison -- --full

# With FAISS enabled (requires FAISS library installed)
cargo bench --bench faiss_comparison --features faiss
```
| Index | Insert (v/s) | Search p50 | Memory | Recall@10 |
|---|---|---|---|---|
| HNSW | 50K | 0.5ms | 80 MB | 95% |
| FAISS-Flat | 100K | 10ms | 60 MB | 100% |
| FAISS-IVF | 80K | 1ms | 65 MB | 92% |
### Running Benchmarks

See `benchmarks/README.md` for detailed benchmark configuration.
## Syna Studio

Syna Studio is a web-based UI for exploring and managing SynaDB databases.
### Features

- **Keys Explorer** - Search, filter by type, hex viewer for binary data
- **Model Registry** - View ML models, versions, stages, metadata
- **3D Clusters** - PCA visualization of embedding vectors
- **Statistics** - Treemap, pie charts, dynamic widgets
- **Integrations** - Auto-discover integration scripts
- **Custom Suite** - Compact DB, export JSON, integrity check
### Quick Start

The launch commands were lost in this revision; the sketch below assumes a Streamlit entry point at `studio/app.py` (the dashboard runs on Streamlit's default port):

```bash
# Launch with test data
streamlit run studio/app.py -- --test-data

# Launch with HuggingFace embeddings
streamlit run studio/app.py -- --hf-embeddings

# Open existing database
streamlit run studio/app.py -- --db path/to/data.syna
```

Access the dashboard at http://localhost:8501.

See `STUDIO_DOCS.md` for full documentation.
## Architecture Philosophy

SynaDB uses a modular architecture where each component is a specialized class optimized for its specific workload:

| Component | Purpose | Use Case |
|---|---|---|
| `SynaDB` | Core key-value store with history | Time-series, config, metadata |
| `VectorStore` | Embedding storage with HNSW search | RAG, semantic search |
| `MmapVectorStore` | High-throughput vector ingestion | Bulk embedding pipelines |
| `GravityWellIndex` | Fast-build vector index | Streaming/real-time data |
| `CascadeIndex` | Hybrid three-stage index | Balanced build/search (Experimental) |
| `TensorEngine` | Batch tensor operations | ML data loading |
| `ModelRegistry` | Model versioning with checksums | Model management |
| `Experiment` | Experiment tracking | MLOps workflows |
**Why modular?** This design follows the Unix philosophy of "do one thing well":
- Independent usage - Use only what you need
- Isolation - Each component manages its own storage file
- Performance - Optimized for specific workloads
- Composability - Combine components as needed
**Typed API:** SynaDB uses typed methods (`put_float`, `put_int`, `put_text`) rather than a generic `set()` for:
- Type safety - Prevents accidental type mismatches
- Performance - No runtime type detection overhead
- FFI compatibility - Maps directly to C-ABI functions
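A toy illustration of the typed-API rationale (not SynaDB's implementation): typed writes encode directly with no runtime type detection, and typed reads fail fast on a tag mismatch, mirroring `ERR_TYPE_MISMATCH`:

```python
import struct

def put_float(store: dict, key: str, value: float) -> None:
    # A typed write knows its encoding up front: 8-byte little-endian f64
    store[key] = ("float", struct.pack("<d", value))

def get_float(store: dict, key: str) -> float:
    # A typed read checks the stored tag and refuses to misinterpret bytes
    tag, raw = store[key]
    if tag != "float":
        raise TypeError(f"type mismatch: {key!r} holds {tag}")
    return struct.unpack("<d", raw)[0]
```

A generic `set()`/`get()` would instead have to inspect the value at runtime on every call, and an FFI surface would need a variant type on both sides of the boundary.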
## Storage Architecture

Syna uses an append-only log structure inspired by the "physics of time" principle:

```
┌─────────────────────────────────────────────────────────────┐
│ Entry 0                                                     │
├──────────────┬──────────────────┬───────────────────────────┤
│ LogHeader    │ Key (UTF-8)      │ Value (bincode)           │
│ (15 bytes)   │ (key_len bytes)  │ (val_len bytes)           │
├──────────────┴──────────────────┴───────────────────────────┤
│ Entry 1 ...                                                 │
└─────────────────────────────────────────────────────────────┘
```

- Writes: Always append to the end of the file (sequential I/O)
- Reads: Use the in-memory index for O(1) key lookup
- Recovery: Scan the file on open to rebuild the index
- Compaction: Rewrite the file with only the latest values
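The write/read/recovery cycle above can be illustrated with a toy log. The header here is simplified to six bytes (`key_len` u16 + `val_len` u32) and does not match SynaDB's 15-byte LogHeader; the point is the mechanism, not the format:

```python
import io

def append_entry(log: io.BytesIO, key: str, value: bytes) -> int:
    """Append one entry ([key_len u16][val_len u32][key][value]); return its offset."""
    k = key.encode("utf-8")
    log.seek(0, io.SEEK_END)
    offset = log.tell()
    log.write(len(k).to_bytes(2, "little"))
    log.write(len(value).to_bytes(4, "little"))
    log.write(k)
    log.write(value)
    return offset

def rebuild_index(log: io.BytesIO) -> dict:
    """Scan the whole log and keep the latest offset per key (crash recovery)."""
    index, pos = {}, 0
    buf = log.getvalue()
    while pos < len(buf):
        klen = int.from_bytes(buf[pos:pos + 2], "little")
        vlen = int.from_bytes(buf[pos + 2:pos + 6], "little")
        key = buf[pos + 6:pos + 6 + klen].decode("utf-8")
        index[key] = pos  # later entries overwrite earlier ones
        pos += 6 + klen + vlen
    return index
```

Because writes only ever append, an update is just a new entry for the same key; the rebuilt index naturally points at the newest version, and compaction amounts to rewriting the file from the index's surviving offsets.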
## Contributing

See `CONTRIBUTING.md` for guidelines.
## License

SynaDB License - Free for personal use and companies under $10M ARR / 1M MAUs. See `LICENSE` for details.