AvocadoDB
The first deterministic context database for AI agents
Fix your RAG in 5 minutes - same query, same context, every time.
What is AvocadoDB?
AvocadoDB is a span-based context compiler that replaces traditional vector databases' chaotic "top-k" retrieval with deterministic, citation-backed context generation.
Pure Rust embeddings = 6x faster than OpenAI, works completely offline, costs $0.
The Problem with RAG
Current RAG systems are fundamentally broken:
- ❌ Same query → different results each time (non-deterministic)
- ❌ Token budgets wasted on duplicates (60-70% utilization)
- ❌ No citations or verifiability
- ❌ Hallucinations from inconsistent context
- ❌ Slow (200-300ms just for OpenAI embedding calls)
- ❌ Expensive (API costs scale with usage)
The AvocadoDB Solution
- ✅ 100% Deterministic: Same query → same context, every time
- ✅ 6x Faster: 40-60ms compilation (vs 240-360ms with OpenAI)
- ✅ Zero Cost: Pure Rust embeddings, no API required
- ✅ Works Offline: No internet needed after initial setup
- ✅ Citation-Backed: Every span has exact line number citations
- ✅ Token Efficient: 95%+ budget utilization
- ✅ Drop-in Replacement: Works with any LLM
⚡ Performance
```bash
# Run benchmarks on your hardware
# Results (M1 Mac example):
#   Single embedding:  1.2ms  (vs ~250ms OpenAI)
#   Batch of 100:      8.7ms  (vs ~250ms OpenAI)
#   Full compilation:  43ms   (vs ~300ms OpenAI)
#
#   Speedup: 6-7x faster ⚡
#   Cost: $0 (vs ~$0.0001 per 1K tokens)
```
See EMBEDDING_PERFORMANCE.md for detailed benchmarks.
Quick Start
Install from crates.io (Easiest)
That's it! Now you can use avocado directly:
Docker (Recommended for Server)
Run the server with Docker:
```bash
# Run with Docker

# Or use Docker Compose

# Test the server
```
See Docker Guide for complete documentation.
Installation from Source
```bash
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Clone and build

# Optional: Set OpenAI API key (only if you want to use OpenAI embeddings)
# By default, AvocadoDB uses local embeddings (no API key required, no Python required!)
#
# Local embeddings strategy (automatic, in priority order):
# 1. Pure Rust with fastembed (semantic, good quality, no Python required) ✅ DEFAULT
#    - Uses all-MiniLM-L6-v2 model (384 dimensions) by default
#    - ONNX-based, fast and efficient
#    - Model downloaded automatically on first use (~90MB)
#    - To increase dimensionality, set AVOCADODB_EMBEDDING_MODEL:
#      * "nomic" or "nomicv15" → 768 dimensions (good balance)
#      * "bgelarge" or "bge-large-en-v1.5" → 1024 dimensions (higher quality)
# 2. Python + sentence-transformers (fallback if fastembed unavailable)
#    - Requires: pip install sentence-transformers
# 3. Hash-based fallback (deterministic, but NOT semantic)
#    - Works always, but poor semantic quality
#
# To use OpenAI embeddings instead:
export OPENAI_API_KEY="sk-..."
export AVOCADODB_EMBEDDING_PROVIDER=openai
```
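The hash-based fallback (strategy 3 above) can be illustrated with a toy Python version that shows why it is deterministic but not semantic: identical text always yields the identical vector, while related texts do not land near each other. This is a sketch of the idea, not AvocadoDB's actual fallback:

```python
import hashlib

def hash_embed(text, dims=384):
    """Deterministic pseudo-embedding: bytes from repeated SHA-256 digests,
    scaled into [-1, 1]. Stable across runs and machines, but NOT semantic."""
    vec, counter = [], 0
    while len(vec) < dims:
        digest = hashlib.sha256(f"{counter}:{text}".encode()).digest()
        vec.extend(b / 127.5 - 1.0 for b in digest)
        counter += 1
    return vec[:dims]

v1 = hash_embed("How does authentication work?")
v2 = hash_embed("How does authentication work?")
# v1 == v2 on every run — deterministic, unlike a non-seeded random projection
```

This is why the fallback "works always": it needs no model download and no network, at the cost of retrieval quality.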
CLI Usage (Daemon by default)
```bash
# Initialize database
avocado init

# Get model recommendation (optional)
# Recommends optimal embedding model for your use case

# Ingest documents
avocado ingest ./docs
# Output: Ingested 42 files → 387 spans

# Compile context (uses daemon at http://localhost:8765 by default)
avocado compile "How does authentication work?" --budget 8000

# Force local mode (uses .avocado/db.sqlite in current project)

# Run performance benchmarks
# Shows real performance on your hardware
```
GPU-backed server (Modal) quickstart
```bash
# Start the daemon with remote GPU embeddings (Modal)

# or CPU/local (default)
```
Example Output:
```
Compiling context for: "How does authentication work?"
Token budget: 8000

[1] docs/authentication.md
Lines 1-23

# Authentication System
Our authentication uses JWT tokens with secure refresh mechanisms...

---

[2] src/middleware/auth.ts
Lines 45-78

export function authenticateRequest(req: Request) {
  const token = req.headers.authorization?.split(' ')[1];
  if (!token) throw new UnauthorizedError();
  ...
}

---

Compiled 12 spans using 7,891 tokens (98.6% utilization)
Compilation time: 243ms
Context hash: e3b0c4429...52b855 (deterministic ✓)
```
Python SDK
```python
# Mirrors the TypeScript SDK below; exact names may differ in the published package
from avocadodb import AvocadoDB

db = AvocadoDB()
db.ingest("./docs", recursive=True)
result = db.compile("my query", budget=8000)
print(result.text)  # Deterministic every time
```
TypeScript SDK
```typescript
import { AvocadoDB } from 'avocadodb';

const db = new AvocadoDB();
await db.ingest('./docs', { recursive: true });
const result = await db.compile('my query', { budget: 8000 });
console.log(result.text); // Deterministic every time
```
HTTP Server (Multi-project daemon)
```bash
# Start server (binds to 127.0.0.1 by default)

# Use the API
```
Docker & Kubernetes Deployment
AvocadoDB is production-ready with full Docker and Kubernetes support.
Docker
```bash
# Quick start with Docker

# Or use Docker Compose
```
Features:
- Multi-stage build for minimal image size (~80-100MB)
- Multi-architecture support (linux/amd64, linux/arm64)
- Non-root user for security
- Health checks built-in
- Configurable via environment variables
See Docker Guide for complete documentation.
Kubernetes
```bash
# Deploy to Kubernetes

# Verify deployment
```
Includes:
- Production-ready Deployment manifests
- Horizontal scaling support
- Persistent storage configuration
- Ingress with TLS/HTTPS
- ConfigMaps and Secrets management
- Resource limits and health checks
See Kubernetes Guide for complete documentation.
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `PORT` | `8765` | HTTP server port |
| `BIND_ADDR` | `127.0.0.1` | Bind address (set `0.0.0.0` to expose publicly) |
| `RUST_LOG` | `info` | Log level |
| `AVOCADODB_EMBEDDING_MODEL` | `minilm` | Embedding model (`minilm`, `nomic`, `bgelarge`) |
| `AVOCADODB_EMBEDDING_PROVIDER` | `local` | Provider (`local` or `openai`) |
| `OPENAI_API_KEY` | unset | OpenAI API key (if using OpenAI) |
| `AVOCADODB_ROOT` | unset | Optional project root. When set, all project paths must be under this directory; requests outside are rejected. |
| `API_TOKEN` | unset | If set, requires the `X-Avocado-Token` header to be present and equal for all routes (except `/health`, `/api-docs/*`). |
| `MAX_BODY_BYTES` | `2097152` (2 MB) | Request body size limit to protect against large payloads. |
Security note:
- Do not expose the server publicly without protection. If you must, set `BIND_ADDR=0.0.0.0` and front it with auth.
- For local safety, clients always send an explicit `project` (their current working directory), and the server normalizes paths and can restrict them to `AVOCADODB_ROOT`.
How It Works
Architecture
```
Query → Embed → [Semantic Search + Lexical Search] → Hybrid Fusion
      → MMR Diversification → Token Packing → Deterministic Sort → WorkingSet
```
Key Innovations
- Span-Based Indexing: Documents are split into spans (20-50 lines) with precise line numbers
- Hybrid Retrieval: Combines semantic (vector) and lexical (keyword) search
- Deterministic Ordering: Results sorted by `(artifact_id, start_line)` for reproducibility
- Greedy Token Packing: Maximizes token budget utilization without duplicates
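The fusion, packing, and ordering steps can be sketched in Python. This is a minimal illustration of the idea, not the actual Rust implementation; the RRF constant, the span-id shape, and the toy data are assumptions:

```python
def rrf_fuse(semantic, lexical, k=60):
    """Reciprocal Rank Fusion: merge two ranked lists into one fused ranking."""
    scores = {}
    for ranked in (semantic, lexical):
        for rank, span_id in enumerate(ranked):
            scores[span_id] = scores.get(span_id, 0.0) + 1.0 / (k + rank + 1)
    # Tie-break on span_id so the fused order is itself deterministic
    return sorted(scores, key=lambda s: (-scores[s], s))

def pack_and_sort(candidates, token_counts, budget):
    """Greedily fill the token budget, then emit spans in a stable order."""
    picked, used = [], 0
    for span_id in candidates:          # best candidates first
        cost = token_counts[span_id]
        if used + cost <= budget:
            picked.append(span_id)
            used += cost
    picked.sort()                       # span_id = (artifact_id, start_line)
    return picked, used

semantic = [("auth.md", 1), ("auth.ts", 45), ("db.md", 10)]
lexical = [("auth.ts", 45), ("auth.md", 1), ("readme.md", 3)]
fused = rrf_fuse(semantic, lexical)
spans, used = pack_and_sort(fused, {s: 100 for s in fused}, budget=250)
# spans → [("auth.md", 1), ("auth.ts", 45)], using 200 of the 250-token budget
```

Because every step either has a fixed rule or a deterministic tie-break, the same corpus and query always produce the same working set.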
Explainability & Reproducibility (v2.1)
NEW in v2.1: Enhanced determinism, explainability, and quality tracking features based on production feedback.
Version Manifest
Every compilation now includes a version manifest for full reproducibility:
```rust
// Access the manifest from a compiled WorkingSet
// (field names below are illustrative; the full contents are listed after this example)
let manifest = working_set.manifest.unwrap();
println!("avocado version: {}", manifest.avocado_version);
println!("embedding model: {}", manifest.embedding_model);
println!("context hash: {}", manifest.context_hash);
```
The manifest includes: avocado version, tokenizer, embedding model, embedding dimensions, chunking params, index params, and a SHA256 context hash.
Explain Plan
Understand exactly how context was selected with explain mode:
```bash
# CLI with explain
avocado compile "my query" --explain
# Shows candidates at each pipeline stage:
# - Semantic search (top 50 from HNSW)
# - Lexical search (keyword matches)
# - Hybrid fusion (RRF combination)
# - MMR diversification
# - Token packing
# - Final deterministic order
```

```python
# Python SDK (parameter name assumed)
result = db.compile("my query", explain=True)
```
Working Set Diff
Compare retrieval results across corpus versions for auditing:
```rust
// Module path and field names are illustrative
use avocado_core::diff_working_sets;

let diff = diff_working_sets(&old_set, &new_set);
println!("{} added, {} removed, {} reranked",
         diff.added.len(), diff.removed.len(), diff.reranked.len());
// Output: "3 added, 1 removed, 2 reranked"
```
Smart Incremental Rebuild
Only re-embed changed files - unchanged content is automatically skipped:
```bash
# First ingest
avocado ingest ./docs
# Ingested 42 files → 387 spans

# Re-ingest after editing 3 files
avocado ingest ./docs
# Skipped 39 unchanged, Updated 3 files → 28 spans
```
Content-hash comparison ensures minimal re-embedding while keeping the index fresh.
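The idea behind content-hash comparison can be sketched in Python (illustrative only; AvocadoDB's actual schema and hashing may differ):

```python
import hashlib

def changed_files(contents, stored_hashes):
    """Compare each file's content hash against the stored index; return
    only the files that need re-embedding, updating the store as we go."""
    stale = []
    for path, content in contents.items():
        digest = hashlib.sha256(content.encode()).hexdigest()
        if stored_hashes.get(path) != digest:
            stale.append(path)
            stored_hashes[path] = digest
    return stale

index = {}
files = {"a.md": "alpha", "b.md": "beta"}
first = changed_files(files, index)    # first ingest: every file is new
files["b.md"] = "beta, edited"
second = changed_files(files, index)   # re-ingest: only the edited file
```

Hashing is cheap relative to embedding, so unchanged files cost almost nothing on re-ingest.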
Evaluation Metrics
Built-in support for golden set testing and quality metrics:
```rust
// Module path, types, and field names are illustrative
use avocado_core::evaluate;

let queries = vec![/* golden queries with expected spans */];
let summary = evaluate(&db, &queries).await?;
println!("recall@10: {:.3}", summary.recall_at_k);
println!("MRR: {:.3}", summary.mrr);
```
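The two reported metrics are standard; minimal Python definitions (illustrative, not the crate's implementation):

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of relevant spans that appear in the top-k retrieved."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def mrr(results):
    """Mean Reciprocal Rank over (retrieved, relevant) pairs."""
    total = 0.0
    for retrieved, relevant in results:
        for rank, span in enumerate(retrieved, start=1):
            if span in relevant:
                total += 1.0 / rank
                break
    return total / len(results)

golden = [(["s1", "s2", "s3"], {"s2"}),   # relevant span at rank 2 → RR = 0.5
          (["s4", "s5", "s6"], {"s4"})]   # relevant span at rank 1 → RR = 1.0
# mrr(golden) → 0.75
```

Because compilation is deterministic, these scores are exactly reproducible across runs on the same corpus, which makes golden-set regression testing practical.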
Session Management
NEW in v2.0: Multi-turn conversation tracking with context compilation
AvocadoDB now supports session management, enabling AI agents to maintain conversation history and context across multiple interactions.
Quick Example
```python
# Method and attribute names below are illustrative; see SESSION_MANAGEMENT.md for the exact API
db = AvocadoDB()

# Create a session
session = db.create_session()

# Multi-turn conversation
answer1 = session.compile("How does authentication work?")
answer2 = session.compile("How are tokens refreshed?")

# Get conversation history
history = session.history()

# Replay for debugging
replay = session.replay()
```
Features
- Multi-turn conversations: Track user queries and agent responses
- Context compilation: Automatically compile context for each query
- Conversation history: Retrieve formatted history with token limiting
- Session replay: Debug agent behavior by replaying entire sessions
- Persistence: Sessions stored in SQLite with full ACID guarantees
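The persistence model can be sketched with Python's built-in `sqlite3` (table layout and function names here are invented for illustration; AvocadoDB's real schema lives in its migrations):

```python
import sqlite3

# Minimal multi-turn session store
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE turns (
    session_id TEXT, turn INTEGER, role TEXT, text TEXT,
    PRIMARY KEY (session_id, turn))""")

def add_turn(session_id, role, text):
    (last,) = conn.execute(
        "SELECT COALESCE(MAX(turn), 0) FROM turns WHERE session_id = ?",
        (session_id,)).fetchone()
    with conn:  # each turn is committed atomically
        conn.execute("INSERT INTO turns VALUES (?, ?, ?, ?)",
                     (session_id, last + 1, role, text))

def history(session_id):
    return conn.execute(
        "SELECT role, text FROM turns WHERE session_id = ? ORDER BY turn",
        (session_id,)).fetchall()

add_turn("s1", "user", "How does auth work?")
add_turn("s1", "agent", "It uses JWT tokens.")
```

Ordering by a per-session turn counter is what makes replay deterministic: the same session always reads back in the same order.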
Available in
- ✅ Python SDK: Full session support with `Session` class
- ✅ TypeScript SDK: Complete session management API
- ✅ CLI: Session commands for interactive use
- ✅ HTTP API: RESTful endpoints for all session operations
See SESSION_MANAGEMENT.md for complete documentation.
Why Determinism Matters
When RAG systems return different context for the same query:
- LLMs produce inconsistent answers
- Users can't verify results
- Debugging is impossible
- Trust is broken
AvocadoDB fixes this with deterministic compilation - same query, same context, every time.
Verify Determinism Yourself
```bash
# Run the same query multiple times
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
# Same hash every single time! ✅
```
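Programmatically, the same check is just a SHA-256 over the compiled context text (a Python sketch; the exact bytes AvocadoDB hashes are defined by its manifest):

```python
import hashlib

def context_hash(compiled_text):
    """SHA-256 fingerprint of a compiled context."""
    return hashlib.sha256(compiled_text.encode("utf-8")).hexdigest()

# Two runs that compile byte-identical context always produce identical hashes
run1 = context_hash("[1] docs/authentication.md\nLines 1-23\n...")
run2 = context_hash("[1] docs/authentication.md\nLines 1-23\n...")
```

Any non-determinism upstream (different span selection, different ordering) shows up immediately as a different digest.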
Performance
Phase 1 achieves production-ready performance:
| Metric | Target | Actual | Status |
|---|---|---|---|
| Compilation time (8K tokens) | < 500ms | ~50ms avg | ✅ 10x faster |
| Token budget utilization | > 95% | 90-95% | ✅ Excellent |
| Determinism | 100% | 100% | ✅ Perfect |
| Duplicate spans | 0 | 0 | ✅ Perfect |
Breakdown for 8K token budget compilation (with Pure Rust embeddings):
```
Embed query:          1-5ms   (2-5% of total) - Pure Rust (fastembed), local
Semantic search:      <1ms    (Vector similarity, HNSW)
Lexical search:       <1ms    (SQL LIKE query)
Hybrid fusion:        <1ms    (RRF score combination)
MMR diversification:  5-10ms  (Diversity selection)
Token packing:        <1ms    (Greedy budget allocation)
Deterministic sort:   <1ms    (Stable sort)
Build context:        <1ms    (Text concatenation)
Count tokens:         30-40ms (tiktoken encoding)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TOTAL:                40-60ms (6x faster than OpenAI!)
```
Performance Comparison:
| Metric | Pure Rust (fastembed) | OpenAI API |
|---|---|---|
| Query Embedding | 1-5ms | 200-300ms |
| Total Compilation | 40-60ms | 240-360ms |
| Throughput | 200-1000 texts/sec | 3-5 batches/sec |
| Cost | Free | ~$0.0001/1K tokens |
| Rate Limits | None | Varies by tier |
| Offline | ✅ Yes | ❌ No |
| Quality | Good (384 dims) | Excellent (1536 dims) |
Pure Rust embeddings are 6x faster and completely free. All retrieval algorithms combined run in under 15ms; tiktoken token counting accounts for most of the remaining time.
See docs/performance.md for detailed analysis and scaling characteristics.
CLI Reference
avocado init
Initialize a new AvocadoDB database:

```bash
avocado init
```

Creates `.avocado/` directory with SQLite database and vector index.
avocado ingest
Ingest documents into the database:
Examples (file paths are illustrative):

```bash
# Ingest single file
avocado ingest README.md

# Ingest entire directory recursively
avocado ingest ./docs

# Ingest specific file types
```
The ingestion process:
- Reads document content
- Extracts spans (20-50 lines with smart boundaries)
- Generates embeddings for each span (local fastembed by default)
- Stores in SQLite database
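Step 2 (span extraction) can be sketched as follows: a simplified Python version that targets 20-50 line spans and prefers blank-line boundaries. The real boundary heuristics are smarter; this only illustrates the shape of the output:

```python
def extract_spans(lines, min_len=20, max_len=50):
    """Split a document into spans of roughly min_len-max_len lines,
    breaking early at a blank line once a span is long enough."""
    spans, start = [], 0
    while start < len(lines):
        end = min(start + max_len, len(lines))
        for i in range(start + min_len, end):
            if lines[i].strip() == "":
                end = i
                break
        spans.append((start + 1, end))  # 1-based inclusive line numbers
        start = end
    return spans

doc = ["line"] * 120                    # a 120-line file with no blank lines
# extract_spans(doc) → [(1, 50), (51, 100), (101, 120)]
```

The recorded `(start, end)` line numbers are what later power the exact citations in compiled context.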
avocado compile
Compile a deterministic context for a query:
Options:

- `--budget <tokens>`: Token budget (default: 8000)
- `--json`: Output as JSON instead of human-readable format
- `--explain`: Show explain plan with candidates at each pipeline stage
- `--mmr-lambda <0.0-1.0>`: MMR diversity parameter (default: 0.5)
  - Higher values (0.7-1.0) = more relevant but potentially redundant
  - Lower values (0.0-0.3) = more diverse but potentially less relevant
- `--semantic-weight <float>`: Semantic search weight (default: 0.7)
- `--lexical-weight <float>`: Lexical search weight (default: 0.3)
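The effect of `--mmr-lambda` is easiest to see in a minimal MMR implementation (a Python sketch of the standard algorithm with toy data, not the crate's exact scoring):

```python
def mmr_select(candidates, relevance, similarity, lam=0.5, k=3):
    """Maximal Marginal Relevance: lam weights query relevance,
    (1 - lam) penalizes similarity to already-selected spans."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def score(c):
            redundancy = max((similarity[(c, s)] for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected

relevance = {"a": 0.9, "b": 0.85, "c": 0.5}   # "a" and "b" are near-duplicates
similarity = {(x, y): (0.95 if {x, y} == {"a", "b"} else 0.1)
              for x in "abc" for y in "abc" if x != y}

mmr_select("abc", relevance, similarity, lam=0.9, k=2)  # ["a", "b"]: favors relevance
mmr_select("abc", relevance, similarity, lam=0.3, k=2)  # ["a", "c"]: favors diversity
```

With a high lambda the two near-duplicate spans both get picked; with a low lambda the redundancy penalty pushes the second pick toward the less relevant but distinct span.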
Examples:
```bash
# Basic compilation
avocado compile "How does authentication work?"

# Large context window
avocado compile "How does authentication work?" --budget 16000

# Prioritize diversity over relevance
avocado compile "How does authentication work?" --mmr-lambda 0.2

# Tune search weights (more keyword matching)
avocado compile "How does authentication work?" --semantic-weight 0.5 --lexical-weight 0.5

# JSON output for programmatic use
avocado compile "How does authentication work?" --json
```
JSON Output Format:
avocado stats
Show database statistics:

```bash
avocado stats
```

Example output:

```
Database Statistics:
  Artifacts: 42
  Spans: 387
  Total Tokens: 125,431
  Average Tokens/Span: 324
```
avocado clear
Clear all data from the database:

```bash
avocado clear
```

Warning: This permanently deletes all ingested documents and embeddings!
Library Usage (Rust)
Use AvocadoDB as a library in your Rust projects:
```toml
[dependencies]
avocado-core = "2.1"
tokio = { version = "1.35", features = ["full"] }
```

```rust
// Type and method names here are illustrative; check the avocado-core docs for the real API
use avocado_core::AvocadoDb;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // open a database, ingest, and compile a deterministic context here
    Ok(())
}
```
Development
Project Structure
```
avocadodb/
├── avocado-core/    # Core engine (Rust)
├── avocado-cli/     # Command-line tool
├── avocado-server/  # HTTP server
├── python/          # Python SDK
├── migrations/      # Database schema
├── tests/           # Integration tests
└── docs/            # Documentation
```
Running Tests
```bash
# Unit tests
cargo test

# Integration tests (requires OPENAI_API_KEY)
```
Building
```bash
# Development build
cargo build

# Release build
cargo build --release

# Run CLI
cargo run -p avocado-cli

# Run server
cargo run -p avocado-server
```
Roadmap
Phase 1 ✅ (Complete)
- Core span extraction with smart boundaries
- OpenAI embeddings integration
- Hybrid search (semantic + lexical)
- MMR diversification algorithm
- Deterministic compilation (100% verified)
- CLI tool with full features
- HTTP server
- Performance optimization (240ms avg)
- Comprehensive documentation
Phase 2 - Advanced Features
- Version manifest for full reproducibility
- Explain plan for retrieval debugging
- Working set diff for corpus auditing
- Smart incremental rebuild (content-hash based)
- Evaluation metrics (recall@k, MRR)
- Multi-modal support (images, code)
- Advanced retrieval (BM25, learned rankers)
- PostgreSQL support
- Framework integrations (LangChain, LlamaIndex)
Phase 3 - Agent Memory
- Session management
- Working set versioning
- Collaborative features
- Memory systems
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE for details.
Testing
AvocadoDB includes comprehensive test suites to validate determinism and performance:
```bash
# Run all tests and generate report

# Run determinism validation only (100 iterations)

# Run performance benchmarks
```
See docs/testing.md for complete testing documentation.
Learn More
- Quick Start Guide - Get running in 5 minutes
- Examples - Real-world usage patterns
- Testing Guide - Validation and benchmarking
- Performance Analysis
- UI Improvements
Built by the AvocadoDB Team | Making retrieval deterministic, one context at a time.