avocado-core 2.1.0

# AvocadoDB

**The first deterministic context database for AI agents**

Fix your RAG in 5 minutes - same query, same context, every time.

[![Build Status](https://img.shields.io/github/actions/workflow/status/avocadodb/avocadodb/ci.yml?branch=master)](https://github.com/avocadodb/avocadodb/actions)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Crates.io](https://img.shields.io/crates/v/avocado-core.svg)](https://crates.io/crates/avocado-core)
[![Docker Hub](https://img.shields.io/docker/pulls/avocadodb/avocadodb)](https://hub.docker.com/r/avocadodb/avocadodb)
[![GitHub stars](https://img.shields.io/github/stars/avocadodb/avocadodb?style=social)](https://github.com/avocadodb/avocadodb)
[![GitHub issues](https://img.shields.io/github/issues/avocadodb/avocadodb)](https://github.com/avocadodb/avocadodb/issues)
[![Coverage](https://img.shields.io/codecov/c/github/avocadodb/avocadodb)](https://codecov.io/gh/avocadodb/avocadodb)

![Embedding](https://img.shields.io/badge/Embedding-Pure%20Rust%20%E2%9A%A1-green) ![Speed](https://img.shields.io/badge/Speed-6x%20Faster-brightgreen) ![Cost](https://img.shields.io/badge/Cost-%240-blue)

## What is AvocadoDB?

AvocadoDB is a span-based context compiler that replaces traditional vector databases' chaotic "top-k" retrieval with deterministic, citation-backed context generation.

**Pure Rust embeddings = 6x faster than OpenAI, works completely offline, costs $0.**

### The Problem with RAG

Current RAG systems are fundamentally broken:

- ❌ Same query → different results each time (non-deterministic)
- ❌ Token budgets wasted on duplicates (60-70% utilization)
- ❌ No citations or verifiability
- ❌ Hallucinations from inconsistent context
- ❌ Slow (200-300ms just for OpenAI embedding calls)
- ❌ Expensive (API costs scale with usage)

### The AvocadoDB Solution

- ✅ **100% Deterministic**: Same query → same context, every time
- ✅ **6x Faster**: 40-60ms compilation (vs 240-360ms with OpenAI)
- ✅ **Zero Cost**: Pure Rust embeddings, no API required
- ✅ **Works Offline**: No internet needed after initial setup
- ✅ **Citation-Backed**: Every span has exact line number citations
- ✅ **Token Efficient**: 95%+ budget utilization
- ✅ **Drop-in Replacement**: Works with any LLM

## ⚡ Performance

```bash
# Run benchmarks on your hardware
./target/release/avocado benchmark

# Results (M1 Mac example):
# Single embedding: 1.2ms  (vs ~250ms OpenAI)
# Batch of 100:     8.7ms  (vs ~250ms OpenAI)
# Full compilation: 43ms   (vs ~300ms OpenAI)
#
# Speedup: 6-7x faster ⚡
# Cost: $0 (vs ~$0.0001 per 1K tokens)
```

See [EMBEDDING_PERFORMANCE.md](docs/EMBEDDING_PERFORMANCE.md) for detailed benchmarks.

## Quick Start

### Install from crates.io (Easiest)

```bash
cargo install avocado-cli
```

That's it! Now you can use `avocado` directly:

```bash
avocado --version
avocado init
avocado ingest ./docs --recursive
avocado compile "your query"
```

### Docker (Recommended for Server)

Run the server with Docker:

```bash
# Run with Docker
docker run -d \
  -p 8765:8765 \
  -v avocado-data:/data \
  --name avocadodb \
  avocadodb/avocadodb:latest

# Or use Docker Compose
docker-compose up -d

# Test the server
curl http://localhost:8765/health
```

See [Docker Guide](docs/DOCKER.md) for complete documentation.

### Installation from Source

```bash
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Clone and build
git clone https://github.com/avocadodb/avocadodb
cd avocadodb
cargo build --release

# Optional: Set OpenAI API key (only if you want to use OpenAI embeddings)
# By default, AvocadoDB uses local embeddings (no API key required, no Python required!)
#
# Local embeddings strategy (automatic, in priority order):
# 1. Pure Rust with fastembed (semantic, good quality, no Python required) ✅ DEFAULT
#    - Uses all-MiniLM-L6-v2 model (384 dimensions) by default
#    - ONNX-based, fast and efficient
#    - Model downloaded automatically on first use (~90MB)
#    - To increase dimensionality, set AVOCADODB_EMBEDDING_MODEL:
#      * "nomic" or "nomicv15" → 768 dimensions (good balance)
#      * "bgelarge" or "bge-large-en-v1.5" → 1024 dimensions (higher quality)
# 2. Python + sentence-transformers (fallback if fastembed unavailable)
#    - Requires: pip install sentence-transformers
# 3. Hash-based fallback (deterministic, but NOT semantic)
#    - Works always, but poor semantic quality
#
# To use OpenAI embeddings instead:
# export OPENAI_API_KEY="sk-..."
# export AVOCADODB_EMBEDDING_PROVIDER=openai
```

### CLI Usage (Daemon by default)

```bash
# Initialize database
./target/release/avocado init

# Get model recommendation (optional)
./target/release/avocado recommend --corpus-size 5000 --use-case production
# Recommends optimal embedding model for your use case

# Ingest documents
./target/release/avocado ingest ./docs --recursive
# Output: Ingested 42 files → 387 spans

# Compile context (uses daemon at http://localhost:8765 by default)
./target/release/avocado compile "How does authentication work?" --budget 8000
# Force local mode (uses .avocado/db.sqlite in current project)
./target/release/avocado compile "How does authentication work?" --local --budget 8000

# Run performance benchmarks
./target/release/avocado benchmark
# Shows real performance on your hardware
```

#### GPU-backed server (Modal) quickstart
```bash
# Start the daemon with remote GPU embeddings (Modal)
avocado serve --gpu --embed-url https://<your-modal-endpoint>/embed
# or CPU/local (default)
avocado serve
```

**Example Output:**
```
Compiling context for: "How does authentication work?"
Token budget: 8000

[1] docs/authentication.md
Lines 1-23

# Authentication System

Our authentication uses JWT tokens with secure refresh mechanisms...

---

[2] src/middleware/auth.ts
Lines 45-78

export function authenticateRequest(req: Request) {
  const token = req.headers.authorization?.split(' ')[1];
  if (!token) throw new UnauthorizedError();
  ...
}

---

Compiled 12 spans using 7,891 tokens (98.6% utilization)
Compilation time: 243ms
Context hash: e3b0c4429...52b855 (deterministic ✓)
```

### Python SDK

```bash
cd sdks/python
pip install -e .
```

```python
from avocado import AvocadoDB

db = AvocadoDB()
db.ingest("./docs", recursive=True)

result = db.compile("my query", budget=8000)
print(result.text)  # Deterministic every time
```

### TypeScript SDK

```bash
cd sdks/typescript
npm install
npm run build
```

```typescript
import { AvocadoDB } from 'avocadodb';

const db = new AvocadoDB();
await db.ingest('./docs', recursive: true);

const result = await db.compile('my query', { budget: 8000 });
console.log(result.text);  // Deterministic every time
```

### HTTP Server (Multi-project daemon)

```bash
# Start server (binds to 127.0.0.1 by default)
./target/release/avocado-server

# Use the API
curl -X POST http://localhost:8765/compile \
  -H "Content-Type: application/json" \
  -d '{"query": "authentication", "token_budget": 8000, "project": "'"$PWD"'"}'
```

## Docker & Kubernetes Deployment

AvocadoDB is production-ready with full Docker and Kubernetes support.

### Docker

```bash
# Quick start with Docker
docker run -d -p 8765:8765 -v avocado-data:/data avocadodb/avocadodb:latest

# Or use Docker Compose
docker-compose up -d
```

**Features:**
- Multi-stage build for minimal image size (~80-100MB)
- Multi-architecture support (linux/amd64, linux/arm64)
- Non-root user for security
- Health checks built-in
- Configurable via environment variables

See [Docker Guide](docs/DOCKER.md) for complete documentation.

### Kubernetes

```bash
# Deploy to Kubernetes
kubectl apply -k k8s/

# Verify deployment
kubectl get pods -l app=avocadodb
```

**Includes:**
- Production-ready Deployment manifests
- Horizontal scaling support
- Persistent storage configuration
- Ingress with TLS/HTTPS
- ConfigMaps and Secrets management
- Resource limits and health checks

See [Kubernetes Guide](k8s/README.md) for complete documentation.

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `8765` | HTTP server port |
| `BIND_ADDR` | `127.0.0.1` | Bind address (set `0.0.0.0` to expose publicly) |
| `RUST_LOG` | `info` | Log level |
| `AVOCADODB_EMBEDDING_MODEL` | `minilm` | Embedding model (minilm, nomic, bgelarge) |
| `AVOCADODB_EMBEDDING_PROVIDER` | `local` | Provider (local or openai) |
| `OPENAI_API_KEY` | - | OpenAI API key (if using OpenAI) |
| `AVOCADODB_ROOT` | unset | Optional project root. When set, all `project` paths must be under this directory. Requests outside are rejected. |
| `API_TOKEN` | unset | If set, requires header `X-Avocado-Token` to be present and equal for all routes (except `/health`, `/api-docs/*`). |
| `MAX_BODY_BYTES` | `2097152` (2MB) | Request body size limit to protect against large payloads. |

Security note:
- Do not expose the server publicly without protection. If you must, set `BIND_ADDR=0.0.0.0` and front it with auth.
- For local safety, clients always send an explicit `project` (their current working directory), and the server normalizes paths and can restrict to `AVOCADODB_ROOT`.

## How It Works

### Architecture

```
Query → Embed → [Semantic Search + Lexical Search] → Hybrid Fusion
      → MMR Diversification → Token Packing → Deterministic Sort → WorkingSet
```

### Key Innovations

1. **Span-Based Indexing**: Documents are split into spans (20-50 lines) with precise line numbers
2. **Hybrid Retrieval**: Combines semantic (vector) and lexical (keyword) search
3. **Deterministic Ordering**: Results sorted by `(artifact_id, start_line)` for reproducibility
4. **Greedy Token Packing**: Maximizes token budget utilization without duplicates

## Explainability & Reproducibility (v2.1)

**NEW in v2.1**: Enhanced determinism, explainability, and quality tracking features based on production feedback.

### Version Manifest

Every compilation now includes a version manifest for full reproducibility:

```rust
// Access manifest from WorkingSet
let manifest = working_set.manifest.unwrap();
println!("Avocado version: {}", manifest.avocado_version);
println!("Embedding model: {}", manifest.embedding_model);
println!("Context hash: {}", manifest.context_hash);
```

The manifest includes: avocado version, tokenizer, embedding model, embedding dimensions, chunking params, index params, and a SHA256 context hash.

### Explain Plan

Understand exactly how context was selected with explain mode:

```bash
# CLI with explain
avocado compile "authentication" --explain

# Shows candidates at each pipeline stage:
# - Semantic search (top 50 from HNSW)
# - Lexical search (keyword matches)
# - Hybrid fusion (RRF combination)
# - MMR diversification
# - Token packing
# - Final deterministic order
```

```python
# Python SDK
result = db.compile("auth", budget=8000, explain=True)
if result.explain:
    print(f"Semantic candidates: {len(result.explain.semantic_candidates)}")
    print(f"Final spans: {len(result.explain.final_order)}")
```

### Working Set Diff

Compare retrieval results across corpus versions for auditing:

```rust
use avocado_core::{diff_working_sets, summarize_diff};

let diff = diff_working_sets(&before, &after);
println!("{}", summarize_diff(&diff));
// Output: "3 added, 1 removed, 2 reranked"
```

### Smart Incremental Rebuild

Only re-embed changed files - unchanged content is automatically skipped:

```bash
# First ingest
avocado ingest ./docs --recursive
# Ingested 42 files → 387 spans

# Re-ingest after editing 3 files
avocado ingest ./docs --recursive
# Skipped 39 unchanged, Updated 3 files → 28 spans
```

Content-hash comparison ensures minimal re-embedding while keeping the index fresh.

### Evaluation Metrics

Built-in support for golden set testing and quality metrics:

```rust
use avocado_core::{GoldenQuery, evaluate};

let queries = vec![
    GoldenQuery {
        query: "authentication".to_string(),
        expected_paths: vec!["docs/auth.md".to_string()],
        k: 10,
    },
];

let summary = evaluate(&queries, &db, &index, &config).await?;
println!("Recall@10: {:.2}%", summary.mean_recall * 100.0);
println!("MRR: {:.3}", summary.mean_mrr);
```

## Session Management

**NEW in v2.0**: Multi-turn conversation tracking with context compilation

AvocadoDB now supports session management, enabling AI agents to maintain conversation history and context across multiple interactions.

### Quick Example

```python
from avocado import AvocadoDB

db = AvocadoDB(mode="http")

# Create a session
session = db.create_session(user_id="alice", title="Project Q&A")

# Multi-turn conversation
result = session.compile("What is AvocadoDB?", budget=8000)
session.add_message("assistant", "AvocadoDB is a deterministic context database...")

result2 = session.compile("How does the compiler work?")
session.add_message("assistant", "The compiler uses hybrid search...")

# Get conversation history
history = session.get_history()

# Replay for debugging
replay = session.replay()
```

### Features

- **Multi-turn conversations**: Track user queries and agent responses
- **Context compilation**: Automatically compile context for each query
- **Conversation history**: Retrieve formatted history with token limiting
- **Session replay**: Debug agent behavior by replaying entire sessions
- **Persistence**: Sessions stored in SQLite with full ACID guarantees

### Available in

- ✅ **Python SDK**: Full session support with `Session` class
- ✅ **TypeScript SDK**: Complete session management API
- ✅ **CLI**: Session commands for interactive use
- ✅ **HTTP API**: RESTful endpoints for all session operations

See [SESSION_MANAGEMENT.md](docs/SESSION_MANAGEMENT.md) for complete documentation.

## Why Determinism Matters

When RAG systems return different context for the same query:
- LLMs produce inconsistent answers
- Users can't verify results
- Debugging is impossible
- Trust is broken

AvocadoDB fixes this with deterministic compilation - same query, same context, every time.

### Verify Determinism Yourself

```bash
# Run the same query multiple times
avocado compile "authentication" --budget 8000 | head -100 | sha256sum
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

avocado compile "authentication" --budget 8000 | head -100 | sha256sum
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

# Same hash every single time! ✅
```

## Performance

Phase 1 achieves production-ready performance:

| Metric | Target | Actual | Status |
|--------|--------|--------|--------|
| Compilation time (8K tokens) | < 500ms | ~50ms avg | ✅ 10x faster |
| Token budget utilization | > 95% | 90-95% | ✅ Excellent |
| Determinism | 100% | 100% | ✅ Perfect |
| Duplicate spans | 0 | 0 | ✅ Perfect |

**Breakdown** for 8K token budget compilation (with Pure Rust embeddings):

```
Embed query:          1-5ms      (2-5% of total) - Pure Rust (fastembed), local
Semantic search:      <1ms       (Vector similarity, HNSW)
Lexical search:       <1ms       (SQL LIKE query)
Hybrid fusion:        <1ms       (RRF score combination)
MMR diversification:  5-10ms     (Diversity selection)
Token packing:        <1ms       (Greedy budget allocation)
Deterministic sort:   <1ms       (Stable sort)
Build context:        <1ms       (Text concatenation)
Count tokens:         30-40ms    (tiktoken encoding)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
TOTAL:                40-60ms    (6x faster than OpenAI!)
```

**Performance Comparison:**

| Metric | Pure Rust (fastembed) | OpenAI API |
|--------|----------------------|------------|
| **Query Embedding** | 1-5ms | 200-300ms |
| **Total Compilation** | 40-60ms | 240-360ms |
| **Throughput** | 200-1000 texts/sec | 3-5 batches/sec |
| **Cost** | Free | ~$0.0001/1K tokens |
| **Rate Limits** | None | Varies by tier |
| **Offline** | ✅ Yes | ❌ No |
| **Quality** | Good (384 dims) | Excellent (1536 dims) |

**Pure Rust embeddings are 6x faster and completely free!**
**Optimization**: All algorithms run in <15ms total (highly optimized)

See [docs/performance.md](docs/performance.md) for detailed analysis and scaling characteristics.

## CLI Reference

### `avocado init`

Initialize a new AvocadoDB database:

```bash
avocado init [--path <db-path>]
```

Creates `.avocado/` directory with SQLite database and vector index.

### `avocado ingest`

Ingest documents into the database:

```bash
avocado ingest <path> [--recursive]
```

**Examples:**
```bash
# Ingest single file
avocado ingest README.md

# Ingest entire directory recursively
avocado ingest docs/ --recursive

# Ingest specific file types
avocado ingest src/ --recursive --include "*.rs,*.md,*.toml"
```

The ingestion process:
1. Reads document content
2. Extracts spans (20-50 lines with smart boundaries)
3. Generates embeddings for each span (local fastembed by default)
4. Stores in SQLite database

### `avocado compile`

Compile a deterministic context for a query:

```bash
avocado compile <query> [OPTIONS]
```

**Options:**
- `--budget <tokens>`: Token budget (default: 8000)
- `--json`: Output as JSON instead of human-readable format
- `--explain`: Show explain plan with candidates at each pipeline stage
- `--mmr-lambda <0.0-1.0>`: MMR diversity parameter (default: 0.5)
  - Higher values (0.7-1.0) = more relevant but potentially redundant
  - Lower values (0.0-0.3) = more diverse but potentially less relevant
- `--semantic-weight <float>`: Semantic search weight (default: 0.7)
- `--lexical-weight <float>`: Lexical search weight (default: 0.3)

**Examples:**
```bash
# Basic compilation
avocado compile "How does authentication work?"

# Large context window
avocado compile "error handling patterns" --budget 16000

# Prioritize diversity over relevance
avocado compile "testing strategies" --mmr-lambda 0.3

# Tune search weights (more keyword matching)
avocado compile "API endpoints" --semantic-weight 0.5 --lexical-weight 0.5

# JSON output for programmatic use
avocado compile "authentication" --budget 8000 --json
```

**JSON Output Format:**
```json
{
  "text": "[1] docs/auth.md\nLines 1-23\n\n# Authentication...",
  "spans": [
    {
      "id": "uuid",
      "artifact_id": "uuid",
      "start_line": 1,
      "end_line": 23,
      "text": "# Authentication...",
      "embedding": [0.002, 0.013, ...],
      "embedding_model": "text-embedding-ada-002",
      "token_count": 127,
      "metadata": null
    }
  ],
  "citations": [
    {
      "span_id": "uuid",
      "artifact_id": "uuid",
      "artifact_path": "docs/auth.md",
      "start_line": 1,
      "end_line": 23,
      "score": 0.0
    }
  ],
  "tokens_used": 2232,
  "query": "authentication",
  "compilation_time_ms": 243
}
```

### `avocado stats`

Show database statistics:

```bash
avocado stats
```

**Example output:**
```
Database Statistics:
  Artifacts: 42
  Spans: 387
  Total Tokens: 125,431
  Average Tokens/Span: 324
```

### `avocado clear`

Clear all data from the database:

```bash
avocado clear
```

**Warning**: This permanently deletes all ingested documents and embeddings!

## Library Usage (Rust)

Use AvocadoDB as a library in your Rust projects:

```toml
[dependencies]
avocado-core = "2.1"
tokio = { version = "1.35", features = ["full"] }
```

```rust
use avocado_core::{Database, VectorIndex, compiler, types::CompilerConfig};

#[tokio::main]
async fn main() -> avocado_core::types::Result<()> {
    // Open database
    let db = Database::new(".avocado/db.sqlite")?;

    // Load vector index from database
    let index = VectorIndex::from_database(&db)?;

    // Configure compilation
    let config = CompilerConfig {
        token_budget: 8000,
        semantic_weight: 0.7,
        lexical_weight: 0.3,
        mmr_lambda: 0.5,
        enable_mmr: true,
    };

    // Compile context
    let working_set = compiler::compile(
        "How does authentication work?",
        config,
        &db,
        &index,
        Some("your-openai-api-key")
    ).await?;

    println!("Compiled {} spans using {} tokens",
        working_set.spans.len(),
        working_set.tokens_used
    );

    println!("Deterministic hash: {}", working_set.deterministic_hash());

    // Use working_set.text in your LLM prompt
    println!("Context:\n{}", working_set.text);

    Ok(())
}
```

## Development

### Project Structure

```
avocadodb/
├── avocado-core/      # Core engine (Rust)
├── avocado-cli/       # Command-line tool
├── avocado-server/    # HTTP server
├── python/            # Python SDK
├── migrations/        # Database schema
├── tests/             # Integration tests
└── docs/              # Documentation
```

### Running Tests

```bash
# Unit tests
cargo test

# Integration tests (requires OPENAI_API_KEY)
cargo test --test determinism -- --ignored
cargo test --test performance -- --ignored
cargo test --test correctness -- --ignored
```

### Building

```bash
# Development build
cargo build

# Release build
cargo build --release

# Run CLI
cargo run --bin avocado -- --help

# Run server
cargo run --bin avocado-server
```

## Roadmap

### Phase 1 ✅ (Complete)
- [x] Core span extraction with smart boundaries
- [x] OpenAI embeddings integration
- [x] Hybrid search (semantic + lexical)
- [x] MMR diversification algorithm
- [x] Deterministic compilation (100% verified)
- [x] CLI tool with full features
- [x] HTTP server
- [x] Performance optimization (240ms avg)
- [x] Comprehensive documentation

### Phase 2 - Advanced Features
- [x] Version manifest for full reproducibility
- [x] Explain plan for retrieval debugging
- [x] Working set diff for corpus auditing
- [x] Smart incremental rebuild (content-hash based)
- [x] Evaluation metrics (recall@k, MRR)
- [ ] Multi-modal support (images, code)
- [ ] Advanced retrieval (BM25, learned rankers)
- [ ] PostgreSQL support
- [ ] Framework integrations (LangChain, LlamaIndex)

### Phase 3 - Agent Memory
- [x] Session management
- [ ] Working set versioning
- [ ] Collaborative features
- [ ] Memory systems

## Contributing

We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## License

MIT License - see [LICENSE](LICENSE) for details.

## Testing

AvocadoDB includes comprehensive test suites to validate determinism and performance:

```bash
# Run all tests and generate report
./scripts/run-tests.sh

# Run determinism validation only (100 iterations)
./scripts/test-determinism.sh

# Run performance benchmarks
./scripts/benchmark.sh
```

See [docs/testing.md](docs/testing.md) for complete testing documentation.

## Learn More

- [Quick Start Guide](QUICKSTART.md) - Get running in 5 minutes
- [Examples](docs/examples.md) - Real-world usage patterns
- [Testing Guide](docs/testing.md) - Validation and benchmarking
- [Performance Analysis](docs/performance.md)
- [UI Improvements](docs/UI-IMPROVEMENTS.md)

---

**Built by the AvocadoDB Team** | Making retrieval deterministic, one context at a time.