polarisdb 0.1.2

PolarisDB - an embedded vector database for local AI and RAG workloads

Why PolarisDB?

PolarisDB is built for developers who need fast, local vector search without the complexity of external services.

| PolarisDB | Cloud Vector DBs |
|---|---|
| ✓ Runs 100% locally | ✗ Requires internet connection |
| ✓ Data never leaves your machine | ✗ Data on third-party servers |
| ✓ Zero network latency | ✗ Network round-trip latency |
| ✓ Free forever (no usage fees) | ✗ Pay-per-query pricing |
| ✓ Pure Rust (no C++ deps) | ✗ Often wrapped C++ libraries |

Built for: RAG applications · Semantic search · Recommendations · Edge AI · Game AI

Features

High-Performance Indexing

| Index Type | Use Case | Complexity |
|---|---|---|
| BruteForce | Small datasets (<10K vectors) | O(n), exact |
| HNSW | Large datasets (millions) | O(log n), approximate |
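
For intuition about the O(n) exact path, a brute-force search simply scores every stored vector against the query and keeps the k best. The sketch below is standalone plain Rust, not PolarisDB's internal code:

```rust
// Standalone sketch of O(n) exact k-NN using squared Euclidean distance.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn brute_force_knn(vectors: &[(u64, Vec<f32>)], query: &[f32], k: usize) -> Vec<(u64, f32)> {
    // Score every candidate: this full pass over the data is what makes it O(n).
    let mut scored: Vec<(u64, f32)> = vectors
        .iter()
        .map(|(id, v)| (*id, squared_l2(v, query)))
        .collect();
    scored.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    scored.truncate(k);
    scored
}

fn main() {
    let vectors = vec![
        (1, vec![0.0, 0.0]),
        (2, vec![1.0, 1.0]),
        (3, vec![5.0, 5.0]),
    ];
    let top = brute_force_knn(&vectors, &[0.9, 0.9], 2);
    println!("{top:?}"); // nearest ids first
}
```

HNSW avoids the full pass by greedily walking a layered proximity graph, which is why it trades exactness for O(log n) search.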

Distance Metrics

DistanceMetric::Euclidean   // L2 distance
DistanceMetric::Cosine      // Angular similarity (text embeddings)
DistanceMetric::DotProduct  // Maximum inner product
DistanceMetric::Hamming     // Binary vectors
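
For intuition, each metric reduces to a simple formula over the raw components. This is an illustrative plain-Rust sketch of those formulas, independent of the library's SIMD kernels:

```rust
// Illustrative formulas behind the four metrics (not library code).
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    // Angle between vectors; magnitude-invariant, hence the fit for text
    // embeddings. Cosine *distance* is conventionally 1 - similarity.
    dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}

fn hamming(a: u64, b: u64) -> u32 {
    // Count of differing bits, for binary vectors packed into machine words.
    (a ^ b).count_ones()
}

fn main() {
    let (a, b) = ([1.0, 0.0], [0.0, 1.0]);
    println!("euclidean = {}", euclidean(&a, &b)); // sqrt(2)
    println!("cosine    = {}", cosine_similarity(&a, &b)); // orthogonal -> 0
    println!("hamming   = {}", hamming(0b1010, 0b0110)); // 2 differing bits
}
```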

Metadata Filtering

Combine vector similarity with metadata conditions:

// Find similar documents from 2024 in the "AI" category
let filter = Filter::field("category").eq("AI")
    .and(Filter::field("year").gte(2024));

let results = index.search(&query_embedding, 10, Some(filter));

Durable Persistence

  • Write-Ahead Log (WAL) for crash safety
  • Automatic recovery on restart
  • Memory-mapped files for efficient disk access
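
The idea behind WAL-based crash safety can be sketched with plain std I/O: every mutation is appended and fsynced before it is acknowledged, so on restart the log can be replayed to rebuild state. A minimal conceptual sketch (file name and record format are made up for illustration; PolarisDB's actual WAL format differs):

```rust
use std::fs::{File, OpenOptions};
use std::io::{BufRead, BufReader, Write};

// Minimal WAL sketch: a mutation is durable on disk before it is
// acknowledged, so a crash can only lose work that was never confirmed.
fn wal_append(path: &str, record: &str) -> std::io::Result<()> {
    let mut f = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(f, "{record}")?;
    f.sync_all() // durability point: flushed to disk before returning
}

fn wal_replay(path: &str) -> std::io::Result<Vec<String>> {
    // On restart, re-apply every record in order to rebuild in-memory state.
    let reader = BufReader::new(File::open(path)?);
    reader.lines().collect()
}

fn main() -> std::io::Result<()> {
    let path = "demo.wal";
    wal_append(path, "insert 1")?;
    wal_append(path, "insert 2")?;
    let recovered = wal_replay(path)?;
    println!("recovered {} records", recovered.len());
    std::fs::remove_file(path)?;
    Ok(())
}
```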

Async Support

// Enable with: polarisdb = { version = "0.1", features = ["async"] }
let collection = AsyncCollection::open_or_create("./data", config).await?;
collection.insert(id, embedding, payload).await?;
let results = collection.search(&query, 10, None).await;

Python Bindings

import polarisdb

# Persistent Collection
col = polarisdb.Collection.open_or_create("./data/my_col", 384, "cosine")
col.insert(1, [0.1, 0.2, ...])
results = col.search([0.1, 0.2, ...], 5)

LangChain Integration

Use PolarisDB as a vector store in your RAG pipelines:

from polarisdb.langchain import PolarisDBVectorStore
from langchain_openai import OpenAIEmbeddings

# Create vector store from documents
vectorstore = PolarisDBVectorStore.from_texts(
    texts=["Document 1", "Document 2", "Document 3"],
    embedding=OpenAIEmbeddings(),
    collection_path="./my_vectors",
)

# Similarity search
docs = vectorstore.similarity_search("query", k=3)

# Use as retriever in RAG chain
retriever = vectorstore.as_retriever()

See examples/langchain_rag.py for a complete RAG example.

HTTP Server

Run the standalone server:

cargo run -p polarisdb-server

Integrate via REST API:

curl -X POST http://localhost:8080/collections/my_col/search \
  -d '{"vector": [0.1, ...], "k": 5}'

Quick Start

Add PolarisDB to your Cargo.toml:

[dependencies]
polarisdb = "0.1"

Basic Usage

use polarisdb::prelude::*;

fn main() -> Result<()> {
    // Create a collection for 384-dimensional embeddings
    let config = CollectionConfig::new(384, DistanceMetric::Cosine);
    let collection = Collection::open_or_create("./my_vectors", config)?;

    // Insert vectors with metadata
    let embedding = get_embedding("Introduction to Rust"); // Your embedding function
    let payload = Payload::new()
        .with_field("title", "Introduction to Rust")
        .with_field("category", "programming")
        .with_field("year", 2024);
    
    collection.insert(1, embedding, payload)?;

    // Search for similar vectors
    let query = get_embedding("Rust programming tutorial");
    let results = collection.search(&query, 5, None);

    for result in results {
        if let Some(payload) = &result.payload {
            println!(
                "Found: {} (distance: {:.4})",
                payload.get_str("title").unwrap_or("Unknown"),
                result.distance
            );
        }
    }

    collection.flush()?; // Ensure durability
    Ok(())
}

High-Performance HNSW Index

For millions of vectors, use the HNSW index:

let config = HnswConfig {
    m: 16,              // Connections per node
    m_max0: 32,         // Connections at layer 0
    ef_construction: 100, // Build-time beam width
    ef_search: 50,      // Search-time beam width
};

let mut index = HnswIndex::new(DistanceMetric::Cosine, 384, config);

// Insert vectors
for (id, embedding, metadata) in documents {
    index.insert(id, embedding, metadata)?;
}

// Search with ~9x speedup over brute-force
let results = index.search(&query, 10, None, None);

Pre-Filtered Search with Bitmap Index

For highly selective filters:

// Build a bitmap index alongside your vector index
let mut bitmap = BitmapIndex::new();
let mut hnsw = HnswIndex::new(DistanceMetric::Cosine, 384, config);

for (id, embedding, payload) in documents {
    hnsw.insert(id, embedding.clone(), payload.clone())?;
    bitmap.insert(id, &payload);
}

// Query with bitmap pre-filtering
let filter = Filter::field("category").eq("AI");
let valid_ids = bitmap.query(&filter);
let results = hnsw.search_with_bitmap(&query, 10, None, &valid_ids);
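
Conceptually, pre-filtering means resolving the metadata predicate to a candidate id set first, so the vector index never scores ids that cannot match. The sketch below uses std collections in place of compressed bitmaps to show the idea:

```rust
use std::collections::{HashMap, HashSet};

// Conceptual sketch of pre-filtering: a posting-list index maps each
// metadata value to the ids that carry it (std HashSet stands in for a
// compressed bitmap). The vector search is then restricted to that set.
fn candidates(index: &HashMap<&str, HashSet<u64>>, cat: &str) -> HashSet<u64> {
    index.get(cat).cloned().unwrap_or_default()
}

fn main() {
    // Built at insert time, alongside the vector index.
    let mut by_category: HashMap<&str, HashSet<u64>> = HashMap::new();
    for (id, cat) in [(1u64, "AI"), (2, "DB"), (3, "AI")] {
        by_category.entry(cat).or_default().insert(id);
    }

    // A vector search would now skip id 2 entirely.
    let valid = candidates(&by_category, "AI");
    println!("candidate set size: {}", valid.len());
}
```

The payoff grows with filter selectivity: if only 1% of ids match, the search touches roughly 1% of the candidates a post-filtered query would.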

Examples

Run the included examples:

# HNSW performance benchmark (9x speedup demo)
cargo run --release --example hnsw_demo

# Async concurrent insertions
cargo run --release --example async_demo --features async

# Pre-filtering benchmark
cargo run --release --example prefilter_demo

# Ollama RAG integration (requires Ollama running)
cargo run --release --example ollama_rag

Performance

Benchmarked on an M1 MacBook Pro with 128-dimensional vectors (Cosine distance):

| Operation | Vectors | Time | Throughput |
|---|---|---|---|
| Brute Force Search | 1,000 | 325 µs | 3.1M elem/s |
| Brute Force Search | 10,000 | 5.5 ms | 1.8M elem/s |
| Brute Force Search | 50,000 | 34 ms | 1.5M elem/s |

Distance Calculations (SIMD-optimized)

| Dimension | Dot Product | Throughput |
|---|---|---|
| 128 | 81 ns | 1.6 Gelem/s |
| 384 | 155 ns | 2.5 Gelem/s |
| 768 | 154 ns | 5.0 Gelem/s |
| 1536 | 304 ns | 5.1 Gelem/s |
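
Throughput in this range relies on SIMD. One common trick, shown in the sketch below, is accumulating fixed-width chunks into independent lanes so the compiler can auto-vectorize; this is illustrative only, not PolarisDB's actual kernel:

```rust
// SIMD-friendly dot product sketch: independent accumulator lanes break the
// serial add dependency chain, letting the compiler auto-vectorize.
fn dot_chunked(a: &[f32], b: &[f32]) -> f32 {
    let mut acc = [0.0f32; 8];
    for (ca, cb) in a.chunks_exact(8).zip(b.chunks_exact(8)) {
        for i in 0..8 {
            acc[i] += ca[i] * cb[i];
        }
    }
    // Handle any tail shorter than one chunk, then reduce the lanes.
    let tail: f32 = a.chunks_exact(8).remainder().iter()
        .zip(b.chunks_exact(8).remainder())
        .map(|(x, y)| x * y)
        .sum();
    acc.iter().sum::<f32>() + tail
}

fn main() {
    let a = vec![1.0f32; 128];
    let b = vec![2.0f32; 128];
    println!("{}", dot_chunked(&a, &b)); // 128 * (1.0 * 2.0) = 256
}
```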

Scaling Projections

| Vectors | HNSW Search Time | Memory |
|---|---|---|
| 10K | ~500 µs | 12 MB |
| 100K | ~600 µs | 120 MB |
| 1M | ~800 µs | 1.2 GB |

HNSW search time scales logarithmically. Brute force scales linearly.
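
The memory column can be sanity-checked with a back-of-envelope calculation: raw f32 storage is n × dims × 4 bytes, and the remainder is HNSW graph links plus metadata (a rough estimate, not a measurement):

```rust
// Back-of-envelope for the memory column above: raw f32 vector storage only.
// The gap to the measured figures is HNSW graph links and metadata overhead.
fn raw_vector_bytes(n: u64, dims: u64) -> u64 {
    n * dims * 4 // 4 bytes per f32 component
}

fn main() {
    for n in [10_000u64, 100_000, 1_000_000] {
        let mb = raw_vector_bytes(n, 128) as f64 / 1e6;
        println!("{n} vectors -> {mb:.1} MB raw");
    }
}
```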

Documentation

Architecture

polarisdb/
├── polarisdb-core/          # Core library (distance, indexing, storage)
│   ├── index/               # BruteForce, HNSW implementations
│   ├── storage/             # WAL, persistence layer
│   └── filter/              # Bitmap filtering
│
├── polarisdb/               # Main crate (convenient re-exports)
├── polarisdb-server/        # HTTP API server (axum)
└── py/                      # Python bindings (pyo3 + maturin)

Roadmap

  • v0.1 — Core functionality, brute-force search, filtering
  • v0.2 — WAL persistence, crash recovery
  • v0.3 — HNSW approximate nearest neighbor
  • v0.4 — Bitmap pre-filtering, async API, SIMD acceleration
  • v0.5 — Python bindings, PyPI release
  • v0.6 — LangChain integration, multi-vector queries
  • v1.0 — Stable API, product quantization, hybrid search

Comparison

| Feature | PolarisDB | LanceDB | Chroma | Qdrant |
|---|---|---|---|---|
| Language | Rust | Rust/Python | Python | Rust |
| Embedded | ✓ | ✓ | Partial | ✗ |
| Python Bindings | ✓ (PyPI) | ✓ | ✓ (Native) | ✓ |
| HNSW | ✓ | ✓ | ✓ | ✓ |
| Persistence | ✓ (WAL) | ✓ (Lance) | ✓ (SQLite) | ✓ (Raft) |
| Filtering | ✓ (Bitmap) | ✓ | ✓ | ✓ |
| Async | ✓ | ✓ | ✗ | ✓ |
| SIMD | ✓ | ✓ | ✗ | ✓ |

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

# Clone and build
git clone https://github.com/hugoev/polarisdb.git
cd polarisdb
cargo build

# Run tests
cargo test --workspace --all-features

# Run clippy
cargo clippy -- -D warnings

License

Licensed under either of:

  • Apache License, Version 2.0
  • MIT License

at your option.

Acknowledgments

  • HNSW Paper — Hierarchical Navigable Small World graphs
  • Roaring Bitmaps — Compressed bitmap data structure
  • The Rust community