polarisdb 0.1.2

PolarisDB - an embedded vector database for local AI and RAG workloads

Why PolarisDB?

PolarisDB is built for developers who need fast, local vector search without the complexity of external services.

| PolarisDB | Cloud Vector DBs |
|---|---|
| ✓ Runs 100% locally | ✗ Requires internet connection |
| ✓ Data never leaves your machine | ✗ Data on third-party servers |
| ✓ Zero network latency | ✗ Network round-trip latency |
| ✓ Free forever (no usage fees) | ✗ Pay-per-query pricing |
| ✓ Pure Rust (no C++ deps) | ✗ Often wrapped C++ libraries |

Built for: RAG applications · Semantic search · Recommendations · Edge AI · Game AI

Features

High-Performance Indexing

| Index Type | Use Case | Complexity |
|---|---|---|
| BruteForce | Small datasets (<10K vectors) | O(n), exact |
| HNSW | Large datasets (millions) | O(log n), approximate |
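
For intuition about the O(n) exact path, a brute-force search simply scores every stored vector against the query and keeps the k best. The sketch below is standalone plain Rust, not PolarisDB's internal code:

```rust
// Standalone sketch of O(n) exact k-NN using squared Euclidean distance.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn brute_force_knn(vectors: &[(u64, Vec<f32>)], query: &[f32], k: usize) -> Vec<(u64, f32)> {
    // Score every candidate: this full pass over the data is what makes it O(n).
    let mut scored: Vec<(u64, f32)> = vectors
        .iter()
        .map(|(id, v)| (*id, squared_l2(v, query)))
        .collect();
    scored.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    scored.truncate(k);
    scored
}

fn main() {
    let vectors = vec![
        (1, vec![0.0, 0.0]),
        (2, vec![1.0, 1.0]),
        (3, vec![5.0, 5.0]),
    ];
    let top = brute_force_knn(&vectors, &[0.9, 0.9], 2);
    println!("{top:?}"); // nearest ids first
}
```

HNSW avoids the full pass by greedily walking a layered proximity graph, which is why it trades exactness for O(log n) search.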

Distance Metrics

DistanceMetric::Euclidean   // L2 distance
DistanceMetric::Cosine      // Angular similarity (text embeddings)
DistanceMetric::DotProduct  // Maximum inner product
DistanceMetric::Hamming     // Binary vectors
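
For intuition, each metric reduces to a simple formula over the raw components. This is an illustrative plain-Rust sketch of those formulas, independent of the library's SIMD kernels:

```rust
// Illustrative formulas behind the four metrics (not library code).
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt()
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    // Angle between vectors; magnitude-invariant, hence the fit for text
    // embeddings. Cosine *distance* is conventionally 1 - similarity.
    dot(a, b) / (dot(a, a).sqrt() * dot(b, b).sqrt())
}

fn hamming(a: u64, b: u64) -> u32 {
    // Count of differing bits, for binary vectors packed into machine words.
    (a ^ b).count_ones()
}

fn main() {
    let (a, b) = ([1.0, 0.0], [0.0, 1.0]);
    println!("euclidean = {}", euclidean(&a, &b)); // sqrt(2)
    println!("cosine    = {}", cosine_similarity(&a, &b)); // orthogonal -> 0
    println!("hamming   = {}", hamming(0b1010, 0b0110)); // 2 differing bits
}
```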

Metadata Filtering

Combine vector similarity with metadata conditions:

// Find similar documents from 2024 in the "AI" category
let filter = Filter::field("category").eq("AI")
    .and(Filter::field("year").gte(2024));

let results = index.search(&query_embedding, 10, Some(filter));

Durable Persistence

  • Write-Ahead Log (WAL) for crash safety
  • Automatic recovery on restart
  • Memory-mapped files for efficient disk access
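
The idea behind WAL-based crash safety can be sketched with plain std I/O: every mutation is appended and fsynced before it is acknowledged, so on restart the log can be replayed to rebuild state. A minimal conceptual sketch (file name and record format are made up for illustration; PolarisDB's actual WAL format differs):

```rust
use std::fs::{File, OpenOptions};
use std::io::{BufRead, BufReader, Write};

// Minimal WAL sketch: a mutation is durable on disk before it is
// acknowledged, so a crash can only lose work that was never confirmed.
fn wal_append(path: &str, record: &str) -> std::io::Result<()> {
    let mut f = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(f, "{record}")?;
    f.sync_all() // durability point: flushed to disk before returning
}

fn wal_replay(path: &str) -> std::io::Result<Vec<String>> {
    // On restart, re-apply every record in order to rebuild in-memory state.
    let reader = BufReader::new(File::open(path)?);
    reader.lines().collect()
}

fn main() -> std::io::Result<()> {
    let path = "demo.wal";
    wal_append(path, "insert 1")?;
    wal_append(path, "insert 2")?;
    let recovered = wal_replay(path)?;
    println!("recovered {} records", recovered.len());
    std::fs::remove_file(path)?;
    Ok(())
}
```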

Async Support

// Enable with: polarisdb = { version = "0.1", features = ["async"] }
let collection = AsyncCollection::open_or_create("./data", config).await?;
collection.insert(id, embedding, payload).await?;
let results = collection.search(&query, 10, None).await;

Python Bindings

import polarisdb

# Persistent Collection
col = polarisdb.Collection.open_or_create("./data/my_col", 384, "cosine")
col.insert(1, [0.1, 0.2, ...])
results = col.search([0.1, 0.2, ...], 5)

LangChain Integration

Use PolarisDB as a vector store in your RAG pipelines:

from polarisdb.langchain import PolarisDBVectorStore
from langchain_openai import OpenAIEmbeddings

# Create vector store from documents
vectorstore = PolarisDBVectorStore.from_texts(
    texts=["Document 1", "Document 2", "Document 3"],
    embedding=OpenAIEmbeddings(),
    collection_path="./my_vectors",
)

# Similarity search
docs = vectorstore.similarity_search("query", k=3)

# Use as retriever in RAG chain
retriever = vectorstore.as_retriever()

See examples/langchain_rag.py for a complete RAG example.

HTTP Server

Run the standalone server:

cargo run -p polarisdb-server

Integrate via REST API:

curl -X POST http://localhost:8080/collections/my_col/search \
  -d '{"vector": [0.1, ...], "k": 5}'

Quick Start

Add PolarisDB to your Cargo.toml:

[dependencies]
polarisdb = "0.1"

Basic Usage

use polarisdb::prelude::*;

fn main() -> Result<()> {
    // Create a collection for 384-dimensional embeddings
    let config = CollectionConfig::new(384, DistanceMetric::Cosine);
    let collection = Collection::open_or_create("./my_vectors", config)?;

    // Insert vectors with metadata
    let embedding = get_embedding("Introduction to Rust"); // Your embedding function
    let payload = Payload::new()
        .with_field("title", "Introduction to Rust")
        .with_field("category", "programming")
        .with_field("year", 2024);
    
    collection.insert(1, embedding, payload)?;

    // Search for similar vectors
    let query = get_embedding("Rust programming tutorial");
    let results = collection.search(&query, 5, None);

    for result in results {
        if let Some(payload) = &result.payload {
            println!(
                "Found: {} (distance: {:.4})",
                payload.get_str("title").unwrap_or("Unknown"),
                result.distance
            );
        }
    }

    collection.flush()?; // Ensure durability
    Ok(())
}

High-Performance HNSW Index

For millions of vectors, use the HNSW index:

let config = HnswConfig {
    m: 16,              // Connections per node
    m_max0: 32,         // Connections at layer 0
    ef_construction: 100, // Build-time beam width
    ef_search: 50,      // Search-time beam width
};

let mut index = HnswIndex::new(DistanceMetric::Cosine, 384, config);

// Insert vectors
for (id, embedding, metadata) in documents {
    index.insert(id, embedding, metadata)?;
}

// Search with ~9x speedup over brute-force
let results = index.search(&query, 10, None, None);

Pre-Filtered Search with Bitmap Index

For highly selective filters:

// Build a bitmap index alongside your vector index
let mut bitmap = BitmapIndex::new();
let mut hnsw = HnswIndex::new(DistanceMetric::Cosine, 384, config);

for (id, embedding, payload) in documents {
    hnsw.insert(id, embedding.clone(), payload.clone())?;
    bitmap.insert(id, &payload);
}

// Query with bitmap pre-filtering
let filter = Filter::field("category").eq("AI");
let valid_ids = bitmap.query(&filter);
let results = hnsw.search_with_bitmap(&query, 10, None, &valid_ids);
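
Conceptually, pre-filtering means resolving the metadata predicate to a candidate id set first, so the vector index never scores ids that cannot match. The sketch below uses std collections in place of compressed bitmaps to show the idea:

```rust
use std::collections::{HashMap, HashSet};

// Conceptual sketch of pre-filtering: a posting-list index maps each
// metadata value to the ids that carry it (std HashSet stands in for a
// compressed bitmap). The vector search is then restricted to that set.
fn candidates(index: &HashMap<&str, HashSet<u64>>, cat: &str) -> HashSet<u64> {
    index.get(cat).cloned().unwrap_or_default()
}

fn main() {
    // Built at insert time, alongside the vector index.
    let mut by_category: HashMap<&str, HashSet<u64>> = HashMap::new();
    for (id, cat) in [(1u64, "AI"), (2, "DB"), (3, "AI")] {
        by_category.entry(cat).or_default().insert(id);
    }

    // A vector search would now skip id 2 entirely.
    let valid = candidates(&by_category, "AI");
    println!("candidate set size: {}", valid.len());
}
```

The payoff grows with filter selectivity: if only 1% of ids match, the search touches roughly 1% of the candidates a post-filtered query would.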

Examples

Run the included examples:

# HNSW performance benchmark (9x speedup demo)
cargo run --release --example hnsw_demo

# Async concurrent insertions
cargo run --release --example async_demo --features async

# Pre-filtering benchmark
cargo run --release --example prefilter_demo

# Ollama RAG integration (requires Ollama running)
cargo run --release --example ollama_rag

Performance

Benchmarked on an M1 MacBook Pro with 128-dimensional vectors (Cosine distance):

| Operation | Vectors | Time | Throughput |
|---|---|---|---|
| Brute Force Search | 1,000 | 325 µs | 3.1M elem/s |
| Brute Force Search | 10,000 | 5.5 ms | 1.8M elem/s |
| Brute Force Search | 50,000 | 34 ms | 1.5M elem/s |

Distance Calculations (SIMD-optimized)

| Dimension | Dot Product | Throughput |
|---|---|---|
| 128 | 81 ns | 1.6 Gelem/s |
| 384 | 155 ns | 2.5 Gelem/s |
| 768 | 154 ns | 5.0 Gelem/s |
| 1536 | 304 ns | 5.1 Gelem/s |
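
Throughput in this range relies on SIMD. One common trick, shown in the sketch below, is accumulating fixed-width chunks into independent lanes so the compiler can auto-vectorize; this is illustrative only, not PolarisDB's actual kernel:

```rust
// SIMD-friendly dot product sketch: independent accumulator lanes break the
// serial add dependency chain, letting the compiler auto-vectorize.
fn dot_chunked(a: &[f32], b: &[f32]) -> f32 {
    let mut acc = [0.0f32; 8];
    for (ca, cb) in a.chunks_exact(8).zip(b.chunks_exact(8)) {
        for i in 0..8 {
            acc[i] += ca[i] * cb[i];
        }
    }
    // Handle any tail shorter than one chunk, then reduce the lanes.
    let tail: f32 = a.chunks_exact(8).remainder().iter()
        .zip(b.chunks_exact(8).remainder())
        .map(|(x, y)| x * y)
        .sum();
    acc.iter().sum::<f32>() + tail
}

fn main() {
    let a = vec![1.0f32; 128];
    let b = vec![2.0f32; 128];
    println!("{}", dot_chunked(&a, &b)); // 128 * (1.0 * 2.0) = 256
}
```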

Scaling Projections

| Vectors | HNSW Search Time | Memory |
|---|---|---|
| 10K | ~500 µs | 12 MB |
| 100K | ~600 µs | 120 MB |
| 1M | ~800 µs | 1.2 GB |

HNSW search time scales logarithmically. Brute force scales linearly.
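
The memory column can be sanity-checked with a back-of-envelope calculation: raw f32 storage is n × dims × 4 bytes, and the remainder is HNSW graph links plus metadata (a rough estimate, not a measurement):

```rust
// Back-of-envelope for the memory column above: raw f32 vector storage only.
// The gap to the measured figures is HNSW graph links and metadata overhead.
fn raw_vector_bytes(n: u64, dims: u64) -> u64 {
    n * dims * 4 // 4 bytes per f32 component
}

fn main() {
    for n in [10_000u64, 100_000, 1_000_000] {
        let mb = raw_vector_bytes(n, 128) as f64 / 1e6;
        println!("{n} vectors -> {mb:.1} MB raw");
    }
}
```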

Documentation

Architecture

polarisdb/
├── polarisdb-core/          # Core library (distance, indexing, storage)
│   ├── index/               # BruteForce, HNSW implementations
│   ├── storage/             # WAL, persistence layer
│   └── filter/              # Bitmap filtering
│
├── polarisdb/               # Main crate (convenient re-exports)
├── polarisdb-server/        # HTTP API server (axum)
└── py/                      # Python bindings (pyo3 + maturin)

Roadmap

  • v0.1 — Core functionality, brute-force search, filtering
  • v0.2 — WAL persistence, crash recovery
  • v0.3 — HNSW approximate nearest neighbor
  • v0.4 — Bitmap pre-filtering, async API, SIMD acceleration
  • v0.5 — Python bindings, PyPI release
  • v0.6 — LangChain integration, multi-vector queries
  • v1.0 — Stable API, product quantization, hybrid search

Comparison

| Feature | PolarisDB | LanceDB | Chroma | Qdrant |
|---|---|---|---|---|
| Language | Rust | Rust/Python | Python | Rust |
| Embedded | ✓ | ✓ | Partial | ✗ |
| Python Bindings | ✓ (PyPI) | ✓ | ✓ (Native) | ✓ |
| HNSW | ✓ | ✓ | ✓ | ✓ |
| Persistence | ✓ (WAL) | ✓ (Lance) | ✓ (SQLite) | ✓ (Raft) |
| Filtering | ✓ (Bitmap) | ✓ | ✓ | ✓ |
| Async | ✓ | ✓ | ✗ | ✓ |
| SIMD | ✓ | ✓ | ✗ | ✓ |

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

# Clone and build
git clone https://github.com/hugoev/polarisdb.git
cd polarisdb
cargo build

# Run tests
cargo test --workspace --all-features

# Run clippy
cargo clippy -- -D warnings

License

Licensed under either of:

  • Apache License, Version 2.0
  • MIT License

at your option.

Acknowledgments

  • HNSW Paper — Hierarchical Navigable Small World graphs
  • Roaring Bitmaps — Compressed bitmap data structure
  • The Rust community