Skip to main content

Crate polarisdb

Crate polarisdb 

Source
Expand description

§PolarisDB

A pure-Rust embedded vector database for local AI and RAG workloads.

PolarisDB provides fast, in-process vector similarity search optimized for:

  • RAG applications — Semantic retrieval for LLM context
  • Semantic search — Find similar documents, images, or audio
  • Recommendations — Content-based filtering with embeddings
  • Edge AI — Local inference without cloud dependencies

§Features

FeatureDescription
Dual Index TypesBruteForce (exact) and HNSW (approximate)
Distance MetricsEuclidean, Cosine, DotProduct, Hamming
Filtered SearchCombine similarity with metadata conditions
PersistenceWAL-based crash-safe durability
Async APITokio-compatible async operations (feature flag)

§Quick Start

§In-Memory Index

For quick prototyping or small datasets:

use polarisdb::prelude::*;

// Create a 384-dimensional index (common embedding size)
let mut index = BruteForceIndex::new(DistanceMetric::Cosine, 384);

// Insert vectors with metadata
let embedding = vec![0.1; 384];
let payload = Payload::new()
    .with_field("title", "Introduction to Rust")
    .with_field("category", "programming");
index.insert(1, embedding, payload).unwrap();

// Search for similar vectors
let query = vec![0.1; 384];
let results = index.search(&query, 10, None);

assert!(!results.is_empty());

§Persistent Collection

For production use with crash-safe durability:

use polarisdb::prelude::*;

fn main() -> Result<()> {
    // Open or create a persistent collection
    let config = CollectionConfig::new(384, DistanceMetric::Cosine);
    let collection = Collection::open_or_create("./my_vectors", config)?;

    // Insert with automatic ID generation
    let id = collection.insert_auto(
        vec![0.1; 384],
        Payload::new().with_field("doc", "example"),
    )?;

    // Search with filters
    let filter = Filter::field("doc").eq("example");
    let results = collection.search(&[0.1; 384], 10, Some(filter));

    // Flush to ensure durability
    collection.flush()?;
    Ok(())
}

§HNSW for Large Datasets

For millions of vectors with fast approximate search:

use polarisdb::prelude::*;

let config = HnswConfig {
    m: 16,              // Connections per node (higher = better recall)
    m_max0: 32,         // Layer 0 connections
    ef_construction: 100, // Build-time beam width
    ef_search: 50,      // Search-time beam width
};

let mut index = HnswIndex::new(DistanceMetric::Cosine, 128, config);

// Insert 1000 vectors
for i in 0..1000 {
    let v: Vec<f32> = (0..128).map(|j| ((i * 128 + j) as f32).sin()).collect();
    index.insert(i, v, Payload::new()).unwrap();
}

// Search is ~10x faster than brute-force
let query: Vec<f32> = (0..128).map(|j| (j as f32).cos()).collect();
let results = index.search(&query, 10, None, None);

Combine vector similarity with metadata conditions:

use polarisdb::prelude::*;

let mut index = BruteForceIndex::new(DistanceMetric::Cosine, 3);

// Insert vectors with categories
index.insert(1, vec![1.0, 0.0, 0.0], Payload::new().with_field("category", "A")).unwrap();
index.insert(2, vec![0.9, 0.1, 0.0], Payload::new().with_field("category", "B")).unwrap();
index.insert(3, vec![0.8, 0.2, 0.0], Payload::new().with_field("category", "A")).unwrap();

// Search only category A
let filter = Filter::field("category").eq("A");
let results = index.search(&[1.0, 0.0, 0.0], 10, Some(filter));

assert_eq!(results.len(), 2); // Only category A results

§Crate Features

FeatureDescription
asyncEnables AsyncCollection for tokio compatibility

Enable features in Cargo.toml:

[dependencies]
polarisdb = { version = "0.1", features = ["async"] }

§Architecture

PolarisDB is organized into two crates:

  • polarisdb-core — Core library with no async runtime dependency
  • polarisdb — Main crate that re-exports everything

§Core Components

  • BruteForceIndex — Exact nearest neighbor search (O(n))
  • HnswIndex — Approximate nearest neighbor search (O(log n))
  • Collection — Persistent storage with WAL durability
  • Filter — Metadata filter expressions
  • BitmapIndex — Roaring bitmap for fast pre-filtering
  • Payload — JSON-like metadata for vectors

§Performance

Benchmarked on 10,000 128-dimensional vectors:

IndexSearch TimeRecall
BruteForce2.8 ms100%
HNSW300 µs99%+

§Error Handling

All fallible operations return Result<T>, which uses the Error enum for error types.

§Thread Safety

Modules§

collection
Persistent vector collection with crash-safe storage.
distance
Distance metrics for vector similarity computations.
error
Error types for PolarisDB operations.
filter
Filter expressions for metadata-based vector filtering.
index
Index implementations for vector storage and retrieval.
payload
Payload storage for vector metadata.
prelude
Re-export commonly used types for convenience.
storage
Storage module for persistent vector storage.
vector
Vector types and operations for PolarisDB.

Structs§

AsyncCollection
Async wrapper for Collection.
BitmapIndex
A bitmap-based index for fast payload filtering.
BruteForceIndex
Brute-force index that computes exact distances to all vectors.
Collection
A persistent vector collection.
CollectionConfig
Configuration for a collection.
Distance
A computed distance value with its metric type.
Filter
A filter expression that can be evaluated against a payload.
HnswConfig
Configuration for HNSW index.
HnswIndex
HNSW index for approximate nearest neighbor search.
Payload
Metadata payload attached to a vector.
SearchResult
A single search result with ID, distance, and optional payload.
Vector
A dense vector of floating-point values.

Enums§

DistanceMetric
Supported distance metrics for vector similarity.
Error
Errors that can occur during PolarisDB operations.
FilterCondition
The actual filter condition variants.

Type Aliases§

Result
Result type alias using PolarisDB’s Error type.
VectorId
Unique identifier for a vector in the index.