# VectorLite
[![Crates.io](https://img.shields.io/crates/v/vectorlite.svg)](https://crates.io/crates/vectorlite)
[![Documentation](https://docs.rs/vectorlite/badge.svg)](https://docs.rs/vectorlite)
[![License: Apache 2.0](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Rust](https://img.shields.io/badge/language-Rust-orange.svg)](https://www.rust-lang.org)
[![CI](https://img.shields.io/badge/CI-GitHub_Actions-blue.svg)](https://github.com/mmailhos/vectorlite/actions)
**A tiny, in-process Rust vector store with built-in embeddings for sub-millisecond semantic search.**
VectorLite is a high-performance, **in-memory vector database** optimized for **AI agent** and **edge** workloads.
It co-locates model inference (via [Candle](https://github.com/huggingface/candle)) with a low-latency vector index, making it ideal for **session-scoped**, **single-instance**, or **privacy-sensitive** environments.
## Why VectorLite?
| Feature | Description |
| --- | --- |
| **Sub-millisecond search** | In-memory HNSW or flat search tuned for real-time agent loops. |
| **Built-in embeddings** | Runs [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) locally using Candle, or any other model of your choice. No external API calls. |
| **Single-binary simplicity** | No external dependencies, no servers to orchestrate. Start instantly via CLI or Docker. |
| **Session-scoped collections** | Perfect for ephemeral agent sessions or sidecars. |
| **Thread-safe concurrency** | RwLock-based access and atomic ID generation for multi-threaded workloads. |
| **Instant persistence** | Save or restore collection snapshots in one call. |
VectorLite trades distributed scalability for deterministic performance, a natural fit for workloads where latency matters more than scaling to millions of vectors.
## When to Use It
| Use case | Why it fits |
| --- | --- |
| **AI agent sessions** | Keep short-lived embeddings per conversation. No network latency. |
| **Edge or embedded AI** | Run fully offline with model + index in one binary. |
| **Realtime search / personalization** | Sub-ms search for pre-computed embeddings. |
| **Local prototyping & CI** | Rust-native, no external services. |
| **Single-tenant microservices** | Lightweight sidecar for semantic capabilities. |
## Quick Start
### Run from Source
```bash
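# Start the server on port 3001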
cargo run --bin vectorlite -- --port 3001
# Start with preloaded collection
cargo run --bin vectorlite -- --filepath ./my_collection.vlc --port 3001
```
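Once the server is up, you can verify it with the `GET /health` endpoint from the API table below:

```bash
curl http://localhost:3001/health
```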
### Run with Docker
With default settings:
```bash
docker build -t vectorlite .
docker run -p 3001:3001 vectorlite
```
With a different embeddings model and memory-optimized HNSW:
```bash
docker build \
  --build-arg MODEL_NAME="sentence-transformers/paraphrase-MiniLM-L3-v2" \
  --build-arg FEATURES="memory-optimized" \
  -t vectorlite-small .
```
## HTTP API Overview
| Operation | Endpoint | Body |
| --- | --- | --- |
| **Health** | `GET /health` | – |
| **List collections** | `GET /collections` | – |
| **Create collection** | `POST /collections` | `{"name": "docs", "index_type": "hnsw"}` |
| **Delete collection** | `DELETE /collections/{name}` | – |
| **Add text** | `POST /collections/{name}/text` | `{"text": "Hello world", "metadata": {...}}` |
| **Search (text)** | `POST /collections/{name}/search/text` | `{"query": "hello", "k": 5}` |
| **Get vector** | `GET /collections/{name}/vectors/{id}` | – |
| **Delete vector** | `DELETE /collections/{name}/vectors/{id}` | – |
| **Save collection** | `POST /collections/{name}/save` | `{"file_path": "./collection.vlc"}` |
| **Load collection** | `POST /collections/load` | `{"file_path": "./collection.vlc", "collection_name": "restored"}` |
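A minimal end-to-end session with `curl`, assuming the server from the Quick Start is listening on port 3001 (the metadata payload is illustrative):

```bash
# Create an HNSW-backed collection
curl -X POST http://localhost:3001/collections \
  -H "Content-Type: application/json" \
  -d '{"name": "docs", "index_type": "hnsw"}'

# Embed and insert a document
curl -X POST http://localhost:3001/collections/docs/text \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "metadata": {"source": "readme"}}'

# Semantic search for the top 5 matches
curl -X POST http://localhost:3001/collections/docs/search/text \
  -H "Content-Type: application/json" \
  -d '{"query": "hello", "k": 5}'

# Snapshot the collection to disk
curl -X POST http://localhost:3001/collections/docs/save \
  -H "Content-Type: application/json" \
  -d '{"file_path": "./collection.vlc"}'
```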
## Index Types
| Index | Search | Insert | Best for |
| --- | --- | --- | --- |
| **Flat** | O(n) | O(1) | Small datasets (<10K) or exact search |
| **HNSW** | O(log n) | O(log n) | Larger datasets or approximate search |
See the [Hierarchical Navigable Small World](https://arxiv.org/abs/1603.09320) paper for background on the algorithm.
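The index type is chosen per collection at creation time. A minimal SDK sketch, assuming an `IndexType::Flat` variant that mirrors the `IndexType::HNSW` used in the SDK example below:

```rust,no_run
use vectorlite::{VectorLiteClient, EmbeddingGenerator, IndexType};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = VectorLiteClient::new(Box::new(EmbeddingGenerator::new()?));
    // Exact O(n) scan: good for small collections (<10K vectors).
    client.create_collection("small_exact", IndexType::Flat)?;
    // Approximate O(log n) graph search: good for larger collections.
    client.create_collection("large_approx", IndexType::HNSW)?;
    Ok(())
}
```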
### Configuration profiles for HNSW
| Profile | Trade-off | Best for |
| --- | --- | --- |
| **default** | Balanced | General workloads |
| **memory-optimized** | Reduced precision, smaller graph | Constrained devices |
| **high-accuracy** | Higher recall, more memory | Offline re-ranking or research |
```bash
cargo build --features memory-optimized
```
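Assuming each profile maps to a Cargo feature of the same name, as `memory-optimized` does above and in the Docker example, the other profiles are selected the same way:

```bash
cargo build --features high-accuracy
```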
### Similarity Metrics
- **Cosine**: Default for normalized embeddings, scale-invariant
- **Euclidean**: Geometric distance, sensitive to vector magnitude
- **Manhattan**: L1 norm, robust to outliers
- **Dot Product**: Raw similarity, requires consistent vector scaling
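The metric is passed per search call, so the same collection can be queried under different metrics. A sketch reusing the SDK calls from the example below; `SimilarityMetric::Cosine` appears there, while the `Euclidean` variant name is an assumption based on this list:

```rust,no_run
use vectorlite::{VectorLiteClient, EmbeddingGenerator, IndexType, SimilarityMetric};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = VectorLiteClient::new(Box::new(EmbeddingGenerator::new()?));
    client.create_collection("demo", IndexType::HNSW)?;
    client.add_text_to_collection("demo", "beach volleyball highlights", None)?;

    // Score the same query under two metrics; `Euclidean` is an assumed variant name.
    for metric in [SimilarityMetric::Cosine, SimilarityMetric::Euclidean] {
        let results = client.search_text_in_collection("demo", "beach games", 3, metric)?;
        println!("{:?}", results.first().map(|r| r.score));
    }
    Ok(())
}
```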
## Rust SDK Example
```rust,no_run
use vectorlite::{VectorLiteClient, EmbeddingGenerator, IndexType, SimilarityMetric};
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Embeddings are generated in-process via Candle; no external API calls.
    let client = VectorLiteClient::new(Box::new(EmbeddingGenerator::new()?));

    // Create a collection backed by an HNSW index.
    client.create_collection("quotes", IndexType::HNSW)?;

    // Embed and insert a document with optional JSON metadata.
    let id = client.add_text_to_collection(
        "quotes",
        "I just want to lie on the beach and eat hot dogs",
        Some(json!({
            "author": "Kevin Malone",
            "tags": ["the-office", "s3:e23"],
            "year": 2005,
        }))
    )?;
    println!("Inserted vector {id}");

    // Semantic search: return the top-3 nearest neighbors.
    let results = client.search_text_in_collection(
        "quotes",
        "beach games",
        3,
        SimilarityMetric::Cosine,
    )?;
    for result in &results {
        println!("ID: {}, Score: {:.4}", result.id, result.score);
    }
    Ok(())
}
```
## Testing
Run tests with mock embeddings (CI-friendly, no model files required):
```bash
cargo test --features mock-embeddings
```
Run tests with local models:
```bash
cargo test
```
### Download ML Model
The following command downloads the BERT-based embedding model files needed for real embedding generation:
```bash
huggingface-cli download sentence-transformers/all-MiniLM-L6-v2 --local-dir models/all-MiniLM-L6-v2
```
The model must be present in the `./models/{model-name}/` directory and include the following files:
- `config.json`
- `pytorch_model.bin`
- `tokenizer.json`
### Using a different model
You can override the default embedding model at compile time using the `custom-model` feature:
```bash
DEFAULT_EMBEDDING_MODEL="sentence-transformers/paraphrase-MiniLM-L3-v2" cargo build --features custom-model
DEFAULT_EMBEDDING_MODEL="sentence-transformers/paraphrase-MiniLM-L3-v2" cargo run --features custom-model
```
## License
Apache 2.0 License - see [LICENSE](LICENSE) for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.