# Feather DB
Fast, lightweight context-aware vector database
Part of Hawky.ai - AI Native Digital Marketing OS
A fast, lightweight vector database built with C++ and the HNSW (Hierarchical Navigable Small World) algorithm for approximate nearest neighbor search.
## Features
- High Performance: Built with C++ and an optimized HNSW algorithm
- Context Engine: Structured metadata storage (Facts, Preferences, Events, Conversations)
- Temporal Retrieval: Time-weighted scoring with exponential decay
- Filtered Search: Domain-logic filtering (by type, source, tags) during HNSW search
- Python Integration: Native Python bindings with `FilterBuilder` support
- Rust CLI: Enhanced CLI for metadata and filtered operations
- Persistent Storage: Version 2 binary format with automatic metadata persistence
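
The Temporal Retrieval feature is described only as "time-weighted scoring with exponential decay". As a rough sketch of what such scoring could look like, the snippet below re-ranks candidates by combining a similarity term with a half-life decay. The half-life value, the distance-to-similarity conversion, and all function names are illustrative assumptions, not Feather DB's actual formula or API.

```python
def decay_weight(age_seconds, half_life_seconds):
    """Exponential decay: 1.0 for a brand-new record, 0.5 after one half-life."""
    return 0.5 ** (max(age_seconds, 0.0) / half_life_seconds)


def time_weighted_score(distance, timestamp, now, half_life_seconds=7 * 24 * 3600):
    """Combine closeness and recency (hypothetical scoring, for illustration)."""
    similarity = 1.0 / (1.0 + distance)  # assumed mapping from distance to (0, 1]
    return similarity * decay_weight(now - timestamp, half_life_seconds)


def rerank(candidates, now, half_life_seconds=7 * 24 * 3600):
    """candidates: list of (id, distance, timestamp). Returns ids, best score first."""
    ranked = sorted(
        candidates,
        key=lambda c: time_weighted_score(c[1], c[2], now, half_life_seconds),
        reverse=True,
    )
    return [vec_id for vec_id, _, _ in ranked]


now = 1_700_000_000.0
candidates = [
    (1, 0.10, now - 30 * 24 * 3600),  # slightly closer, but 30 days old
    (2, 0.20, now - 3600),            # slightly farther, but one hour old
]
print(rerank(candidates, now))
```

With a one-week half-life, the month-old record loses most of its weight, so the fresher record ranks first even though its distance is larger.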
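## Quick Start
### Python Usage
The basic Python workflow has four steps, using the methods listed in the API Reference:
- Open or create a database (`DB.open`)
- Add vectors (`add`)
- Search for similar vectors (`search`)
- Save the database (`save`)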
### Context Engine (Phase 2)
The Context Engine workflow has two steps:
- Add a vector together with its metadata
- Search with filters and temporal decay
### C++ Usage
```cpp
#include "include/feather.h"
#include <iostream>
#include <vector>

int main() {
    // Open database
    auto db = feather::DB::open("my_vectors.feather", 768);

    // Add a vector under ID 1
    std::vector<float> vec(768, 0.1f);
    db->add(1, vec);

    // Search for the 5 nearest neighbors
    std::vector<float> query(768, 0.1f);
    auto results = db->search(query, 5);
    for (const auto& [id, distance] : results) {
        std::cout << "ID: " << id << ", Distance: " << distance << std::endl;
    }
    return 0;
}
```
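### CLI Usage
- Create a new database: `feather new <path> --dim <dimension>`
- Add vectors from NumPy files: `feather add <db> <id> --npy <file>`
- Search for similar vectors: `feather search <db> --npy <query> --k <count>`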
### Rust CLI
The CLI is available as a native binary for fast database management. It supports adding vectors with metadata and searching with filters.
## Installation
### Python Package (Recommended)
### Build from Source
#### Prerequisites
- C++17 compatible compiler
- Python 3.8+ (for Python bindings)
- Rust 1.70+ (for CLI tool)
- pybind11 (for Python bindings)
#### Steps
- Clone the repository
- Install the Python package
- Build the Rust CLI (optional)
## Architecture
### Core Components
- `feather::DB`: Main C++ class providing vector database functionality
- HNSW Index: Hierarchical Navigable Small World algorithm for fast ANN search
- Binary Format: Custom storage format with magic number validation
- Multi-language Bindings: Python (pybind11) and Rust (FFI) interfaces
### File Format
Feather uses a custom binary format:
- [4 bytes] Magic number: 0x46454154 ("FEAT")
- [4 bytes] Version: 1
- [4 bytes] Dimension
- [Records] ID (8 bytes) + Vector data (dim * 4 bytes)
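
To make the layout concrete, here is a minimal sketch that writes and reads it with Python's standard `struct` module. The byte order (little-endian), the unsigned 64-bit ID, and the function names are assumptions made for illustration; the layout does not state them, and this is not Feather's own reader. Any metadata stored by newer format versions is not modeled.

```python
import struct

MAGIC = 0x46454154               # "FEAT", as listed in the layout
HEADER = struct.Struct("<III")   # magic, version, dimension (little-endian assumed)


def write_records(records, dim, version=1):
    """Serialize (id, vector) pairs into the documented header + records layout."""
    chunks = [HEADER.pack(MAGIC, version, dim)]
    for vec_id, vec in records:
        if len(vec) != dim:
            raise ValueError(f"expected {dim} floats, got {len(vec)}")
        # 8-byte ID followed by dim 4-byte floats
        chunks.append(struct.pack(f"<Q{dim}f", vec_id, *vec))
    return b"".join(chunks)


def read_records(blob):
    """Parse a blob produced by write_records; validates the magic number."""
    magic, version, dim = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    record = struct.Struct(f"<Q{dim}f")
    records = []
    offset = HEADER.size
    while offset + record.size <= len(blob):
        vec_id, *vec = record.unpack_from(blob, offset)
        records.append((vec_id, vec))
        offset += record.size
    return version, dim, records


blob = write_records([(7, [0.5, 1.0, -2.0])], dim=3)
print(len(blob), read_records(blob))
```

The 12-byte header plus one 3-dimensional record (8 + 3 * 4 bytes) gives a 32-byte file in this sketch.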
### Performance Characteristics
- Index Type: HNSW with L2 distance
- Max Elements: 1,000,000 (configurable)
- Construction Parameters: M=16, ef_construction=200
- Memory Usage: ~4 bytes per dimension per vector + index overhead
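
As a quick worked example of the memory figure, the snippet below estimates raw vector storage for the default maximum of 1,000,000 elements at 768 dimensions. It covers only the 4 bytes per dimension per vector; the HNSW index overhead is not included because it is not quantified here. The function name is illustrative.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_float=4):
    """Raw storage for the vectors alone, excluding HNSW index overhead."""
    return num_vectors * dim * bytes_per_float


total = raw_vector_bytes(1_000_000, 768)
print(f"{total:,} bytes = {total / 2**30:.2f} GiB")
# prints: 3,072,000,000 bytes = 2.86 GiB
```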
## API Reference
### Python API
#### feather_db.DB
- `DB.open(path: str, dim: int = 768)`: Open or create a database
- `add(id: int, vec: np.ndarray)`: Add a vector with an ID
- `search(query: np.ndarray, k: int = 5)`: Search for the k nearest neighbors
- `save()`: Persist the database to disk
- `dim()`: Get the vector dimension
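
To illustrate the semantics of `add` and `search`, here is a pure-Python, exact (brute-force) stand-in that reuses the same method names. It is not Feather DB and does not use HNSW; it can serve as a reference when checking approximate results on small data. Returning `(ids, distances)` and using squared L2 distance are assumptions, since the return shape and exact distance form are not specified above.

```python
import heapq


class BruteForceDB:
    """Hypothetical in-memory stand-in mirroring the documented method names."""

    def __init__(self, dim=768):
        self._dim = dim
        self._vectors = {}

    def dim(self):
        return self._dim

    def add(self, id, vec):
        if len(vec) != self._dim:
            raise ValueError(f"expected {self._dim} values, got {len(vec)}")
        self._vectors[id] = [float(x) for x in vec]

    def search(self, query, k=5):
        if len(query) != self._dim:
            raise ValueError(f"expected {self._dim} values, got {len(query)}")
        # Exact scan: squared L2 distance to every stored vector (assumed metric form)
        scored = (
            (sum((a - b) ** 2 for a, b in zip(vec, query)), vec_id)
            for vec_id, vec in self._vectors.items()
        )
        nearest = heapq.nsmallest(k, scored)
        return [vec_id for _, vec_id in nearest], [dist for dist, _ in nearest]


db = BruteForceDB(dim=3)
db.add(1, [0.0, 0.0, 0.0])
db.add(2, [1.0, 0.0, 0.0])
db.add(3, [5.0, 5.0, 5.0])
ids, distances = db.search([0.9, 0.0, 0.0], k=2)
print(ids, distances)
```

Comparing the IDs returned by an approximate index against this exact scan is one simple way to estimate recall for a given `k`.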
### C++ API
#### feather::DB
- `static std::unique_ptr<DB> open(path, dim)`: Factory method
- `void add(uint64_t id, const std::vector<float>& vec)`: Add a vector
- `auto search(const std::vector<float>& query, size_t k)`: Search vectors
- `void save()`: Save to disk
- `size_t dim() const`: Get the dimension
### CLI Commands
- `feather new <path> --dim <dimension>`: Create a new database
- `feather add <db> <id> --npy <file>`: Add a vector from a .npy file
- `feather search <db> --npy <query> --k <count>`: Search for similar vectors
## Examples
### Semantic Search with Embeddings
This example follows three steps:
- Create a database for sentence embeddings
- Add document embeddings, assuming `get_embedding()` returns a 384-dim vector
- Search for similar documents
### Batch Processing
This example follows two steps:
- Batch add vectors (the example uses a value of 1000)
- Save periodically
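
The batch pattern can be sketched as a small helper that works with any object exposing the documented `add(id, vec)` and `save()` methods. The helper name, the choice to save after every batch, and the `RecordingDB` stub (used here only so the snippet runs without Feather installed) are illustrative assumptions.

```python
from itertools import islice


def add_in_batches(db, items, batch_size=1000):
    """Add (id, vector) pairs in batches, calling db.save() after each batch."""
    iterator = iter(items)
    total = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        for vec_id, vec in batch:
            db.add(vec_id, vec)
        db.save()  # periodic save: one per batch
        total += len(batch)
    return total


class RecordingDB:
    """Hypothetical stub that only counts calls; stands in for a real database."""

    def __init__(self):
        self.added = []
        self.saves = 0

    def add(self, id, vec):
        self.added.append(id)

    def save(self):
        self.saves += 1


db = RecordingDB()
count = add_in_batches(db, ((i, [0.0, 0.0]) for i in range(2500)), batch_size=1000)
print(count, db.saves)
```

Because the helper consumes an iterator, vectors can be generated lazily instead of being held in memory all at once.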
## Performance Tips
- Batch Operations: Add vectors in batches and save periodically
- Memory Management: Consider vector dimension vs. memory usage trade-offs
- Search Parameters: Adjust the `k` parameter based on your precision/recall needs
- File I/O: Use SSD storage for better performance with large databases
## Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
## License
[Add your license information here]