Feather DB
SQLite for Vectors - A fast, lightweight vector database built with C++ and the HNSW (Hierarchical Navigable Small World) algorithm for approximate nearest neighbor search.
Features
- 🚀 High Performance: Built with C++ and optimized HNSW algorithm
- 🐍 Python Integration: Native Python bindings with NumPy support
- 🦀 Rust CLI: Command-line interface for easy database operations
- 💾 Persistent Storage: Custom binary format with automatic save/load
- 🔍 Fast Search: Approximate nearest neighbor search with configurable parameters
- 📦 Multi-Language: C++, Python, and Rust APIs
Quick Start
Python Usage
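import numpy as np
import feather_py

# Open or create a database (path is illustrative)
db = feather_py.DB.open("my_vectors.feather", dim=768)
# Add vectors
vec = np.random.random(768).astype(np.float32)
db.add(1, vec)
# Search for similar vectors
query = np.random.random(768).astype(np.float32)
ids, distances = db.search(query, k=5)  # assumed to return (ids, distances)
# Save the database
db.save()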
C++ Usage
int
CLI Usage
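# Create a new database
feather new <path> --dim <dimension>
# Add vectors from NumPy files
feather add <db> <id> --npy <file>
# Search for similar vectors
feather search <db> --npy <query> --k <count>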
Installation
Prerequisites
- C++17 compatible compiler
- Python 3.8+ (for Python bindings)
- Rust 1.70+ (for CLI tool)
- pybind11 (for Python bindings)
Build from Source
- Clone the repository
- Build C++ Core
- Build Python Bindings
- Build Rust CLI
Architecture
Core Components
- feather::DB: Main C++ class providing vector database functionality
- HNSW Index: Hierarchical Navigable Small World algorithm for fast ANN search
- Binary Format: Custom storage format with magic number validation
- Multi-language Bindings: Python (pybind11) and Rust (FFI) interfaces
File Format
Feather uses a custom binary format:
[4 bytes] Magic number: 0x46454154 ("FEAT")
[4 bytes] Version: 1
[4 bytes] Dimension
[Records] ID (8 bytes) + Vector data (dim * 4 bytes)
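The layout above can be exercised with a short Python sketch. This is not Feather's own reader or writer: the byte order (little-endian), the unsigned 32-bit header fields, the unsigned 64-bit IDs, and the float32 vector components are assumptions inferred from the field sizes, and the helper names `write_feather` and `read_feather` are hypothetical.

```python
import struct

MAGIC = 0x46454154  # "FEAT"
VERSION = 1
HEADER = struct.Struct("<III")  # magic, version, dimension (little-endian assumed)


def write_feather(path, dim, records):
    """Write (id, vector) pairs using the layout described above."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(MAGIC, VERSION, dim))
        for vec_id, vec in records:
            if len(vec) != dim:
                raise ValueError("vector length does not match dimension")
            f.write(struct.pack("<Q", vec_id))      # 8-byte ID
            f.write(struct.pack(f"<{dim}f", *vec))  # dim * 4 bytes (float32 assumed)


def read_feather(path):
    """Return (dim, [(id, vector), ...]) after validating the magic number."""
    with open(path, "rb") as f:
        magic, version, dim = HEADER.unpack(f.read(HEADER.size))
        if magic != MAGIC:
            raise ValueError("not a Feather file: bad magic number")
        if version != VERSION:
            raise ValueError(f"unsupported version: {version}")
        record = struct.Struct(f"<Q{dim}f")
        records = []
        while True:
            chunk = f.read(record.size)
            if not chunk:
                break
            if len(chunk) != record.size:
                raise ValueError("truncated record")
            vec_id, *vec = record.unpack(chunk)
            records.append((vec_id, vec))
    return dim, records
```

Checking the magic number before reading any records lets a reader reject unrelated files early, which is the validation the Binary Format component refers to.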
Performance Characteristics
- Index Type: HNSW with L2 distance
- Max Elements: 1,000,000 (configurable)
- Construction Parameters: M=16, ef_construction=200
- Memory Usage: ~4 bytes per dimension per vector + index overhead
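The memory figure above is simple arithmetic for the raw vectors. The sketch below estimates it for a given collection size; the function name `vector_bytes` is hypothetical, and the HNSW index overhead is left out because it is not quantified here.

```python
def vector_bytes(num_vectors, dim, bytes_per_component=4):
    """Raw vector storage only: ~4 bytes per dimension per vector."""
    return num_vectors * dim * bytes_per_component


# 1,000,000 vectors (the default max elements) at dim=768
total = vector_bytes(1_000_000, 768)
print(f"{total / 1024**3:.2f} GiB")  # prints "2.86 GiB"
```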
API Reference
Python API
feather_py.DB
- DB.open(path: str, dim: int = 768): Open or create database
- add(id: int, vec: np.ndarray): Add vector with ID
- search(query: np.ndarray, k: int = 5): Search k nearest neighbors
- save(): Persist database to disk
- dim(): Get vector dimension
C++ API
feather::DB
- static std::unique_ptr<DB> open(path, dim): Factory method
- void add(uint64_t id, const std::vector<float>& vec): Add vector
- auto search(const std::vector<float>& query, size_t k): Search vectors
- void save(): Save to disk
- size_t dim() const: Get dimension
CLI Commands
- feather new <path> --dim <dimension>: Create new database
- feather add <db> <id> --npy <file>: Add vector from .npy file
- feather search <db> --npy <query> --k <count>: Search similar vectors
Examples
Semantic Search with Embeddings
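import feather_py

# Create database for sentence embeddings (path is illustrative)
db = feather_py.DB.open("sentences.feather", dim=384)
# Add document embeddings
documents = ["first document text", "second document text"]  # placeholder texts
for i, doc in enumerate(documents):
    # Assume get_embedding() returns a 384-dim vector
    embedding = get_embedding(doc)
    db.add(i, embedding)
# Search for similar documents
query_embedding = get_embedding("example query")
ids, distances = db.search(query_embedding, k=2)  # assumed to return (ids, distances)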
Batch Processing
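import numpy as np
import feather_py
db = feather_py.DB.open("large_dataset.feather", dim=512)  # illustrative path and dim
# Batch add vectors
batch_size = 1000
for batch_start in range(0, 10000, batch_size):  # 10,000 vectors is illustrative
    for i in range(batch_size):
        vector_id = batch_start + i
        vector = np.random.random(512).astype(np.float32)
        db.add(vector_id, vector)
    # Periodic save
    db.save()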
Performance Tips
- Batch Operations: Add vectors in batches and save periodically
- Memory Management: Consider vector dimension vs. memory usage trade-offs
- Search Parameters: Adjust the k parameter based on your precision/recall needs
- File I/O: Use SSD storage for better performance with large databases
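One way to judge whether search results are accurate enough is to compare them with an exact brute-force search on a small sample. The following is a minimal, dependency-free sketch of that check using squared L2 distance (the metric listed under Performance Characteristics). `exact_top_k` and `recall_at_k` are hypothetical helper names, and the list passed as `approx_ids` stands in for whatever IDs an approximate search returned.

```python
import heapq


def exact_top_k(vectors, query, k):
    """Brute-force k nearest IDs by squared L2 distance.

    `vectors` maps id -> list of floats with the same length as `query`.
    """
    def sq_l2(vec):
        return sum((a - b) ** 2 for a, b in zip(vec, query))

    return heapq.nsmallest(k, vectors, key=lambda vec_id: sq_l2(vectors[vec_id]))


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the approximate result contains."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


vectors = {1: [0.0, 0.0], 2: [1.0, 0.0], 3: [0.0, 2.0], 4: [5.0, 5.0]}
exact = exact_top_k(vectors, [0.1, 0.0], k=2)  # [1, 2]
print(recall_at_k([1, 3], exact))               # 0.5
```

Brute force scales linearly with the number of vectors, so this kind of check is best run on a sample rather than the full database.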
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
License
[Add your license information here]