# dRAGon
dRAGon is an embedded vector database written in Rust, with helper functions for Retrieval-Augmented Generation (RAG). It stores and queries vector embeddings, which makes it suitable for natural language processing and machine learning applications.
## Features
- 🚀 Efficient vector storage and retrieval using [LanceDB](https://github.com/lancedb/lancedb)
- 📄 Support for multiple file formats (PDF, TXT, DOCX, CSV)
- 🔍 Similarity search functionality
- 📦 Batch operations for adding and updating vectors
- 🧠 Customizable embedding model and inference via [fastembed-rs](https://github.com/Anush008/fastembed-rs)
- 🌐 RESTful API for easy integration
## Table of Contents
- [Installation](#installation)
- [Usage](#usage)
- [API Endpoints](#api-endpoints)
- [Configuration](#configuration)
- [Development](#development)
- [Testing](#testing)
- [License](#license)
## Installation
### Prerequisites
- Rust
- `protoc` (Protocol Buffers Compiler)
### Installing `protoc`
#### macOS
```bash
brew install protobuf
```
#### Linux (Debian/Ubuntu)
```bash
sudo apt-get install protobuf-compiler
```
#### Windows
```powershell
choco install protobuf
```
or
```powershell
scoop install protobuf
```
### Building from Source
1. Clone the repository:
```bash
git clone https://github.com/portalcorp/dRAGon.git
cd dRAGon
```
2. Build the project:
```bash
cargo build --release
```
3. Run the server:
```bash
cargo run --release
```
## Usage
### As a Library
dRAGon can be used as a library in your Rust projects. Here's how you can import and use it:
1. Add dRAGon to your `Cargo.toml`:
```toml
[dependencies]
dRAGon = { git = "https://github.com/portalcorp/dRAGon.git" }
```
2. In your Rust code, import and use dRAGon:
```rust
use dRAGon::start_server;

// `#[rocket::launch]` generates `main`; `rocket` must also be listed in your `Cargo.toml`.
#[rocket::launch]
pub async fn run() -> _ {
    start_server().await
}
```
This code snippet demonstrates how to start the dRAGon server from your own package. It uses the `start_server()` function from the dRAGon library to configure and launch the server.
### API Usage
Once the server is running, you can interact with it using HTTP requests. Here are some example API calls:
#### Adding Text
```bash
curl -X POST http://localhost:8000/add -H "Content-Type: application/json" -d '[{"text": "Hello, world!", "metadata": {"key": "value"}}]'
```
#### Adding a File
```bash
curl -X POST http://localhost:8000/add_file -F "file=@path/to/your/file.txt" -F "metadata[key]=value"
```
#### Performing a Similarity Search
```bash
curl -X POST http://localhost:8000/similarity_search -H "Content-Type: application/json" -d '{"text": "Query text", "top_k": 5}'
```
For more detailed API documentation, refer to the Swagger UI available at `/docs` when running the server.
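
The request bodies in the curl examples above can also be built from Rust. The following is a minimal sketch using only the standard library; the helper names (`json_escape`, `similarity_search_body`, `add_body`) are hypothetical and not part of dRAGon, and a real client would more likely use a JSON library such as `serde_json`.

```rust
/// Escape a string so it can be embedded in a JSON string literal.
/// (Hypothetical helper, not part of dRAGon.)
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Body for `POST /similarity_search`, mirroring the curl example.
fn similarity_search_body(text: &str, top_k: usize) -> String {
    format!("{{\"text\": \"{}\", \"top_k\": {}}}", json_escape(text), top_k)
}

/// Body for `POST /add` with a single text and one metadata pair,
/// mirroring the curl example.
fn add_body(text: &str, key: &str, value: &str) -> String {
    format!(
        "[{{\"text\": \"{}\", \"metadata\": {{\"{}\": \"{}\"}}}}]",
        json_escape(text),
        json_escape(key),
        json_escape(value)
    )
}
```

The resulting strings can be sent with any HTTP client, using the `Content-Type: application/json` header as in the curl calls.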
## API Endpoints
- `POST /add`: Add text or a list of texts to the collection
- `POST /add_file`: Add text from a file to the collection
- `POST /similarity_search`: Perform a similarity search on the collection
- `GET /get_texts`: Retrieve texts by their IDs
- `POST /clear`: Clear the entire collection
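
For client code, the routes listed above can be kept in one table and turned into full URLs. This is a sketch only: `ENDPOINTS` and `endpoint_url` are hypothetical names, and it assumes the configured base path is simply prefixed to each route, which the source does not spell out.

```rust
/// The documented routes as (HTTP method, path) pairs.
const ENDPOINTS: [(&str, &str); 5] = [
    ("POST", "/add"),
    ("POST", "/add_file"),
    ("POST", "/similarity_search"),
    ("GET", "/get_texts"),
    ("POST", "/clear"),
];

/// Join host, port, base path, and route into a URL.
/// Assumption: the base path is a plain prefix in front of every route.
fn endpoint_url(host: &str, port: u16, base_path: &str, route: &str) -> String {
    // Normalize slashes so "/" and "/api/" both join cleanly.
    let base = base_path.trim_matches('/');
    let route = route.trim_start_matches('/');
    if base.is_empty() {
        format!("http://{}:{}/{}", host, port, route)
    } else {
        format!("http://{}:{}/{}/{}", host, port, base, route)
    }
}
```

With the default port and base path, `endpoint_url("localhost", 8000, "/", "/add")` yields the same URL used in the curl examples.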
## Configuration
The dRAGon server can be configured using environment variables:
- `ROCKET_PORT`: The port on which the server will listen (default: 8000)
- `BASE_PATH`: The base path under which the server's routes are served (default: "/")
- `DB_PATH`: The path to store the database files (default: "data/lance")
- `COLLECTION_NAME`: The name of the default collection (default: "vectors")
- `EMBEDDING_MODEL`: The embedding model to use (default: "BGESmallENV15")
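
To illustrate how these variables and their defaults fit together, here is a minimal sketch that resolves them with `std::env`. The `Settings` struct and its methods are hypothetical and do not reflect how dRAGon reads its configuration internally; in particular, falling back to the default on an unparsable port is a choice made for this sketch.

```rust
use std::env;

/// The documented settings with their documented defaults.
/// (Hypothetical type, not part of dRAGon.)
#[derive(Debug, Clone, PartialEq)]
struct Settings {
    port: u16,
    base_path: String,
    db_path: String,
    collection_name: String,
    embedding_model: String,
}

impl Settings {
    /// Resolve settings through a lookup function so the logic can be
    /// exercised without touching the process environment.
    fn from_lookup<F>(lookup: F) -> Settings
    where
        F: Fn(&str) -> Option<String>,
    {
        let get = |key: &str, default: &str| lookup(key).unwrap_or_else(|| default.to_string());
        Settings {
            // A missing or unparsable port falls back to 8000 in this sketch.
            port: lookup("ROCKET_PORT")
                .and_then(|v| v.parse::<u16>().ok())
                .unwrap_or(8000),
            base_path: get("BASE_PATH", "/"),
            db_path: get("DB_PATH", "data/lance"),
            collection_name: get("COLLECTION_NAME", "vectors"),
            embedding_model: get("EMBEDDING_MODEL", "BGESmallENV15"),
        }
    }

    /// Resolve settings from the real process environment.
    fn from_env() -> Settings {
        Settings::from_lookup(|key| env::var(key).ok())
    }
}
```

Calling `Settings::from_env()` with none of the variables set returns the defaults listed above.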
## Development
To set up the development environment:
1. Install Rust and Cargo: https://www.rust-lang.org/tools/install
2. Clone the repository and navigate to the project directory
3. Fetch dependencies and build the project: `cargo build`
4. Run the development server: `cargo run`
## Testing
To run the test suite:
```bash
cargo test -- --test-threads=1
```
The tests run on a single thread so that database tests do not create multiple temporary databases in parallel and drive up memory usage.
## License
dRAGon is released under the MIT License. See the [LICENSE](LICENSE) file for details.