dragon_db 0.1.0

A tiny embedding database in pure Rust.
# dRAGon

dRAGon is an embedded vector database written in Rust, with helper functions for Retrieval-Augmented Generation (RAG). It stores and queries vector embeddings and is aimed at natural language processing and machine learning applications.

## Features

- 🚀 Efficient vector storage and retrieval using [LanceDB](https://github.com/lancedb/lancedb)
- 📄 Support for multiple file formats (PDF, TXT, DOCX, CSV)
- 🔍 Similarity search functionality
- 📦 Batch operations for adding and updating vectors
- 🧠 Customizable embedding model and inference via [fastembed-rs](https://github.com/Anush008/fastembed-rs)
- 🌐 RESTful API for easy integration

## Table of Contents

- [Installation](#installation)
- [Usage](#usage)
- [API Endpoints](#api-endpoints)
- [Configuration](#configuration)
- [Development](#development)
- [Testing](#testing)
- [License](#license)

## Installation

### Prerequisites

- Rust
- `protoc` (Protocol Buffers Compiler)

### Installing `protoc`

#### macOS

```bash
brew install protobuf
```

#### Linux (Debian/Ubuntu)

```bash
sudo apt-get install protobuf-compiler
```

#### Windows

```powershell
choco install protobuf
```

or

```powershell
scoop install protobuf
```

### Building from Source

1. Clone the repository:

   ```bash
   git clone https://github.com/portalcorp/dRAGon.git
   cd dRAGon
   ```
2. Build the project:

   ```bash
   cargo build --release
   ```
3. Run the server:

   ```bash
   cargo run --release
   ```

## Usage

### As a Library

dRAGon can be used as a library in your Rust projects. Here's how you can import and use it:

1. Add dRAGon to your `Cargo.toml`:

```toml
[dependencies]
dRAGon = { git = "https://github.com/portalcorp/dRAGon.git" }
```

2. In your Rust code, import and use dRAGon:

```rust
use dRAGon::start_server;

#[rocket::launch]
pub async fn run() -> _ {
    start_server().await
}
```

This code snippet demonstrates how to start the dRAGon server from your own package. It uses the `start_server()` function from the dRAGon library to configure and launch the server.

### API Usage

Once the server is running, you can interact with it using HTTP requests. Here are some example API calls:

#### Adding Text

```bash
curl -X POST http://localhost:8000/add -H "Content-Type: application/json" -d '[{"text": "Hello, world!", "metadata": {"key": "value"}}]'
```
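
The same request can also be issued from Rust. The following is a minimal sketch that uses only the standard library. The helper names `build_json_post` and `post_json` are hypothetical and not part of dRAGon, and the address assumes the default port 8000. A real client would more likely use an HTTP crate and a JSON serializer instead of hand-written strings.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Format a minimal HTTP/1.1 POST request carrying a JSON body.
/// (Hypothetical helper, not part of dRAGon.)
fn build_json_post(host: &str, path: &str, json_body: &str) -> String {
    format!(
        "POST {} HTTP/1.1\r\nHost: {}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        path,
        host,
        json_body.len(), // Content-Length is counted in bytes
        json_body
    )
}

/// Send the request and return the raw response (status line, headers, body).
fn post_json(addr: &str, path: &str, json_body: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(build_json_post(addr, path, json_body).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}

fn main() {
    // Same payload as the curl example above.
    let body = r#"[{"text": "Hello, world!", "metadata": {"key": "value"}}]"#;
    match post_json("localhost:8000", "/add", body) {
        Ok(response) => println!("{}", response),
        Err(err) => eprintln!("request failed (is the server running?): {}", err),
    }
}
```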

#### Adding a File

```bash
curl -X POST http://localhost:8000/add_file -F "file=@path/to/your/file.txt" -F "metadata[key]=value"
```

#### Performing a Similarity Search

```bash
curl -X POST http://localhost:8000/similarity_search -H "Content-Type: application/json" -d '{"text": "Query text", "top_k": 5}'
```
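
To make the `top_k` parameter concrete, here is a rough sketch of what a similarity search does conceptually: score every stored vector against the query and keep the best `top_k`. It assumes cosine similarity and a brute-force scan. The function names are hypothetical, and the metric and indexing that dRAGon and LanceDB actually use may differ.

```rust
/// Cosine similarity between two vectors; 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Return (index, score) pairs for the `top_k` stored vectors most similar
/// to `query`, best match first. (Hypothetical helper, brute-force scan.)
fn most_similar(query: &[f32], stored: &[Vec<f32>], top_k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    // Sort by descending score, then keep only the first `top_k` entries.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(top_k);
    scored
}

fn main() {
    // Toy 2-dimensional "embeddings"; real embeddings have hundreds of dimensions.
    let stored = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.9, 0.1]];
    for (index, score) in most_similar(&[1.0, 0.0], &stored, 2) {
        println!("vector {} scored {:.4}", index, score);
    }
}
```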

For more detailed API documentation, refer to the Swagger UI available at `/docs` when running the server.

## API Endpoints

- `POST /add`: Add text or a list of texts to the collection
- `POST /add_file`: Add text from a file to the collection
- `POST /similarity_search`: Perform a similarity search on the collection
- `GET /get_texts`: Retrieve texts by their IDs
- `POST /clear`: Clear the entire collection

## Configuration

The dRAGon server can be configured using environment variables:

- `ROCKET_PORT`: The port on which the server will listen (default: 8000)
- `BASE_PATH`: The base path on which the server will listen (default: "/")
- `DB_PATH`: The path to store the database files (default: "data/lance")
- `COLLECTION_NAME`: The name of the default collection (default: "vectors")
- `EMBEDDING_MODEL`: The embedding model to use (default: "BGESmallENV15")
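
The variables and defaults listed above can be pictured as a lookup with fallbacks. The sketch below is illustrative only: the `Settings` struct and `settings_from` function are hypothetical and do not reflect how dRAGon reads its configuration internally. It also assumes that an unparsable `ROCKET_PORT` falls back to the default, which the server itself may handle differently.

```rust
use std::env;

/// Hypothetical settings struct mirroring the documented variables.
#[derive(Debug, PartialEq)]
struct Settings {
    port: u16,
    base_path: String,
    db_path: String,
    collection_name: String,
    embedding_model: String,
}

/// Resolve settings through a lookup function, so the fallback logic can be
/// exercised without modifying the process environment.
fn settings_from(lookup: impl Fn(&str) -> Option<String>) -> Settings {
    let get = |key: &str, default: &str| lookup(key).unwrap_or_else(|| default.to_string());
    Settings {
        // Assumption: a missing or unparsable port falls back to 8000.
        port: lookup("ROCKET_PORT")
            .and_then(|p| p.parse().ok())
            .unwrap_or(8000),
        base_path: get("BASE_PATH", "/"),
        db_path: get("DB_PATH", "data/lance"),
        collection_name: get("COLLECTION_NAME", "vectors"),
        embedding_model: get("EMBEDDING_MODEL", "BGESmallENV15"),
    }
}

fn main() {
    // Read from the real environment; unset variables use the documented defaults.
    let settings = settings_from(|key| env::var(key).ok());
    println!("{:?}", settings);
}
```

Running the server with, for example, `ROCKET_PORT=9000 DB_PATH=/tmp/lance cargo run` would override the corresponding defaults and leave the others unchanged.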

## Development

To set up the development environment:

1. Install Rust and Cargo: https://www.rust-lang.org/tools/install
2. Clone the repository and navigate to the project directory
3. Fetch dependencies and build the project: `cargo build`
4. Run the development server: `cargo run`

## Testing

To run the test suite:

```bash
cargo test -- --test-threads=1
```

The `--test-threads=1` flag runs the tests one at a time, so the database tests do not create several temporary databases at once and drive up memory usage.

## License

dRAGon is released under the MIT License. See the [LICENSE](LICENSE) file for details.