heroindex 0.1.0

A Tantivy-based indexing server with an OpenRPC socket interface

# HeroIndex

[![Crates.io](https://img.shields.io/crates/v/heroindex.svg)](https://crates.io/crates/heroindex)
[![Documentation](https://docs.rs/heroindex/badge.svg)](https://docs.rs/heroindex)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A high-performance full-text search server built on [Tantivy](https://github.com/quickwit-oss/tantivy), exposing an OpenRPC interface over Unix sockets.

## Features

- **Multiple Index Management** - Create, delete, and manage multiple search indexes
- **Dynamic Schemas** - Define custom schemas with 10+ field types
- **Powerful Queries** - Full-text, fuzzy, phrase, boolean, range, regex queries
- **OpenRPC Discovery** - Self-documenting API via `rpc.discover`
- **Concurrent Connections** - Handle multiple clients simultaneously
- **Fast Fields** - Columnar storage for sorting and aggregations
- **Zero-Copy Search** - Efficient memory-mapped index files

## Installation

### From crates.io

```bash
cargo install heroindex
```

### From source

```bash
git clone https://github.com/heroindex/heroindex
cd heroindex
cargo build --release
```

## Quick Start

### 1. Start the Server

```bash
heroindex --dir /var/lib/heroindex --socket /tmp/heroindex.sock
```

### 2. Connect with the Client Library

Use [heroindex_client](https://crates.io/crates/heroindex_client) to connect:

```rust
use heroindex_client::HeroIndexClient;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), heroindex_client::Error> {
    let mut client = HeroIndexClient::connect("/tmp/heroindex.sock").await?;
    
    // Create an index
    client.db_create("articles", json!({
        "fields": [
            {"name": "title", "type": "text", "stored": true, "indexed": true},
            {"name": "body", "type": "text", "stored": true, "indexed": true}
        ]
    })).await?;
    
    // Add documents
    client.db_select("articles").await?;
    client.doc_add(json!({"title": "Hello", "body": "World"})).await?;
    client.commit().await?;
    client.reload().await?;
    
    // Search
    let results = client.search(
        json!({"type": "match", "field": "body", "value": "world"}),
        10, 0
    ).await?;
    
    println!("Found {} results", results.total_hits);
    Ok(())
}
```

## Command Line Options

```
heroindex [OPTIONS]

Options:
  -d, --dir <DIR>        Base directory for all indexes
  -s, --socket <SOCKET>  Unix socket path for RPC interface
  -h, --help             Print help
  -V, --version          Print version
```

## Schema Definition

Define your index schema with these field types:

| Type | Description | Options |
|------|-------------|---------|
| `text` | Full-text searchable (tokenized) | `stored`, `indexed`, `fast`, `tokenizer` |
| `str` | Exact match string (keyword) | `stored`, `indexed`, `fast` |
| `u64` | Unsigned 64-bit integer | `stored`, `indexed`, `fast` |
| `i64` | Signed 64-bit integer | `stored`, `indexed`, `fast` |
| `f64` | 64-bit floating point | `stored`, `indexed`, `fast` |
| `date` | DateTime (RFC 3339) | `stored`, `indexed`, `fast` |
| `bool` | Boolean | `stored`, `indexed`, `fast` |
| `json` | JSON object | `stored`, `indexed` |
| `bytes` | Binary data | `stored`, `indexed`, `fast` |
| `ip` | IP address | `stored`, `indexed`, `fast` |

### Example Schema

```json
{
  "fields": [
    {"name": "id", "type": "str", "stored": true, "indexed": true},
    {"name": "title", "type": "text", "stored": true, "indexed": true, "tokenizer": "en_stem"},
    {"name": "content", "type": "text", "stored": true, "indexed": true},
    {"name": "views", "type": "u64", "stored": true, "indexed": true, "fast": true},
    {"name": "rating", "type": "f64", "stored": true, "indexed": true, "fast": true},
    {"name": "published", "type": "date", "stored": true, "indexed": true, "fast": true},
    {"name": "active", "type": "bool", "stored": true, "indexed": true},
    {"name": "metadata", "type": "json", "stored": true, "indexed": true}
  ]
}
```
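
When schemas are assembled in code rather than written by hand, a small helper can keep the field options consistent. The following is a minimal sketch using only the standard library. `FieldDef` and `schema_json` are hypothetical names, not part of heroindex or heroindex_client, and the sketch assumes field names and tokenizer names contain no characters that need JSON escaping.

```rust
/// One schema field, mirroring the keys used in the JSON example above.
/// Hypothetical helper type, not exported by heroindex.
struct FieldDef<'a> {
    name: &'a str,
    kind: &'a str, // one of the documented types: "text", "str", "u64", ...
    stored: bool,
    indexed: bool,
    fast: bool,
    tokenizer: Option<&'a str>,
}

impl FieldDef<'_> {
    /// Render the field as a JSON object. `fast` and `tokenizer` are only
    /// emitted when set, matching the style of the example schema.
    fn to_json(&self) -> String {
        let mut out = format!(
            r#"{{"name": "{}", "type": "{}", "stored": {}, "indexed": {}"#,
            self.name, self.kind, self.stored, self.indexed
        );
        if self.fast {
            out.push_str(r#", "fast": true"#);
        }
        if let Some(tok) = self.tokenizer {
            out.push_str(&format!(r#", "tokenizer": "{}""#, tok));
        }
        out.push('}');
        out
    }
}

/// Wrap the rendered fields in the top-level `{"fields": [...]}` object.
fn schema_json(fields: &[FieldDef<'_>]) -> String {
    let rendered: Vec<String> = fields.iter().map(|f| f.to_json()).collect();
    format!(r#"{{"fields": [{}]}}"#, rendered.join(", "))
}
```

The resulting string can be parsed into a `serde_json::Value` and passed to `db_create` as in the Quick Start.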

## Query Types

### Match Query (Full-Text)
```json
{"type": "match", "field": "content", "value": "search terms"}
```

### Term Query (Exact)
```json
{"type": "term", "field": "id", "value": "abc123"}
```

### Fuzzy Query (Typo-Tolerant)
```json
{"type": "fuzzy", "field": "title", "value": "serch", "distance": 2}
```
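
To make the `distance` parameter concrete, the sketch below computes the plain Levenshtein edit distance (insertions, deletions, substitutions) between two strings. It is only an illustration of what the number means: Tantivy evaluates fuzzy queries with Levenshtein automata rather than this dynamic-programming loop, and whether a transposition counts as one edit or two is not specified here. `levenshtein` and `within_distance` are hypothetical helper names.

```rust
/// Classic dynamic-programming Levenshtein distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the first i chars of `a`
    // and the first j chars of `b` (initially i = 0).
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[b.len()]
}

/// True when `candidate` is within `distance` edits of the query `term`.
fn within_distance(term: &str, candidate: &str, distance: usize) -> bool {
    levenshtein(term, candidate) <= distance
}
```

Under this definition the misspelling `serch` is one edit away from `search`, so it falls inside the `"distance": 2` budget of the query above.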

### Phrase Query
```json
{"type": "phrase", "field": "content", "value": "exact phrase match"}
```

### Prefix Query
```json
{"type": "prefix", "field": "title", "value": "hel"}
```

### Range Query
```json
{"type": "range", "field": "views", "gte": 100, "lt": 1000}
```
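
The bound names describe a half-open interval: `gte` is inclusive and `lt` is exclusive. The following sketch spells out that reading for `u64` values. `RangeBounds` is a hypothetical type, and treating an omitted bound as "unbounded on that side" is an assumption, not documented server behavior.

```rust
/// Bounds of a range query as written above.
/// `None` stands for an omitted key (assumed to mean "no bound on this side").
struct RangeBounds {
    gte: Option<u64>, // inclusive lower bound
    lt: Option<u64>,  // exclusive upper bound
}

impl RangeBounds {
    /// True when `value` satisfies gte <= value < lt.
    fn matches(&self, value: u64) -> bool {
        let above = self.gte.map_or(true, |lo| value >= lo);
        let below = self.lt.map_or(true, |hi| value < hi);
        above && below
    }
}
```

With `gte: 100` and `lt: 1000`, a document with `views` equal to 100 matches and one with `views` equal to 1000 does not.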

### Regex Query
```json
{"type": "regex", "field": "title", "value": "test.*"}
```

### Boolean Query
```json
{
  "type": "boolean",
  "must": [{"type": "match", "field": "content", "value": "rust"}],
  "should": [{"type": "match", "field": "title", "value": "tutorial"}],
  "must_not": [{"type": "term", "field": "status", "value": "draft"}]
}
```
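
Boolean queries nest other queries, so building them programmatically is mostly string composition. The sketch below uses only the standard library. `leaf` and `boolean` are hypothetical helper names, values are assumed to need no JSON escaping, and whether the server accepts an empty `should` array (as opposed to an omitted key) is an assumption worth checking against `rpc.discover`.

```rust
/// Build a leaf query with a string value, such as `match`, `term`, or `phrase`.
fn leaf(kind: &str, field: &str, value: &str) -> String {
    format!(
        r#"{{"type": "{}", "field": "{}", "value": "{}"}}"#,
        kind, field, value
    )
}

/// Combine already-rendered queries into a boolean query.
/// Each slice holds JSON text, so the output of `boolean` can itself
/// be passed back in to nest boolean queries.
fn boolean(must: &[String], should: &[String], must_not: &[String]) -> String {
    format!(
        r#"{{"type": "boolean", "must": [{}], "should": [{}], "must_not": [{}]}}"#,
        must.join(", "),
        should.join(", "),
        must_not.join(", ")
    )
}
```

For instance, `boolean(&[leaf("match", "content", "rust")], &[leaf("match", "title", "tutorial")], &[leaf("term", "status", "draft")])` produces the same query as the JSON example above.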

## RPC Methods

| Method | Description |
|--------|-------------|
| `rpc.discover` | Get OpenRPC schema |
| `server.ping` | Health check |
| `server.stats` | Server statistics |
| `db.list` | List all databases |
| `db.create` | Create database with schema |
| `db.delete` | Delete a database |
| `db.select` | Select database for operations |
| `db.info` | Get database info |
| `schema.get` | Get current schema |
| `doc.add` | Add single document |
| `doc.add_batch` | Add multiple documents |
| `doc.delete` | Delete by term |
| `index.commit` | Commit changes |
| `index.reload` | Reload the index so committed changes become visible to searches |
| `search.query` | Execute search |
| `search.count` | Count matches |
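
For clients that cannot use heroindex_client, the methods above can be called by writing JSON-RPC 2.0 requests to the socket directly. The sketch below uses only the Rust standard library. The envelope follows the JSON-RPC 2.0 specification, but two things are assumptions: that messages are newline-delimited on the wire, and the shape of `params` for each method. The schema returned by `rpc.discover` is the authoritative source for both. `rpc_request` and `call` are hypothetical helper names.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Build a JSON-RPC 2.0 request. `params` must already be valid JSON text,
/// for example "[]" for a method that takes no arguments.
fn rpc_request(id: u64, method: &str, params: &str) -> String {
    format!(
        r#"{{"jsonrpc": "2.0", "id": {}, "method": "{}", "params": {}}}"#,
        id, method, params
    )
}

/// Open a connection, send one request, and read one reply line.
/// ASSUMPTION: one JSON message per line; confirm the framing the server uses.
fn call(socket_path: &str, request: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket_path)?;
    stream.write_all(request.as_bytes())?;
    stream.write_all(b"\n")?;
    let mut reply = String::new();
    BufReader::new(stream).read_line(&mut reply)?;
    Ok(reply.trim_end().to_string())
}
```

A health check would then look like `call("/tmp/heroindex.sock", &rpc_request(1, "server.ping", "[]"))`. Note that `db.select` implies per-connection state, so a real client would keep one stream open across calls instead of reconnecting each time as this sketch does.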

## Performance Tips

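1. **Use batch inserts** - `doc.add_batch` sends many documents in one request, avoiding a round trip per document
2. **Commit periodically** - Commit once per batch of documents rather than after every document, since each commit flushes pending changes to disk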
3. **Enable fast fields** - For fields used in sorting/filtering
4. **Use appropriate tokenizers** - `en_stem` for English, `raw` for keywords
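
The first two tips can be combined by grouping documents into fixed-size batches and committing once after the last batch. The sketch below only prepares the batch payloads, using the standard library. `batch_payloads` is a hypothetical helper, and sending each payload as a single JSON array to `doc.add_batch` is an assumption about that method's parameter shape.

```rust
/// Group already-serialized JSON documents into JSON arrays of at most
/// `batch_size` items, one array per intended `doc.add_batch` call.
fn batch_payloads(docs: &[String], batch_size: usize) -> Vec<String> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    docs.chunks(batch_size)
        .map(|chunk| format!("[{}]", chunk.join(", ")))
        .collect()
}
```

With 2,500 documents and a batch size of 1,000 this yields three payloads, so ingestion takes three add requests followed by one `index.commit` and one `index.reload`, instead of 2,500 individual adds.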

## Related Crates

- [heroindex_client](https://crates.io/crates/heroindex_client) - Client library for connecting to HeroIndex

## License

MIT License - see [LICENSE](LICENSE) for details.

## Credits

Built on the excellent [Tantivy](https://github.com/quickwit-oss/tantivy) search engine library.