# Polymathy: High-Performance Web Search Service

[![Crates.io](https://img.shields.io/crates/v/polymathy)](https://crates.io/crates/polymathy)
[![Documentation](https://docs.rs/polymathy/badge.svg)](https://docs.rs/polymathy)
[![License](https://img.shields.io/crates/l/polymathy)](https://github.com/sokratis-xyz/polymathy/blob/main/LICENSE)

Polymathy is a high-performance web service that processes search queries, retrieves relevant content, and performs semantic chunking and embedding operations. The approach is similar to that of perplexity.ai.

## 🌟 What Does Polymathy Do?

Polymathy enhances traditional search with a content-processing pipeline:

1. **Search Enhancement**: Takes a user query and retrieves relevant results from SearxNG (a privacy-respecting metasearch engine)
2. **Content Processing**: For each search result, it fetches the content and breaks it down into semantic chunks
3. **AI-Powered Embeddings**: Generates vector embeddings for each chunk using machine learning models
4. **Intelligent Indexing**: Stores these embeddings in a vector database for similarity search
5. **Enhanced Results**: Returns not just links, but actual relevant content chunks with their sources

This approach allows users to get direct answers from search results rather than just links to potentially relevant pages.

## 🎯 Use Cases

- **Research Assistants**: Quickly extract relevant information from multiple sources
- **Content Summarization**: Get key points from multiple articles on a topic
- **Knowledge Base Building**: Create structured knowledge from unstructured web content
- **AI-Powered Search**: Enable semantic search across web content
- **Content Curation**: Automatically collect and organize information on specific topics

## 🚀 Getting Started

### Prerequisites

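- Rust (latest stable version)
- A running SearxNG instance (its search URL goes in `SEARXNG_URL`)
- A running content processor service (its URL goes in `PROCESSOR_URL`)
- The environment variables listed under "Running from Source"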

### Installation

#### As a Binary

From the root of a cloned copy of the repository, install the `polymathy` binary with `cargo`:

```bash
cargo install --path .
```

This installs the binary into `~/.cargo/bin`. As long as that directory is on your `PATH`, you can run `polymathy` from anywhere.

#### As a Library

To use Polymathy as a library in your Rust project, add this to your `Cargo.toml`:

```toml
[dependencies]
polymathy = { git = "https://github.com/sokratis-xyz/polymathy" }
```

### Running from Source

1. Clone the repository:
```bash
git clone https://github.com/sokratis-xyz/polymathy.git
cd polymathy
```

2. Create a `.env` file in the project root with the following variables:
```env
SEARXNG_URL=http://your-searxng-instance/search
PROCESSOR_URL=http://your-processor-service/process
SERVER_HOST=127.0.0.1
SERVER_PORT=8080
```

3. Build and run the project:
```bash
# Using cargo run (recommended)
cargo run --release

# Or explicitly specify the binary name
cargo run --bin polymathy --release
```

The service will be available at `http://localhost:8080` (or your configured host/port).
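
As a rough illustration of how the four variables above might be consumed at startup, here is a minimal sketch using only the standard library. The `Settings` struct, its method names, and the fallback values for host and port are assumptions made for this example, not Polymathy's actual types. The sketch also assumes the variables are already present in the process environment; loading the `.env` file itself would need a crate such as `dotenvy`, which is not shown.

```rust
use std::env;

/// Illustrative settings holder mirroring the four `.env` variables.
/// Hypothetical type: not taken from the Polymathy source.
#[derive(Debug, Clone, PartialEq)]
pub struct Settings {
    pub searxng_url: String,
    pub processor_url: String,
    pub server_host: String,
    pub server_port: u16,
}

impl Settings {
    /// Build settings from any key lookup, which keeps the logic testable
    /// without touching the real process environment.
    pub fn from_lookup<F>(lookup: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        // The two service URLs have no sensible default, so they are required.
        let required = |key: &str| {
            lookup(key).ok_or_else(|| format!("missing required variable {key}"))
        };

        // Assumed fallback: port 8080, matching the example `.env` file.
        let server_port = match lookup("SERVER_PORT") {
            Some(raw) => raw
                .trim()
                .parse::<u16>()
                .map_err(|e| format!("invalid SERVER_PORT {raw:?}: {e}"))?,
            None => 8080,
        };

        Ok(Settings {
            searxng_url: required("SEARXNG_URL")?,
            processor_url: required("PROCESSOR_URL")?,
            // Assumed fallback: loopback, matching the example `.env` file.
            server_host: lookup("SERVER_HOST").unwrap_or_else(|| "127.0.0.1".to_string()),
            server_port,
        })
    }

    /// Read the settings from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|key| env::var(key).ok())
    }

    /// Address string in `host:port` form, suitable for binding a listener.
    pub fn bind_address(&self) -> String {
        format!("{}:{}", self.server_host, self.server_port)
    }
}
```

Calling `Settings::from_env()` early in `main` and failing fast on `Err` would surface a missing `SEARXNG_URL` or `PROCESSOR_URL` before the server starts accepting requests.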

## 🔍 API Endpoints

### Search Endpoint

```
GET /v1/search?q={query}
```

Parameters:
- `q` (required): The search query string

Response: Returns a JSON object that maps each chunk index (as a string key) to a two-element array containing the source URL and the chunk text.

Example response:
```json
{
  "0": ["https://example.com/article1", "This is a relevant chunk from the first article"],
  "1": ["https://example.com/article2", "This is another relevant chunk from a different article"]
}
```
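
Since `q` travels in the query string, a client has to percent-encode it before sending the request. The following is a minimal standard-library sketch of building the request URL. The helper names `percent_encode` and `search_url` are hypothetical, and the base URL in the usage note assumes the default host and port from the `.env` example.

```rust
/// Percent-encode a string for use as a query parameter value.
/// Only the RFC 3986 "unreserved" characters pass through unchanged;
/// every other byte of the UTF-8 encoding becomes `%XX`.
pub fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{byte:02X}")),
        }
    }
    out
}

/// Build the full URL for `GET /v1/search?q={query}`.
/// `base` is the scheme, host, and port of a running instance.
pub fn search_url(base: &str, query: &str) -> String {
    format!(
        "{}/v1/search?q={}",
        base.trim_end_matches('/'),
        percent_encode(query)
    )
}
```

For example, `search_url("http://localhost:8080", "rust async & tokio")` yields `http://localhost:8080/v1/search?q=rust%20async%20%26%20tokio`, which can be passed to any HTTP client.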

## 📚 Documentation

### API Documentation

The API documentation is available at the following endpoints:

- Swagger UI: `/swagger`
- Redoc: `/redoc`
- RapiDoc: `/rapidoc`
- Scalar: `/scalar`
- OpenAPI JSON: `/openapi.json`

### Library Documentation

Rust documentation can be generated and viewed using:

```bash
cargo doc --open
```

## 🛠 Technical Details

### Architecture

The service is built using:
- Actix-web for the HTTP server
- USearch for vector similarity search
- Serde for serialization/deserialization
- Tokio for the async runtime
- Reqwest for HTTP client operations

### Library Structure

Polymathy is organized into the following modules:

- `search`: Handles search query processing
- `content`: Manages content processing and chunking
- `embedding`: Handles embedding generation
- `index`: Manages vector indexing
- `api`: Defines API endpoints and server implementation

### Configuration

Content processing uses the following settings:
- Chunk size: 100 words
- Embedding model: `AllMiniLML6V2` (all-MiniLM-L6-v2)
- Vector dimensions: 384
- Metric: inner product (IP)
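
To make these settings more concrete, here is a small standard-library sketch of what a 100-word chunk size and an inner-product metric mean in practice. It is an illustration under stated assumptions, not the service's implementation: plain fixed-size word windows stand in for the semantic chunker, a brute-force scan stands in for the USearch index, and all function names are hypothetical.

```rust
/// Split text into chunks of at most `words_per_chunk` whitespace-separated
/// words. This is a simple fixed-window stand-in for semantic chunking.
pub fn chunk_by_words(text: &str, words_per_chunk: usize) -> Vec<String> {
    assert!(words_per_chunk > 0, "chunk size must be positive");
    let words: Vec<&str> = text.split_whitespace().collect();
    words
        .chunks(words_per_chunk)
        .map(|window| window.join(" "))
        .collect()
}

/// Inner product of two vectors of equal dimension (384 for the model above).
/// For unit-length vectors this equals cosine similarity.
pub fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Brute-force ranking: score every stored vector against the query and
/// return `(index, score)` pairs, highest inner product first.
pub fn rank_by_inner_product(query: &[f32], vectors: &[Vec<f32>]) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (i, inner_product(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

With a chunk size of 100, a 250-word page produces three chunks (100, 100, and 50 words). Each chunk would then be embedded into a 384-dimensional vector, and chunks whose vectors have a larger inner product with the query vector rank higher.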

## 🤝 Contributing

Contributions are welcome! Please feel free to submit pull requests.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📧 Contact

support@sokratis.xyz