polymathy 0.1.0


Polymathy: High-Performance Web Search Service


A high-performance web service that processes search queries, retrieves relevant content, and performs semantic chunking and embedding operations. The approach is similar to that of perplexity.ai.

🌟 What Does Polymathy Do?

Polymathy is a web service that enhances traditional search by providing intelligent content processing capabilities:

  1. Search Enhancement: Takes a user query and retrieves relevant results from SearxNG (a privacy-respecting metasearch engine)
  2. Content Processing: For each search result, it fetches the content and breaks it down into semantic chunks
  3. AI-Powered Embeddings: Generates vector embeddings for each chunk using machine learning models
  4. Intelligent Indexing: Stores these embeddings in a vector database for similarity search
  5. Enhanced Results: Returns not just links, but actual relevant content chunks with their sources

This approach allows users to get direct answers from search results rather than just links to potentially relevant pages.
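The five steps above can be sketched as a single function that wires the stages together. This is a minimal illustration, not the crate's actual code: `Chunk`, `run_pipeline`, and the three stage closures (`search`, `fetch_chunks`, `embed`) are hypothetical names, and a plain `Vec` stands in for the vector database.

```rust
use std::collections::BTreeMap;

/// One processed chunk: the page it came from and its text.
/// (Hypothetical type, used only for this sketch.)
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub url: String,
    pub text: String,
}

/// Runs search -> fetch and chunk -> embed -> index with caller-supplied stages.
/// Returns the chunks keyed by an incrementing ID, plus the (ID, vector) pairs
/// that a real service would hand to its vector index.
pub fn run_pipeline<S, C, E>(
    query: &str,
    search: S,
    fetch_chunks: C,
    embed: E,
) -> (BTreeMap<u64, Chunk>, Vec<(u64, Vec<f32>)>)
where
    S: Fn(&str) -> Vec<String>, // query -> result URLs
    C: Fn(&str) -> Vec<String>, // URL -> text chunks
    E: Fn(&str) -> Vec<f32>,    // chunk text -> embedding
{
    let mut chunks = BTreeMap::new();
    let mut index = Vec::new();
    let mut next_id = 0u64;

    for url in search(query) {
        for text in fetch_chunks(&url) {
            // Embed first (borrows the text), then move the text into the map.
            index.push((next_id, embed(&text)));
            chunks.insert(next_id, Chunk { url: url.clone(), text });
            next_id += 1;
        }
    }
    (chunks, index)
}
```

Passing the stages in as closures keeps the flow testable without a SearxNG instance or an embedding model; in a real deployment each closure would wrap an HTTP call or a model invocation.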

🎯 Use Cases

  • Research Assistants: Quickly extract relevant information from multiple sources
  • Content Summarization: Get key points from multiple articles on a topic
  • Knowledge Base Building: Create structured knowledge from unstructured web content
  • AI-Powered Search: Enable semantic search across web content
  • Content Curation: Automatically collect and organize information on specific topics

🚀 Getting Started

Prerequisites

  • Rust (latest stable version)
  • A running SearxNG instance (reachable at SEARXNG_URL)
  • A content processor service (reachable at PROCESSOR_URL)
  • The environment variables listed under "Running from Source"

Installation

As a Binary

From a local checkout of the repository, install the polymathy binary with cargo:

cargo install --path .

This installs the polymathy binary into your ~/.cargo/bin directory. If that directory is on your PATH, you can then run it from anywhere.

As a Library

To use Polymathy as a library in your Rust project, add this to your Cargo.toml:

[dependencies]
polymathy = { git = "https://github.com/sokratis-xyz/polymathy" }

Running from Source

  1. Clone the repository:
git clone https://github.com/sokratis-xyz/polymathy.git
cd polymathy
  2. Create a .env file in the project root with the following variables:
SEARXNG_URL=http://your-searxng-instance/search
PROCESSOR_URL=http://your-processor-service/process
SERVER_HOST=127.0.0.1
SERVER_PORT=8080
  3. Build and run the project:
# Using cargo run (recommended)
cargo run --release

# Or explicitly specify the binary name
cargo run --bin polymathy --release

The service will be available at http://localhost:8080 (or your configured host/port).
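For readers wondering how these four variables might be consumed, here is a minimal sketch using only the standard library. The `Config` struct and its methods are hypothetical and not taken from the crate. The sketch reads variables that are already present in the process environment; loading the .env file itself is left out, since it typically requires a separate crate.

```rust
use std::env;

/// Hypothetical settings struct mirroring the four .env keys.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub searxng_url: String,
    pub processor_url: String,
    pub server_host: String,
    pub server_port: u16,
}

impl Config {
    /// Builds a config from any key lookup, so tests need not touch
    /// the real process environment.
    pub fn from_lookup<F>(lookup: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        let required = |key: &str| lookup(key).ok_or_else(|| format!("missing {key}"));
        // Parse the port early so a bad value is reported clearly.
        let server_port = required("SERVER_PORT")?
            .parse::<u16>()
            .map_err(|e| format!("invalid SERVER_PORT: {e}"))?;
        Ok(Config {
            searxng_url: required("SEARXNG_URL")?,
            processor_url: required("PROCESSOR_URL")?,
            server_host: required("SERVER_HOST")?,
            server_port,
        })
    }

    /// Reads the same keys from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|key| env::var(key).ok())
    }

    /// Address to bind the HTTP server to, e.g. "127.0.0.1:8080".
    pub fn bind_addr(&self) -> String {
        format!("{}:{}", self.server_host, self.server_port)
    }
}
```

Calling `Config::from_env()` at startup and failing fast on an `Err` surfaces a missing or malformed variable before the server begins accepting requests.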

🔍 API Endpoints

Search Endpoint

GET /v1/search?q={query}

Parameters:

  • q (required): The search query string

Response: Returns a JSON object that maps each chunk's ID (as a string key) to a two-element array containing the source URL and the chunk text.

Example response:

{
  "0": ["https://example.com/article1", "This is a relevant chunk from the first article"],
  "1": ["https://example.com/article2", "This is another relevant chunk from a different article"]
}
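When calling the endpoint from Rust, the query string has to be percent-encoded before it is placed in the URL. The sketch below does this with the standard library only; `encode_query_value` and `search_url` are hypothetical helpers, not part of the crate, and an HTTP client of your choice would then issue the GET request.

```rust
/// Percent-encodes a query value, leaving only RFC 3986 unreserved
/// characters (letters, digits, '-', '.', '_', '~') unescaped.
pub fn encode_query_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for byte in value.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            // Everything else, including spaces and non-ASCII bytes, becomes %XX.
            _ => out.push_str(&format!("%{byte:02X}")),
        }
    }
    out
}

/// Builds the full URL for GET /v1/search?q={query} against a given base URL.
pub fn search_url(base: &str, query: &str) -> String {
    format!(
        "{}/v1/search?q={}",
        base.trim_end_matches('/'),
        encode_query_value(query)
    )
}
```

For example, `search_url("http://localhost:8080", "rust async")` yields `http://localhost:8080/v1/search?q=rust%20async`.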

📚 Documentation

API Documentation

The API documentation is available at the following endpoints:

  • Swagger UI: /swagger
  • Redoc: /redoc
  • RapiDoc: /rapidoc
  • Scalar: /scalar
  • OpenAPI JSON: /openapi.json

Library Documentation

Rust documentation can be generated and viewed using:

cargo doc --open

🛠 Technical Details

Architecture

The service is built using:

  • Actix-web for the HTTP server
  • USearch for vector similarity search
  • Serde for serialization/deserialization
  • Tokio for async runtime
  • Reqwest for HTTP client operations

Library Structure

Polymathy is organized into the following modules:

  • search: Handles search query processing
  • content: Manages content processing and chunking
  • embedding: Handles embedding generation
  • index: Manages vector indexing
  • api: Defines API endpoints and server implementation

Configuration

The service uses the following configuration options for content processing:

  • Chunking size: 100 words
  • Embedding model: AllMiniLML6V2
  • Vector dimensions: 384
  • Metric: Inner Product (IP)
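To make these settings concrete, the following sketch shows fixed-size word chunking and inner-product ranking in plain Rust. It is an illustration under stated assumptions, not the service's implementation: the real chunking is described as semantic and is handled by the content processor, and the real index is USearch. The names `chunk_by_words`, `inner_product`, and `best_match` are hypothetical.

```rust
/// Values taken from the configuration list above.
pub const CHUNK_WORDS: usize = 100;
pub const DIMENSIONS: usize = 384;

/// Splits text into chunks of at most `max_words` whitespace-separated words.
/// A simple stand-in for semantic chunking, which would also respect
/// sentence or topic boundaries.
pub fn chunk_by_words(text: &str, max_words: usize) -> Vec<String> {
    assert!(max_words > 0, "max_words must be positive");
    let words: Vec<&str> = text.split_whitespace().collect();
    words.chunks(max_words).map(|c| c.join(" ")).collect()
}

/// Inner product of two equal-length vectors; higher means more similar.
pub fn inner_product(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Brute-force search: position of the stored vector with the largest
/// inner product against the query, or None if nothing is stored.
pub fn best_match(query: &[f32], stored: &[Vec<f32>]) -> Option<usize> {
    stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, inner_product(query, v)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

With unit-length embeddings the inner product equals cosine similarity, which is one common reason to pair it with sentence-embedding models. A dedicated index replaces the brute-force scan in `best_match` with an approximate search that scales to many more vectors.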

🤝 Contributing

Contributions are welcome! Please feel free to submit pull requests.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📧 Contact

support@sokratis.xyz