dim
dim is a Rust library for flexible and extensible vectorization of different types of data (images, text, etc.) using Large Language Models (LLMs). It processes multiple prompts concurrently to generate meaningful vector representations.
Features
- Support for multiple data types (images and text for now; other data formats are planned)
- Concurrent vectorization using multiple prompts
- Compatible with the OpenAI API format, so an OpenAI-compatible server such as Ollama can be used as a drop-in replacement
- Flexible vector dimension control through prompt design
- Built-in validation for vectorization results
Installation
Add this to your Cargo.toml:
[]
= "0.2.0"
Quick Start
Vectorize Text
use *;
use async_openai;
async
The result should be something like this:
Vector:
Notice that each prompt generates a value between 0.0 and 10.0. The final vector is a combination of these values.
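To make the 0.0 to 10.0 range concrete, here is a minimal sketch of how raw model replies could be checked and combined into a vector. It uses only the standard library; `parse_score` and `combine_replies` are hypothetical helper names chosen for illustration and are not part of dim's API, whose built-in validation may work differently.

```rust
/// Lowest and highest score a single prompt may return, per the text above.
const MIN_SCORE: f32 = 0.0;
const MAX_SCORE: f32 = 10.0;

/// Parse one raw model reply into a score (hypothetical helper).
/// Rejects anything that is not a finite number in MIN_SCORE..=MAX_SCORE.
fn parse_score(reply: &str) -> Result<f32, String> {
    let value: f32 = reply
        .trim()
        .parse()
        .map_err(|_| format!("not a number: {:?}", reply))?;
    if !value.is_finite() || value < MIN_SCORE || value > MAX_SCORE {
        return Err(format!(
            "score {} is outside {}..={}",
            value, MIN_SCORE, MAX_SCORE
        ));
    }
    Ok(value)
}

/// Combine the replies of all prompts into one vector, one component per
/// reply, stopping at the first invalid reply (hypothetical helper).
fn combine_replies(replies: &[&str]) -> Result<Vec<f32>, String> {
    replies.iter().map(|r| parse_score(r)).collect()
}
```

Calling `combine_replies(&["7.5", " 2 ", "10"])` yields a three-component vector, while a reply such as `"11"` or `"high"` produces an error instead of a silently wrong component.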
Vectorize Images
use *;
use DynamicImage;
use async_openai;
async
Once again, the result should be something like this:
Vector:
Notice that each prompt generates a value between 0.0 and 10.0. The final vector is a combination of these values.
How It Works
- The library takes your data (text/image) and creates a Vector object
- You provide multiple prompts that will be used to analyze different aspects of the data
- The prompts are processed concurrently using the specified LLM
- Results are combined into a single vector representation
- The dimensionality of the final vector is determined by the number of prompts and their specified outputs
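The steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not dim's implementation: the `Prompt` struct, `vectorize`, and `stub_score` are hypothetical names, OS threads stand in for whatever async runtime the library uses, and the scorer is a stub in place of a real LLM call.

```rust
use std::thread;

/// A prompt and the number of values it asks the model to return.
/// Hypothetical type for illustration; not the library's own.
struct Prompt {
    text: &'static str,
    outputs: usize,
}

/// Run every prompt against `data` on its own thread, then concatenate the
/// per-prompt results in prompt order. `score` stands in for the LLM call.
fn vectorize<F>(data: &str, prompts: &[Prompt], score: F) -> Vec<f32>
where
    F: Fn(&str, &Prompt) -> Vec<f32> + Sync,
{
    let score = &score;
    thread::scope(|s| {
        // Spawn one scoped thread per prompt so they run concurrently.
        let handles: Vec<_> = prompts
            .iter()
            .map(|p| s.spawn(move || score(data, p)))
            .collect();
        // Joining in spawn order keeps the vector layout deterministic.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("scoring thread panicked"))
            .collect()
    })
}

/// Stub scorer (assumption): derives a value in 0.0..=10.0 from the input
/// lengths and repeats it `outputs` times. A real scorer would query an LLM.
fn stub_score(data: &str, prompt: &Prompt) -> Vec<f32> {
    let value = ((data.len() + prompt.text.len()) % 11) as f32;
    vec![value; prompt.outputs]
}
```

With two prompts that declare 1 and 2 outputs, `vectorize("hello", &prompts, stub_score)` returns a vector of length 3, which matches the rule that the dimensionality is the sum of the outputs requested across all prompts.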
Configuration
- Works with OpenAI-style APIs. This project uses async_openai for API calls.
- Customize the API endpoint using: .with_api_base
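As a rough illustration of what changing the API base means, the sketch below builds the chat-completions URL for two bases using only the standard library. The helper `chat_completions_url` is hypothetical and not part of dim or async_openai. The Ollama address assumes a local server on its default port. With async_openai, the chosen base would be passed along the lines of `OpenAIConfig::new().with_api_base(OLLAMA_API_BASE)`.

```rust
/// Default base URL of the OpenAI API.
const OPENAI_API_BASE: &str = "https://api.openai.com/v1";

/// Base URL of the OpenAI-compatible API of a local Ollama server
/// (assumption: Ollama running locally on its default port).
const OLLAMA_API_BASE: &str = "http://localhost:11434/v1";

/// Build the chat-completions URL for a given API base, tolerating a
/// trailing slash on the base (hypothetical helper for illustration).
fn chat_completions_url(api_base: &str) -> String {
    format!("{}/chat/completions", api_base.trim_end_matches('/'))
}
```

Only the base differs between providers; the path appended to it stays the same, which is why an OpenAI-compatible server can act as a drop-in replacement.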
License
MIT License
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.