# glowrs
## Library Usage
The glowrs library provides an easy and familiar interface to use pre-trained models for embeddings and sentence similarity.
### Example
```rust
use glowrs::SentenceTransformer;
```
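A fuller sketch of how usage might look; the constructor and method names below (`from_repo_string`, `encode_batch`) are illustrative assumptions, not verified against the crate's API:

```rust
use glowrs::SentenceTransformer;

fn main() {
    // Load a pre-trained model from the Hugging Face Hub.
    // NOTE: `from_repo_string` is an assumed constructor name.
    let encoder = SentenceTransformer::from_repo_string("sentence-transformers/all-MiniLM-L6-v2")
        .expect("failed to load model");

    let sentences = vec!["The cat sits outside", "A feline rests outdoors"];

    // Encode a batch of sentences into embeddings.
    // NOTE: `encode_batch` is likewise an assumed method name.
    let embeddings = encoder.encode_batch(sentences).expect("failed to encode");
    println!("{embeddings:?}");
}
```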
### Features
- Load models from Hugging Face Hub
- More to come!
## Server Usage
glowrs also provides a web server for sentence embedding inference, using candle as the tensor framework. It currently supports BERT-type models hosted on the Hugging Face Hub, such as those provided by sentence-transformers, Tom Aarsen, or Jina AI, as long as they provide safetensors model weights.
Example usage with the `jina-embeddings-v2-base-en` model:
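The following is a sketch; it assumes the compiled `server` binary is invoked directly, with the flags documented under Instructions below:

```shell
server --model-repo jinaai/jina-embeddings-v2-base-en
```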
If you want to use a certain revision of the model, you can append it to the repository name, like so:
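For instance, assuming a colon separates the repository name and revision (the separator syntax is an assumption; the documented `--revision` flag is the alternative):

```shell
server --model-repo jinaai/jina-embeddings-v2-base-en:main
```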
The `SentenceTransformer` will attempt to infer the model type from the model name. If it fails, you can specify the model type like so:
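For illustration, assuming the model type is appended after the revision with another colon (this exact syntax is an assumption):

```shell
server --model-repo jinaai/jina-embeddings-v2-base-en:main:jinabert
```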
Currently, `bert` and `jinabert` are supported.
If you want to run multiple models, you can run multiple instances of the server with different model repos.
**Warning:** Running multiple instances is not supported with Metal acceleration for now.
Instructions:

```text
Usage: server [OPTIONS]

Options:
  -m, --model-repo <MODEL_REPO>
  -r, --revision <REVISION>      [default: main]
  -h, --help                     Print help
```
### Build features
- `metal`: Compile with Metal acceleration
- `cuda`: Compile with CUDA acceleration
- `accelerate`: Compile with Accelerate framework acceleration (CPU)
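For example, to compile the server with Metal acceleration (standard Cargo feature syntax; the exact package layout is an assumption):

```shell
cargo build --release --features metal
```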
### Features
- OpenAI API compatible (`/v1/embeddings`) REST API endpoint
- candle inference for bert and jina-bert models
- Hardware acceleration (Metal for now)
- Queueing
- Multiple models
- Batching
- Performance metrics
### curl
```shell
curl -X POST http://localhost:3000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["The food was delicious and the waiter...", "was too"],
    "model": "jina-embeddings-v2-base-en",
    "encoding_format": "float"
  }'
```
### Python `openai` client
Install the OpenAI Python library:
```shell
pip install openai
```
Use the embeddings method as usual. The snippet below is a reconstruction: the base URL matches the curl example above, and the API key is required by the client but presumably ignored by the server:

```python
from openai import OpenAI

# Point the client at the local glowrs server.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="not-used")

# Create an embedding for a single input string.
embedding = client.embeddings.create(
    input="The food was delicious and the waiter...",
    model="jina-embeddings-v2-base-en",
)

# List models
models = client.models.list()
```
### Details
- Use `TOKIO_WORKER_THREADS` to set the number of threads per queue, for example:
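A sketch using standard environment-variable syntax, with the model repo from the example above:

```shell
TOKIO_WORKER_THREADS=4 server --model-repo jinaai/jina-embeddings-v2-base-en
```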
## Disclaimer
This is still a work in progress. Embedding performance is decent but could probably benefit from some benchmarking. Furthermore, at higher batch sizes the program is killed due to a bug.
Do not use this in a production environment.