Crate rbert


rbert

A Rust wrapper for BERT sentence-transformer models implemented in Candle.

Usage

use kalosm_language_model::Embedder;
use rbert::*;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut bert = Bert::new().await?;
    let sentences = [
        "Cats are cool",
        "The geopolitical situation is dire",
        "Pets are great",
        "Napoleon was a tyrant",
        "Napoleon was a great general",
    ];
    let embeddings = bert.embed_batch(sentences).await?;
    println!("embeddings {:?}", embeddings);

    // Compute the cosine similarity between every pair of sentences
    let mut similarities = vec![];
    let n_sentences = sentences.len();
    for (i, e_i) in embeddings.iter().enumerate() {
        for j in (i + 1)..n_sentences {
            let e_j = embeddings.get(j).unwrap();
            let cosine_similarity = e_j.cosine_similarity(e_i);
            similarities.push((cosine_similarity, i, j))
        }
    }
    similarities.sort_by(|u, v| v.0.total_cmp(&u.0));
    for &(score, i, j) in similarities.iter() {
        println!("score: {score:.2} '{}' '{}'", sentences[i], sentences[j])
    }

    Ok(())
}
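
The `cosine_similarity` call above measures how closely two embeddings point in the same direction: their dot product divided by the product of their lengths. The sketch below reproduces that computation and the pairwise ranking loop on plain `Vec<f32>` values using only the standard library. It is an illustration under that assumption, not rbert's own implementation, and the toy vectors stand in for real model output.

```rust
/// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
/// Returns 0.0 if either vector has zero length (illustrative choice).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Score every unordered pair (i, j) with i < j and sort from most to least
/// similar, mirroring the loop in the usage example.
fn ranked_pairs(embeddings: &[Vec<f32>]) -> Vec<(f32, usize, usize)> {
    let mut pairs = Vec::new();
    for i in 0..embeddings.len() {
        for j in (i + 1)..embeddings.len() {
            pairs.push((cosine_similarity(&embeddings[i], &embeddings[j]), i, j));
        }
    }
    // total_cmp gives a total order on f32, so a NaN score cannot break the sort.
    pairs.sort_by(|u, v| v.0.total_cmp(&u.0));
    pairs
}

fn main() {
    // Toy 3-dimensional vectors standing in for real BERT embeddings.
    let embeddings = vec![
        vec![1.0, 0.0, 0.0],
        vec![0.9, 0.1, 0.0],
        vec![0.0, 0.0, 1.0],
    ];
    for (score, i, j) in ranked_pairs(&embeddings) {
        println!("score: {score:.2} {i} {j}");
    }
}
```

Sorting by descending score puts the most semantically similar pair first, which is why the example prints related sentences (such as the two about pets) near the top.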

Re-exports

  • pub use crate::Bert;

Structs

Enums

  • The type of embedding the model should output. For models that output different embeddings for queries and documents, this
  • The pooling strategy to use when embedding text.
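
Pooling is the step that turns a sequence of per-token vectors into a single sentence vector. The following is a minimal sketch of two common strategies: mean pooling over non-padding tokens, and taking the first (CLS) token. The `PoolingStrategy` enum and `pool` function are hypothetical names for illustration; they are not the variants of rbert's pooling enum.

```rust
/// Hypothetical pooling strategies, for illustration only (not rbert's own enum).
#[derive(Clone, Copy, Debug)]
enum PoolingStrategy {
    /// Average the token vectors, skipping padding tokens.
    Mean,
    /// Use the vector of the first (CLS) token.
    Cls,
}

/// Collapse per-token vectors (seq_len x hidden) into one sentence vector.
/// `mask[t]` is true for real tokens and false for padding.
fn pool(tokens: &[Vec<f32>], mask: &[bool], strategy: PoolingStrategy) -> Vec<f32> {
    assert_eq!(tokens.len(), mask.len(), "one mask entry per token");
    let hidden = tokens.first().map_or(0, |t| t.len());
    match strategy {
        PoolingStrategy::Cls => tokens.first().cloned().unwrap_or_default(),
        PoolingStrategy::Mean => {
            let mut sum = vec![0.0f32; hidden];
            let mut count = 0usize;
            for (tok, &keep) in tokens.iter().zip(mask) {
                if keep {
                    // Accumulate this token's vector element by element.
                    for (s, v) in sum.iter_mut().zip(tok) {
                        *s += *v;
                    }
                    count += 1;
                }
            }
            if count > 0 {
                for s in sum.iter_mut() {
                    *s /= count as f32;
                }
            }
            sum
        }
    }
}

fn main() {
    // Three 2-dimensional token vectors; the last one is padding.
    let tokens = vec![vec![1.0, 2.0], vec![3.0, 4.0], vec![100.0, 100.0]];
    let mask = [true, true, false];
    println!("{:?}", pool(&tokens, &mask, PoolingStrategy::Mean)); // [2.0, 3.0]
    println!("{:?}", pool(&tokens, &mask, PoolingStrategy::Cls)); // [1.0, 2.0]
}
```

Masking matters for mean pooling: without it, the padding vector in this example would drag the average far from the real tokens.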

Traits

  • A model that can be used to embed text. This trait is generic over the vector space that the model uses to help keep track of what embeddings came from which model.
  • An extension trait for Embedder that allows for caching embeddings.
  • An extension trait for Embedder with helper methods for iterators, and types that can be converted into a string.
  • A builder that can create a model asynchronously.
  • The type of a vector space marks what model the vector space is from. You should only combine vector spaces that come from the same model.
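
A vector-space marker lets the compiler reject comparisons between embeddings produced by different models. Below is a minimal sketch of that idea using a phantom type parameter. `Space`, `ModelA`, `ModelB` and `TypedEmbedding` are hypothetical names, not rbert's actual trait or types.

```rust
use std::marker::PhantomData;

/// Hypothetical marker trait for a vector space; one implementor per model.
trait Space {}

/// Hypothetical marker types for two different embedding models.
struct ModelA;
struct ModelB;
impl Space for ModelA {}
impl Space for ModelB {}

/// An embedding tagged at the type level with the space it came from.
struct TypedEmbedding<S: Space> {
    values: Vec<f32>,
    _space: PhantomData<S>,
}

impl<S: Space> TypedEmbedding<S> {
    fn new(values: Vec<f32>) -> Self {
        Self { values, _space: PhantomData }
    }

    /// Only accepts another embedding from the same space `S`, so comparing
    /// a ModelA vector with a ModelB vector is a compile-time error.
    fn dot(&self, other: &TypedEmbedding<S>) -> f32 {
        self.values.iter().zip(&other.values).map(|(a, b)| a * b).sum()
    }
}

fn main() {
    let a1 = TypedEmbedding::<ModelA>::new(vec![1.0, 2.0]);
    let a2 = TypedEmbedding::<ModelA>::new(vec![3.0, 4.0]);
    let b = TypedEmbedding::<ModelB>::new(vec![1.0, 1.0]);
    println!("{}", a1.dot(&a2)); // 11
    println!("{}", b.dot(&b)); // 2
    // a1.dot(&b); // would not compile: expected TypedEmbedding<ModelA>, found TypedEmbedding<ModelB>
}
```

Because the space is carried only by `PhantomData`, the check happens entirely at compile time and adds nothing to the size of each embedding.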