§rbert
A Rust wrapper for BERT sentence transformers implemented in Candle
§Usage
use kalosm_language_model::Embedder;
use rbert::*;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut bert = Bert::new().await?;
    let sentences = [
        "Cats are cool",
        "The geopolitical situation is dire",
        "Pets are great",
        "Napoleon was a tyrant",
        "Napoleon was a great general",
    ];
    let embeddings = bert.embed_batch(sentences).await?;
    println!("embeddings {:?}", embeddings);

    // Compute the cosine similarity between every pair of sentences
    let mut similarities = vec![];
    let n_sentences = sentences.len();
    for (i, e_i) in embeddings.iter().enumerate() {
        for j in (i + 1)..n_sentences {
            let e_j = embeddings.get(j).unwrap();
            let cosine_similarity = e_j.cosine_similarity(e_i);
            similarities.push((cosine_similarity, i, j));
        }
    }
    // Sort the pairs from most similar to least similar
    similarities.sort_by(|u, v| v.0.total_cmp(&u.0));

    for &(score, i, j) in similarities.iter() {
        println!("score: {score:.2} '{}' '{}'", sentences[i], sentences[j]);
    }

    Ok(())
}
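The usage example relies on the `cosine_similarity` method of the embedding type. To make the scoring step concrete, here is a minimal sketch on plain `f32` slices. It does not use rbert at all: the free functions `cosine_similarity` and `ranked_pairs` are hypothetical names introduced for illustration, and returning 0.0 for a zero-norm vector is an assumption of this sketch rather than documented library behavior.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm (a convention chosen for this sketch).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Score every unordered pair (i, j) with i < j,
/// then sort from most similar to least similar.
fn ranked_pairs(vectors: &[Vec<f32>]) -> Vec<(f32, usize, usize)> {
    let mut pairs = Vec::new();
    for i in 0..vectors.len() {
        for j in (i + 1)..vectors.len() {
            pairs.push((cosine_similarity(&vectors[i], &vectors[j]), i, j));
        }
    }
    pairs.sort_by(|u, v| v.0.total_cmp(&u.0));
    pairs
}
```

Calling `ranked_pairs` on a slice of vectors mirrors the nested loop in the example above: each pair is scored once, and `total_cmp` gives a total order over the scores so the sort cannot panic on NaN.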
Re-exports§
pub use crate::Bert;
Structs§
- A BERT embedding model. The main interface for this model is EmbedderExt.
- A builder for a Bert model.
- A raw synchronous BERT model. You should generally use super::Bert instead.
- The source of a crate::Bert model.
- A vector space for BERT sentence embeddings.
- The configuration of a BertModel.
- Embeddings
- The input to an embedding model. This includes the text to be embedded and the type of embedding to output.
Enums§
- The type of embedding the model should output. For models that output different embeddings for queries and documents, this
- The pooling strategy to use when embedding text.
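A pooling strategy reduces the per-token outputs of the model to a single sentence vector. The sketch below illustrates two common strategies, first-token pooling and masked mean pooling. The `PoolingSketch` enum, its variants, and the `pool` function are hypothetical names for illustration only and should not be read as rbert's actual pooling enum or implementation.

```rust
/// Hypothetical pooling options for this sketch; not rbert's actual enum.
#[derive(Clone, Copy, Debug)]
enum PoolingSketch {
    /// Use the hidden state of the first token (conventionally `[CLS]`).
    FirstToken,
    /// Average the hidden states of all non-padding tokens.
    Mean,
}

/// Reduce per-token hidden states (`tokens[t][d]`) to one sentence vector.
/// `mask[t]` is true for real tokens and false for padding.
fn pool(tokens: &[Vec<f32>], mask: &[bool], strategy: PoolingSketch) -> Vec<f32> {
    assert_eq!(tokens.len(), mask.len(), "one mask entry per token");
    assert!(!tokens.is_empty(), "need at least one token");
    let dim = tokens[0].len();
    match strategy {
        PoolingSketch::FirstToken => tokens[0].clone(),
        PoolingSketch::Mean => {
            let mut sum = vec![0.0f32; dim];
            let mut count = 0usize;
            for (tok, &keep) in tokens.iter().zip(mask) {
                if keep {
                    for (s, x) in sum.iter_mut().zip(tok) {
                        *s += x;
                    }
                    count += 1;
                }
            }
            // Guard against division by zero when every token is masked out.
            let denom = count.max(1) as f32;
            sum.into_iter().map(|s| s / denom).collect()
        }
    }
}
```

The mask matters for batched input: shorter sentences are padded to a common length, and averaging over padding positions would pull the sentence vector toward whatever the model emits for padding tokens.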
Traits§
- A model that can be used to embed text. This trait is generic over the vector space that the model uses to help keep track of what embeddings came from which model.
- An extension trait for Embedder that allows for caching embeddings.
- An extension trait for Embedder with helper methods for iterators, and types that can be converted into a string.
- A builder that can create a model asynchronously.
- The type of a vector space marks what model the vector space is from. You should only combine vector spaces that come from the same model.
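The idea of tagging embeddings with the vector space they came from can be sketched with a marker type and `PhantomData`. Everything below (`SpaceSketch`, `ModelA`, `ModelB`, `TaggedEmbedding`) is a hypothetical illustration of the pattern, not the crate's actual trait or types.

```rust
use std::marker::PhantomData;

/// Hypothetical marker trait: one implementing type per model.
trait SpaceSketch {}

/// Marker types for two (hypothetical) models.
struct ModelA;
struct ModelB;
impl SpaceSketch for ModelA {}
impl SpaceSketch for ModelB {}

/// An embedding tagged at the type level with the space it came from.
struct TaggedEmbedding<S: SpaceSketch> {
    values: Vec<f32>,
    _space: PhantomData<S>,
}

impl<S: SpaceSketch> TaggedEmbedding<S> {
    fn new(values: Vec<f32>) -> Self {
        Self { values, _space: PhantomData }
    }

    /// Dot product; `other` must carry the same marker `S`.
    /// Passing a `TaggedEmbedding<ModelB>` to a `TaggedEmbedding<ModelA>`
    /// is rejected at compile time.
    fn dot(&self, other: &TaggedEmbedding<S>) -> f32 {
        self.values
            .iter()
            .zip(&other.values)
            .map(|(a, b)| a * b)
            .sum()
    }
}
```

Because the marker lives only in the type, it costs nothing at runtime. Comparing vectors from two different models is meaningless even when their dimensions match, and this pattern turns that mistake into a type error.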