Sentence Embeddings pipeline
Compute sentence/text embeddings that can be compared (e.g. with cosine-similarity) to find sentences with a similar meaning. This can be useful for semantic textual similarity, semantic search, or paraphrase mining.
The implementation is based on Sentence-Transformers, and pretrained models
available on the Hugging Face Hub can be used. However, they must first be
converted using the script utils/convert_model.py; see
tests/sentence_embeddings.rs for examples.
Basic usage is as follows:
use rust_bert::pipelines::sentence_embeddings::SentenceEmbeddingsBuilder;
let model = SentenceEmbeddingsBuilder::local("local/path/to/distiluse-base-multilingual-cased")
.with_device(tch::Device::cuda_if_available())
.create_model()?;
let sentences = ["This is an example sentence", "Each sentence is converted"];
let embeddings = model.encode(&sentences)?;
Re-exports
pub use builder::SentenceEmbeddingsBuilder;
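The embeddings returned by encode can be compared pairwise with cosine similarity, as suggested above. The crate does not need to provide this itself; a minimal sketch on plain f32 slices (the toy vectors below stand in for real model output) could look like:

```rust
/// Cosine similarity between two embedding vectors of equal length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Toy vectors standing in for embeddings from model.encode(...).
    let a = vec![1.0_f32, 0.0, 1.0];
    let b = vec![1.0_f32, 0.0, 1.0];
    let c = vec![0.0_f32, 1.0, 0.0];
    // Identical vectors have similarity 1; orthogonal vectors have 0.
    assert!((cosine_similarity(&a, &b) - 1.0).abs() < 1e-6);
    assert!(cosine_similarity(&a, &c).abs() < 1e-6);
}
```

For paraphrase mining over many sentences, the same function can be applied to every pair of rows in the encoded output.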
Modules
Structs
- SentenceEmbeddingsConfig - Configuration for sentence embeddings
- SentenceEmbeddingsConfigResources - Pretrained config files for sentence embeddings
- SentenceEmbeddingsDenseConfigResources - Pretrained dense config files for sentence embeddings
- SentenceEmbeddingsDenseResources - Pretrained dense weights files for sentence embeddings
- SentenceEmbeddingsModel - SentenceEmbeddingsModel to perform sentence embeddings
- SentenceEmbeddingsModelOutput - Container for the SentenceEmbeddings model output.
- SentenceEmbeddingsModuleConfig - Configuration defining a single module (model’s layer)
- SentenceEmbeddingsModulesConfig - Configuration for the modules that define the model’s layers
- SentenceEmbeddingsModulesConfigResources - Pretrained config files for sentence embeddings
- SentenceEmbeddingsPoolingConfigResources - Pretrained pooling config files for sentence embeddings
- SentenceEmbeddingsSentenceBertConfig - Configuration for Sentence-Transformers specific parameters
- SentenceEmbeddingsTokenizerConfig - Configuration for transformer’s tokenizer
- SentenceEmbeddingsTokenizerConfigResources - Pretrained tokenizer config files for sentence embeddings
- SentenceEmbeddingsTokenizerOutput - Container for the SentenceEmbeddings tokenizer output.
Enums
- SentenceEmbeddingsModelType
- SentenceEmbeddingsModuleType - Available module types, based on Sentence-Transformers
- SentenceEmbeddingsOption - Abstraction that holds one particular sentence embeddings model, for any of the supported models
Type Aliases
- Attention - Length = sequence length
- AttentionHead - Length = sequence length
- AttentionLayer - Length = number of heads per attention layer
- AttentionOutput - Length = number of attention layers
- Embedding
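The "Length = ..." notes read naturally as nested vectors, where each alias wraps the previous one. The alias definitions below are an illustrative assumption, not copied from the crate source; check the actual type aliases in rust_bert before relying on them:

```rust
// Assumed definitions for illustration only; the crate's actual aliases
// may differ. Each level adds one dimension of nesting.
type Attention = Vec<f32>;                // length = sequence length
type AttentionHead = Vec<Attention>;      // length = sequence length
type AttentionLayer = Vec<AttentionHead>; // length = number of heads per attention layer
type AttentionOutput = Vec<AttentionLayer>; // length = number of attention layers

fn main() {
    // A toy attention output: 2 layers, 3 heads, sequence length 4.
    let output: AttentionOutput = vec![vec![vec![vec![0.0_f32; 4]; 4]; 3]; 2];
    assert_eq!(output.len(), 2);          // layers
    assert_eq!(output[0].len(), 3);       // heads per layer
    assert_eq!(output[0][0].len(), 4);    // rows = sequence length
    assert_eq!(output[0][0][0].len(), 4); // columns = sequence length
}
```

Under this reading, AttentionOutput[layer][head][i][j] would be the attention weight from token i to token j.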