Module sentence_embeddings


§Sentence Embeddings pipeline

Compute sentence/text embeddings that can be compared (e.g. with cosine-similarity) to find sentences with a similar meaning. This can be useful for semantic textual similarity, semantic search, or paraphrase mining.

The implementation is based on Sentence-Transformers, and pretrained models available on the Hugging Face Hub can be used. However, they must first be converted with the utils/convert_model.py script; see tests/sentence_embeddings.rs for examples.

Basic usage is as follows:

use rust_bert::pipelines::sentence_embeddings::SentenceEmbeddingsBuilder;

// Load a locally converted Sentence-Transformers model, using the GPU if one is available.
let model = SentenceEmbeddingsBuilder::local("local/path/to/distiluse-base-multilingual-cased")
    .with_device(tch::Device::cuda_if_available())
    .create_model()?;

// Encode the sentences into fixed-size embedding vectors.
let sentences = ["This is an example sentence", "Each sentence is converted"];
let embeddings = model.encode(&sentences)?;
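
The returned embeddings can then be compared pairwise, e.g. with cosine similarity. A minimal sketch, assuming encode yields one Vec<f32> per input sentence (see the Embedding type alias below); the cosine_similarity helper is illustrative and not part of the crate:

// Illustrative helper (not part of rust_bert): cosine similarity between two embeddings.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

let similarity = cosine_similarity(&embeddings[0], &embeddings[1]);
println!("Cosine similarity: {}", similarity);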

Re-exports§

pub use builder::SentenceEmbeddingsBuilder;

Modules§

builder
layers

Structs§

SentenceEmbeddingsConfig
Configuration for sentence embeddings
SentenceEmbeddingsConfigResources
Pretrained config files for sentence embeddings
SentenceEmbeddingsDenseConfigResources
Pretrained dense config files for sentence embeddings
SentenceEmbeddingsDenseResources
Pretrained dense weights files for sentence embeddings
SentenceEmbeddingsModel
Model to perform sentence embeddings
SentenceEmbeddingsModelOutput
Container for the SentenceEmbeddings model output.
SentenceEmbeddingsModuleConfig
Configuration defining a single module (model’s layer)
SentenceEmbeddingsModulesConfig
Configuration for the modules that define the model’s layers
SentenceEmbeddingsModulesConfigResources
Pretrained modules config files for sentence embeddings
SentenceEmbeddingsPoolingConfigResources
Pretrained pooling config files for sentence embeddings
SentenceEmbeddingsSentenceBertConfig
Configuration for Sentence-Transformers specific parameters
SentenceEmbeddingsTokenizerConfig
Configuration for transformer’s tokenizer
SentenceEmbeddingsTokenizerConfigResources
Pretrained tokenizer config files for sentence embeddings
SentenceEmbeddingsTokenizerOutput
Container for the SentenceEmbeddings tokenizer output.
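
Most of these structs are consumed indirectly through the builder, while SentenceEmbeddingsModelOutput is the container returned by the tensor-level encoding path. A minimal sketch, assuming the model exposes an encode_as_tensor method whose output carries the pooled embeddings as a tch::Tensor in an embeddings field (verify the exact method name and fields in the struct documentation):

use rust_bert::pipelines::sentence_embeddings::SentenceEmbeddingsBuilder;

// Assumed API: encode_as_tensor returns a SentenceEmbeddingsModelOutput whose
// `embeddings` field is a tch::Tensor of shape [number of sentences, embedding size].
let model = SentenceEmbeddingsBuilder::local("local/path/to/distiluse-base-multilingual-cased")
    .create_model()?;
let output = model.encode_as_tensor(&["This is an example sentence"])?;
println!("Embeddings shape: {:?}", output.embeddings.size());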

Enums§

SentenceEmbeddingsModelType
SentenceEmbeddingsModuleType
Available module types, based on Sentence-Transformers
SentenceEmbeddingsOption
Abstraction that holds one particular sentence embeddings model, for any of the supported models
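
SentenceEmbeddingsModelType identifies the pretrained models that can be fetched directly from the Hugging Face Hub through the builder. A short sketch, assuming the remote constructor on SentenceEmbeddingsBuilder and the AllMiniLmL12V2 variant (as used in the crate's examples):

use rust_bert::pipelines::sentence_embeddings::{
    SentenceEmbeddingsBuilder, SentenceEmbeddingsModelType,
};

// Download (or reuse from cache) a pretrained model identified by its model type,
// then encode a sentence with it.
let model = SentenceEmbeddingsBuilder::remote(SentenceEmbeddingsModelType::AllMiniLmL12V2)
    .create_model()?;
let embeddings = model.encode(&["This is an example sentence"])?;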

Type Aliases§

Attention
Length = sequence length
AttentionHead
Length = sequence length
AttentionLayer
Length = number of heads per attention layer
AttentionOutput
Length = number of attention layers
Embedding
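
Taken together, these aliases describe a nested structure: an AttentionOutput holds one AttentionLayer per attention layer, each layer holds one AttentionHead per head, and each head holds one Attention row of per-token weights for every sequence position. A minimal sketch of walking that nesting, assuming it matches the descriptions above (how the AttentionOutput is obtained from the model is not shown here):

use rust_bert::pipelines::sentence_embeddings::AttentionOutput;

// Assumed nesting: layers -> heads -> query positions -> per-key weights.
fn first_attention_weight(attention_output: &AttentionOutput) -> Option<f32> {
    attention_output
        .first() // first attention layer
        .and_then(|layer| layer.first()) // first head in that layer
        .and_then(|head| head.first()) // attention row of the first token
        .and_then(|row| row.first()) // weight toward the first token
        .copied()
}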