Module sentence_embeddings


§Sentence Embeddings pipeline

Compute sentence/text embeddings that can be compared (e.g. with cosine-similarity) to find sentences with a similar meaning. This can be useful for semantic textual similarity, semantic search, or paraphrase mining.

The implementation is based on Sentence-Transformers, and pretrained models available on the Hugging Face Hub can be used. However, they must first be converted with the utils/convert_model.py script; see tests/sentence_embeddings.rs for examples.

Basic usage is as follows:

use rust_bert::pipelines::sentence_embeddings::SentenceEmbeddingsBuilder;

// Load a locally converted Sentence-Transformers model, using the GPU if one is available.
let model = SentenceEmbeddingsBuilder::local("local/path/to/distiluse-base-multilingual-cased")
    .with_device(tch::Device::cuda_if_available())
    .create_model()?;

// Encode the sentences into fixed-size embedding vectors.
let sentences = ["This is an example sentence", "Each sentence is converted"];
let embeddings = model.encode(&sentences)?;
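
The returned embeddings can then be compared pairwise, e.g. with cosine similarity. A minimal sketch, assuming encode yields one Vec<f32> per input sentence (see the Embedding type alias below); the cosine_similarity helper is illustrative and not part of the crate:

// Illustrative helper (not part of rust_bert): cosine similarity between two embeddings.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

let similarity = cosine_similarity(&embeddings[0], &embeddings[1]);
println!("Cosine similarity: {}", similarity);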

Re-exports§

pub use builder::SentenceEmbeddingsBuilder;

Modules§

builder
layers

Structs§

SentenceEmbeddingsConfig
Configuration for sentence embeddings
SentenceEmbeddingsConfigResources
Pretrained config files for sentence embeddings
SentenceEmbeddingsDenseConfigResources
Pretrained dense config files for sentence embeddings
SentenceEmbeddingsDenseResources
Pretrained dense weights files for sentence embeddings
SentenceEmbeddingsModel
Model to perform sentence embeddings
SentenceEmbeddingsModelOutput
Container for the SentenceEmbeddings model output.
SentenceEmbeddingsModuleConfig
Configuration defining a single module (model’s layer)
SentenceEmbeddingsModulesConfig
Configuration for the modules that define the model’s layers
SentenceEmbeddingsModulesConfigResources
Pretrained modules config files for sentence embeddings
SentenceEmbeddingsPoolingConfigResources
Pretrained pooling config files for sentence embeddings
SentenceEmbeddingsSentenceBertConfig
Configuration for Sentence-Transformers specific parameters
SentenceEmbeddingsTokenizerConfig
Configuration for transformer’s tokenizer
SentenceEmbeddingsTokenizerConfigResources
Pretrained tokenizer config files for sentence embeddings
SentenceEmbeddingsTokenizerOutput
Container for the SentenceEmbeddings tokenizer output.
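
Most of these structs are consumed indirectly through the builder, while SentenceEmbeddingsModelOutput is the container returned by the tensor-level encoding path. A minimal sketch, assuming the model exposes an encode_as_tensor method whose output carries the pooled embeddings as a tch::Tensor in an embeddings field (verify the exact method name and fields in the struct documentation):

use rust_bert::pipelines::sentence_embeddings::SentenceEmbeddingsBuilder;

// Assumed API: encode_as_tensor returns a SentenceEmbeddingsModelOutput whose
// `embeddings` field is a tch::Tensor of shape [number of sentences, embedding size].
let model = SentenceEmbeddingsBuilder::local("local/path/to/distiluse-base-multilingual-cased")
    .create_model()?;
let output = model.encode_as_tensor(&["This is an example sentence"])?;
println!("Embeddings shape: {:?}", output.embeddings.size());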

Enums§

SentenceEmbeddingsModelType
SentenceEmbeddingsModuleType
Available module types, based on Sentence-Transformers
SentenceEmbeddingsOption
Abstraction that holds one particular sentence embeddings model, for any of the supported models
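
SentenceEmbeddingsModelType identifies the pretrained models that can be fetched directly from the Hugging Face Hub through the builder. A short sketch, assuming the remote constructor on SentenceEmbeddingsBuilder and the AllMiniLmL12V2 variant (as used in the crate's examples):

use rust_bert::pipelines::sentence_embeddings::{
    SentenceEmbeddingsBuilder, SentenceEmbeddingsModelType,
};

// Download (or reuse from cache) a pretrained model identified by its model type,
// then encode a sentence with it.
let model = SentenceEmbeddingsBuilder::remote(SentenceEmbeddingsModelType::AllMiniLmL12V2)
    .create_model()?;
let embeddings = model.encode(&["This is an example sentence"])?;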

Type Aliases§

Attention
Length = sequence length
AttentionHead
Length = sequence length
AttentionLayer
Length = number of heads per attention layer
AttentionOutput
Length = number of attention layers
Embedding
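
Taken together, these aliases describe a nested structure: an AttentionOutput holds one AttentionLayer per attention layer, each layer holds one AttentionHead per head, and each head holds one Attention row of per-token weights for every sequence position. A minimal sketch of walking that nesting, assuming it matches the descriptions above (how the AttentionOutput is obtained from the model is not shown here):

use rust_bert::pipelines::sentence_embeddings::AttentionOutput;

// Assumed nesting: layers -> heads -> query positions -> per-key weights.
fn first_attention_weight(attention_output: &AttentionOutput) -> Option<f32> {
    attention_output
        .first() // first attention layer
        .and_then(|layer| layer.first()) // first head in that layer
        .and_then(|head| head.first()) // attention row of the first token
        .and_then(|row| row.first()) // weight toward the first token
        .copied()
}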