Trait BiEncoder
pub trait BiEncoder: Send + Sync {
    // Required methods
    fn text_encoder(&self) -> &dyn TextEncoder;
    fn label_encoder(&self) -> &dyn LabelEncoder;
    fn encode_and_match(
        &self,
        text: &str,
        labels: &[&str],
        max_span_width: usize,
    ) -> Result<Vec<SpanLabelScore>>;
}

Bi-encoder architecture combining text and label encoders.

§Motivation

The bi-encoder architecture treats NER as a matching problem rather than a classification problem. It encodes text spans and entity labels separately, then computes similarity scores to determine matches.

┌─────────────────┐         ┌─────────────────┐
│   Text Input    │         │  Label Desc.    │
│ "Steve Jobs"    │         │ "person name"   │
└────────┬────────┘         └────────┬────────┘
         │                           │
         ▼                           ▼
┌─────────────────┐         ┌─────────────────┐
│  TextEncoder    │         │  LabelEncoder   │
│  (ModernBERT)   │         │  (BGE-small)    │
└────────┬────────┘         └────────┬────────┘
         │                           │
         ▼                           ▼
┌─────────────────┐         ┌─────────────────┐
│ Span Embedding  │◄───────►│ Label Embedding │
│   [768]         │ cosine  │   [768]         │
└─────────────────┘ sim     └─────────────────┘
                     │
                     ▼
              Score: 0.92
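The matching step at the bottom of the diagram is a cosine similarity between the span and label embeddings. A minimal sketch of that computation (an illustration, not the crate's internal implementation, which may pre-normalize embeddings and use a plain dot product):

```rust
/// Cosine similarity between two embedding vectors.
/// Returns 0.0 for a zero-length vector to avoid division by zero.
fn cosine_sim(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    // Identical directions score 1.0; orthogonal directions score 0.0.
    let span = [0.6, 0.8, 0.0];
    let label = [0.6, 0.8, 0.0];
    println!("{:.2}", cosine_sim(&span, &label));
}
```

In practice both embeddings come from the 768-dimensional outputs shown above; any span-label pair whose score clears a threshold is reported as a match.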

§Trade-offs

| Aspect          | Bi-Encoder                    | Uni-Encoder       |
|-----------------|-------------------------------|-------------------|
| Entity types    | Unlimited                     | Fixed at training |
| Inference speed | Faster (pre-compute labels)   | Slower            |
| Disambiguation  | Harder (no label interaction) | Better            |
| Generalization  | Better to new types           | Limited           |
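The "faster inference" advantage follows from the architecture: label embeddings do not depend on the input text, so they can be computed once and reused across every document. A hedged sketch of that caching pattern (`LabelCache` is a hypothetical illustration, not a type provided by `anno`):

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical cache showing why bi-encoders allow faster inference:
/// label embeddings are text-independent and can be reused.
struct LabelCache {
    embeddings: HashMap<String, Vec<f32>>,
}

impl LabelCache {
    fn new() -> Self {
        Self { embeddings: HashMap::new() }
    }

    /// Return a cached embedding, encoding only on the first request.
    /// `encode` stands in for a real `LabelEncoder` call.
    fn get_or_encode(
        &mut self,
        label: &str,
        encode: impl Fn(&str) -> Vec<f32>,
    ) -> &Vec<f32> {
        self.embeddings
            .entry(label.to_string())
            .or_insert_with(|| encode(label))
    }
}

fn main() {
    let mut cache = LabelCache::new();
    let calls = Cell::new(0);
    // Stand-in encoder that counts invocations instead of running a model.
    let encode = |label: &str| {
        calls.set(calls.get() + 1);
        vec![label.len() as f32] // dummy 1-d "embedding"
    };
    cache.get_or_encode("person", &encode);
    cache.get_or_encode("person", &encode); // cache hit: encoder not called again
    println!("encoder calls: {}", calls.get());
}
```

A uni-encoder cannot do this, because every label must be re-processed jointly with each new input text.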

§Research Alignment

From GLiNER: “GLiNER frames NER as a matching problem, comparing candidate spans with entity type embeddings.”

From knowledgator: “Bi-encoder architecture brings several advantages… unlimited entities, faster inference, better generalization.”

Drawback: “Lack of inter-label interactions that make it hard to disambiguate semantically similar but contextually different entities.”

§Example

use anno::BiEncoder;

fn extract_custom_entities(bi_enc: &dyn BiEncoder, text: &str) {
    // Labels are free-form descriptions, not a fixed tag set.
    let labels = &["software company", "hardware manufacturer", "person"];
    let scores = bi_enc.encode_and_match(text, labels, 8).unwrap();

    // Keep only confident matches; 0.5 is an arbitrary threshold.
    for s in scores.iter().filter(|s| s.score > 0.5) {
        println!("Found '{}' as type {} (score: {:.2})",
                 &text[s.start..s.end], labels[s.label_idx], s.score);
    }
}

Required Methods§


fn text_encoder(&self) -> &dyn TextEncoder

Get the text encoder.


fn label_encoder(&self) -> &dyn LabelEncoder

Get the label encoder.


fn encode_and_match(
    &self,
    text: &str,
    labels: &[&str],
    max_span_width: usize,
) -> Result<Vec<SpanLabelScore>>

Encode text and labels, compute span-label similarities.

§Arguments
  • text - Input text
  • labels - Entity type descriptions
  • max_span_width - Maximum span width to consider
§Returns

Similarity scores for each (span, label) pair
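`max_span_width` bounds the candidate set: for a text of n tokens, roughly n·w spans are scored instead of all O(n²) possible spans, and each one is compared against every label. A sketch of the candidate enumeration this implies (an assumption for illustration; the crate's actual span logic, including byte vs. token offsets, may differ):

```rust
/// Enumerate (start, end) token spans up to `max_span_width` tokens wide.
/// Spans are half-open: [start, end).
fn candidate_spans(n_tokens: usize, max_span_width: usize) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    for start in 0..n_tokens {
        for width in 1..=max_span_width {
            let end = start + width;
            if end > n_tokens {
                break; // span would run past the end of the text
            }
            spans.push((start, end));
        }
    }
    spans
}

fn main() {
    // 5 tokens, max width 2 → 5 one-token + 4 two-token spans = 9 candidates.
    println!("{}", candidate_spans(5, 2).len());
}
```

The returned `Vec<SpanLabelScore>` then holds one score per (span, label) pair, so its length is at most `candidate_spans * labels.len()`.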

Implementors§