Trait BiEncoder
pub trait BiEncoder: Send + Sync {
    // Required methods
    fn text_encoder(&self) -> &dyn TextEncoder;
    fn label_encoder(&self) -> &dyn LabelEncoder;
    fn encode_and_match(
        &self,
        text: &str,
        labels: &[&str],
        max_span_width: usize,
    ) -> Result<Vec<SpanLabelScore>>;
}

Bi-encoder architecture combining text and label encoders.

§Motivation

The bi-encoder architecture treats NER as a matching problem rather than a classification problem. It encodes text spans and entity labels separately, then computes similarity scores to determine matches.

┌─────────────────┐         ┌─────────────────┐
│   Text Input    │         │  Label Desc.    │
│ "Steve Jobs"    │         │ "person name"   │
└────────┬────────┘         └────────┬────────┘
         │                           │
         ▼                           ▼
┌─────────────────┐         ┌─────────────────┐
│  TextEncoder    │         │  LabelEncoder   │
│  (ModernBERT)   │         │  (BGE-small)    │
└────────┬────────┘         └────────┬────────┘
         │                           │
         ▼                           ▼
┌─────────────────┐         ┌─────────────────┐
│ Span Embedding  │◄───────►│ Label Embedding │
│   [768]         │ cosine  │   [768]         │
└─────────────────┘ sim     └─────────────────┘
                     │
                     ▼
              Score: 0.92
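The matching step at the bottom of the diagram is a cosine similarity between the span and label embeddings. A minimal sketch of that computation (an illustration, not the crate's internal implementation, which may pre-normalize embeddings and use a plain dot product):

```rust
/// Cosine similarity between two embedding vectors.
/// Returns 0.0 for a zero-length vector to avoid division by zero.
fn cosine_sim(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    // Identical directions score 1.0; orthogonal directions score 0.0.
    let span = [0.6, 0.8, 0.0];
    let label = [0.6, 0.8, 0.0];
    println!("{:.2}", cosine_sim(&span, &label));
}
```

In practice both embeddings come from the 768-dimensional outputs shown above; any span-label pair whose score clears a threshold is reported as a match.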

§Trade-offs

| Aspect          | Bi-Encoder                    | Uni-Encoder       |
|-----------------|-------------------------------|-------------------|
| Entity types    | Unlimited                     | Fixed at training |
| Inference speed | Faster (pre-compute labels)   | Slower            |
| Disambiguation  | Harder (no label interaction) | Better            |
| Generalization  | Better to new types           | Limited           |
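The "faster inference" advantage follows from the architecture: label embeddings do not depend on the input text, so they can be computed once and reused across every document. A hedged sketch of that caching pattern (`LabelCache` is a hypothetical illustration, not a type provided by `anno`):

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical cache showing why bi-encoders allow faster inference:
/// label embeddings are text-independent and can be reused.
struct LabelCache {
    embeddings: HashMap<String, Vec<f32>>,
}

impl LabelCache {
    fn new() -> Self {
        Self { embeddings: HashMap::new() }
    }

    /// Return a cached embedding, encoding only on the first request.
    /// `encode` stands in for a real `LabelEncoder` call.
    fn get_or_encode(
        &mut self,
        label: &str,
        encode: impl Fn(&str) -> Vec<f32>,
    ) -> &Vec<f32> {
        self.embeddings
            .entry(label.to_string())
            .or_insert_with(|| encode(label))
    }
}

fn main() {
    let mut cache = LabelCache::new();
    let calls = Cell::new(0);
    // Stand-in encoder that counts invocations instead of running a model.
    let encode = |label: &str| {
        calls.set(calls.get() + 1);
        vec![label.len() as f32] // dummy 1-d "embedding"
    };
    cache.get_or_encode("person", &encode);
    cache.get_or_encode("person", &encode); // cache hit: encoder not called again
    println!("encoder calls: {}", calls.get());
}
```

A uni-encoder cannot do this, because every label must be re-processed jointly with each new input text.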

§Research Alignment

From GLiNER: “GLiNER frames NER as a matching problem, comparing candidate spans with entity type embeddings.”

From knowledgator: “Bi-encoder architecture brings several advantages… unlimited entities, faster inference, better generalization.”

Drawback: “Lack of inter-label interactions that make it hard to disambiguate semantically similar but contextually different entities.”

§Example

use anno::BiEncoder;

fn extract_custom_entities(bi_enc: &dyn BiEncoder, text: &str) {
    // Labels are free-form descriptions, not a fixed tag set.
    let labels = &["software company", "hardware manufacturer", "person"];
    let scores = bi_enc.encode_and_match(text, labels, 8).unwrap();

    // Keep only confident matches; 0.5 is an arbitrary threshold.
    for s in scores.iter().filter(|s| s.score > 0.5) {
        println!("Found '{}' as type {} (score: {:.2})",
                 &text[s.start..s.end], labels[s.label_idx], s.score);
    }
}

Required Methods§


fn text_encoder(&self) -> &dyn TextEncoder

Get the text encoder.


fn label_encoder(&self) -> &dyn LabelEncoder

Get the label encoder.


fn encode_and_match(
    &self,
    text: &str,
    labels: &[&str],
    max_span_width: usize,
) -> Result<Vec<SpanLabelScore>>

Encode text and labels, compute span-label similarities.

§Arguments
  • text - Input text
  • labels - Entity type descriptions
  • max_span_width - Maximum span width to consider
§Returns

Similarity scores for each (span, label) pair
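`max_span_width` bounds the candidate set: for a text of n tokens, roughly n·w spans are scored instead of all O(n²) possible spans, and each one is compared against every label. A sketch of the candidate enumeration this implies (an assumption for illustration; the crate's actual span logic, including byte vs. token offsets, may differ):

```rust
/// Enumerate (start, end) token spans up to `max_span_width` tokens wide.
/// Spans are half-open: [start, end).
fn candidate_spans(n_tokens: usize, max_span_width: usize) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    for start in 0..n_tokens {
        for width in 1..=max_span_width {
            let end = start + width;
            if end > n_tokens {
                break; // span would run past the end of the text
            }
            spans.push((start, end));
        }
    }
    spans
}

fn main() {
    // 5 tokens, max width 2 → 5 one-token + 4 two-token spans = 9 candidates.
    println!("{}", candidate_spans(5, 2).len());
}
```

The returned `Vec<SpanLabelScore>` then holds one score per (span, label) pair, so its length is at most `candidate_spans * labels.len()`.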

Implementors§