anno
Extract named entities, relations, coreference chains, and PII from unstructured text. Fixed entity types (PER/ORG/LOC/MISC) or zero-shot custom labels.
Dual-licensed under MIT or Apache-2.0. MSRV: 1.85.
Quickstart
[dependencies]
anno = "0.3.9"
let entities = extract?;
for e in &entities
// Sophie Wilson [PER] (0,13) 0.95
// ARM [ORG] (27,30) 0.90
# Ok::
Filter results with the prelude:
use *;
let people: = entities.of_type.collect;
let confident: = entities.above_confidence.collect;
For backend control, construct a model directly:
use ;
let m = default;
let ents = m.extract_entities?;
# Ok::
StackedNER::default() selects the best available backend at runtime: BERT or NuNER (if the onnx feature is enabled and the models are cached), then GLiNER, falling back to heuristic + pattern extraction. Set ANNO_NO_DOWNLOADS=1 or HF_HUB_OFFLINE=1 to force cached-only behavior.
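If your own code needs to make the same cached-only decision (for example, to skip a warm-up download in a test harness), the check can be sketched as below. This is not anno's implementation: the function names are hypothetical, and treating only the literal value "1" as "set" is an assumption.

```rust
use std::env;

/// Returns true when either offline switch is set to "1".
/// Assumption: only the literal value "1" enables cached-only mode.
/// `lookup` is injected so the logic can be tested without touching the
/// real process environment.
fn downloads_disabled_with<F>(lookup: F) -> bool
where
    F: Fn(&str) -> Option<String>,
{
    ["ANNO_NO_DOWNLOADS", "HF_HUB_OFFLINE"]
        .iter()
        .any(|name| lookup(*name).as_deref() == Some("1"))
}

/// Convenience wrapper (hypothetical) that reads the process environment.
fn downloads_disabled() -> bool {
    downloads_disabled_with(|name| env::var(name).ok())
}
```

Injecting the lookup keeps the decision a pure function, so it can be tested without mutating global process state.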
Zero-shot custom types via GLiNER:
use GLiNEROnnx;
let m = new?;
let ents = m.extract?;
for e in &ents
// drug: Aspirin
// symptom: headaches
# Ok::
Custom backends
AnyModel wraps a closure into a Model, bypassing the sealed trait when you need to plug in an external NER system:
use ;
let model = new;
let ents = model.extract_entities?;
# Ok::
What it does
Named entity recognition. Spans (start, end, type, confidence) with character offsets (Unicode scalar values, not bytes). Fixed taxonomies (PER/ORG/LOC/MISC) or caller-defined labels for zero-shot extraction [1, 2].
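Because Rust strings are indexed by byte, a character-offset span has to be converted before slicing. The helpers below are a std-only sketch of that conversion. The names char_span_to_bytes and slice_chars are hypothetical and not part of anno; the sample sentence in the usage note is an assumption chosen to agree with the offsets shown in the Quickstart.

```rust
/// Convert a half-open character range (Unicode scalar values) into the
/// matching byte range. Returns None if the range is reversed or runs past
/// the end of the text.
fn char_span_to_bytes(text: &str, start: usize, end: usize) -> Option<(usize, usize)> {
    if start > end {
        return None;
    }
    // Byte offset of every char boundary, including the end of the string.
    let mut boundaries = text
        .char_indices()
        .map(|(byte, _)| byte)
        .chain(std::iter::once(text.len()));
    let byte_start = boundaries.nth(start)?;
    let byte_end = if end == start {
        byte_start
    } else {
        // `nth` already consumed `start + 1` boundaries, so skip the rest.
        boundaries.nth(end - start - 1)?
    };
    Some((byte_start, byte_end))
}

/// Slice `text` by character offsets instead of byte offsets.
fn slice_chars(text: &str, start: usize, end: usize) -> Option<&str> {
    let (a, b) = char_span_to_bytes(text, start, end)?;
    text.get(a..b)
}
```

With the text "Sophie Wilson designed the ARM processor.", slice_chars(text, 27, 30) yields "ARM". For ASCII-only input the byte and character ranges coincide; they diverge as soon as a multi-byte character precedes the span.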
Coreference resolution. Group mentions into clusters tracking the same referent. Rule-based sieves (SimpleCorefResolver), neural (FCoref, 78.5 F1 on CoNLL-2012 [3]), and mention-ranking (MentionRankingCoref).
Structured patterns. Dates, monetary amounts, emails, URLs, phone numbers via deterministic regex grammars.
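To illustrate what deterministic pattern matching looks like, here is a std-only stand-in for the unanchored regex \d{4}-\d{2}-\d{2}. It is not anno's grammar: PatternMatch and find_iso_dates are hypothetical names, and the scanner checks only the shape of the date, not whether the month or day is valid.

```rust
/// A pattern hit with half-open character offsets, following the span
/// convention described above. Hypothetical type, not anno's.
#[derive(Debug, PartialEq)]
struct PatternMatch {
    start: usize,
    end: usize,
    text: String,
}

/// Find every non-overlapping `YYYY-MM-DD` shaped substring, scanning left
/// to right. 'd' in SHAPE means "ASCII digit"; any other byte means '-'.
fn find_iso_dates(text: &str) -> Vec<PatternMatch> {
    const SHAPE: &[u8] = b"dddd-dd-dd";
    let chars: Vec<char> = text.chars().collect();
    let mut hits = Vec::new();
    let mut i = 0;
    while i + SHAPE.len() <= chars.len() {
        let window = &chars[i..i + SHAPE.len()];
        let is_date = window.iter().zip(SHAPE.iter()).all(|(c, &s)| {
            if s == b'd' {
                c.is_ascii_digit()
            } else {
                *c == '-'
            }
        });
        if is_date {
            hits.push(PatternMatch {
                start: i,
                end: i + SHAPE.len(),
                text: window.iter().collect(),
            });
            i += SHAPE.len(); // resume after the match, like a regex iterator
        } else {
            i += 1;
        }
    }
    hits
}
```

The same input always produces the same spans, and no model weights are involved, which is why this class of extractor needs no feature flag.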
Relation extraction. (head, relation, tail) triples via RelationCapable backends (gliner2, tplinker). Other backends produce co-occurrence edges for graph export.
PII detection. Classify NER entities as PII and scan for structured patterns (SSN, credit card, IBAN, email, phone). Redact or pseudonymize in one call:
use ;
let text = "John Smith's SSN is 123-45-6789.";
let m = default;
let ents = m.extract_entities?;
let mut pii_ents: = ents.iter.filter_map.collect;
pii_ents.extend;
let redacted = redact;
// "[REDACTED]'s SSN is [REDACTED]."
# Ok::
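The redaction step itself is a span substitution over character offsets. The function below is a minimal sketch of that idea, not anno's redact: the name redact_spans and the tuple representation of spans are assumptions made for illustration.

```rust
/// Replace each half-open character span with `mask`.
/// Overlapping spans are merged into a single mask; spans that start past
/// the end of the text are ignored.
fn redact_spans(text: &str, spans: &[(usize, usize)], mask: &str) -> String {
    let mut sorted: Vec<(usize, usize)> = spans.to_vec();
    sorted.sort();
    let mut pending = sorted.into_iter().peekable();

    let mut out = String::with_capacity(text.len());
    let mut mask_end = 0; // characters below this index are hidden
    for (i, ch) in text.chars().enumerate() {
        // Consume every span that starts at or before this character.
        while let Some(&(start, end)) = pending.peek() {
            if start > i {
                break;
            }
            // Emit the mask only when this span opens a new hidden region.
            if i >= mask_end && end > i {
                out.push_str(mask);
            }
            mask_end = mask_end.max(end);
            pending.next();
        }
        if i >= mask_end {
            out.push(ch);
        }
    }
    out
}
```

For the example sentence, the name occupies characters 0 to 10 and the SSN characters 20 to 31, so redact_spans(text, &[(0, 10), (20, 31)], "[REDACTED]") reproduces the output shown in the comment above.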
Export. Brat standoff, CoNLL BIO tags, JSONL, N-Triples, JSON-LD, and graph CSV via pure functions in anno::export.
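As an illustration of one of these formats, the sketch below derives BIO tags from character spans. It is not the code in anno::export: tokenize and bio_tags are hypothetical names, tokens are split on whitespace only, and entities are passed as (start, end, label) tuples for brevity.

```rust
use std::iter::once;

/// Split on whitespace, recording half-open character offsets per token.
fn tokenize(text: &str) -> Vec<(usize, usize, String)> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut start = 0;
    // The trailing space flushes the final token.
    for (i, ch) in text.chars().chain(once(' ')).enumerate() {
        if ch.is_whitespace() {
            if !current.is_empty() {
                tokens.push((start, i, std::mem::take(&mut current)));
            }
        } else {
            if current.is_empty() {
                start = i;
            }
            current.push(ch);
        }
    }
    tokens
}

/// Pair every token with a BIO tag: B-<label> for the first token of an
/// entity, I-<label> for later tokens of the same entity, O otherwise.
fn bio_tags(text: &str, entities: &[(usize, usize, &str)]) -> Vec<(String, String)> {
    tokenize(text)
        .into_iter()
        .map(|(tok_start, tok_end, token)| {
            // First entity whose span overlaps this token, if any.
            let hit = entities
                .iter()
                .find(|&&(start, end, _)| tok_start < end && start < tok_end);
            let tag = match hit {
                Some(&(start, _, label)) if tok_start <= start => format!("B-{label}"),
                Some(&(_, _, label)) => format!("I-{label}"),
                None => "O".to_string(),
            };
            (token, tag)
        })
        .collect()
}
```

Joining each pair with a tab and writing one pair per line gives the usual CoNLL column layout. A production exporter would also need a tokenizer that separates punctuation from words, which this sketch deliberately omits.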
Backends
| Backend | Feature | Zero-shot | Status | Reference |
|---|---|---|---|---|
| stacked (default) | -- | -- | stable | -- |
| gliner | onnx | Yes | stable | Zaratiana et al. [5] |
| gliner2 | onnx | Yes | beta | [11] |
| nuner | onnx | Yes | stable | Bogdanov et al. [6] |
| bert_onnx | onnx | No | beta | Devlin et al. [8] |
| w2ner | onnx | No | beta | Li et al. [7] |
| tplinker | onnx | No | beta | Wang et al. [10] |
| glirel | onnx | Yes | beta | -- |
| gliner_poly | onnx | Yes | beta | -- |
| gliner_candle | candle | Yes | beta | -- |
| candle_ner | candle | No | beta | -- |
| pattern | -- | N/A | stable | -- |
| heuristic | -- | No | stable | -- |
| crf | -- | No | stable | Lafferty et al. [9] |
| hmm | -- | No | stable | Rabiner [12] |
| ensemble | -- | No | beta | -- |
| bilstm_crf | -- | No | beta | -- |
| universal_ner | llm | Yes | beta | -- |
See BACKENDS.md for details, default models, and WIP backends.
ML backends are feature-gated (onnx or candle). Weights download from HuggingFace on first use.
Feature flags
| Feature | Default | Description |
|---|---|---|
| onnx | Yes | ONNX Runtime backends via ort |
| candle | No | Pure-Rust backends (no C++ runtime) |
| metal | No | Metal GPU acceleration (enables candle) |
| cuda | No | CUDA GPU acceleration (enables candle) |
| analysis | No | Coref metrics, cluster encoders |
| schema | No | JSON Schema for output types |
| llm | No | LLM-based extraction (OpenRouter, Anthropic, Groq, Gemini, Ollama) |
| production | No | parking_lot locks + tracing instrumentation |
| bundled-crf-weights | No | Embed trained CRF weights in binary |
| bundled-hmm-params | No | Embed HMM parameters in binary |
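As a configuration sketch, a build that avoids the C++ ONNX Runtime and uses the Candle backends instead might declare the dependency as follows. It relies only on the feature names in the table above; whether this combination suits your target is something to check against BACKENDS.md.

```toml
[dependencies]
# Drop the default onnx feature and opt in to the pure-Rust Candle backends.
anno = { version = "0.3.9", default-features = false, features = ["candle"] }
```

Per the table, enabling metal or cuda turns on candle as well, so either can be listed on its own in features.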
CLI
# PER:1 "Lynn Conway"
# ORG:2 "IBM" "Xerox PARC"
# LOC:1 "California"
# drug:1 "Aspirin" symptom:2 "headaches" "fever"
# Coreference: "Sophie Wilson" -> "She"
JSON output with --format json. Batch processing with anno batch. Graph export (N-Triples, JSON-LD, CSV) with anno export --features graph.
Coreference
| Backend | Type | Quality | Speed |
|---|---|---|---|
| SimpleCorefResolver | Rule-based (9 sieves) | Low | Fast |
| FCoref | Neural (DistilRoBERTa) | 78.5 F1 [3] | Medium |
| MentionRankingCoref | Mention-ranking | Medium | Medium |
FCoref requires a one-time model export: uv run scripts/export_fcoref.py (from a repo clone).
RAG preprocessing (rag::resolve_for_rag(), rag::preprocess()): rewrites pronouns so that each chunk stays self-contained after splitting. Always available (no feature flag required).
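The core of such a rewrite is substituting each pronoun span with the text of its antecedent. The function below sketches that substitution only. It is not the rag module's implementation: rewrite_mentions is a hypothetical name, and it assumes a coreference resolver has already produced (start, end, antecedent) triples in character offsets.

```rust
/// Replace each half-open character span with its antecedent text.
/// Spans that overlap an earlier rewrite, are reversed, or run past the end
/// of the text are skipped rather than applied partially.
fn rewrite_mentions(text: &str, replacements: &[(usize, usize, &str)]) -> String {
    let mut sorted = replacements.to_vec();
    sorted.sort_by_key(|&(start, _, _)| start);

    let chars: Vec<char> = text.chars().collect();
    let mut out = String::new();
    let mut cursor = 0; // first character not yet copied to `out`
    for (start, end, antecedent) in sorted {
        if start < cursor || end < start || end > chars.len() {
            continue;
        }
        out.extend(&chars[cursor..start]); // untouched text before the span
        out.push_str(antecedent);
        cursor = end;
    }
    out.extend(&chars[cursor..]); // remainder after the last rewrite
    out
}
```

For a chunk that begins "She designed the instruction set.", passing (0, 3, "Sophie Wilson") produces "Sophie Wilson designed the instruction set.", which can then be embedded or retrieved without the preceding chunk.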
Scope
Inference-time extraction. Training pipelines are out of scope -- use upstream frameworks and export ONNX weights.
Troubleshooting
- ONNX linking errors: use default-features = false for builds without C++, or check ORT_DYLIB_PATH.
- Model downloads: set HF_HUB_OFFLINE=1 for cached-only mode behind firewalls.
- Feature errors: most backends are gated behind onnx or candle.
- Offset mismatches: all spans use character offsets, not byte offsets. See CONTRACT.md.
Documentation
- QUICKSTART -- getting started
- CONTRACT -- offset semantics, scope
- BACKENDS -- backend details, feature flags
- ARCHITECTURE -- crate layout
- REFERENCES -- full bibliography
- API docs
References
[1] Grishman & Sundheim, COLING 1996. [2] Tjong Kim Sang & De Meulder, CoNLL 2003. [3] Otmazgin et al., AACL 2022 (F-COREF). [4] Jurafsky & Martin, SLP3 2024. [5] Zaratiana et al., NAACL 2024 (GLiNER). [6] Bogdanov et al., 2024 (NuNER). [7] Li et al., AAAI 2022 (W2NER). [8] Devlin et al., NAACL 2019 (BERT). [9] Lafferty et al., ICML 2001 (CRF). [10] Wang et al., COLING 2020 (TPLinker). [11] Zaratiana et al., 2025 (GLiNER2). [12] Rabiner, Proc. IEEE 1989 (HMM).
Full list: docs/REFERENCES.md. Citeable via CITATION.cff.
License
Dual-licensed under MIT or Apache-2.0.