# anno

Text annotation and entity extraction. Covers NER, coreference resolution, PII detection, relation extraction, and export to standard formats.

Multiple backends (ML, statistical, rule-based) are tried at runtime; the crate works without model downloads via built-in fallbacks.

Dual-licensed under MIT or Apache-2.0. MSRV: 1.88.
## Quickstart

```toml
[dependencies]
anno = "0.10"
```

```rust
let entities = anno::extract("Sophie Wilson designed the ARM")?;
for e in &entities {
    println!("{e:?}");
}
// Offline (heuristic only):
// Sophie Wilson [PER] (0,13) 0.60
// ARM [ORG] (27,30) 0.55
//
// With `onnx` enabled and default models cached, the ML backends raise
// confidences and add entities. `ANNO_NO_DOWNLOADS=1` blocks new
// HuggingFace fetches but still loads cached or locally-exported models.
# Ok::<(), Box<dyn std::error::Error>>(())
```
Filter results with the prelude (re-exports common types, including `Result`):

```rust
use anno::prelude::*;

# let entities = extract("Sophie Wilson designed the ARM")?;
let people: Vec<_> = entities.of_type(EntityType::Person).collect();
let confident: Vec<_> = entities.above_confidence(0.8).collect();
# Ok::<(), Box<dyn std::error::Error>>(())
```
For backend control, construct a model directly:

```rust
use anno::StackedNER;

let m = StackedNER::default();
let ents = m.extract_entities("Sophie Wilson designed the ARM")?;
# Ok::<(), Box<dyn std::error::Error>>(())
```
`StackedNER::default()` selects the best available backend at runtime: BERT ONNX and NuNER (both tried independently when `onnx` is enabled and models are cached), then GLiNER if neither loaded, falling back to pattern + heuristic extraction. Set `ANNO_NO_DOWNLOADS=1` to disable new HuggingFace downloads; cached models and any backend loaded from a local path (`from_local` / ONNX export scripts) continue to work.
Zero-shot custom types via GLiNER:

```rust
use anno::GLiNEROnnx;

let m = GLiNEROnnx::new()?;
let ents = m.extract("Aspirin relieves headaches.", &["drug", "symptom"])?;
for e in &ents {
    println!("{e:?}");
}
// drug: Aspirin
// symptom: headaches
# Ok::<(), Box<dyn std::error::Error>>(())
```
## Custom backends

`AnyModel` wraps a closure into a `Model`, bypassing the sealed trait when you need to plug in an external NER system:

```rust
use anno::AnyModel;

let model = AnyModel::new(|_text| {
    // Call out to an external NER system here and map its output
    // to anno entities.
    Ok(Vec::new())
});
let ents = model.extract_entities("...")?;
# Ok::<(), Box<dyn std::error::Error>>(())
```
## PII detection

Classify NER entities as PII and scan for structured patterns (SSN, credit card, IBAN, email, phone). Redact or pseudonymize in one call:

```rust
use anno::{scan_and_redact, StackedNER};

let text = "John Smith's SSN is 123-45-6789.";
let m = StackedNER::default();
let redacted = scan_and_redact(&m, text)?;
// "[PERSON_1]'s SSN is [ID_NUMBER_1]."
# Ok::<(), Box<dyn std::error::Error>>(())
```
## Backends

Multiple backends span ML (GLiNER, NuNER, BERT, W2NER, TPLinker, GLiRel), statistical (CRF, HMM), rule-based (pattern, heuristic), and LLM-based extraction. ML backends are feature-gated (`onnx` or `candle`); weights download from HuggingFace on first use. See BACKENDS.md for the full list, default models, and status.
## Feature flags

- `onnx` (default): ONNX Runtime backends.
- `candle`: pure-Rust backends, no C++ runtime.
- `metal` / `cuda`: GPU acceleration (enables `candle`).
- `llm`: LLM-based extraction via OpenRouter, Anthropic, Groq, Gemini, or Ollama.
- `discourse`: centering theory, abstract anaphora, dialogue acts.
- `analysis`: coref metrics and cluster encoders.
- `schema`: JSON Schema for output types.
- `production`: tracing instrumentation.
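For instance, a CPU-only build that avoids the C++ ONNX Runtime could opt out of the default feature and enable `candle` instead (a sketch; pin the version to the release you actually use):

```toml
[dependencies]
# Drop the default `onnx` feature; use the pure-Rust backends instead.
anno = { version = "0.10", default-features = false, features = ["candle"] }
```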
## CLI

```text
# PER:1 "Lynn Conway"
# ORG:2 "IBM" "Xerox PARC"
# LOC:1 "California"
# drug:1 "Aspirin" symptom:2 "headaches" "fever"
# Coreference: "Sophie Wilson" -> "She"
```
JSON output with `--format json`. Batch processing with `anno batch`. Graph export (N-Triples, JSON-LD, CSV) with `anno export` (build with `--features graph`).
## Coreference

Three resolvers: `SimpleCorefResolver` (rule-based, 9 sieves; requires the `analysis` feature), `FCoref` (neural, 78.5 F1 on CoNLL-2012 [3]; requires `onnx`), and `MentionRankingCoref`. `FCoref` requires a one-time model export: `uv run scripts/export_fcoref.py` (from a repo clone).

RAG preprocessing (`rag::resolve_for_rag()`, `rag::preprocess()`) rewrites pronouns for self-contained chunks after splitting. Always available (no feature flag required).
## Scope

Inference-time extraction. Training pipelines are out of scope; use upstream frameworks and export ONNX weights.
## Troubleshooting

- ONNX linking errors: use `default-features = false` for builds without C++, or check `ORT_DYLIB_PATH`.
- Model downloads: set `ANNO_NO_DOWNLOADS=1` for cached-only mode behind firewalls.
- Feature errors: most backends are gated behind `onnx` or `candle`.
- Offset mismatches: all spans use character offsets, not byte offsets. See CONTRACT.md.
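The character-vs-byte distinction matters whenever the input contains multi-byte UTF-8 characters, because Rust string slicing is byte-indexed. The sketch below (standard library only; `char_span_to_byte_span` is an illustrative helper, not part of the crate) shows how to convert a character span into a byte span before slicing:

```rust
// Character offsets diverge from byte offsets as soon as the text
// contains multi-byte UTF-8 characters ('ë' below is two bytes).
fn char_span_to_byte_span(text: &str, start: usize, end: usize) -> (usize, usize) {
    // Map a character index to the byte index where that character starts;
    // an index one past the last character maps to text.len().
    let byte_at = |ci: usize| {
        text.char_indices()
            .map(|(b, _)| b)
            .nth(ci)
            .unwrap_or(text.len())
    };
    (byte_at(start), byte_at(end))
}

fn main() {
    let text = "Zoë met Bob";
    // Character span for "Bob" is 8..11, but its byte span is 9..12
    // because 'ë' occupies two bytes.
    let (bs, be) = char_span_to_byte_span(text, 8, 11);
    assert_eq!(&text[bs..be], "Bob");
    println!("bytes {bs}..{be}"); // prints "bytes 9..12"
}
```

Slicing `&text[8..11]` directly with the character offsets would return `"Bo"` preceded by part of a space here, or panic on inputs where the offset falls inside a multi-byte character.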
## Examples

All examples live in `crates/anno/examples/`. Run with `cargo run --example <name>`.
| Example | Feature | What it shows |
|---|---|---|
| `quickstart` | -- | One-line extraction, filtering with `EntitySliceExt` |
| `pii_redact` | -- | Detect names, SSNs, emails; redact or pseudonymize |
| `zero_shot` | `onnx` | Custom entity types ("drug", "symptom") via GLiNER |
| `relations` | -- | Entity-pair relation extraction with TPLinker |
| `gliner_multitask` | `onnx` | Multi-task extraction (NER + classification) via `TaskSchema` |
| `coref` | `analysis` | Coreference chains linking "Marie Curie" and "Curie" |
| `export_formats` | -- | brat standoff, CoNLL BIO, JSONL, graph CSV |
| `rag_preprocess` | -- | Chunking + pronoun rewriting for self-contained RAG chunks |
| `batch` | -- | Parallel extraction over multiple documents |
## References

[1] Grishman & Sundheim, COLING 1996.
[2] Tjong Kim Sang & De Meulder, CoNLL 2003.
[3] Otmazgin et al., AACL 2022 (F-COREF).
[4] Jurafsky & Martin, SLP3 2024.
[5] Zaratiana et al., NAACL 2024 (GLiNER).
[6] Bogdanov et al., 2024 (NuNER).
[7] Li et al., AAAI 2022 (W2NER).
[8] Devlin et al., NAACL 2019 (BERT).
[9] Lafferty et al., ICML 2001 (CRF).
[10] Wang et al., COLING 2020 (TPLinker).
[11] Stepanov & Shtopko, 2024 (GLiNER multi-task).
[12] Rabiner, Proc. IEEE 1989 (HMM).
Full list: docs/REFERENCES.md. Citeable via CITATION.cff.
## License

Dual-licensed under MIT or Apache-2.0.