Named Entity Recognition pipeline
Extracts entities (Person, Location, Organization, Miscellaneous) from text. Pretrained models are available for the following languages:
- English
- German
- Spanish
- Dutch
The default NER model is an English BERT cased large model fine-tuned on CoNLL03, contributed by the MDZ Digital Library team at the Bavarian State Library. All resources for this model can be downloaded using the Python utility script included in this repository.
- Set up a Python virtual environment and install the dependencies (listed in ./requirements.txt)
- Run the conversion script: python /utils/download-dependencies_bert_ner.py. The dependencies will be downloaded to the user’s home directory, under ~/rustbert/bert-ner.
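Before loading the model, it can be useful to confirm that the download step actually produced the directory mentioned above. The following is a minimal sketch using only the standard library; the function names `bert_ner_dir` and `find_bert_ner_dir` are hypothetical helpers for this example, and reading the home directory from the `HOME` environment variable is an assumption that holds on Unix-like systems only.

```rust
use std::path::{Path, PathBuf};

/// Builds the directory the download script is described as writing to,
/// relative to a given home directory (hypothetical helper).
pub fn bert_ner_dir(home: &Path) -> PathBuf {
    home.join("rustbert").join("bert-ner")
}

/// Returns the resource directory if `HOME` is set and the directory exists.
/// Assumption: `HOME` points at the user's home directory (Unix-like systems).
pub fn find_bert_ner_dir() -> Option<PathBuf> {
    let home = std::env::var_os("HOME")?;
    let dir = bert_ner_dir(Path::new(&home));
    if dir.is_dir() {
        Some(dir)
    } else {
        None
    }
}
```

If `find_bert_ner_dir()` returns `None`, the most likely cause is that the script has not been run yet, or was run for a different user.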
The example below illustrates how to run the default English NER model:
use rust_bert::pipelines::ner::NERModel;
let ner_model = NERModel::new(Default::default())?;
let input = [
    "My name is Amy. I live in Paris.",
    "Paris is a city in France.",
];
let output = ner_model.predict(&input);
Output:
[
    [
        Entity {
            word: String::from("Amy"),
            score: 0.9986,
            label: String::from("I-PER"),
            offset: Offset { begin: 11, end: 14 },
        },
        Entity {
            word: String::from("Paris"),
            score: 0.9985,
            label: String::from("I-LOC"),
            offset: Offset { begin: 26, end: 31 },
        },
    ],
    [
        Entity {
            word: String::from("Paris"),
            score: 0.9988,
            label: String::from("I-LOC"),
            offset: Offset { begin: 0, end: 5 },
        },
        Entity {
            word: String::from("France"),
            score: 0.9993,
            label: String::from("I-LOC"),
            offset: Offset { begin: 19, end: 25 },
        },
    ],
]
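The output above is a nested list: one inner list of entities per input sentence. As a rough illustration of how such predictions might be consumed, the sketch below filters entities by label and score and recovers the matched text from the offsets. The `Entity` and `Offset` structs here are local stand-ins that mirror the printed fields, not the library's own types, and their field types (`f64`, `u32`) are assumptions. The helpers `words_with_label`, `span_text` and `sample_entities` are hypothetical names introduced for this example, and offsets are treated as character positions, which matches the ASCII sample but is also an assumption.

```rust
// Local stand-ins mirroring the fields printed in the output above.
// Assumption: these are NOT the library's types; field types are guesses.
#[derive(Debug, Clone, PartialEq)]
pub struct Offset {
    pub begin: u32,
    pub end: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub word: String,
    pub score: f64,
    pub label: String,
    pub offset: Offset,
}

/// Returns the words of all entities carrying `label` whose score is at
/// least `min_score`, in their original order.
pub fn words_with_label<'a>(
    entities: &'a [Entity],
    label: &str,
    min_score: f64,
) -> Vec<&'a str> {
    entities
        .iter()
        .filter(|e| e.label == label && e.score >= min_score)
        .map(|e| e.word.as_str())
        .collect()
}

/// Recovers the text an entity points at, treating the offsets as
/// character positions (an assumption made for this sketch).
pub fn span_text(input: &str, offset: &Offset) -> String {
    let begin = offset.begin as usize;
    let len = offset.end.saturating_sub(offset.begin) as usize;
    input.chars().skip(begin).take(len).collect()
}

/// Sample data copied from the first sentence's output shown above.
pub fn sample_entities() -> Vec<Entity> {
    vec![
        Entity {
            word: String::from("Amy"),
            score: 0.9986,
            label: String::from("I-PER"),
            offset: Offset { begin: 11, end: 14 },
        },
        Entity {
            word: String::from("Paris"),
            score: 0.9985,
            label: String::from("I-LOC"),
            offset: Offset { begin: 26, end: 31 },
        },
    ]
}
```

With the sample data, `words_with_label(&sample_entities(), "I-LOC", 0.5)` keeps only the "Paris" entity, and `span_text` applied to the first sentence with the first entity's offset yields "Amy". Iterating over characters instead of slicing by byte index avoids panics on non-ASCII input such as the German example.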
To run the pipeline for another language, change the NERModel configuration from its default:
use rust_bert::pipelines::common::ModelType;
use rust_bert::pipelines::ner::NERModel;
use rust_bert::pipelines::token_classification::TokenClassificationConfig;
use rust_bert::resources::RemoteResource;
use rust_bert::roberta::{
    RobertaConfigResources, RobertaModelResources, RobertaVocabResources,
};
use tch::Device;
use rust_bert::pipelines::common::ModelResource;
let ner_config = TokenClassificationConfig {
    model_type: ModelType::XLMRoberta,
    model_resource: ModelResource::Torch(Box::new(RemoteResource::from_pretrained(
        RobertaModelResources::XLM_ROBERTA_NER_DE,
    ))),
    config_resource: Box::new(RemoteResource::from_pretrained(
        RobertaConfigResources::XLM_ROBERTA_NER_DE,
    )),
    vocab_resource: Box::new(RemoteResource::from_pretrained(
        RobertaVocabResources::XLM_ROBERTA_NER_DE,
    )),
    lower_case: false,
    device: Device::cuda_if_available(),
    ..Default::default()
};
let ner_model = NERModel::new(ner_config)?;
// Define input
let input = [
    "Mein Name ist Amélie. Ich lebe in Paris.",
    "Paris ist eine Stadt in Frankreich.",
];
let output = ner_model.predict(&input);
The XLMRoberta models for the supported languages are defined as follows:
| Language | Model name |
|---|---|
| English | XLM_ROBERTA_NER_EN |
| German | XLM_ROBERTA_NER_DE |
| Spanish | XLM_ROBERTA_NER_ES |
| Dutch | XLM_ROBERTA_NER_NL |
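When the language is only known at runtime, the table above can be encoded as a small lookup. The sketch below uses only the standard library and returns the constant name as a string, purely for illustration; the `NerLanguage` enum, its methods, and the two-letter codes are hypothetical and not part of the library. In real code each match arm would presumably select the corresponding constants on `RobertaModelResources`, `RobertaConfigResources` and `RobertaVocabResources` instead.

```rust
/// Languages listed in the table above (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NerLanguage {
    English,
    German,
    Spanish,
    Dutch,
}

impl NerLanguage {
    /// Name of the XLM-RoBERTa resource constant for this language,
    /// exactly as listed in the table.
    pub fn model_name(self) -> &'static str {
        match self {
            NerLanguage::English => "XLM_ROBERTA_NER_EN",
            NerLanguage::German => "XLM_ROBERTA_NER_DE",
            NerLanguage::Spanish => "XLM_ROBERTA_NER_ES",
            NerLanguage::Dutch => "XLM_ROBERTA_NER_NL",
        }
    }

    /// Parses a two-letter language code, ignoring ASCII case.
    /// The codes are a convention assumed for this example only.
    pub fn from_code(code: &str) -> Option<Self> {
        match code.to_ascii_lowercase().as_str() {
            "en" => Some(NerLanguage::English),
            "de" => Some(NerLanguage::German),
            "es" => Some(NerLanguage::Spanish),
            "nl" => Some(NerLanguage::Dutch),
            _ => None,
        }
    }
}
```

Returning `Option` from `from_code` makes an unsupported language an explicit case for the caller to handle, for example by falling back to the default English model.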