Module ner

Named Entity Recognition pipeline

Extracts entities (Person, Location, Organization, Miscellaneous) from text. Pretrained models are available for the following languages:

  • English
  • German
  • Spanish
  • Dutch
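
Models trained on CoNLL-2003 style data usually emit IOB labels that pair a B- or I- prefix with a category suffix (PER, LOC, ORG, MISC); the sample output further down in this module shows I-PER and I-LOC. The following is a minimal sketch, assuming that labeling scheme, of mapping a label string back to the four categories listed above. The Category enum and category_of function are hypothetical helpers, not part of rust_bert.

```rust
/// Hypothetical enum for the four entity categories named in this module.
#[derive(Debug, PartialEq)]
enum Category {
    Person,
    Location,
    Organization,
    Miscellaneous,
}

/// Map an IOB-style label such as "I-PER" or "B-LOC" to its category.
/// Assumes CoNLL-2003 style suffixes (PER, LOC, ORG, MISC); any other
/// label, including the "outside" label "O", yields None.
fn category_of(label: &str) -> Option<Category> {
    // Drop the "B-" or "I-" chunk prefix if present.
    let suffix = label
        .strip_prefix("B-")
        .or_else(|| label.strip_prefix("I-"))
        .unwrap_or(label);
    match suffix {
        "PER" => Some(Category::Person),
        "LOC" => Some(Category::Location),
        "ORG" => Some(Category::Organization),
        "MISC" => Some(Category::Miscellaneous),
        _ => None,
    }
}
```

For example, category_of("I-LOC") returns Some(Category::Location), while category_of("O") returns None.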

The default NER model is an English BERT cased large model fine-tuned on CoNLL-2003, contributed by the MDZ Digital Library team at the Bavarian State Library. All resources for this model can be downloaded using the Python utility script included in this repository:

  1. Set up a Python virtual environment and install the dependencies listed in ./requirements.txt.
  2. Run the conversion script: python /utils/download-dependencies_bert_ner.py. The dependencies are downloaded to the user’s home directory, under ~/rustbert/bert-ner.
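
After running the script, a quick way to confirm that the download step completed is to check that the target directory exists. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and reading the home directory from the HOME environment variable is an assumption that holds on Unix-like systems (Windows typically uses USERPROFILE).

```rust
use std::path::{Path, PathBuf};

/// Directory the download script writes to: <home>/rustbert/bert-ner.
fn bert_ner_dir(home: &Path) -> PathBuf {
    home.join("rustbert").join("bert-ner")
}

/// Resolve the directory from HOME (assumption: Unix-like system).
fn default_bert_ner_dir() -> Option<PathBuf> {
    std::env::var_os("HOME").map(|home| bert_ner_dir(Path::new(&home)))
}

/// True if the resource directory exists, i.e. the download appears to have run.
/// This only checks the directory, not the individual files inside it.
fn resources_downloaded() -> bool {
    default_bert_ner_dir().map_or(false, |dir| dir.is_dir())
}
```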

The example below illustrates how to run the default English NER model:

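use rust_bert::pipelines::ner::NERModel;
use rust_bert::RustBertError;

fn main() -> Result<(), RustBertError> {
    // Load the default English BERT NER model
    let ner_model = NERModel::new(Default::default())?;

    let input = [
        "My name is Amy. I live in Paris.",
        "Paris is a city in France.",
    ];
    // One Vec<Entity> is returned per input sentence
    let output = ner_model.predict(&input);
    println!("{output:#?}");
    Ok(())
}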

Output:

[
    [
        Entity {
            word: String::from("Amy"),
            score: 0.9986,
            label: String::from("I-PER"),
            offset: Offset { begin: 11, end: 14 },
        },
        Entity {
            word: String::from("Paris"),
            score: 0.9985,
            label: String::from("I-LOC"),
            offset: Offset { begin: 26, end: 31 },
        },
    ],
    [
        Entity {
            word: String::from("Paris"),
            score: 0.9988,
            label: String::from("I-LOC"),
            offset: Offset { begin: 0, end: 5 },
        },
        Entity {
            word: String::from("France"),
            score: 0.9993,
            label: String::from("I-LOC"),
            offset: Offset { begin: 19, end: 25 },
        },
    ],
]
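
Each entity's offset locates its word in the corresponding input sentence, and the score can be used to discard low-confidence predictions. The sketch below uses minimal local stand-ins that mirror the fields printed above (they are not the real rust_bert Entity and Offset types) and assumes offsets count characters; for this ASCII input, characters and bytes coincide. The helper functions are hypothetical.

```rust
/// Local stand-in mirroring the printed Offset fields.
struct Offset {
    begin: u32,
    end: u32,
}

/// Local stand-in mirroring the printed Entity fields.
struct Entity {
    word: String,
    score: f64,
    label: String,
    offset: Offset,
}

/// Recover the entity text from its sentence, assuming begin..end
/// counts characters.
fn span_text(input: &str, entity: &Entity) -> String {
    let begin = entity.offset.begin as usize;
    let end = entity.offset.end as usize;
    input.chars().skip(begin).take(end - begin).collect()
}

/// Words of entities with the given label whose score is at least min_score.
fn words_with_label<'a>(entities: &'a [Entity], label: &str, min_score: f64) -> Vec<&'a str> {
    entities
        .iter()
        .filter(|e| e.label == label && e.score >= min_score)
        .map(|e| e.word.as_str())
        .collect()
}
```

Applied to the first sentence above, span_text returns "Amy" for the offset 11..14 and "Paris" for 26..31.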

To run the pipeline for another language, change the NERModel configuration from its default:

use rust_bert::pipelines::common::{ModelResource, ModelType};
use rust_bert::pipelines::ner::NERModel;
use rust_bert::pipelines::token_classification::TokenClassificationConfig;
use rust_bert::resources::RemoteResource;
use rust_bert::roberta::{
    RobertaConfigResources, RobertaModelResources, RobertaVocabResources,
};
use rust_bert::RustBertError;
use tch::Device;

fn main() -> Result<(), RustBertError> {
    // Point the pipeline at the German XLM-RoBERTa NER resources
    let ner_config = TokenClassificationConfig {
        model_type: ModelType::XLMRoberta,
        model_resource: ModelResource::Torch(Box::new(RemoteResource::from_pretrained(
            RobertaModelResources::XLM_ROBERTA_NER_DE,
        ))),
        config_resource: Box::new(RemoteResource::from_pretrained(
            RobertaConfigResources::XLM_ROBERTA_NER_DE,
        )),
        vocab_resource: Box::new(RemoteResource::from_pretrained(
            RobertaVocabResources::XLM_ROBERTA_NER_DE,
        )),
        lower_case: false,
        device: Device::cuda_if_available(),
        // Keep the defaults for all remaining fields
        ..Default::default()
    };

    let ner_model = NERModel::new(ner_config)?;

    // Define input
    let input = [
        "Mein Name ist Amélie. Ich lebe in Paris.",
        "Paris ist eine Stadt in Frankreich.",
    ];
    let output = ner_model.predict(&input);
    println!("{output:#?}");
    Ok(())
}
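
The configuration above relies on Rust's struct update syntax: only the fields that differ are written out, and ..Default::default() fills in the rest. The following is a minimal sketch of that pattern using a hypothetical PipelineConfig struct; its fields and default values are arbitrary placeholders, not those of TokenClassificationConfig.

```rust
/// Hypothetical configuration struct; field names and defaults are placeholders.
#[derive(Debug, PartialEq)]
struct PipelineConfig {
    model_name: String,
    lower_case: bool,
    batch_size: usize,
}

impl Default for PipelineConfig {
    fn default() -> Self {
        PipelineConfig {
            model_name: "default-model".to_string(),
            lower_case: false,
            batch_size: 8,
        }
    }
}

/// Override only the model name; every other field keeps its default value.
fn german_config() -> PipelineConfig {
    PipelineConfig {
        model_name: "XLM_ROBERTA_NER_DE".to_string(),
        ..Default::default()
    }
}
```

Because unspecified fields come from Default, adding new fields to the struct later does not break code written in this style.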

The XLM-RoBERTa NER models for each language are identified by the following resource names:

| Language | Model name         |
|----------|--------------------|
| English  | XLM_ROBERTA_NER_EN |
| German   | XLM_ROBERTA_NER_DE |
| Spanish  | XLM_ROBERTA_NER_ES |
| Dutch    | XLM_ROBERTA_NER_NL |
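
When the language is chosen at run time, the table can be encoded as a lookup. The following is a minimal sketch with a hypothetical ner_resource_name helper that returns only the constant's name; as in the German example, the same name is used on RobertaModelResources, RobertaConfigResources and RobertaVocabResources, and matching on the returned name to select those constants is left to the caller.

```rust
/// Map a language name (case-insensitive) to the XLM-RoBERTa NER
/// resource name from the table above; unsupported languages yield None.
fn ner_resource_name(language: &str) -> Option<&'static str> {
    match language.to_ascii_lowercase().as_str() {
        "english" => Some("XLM_ROBERTA_NER_EN"),
        "german" => Some("XLM_ROBERTA_NER_DE"),
        "spanish" => Some("XLM_ROBERTA_NER_ES"),
        "dutch" => Some("XLM_ROBERTA_NER_NL"),
        _ => None,
    }
}
```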

Structs

  • Entity: entity generated by a NERModel
  • NERModel: model used to extract named entities