Module ner


Named Entity Recognition pipeline

Extracts entities (Person, Location, Organization, Miscellaneous) from text. Pretrained models are available for the following languages:

English
German
Spanish
Dutch
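In the pipeline output, the four categories appear as CoNLL-style label strings, for example I-PER and I-LOC in the sample output further down. The sketch below maps such a label back to a category. It assumes the usual B-/I- prefixes and the PER/LOC/ORG/MISC type names used by CoNLL03. Category and category are hypothetical names and are not part of rust_bert.

```rust
/// Hypothetical mirror of the four entity categories listed above.
#[derive(Debug, PartialEq)]
enum Category {
    Person,
    Location,
    Organization,
    Miscellaneous,
}

/// Map a CoNLL-style tag such as "I-PER" or "B-LOC" to a category.
/// Returns None for the "O" (outside) tag or any unknown tag.
fn category(label: &str) -> Option<Category> {
    // Drop an optional "B-" / "I-" prefix before matching the entity type
    let kind = label
        .strip_prefix("B-")
        .or_else(|| label.strip_prefix("I-"))
        .unwrap_or(label);
    match kind {
        "PER" => Some(Category::Person),
        "LOC" => Some(Category::Location),
        "ORG" => Some(Category::Organization),
        "MISC" => Some(Category::Miscellaneous),
        _ => None,
    }
}

fn main() {
    for label in ["I-PER", "B-LOC", "I-ORG", "I-MISC", "O"] {
        println!("{label} -> {:?}", category(label));
    }
}
```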

The default NER model is an English BERT cased large model fine-tuned on CoNLL03, contributed by the MDZ Digital Library team at the Bavarian State Library. All resources for this model can be downloaded using the Python utility script included in this repository:

Set up a Python virtual environment and install the dependencies listed in ./requirements.txt.
Run the conversion script: python /utils/download-dependencies_bert_ner.py. The dependencies will be downloaded to the user’s home directory, under ~/rustbert/bert-ner.
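Before loading the model, it can help to check that the download step produced the directory described above. The following std-only sketch assumes the files land in ~/rustbert/bert-ner, as stated, and that HOME is set, which is typical on Unix-like systems. It only checks that the directory exists and does not verify individual files. The helper name bert_ner_dir is hypothetical.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Hypothetical helper: the directory the download script is described as
/// writing to, i.e. <home>/rustbert/bert-ner.
fn bert_ner_dir(home: &Path) -> PathBuf {
    home.join("rustbert").join("bert-ner")
}

fn main() {
    // Assumes HOME is set; falls back to an empty path otherwise
    let home = env::var_os("HOME").map(PathBuf::from).unwrap_or_default();
    let dir = bert_ner_dir(&home);
    if dir.is_dir() {
        println!("found NER resources in {}", dir.display());
    } else {
        println!("{} not found: run the download script first", dir.display());
    }
}
```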

The example below illustrates how to run the default English NER model:

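use rust_bert::pipelines::ner::NERModel;

// Assumes the anyhow crate is available for error handling
fn main() -> anyhow::Result<()> {
    // Load the default English NER model
    let ner_model = NERModel::new(Default::default())?;

    let input = [
        "My name is Amy. I live in Paris.",
        "Paris is a city in France.",
    ];
    // One list of entities is returned per input sentence
    let output = ner_model.predict(&input);
    println!("{:?}", output);
    Ok(())
}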

Output:

[
    [
        Entity {
            word: String::from("Amy"),
            score: 0.9986,
            label: String::from("I-PER"),
            offset: Offset { begin: 11, end: 14 },
        },
        Entity {
            word: String::from("Paris"),
            score: 0.9985,
            label: String::from("I-LOC"),
            offset: Offset { begin: 26, end: 31 },
        },
    ],
    [
        Entity {
            word: String::from("Paris"),
            score: 0.9988,
            label: String::from("I-LOC"),
            offset: Offset { begin: 0, end: 5 },
        },
        Entity {
            word: String::from("France"),
            score: 0.9993,
            label: String::from("I-LOC"),
            offset: Offset { begin: 19, end: 25 },
        },
    ],
]
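Each Entity carries both the matched word and an offset into its input sentence, and a score that can be used for filtering. The sketch below recovers the entity text from the offset and keeps only high-scoring entities. It assumes that begin and end count characters from the start of the sentence, which agrees with the output above (Amy at 11..14, France at 19..25). The Offset and Entity structs are local stand-ins that mirror the printed fields so the example runs without rust_bert. The span_text and confident helpers and the 0.99 threshold are hypothetical.

```rust
/// Local stand-ins mirroring the Offset and Entity fields shown above.
#[derive(Debug)]
struct Offset {
    begin: u32,
    end: u32,
}

#[derive(Debug)]
struct Entity {
    word: String,
    score: f64,
    label: String,
    offset: Offset,
}

/// Recover the entity text from its sentence, treating begin..end as
/// character (not byte) positions.
fn span_text(sentence: &str, offset: &Offset) -> String {
    sentence
        .chars()
        .skip(offset.begin as usize)
        .take((offset.end - offset.begin) as usize)
        .collect()
}

/// Keep only entities whose score reaches the given threshold.
fn confident(entities: &[Entity], threshold: f64) -> Vec<&Entity> {
    entities.iter().filter(|e| e.score >= threshold).collect()
}

fn main() {
    let sentence = "My name is Amy. I live in Paris.";
    let entities = vec![
        Entity {
            word: String::from("Amy"),
            score: 0.9986,
            label: String::from("I-PER"),
            offset: Offset { begin: 11, end: 14 },
        },
        Entity {
            word: String::from("Paris"),
            score: 0.9985,
            label: String::from("I-LOC"),
            offset: Offset { begin: 26, end: 31 },
        },
    ];
    for e in confident(&entities, 0.99) {
        // The recovered span should match the reported word
        println!("{} ({}) = {:?}", e.word, e.label, span_text(sentence, &e.offset));
    }
}
```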

To run the pipeline for another language, change the NERModel configuration from its default:

use rust_bert::pipelines::common::{ModelResource, ModelType};
use rust_bert::pipelines::ner::NERModel;
use rust_bert::pipelines::token_classification::TokenClassificationConfig;
use rust_bert::resources::RemoteResource;
use rust_bert::roberta::{
    RobertaConfigResources, RobertaModelResources, RobertaVocabResources,
};
use tch::Device;

// Assumes the anyhow crate is available for error handling
fn main() -> anyhow::Result<()> {
    // Point the configuration at the German XLM-RoBERTa NER resources
    let ner_config = TokenClassificationConfig {
        model_type: ModelType::XLMRoberta,
        model_resource: ModelResource::Torch(Box::new(RemoteResource::from_pretrained(
            RobertaModelResources::XLM_ROBERTA_NER_DE,
        ))),
        config_resource: Box::new(RemoteResource::from_pretrained(
            RobertaConfigResources::XLM_ROBERTA_NER_DE,
        )),
        vocab_resource: Box::new(RemoteResource::from_pretrained(
            RobertaVocabResources::XLM_ROBERTA_NER_DE,
        )),
        lower_case: false,
        device: Device::cuda_if_available(),
        // Keep all remaining fields at their default values
        ..Default::default()
    };

    let ner_model = NERModel::new(ner_config)?;

    // Define input
    let input = [
        "Mein Name ist Amélie. Ich lebe in Paris.",
        "Paris ist eine Stadt in Frankreich.",
    ];
    let output = ner_model.predict(&input);
    println!("{:?}", output);
    Ok(())
}
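The configuration above sets only the fields it needs. It takes the rest from the Default implementation of TokenClassificationConfig, using Rust's struct update syntax (..Default::default()). The std-only sketch below shows the same pattern with a hypothetical DemoConfig struct. Its fields and default values are invented for illustration and do not reflect TokenClassificationConfig.

```rust
/// Hypothetical stand-in for a configuration struct with many fields.
#[derive(Debug)]
struct DemoConfig {
    model_name: String,
    lower_case: bool,
    use_gpu: bool,
    batch_size: usize,
}

impl Default for DemoConfig {
    fn default() -> Self {
        DemoConfig {
            model_name: String::from("default-english"),
            lower_case: false,
            use_gpu: false,
            batch_size: 8,
        }
    }
}

fn main() {
    // Override two fields; ..Default::default() fills in the others
    let config = DemoConfig {
        model_name: String::from("german"),
        use_gpu: true,
        ..Default::default()
    };
    println!(
        "{} lower_case={} use_gpu={} batch_size={}",
        config.model_name, config.lower_case, config.use_gpu, config.batch_size
    );
}
```

The fields listed explicitly take precedence, so only the values that differ from the defaults need to be spelled out.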

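The pretrained XLM-RoBERTa NER resources are named as follows for each language (the same name is used in RobertaModelResources, RobertaConfigResources and RobertaVocabResources):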

Language	Model name
English	XLM_ROBERTA_NER_EN
German	XLM_ROBERTA_NER_DE
Spanish	XLM_ROBERTA_NER_ES
Dutch	XLM_ROBERTA_NER_NL
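If the language is only known at runtime, the table above can be encoded as a lookup. The following is a minimal sketch. The function name and the use of ISO 639-1 language codes are assumptions. It returns only the constant name as a string, so real code would still match on the result to select, for example, RobertaModelResources::XLM_ROBERTA_NER_DE together with its config and vocab counterparts.

```rust
/// Hypothetical helper mapping an ISO 639-1 code to the resource
/// constant name listed in the table above.
fn xlm_roberta_ner_resource(lang: &str) -> Option<&'static str> {
    match lang.to_ascii_lowercase().as_str() {
        "en" => Some("XLM_ROBERTA_NER_EN"),
        "de" => Some("XLM_ROBERTA_NER_DE"),
        "es" => Some("XLM_ROBERTA_NER_ES"),
        "nl" => Some("XLM_ROBERTA_NER_NL"),
        // No pretrained NER resource is listed for other languages
        _ => None,
    }
}

fn main() {
    for lang in ["de", "NL", "fr"] {
        match xlm_roberta_ner_resource(lang) {
            Some(name) => println!("{lang}: use {name}"),
            None => println!("{lang}: no pretrained NER resource listed"),
        }
    }
}
```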

Structs

Entity: Entity generated by a NERModel
NERModel: NERModel to extract named entities
