Crate quickner

quickner is a library for NER annotation that provides a CLI and a Python API. It comes with a default configuration file that can be modified to fit your needs.

§Batch Annotation

You can use quickner to annotate a batch of texts.

Provide a configuration file and a folder containing your input files:

  • a CSV file containing the texts you want to annotate.
  • a CSV file containing the entities you want to annotate.
  • a CSV file containing the entities you want to exclude from the annotation.
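
As a rough illustration of how the excludes list interacts with the entities list, here is a minimal sketch using only the standard library. The function name `apply_excludes`, the `(name, label)` tuple layout, and the case-insensitive comparison are assumptions made for this example; they are not taken from quickner's implementation or from the actual CSV layout.

```rust
use std::collections::HashSet;

/// Hypothetical helper: drops every entity whose name appears in `excludes`.
/// Names are compared case-insensitively, which is an assumption of this sketch.
fn apply_excludes(entities: Vec<(String, String)>, excludes: &[&str]) -> Vec<(String, String)> {
    // Normalize the excluded names once so each lookup is a cheap set query.
    let excluded: HashSet<String> = excludes.iter().map(|e| e.to_lowercase()).collect();
    entities
        .into_iter()
        .filter(|(name, _label)| !excluded.contains(&name.to_lowercase()))
        .collect()
}
```

Calling `apply_excludes` with the entities `Rust` and `Java` and the excludes list `["java"]` would keep only `Rust`.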

§Configuration

The configuration file is a TOML file that contains the following fields:

[logging]
level = "info" # level of logging (debug, info, warning, error, fatal)

[texts]

[texts.input]
filter = false     # if true, only texts in the filter list will be used
path = "texts.csv" # path to the texts file

[texts.filters]
accept_special_characters = ".,-" # list of special characters to accept in the text (if special_characters is true)
alphanumeric = false              # if true, only strictly alphanumeric texts will be used
case_sensitive = false            # if true, case sensitive search will be used
max_length = 1024                 # maximum length of the text
min_length = 0                    # minimum length of the text
numbers = false                   # if true, texts with numbers will not be used
punctuation = false               # if true, texts with punctuation will not be used
special_characters = false        # if true, texts with special characters will not be used

[annotations]
format = "spacy" # format of the output file (jsonl, spaCy, brat, conll)

[annotations.output]
path = "annotations.jsonl" # path to the output file

[entities]

[entities.input]
filter = true         # if true, only entities in the filter list will be used
path = "entities.csv" # path to the entities file
save = true           # if true, the entities found will be saved in the output file

[entities.filters]
accept_special_characters = ".-" # list of special characters to accept in the entity (if special_characters is true)
alphanumeric = false             # if true, only strictly alphanumeric entities will be used
case_sensitive = false           # if true, case sensitive search will be used
max_length = 20                  # maximum length of the entity
min_length = 0                   # minimum length of the entity
numbers = false                  # if true, entities with numbers will not be used
punctuation = false              # if true, entities with punctuation will not be used
special_characters = true        # if true, entities with special characters will not be used

[entities.excludes]
# path = "excludes.csv" # path to entities to exclude from the search
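
To make the filter fields more concrete, the sketch below shows one way the documented rules could be evaluated against a single text or entity. It is not quickner's own code: the `FilterSketch` struct, the `entity_filter_defaults` helper, counting length in characters, and treating a "special character" as anything that is neither alphanumeric nor whitespace are all assumptions of this example. `case_sensitive` is left out because it is described as a search option rather than a validity rule.

```rust
/// Hypothetical mirror of the `[texts.filters]` / `[entities.filters]` tables.
/// Field names follow the configuration keys; the type itself is an assumption.
#[derive(Debug, Clone)]
struct FilterSketch {
    accept_special_characters: String,
    alphanumeric: bool,
    max_length: usize,
    min_length: usize,
    numbers: bool,
    punctuation: bool,
    special_characters: bool,
}

/// Values copied from the `[entities.filters]` table shown above.
fn entity_filter_defaults() -> FilterSketch {
    FilterSketch {
        accept_special_characters: ".-".to_string(),
        alphanumeric: false,
        max_length: 20,
        min_length: 0,
        numbers: false,
        punctuation: false,
        special_characters: true,
    }
}

impl FilterSketch {
    /// Returns true when `text` passes every enabled rule.
    fn is_valid(&self, text: &str) -> bool {
        // Length is counted in characters here (an assumption).
        let length = text.chars().count();
        if length < self.min_length || length > self.max_length {
            return false;
        }
        if self.alphanumeric && !text.chars().all(|c| c.is_alphanumeric()) {
            return false;
        }
        if self.numbers && text.chars().any(|c| c.is_numeric()) {
            return false;
        }
        if self.punctuation && text.chars().any(|c| c.is_ascii_punctuation()) {
            return false;
        }
        if self.special_characters {
            // Reject special characters unless they are explicitly accepted.
            let rejected = text.chars().any(|c| {
                !c.is_alphanumeric()
                    && !c.is_whitespace()
                    && !self.accept_special_characters.contains(c)
            });
            if rejected {
                return false;
            }
        }
        true
    }
}
```

Under these assumptions, `entity_filter_defaults().is_valid("Node.js")` passes because `.` is in the accepted list, while `"C#"` is rejected because `#` is not.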

§Example

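use quickner::models::Quickner;

// Load the settings from the configuration file.
let quick = Quickner::new("./config.toml");
// Run the batch annotation described by the configuration.
let annotations = quick.process(true);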

§Single Annotation

You can also use quickner to annotate a single text. This is useful when you want to use the resulting annotation directly in your code.

use std::collections::HashMap;

use quickner::Document;

// Build a document from a raw string.
let mut annotation = Document::from_string("Rust is maintained by Mozilla");
// Map each entity name to its label; the map must be mutable to insert into it.
let mut entities = HashMap::new();
entities.insert("Rust", "Programming Language");
entities.insert("Mozilla", "Organization");
// Annotate the document with the given entities.
annotation.annotate(entities);
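
To give an idea of what dictionary-based annotation produces, here is a standalone sketch that finds the byte offsets of each entity name in a text. It does not use quickner and does not reproduce its matching logic: the `Span` type, the `find_spans` function, and the ASCII case-insensitive substring search are hypothetical choices made for this example.

```rust
use std::collections::HashMap;

/// Hypothetical span type: byte offsets into the text plus a label.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    start: usize,
    end: usize,
    label: String,
}

/// Finds every occurrence of each entity name in `text`.
/// Matching is ASCII case-insensitive, which keeps byte offsets aligned
/// with the original text because ASCII lowercasing never changes lengths.
fn find_spans(text: &str, entities: &HashMap<&str, &str>) -> Vec<Span> {
    let haystack = text.to_ascii_lowercase();
    let mut spans = Vec::new();
    for (name, label) in entities {
        let needle = name.to_ascii_lowercase();
        // An empty pattern would match at every position, so skip it.
        if needle.is_empty() {
            continue;
        }
        for (start, matched) in haystack.match_indices(needle.as_str()) {
            spans.push(Span {
                start,
                end: start + matched.len(),
                label: label.to_string(),
            });
        }
    }
    // HashMap iteration order is unspecified, so sort for a stable result.
    spans.sort_by_key(|s| (s.start, s.end));
    spans
}
```

This sketch performs plain substring matching with no word-boundary check, so an entity such as `Rust` would also match inside `Trust`; a real annotator would likely need to handle that case.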

Structs§

  • A struct used to deserialize annotations from the configuration file.
  • A struct representing the configuration file.
  • An annotation is a text with a set of entities
  • A struct used to deserialize entities from the configuration file.
  • An entity is a text with a label
  • A struct used to deserialize excludes from the configuration file.
  • A struct used to deserialize filters from the configuration file.
  • A struct used to deserialize input from the configuration file.
  • A struct used to deserialize logging from the configuration file.
  • A struct used to deserialize output from the configuration file.
  • Quickner is the main struct of the application. It holds the configuration and the path to the configuration file.
  • A struct used to deserialize annotations from the configuration file.

Enums§

  • An enum used to deserialize the output format from the configuration file.

Functions§