quickner is a Rust library for named entity recognition (NER) annotation that also provides a CLI and a Python API. It ships with a default configuration file that you can modify to fit your needs.
§Batch Annotation
You can use quickner to annotate a batch of texts.
Provide a configuration file and a folder containing:
- a CSV file with the texts you want to annotate;
- a CSV file with the entities you want to search for;
- a CSV file with the entities you want to exclude from the annotation.
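The column layout of these files is not described here. As a minimal, std-only sketch, assuming a hypothetical entities file with a header row and two unquoted columns (`name,label`), the entities could be loaded into a map from entity text to label. `parse_entities` is an illustrative helper, not part of quickner:

```rust
use std::collections::HashMap;

/// Hypothetical loader for a two-column `name,label` CSV (header row first).
/// Assumption: fields are never quoted and never contain commas; real files
/// should be read with a proper CSV parser instead.
fn parse_entities(csv: &str) -> HashMap<String, String> {
    let mut entities = HashMap::new();
    // Skip the header row, then split each remaining line on its first comma.
    for line in csv.lines().skip(1) {
        let line = line.trim();
        if line.is_empty() {
            continue; // ignore blank lines
        }
        if let Some((name, label)) = line.split_once(',') {
            entities.insert(name.trim().to_string(), label.trim().to_string());
        }
    }
    entities
}
```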
§Configuration
The configuration file is a TOML file that contains the following fields:

```toml
[logging]
level = "info" # level of logging (debug, info, warning, error, fatal)
[texts]
[texts.input]
filter = false # if true, only texts in the filter list will be used
path = "texts.csv" # path to the texts file
[texts.filters]
accept_special_characters = ".,-" # list of special characters to accept in the text (if special_characters is true)
alphanumeric = false # if true, only strictly alphanumeric texts will be used
case_sensitive = false # if true, case sensitive search will be used
max_length = 1024 # maximum length of the text
min_length = 0 # minimum length of the text
numbers = false # if true, texts with numbers will not be used
punctuation = false # if true, texts with punctuation will not be used
special_characters = false # if true, texts with special characters will not be used
[annotations]
format = "spacy" # format of the output file (jsonl, spaCy, brat, conll)
[annotations.output]
path = "annotations.jsonl" # path to the output file
[entities]
[entities.input]
filter = true # if true, only entities in the filter list will be used
path = "entities.csv" # path to the entities file
save = true # if true, the entities found will be saved in the output file
[entities.filters]
accept_special_characters = ".-" # list of special characters to accept in the entity (if special_characters is true)
alphanumeric = false # if true, only strictly alphanumeric entities will be used
case_sensitive = false # if true, case sensitive search will be used
max_length = 20 # maximum length of the entity
min_length = 0 # minimum length of the entity
numbers = false # if true, entities with numbers will not be used
punctuation = false # if true, entities with punctuation will not be used
special_characters = true # if true, entities with special characters will not be used
[entities.excludes]
# path = "excludes.csv" # path to entities to exclude from the search
```
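To make the filter fields concrete, here is a minimal sketch of how a text or entity might be checked against one `filters` table. The `Filters` struct and `passes` function are hypothetical, and the semantics (character-based lengths, ASCII punctuation, what counts as a "special character") are one reading of the comments above rather than quickner's actual implementation. `case_sensitive` affects the search itself rather than filtering, so it is left out.

```rust
/// Hypothetical mirror of a `[texts.filters]` / `[entities.filters]` table.
/// Field names follow the configuration file; the semantics are an assumption.
struct Filters {
    accept_special_characters: String,
    alphanumeric: bool,
    max_length: usize,
    min_length: usize,
    numbers: bool,
    punctuation: bool,
    special_characters: bool,
}

/// Returns true if `s` survives every enabled filter.
fn passes(s: &str, f: &Filters) -> bool {
    // Length bounds, counted in characters rather than bytes (an assumption).
    let len = s.chars().count();
    if len < f.min_length || len > f.max_length {
        return false;
    }
    // Assumption: whitespace is tolerated by the "strictly alphanumeric" check.
    if f.alphanumeric && !s.chars().all(|c| c.is_alphanumeric() || c.is_whitespace()) {
        return false;
    }
    if f.numbers && s.chars().any(|c| c.is_numeric()) {
        return false;
    }
    if f.punctuation && s.chars().any(|c| c.is_ascii_punctuation()) {
        return false;
    }
    // A "special character" is taken to be anything that is neither alphanumeric
    // nor whitespace; characters listed in `accept_special_characters` are exempt.
    if f.special_characters
        && s.chars().any(|c| {
            !c.is_alphanumeric()
                && !c.is_whitespace()
                && !f.accept_special_characters.contains(c)
        })
    {
        return false;
    }
    true
}
```

With the entity settings shown above (`special_characters = true`, `accept_special_characters = ".-"`), `Node.js` would pass while `C#` would be rejected.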
§Example
```rust
use quickner::models::Quickner;
// Load the configuration file and annotate the texts it points to.
let quick = Quickner::new("./config.toml");
let annotations = quick.process(true);
```
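The batch run pairs every text with the entity list. As a rough, hypothetical sketch of that idea (not quickner's implementation), every entity not listed in the excludes file is searched for in every text, and each match is recorded with its byte offsets and label. The `annotate_batch` function and its tuple output are assumptions for illustration, and the search here is case-sensitive, as with `case_sensitive = true`.

```rust
use std::collections::HashSet;

/// Hypothetical batch step: annotate every text with every non-excluded entity.
/// Returns (text index, byte start, byte end, label) tuples.
/// Matching is a plain case-sensitive substring search.
fn annotate_batch(
    texts: &[&str],
    entities: &[(&str, &str)],
    excludes: &HashSet<&str>,
) -> Vec<(usize, usize, usize, String)> {
    let mut out = Vec::new();
    for (i, text) in texts.iter().enumerate() {
        for &(name, label) in entities {
            if excludes.contains(name) {
                continue; // entity listed in the excludes file
            }
            // Record every occurrence of the entity in this text.
            for (start, matched) in text.match_indices(name) {
                out.push((i, start, start + matched.len(), label.to_string()));
            }
        }
    }
    out
}
```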
§Single Annotation
You can also use quickner to annotate a single text, which is useful when you want to work with the resulting annotation directly in your code.
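```rust
use quickner::Document;
use std::collections::HashMap;
// Wrap the text to annotate in a document.
let mut annotation = Document::from_string("Rust is maintained by Mozilla");
// Map each entity to its label; `insert` requires a mutable map.
let mut entities = HashMap::new();
entities.insert("Rust", "Programming Language");
entities.insert("Mozilla", "Organization");
annotation.annotate(entities);
```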
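The `annotate` call attaches entity spans to the text. As a hypothetical, std-only sketch of how such spans could be computed (the `Span` struct and `annotate` function are illustrative, not quickner's API), each entity is searched for ASCII case-insensitively and kept only when it appears as a whole word:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one annotated entity: byte span plus label.
#[derive(Debug, PartialEq)]
struct Span {
    start: usize,
    end: usize,
    label: String,
}

/// Hypothetical annotator: finds every whole-word, ASCII case-insensitive
/// occurrence of each entity in `text`.
fn annotate(text: &str, entities: &HashMap<&str, &str>) -> Vec<Span> {
    // `to_ascii_lowercase` keeps byte lengths unchanged, so offsets found in
    // the lowercased copy are valid byte offsets into the original text.
    let haystack = text.to_ascii_lowercase();
    let mut spans = Vec::new();
    for (name, label) in entities {
        let needle = name.to_ascii_lowercase();
        if needle.is_empty() {
            continue;
        }
        for (start, _) in haystack.match_indices(needle.as_str()) {
            let end = start + needle.len();
            // Reject matches glued to a letter or digit, e.g. "Rust" in "Rustacean".
            let before_ok = haystack[..start].chars().next_back().map_or(true, |c| !c.is_alphanumeric());
            let after_ok = haystack[end..].chars().next().map_or(true, |c| !c.is_alphanumeric());
            if before_ok && after_ok {
                spans.push(Span { start, end, label: label.to_string() });
            }
        }
    }
    // HashMap iteration order is unspecified, so sort spans by position.
    spans.sort_by_key(|s| s.start);
    spans
}
```

Because only ASCII letters are lowercased, non-ASCII letters are compared case-sensitively in this sketch.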
Structs§
- A struct used to deserialize annotations from the configuration file.
- A struct representing the configuration file.
- An annotation is a text with a set of entities.
- A struct used to deserialize entities from the configuration file.
- An entity is a text with a label.
- A struct used to deserialize excludes from the configuration file.
- A struct used to deserialize filters from the configuration file.
- A struct used to deserialize input from the configuration file.
- A struct used to deserialize logging from the configuration file.
- A struct used to deserialize output from the configuration file.
- Quickner is the main struct of the application. It holds the configuration and the path to the configuration file.
- A struct used to deserialize annotations from the configuration file.
Enums§
- An enum used to deserialize the output format from the configuration file.
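The `format` field selects how annotations are written. As a hypothetical sketch of what two such variants might emit (the `Format` enum, the `render` function, and the JSON Lines field names are assumptions, not quickner's actual output), the spaCy case uses spaCy's classic `[text, {"entities": [[start, end, label]]}]` training layout. spaCy expects character offsets, which match byte offsets only for ASCII text.

```rust
/// Hypothetical mirror of the output-format enum; only two variants sketched.
enum Format {
    Spacy,
    Jsonl,
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Renders one annotated text as a single line in the chosen (hypothetical) format.
/// Offsets are assumed to be character offsets, as spaCy expects.
fn render(format: &Format, text: &str, spans: &[(usize, usize, &str)]) -> String {
    let ents: Vec<String> = spans
        .iter()
        .map(|(s, e, l)| format!("[{},{},{}]", s, e, json_str(l)))
        .collect();
    let ents = ents.join(",");
    match format {
        // spaCy-style training record: [text, {"entities": [[start, end, label]]}]
        Format::Spacy => format!("[{},{{\"entities\":[{}]}}]", json_str(text), ents),
        // Illustrative JSON Lines object; these field names are made up.
        Format::Jsonl => format!("{{\"text\":{},\"entities\":[{}]}}", json_str(text), ents),
    }
}
```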