Crate nlprule
Rule-based grammatical error correction through parsing LanguageTool rules.
Overview
NLPRule has the following core abstractions:
- A Tokenizer to split a text into tokens and analyze it by chunking, lemmatizing and part-of-speech tagging. It can also be used independently of the grammatical rules.
- A Rules structure containing a set of grammatical error correction rules.
Example: correct a text
```rust
use nlprule::{Tokenizer, Rules};

let tokenizer = Tokenizer::new("path/to/en_tokenizer.bin")?;
let rules = Rules::new("path/to/en_rules.bin")?;

assert_eq!(
    rules.correct("She was not been here since Monday.", &tokenizer),
    String::from("She was not here since Monday.")
);
```
Example: get suggestions and correct a text
```rust
use nlprule::{Tokenizer, Rules, types::Suggestion, rules::apply_suggestions};

let tokenizer = Tokenizer::new("path/to/en_tokenizer.bin")?;
let rules = Rules::new("path/to/en_rules.bin")?;

let text = "She was not been here since Monday.";
let suggestions = rules.suggest(text, &tokenizer);
assert_eq!(
    suggestions,
    vec![Suggestion {
        start: 4, // these are character indices!
        end: 16,
        replacements: vec!["was not".into(), "has not been".into()],
        source: "WAS_BEEN.1".into(),
        message: "Did you mean was not or has not been?".into()
    }]
);

let corrected = apply_suggestions(text, &suggestions);
assert_eq!(corrected, "She was not here since Monday.");
```
Binaries are distributed with GitHub releases.
The 't lifetime
By convention, the lifetime 't in this crate is the lifetime of the input text. Almost all structures with a lifetime are bound to this lifetime.
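As an illustration of this convention, the sketch below defines a hypothetical `Token<'t>` type (not the crate's actual type) whose field borrows from the input text, so the token cannot outlive the text it was parsed from:

```rust
// Hypothetical illustration of the 't convention: a token borrows from
// the input text, so the borrow checker ties it to the text's lifetime.
struct Token<'t> {
    text: &'t str, // slice of the original input
    start: usize,  // character offset where the slice begins
}

fn first_word<'t>(input: &'t str) -> Token<'t> {
    let end = input.find(' ').unwrap_or(input.len());
    Token { text: &input[..end], start: 0 }
}

fn main() {
    let text = String::from("She was not here.");
    let token = first_word(&text);
    // `token` borrows from `text`; dropping `text` while `token` is
    // still in use would be a compile error.
    println!("{} at {}", token.text, token.start);
}
```

Binding everything to a single 't means tokens, suggestions over them, and intermediate analyses can all share slices of one input string without copying.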
Re-exports
pub use rules::Rules;
pub use tokenizer::Tokenizer;
Modules
- rule: Implementations related to single rules.
- rules: Sets of grammatical error correction rules.
- tokenizer: A tokenizer to split raw text into tokens. Tokens are assigned lemmas and part-of-speech tags by lookup from a Tagger, and chunks containing noun/verb and grammatical case information by a statistical Chunker. Tokens are disambiguated (i.e. information from the initial assignment is changed) in a rule-based way by DisambiguationRules.
- types: Fundamental types used by this crate.
Macros
- rules_filename: Gets the canonical filename for the rules binary for a language code in ISO 639-1 (two-letter) format.
- tokenizer_filename: Gets the canonical filename for the tokenizer binary for a language code in ISO 639-1 (two-letter) format.
Enums
- Error
Functions
- rules_filename: Gets the canonical filename for the rules binary for a language code in ISO 639-1 (two-letter) format.
- tokenizer_filename: Gets the canonical filename for the tokenizer binary for a language code in ISO 639-1 (two-letter) format.
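Judging from the example paths earlier on this page ("en_tokenizer.bin", "en_rules.bin"), the canonical filename appears to be the language code followed by a fixed suffix. The sketch below mimics that shape as plain functions; the naming scheme is an assumption inferred from those examples, not the crate's actual implementation:

```rust
// Assumed filename convention, inferred from the example paths
// "en_tokenizer.bin" and "en_rules.bin"; not the crate's actual code.
fn rules_filename(lang_code: &str) -> String {
    format!("{}_rules.bin", lang_code)
}

fn tokenizer_filename(lang_code: &str) -> String {
    format!("{}_tokenizer.bin", lang_code)
}

fn main() {
    println!("{}", rules_filename("en"));
    println!("{}", tokenizer_filename("de"));
}
```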