Rule-based grammatical error correction through parsing LanguageTool rules.
§Overview
nlprule has the following core abstractions:
- A Tokenizer to split a text into tokens and analyze it by chunking, lemmatizing and part-of-speech tagging. It can also be used independently of the grammatical rules.
- A Rules structure containing a set of grammatical error correction rules.
§Examples
Correct a text:
```rust
use nlprule::{Tokenizer, Rules};

let tokenizer = Tokenizer::new("path/to/en_tokenizer.bin")?;
let rules = Rules::new("path/to/en_rules.bin")?;

assert_eq!(
    rules.correct("She was not been here since Monday.", &tokenizer),
    String::from("She was not here since Monday.")
);
```
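The `?` operator above assumes a fallible context, as in a doc test. A minimal sketch of the same flow as a standalone program, boxing the error rather than assuming a specific error type (the binary paths remain placeholders):

```rust
use nlprule::{Rules, Tokenizer};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder paths; point these at the binaries for your language.
    let tokenizer = Tokenizer::new("path/to/en_tokenizer.bin")?;
    let rules = Rules::new("path/to/en_rules.bin")?;

    // Apply all rules and print the corrected text.
    let corrected = rules.correct("She was not been here since Monday.", &tokenizer);
    println!("{}", corrected);
    Ok(())
}
```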
Get suggestions and correct a text:
```rust
use nlprule::{Tokenizer, Rules, types::Suggestion, rules::apply_suggestions};

let tokenizer = Tokenizer::new("path/to/en_tokenizer.bin")?;
let rules = Rules::new("path/to/en_rules.bin")?;

let text = "She was not been here since Monday.";
let suggestions = rules.suggest(text, &tokenizer);

assert_eq!(*suggestions[0].span().char(), 4usize..16);
assert_eq!(suggestions[0].replacements(), vec!["was not", "has not been"]);
assert_eq!(suggestions[0].source(), "GRAMMAR/WAS_BEEN/1");
assert_eq!(suggestions[0].message(), "Did you mean was not or has not been?");

let corrected = apply_suggestions(text, &suggestions);
assert_eq!(corrected, "She was not here since Monday.");
```
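Using only the accessors demonstrated above, a hedged sketch of a diagnostics loop that reports each suggestion instead of applying it:

```rust
use nlprule::{Rules, Tokenizer};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let tokenizer = Tokenizer::new("path/to/en_tokenizer.bin")?;
    let rules = Rules::new("path/to/en_rules.bin")?;

    let text = "She was not been here since Monday.";
    for suggestion in rules.suggest(text, &tokenizer) {
        // Flagged character range, originating rule, message and candidate fixes.
        println!(
            "{:?} [{}]: {} {:?}",
            suggestion.span().char(),
            suggestion.source(),
            suggestion.message(),
            suggestion.replacements()
        );
    }
    Ok(())
}
```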
Tokenize & analyze a text:
```rust
use nlprule::Tokenizer;

let tokenizer = Tokenizer::new("path/to/en_tokenizer.bin")?;

let text = "A brief example is shown.";

// `pipe` returns an iterator over sentences.
let sentence = tokenizer.pipe(text).next().expect("`text` contains one sentence.");
println!("{:#?}", sentence);

assert_eq!(sentence.tokens()[1].word().text().as_str(), "brief");
assert_eq!(sentence.tokens()[1].word().tags()[0].pos().as_str(), "JJ");
assert_eq!(sentence.tokens()[1].chunks(), vec!["I-NP-singular"]);
// Other information such as the char / byte span and lemmas is also set!
```
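The same accessors scale to a loop over every sentence and token. A sketch, assuming only the methods shown above and that `tags()` yields a slice-like list (as the indexing suggests), so tokens without tags are handled gracefully:

```rust
use nlprule::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let tokenizer = Tokenizer::new("path/to/en_tokenizer.bin")?;

    for sentence in tokenizer.pipe("A brief example is shown. It has two sentences.") {
        for token in sentence.tokens() {
            // Surface form, first POS tag (if any) and chunk labels.
            let pos = token.word().tags().first().map_or("?", |tag| tag.pos().as_str());
            println!("{}\t{}\t{:?}", token.word().text().as_str(), pos, token.chunks());
        }
    }
    Ok(())
}
```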
The pre-built tokenizer and rules binaries are distributed with GitHub releases.
Re-exports§
- pub use rules::Rules;
- pub use tokenizer::Tokenizer;
Modules§
- rule: Implementations related to single rules.
- rules: Sets of grammatical error correction rules.
- tokenizer: A tokenizer to split raw text into tokens. Tokens are assigned lemmas and part-of-speech tags by lookup from a Tagger, and chunks containing information about noun / verb phrases and grammatical case by a statistical Chunker. Tokens are then disambiguated (i.e. information from the initial assignment is changed) in a rule-based way by DisambiguationRules.
- types: Fundamental types used by this crate.
Macros§
- rules_filename: Gets the canonical filename for the rules binary for a language code in ISO 639-1 (two-letter) format.
- tokenizer_filename: Gets the canonical filename for the tokenizer binary for a language code in ISO 639-1 (two-letter) format.
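These macros resolve the filename at compile time, which makes it possible to embed the binaries into the executable. A sketch under the assumption that a build step (for example, via the nlprule-build crate) has copied the binaries into OUT_DIR:

```rust
use nlprule::{tokenizer_filename, Tokenizer};

fn load_embedded_tokenizer() -> Result<Tokenizer, Box<dyn std::error::Error>> {
    // Assumes a build script placed `en_tokenizer.bin` in OUT_DIR;
    // the bytes are compiled directly into the executable.
    let mut tokenizer_bytes: &'static [u8] = include_bytes!(concat!(
        env!("OUT_DIR"),
        "/",
        tokenizer_filename!("en")
    ));
    // Deserialize the tokenizer from the embedded bytes.
    Ok(Tokenizer::from_reader(&mut tokenizer_bytes)?)
}
```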
Enums§
Functions§
- rules_filename: Gets the canonical filename for the rules binary for a language code in ISO 639-1 (two-letter) format.
- tokenizer_filename: Gets the canonical filename for the tokenizer binary for a language code in ISO 639-1 (two-letter) format.
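The functions are the runtime counterparts of the macros above, useful when the language code is only known at runtime. A sketch; the exact return values are an assumption based on the en_tokenizer.bin / en_rules.bin names used throughout the examples:

```rust
use nlprule::{rules_filename, tokenizer_filename};

fn main() {
    // Resolve the canonical binary names for English ("en") at runtime.
    // The expected strings below are assumptions, not guaranteed output.
    assert_eq!(tokenizer_filename("en"), "en_tokenizer.bin");
    assert_eq!(rules_filename("en"), "en_rules.bin");
}
```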