Module nlprule::tokenizer
A tokenizer to split raw text into tokens. Tokens are assigned lemmas and part-of-speech tags by lookup from a Tagger, and chunks containing information about noun/verb phrases and grammatical case by a statistical Chunker. Tokens are then disambiguated (i.e. information from the initial assignment is changed) in a rule-based way by DisambiguationRules.
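The three stages described above can be illustrated with a minimal, self-contained sketch. Note that the types and functions below (`Token`, `tag`, `chunk`, `disambiguate`) are illustrative stand-ins, not the crate's actual API; they only show how an initial dictionary lookup, a chunk assignment, and a rule-based disambiguation pass feed into one another.

```rust
// Illustrative sketch of the tag -> chunk -> disambiguate pipeline.
// These names are NOT the nlprule API; they model the stages only.

#[derive(Debug, Clone, PartialEq)]
struct Token {
    text: String,
    lemma: String,
    pos: String,   // part-of-speech tag from the "tagger" stage
    chunk: String, // chunk label from the "chunker" stage
}

// Stand-in for the dictionary-based tagger: lemma/POS by table lookup.
fn tag(word: &str) -> Token {
    let (lemma, pos) = match word {
        "cars" => ("car", "NNS"),
        "drive" => ("drive", "VB"),
        w => (w, "UNK"),
    };
    Token {
        text: word.to_string(),
        lemma: lemma.to_string(),
        pos: pos.to_string(),
        chunk: String::new(),
    }
}

// Stand-in for the statistical chunker: noun/verb chunk labels from POS.
fn chunk(tokens: &mut [Token]) {
    for t in tokens.iter_mut() {
        t.chunk = match t.pos.as_str() {
            p if p.starts_with("NN") => "B-NP".to_string(),
            p if p.starts_with("VB") => "B-VP".to_string(),
            _ => "O".to_string(),
        };
    }
}

// Stand-in for rule-based disambiguation: the initial assignment is changed.
fn disambiguate(tokens: &mut [Token]) {
    // Toy rule: an unknown word directly before a verb chunk is re-tagged
    // as a noun and pulled into a noun phrase.
    for i in 0..tokens.len().saturating_sub(1) {
        if tokens[i].pos == "UNK" && tokens[i + 1].chunk == "B-VP" {
            tokens[i].pos = "NN".to_string();
            tokens[i].chunk = "B-NP".to_string();
        }
    }
}

fn main() {
    let mut tokens: Vec<Token> =
        "robots drive cars".split_whitespace().map(tag).collect();
    chunk(&mut tokens);
    disambiguate(&mut tokens);
    for t in &tokens {
        println!("{} lemma={} pos={} chunk={}", t.text, t.lemma, t.pos, t.chunk);
    }
}
```

In the real crate the tagger is backed by a dictionary, the chunker is a statistical model ported from OpenNLP, and disambiguation is driven by compiled rules; the ordering of the stages, however, matches the sketch.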
Modules
chunk | A Chunker ported from OpenNLP. |
multiword | Checks if the input text contains multi-token phrases from a finite list (which might contain e.g. city names) and assigns lemmas and part-of-speech tags accordingly. |
tag | A dictionary-based tagger. |
Structs
IncompleteSentenceIter | An iterator over IncompleteSentences. Has the same properties as SentenceIter. |
SentenceIter | An iterator over Sentences. |
Tokenizer | The complete Tokenizer doing tagging, chunking and disambiguation. |