Rust Keyword Extraction
Introduction
This is a simple NLP library that provides a set of algorithms for keyword extraction:
- Tokenizer for tokenizing text;
- TF-IDF for calculating the importance of a word in one or more documents;
- Co-occurrence for calculating relationships between words within a specific window size;
- RAKE for extracting key phrases from a document;
- TextRank for extracting keywords and key phrases from a document.
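To make the TF-IDF entry above concrete, here is a minimal standalone sketch of the textbook formulation, using only the standard library. It is not this library's API: the function names `tokenize` and `tf_idf`, the naive tokenizer, and the choice of `tf = count / document length` with `idf = ln(N / document frequency)` are all assumptions made for illustration, and the library's own weighting may differ.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper: lowercases the text and splits it on
/// non-alphanumeric characters. A naive stand-in for a real tokenizer.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Hypothetical function: returns one map per document, term -> tf * idf,
/// where tf = count / document length and idf = ln(N / document frequency).
fn tf_idf(documents: &[&str]) -> Vec<HashMap<String, f64>> {
    let tokenized: Vec<Vec<String>> = documents.iter().map(|d| tokenize(d)).collect();
    let n_docs = tokenized.len() as f64;

    // Document frequency: the number of documents that contain each term.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for tokens in &tokenized {
        let unique: HashSet<&str> = tokens.iter().map(String::as_str).collect();
        for term in unique {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    tokenized
        .iter()
        .map(|tokens| {
            // Raw term counts for this document.
            let mut counts: HashMap<&str, usize> = HashMap::new();
            for t in tokens {
                *counts.entry(t.as_str()).or_insert(0) += 1;
            }
            let len = tokens.len() as f64;
            counts
                .into_iter()
                .map(|(term, count)| {
                    let tf = count as f64 / len;
                    let idf = (n_docs / df[term] as f64).ln();
                    (term.to_string(), tf * idf)
                })
                .collect()
        })
        .collect()
}
```

With this weighting, calling `tf_idf(&["the cat sat", "the dog barked"])` scores "the" at zero in both documents because it appears in every document, while words unique to one document get a positive score.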
Algorithms
The full list of the algorithms in this library:
- Helper algorithms:
  - Tokenizer
  - Co-occurrence
- Keyword extraction algorithms:
  - TF-IDF
  - RAKE
  - TextRank
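Of the helper algorithms listed above, co-occurrence is the one that depends on a window size. The sketch below shows one common way to count it with the standard library only. It is an illustration under stated assumptions, not this library's implementation: the function name `co_occurrence`, the forward-looking window, and the alphabetically ordered pair keys are all hypothetical choices.

```rust
use std::collections::HashMap;

/// Hypothetical function: counts how often two distinct words appear
/// within `window` tokens of each other. Pairs are stored in alphabetical
/// order so that (a, b) and (b, a) share one entry.
fn co_occurrence(tokens: &[&str], window: usize) -> HashMap<(String, String), usize> {
    let mut counts = HashMap::new();
    for (i, &left) in tokens.iter().enumerate() {
        // Look ahead at most `window` tokens from position `i`.
        let end = (i + window + 1).min(tokens.len());
        for &right in &tokens[i + 1..end] {
            if left == right {
                continue; // Skip a word paired with itself.
            }
            let key = if left < right {
                (left.to_string(), right.to_string())
            } else {
                (right.to_string(), left.to_string())
            };
            *counts.entry(key).or_insert(0) += 1;
        }
    }
    counts
}
```

A larger window links words that are further apart, which produces a denser set of pairs; graph-based methods such as TextRank typically build their word graph from counts of this kind.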
Upcoming algorithms:
- YAKE
Usage
Add the library to your Cargo.toml:
[]
= "1.2.0"
Or use cargo add:
Features
It is possible to enable or disable features:
- "tf_idf": TF-IDF algorithm;
- "rake": RAKE algorithm;
- "text_rank": TextRank algorithm;
- "all": all algorithms;
- "parallel": parallelization of the algorithms with Rayon.
Default features: "tf_idf".
NOTE: The "parallel" feature is only recommended for large documents, because it trades higher memory usage for faster computation.
License
This project is licensed under the GNU Lesser General Public License v3.0. See the Copying and Copying Lesser files for details.