Rust Keyword Extraction
Introduction
This is a simple NLP library with a list of unsupervised keyword extraction algorithms:
- Tokenizer for tokenizing text;
- TF-IDF for calculating the importance of a word in one or more documents;
- Co-occurrence for calculating relationships between words within a specific window size;
- RAKE for extracting key phrases from a document;
- TextRank for extracting keywords and key phrases from a document;
- YAKE for extracting keywords with an n-gram size (defaults to 3) from a document.
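To make the "window size" idea behind co-occurrence concrete, here is a minimal standalone sketch. It is not this library's implementation or API: the function name `co_occurrence` and its signature are hypothetical, and it only assumes that two words co-occur when they are at most `window` positions apart.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of the library): counts how often two
/// different words appear within `window` positions of each other.
/// Each pair is stored with the lexicographically smaller word first.
fn co_occurrence(words: &[&str], window: usize) -> HashMap<(String, String), usize> {
    let mut counts = HashMap::new();
    for (i, left) in words.iter().enumerate() {
        // Only look ahead, so every pair of positions is visited once.
        let end = (i + window + 1).min(words.len());
        for right in &words[i + 1..end] {
            if left == right {
                continue; // a word does not co-occur with itself
            }
            let key = if left <= right {
                (left.to_string(), right.to_string())
            } else {
                (right.to_string(), left.to_string())
            };
            *counts.entry(key).or_insert(0) += 1;
        }
    }
    counts
}
```

For the words "rust keyword extraction rust library" and a window of 2, this sketch counts the pair (keyword, rust) twice and the pair (library, rust) once.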
Algorithms
The full list of the algorithms in this library:
- Helper algorithms:
  - Tokenizer
  - Co-occurrence
- Keyword extraction algorithms:
  - TF-IDF
  - RAKE
  - TextRank
  - YAKE
Usage
Add the library to your Cargo.toml:
[]
= "1.4.3"
Or use cargo add:
Features
It is possible to enable or disable features:
- "tf_idf": TF-IDF algorithm;
- "rake": RAKE algorithm;
- "text_rank": TextRank algorithm;
- "yake": YAKE algorithm;
- "all": algorithms and helpers;
- "parallel": parallelization of the algorithms with Rayon;
- "co_occurrence": Co-occurrence algorithm.
Default features: ["tf_idf", "rake", "text_rank"]. By default all algorithms apart from "co_occurrence" and "yake" are enabled.
NOTE: The "parallel" feature is only recommended for large documents; it trades memory for computation resources.
Examples
For the stop words, you can use the stop-words crate:
[dependencies]
stop-words = "0.8.0"
For example, for English:
use ;
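To show why a stop-word list matters, here is a minimal sketch of tokenizing text and dropping stop words. It uses only the standard library: the `tokenize` function is hypothetical and is not the library's Tokenizer, and the tiny stop-word set in the usage note stands in for the list you would normally load from the stop-words crate.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not the library's Tokenizer): lowercases the text,
/// splits it on non-alphanumeric characters, and drops stop words.
fn tokenize(text: &str, stop_words: &HashSet<String>) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|word| !word.is_empty()) // consecutive separators yield empty pieces
        .map(|word| word.to_lowercase())
        .filter(|word| !stop_words.contains(word))
        .collect()
}
```

With the stop words "the", "is", "a" and "of", the sentence "The library is a collection of algorithms." is reduced to the tokens library, collection and algorithms.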
TF-IDF
Create a TfIdfParams enum which can be one of the following:
- Unprocessed Documents: TfIdfParams::UnprocessedDocuments;
- Processed Documents: TfIdfParams::ProcessedDocuments;
- Single Unprocessed Document/Text block: TfIdfParams::TextBlock.
use ;
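For readers new to TF-IDF, the following standalone sketch shows one common textbook variant: term frequency is the word count divided by the document length, and inverse document frequency is ln(N / df). The function `tf_idf` is hypothetical and is not the library's API; the library may use a different weighting or smoothing.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper (not the library's API): returns one word-to-score map
/// per document, using tf = count / doc_len and idf = ln(N / df).
fn tf_idf(docs: &[Vec<&str>]) -> Vec<HashMap<String, f64>> {
    let n = docs.len() as f64;

    // Document frequency: in how many documents each word appears.
    let mut df: HashMap<&str, f64> = HashMap::new();
    for doc in docs {
        let unique: HashSet<&str> = doc.iter().copied().collect();
        for word in unique {
            *df.entry(word).or_insert(0.0) += 1.0;
        }
    }

    docs.iter()
        .map(|doc| {
            // Raw term counts for this document.
            let mut tf: HashMap<&str, f64> = HashMap::new();
            for &word in doc {
                *tf.entry(word).or_insert(0.0) += 1.0;
            }
            let len = doc.len() as f64;
            tf.into_iter()
                .map(|(word, count)| {
                    let idf = (n / df[word]).ln();
                    (word.to_string(), count / len * idf)
                })
                .collect()
        })
        .collect()
}
```

In this variant a word that occurs in every document scores 0, because ln(N / N) = 0, which is why words specific to a single document rank higher.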
RAKE
Create a RakeParams enum which can be one of the following:
- With defaults: RakeParams::WithDefaults;
- With defaults and phrase length (phrase window size limit): RakeParams::WithDefaultsAndPhraseLength;
- All: RakeParams::All.
use ;
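As background, the core RAKE idea can be sketched in a few lines: split the text into candidate phrases at punctuation and stop words, score each word as degree / frequency, and score each phrase as the sum of its word scores. The `rake` function below is a hypothetical standalone sketch under those assumptions, not the library's implementation, and it ignores options such as a phrase length limit.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper (not the library's API): returns candidate phrases
/// with their RAKE-style scores, highest score first.
fn rake(text: &str, stop_words: &HashSet<&str>) -> Vec<(String, f64)> {
    // 1. Candidate phrases: runs of words between punctuation and stop words.
    let mut phrases: Vec<Vec<String>> = Vec::new();
    for fragment in text.split(|c: char| !c.is_alphanumeric() && !c.is_whitespace()) {
        let mut current: Vec<String> = Vec::new();
        for word in fragment.split_whitespace().map(|w| w.to_lowercase()) {
            if stop_words.contains(word.as_str()) {
                if !current.is_empty() {
                    phrases.push(std::mem::take(&mut current));
                }
            } else {
                current.push(word);
            }
        }
        if !current.is_empty() {
            phrases.push(current);
        }
    }

    // 2. Word statistics: frequency, and degree (total length of the
    //    phrases the word appears in, counting the word itself).
    let mut freq: HashMap<&str, f64> = HashMap::new();
    let mut degree: HashMap<&str, f64> = HashMap::new();
    for phrase in &phrases {
        for word in phrase {
            *freq.entry(word.as_str()).or_insert(0.0) += 1.0;
            *degree.entry(word.as_str()).or_insert(0.0) += phrase.len() as f64;
        }
    }

    // 3. Phrase score: sum of degree / frequency over the words of the phrase.
    let mut scores: HashMap<String, f64> = HashMap::new();
    for phrase in &phrases {
        let score: f64 = phrase
            .iter()
            .map(|w| degree[w.as_str()] / freq[w.as_str()])
            .sum();
        scores.insert(phrase.join(" "), score);
    }

    let mut ranked: Vec<(String, f64)> = scores.into_iter().collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked
}
```

For the text "Rust keyword extraction is fast. Keyword extraction of documents is useful" with the stop words "is" and "of", the top phrase is "rust keyword extraction" with a score of 8.0, followed by "keyword extraction" with 5.0.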
TextRank
Create a TextRankParams enum which can be one of the following:
- With defaults: TextRankParams::WithDefaults;
- With defaults and phrase length (phrase window size limit): TextRankParams::WithDefaultsAndPhraseLength;
- All: TextRankParams::All.
use ;
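As background, TextRank ranks words by running a PageRank-style iteration over a graph whose edges connect words that co-occur within a window. The sketch below is a hypothetical standalone version, not the library's implementation: the `text_rank` function, its parameters (`window`, `damping`, `iterations`) and the fixed iteration count instead of a convergence check are all assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper (not the library's API): ranks words with a
/// PageRank-style iteration over an undirected co-occurrence graph.
fn text_rank(words: &[&str], window: usize, damping: f64, iterations: usize) -> Vec<(String, f64)> {
    // Build the graph: connect different words that are at most `window` positions apart.
    let mut neighbours: HashMap<&str, HashSet<&str>> = HashMap::new();
    for (i, &a) in words.iter().enumerate() {
        neighbours.entry(a).or_default();
        let end = (i + window + 1).min(words.len());
        for &b in &words[i + 1..end] {
            if a != b {
                neighbours.entry(a).or_default().insert(b);
                neighbours.entry(b).or_default().insert(a);
            }
        }
    }

    // Start from a uniform score and repeatedly redistribute it along the edges.
    let n = neighbours.len() as f64;
    let mut scores: HashMap<&str, f64> = neighbours.keys().map(|&w| (w, 1.0 / n)).collect();
    for _ in 0..iterations {
        let mut next = HashMap::new();
        for (&word, adjacent) in &neighbours {
            // Each neighbour passes on an equal share of its current score.
            let incoming: f64 = adjacent
                .iter()
                .map(|&other| scores[other] / neighbours[other].len() as f64)
                .sum();
            next.insert(word, (1.0 - damping) / n + damping * incoming);
        }
        scores = next;
    }

    let mut ranked: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(word, score)| (word.to_string(), score))
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked
}
```

For the words "rust keyword rust extraction rust library" with a window of 1, a damping factor of 0.85 and 30 iterations, "rust" ranks first because every other word is connected only to it.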
YAKE
NOTE: YAKE is a more complex algorithm and doesn't support the parallel feature yet.
Create a YakeParams enum which can be one of the following:
- With defaults: YakeParams::WithDefaults;
- All: YakeParams::All.
use ;
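To illustrate what the n-gram size means for YAKE, the sketch below generates every candidate of 1 up to `max_n` consecutive words. It covers only candidate generation and none of YAKE's scoring; the function `ngram_candidates` is hypothetical and not part of the library.

```rust
/// Hypothetical helper (not the library's API): lists all n-grams of
/// 1 to `max_n` consecutive words, shortest first.
fn ngram_candidates(words: &[&str], max_n: usize) -> Vec<String> {
    let mut candidates = Vec::new();
    for n in 1..=max_n {
        // `windows(n)` yields nothing when the text has fewer than n words.
        for gram in words.windows(n) {
            candidates.push(gram.join(" "));
        }
    }
    candidates
}
```

With the words "rust keyword extraction" and an n-gram size of 3, this yields six candidates: the three single words, "rust keyword", "keyword extraction" and "rust keyword extraction".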
Contributing
I would love your input! I want to make contributing to this project as easy and transparent as possible. Please read the CONTRIBUTING.md file for details.
License
This project is licensed under the GNU Lesser General Public License v3.0. See the Copying and Copying Lesser files for details.