keyword_extraction 1.0.0

Collection of algorithms for keyword extraction from text

Rust Keyword Extraction

Introduction

This is a simple NLP library that provides a set of algorithms for keyword extraction:

  • Tokenizer for tokenizing text;
  • TF-IDF for calculating the importance of a word in one or more documents;
  • Co-occurrence for calculating relationships between words within a specific window size;
  • RAKE for extracting key phrases from a document;
  • TextRank for extracting keywords and key phrases from a document.
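
To make the TF-IDF item above concrete, here is a minimal sketch of the idea using only the standard library. It does not use this crate's API: the names `tokenize` and `tf_idf` are hypothetical, and the weighting shown (term frequency normalized by document length, natural-log inverse document frequency without smoothing) is one common variant assumed for illustration, not necessarily the one this library implements.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper: lowercase the text and split on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Score each word of `documents[index]` with tf * idf, where
/// tf  = occurrences in the document / number of tokens in the document,
/// idf = ln(number of documents / number of documents containing the word).
fn tf_idf(documents: &[&str], index: usize) -> HashMap<String, f64> {
    let tokenized: Vec<Vec<String>> = documents.iter().map(|d| tokenize(d)).collect();

    // Document frequency: in how many documents does each word appear?
    let mut doc_freq: HashMap<&str, usize> = HashMap::new();
    for tokens in &tokenized {
        let unique: HashSet<&str> = tokens.iter().map(String::as_str).collect();
        for word in unique {
            *doc_freq.entry(word).or_insert(0) += 1;
        }
    }

    // Raw term counts for the selected document.
    let target = &tokenized[index];
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for word in target {
        *counts.entry(word.as_str()).or_insert(0) += 1;
    }

    let n_docs = tokenized.len() as f64;
    let n_tokens = target.len() as f64;
    counts
        .into_iter()
        .map(|(word, count)| {
            let tf = count as f64 / n_tokens;
            let idf = (n_docs / doc_freq[word] as f64).ln();
            (word.to_string(), tf * idf)
        })
        .collect()
}

fn main() {
    let docs = ["the cat sat on the mat", "the dog chased the cat", "the bird sang"];
    let scores = tf_idf(&docs, 0);

    // Print the words of the first document, highest score first.
    let mut ranked: Vec<_> = scores.iter().collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(a.1).unwrap());
    for (word, score) in ranked {
        println!("{word}: {score:.3}");
    }
}
```

In this sketch a word that appears in every document gets a score of zero, while a word unique to one document is ranked highest. That is the behavior that makes TF-IDF useful for picking out keywords.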

Features

The features planned before publishing this library on crates.io are:

  • Helper modules:
    • Tokenizer
    • Co-occurrence
  • Keyword extraction algorithms:
    • TF-IDF
    • RAKE
    • TextRank
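
As an illustration of what the Co-occurrence helper listed above computes, the following sketch counts how often two words appear within a given window of each other. It is a standalone example under stated assumptions, not this crate's API: the function name `co_occurrence`, the unordered-pair representation, and the forward-only window are all choices made for illustration.

```rust
use std::collections::HashMap;

/// Count how often two distinct words appear within `window` positions of each other.
/// Each pair is stored in alphabetical order so (a, b) and (b, a) share one entry.
fn co_occurrence(tokens: &[&str], window: usize) -> HashMap<(String, String), usize> {
    let mut counts = HashMap::new();
    for (i, &left) in tokens.iter().enumerate() {
        // Look only forward so that each pair of positions is counted once.
        let end = (i + window + 1).min(tokens.len());
        for &right in &tokens[i + 1..end] {
            if left == right {
                continue; // skip a word paired with itself
            }
            let key = if left <= right {
                (left.to_string(), right.to_string())
            } else {
                (right.to_string(), left.to_string())
            };
            *counts.entry(key).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    let tokens = ["rust", "keyword", "extraction", "rust", "keyword"];
    let counts = co_occurrence(&tokens, 1);
    for ((a, b), n) in &counts {
        println!("{a} / {b}: {n}");
    }
}
```

Graph-based methods such as TextRank typically build on this kind of table, treating words as nodes and co-occurrence counts as edge weights.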

Note: I removed YAKE because my implementation was very slow. I will try to implement it again in the future.

Usage

This library is not yet published on crates.io, so you will have to clone this repository and add it as a dependency in your Cargo.toml file.
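
A path dependency is one way to do this. The fragment below is a sketch that assumes the repository was cloned next to your own project; the path `../keyword_extraction` is hypothetical and should be changed to match your checkout.

```toml
[dependencies]
# Hypothetical path: point this at the directory where you cloned the repository.
keyword_extraction = { path = "../keyword_extraction" }
```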

License

This project is licensed under the GNU Lesser General Public License v3.0. See the Copying and Copying Lesser files for details.