Rust-based Natural Language Toolkit (rsnltk)
A Rust library to support natural language processing with Python bindings
Features
The rsnltk library integrates various existing Python-based NLP toolkits for powerful text analysis in Rust-based applications.
Current Features
This toolkit is based on the Python-based Stanza and other important libraries.
The Stanza functions we bind here include:
- Tokenize
- Sentence Segmentation
- Multi-Word Token Expansion
- Part-of-Speech & Morphological Features
- Named Entity Recognition
- Sentiment Analysis
- Language Identification
Additionally, we can calculate the similarity between words based on WordNet through the semantic-kit PyPI project, installed via pip install semantic-kit.
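To give a feel for what a WordNet-based similarity score means, here is a minimal sketch of path similarity, 1 / (1 + shortest path length), over a tiny hand-written hypernym tree. The tree, the helper names (toy_taxonomy, ancestors, path_similarity), and the choice of this particular measure are illustrative assumptions; this is not the rsnltk or semantic-kit API, and the measure those packages actually compute may differ.

```rust
use std::collections::HashMap;

/// Toy hypernym tree (child -> parent). Hand-written for illustration;
/// this is not real WordNet data.
fn toy_taxonomy() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("dog", "canine"),
        ("wolf", "canine"),
        ("canine", "mammal"),
        ("cat", "feline"),
        ("feline", "mammal"),
    ])
}

/// Walk from `word` up to the root, returning [word, parent, ..., root].
/// Assumes the map describes a tree (no cycles).
fn ancestors<'a>(parents: &HashMap<&'a str, &'a str>, word: &'a str) -> Vec<&'a str> {
    let mut chain = vec![word];
    let mut current = word;
    while let Some(&parent) = parents.get(current) {
        chain.push(parent);
        current = parent;
    }
    chain
}

/// Path similarity: 1 / (1 + number of edges on the shortest path
/// between the two words). Returns None if they share no ancestor.
fn path_similarity<'a>(
    parents: &HashMap<&'a str, &'a str>,
    a: &'a str,
    b: &'a str,
) -> Option<f64> {
    let chain_a = ancestors(parents, a);
    let chain_b = ancestors(parents, b);
    // The first node of chain_a that also appears in chain_b is the
    // lowest common ancestor, so the two distances sum to the path length.
    for (dist_a, node) in chain_a.iter().enumerate() {
        if let Some(dist_b) = chain_b.iter().position(|n| n == node) {
            return Some(1.0 / (1.0 + (dist_a + dist_b) as f64));
        }
    }
    None
}

fn main() {
    let taxonomy = toy_taxonomy();
    for (a, b) in [("dog", "wolf"), ("dog", "cat"), ("dog", "fish")] {
        println!("{} vs {}: {:?}", a, b, path_similarity(&taxonomy, a, b));
    }
}
```

In this toy tree, dog and wolf are two edges apart and score higher than dog and cat, which are four edges apart; a word missing from the tree yields None rather than a score.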
Installation
- Make sure Python 3.6.6+ and pip are installed on your computer. Typing python -V in the terminal should print the version with no error message;
- Install our PyPI package ner-kit (version>=0.0.5a1), which binds the Stanza package, via pip install ner-kit==0.0.5a1;
- Make sure a Rust environment is also installed on your computer. I use IntelliJ to develop Rust-based applications, where you can write Rust code;
- Create a simple Rust application project with a main() function.
- Add the rsnltk dependency to the Cargo.toml file, and keep it at the latest version.
- After you add the rsnltk dependency to the toml file, install the necessary language models from Stanza using the following Rust code the first time you use this package.
Or you can manually install those language models via the Python-based ner-kit package, which provides more features for using Stanza. Go to: ner-kit
If no error occurs in the above example, the setup works. Finally, you can try the following advanced usage examples.
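The Python prerequisite from the installation steps can also be checked from Rust before calling into the bindings. The following is a minimal sketch using only the standard library; the helper names parse_python_version and meets_minimum are hypothetical and not part of rsnltk, and it assumes the interpreter is reachable as python on your PATH.

```rust
use std::process::Command;

/// Parse the output of `python -V` (e.g. "Python 3.8.10") into
/// (major, minor, patch). Hypothetical helper, not part of rsnltk.
fn parse_python_version(output: &str) -> Option<(u32, u32, u32)> {
    let version = output.trim().strip_prefix("Python ")?;
    let mut parts = version.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    // A missing patch component counts as 0; a non-numeric one
    // (such as "0rc1") makes the whole parse fail.
    let patch: u32 = match parts.next() {
        Some(p) => p.parse().ok()?,
        None => 0,
    };
    Some((major, minor, patch))
}

/// True if the version is at least 3.6.6, the minimum named in this README.
/// Tuples compare lexicographically, so (3, 12, 0) >= (3, 6, 6).
fn meets_minimum(version: (u32, u32, u32)) -> bool {
    version >= (3, 6, 6)
}

fn main() {
    match Command::new("python").arg("-V").output() {
        Ok(out) => {
            // Python 3 prints the version to stdout; Python 2 used stderr.
            let raw = if out.stdout.is_empty() { out.stderr } else { out.stdout };
            let text = String::from_utf8_lossy(&raw);
            match parse_python_version(&text) {
                Some(v) if meets_minimum(v) => {
                    println!("Python {}.{}.{} is new enough", v.0, v.1, v.2)
                }
                Some(v) => println!("Python {}.{}.{} is older than 3.6.6", v.0, v.1, v.2),
                None => println!("Could not parse a version from {:?}", text.trim()),
            }
        }
        Err(e) => println!("Could not run `python -V`: {}", e),
    }
}
```

On systems where the interpreter is only installed as python3, change the command name accordingly.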
Examples
Example 1: Part-of-speech Analysis
Example 2: Sentiment Analysis
Example 3: Named Entity Recognition
Example 4: Tokenize
Example 5: Tokenize Sentence
Example 6: Language Identification
Example 7: MWT expand
Example 8: Estimate the similarity between words in WordNet
Credits
Thanks to the Stanford NLP Group for their hard work on Stanza.
License
MIT