Rust Stemmers
This crate implements several stemming algorithms from the Snowball project. The algorithms are compiled to Rust using a Rust backend for the Snowball language.
Supported Algorithms
- Arabic
- English
- French
- German
- Italian
- Portuguese
- Romanian
- Russian
- Spanish
Usage
extern crate native_stemmers;
use native_stemmers::{Algorithm, Stemmer};
// Create a stemmer for the English language
let en_stemmer = Stemmer::create(Algorithm::English);
// Stem the word "fruitlessly"
// Please be aware that all algorithms expect their input to contain only lowercase characters.
assert_eq!(en_stemmer.stem("fruitlessly"), "fruitless");
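Because the algorithms expect lowercase input, callers usually need to normalize words before stemming them. The following is a minimal sketch of that step, not part of this crate: `stem_normalized` and `toy_stem` are hypothetical names, and `toy_stem` is only a stand-in that strips a trailing "ly" so the example compiles without any dependency.

```rust
/// Lowercases `word` before handing it to `stem`.
/// `stem` is any function that maps a lowercase word to its stem.
/// (Hypothetical helper for illustration; not provided by the crate.)
pub fn stem_normalized<F>(stem: F, word: &str) -> String
where
    F: Fn(&str) -> String,
{
    // Stemmers expect lowercase input, so normalize first.
    let lowered = word.to_lowercase();
    stem(&lowered)
}

/// Hypothetical stand-in for a real stemmer: it only strips a trailing "ly".
pub fn toy_stem(word: &str) -> String {
    word.strip_suffix("ly").unwrap_or(word).to_string()
}
```

Calling `stem_normalized(toy_stem, "Fruitlessly")` lowercases the word and then passes it to the stand-in. With a real stemmer you would pass a closure that calls its stemming method instead of `toy_stem`; the exact return type of that method is an assumption to check against the crate documentation.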
Issues
The generated code is neither beautiful, idiomatic, nor optimized, and it is full of warnings.
There is some very low-hanging fruit to fix here. Contributions to the Rust backend or to this crate are very welcome.
Adding a stemmer
It is very simple to add a snowball-stemmer to this library:
- Install snowball with rust-backend support.
- Put the .sbl file containing the snowball-code in the algorithms directory
- Add pub mod <language>; to src/snowball/algorithms/mod.rs
- Add an enum-variant to the Algorithm enum
- In Stemmer::create add a path for your enum-variant
- If test-data exists please consider implementing a test case in the tests-module
- Run the recompile_and_test.sh script which expects a valid snowball-compiler installation in your path
- Send a pull-request
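The enum and Stemmer::create steps above can be sketched as follows. This is a simplified, self-contained stand-in and not the crate's actual source: the `NewLanguage` variant, the stub functions, and the function-pointer field are all assumptions made for illustration. In the real crate the new branch would point at the module generated from your .sbl file.

```rust
/// Simplified stand-in for the crate's `Algorithm` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Algorithm {
    English,
    /// Hypothetical variant added for the new language.
    NewLanguage,
}

/// Simplified stand-in for the crate's `Stemmer`.
/// It stores only a function pointer to the selected algorithm.
pub struct Stemmer {
    stem_fn: fn(&str) -> String,
}

impl Stemmer {
    /// Selects the stemming function for the requested algorithm.
    pub fn create(lang: Algorithm) -> Self {
        match lang {
            Algorithm::English => Stemmer { stem_fn: english_stub },
            // The new branch for the new enum variant.
            Algorithm::NewLanguage => Stemmer { stem_fn: new_language_stub },
        }
    }

    /// Applies the selected stemming function to `word`.
    pub fn stem(&self, word: &str) -> String {
        (self.stem_fn)(word)
    }
}

/// Placeholder for generated English code: strips a trailing "ly" only.
fn english_stub(word: &str) -> String {
    word.strip_suffix("ly").unwrap_or(word).to_string()
}

/// Placeholder for the newly generated algorithm: returns the word unchanged.
fn new_language_stub(word: &str) -> String {
    word.to_string()
}
```

Because the `match` is exhaustive, the compiler rejects the build if a variant is added to the enum without a corresponding branch in `create`, which makes a forgotten step easy to spot.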
Related Projects
- rust-backend for the snowball language which generated the code in src/snowball/algorithms.
- The stemmer crate provides bindings to the C Snowball implementation.