rust-tokenizers
Rust-tokenizer is a drop-in replacement for the tokenization methods from the Transformers library.
Set-up
Rust-tokenizer requires a Rust nightly build in order to use the Python API. Building from source involves the following steps:
- Install Rust and use the nightly toolchain
- Run
    python setup.py install
  in the repository. This compiles the Rust library and installs the Python API
- Example usage is available in the /tests folder, including benchmark and integration tests
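Because the build depends on the nightly toolchain, it can help to confirm which toolchain is active before running the install step. The following is a minimal sketch, not part of this repository: the helper names `is_nightly` and `active_rustc_version` are hypothetical, and it assumes that `rustc --version` reports nightly builds with a `-nightly` suffix on the version number (for example `rustc 1.77.0-nightly (...)`).

```python
import shutil
import subprocess
from typing import Optional


def is_nightly(version_output: str) -> bool:
    """Return True if `rustc --version` output names a nightly toolchain.

    Assumption: nightly builds look like "rustc 1.77.0-nightly (<hash> <date>)".
    """
    parts = version_output.split()
    return len(parts) >= 2 and parts[0] == "rustc" and "-nightly" in parts[1]


def active_rustc_version() -> Optional[str]:
    """Return the output of `rustc --version`, or None if rustc is unavailable."""
    if shutil.which("rustc") is None:
        return None
    result = subprocess.run(
        ["rustc", "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    version = active_rustc_version()
    if version is None:
        print("rustc not found; install Rust first (for example with rustup)")
    elif is_nightly(version):
        print(f"Nightly toolchain active: {version}")
    else:
        print(f"Not a nightly toolchain: {version}")
```

If the check reports a stable toolchain and Rust was installed through rustup, `rustup default nightly` switches the default toolchain to nightly.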
The library is fully unit tested at the Rust level.