🪙 toktkn
toktkn is a byte-pair encoding (BPE) tokenizer implemented in Rust and exposed to Python through PyO3 bindings.
The basic workflow has three steps:
1. Create a new tokenizer.
2. Build encoding rules on some corpus.
3. Serialize the tokenizer to disk, then load it back.
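To illustrate what building encoding rules on a corpus involves, here is a minimal pure-Python sketch of byte-pair encoding. It is not toktkn's API or its Rust implementation: the function names `train`, `merge`, `encode`, and `decode` are hypothetical and chosen only for this example, which assumes a byte-level base vocabulary of 256 tokens.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train(text, vocab_size):
    """Learn merge rules (left_id, right_id) -> new_id on top of the 256 byte tokens."""
    ids = list(text.encode("utf-8"))
    merges = {}
    next_id = 256
    while next_id < vocab_size and len(ids) >= 2:
        pair, count = Counter(zip(ids, ids[1:])).most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so further merges would not compress anything
        merges[pair] = next_id
        ids = merge(ids, pair, next_id)
        next_id += 1
    return merges


def encode(text, merges):
    """Apply the learned merges in training order (dicts preserve insertion order)."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Expand each token id back to its bytes and decode as UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


merges = train("aaabdaaabac", vocab_size=258)
text = "rust is pretty fun 🦀"
assert decode(encode(text, merges), merges) == text
```

Because every merge is a plain concatenation of existing tokens, decoding an encoded string always returns the original text, whatever corpus the rules were learned on.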
Install
Install toktkn from PyPI:
pip install toktkn
Note: if you want to build from source, make sure cargo is installed!
Performance
Slightly faster than OpenAI's tiktoken and a lot quicker than 🤗 tokenizers!

Performance was measured on 2.5MB of the wikitext test split, against OpenAI's tiktoken GPT-2 tokenizer (tiktoken==0.6.0) and the 🤗 tokenizers implementation (tokenizers==0.19.1).
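A comparison of this kind can be reproduced with a small timing harness. The sketch below is an assumption about how such a measurement might be set up, not the script used for the numbers above; `throughput_mb_per_s` is a hypothetical helper that accepts any encode callable, so each tokenizer's encode method can be passed in turn.

```python
import time
from typing import Callable, Sequence


def throughput_mb_per_s(
    encode: Callable[[str], Sequence[int]], text: str, repeats: int = 5
) -> float:
    """Return the best-of-`repeats` encoding throughput in MB of UTF-8 input per second."""
    n_bytes = len(text.encode("utf-8"))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode(text)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very small inputs.
    return n_bytes / max(best, 1e-9) / 1e6


# Stand-in encoder for demonstration only: one token per UTF-8 byte.
def byte_encoder(s: str) -> list:
    return list(s.encode("utf-8"))


sample = "some really interesting text " * 1000
print(f"{throughput_mb_per_s(byte_encoder, sample):.1f} MB/s")
```

Taking the best of several runs reduces noise from caching and scheduling; the same text and the same machine should be used for every tokenizer being compared.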