This repository holds the code for the TokenGeeX Rust crate and Python package. TokenGeeX is an efficient tokenizer for code based on [UnigramLM (Taku Kudo, 2018)](https://arxiv.org/abs/1804.10959) and [TokenMonster](https://github.com/alasdairforsythe/tokenmonster).
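A UnigramLM tokenizer assigns each vocabulary token an independent probability and tokenizes a string by choosing the segmentation with the highest total log-probability, which can be found with Viterbi dynamic programming. The sketch below shows that decoding step in plain Python. It illustrates the general UnigramLM technique only. It is not TokenGeeX's implementation or API, and the `viterbi_segment` function and `TOY_VOCAB` vocabulary are hypothetical.

```python
import math

# Hypothetical toy vocabulary: token -> log-probability.
# The probabilities are illustrative and not normalized; only their ordering matters for argmax.
TOY_VOCAB = {
    tok: math.log(p)
    for tok, p in {
        "def": 0.1, "main": 0.05, "ma": 0.02, "in": 0.02, " ": 0.2,
        "d": 0.01, "e": 0.01, "f": 0.01, "m": 0.01, "a": 0.01, "i": 0.01, "n": 0.01,
    }.items()
}


def viterbi_segment(text: str, vocab: dict[str, float]) -> list[str]:
    """Return the most probable segmentation of `text` under a unigram LM (sketch)."""
    n = len(text)
    max_len = max(len(tok) for tok in vocab)
    # best[i] holds the best total log-probability of any segmentation of text[:i].
    best = [-math.inf] * (n + 1)
    # back[i] holds the start index of the last token in that best segmentation.
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in vocab and best[start] != -math.inf:
                score = best[start] + vocab[piece]
                if score > best[end]:
                    best[end] = score
                    back[end] = start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk the back-pointers from the end of the string to recover the tokens.
    tokens = []
    end = n
    while end > 0:
        start = back[end]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]


print(viterbi_segment("def main", TOY_VOCAB))  # ['def', ' ', 'main']
```

Here `"main"` wins over `"ma" + "in"` because a single likely token scores higher than the product of two less likely ones. This preference for fewer, more probable tokens is what UnigramLM's training procedure exploits when it prunes the vocabulary.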