TokenGeeX - Efficient Tokenizer for CodeGeeX
This repository holds the code for the TokenGeeX Rust crate and Python package. TokenGeeX is an efficient tokenizer for code based on UnigramLM (Taku Kudo 2018) and TokenMonster.