tokengeex 1.0.1

TokenGeeX is an efficient tokenizer for code based on UnigramLM and TokenMonster.

TokenGeeX - Efficient Tokenizer for CodeGeeX

This repository holds the code for the TokenGeeX Rust crate and Python package. TokenGeeX is a tokenizer for CodeGeeX, designed for source code and Chinese text. It is based on the UnigramLM algorithm (Kudo, 2018).
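
To give a feel for what "based on UnigramLM" means in practice, the sketch below shows how a unigram language model picks a segmentation: every token has a probability, and the tokenizer chooses the split whose tokens have the highest joint probability, found with Viterbi dynamic programming. This is a minimal illustration only, not the TokenGeeX API or its implementation. The function name `viterbi_segment`, the `TOY_LOG_PROBS` vocabulary, and all probabilities in it are made up for the example.

```python
import math


def viterbi_segment(text, log_probs):
    """Return the most probable segmentation of `text` under a unigram model.

    `log_probs` maps each vocabulary token to its log-probability.
    Hypothetical helper for illustration; not part of TokenGeeX.
    """
    n = len(text)
    max_len = max(len(tok) for tok in log_probs)
    # best[i] = (score of the best segmentation of text[:i],
    #            start index of the last token in that segmentation)
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in log_probs and best[start][0] > -math.inf:
                score = best[start][0] + log_probs[piece]
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    # Walk the back-pointers from the end of the text to recover the tokens.
    tokens = []
    i = n
    while i > 0:
        start = best[i][1]
        tokens.append(text[start:i])
        i = start
    return tokens[::-1]


# Toy vocabulary with made-up, unnormalized probabilities (assumption).
TOY_LOG_PROBS = {
    tok: math.log(p)
    for tok, p in {
        "def ": 0.10, "def": 0.15, " ": 0.15, "foo": 0.10,
        "fo": 0.05, "de": 0.05, "d": 0.05, "e": 0.05, "f": 0.05, "o": 0.05,
    }.items()
}

print(viterbi_segment("def foo", TOY_LOG_PROBS))  # ['def ', 'foo']
```

With these toy numbers, the two-token split "def " + "foo" wins over "def" + " " + "foo" because the product of its token probabilities is larger. A real vocabulary would be learned from a corpus rather than written by hand.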