Module encoding

Token encoding utilities

Constants

CHARS_PER_TOKEN_CJK
CHARS_PER_TOKEN_CODE
CHARS_PER_TOKEN_EN
Common tokenization patterns

Functions

chars_per_token
Get the appropriate characters-per-token ratio for a given text
is_cjk
Detect if text is primarily CJK
is_code
Detect if text is primarily code
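
Example

This page lists only item names and one-line summaries, so the following is a minimal sketch of how the three constants and three functions might fit together, not the module's actual implementation. The signatures, the constant values (4.0, 3.0, 1.5), the Unicode ranges treated as CJK, and the punctuation heuristic for code are all assumptions made for illustration.

```rust
// Hypothetical values: the real constants are not shown on this page.
pub const CHARS_PER_TOKEN_EN: f32 = 4.0;
pub const CHARS_PER_TOKEN_CODE: f32 = 3.0;
pub const CHARS_PER_TOKEN_CJK: f32 = 1.5;

/// Returns true when more than half of the non-whitespace characters
/// fall in a few common CJK Unicode blocks (assumed definition).
pub fn is_cjk(text: &str) -> bool {
    let mut total = 0usize;
    let mut cjk = 0usize;
    for c in text.chars().filter(|c| !c.is_whitespace()) {
        total += 1;
        if matches!(
            c,
            '\u{4E00}'..='\u{9FFF}'       // CJK Unified Ideographs
                | '\u{3040}'..='\u{30FF}' // Hiragana and Katakana
                | '\u{AC00}'..='\u{D7AF}' // Hangul Syllables
        ) {
            cjk += 1;
        }
    }
    total > 0 && cjk * 2 > total
}

/// Crude heuristic (assumed): text counts as code when more than 10% of
/// its non-whitespace characters are brackets, semicolons, or operators.
pub fn is_code(text: &str) -> bool {
    let total = text.chars().filter(|c| !c.is_whitespace()).count();
    let symbols = text
        .chars()
        .filter(|c| matches!(c, '{' | '}' | '(' | ')' | '[' | ']' | ';' | '=' | '<' | '>'))
        .count();
    total > 0 && symbols * 10 > total
}

/// Picks a ratio by checking for CJK first, then code, then falling
/// back to the English ratio.
pub fn chars_per_token(text: &str) -> f32 {
    if is_cjk(text) {
        CHARS_PER_TOKEN_CJK
    } else if is_code(text) {
        CHARS_PER_TOKEN_CODE
    } else {
        CHARS_PER_TOKEN_EN
    }
}

/// Hypothetical helper (not part of the listed API): estimates a token
/// count by dividing the character count by the selected ratio.
pub fn estimate_tokens(text: &str) -> usize {
    let chars = text.chars().count() as f32;
    (chars / chars_per_token(text)).ceil() as usize
}
```

Under these assumptions, a caller would pass a string to `chars_per_token` and divide its character count by the result to get a rough token estimate, as `estimate_tokens` does. Checking CJK before code is a design choice of this sketch: CJK text inflates token counts the most, so misclassifying it would cause the largest error.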