Token encoding utilities
Constants
- CHARS_PER_TOKEN_CJK
- CHARS_PER_TOKEN_CODE
- CHARS_PER_TOKEN_EN - Common tokenization patterns
Functions
- chars_per_token - Get appropriate chars per token ratio
- is_cjk - Detect if text is primarily CJK
- is_code - Detect if text is primarily code
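
The listing above gives only item names and one-line summaries, so the following is a minimal sketch of how these items might fit together, not the module's actual implementation. The constant types and values, the function signatures, the majority and symbol-density thresholds, and the `is_cjk_char` helper are all assumptions made for illustration.

```rust
// Hypothetical ratios: the real constants' types and values are not shown above.
pub const CHARS_PER_TOKEN_EN: f64 = 4.0;
pub const CHARS_PER_TOKEN_CODE: f64 = 3.0;
pub const CHARS_PER_TOKEN_CJK: f64 = 1.5;

/// Hypothetical helper: true for characters in a few common CJK blocks.
/// The list is deliberately short and not exhaustive.
fn is_cjk_char(c: char) -> bool {
    matches!(
        c,
        '\u{4E00}'..='\u{9FFF}'     // CJK Unified Ideographs
            | '\u{3040}'..='\u{309F}' // Hiragana
            | '\u{30A0}'..='\u{30FF}' // Katakana
            | '\u{AC00}'..='\u{D7AF}' // Hangul Syllables
    )
}

/// Assumed heuristic: more than half of the non-whitespace characters are CJK.
pub fn is_cjk(text: &str) -> bool {
    let mut total = 0usize;
    let mut cjk = 0usize;
    for c in text.chars().filter(|c| !c.is_whitespace()) {
        total += 1;
        if is_cjk_char(c) {
            cjk += 1;
        }
    }
    total > 0 && cjk * 2 > total
}

/// Assumed heuristic: more than 10% of the non-whitespace characters are
/// punctuation that is typical of source code.
pub fn is_code(text: &str) -> bool {
    let mut total = 0usize;
    let mut symbols = 0usize;
    for c in text.chars().filter(|c| !c.is_whitespace()) {
        total += 1;
        if "{}()[];=<>".contains(c) {
            symbols += 1;
        }
    }
    total > 0 && symbols * 10 > total
}

/// Picks a chars-per-token ratio by checking CJK first, then code,
/// and falling back to the English ratio (including for empty input).
pub fn chars_per_token(text: &str) -> f64 {
    if is_cjk(text) {
        CHARS_PER_TOKEN_CJK
    } else if is_code(text) {
        CHARS_PER_TOKEN_CODE
    } else {
        CHARS_PER_TOKEN_EN
    }
}
```

Under these assumptions, a rough token estimate for a string would be its character count divided by the ratio returned from `chars_per_token`. Checking CJK before code is an arbitrary ordering chosen for the sketch; mixed inputs may call for a different priority.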