kitoken 0.10.1

Fast and versatile tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization

Little structured metadata is currently available to build this page from. Check the main library docs, the README, or Cargo.toml in case the author documented the features there.

This version has 19 feature flags, 13 of them enabled by default.

default

convert (default)

multiversion (default)

normalization (default)

regex-perf (default)

serialization (default)

std (default)

convert-detect (default)

convert-sentencepiece (default)

convert-tekken (default)

convert-tiktoken (default)

convert-tokenizers (default)

normalization-charsmap (default)

normalization-unicode (default)

all

regex-onig

regex-unicode

split

split-unicode-script

unstable

This feature flag does not enable additional features.
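
As a rough illustration of how the flags listed above are selected, the following Cargo.toml fragment is a minimal sketch. The flag names and version come from this page; the particular combination shown is only an assumption for illustration, and this page does not confirm that the crate builds with every subset of flags.

```toml
[dependencies]
# Option 1: accept the default feature set (the 13 flags marked "default" above).
# kitoken = "0.10.1"

# Option 2: opt out of the defaults and enable flags explicitly.
# The flags chosen here are an illustrative assumption, not a recommended set.
kitoken = { version = "0.10.1", default-features = false, features = ["std", "serialization", "convert-tiktoken"] }
```

Non-default flags such as regex-onig or split can be added to the `features` array in the same way, with or without the defaults enabled.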