tokenmonster 0.1.0

Greedy tiktoken-like tokenizer with embedded vocabulary (cl100k-base approximator)
Owner: mboros1

tokenmonster

Greedy tiktoken-like tokenizer with an embedded vocabulary, intended for fast, allocation-light tokenization.
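As a rough illustration of what "greedy tokenization over an embedded vocabulary" can mean, the sketch below compiles a toy vocabulary into the binary as a static table. At each position it emits the longest vocabulary entry that is a prefix of the remaining input, then continues after it. The vocabulary, the token IDs, and the `greedy_tokenize` function are hypothetical. This is not tokenmonster's API, and the crate's real matching rules may differ.

```rust
use std::collections::HashMap;

// Hypothetical toy vocabulary, embedded in the binary as a static table.
// A real cl100k-base approximation would hold roughly 100k entries.
static VOCAB: &[(&str, u32)] = &[
    ("h", 0), ("e", 1), ("l", 2), ("o", 3), (" ", 4),
    ("he", 5), ("hell", 6), ("hello", 7), (" w", 8), ("or", 9),
    ("ld", 10), ("w", 11), ("r", 12), ("d", 13),
];

/// Greedy longest-prefix tokenization: at each position, take the longest
/// vocabulary entry (up to `max_len` bytes) that matches, then move past it.
/// Returns None if some position cannot be matched by any entry.
fn greedy_tokenize(text: &str, vocab: &HashMap<&str, u32>, max_len: usize) -> Option<Vec<u32>> {
    let mut ids = Vec::new();
    let mut pos = 0;
    while pos < text.len() {
        let rest = &text[pos..];
        let upper = rest.len().min(max_len);
        let mut matched = None;
        // Try candidate lengths from longest to shortest, skipping
        // lengths that would split a multi-byte UTF-8 character.
        for len in (1..=upper).rev() {
            if !rest.is_char_boundary(len) {
                continue;
            }
            if let Some(&id) = vocab.get(&rest[..len]) {
                matched = Some((id, len));
                break;
            }
        }
        let (id, len) = matched?;
        ids.push(id);
        pos += len;
    }
    Some(ids)
}

fn main() {
    // Build the lookup table from the embedded entries once at startup.
    let vocab: HashMap<&str, u32> = VOCAB.iter().copied().collect();
    let max_len = VOCAB.iter().map(|(s, _)| s.len()).max().unwrap_or(0);
    let ids = greedy_tokenize("hello world", &vocab, max_len).expect("untokenizable input");
    println!("{:?}", ids);
}
```

With this toy table, "hello world" splits as "hello", " w", "or", "ld". tiktoken's cl100k-base encoding applies ranked BPE merges after regex pre-splitting rather than a single longest-match pass, so a greedy pass can split some strings differently. That is consistent with the crate describing itself as an approximator.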

Features

  • Greedy tokenization that approximates common LLM vocabularies such as cl100k-base
  • Zero-copy where possible; minimal allocations
  • Optional tiny test vocabulary via the tiny_vocab feature
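To make "zero-copy where possible; minimal allocations" concrete, here is one way a greedy tokenizer can avoid per-call allocations. It uses an iterator that yields each token ID together with a `&str` slice borrowed from the input. No substring is copied, and no output vector is built unless the caller collects one. The vocabulary is a sorted static slice searched with `binary_search_by`, so not even a `HashMap` is allocated. `GreedyTokens`, `tokens`, `lookup`, `SORTED_VOCAB`, and `MAX_TOKEN_BYTES` are all hypothetical names for this sketch, not tokenmonster's actual types.

```rust
// Hypothetical embedded vocabulary, kept sorted by byte string so it can be
// binary-searched directly, with no heap allocation at all.
static SORTED_VOCAB: &[(&str, u32)] = &[
    (" ", 4), (" w", 8), ("d", 13), ("e", 1), ("h", 0), ("he", 5),
    ("hell", 6), ("hello", 7), ("l", 2), ("ld", 10), ("o", 3),
    ("or", 9), ("r", 12), ("w", 11),
];

// Longest entry in the toy vocabulary, in bytes.
const MAX_TOKEN_BYTES: usize = 5;

fn lookup(piece: &str) -> Option<u32> {
    SORTED_VOCAB
        .binary_search_by(|entry| entry.0.cmp(piece))
        .ok()
        .map(|i| SORTED_VOCAB[i].1)
}

/// Lazily yields (token_id, slice) pairs; each slice borrows from the input.
struct GreedyTokens<'a> {
    rest: &'a str,
}

impl<'a> Iterator for GreedyTokens<'a> {
    type Item = (u32, &'a str);

    fn next(&mut self) -> Option<Self::Item> {
        let rest: &'a str = self.rest;
        if rest.is_empty() {
            return None;
        }
        let upper = rest.len().min(MAX_TOKEN_BYTES);
        // Longest match first, staying on UTF-8 character boundaries.
        for len in (1..=upper).rev() {
            if !rest.is_char_boundary(len) {
                continue;
            }
            let piece = &rest[..len];
            if let Some(id) = lookup(piece) {
                self.rest = &rest[len..];
                return Some((id, piece));
            }
        }
        // No entry matches here, so stop instead of looping forever.
        // A real tokenizer would typically fall back to byte-level tokens.
        None
    }
}

fn tokens(text: &str) -> GreedyTokens<'_> {
    GreedyTokens { rest: text }
}

fn main() {
    for (id, piece) in tokens("hello world") {
        println!("{id:>3} {piece:?}");
    }
}
```

Callers that want a `Vec` can still `collect()` the iterator; the point of this design is that the allocation becomes the caller's choice. Real cl100k-base tokens are byte sequences that are not always valid UTF-8 on their own, so a production tokenizer would more likely operate on `&[u8]` than on `&str`.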

License: MIT