use-token

Composable tokenization primitives for RustUse.

use-token keeps tokenization explicit and small. It handles whitespace splitting, conservative word tokenization, lightweight sentence boundaries, and character spans without claiming to be a full NLP parser.

Included primitives

  • tokenize_whitespace
  • tokenize_words
  • tokenize_sentences
  • tokenize_chars
  • token_count
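
The main practical difference between the split-level primitives is how they treat punctuation. Below is a minimal sketch of that contrast, assuming each tokenizer takes a &str and returns a vector of tokens, and that tokenize_whitespace splits purely on whitespace while tokenize_words strips surrounding punctuation; the crate's exact signatures and return types may differ.

use use_token::{tokenize_whitespace, tokenize_words};

// Assumption: a pure whitespace split keeps punctuation attached to words...
assert_eq!(tokenize_whitespace("Hello, world!"), ["Hello,", "world!"]);
// ...while conservative word tokenization strips it off.
assert_eq!(tokenize_words("Hello, world!"), ["Hello", "world"]);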

Example

use use_token::{token_count, tokenize_sentences, tokenize_words};

// Punctuation does not create extra tokens: "Hello, world!" counts as two.
assert_eq!(token_count("Hello, world!"), 2);
// The internal apostrophe is preserved, so "don't" remains one token.
assert_eq!(tokenize_words("don't stop").len(), 2);
// Lightweight boundary detection treats "." and "!" as sentence ends.
assert_eq!(tokenize_sentences("One. Two!").len(), 2);
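
The example above doesn't exercise tokenize_chars. As a purely illustrative sketch, assume it yields one span per Unicode character rather than per byte; the actual span type and return value are not documented here and may differ.

use use_token::tokenize_chars;

// Assumption (not confirmed above): one entry per character, so a
// multi-byte character such as "é" still counts once.
assert_eq!(tokenize_chars("héllo").len(), 5);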