fastok 0.0.1

Byte-pair encoding (BPE) in Rust, with Python bindings built using PyO3.
Owner: alvarobartt

Development

Build the extension module and install it into the active Python virtual environment with maturin:

maturin develop

Python bindings

>>> from fastok import PreTokenizer

>>> pre_tokenizer = PreTokenizer(model="gpt2")
>>> pre_tokenizer.pre_tokenize_str("My name is Alvaro and I live in Barcelona.")
['My', ' name', ' is', ' Alvaro', ' and', ' I', ' live', ' in', ' Barcelona', '.']
>>> pre_tokenizer.pre_tokenize(["My name is Alvaro and I live in Barcelona.", "I like pizza."])
[['My', ' name', ' is', ' Alvaro', ' and', ' I', ' live', ' in', ' Barcelona', '.'], ['I', ' like', ' pizza', '.']]
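
To make the splitting behaviour above easier to follow, here is a minimal pure-Python sketch of GPT-2 style pre-tokenization. It is a stand-in for illustration, not fastok's Rust implementation, and it assumes that model="gpt2" selects the regex-based split published with GPT-2. The standard library `re` module does not support `\p{L}` and `\p{N}`, so the character classes below approximate them. The function names simply mirror the methods shown above.

```python
import re

# Approximation of the GPT-2 split pattern using only the standard library.
# Assumption: the original pattern's \p{L} is approximated by [^\W\d_] (letters)
# and \p{N} by \d (decimal digits); rare Unicode categories may differ.
GPT2_SPLIT = re.compile(
    r"'s|'t|'re|'ve|'m|'ll|'d"  # common English contractions
    r"| ?[^\W\d_]+"             # optional leading space + run of letters
    r"| ?\d+"                   # optional leading space + run of digits
    r"| ?(?:[^\s\w]|_)+"        # optional leading space + run of other symbols
    r"|\s+(?!\S)"               # whitespace not directly followed by a token
    r"|\s+"                     # any remaining whitespace
)


def pre_tokenize_str(text: str) -> list[str]:
    """Split one string into pre-tokens (hypothetical stand-in helper)."""
    return GPT2_SPLIT.findall(text)


def pre_tokenize(texts: list[str]) -> list[list[str]]:
    """Split each string in a batch (hypothetical stand-in helper)."""
    return [pre_tokenize_str(text) for text in texts]


if __name__ == "__main__":
    print(pre_tokenize_str("My name is Alvaro and I live in Barcelona."))
    print(pre_tokenize(["I like pizza."]))
```

For the two example sentences this sketch yields the same pieces as the session above: each word keeps its leading space, and the final period becomes its own piece. BPE merges would then be applied within each piece, never across pieces.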