sapient-tokenizers 0.3.0

HuggingFace-compatible tokenizers for SAPIENT — BPE, WordPiece, SentencePiece, chat templates
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
//! `sapient-tokenizers` — HuggingFace-compatible tokenization.
//!
//! Wraps the official HuggingFace `tokenizers` Rust crate, which supports:
//! - BPE (GPT-2, Llama, Falcon, Phi, Qwen)
//! - WordPiece (BERT, RoBERTa, DistilBERT)
//! - SentencePiece (T5, Gemma, Llama)
//!
//! Also provides Jinja2 chat template rendering for chat models.

pub mod chat;
pub mod tokenizer;

pub use chat::{ChatMessage, ChatRole, ChatTemplate};
pub use tokenizer::{SapientTokenizer, TokenizerOptions};