pub fn estimate_tokens(text: &str) -> usize
Estimate token count for a text using tiktoken.
This uses the cl100k_base encoding, which is used by:
- GPT-4
- GPT-3.5-turbo
- text-embedding-ada-002
- text-embedding-3-small/large
GPT-4o and GPT-4o-mini use the o200k_base encoding instead, so counts for those models are approximate.
Example
use vectorless::domain::estimate_tokens;
assert_eq!(estimate_tokens(""), 0);
assert!(estimate_tokens("hello world") > 0);
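
A common use for a token estimate is keeping text within a model's context budget. The following is a minimal sketch of that idea, not part of this crate: `pack_into_chunks` and `word_count_estimate` are hypothetical names, and the word-count estimator is a stand-in so the snippet compiles without dependencies. In real use you would pass `estimate_tokens` as the `estimate` argument, and the resulting chunk boundaries would differ.

```rust
/// Stand-in estimator so this sketch compiles on its own.
/// ASSUMPTION: counts whitespace-separated words; the real
/// `estimate_tokens` uses tiktoken and returns different numbers.
fn word_count_estimate(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Hypothetical helper: greedily packs paragraphs into chunks whose
/// summed estimate stays within `budget`. A single paragraph that
/// exceeds the budget on its own is emitted as its own chunk.
fn pack_into_chunks<F>(paragraphs: &[&str], budget: usize, estimate: F) -> Vec<String>
where
    F: Fn(&str) -> usize,
{
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut used = 0;

    for &p in paragraphs {
        let cost = estimate(p);
        // Close the current chunk when this paragraph would exceed the budget.
        if !current.is_empty() && used + cost > budget {
            chunks.push(std::mem::take(&mut current));
            used = 0;
        }
        if !current.is_empty() {
            current.push('\n');
        }
        current.push_str(p);
        used += cost;
    }
    // Flush whatever is left after the last paragraph.
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Because the per-paragraph estimates are summed, the joining newlines are not counted, so it is worth leaving some headroom below the model's actual limit.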