§Tokenization Module
This module provides accurate token counting using OpenAI’s tiktoken tokenizer, replacing the simple character-based estimation used previously.
§Features
- Accurate Token Counting: Uses tiktoken cl100k_base encoding (GPT-4 compatible)
- Multiple Encoding Support: Supports different OpenAI encodings
- Content-Aware Estimation: Handles code content more accurately than character counting
- Budget Management: Token budget allocation and tracking
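The gap between character-based estimation and content-aware counting can be illustrated with a small self-contained sketch. Note this is a hypothetical illustration, not the crate's actual algorithm: the common "~4 characters per token" rule of thumb works for English prose, but punctuation-dense code tokenizes into shorter pieces, so a content-aware estimator should assume fewer characters per token for code-like input.

```rust
/// Legacy heuristic: one token per ~4 characters (rounded up).
fn estimate_tokens_naive(text: &str) -> usize {
    (text.len() + 3) / 4
}

/// Illustrative content-aware heuristic (NOT the crate's algorithm):
/// punctuation-heavy content is counted at ~3 chars per token instead of ~4.
fn estimate_tokens_content_aware(text: &str) -> usize {
    let punct = text.chars().filter(|c| c.is_ascii_punctuation()).count();
    let ratio = if punct * 5 > text.len() { 3 } else { 4 };
    (text.len() + ratio - 1) / ratio
}

fn main() {
    let code = "fn main() { println!(\"Hello, world!\"); }"; // 40 chars, 12 punctuation
    println!("naive: {}", estimate_tokens_naive(code));         // 10
    println!("aware: {}", estimate_tokens_content_aware(code)); // 14
}
```

A real tokenizer such as tiktoken's cl100k_base replaces both heuristics with an exact byte-pair-encoding pass, which is what this module provides.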
§Usage
use scribe_core::tokenization::{TokenCounter, TokenizerConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = TokenizerConfig::default();
    let counter = TokenCounter::new(config)?;
    let content = "fn main() { println!(\"Hello, world!\"); }";
    let token_count = counter.count_tokens(content)?;
    println!("Token count: {}", token_count);
    Ok(())
}
Modules§
- utils - Utilities for working with tokens and content
Structs§
- TokenBudget - Token budget tracker for selection algorithms
- TokenCounter - Main tokenizer interface for accurate token counting
- TokenizationComparison - Comparison between tiktoken and legacy tokenization
- TokenizerConfig - Configuration for the tokenizer
Enums§
- ContentType - Content type for budget recommendations
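To make the budget-management feature concrete, here is a minimal, self-contained sketch of what a token budget tracker like TokenBudget could look like. The struct fields and method names (`remaining`, `try_consume`) are illustrative assumptions, not the crate's actual API; the point is the allocation-and-tracking pattern a selection algorithm would use.

```rust
/// Hypothetical sketch of a token budget tracker; names and methods
/// are illustrative, not scribe_core's actual API.
struct TokenBudget {
    limit: usize,
    used: usize,
}

impl TokenBudget {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Tokens still available before the budget is exhausted.
    fn remaining(&self) -> usize {
        self.limit.saturating_sub(self.used)
    }

    /// Try to reserve `cost` tokens; returns false (and reserves
    /// nothing) if the item would overflow the budget.
    fn try_consume(&mut self, cost: usize) -> bool {
        if cost <= self.remaining() {
            self.used += cost;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut budget = TokenBudget::new(100);
    assert!(budget.try_consume(60));   // 60 of 100 used
    assert!(!budget.try_consume(50));  // rejected: only 40 remain
    assert_eq!(budget.remaining(), 40);
    println!("remaining: {}", budget.remaining());
}
```

A selection algorithm would call `try_consume` with the `count_tokens` result for each candidate item, skipping items once the budget is exhausted.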