load_tokenizer

Function load_tokenizer 

pub fn load_tokenizer(
    tokenizer_path: Option<&Path>,
    vocab_size: usize,
) -> Result<BoxedTokenizer>

Load the best available tokenizer for Hydra.

Attempts to load in order:

  1. Llama 3 tokenizer from the specified path
  2. Fallback tokenizer with specified vocab size
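
The load order above can be sketched roughly as follows. This is a minimal illustration, not Hydra's actual implementation: the `Tokenizer` trait, `FileTokenizer`, `FallbackTokenizer`, the `Result` alias, and the function name `load_tokenizer_sketch` are all hypothetical stand-ins, and treating a missing file as a reason to fall back (rather than return an error) is an assumption.

```rust
use std::fs;
use std::path::Path;

// Hypothetical stand-ins: the real crate's Result, BoxedTokenizer and
// tokenizer types are not shown on this page.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

trait Tokenizer {
    /// Short human-readable description of where the tokenizer came from.
    fn describe(&self) -> String;
}

type BoxedTokenizer = Box<dyn Tokenizer>;

/// Placeholder for a tokenizer backed by a tokenizer.json file.
struct FileTokenizer {
    raw: String,
}

/// Placeholder for the fallback tokenizer with a fixed vocabulary size.
struct FallbackTokenizer {
    vocab_size: usize,
}

impl Tokenizer for FileTokenizer {
    fn describe(&self) -> String {
        format!("file({} bytes)", self.raw.len())
    }
}

impl Tokenizer for FallbackTokenizer {
    fn describe(&self) -> String {
        format!("fallback(vocab_size={})", self.vocab_size)
    }
}

fn load_tokenizer_sketch(
    tokenizer_path: Option<&Path>,
    vocab_size: usize,
) -> Result<BoxedTokenizer> {
    // Step 1: try the tokenizer file when a path was given and it exists.
    if let Some(path) = tokenizer_path {
        if path.is_file() {
            // A real loader would parse the JSON; this sketch only reads it.
            let raw = fs::read_to_string(path)?;
            return Ok(Box::new(FileTokenizer { raw }));
        }
    }
    // Step 2: no usable file (assumption: a missing file is not an error),
    // so build the fallback tokenizer with the requested vocab size.
    Ok(Box::new(FallbackTokenizer { vocab_size }))
}
```

In this sketch, passing `None` or a path that does not point to a file yields the fallback tokenizer, while a read error on an existing file is propagated through `?`. Whether the real function propagates or swallows such errors is not stated here.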

§Arguments

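  • tokenizer_path - Optional path to a tokenizer.json file; pass None to skip straight to the fallback
  • vocab_size - Vocabulary size for the fallback tokenizer, used when no tokenizer is loaded from tokenizer_path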

§Example

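use std::path::Path;

// tokenizer_path is Option<&Path>, so wrap the string literal with Path::new.
let tokenizer = load_tokenizer(Some(Path::new("./models/hydra/tokenizer.json")), 128000)?;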