
Crate ferrum_tokenizer


§Ferrum Tokenizer

An MVP tokenizer implementation for the Ferrum inference stack.

This crate provides HuggingFace tokenizers integration and implements the tokenizer interfaces defined in ferrum-interfaces.

§Features

  • HuggingFace Integration: Load tokenizers from the HuggingFace Hub or local files
  • Incremental Decoding: Efficient token-by-token decoding for streaming
  • Chat Templates: Basic support for conversation formatting
  • Special Tokens: Proper handling of BOS, EOS, PAD tokens
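
To illustrate why incremental decoding needs its own state, here is a minimal, self-contained sketch that uses only the standard library. It is not this crate's API: `StreamDecoder`, its `push` method, and the toy byte-level vocabulary are hypothetical names introduced for illustration. The sketch assumes each token maps to a byte sequence, so a multi-byte UTF-8 character can be split across two tokens and must be buffered until it is complete.

```rust
/// Hypothetical streaming decoder (illustration only, not part of ferrum_tokenizer).
/// Maps token ids to raw bytes and emits text only once the bytes form valid UTF-8.
pub struct StreamDecoder {
    /// Toy vocabulary: index = token id, value = the bytes that token stands for.
    vocab: Vec<Vec<u8>>,
    /// Bytes received so far that do not yet form a complete character.
    pending: Vec<u8>,
}

impl StreamDecoder {
    pub fn new(vocab: Vec<Vec<u8>>) -> Self {
        Self { vocab, pending: Vec::new() }
    }

    /// Feed one token id and return the text that became decodable.
    /// Returns an empty string while a character is still incomplete.
    /// Unknown ids are ignored in this sketch.
    pub fn push(&mut self, id: u32) -> String {
        if let Some(bytes) = self.vocab.get(id as usize) {
            self.pending.extend_from_slice(bytes);
        }
        let mut out = String::new();
        loop {
            // Try to decode everything buffered so far.
            let err = match std::str::from_utf8(&self.pending) {
                Ok(s) => {
                    out.push_str(s);
                    None
                }
                Err(e) => Some(e),
            };
            let e = match err {
                None => {
                    self.pending.clear();
                    return out;
                }
                Some(e) => e,
            };
            // Emit the prefix that is already valid UTF-8.
            let valid = e.valid_up_to();
            out.push_str(&String::from_utf8_lossy(&self.pending[..valid]));
            match e.error_len() {
                // Input ended mid-character: keep the tail buffered for the next token.
                None => {
                    self.pending.drain(..valid);
                    return out;
                }
                // Genuinely invalid bytes: replace them with U+FFFD and keep going.
                Some(n) => {
                    out.push('\u{FFFD}');
                    self.pending.drain(..valid + n);
                }
            }
        }
    }
}
```

With a vocabulary in which one token holds the first two bytes of "€" and the next token holds the last byte, `push` returns an empty string for the first token and "€" for the second, so a streaming client never receives a half-formed character.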

Re-exports§

pub use implementations::*;

Modules§

implementations

Structs§

SpecialTokens
Special tokens configuration
TokenId
Token identifier used across the inference pipeline.
TokenizerInfo
Tokenizer information and metadata

Enums§

TokenizerType
Tokenizer types/algorithms

Traits§

IncrementalTokenizer
Incremental tokenizer state for streaming
Tokenizer
Core tokenizer trait for encoding/decoding operations
TokenizerFactory
Tokenizer factory for creating tokenizer instances

Functions§

default_factory
Default tokenizer factory using HuggingFace backend
load_from_file
Load tokenizer from file
load_from_hub
Load tokenizer from HuggingFace Hub

Type Aliases§

Result
Result type used throughout Ferrum