Crate ferrum_tokenizer

§Ferrum Tokenizer

MVP tokenizer implementation for the Ferrum inference stack.

This crate provides HuggingFace tokenizers integration and implements the tokenizer interfaces defined in ferrum-interfaces.

§Features

HuggingFace Integration: Load tokenizers from HF Hub or local files
Incremental Decoding: Efficient token-by-token decoding for streaming
Chat Templates: Support for conversation formatting (basic)
Special Tokens: Proper handling of BOS, EOS, PAD tokens
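
To illustrate what incremental decoding means for streaming, here is a minimal, standard-library-only sketch. It does not use this crate's API: `StreamDecoder`, its `push` method, and the byte-level toy vocabulary are hypothetical names chosen for illustration, and token ids are assumed to be plain `u32` values. The idea shown is that each new token's bytes are appended to a buffer and only the longest valid UTF-8 prefix is emitted, so a character split across two tokens is never printed half-finished.

```rust
use std::collections::HashMap;

/// Hypothetical streaming decoder (NOT the crate's `IncrementalTokenizer`).
/// Maps token ids to raw bytes and emits text only once it is valid UTF-8.
struct StreamDecoder {
    vocab: HashMap<u32, Vec<u8>>,
    pending: Vec<u8>, // bytes received but not yet emitted
}

impl StreamDecoder {
    fn new(vocab: HashMap<u32, Vec<u8>>) -> Self {
        Self { vocab, pending: Vec::new() }
    }

    /// Feed one token id and return the newly completed text (possibly empty).
    /// Unknown ids are ignored in this sketch.
    fn push(&mut self, id: u32) -> String {
        if let Some(bytes) = self.vocab.get(&id) {
            self.pending.extend_from_slice(bytes);
        }
        // Find how many leading bytes already form valid UTF-8.
        let valid_len = match std::str::from_utf8(&self.pending) {
            Ok(s) => s.len(),
            Err(e) => e.valid_up_to(),
        };
        // Emit the valid prefix and keep the incomplete tail buffered.
        // Note: truly invalid bytes would stay buffered forever here;
        // a real decoder would need a replacement-character policy.
        let out: Vec<u8> = self.pending.drain(..valid_len).collect();
        String::from_utf8(out).expect("prefix was validated as UTF-8")
    }
}

/// Streams "héllo" where 'é' (0xC3 0xA9) is split across two tokens.
fn demo() -> Vec<String> {
    let mut vocab = HashMap::new();
    vocab.insert(0, b"h".to_vec());
    vocab.insert(1, vec![0xC3]);
    vocab.insert(2, vec![0xA9]);
    vocab.insert(3, b"llo".to_vec());
    let mut dec = StreamDecoder::new(vocab);
    [0, 1, 2, 3].iter().map(|&id| dec.push(id)).collect()
}
```

In `demo`, the second token yields an empty string because its single byte is only the first half of `é`; the character appears once the third token completes it.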

Re-exports§

pub use implementations::*;

Modules§

implementations

Structs§

SpecialTokens: Special tokens configuration
TokenId: Token identifier used across the inference pipeline.
TokenizerInfo: Tokenizer information and metadata
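
As a rough picture of what a special-tokens configuration is typically used for, the sketch below wraps a sequence in BOS/EOS, right-pads a batch with PAD, and strips special ids before decoding. It is an assumption-laden illustration, not this crate's `SpecialTokens` or `TokenId` types: `SpecialIds`, its field names, the helper functions, and the use of bare `u32` ids are all hypothetical.

```rust
/// Hypothetical stand-in for a special-token configuration.
/// Field names and the u32 id type are assumptions for illustration.
#[derive(Clone, Copy)]
struct SpecialIds {
    bos: u32,
    eos: u32,
    pad: u32,
}

/// Wrap a token sequence as [BOS, ...ids, EOS].
fn add_special(ids: &[u32], sp: SpecialIds) -> Vec<u32> {
    let mut out = Vec::with_capacity(ids.len() + 2);
    out.push(sp.bos);
    out.extend_from_slice(ids);
    out.push(sp.eos);
    out
}

/// Right-pad every sequence with PAD up to the longest length in the batch.
fn pad_batch(batch: &[Vec<u32>], sp: SpecialIds) -> Vec<Vec<u32>> {
    let max_len = batch.iter().map(Vec::len).max().unwrap_or(0);
    batch
        .iter()
        .map(|seq| {
            let mut padded = seq.clone();
            padded.resize(max_len, sp.pad);
            padded
        })
        .collect()
}

/// Drop BOS, EOS, and PAD ids, e.g. before turning ids back into text.
fn strip_special(ids: &[u32], sp: SpecialIds) -> Vec<u32> {
    ids.iter()
        .copied()
        .filter(|id| ![sp.bos, sp.eos, sp.pad].contains(id))
        .collect()
}
```

For example, with `bos: 1, eos: 2, pad: 0`, `add_special(&[10, 11], sp)` gives `[1, 10, 11, 2]`, and padding it alongside a shorter sequence appends `0` to the shorter one.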

Enums§

TokenizerType: Tokenizer types/algorithms

Traits§

IncrementalTokenizer: Incremental tokenizer state for streaming
Tokenizer: Core tokenizer trait for encoding/decoding operations
TokenizerFactory: Tokenizer factory for creating tokenizer instances

Functions§

default_factory: Default tokenizer factory using HuggingFace backend
load_from_file: Load tokenizer from file
load_from_hub: Load tokenizer from HuggingFace Hub

Type Aliases§

Result: Result type used throughout Ferrum