§llm_models: Load and download LLM models, metadata, and tokenizers
The llm_models crate is a workspace member of the llm_client project.
§Features
- GGUFs from local storage or Hugging Face
- Parses model metadata from GGUF files
- Includes limited support for loading tokenizers from GGUF files
- Also supports loading metadata and tokenizers from their respective standalone files
- API models from OpenAI, Anthropic, and Perplexity
- Tokenizer abstraction over Hugging Face's Tokenizers library and Tiktoken
§LocalLlmModel
Everything you need for GGUF models. The GgufLoader wraps the individual loaders for convenience. All loaders return a LocalLlmModel, which contains the tokenizer, metadata, chat template, and anything else that can be extracted from the GGUF file.
§GgufPresetLoader
- Presets for popular models like Llama 3, Phi, Mistral/Mixtral, and more
- Loads the best quantized model by calculating the largest quant that will fit in your VRAM
use llm_models::*;
let model: LocalLlmModel = GgufLoader::default()
.llama3_1_8b_instruct()
.preset_with_available_vram_gb(48) // Load the largest quant that fits in your VRAM
.load().unwrap();
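Once loaded, the LocalLlmModel carries the extracted tokenizer, metadata, and chat template. A minimal sketch of inspecting it, assuming a model_base field analogous to the one shown on ApiLlmModel below (the field names are unverified assumptions):
use llm_models::*;
// Hedged sketch: `model_base.model_id` is assumed to exist on
// LocalLlmModel by analogy with the ApiLlmModel example below.
let model: LocalLlmModel = GgufLoader::default()
    .llama3_1_8b_instruct()
    .preset_with_available_vram_gb(48)
    .load()
    .unwrap();
println!("loaded model: {}", model.model_base.model_id);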
§GgufHfLoader
GGUF models from Hugging Face.
use llm_models::*;
let model: LocalLlmModel = GgufLoader::default()
.hf_quant_file_url("https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/blob/main/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf")
.load().unwrap();
§GgufLocalLoader
GGUF models from local storage.
use llm_models::*;
let model: LocalLlmModel = GgufLoader::default()
.local_quant_file_path("/root/.cache/huggingface/hub/models--bartowski--Meta-Llama-3.1-8B-Instruct-GGUF/blobs/9da71c45c90a821809821244d4971e5e5dfad7eb091f0b8ff0546392393b6283")
.load().unwrap();
§ApiLlmModel
- Supports OpenAI, Anthropic, Perplexity, and adding your own API models
- Supports prompting, tokenization, and price estimation
use llm_models::*;
let model = ApiLlmModel::gpt_4_o();
assert_eq!(model.model_base.model_id, "gpt-4o");
assert_eq!(model.model_base.model_ctx_size, 128000);
assert_eq!(model.model_base.inference_ctx_size, 4096);
assert_eq!(model.cost_per_m_in_tokens, 5.00);
assert_eq!(model.cost_per_m_out_tokens, 15.00);
assert_eq!(model.tokens_per_message, 3);
assert_eq!(model.tokens_per_name, Some(1));
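Because the per-million-token costs are plain fields (see the asserts above), a rough price estimate for a request is simple arithmetic. The token counts below are made-up inputs for illustration:
use llm_models::*;
let model = ApiLlmModel::gpt_4_o();
// Hypothetical token counts for a single request.
let (in_tokens, out_tokens) = (12_000.0, 1_500.0);
// $5.00 per 1M input tokens + $15.00 per 1M output tokens for gpt-4o.
let cost = (in_tokens / 1_000_000.0) * model.cost_per_m_in_tokens
    + (out_tokens / 1_000_000.0) * model.cost_per_m_out_tokens;
println!("estimated cost: ${cost:.4}");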
§LlmTokenizer
- A simple API for encoding and decoding that enables uniform LLM consumption across multiple architectures
- Uses Hugging Face's Tokenizers library for local models and tiktoken-rs for OpenAI and Anthropic (Anthropic doesn't have a publicly available tokenizer, so Tiktoken serves as an approximation)
use llm_models::*;
// Get a Tiktoken tokenizer
let tok = LlmTokenizer::new_tiktoken("gpt-4o");
// From local path
let tok = LlmTokenizer::new_from_tokenizer_json("path/to/tokenizer.json");
// From repo (requires Hugging Face token)
// let tok = LlmTokenizer::new_from_hf_repo(hf_token, "meta-llama/Meta-Llama-3-8B-Instruct");
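A short usage sketch; tokenize here is an assumed method name for encoding text to token ids and may not match the crate's actual API:
use llm_models::*;
let tok = LlmTokenizer::new_tiktoken("gpt-4o");
// Hedged sketch: `tokenize` is an assumed encode method; consult the
// LlmTokenizer docs for the real encode/decode names.
let token_ids = tok.tokenize("Hello, world!");
println!("token count: {}", token_ids.len());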
§Setter Traits
- All setter traits are public, so you can integrate them into your own projects if you wish
- Examples include OpenAiModelTrait, GgufLoaderTrait, AnthropicModelTrait, and HfTokenTrait for loading models; see the sketch below
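A minimal sketch of a setter trait in use, assuming HfTokenTrait provides an hf_token builder method on GgufLoader (the method name is an assumption, not confirmed from the crate):
use llm_models::*;
// Hedged sketch: `.hf_token(...)` is assumed to come from HfTokenTrait.
let model: LocalLlmModel = GgufLoader::default()
    .hf_token("hf_your_token_here") // needed for gated repos
    .hf_quant_file_url("https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/blob/main/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf")
    .load()
    .unwrap();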
§Re-exports
pub use api_model::anthropic::AnthropicModelTrait;
pub use api_model::openai::OpenAiModelTrait;
pub use api_model::perplexity::PerplexityModelTrait;
pub use api_model::ApiLlmModel;
pub use local_model::chat_template::LlmChatTemplate;
pub use local_model::gguf::loaders::preset::GgufPresetLoader;
pub use local_model::gguf::preset::GgufPresetTrait;
pub use local_model::gguf::GgufLoader;
pub use local_model::gguf::GgufLoaderTrait;
pub use local_model::hf_loader::HfTokenTrait;
pub use local_model::metadata::LocalLlmMetadata;
pub use local_model::LocalLlmModel;
pub use tokenizer::LlmTokenizer;