# llm_utils

Utilities for llama.cpp, OpenAI, Anthropic, and mistral.rs. Made for the llm_client crate, but split into its own crate because some of these are useful on their own!
## Installation

```toml
[dependencies]
llm_utils = "*"
```
## Model loading 🛤️

- Presets for popular GGUF models, along with pre-populated models for OpenAI and Anthropic.
- Load GGUF models from Hugging Face, with automatic selection of the quantization level based on your available VRAM.
```rust
// Download the largest quantized Mistral-7B-Instruct model that will fit in your VRAM
//
let model: GGUFModel = GGUFModelBuilder::default()
    .mistral_7b_instruct()
    .vram(8) // GB of available VRAM; the largest quant that fits is selected
    .ctx_size(9001) // ctx_size impacts VRAM usage!
    .load()
    .await?;

// Or just load directly from a URL
//
let model: GGUFModel = GGUFModelBuilder::new()
    .from_quant_file_url("https://huggingface.co/MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct.Q6_K.gguf")
    .load()
    .await?;

not_a_real_assert_eq!(model.local_model_path, "path/to/local/model.gguf");
not_a_real_assert_eq!(model.ctx_size, 9001);

// Or OpenAI
//
let model: OpenAiModel = OpenAiModel::gpt_4_o();
let model: OpenAiModel = OpenAiModel::openai_backend_from_model_id("gpt-4o");
not_a_real_assert_eq!(model.context_length, 128000);
```
## Tokenizer 🧮

- Hugging Face's Tokenizers library for local models and tiktoken-rs for OpenAI.
- A simple, abstract API for encoding and decoding allows for generic LLM consumption across multiple architectures (see the round-trip sketch after the example below).
- Safely set the `max_tokens` param for requests to ensure they don't fail from exceeding token limits!
```rust
// Get a tokenizer
//
let tokenizer: LlmUtilsTokenizer = LlmUtilsTokenizer::new_tiktoken("gpt-4o");

// Or from Hugging Face...
// Support for loading a tokenizer from a GGUF file still needs to be added,
// so this does not work unless you first load the tokenizer.json from the model's
// original repo, as the GGUF format does not include it.
//
let tokenizer: LlmUtilsTokenizer = LlmUtilsTokenizer::new_from_model(&model);

let token_ids: Vec<u32> = tokenizer.tokenize("Hello, world!");

// This function is used for generating logit bias
let token_id: u32 = tokenizer.try_into_single_token("hello")?;
```
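For the local path, the heavy lifting is done by Hugging Face's `tokenizers` crate. If you want to see what the encode/decode round trip looks like with that library directly, here is a minimal sketch (assuming a recent `tokenizers` version; the `tokenizer.json` path is a placeholder for a file downloaded from the model's original repo):

```rust
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the tokenizer.json pulled from the model's original Hugging Face repo
    let tokenizer = Tokenizer::from_file("tokenizer.json")?;

    // Encode: text -> token ids (no special tokens added)
    let encoding = tokenizer.encode("Hello, world!", false)?;
    let token_ids: Vec<u32> = encoding.get_ids().to_vec();

    // Decode: token ids -> text (skipping special tokens)
    let decoded = tokenizer.decode(&token_ids, true)?;
    println!("{token_ids:?} -> {decoded}");
    Ok(())
}
```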
## Prompting 🎶

- Generate properly formatted prompts for GGUF models, OpenAI, and Anthropic.
- Uses the GGUF model's chat template and Jinja templates to format the prompt to the model's spec.
- Create prompts from a combination of dynamic inputs and/or static inputs from a file (see the sketch at the end of this section).
```rust
// Default formatted prompt (OpenAI and Anthropic format)
//
let default_formatted_prompt = default_formatted_prompt(
    &system_prompt,      // dynamic input
    &system_prompt_path, // and/or static input from a file
    &user_input,
)?;

// Get total tokens in prompt
//
let total_prompt_tokens: u32 = model.openai_token_count_of_prompt(&tokenizer, &default_formatted_prompt);

// Then convert it to be used for a GGUF model
//
let gguf_formatted_prompt: String = convert_default_prompt_to_model_format(&default_formatted_prompt, &model.chat_template)?;

// Since the GGUF formatted prompt is just a string, we can use the generic count_tokens function
//
let total_prompt_tokens: u32 = tokenizer.count_tokens(&gguf_formatted_prompt);

// Validate the requested max_tokens for a generation. If it exceeds the model's limits, reduce max_tokens to a safe value.
//
let safe_max_tokens = get_and_check_max_tokens_for_response(
    model.context_length,
    total_prompt_tokens,
    requested_max_tokens,
)?;
```
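To make the dynamic/static combination concrete, here is a plain-Rust sketch of the idea. The `build_prompt` helper and the message shape are hypothetical, not the crate's API: a static system prompt is read from a file and merged with a dynamic, per-request user input.

```rust
use std::collections::HashMap;
use std::fs;

// Hypothetical helper: combine a static system prompt loaded from a file
// with a dynamic, per-request user input into the default message format.
fn build_prompt(
    system_prompt_path: &str,
    user_input: &str,
) -> std::io::Result<Vec<HashMap<String, String>>> {
    let system_prompt = fs::read_to_string(system_prompt_path)?; // static input from file
    Ok(vec![
        HashMap::from([
            ("role".to_string(), "system".to_string()),
            ("content".to_string(), system_prompt.trim().to_string()),
        ]),
        HashMap::from([
            ("role".to_string(), "user".to_string()),
            ("content".to_string(), user_input.to_string()), // dynamic input
        ]),
    ])
}
```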
## Grammar 🤓

- Grammars are the most capable method for structuring the output of an LLM. This was designed for use with llama.cpp, but support for other backends is planned.
- Create lists of N items, or restrict the character types in a response.
- More to be added (JSON, classification, restricting specific characters, words, and phrases).
```rust
// Return a list of between 1 and 4 items
//
let grammar = create_list_grammar(1, 4);

// The response will be formatted: `- <list text>\n`
//
let response: String = text_generation_request(&model, Some(&grammar), &prompt).await?;

// So you can easily split like:
//
let response_items: Vec<String> = response
    .lines()
    .map(|line| line.trim_start_matches("- ").to_string())
    .collect();

// Exclude numbers from text generation
//
let grammar = create_text_structured_grammar(vec![RestrictedCharacterSet::Numeric]);
let response: String = text_generation_request(&model, Some(&grammar), &prompt).await?;
assert!(!response.is_empty());
assert!(!response.chars().any(|c| c.is_numeric()));

// Exclude a list of common, and commonly unwanted, characters from text generation
//
let grammar = create_text_structured_grammar(vec![RestrictedCharacterSet::PunctuationExtended]);
let response: String = text_generation_request(&model, Some(&grammar), &prompt).await?;
assert!(!response.contains('{'));
assert!(!response.contains('&'));
assert!(!response.contains('*'));
```
## Logit bias #️⃣

- Create properly formatted logit bias requests for llama.cpp and OpenAI.
- Functionality to add logit bias from a variety of sources, along with validation.
```rust
// Exclude some tokens from text generation
//
let mut words: HashMap<String, f32> = HashMap::new();
words.entry("as an ai".to_string()).or_insert(-100.0);
words.entry("delve".to_string()).or_insert(-100.0);

// Build and validate
//
let logit_bias = logit_bias_from_words(&tokenizer, &words);
let validated_logit_bias = validate_logit_bias_values(&logit_bias)?;

// Convert
//
let openai_logit_bias = convert_logit_bias_to_openai_format(&validated_logit_bias)?;
let llama_logit_bias = convert_logit_bias_to_llama_format(&validated_logit_bias)?;
```
## Text utils 📝

- Generic utils for cleaning text. Mostly useful for RAG (see the sketch below).
- Text splitting will be added in the future.
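The crate's exact helpers aren't shown here, so as a hypothetical illustration of this kind of cleaning (plain std Rust, not llm_utils's actual API), normalizing messy scraped text before chunking it for RAG might look like:

```rust
// Sketch of a generic text-cleaning util (illustrative, not the crate's actual API):
// collapses runs of whitespace and strips stray control characters, which is
// typically what you want before chunking scraped text for RAG.
fn clean_text(input: &str) -> String {
    input
        .split_whitespace() // collapses spaces, tabs, and newlines
        .collect::<Vec<&str>>()
        .join(" ")
        .chars()
        .filter(|c| !c.is_control()) // drop stray control characters
        .collect()
}

fn main() {
    let raw = "Some   text\twith\n\n messy\r\n whitespace.";
    assert_eq!(clean_text(raw), "Some text with messy whitespace.");
}
```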
## License

This project is licensed under the MIT License.
## Contributing

My motivation for publishing is for someone to point out if I'm doing something wrong!