# llm_utils: Tools for LLMs with minimal abstraction.

`llm_utils` is not a 'framework'. There are no chains, agents, or buzzwords. Abstraction is minimized as much as possible, and individual components are easily accessible. For real-world examples of how this crate is used, check out the `llm_client` crate.
## Cargo Install

```toml
[dependencies]
llm_utils = "*"
```
## LocalLlmModel

Everything you need for GGUF models. The `GgufLoader` wraps the loaders for convenience. All loaders return a `LocalLlmModel`, which contains the tokenizer, metadata, chat template, and anything else that can be extracted from the GGUF.
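Once loaded, those extracted pieces are accessible directly on the model. A minimal sketch (the `chat_template` field name is an assumption for illustration; `model_base.tokenizer` appears again in the tokenizer section below):

```rust
// Assuming `model` was returned by one of the loaders below.
let tokenizer = model.model_base.tokenizer; // tokenizer built from the GGUF metadata
let chat_template = model.chat_template;    // hypothetical field name for the chat template
```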
### GgufPresetLoader

- Presets for popular models like Llama 3, Phi, Mistral/Mixtral, and more
- Loads the best quantized model by calculating the largest quant that will fit in your VRAM
```rust
let model: LocalLlmModel = GgufLoader::default()
    .llama3_1_8b_instruct()
    .preset_with_available_vram_gb(48) // Load the largest quant that will fit in your VRAM (GB value is illustrative)
    .load()?;
```
### GgufHfLoader

GGUF models from Hugging Face.

```rust
let model: LocalLlmModel = GgufLoader::default()
    // URL of a GGUF quant file on Hugging Face (placeholder shown here)
    .hf_quant_file_url("https://huggingface.co/USER/REPO/blob/main/MODEL.Q8_0.gguf")
    .load()?;
```
### GgufLocalLoader

GGUF models from local storage.

```rust
let model: LocalLlmModel = GgufLoader::default()
    .local_quant_file_path("/path/to/MODEL.Q8_0.gguf") // placeholder path
    .load()?;
```
## ApiLlmModel

- Supports OpenAI, Anthropic, Perplexity, and adding your own API models
- Supports prompting, tokenization, and price estimation (see the cost sketch below)
```rust
// Load a preset and use its metadata for prompting and cost estimation
// (preset constructor name is illustrative).
let model: ApiLlmModel = ApiLlmModel::gpt_4_o();
```
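Price estimation itself is simple arithmetic over per-million-token rates; a hedged sketch (this helper and its parameters are illustrative, not the crate's API):

```rust
// Hypothetical helper: estimate request cost from token counts and
// per-million-token prices (as published by the API providers).
fn estimate_cost(input_tokens: u32, output_tokens: u32, cost_per_m_in: f64, cost_per_m_out: f64) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * cost_per_m_in
        + (output_tokens as f64 / 1_000_000.0) * cost_per_m_out
}

// E.g., 1,200 prompt tokens and 300 output tokens at $5.00/$15.00 per million:
// 0.0012 * 5.00 + 0.0003 * 15.00 = $0.0105
```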
## LlmTokenizer

- Simple abstract API for encoding and decoding that allows for abstract LLM consumption across multiple architectures (see the sketch after the constructors below). Uses Hugging Face's Tokenizers library for local models and tiktoken-rs for OpenAI and Anthropic (Anthropic doesn't have a publicly available tokenizer).
```rust
// Get a Tiktoken tokenizer (model name is illustrative)
let tok = LlmTokenizer::new_tiktoken("gpt-4o");
// From a local tokenizer.json path
let tok = LlmTokenizer::new_from_tokenizer_json("path/to/tokenizer.json");
// From a Hugging Face repo (arguments are illustrative)
let tok = LlmTokenizer::new_from_hf_repo(hf_token, "meta-llama/Meta-Llama-3-8B-Instruct");
// From LocalLlmModel or ApiLlmModel
let tok = model.model_base.tokenizer;
```
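The constructors above only build the tokenizer; the encode/decode surface is what the rest of the crate consumes. A hedged sketch (the method names here are assumptions; check the crate docs for the real signatures):

```rust
// Hypothetical method names for illustration only.
let tokens: Vec<u32> = tok.tokenize("Hello there!"); // encode text to token ids
let text: String = tok.detokenize(&tokens);          // decode token ids back to text
let count: u32 = tok.count_tokens("Hello there!");   // count tokens without keeping them
```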
## LlmPrompt

- Generates properly formatted prompts for GGUF models, OpenAI, and Anthropic. Supports chat template strings/tokens and OpenAI hashmaps
- Counts prompt tokens and checks that they are within model limits
- Uses the GGUF's chat template and Jinja templates to format the prompt to model spec. Builds with generation prefixes on all chat template models, even those that don't explicitly support it
```rust
// (String and reference arguments below are illustrative.)

// From LocalLlmModel or ApiLlmModel
let prompt: LlmPrompt = LlmPrompt::new_chat_template_prompt(&local_llm_model);
let prompt: LlmPrompt = LlmPrompt::new_openai_prompt(&api_llm_model);

// Add system messages
prompt.add_system_message().set_content("You are a nice robot.");

// User messages
prompt.add_user_message().set_content("What did seven eat?");

// LLM responses
prompt.add_assistant_message().set_content("Seven ate nine.");

// Messages all share the same functions; see prompting::PromptMessage for more
prompt.add_system_message().append_content("Reply in haiku.");
prompt.add_system_message().prepend_content("You are a helpful robot.");

// Builds with generation prefix. The LLM will complete the response: "Don't you think that is... cool?"
prompt.set_generation_prefix("Don't you think that is...");

// Get total tokens in prompt
let total_prompt_tokens: u32 = prompt.get_total_prompt_tokens();

// Get chat template formatted prompt
let chat_template_prompt: String = prompt.get_built_prompt_string();
let chat_template_prompt_as_tokens = prompt.get_built_prompt_as_tokens();

// OpenAI formatted prompt (OpenAI and Anthropic format)
let openai_prompt = prompt.get_built_prompt_hashmap();

// Validate the requested max_tokens for a generation. If it exceeds the model's
// limits, reduce max_tokens to a safe value. (Argument list is illustrative.)
let actual_request_tokens = check_and_get_max_tokens(&model, &prompt, requested_max_tokens)?;
```
## Text Processing and NLP

### TextChunker

Balanced text chunking means that all chunks are approximately the same size. See my blog post on text chunking for implementation details.

- A novel balanced text chunker that creates chunks of approximately equal length
- More accurate than unbalanced implementations that create orphaned final chunks
- Optimized with parallelization
let text = "one, two, three, four, five, six, seven, eight, nine";
// Give a max token count of four, other text chunkers would split this into three chunks.
assert_eq!;
// A balanced text chunker, however, would also split the text into three chunks, but of even sizes.
assert_eq!;
As long as the total token length of the incoming text is not evenly divisible by the max token count, the final chunk will be smaller than the others. In some cases it will be so small that it is "orphaned" and rendered useless. If you asked your RAG implementation "What did seven eat?", the final chunk that answers the question would not be retrievable.

The TextChunker first attempts to split semantically in the following order: paragraphs, newlines, sentences. If that fails, it builds chunks linearly by using the largest available splits, splitting where needed.
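The balancing step can be sketched with simple arithmetic (an illustration of the idea, not the crate's implementation): pick the chunk count first, then aim every chunk at the average size instead of greedily filling to the maximum.

```rust
/// Sketch of balanced chunk sizing, not the crate's code.
/// Returns (number of chunks, target tokens per chunk).
fn balanced_targets(total_tokens: usize, max_tokens: usize) -> (usize, usize) {
    let chunk_count = total_tokens.div_ceil(max_tokens); // fewest chunks that respect the cap
    let target = total_tokens.div_ceil(chunk_count);     // spread tokens evenly across them
    (chunk_count, target)
}

// Nine tokens with a cap of four: a greedy chunker yields 4 + 4 + 1 (an orphan),
// while the balanced targets are three chunks of about three tokens each.
assert_eq!(balanced_targets(9, 4), (3, 3));
```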
### TextSplitter

- Unicode text segmentation on paragraphs, sentences, words, and graphemes
- The only semantic sentence segmentation implementation in Rust (please ping me if I'm wrong!) - mostly works
```rust
// (`text` is your input string; argument placement is illustrative.)
let paragraph_splits = TextSplitter::new()
    .on_two_plus_newline()
    .split_text(&text)?;

let newline_splits = TextSplitter::new()
    .on_single_newline()
    .split_text(&text)?;

// There is no good implementation of sentence splitting in Rust!
// This implementation is better than the unicode-segmentation crate or any other crate I tested.
// But still not as good as a model-based approach like spaCy or other NLP libraries.
let sentence_splits = TextSplitter::new()
    .on_sentences_rule_based()
    .split_text(&text)?;

// Unicode
let sentence_splits = TextSplitter::new()
    .on_sentences_unicode()
    .split_text(&text)?;

let word_splits = TextSplitter::new()
    .on_words_unicode()
    .split_text(&text)?;

let grapheme_splits = TextSplitter::new()
    .on_graphemes_unicode()
    .split_text(&text)?;

// If the split separator produces fewer than two splits,
// this mode tries the next separator.
// It does this until it produces more than one split.
let paragraph_splits = TextSplitter::new()
    .on_two_plus_newline()
    .recursive(true) // argument is illustrative
    .split_text(&text)?;
```
### TextCleaner

- Cleans raw text into unicode format
- Reduces duplicate whitespace
- Removes unwanted characters and graphemes
```rust
// Normalizes all whitespace characters.
// Reduces newlines to singles or doubles (paragraphs) or converts them to " ".
// Optionally, removes all characters besides alphabetics, numbers, and punctuation.
// (`text` and `raw_html` inputs are illustrative.)
let mut text_cleaner = TextCleaner::new();
let cleaned_text: String = text_cleaner
    .reduce_newlines_to_single_space()
    .remove_non_basic_ascii()
    .run(&text);

// Convert HTML to cleaned text.
// Uses an implementation of Mozilla's readability mode and HTML2Text.
let cleaned_text: String = clean_html(&raw_html);
```
### clean_html

- Cleans raw HTML into clean strings of content
- Uses an implementation of Mozilla's Readability to remove unwanted HTML (usage sketch below)
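A minimal usage sketch (the exact signature may differ from the crate's):

```rust
// Hypothetical input; navigation and boilerplate are stripped, readable content is kept.
let raw_html = "<html><body><nav>menu</nav><article><p>The content.</p></article></body></html>";
let cleaned_text: String = clean_html(raw_html);
```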
### test_text

- Macro generated test content
- Used for internal testing, but can be used for general LLM test cases
## Grammar Constraints

- Pre-built, configurable grammars for fine-grained control of open source LLM outputs. Current implementations include booleans, integers, sentences, words, exact strings, and more. Open an issue if you'd like to suggest more
- Grammars are the most capable method for structuring the output of an LLM. This was designed for use with LlamaCpp, but support for others is planned
```rust
// (Bound values, stop words, and the `llm_response` input below are illustrative.)
let llm_response = "3";

// A grammar that constrains output to a number between 1 and 4
let mut integer_grammar: IntegerGrammar = Grammar::integer();
integer_grammar.lower_bound(1).upper_bound(4);
// Sets a stop word to be appended to the end of generation
integer_grammar.set_stop_word_done("Done.");
// Sets the primitive as optional; a stop word can be generated rather than the primitive
integer_grammar.set_stop_word_no_result("None.");

// Returns the string to feed into the LLM call
let grammar_string: String = integer_grammar.grammar_string();
// Cleans the response and checks if it's valid
let string_response = integer_grammar.validate_clean(llm_response);
// Parses the response to the grammar's primitive
let integer_response = integer_grammar.grammar_parse(llm_response);

// Enum for dynamic abstraction
let mut grammar: Grammar = integer_grammar.wrap();
// The enum implements the same functions that are generic across all grammars
grammar.set_stop_word_done("Done.");
let grammar_string: String = grammar.grammar_string();
let string_response = grammar.validate_clean(llm_response);
```
## Setter Traits

- All setter traits are public, so you can integrate them into your own projects if you wish.
- For example: `OpenAiModelTrait`, `GgufLoaderTrait`, `AnthropicModelTrait`, and `HfTokenTrait` for loading models
## Blog Posts
## License

This project is licensed under the MIT License.

## Contributing

My motivation for publishing is for someone to point out if I'm doing something wrong!