Expand description
§llm_prompt: Low Level Prompt System for API LLMs and local LLMs
The llm_prompt crate is a workspace member of the llm_client project.
§Features
§Local LLM Support
- Uses the LLM’s chat template to properly format the prompt
- Wider support than Llama.cpp with Jinja chat templates and raw tokens:
- Llama.cpp attempts to build a prompt from string input using community implementations of chat templates and matching via model ids. It does not always work nor does it support all models
- Llama.cpp performs no manipulation of the input when sending just tokens, so using
this crate’s
get_built_prompt_as_tokens
function is safer
- Build with generation prefixes that support all local models, even those that don’t explicitly support it
§API LLM Support
- OpenAI formatted prompts (OpenAI, Anthropic, etc.)
- Outputs System/User/Assistant keys and content strings
§Accurate Token Counts
- Accurately count prompt tokens to ensure prompts stay within model limits
- Handles unique rules for both API and Local models
§User Friendly
- A single struct with thread safe interior mutability for ergonomics
- Fails gracefully via Result if the prompt does not match turn ordering rules
§Serialization Support
- Serde implemented for PromptMessages enabling save/load from file
§Use
The llm_models crate from the llm_client project is used here for example purposes. It is not required.
use llm_prompt::*;
use llm_models::api_model::ApiLlmModel;
use llm_models::local_model::LocalLlmModel;
// OpenAI Format
let model = ApiLlmModel::gpt_3_5_turbo();
let prompt = LlmPrompt::new_api_prompt(
model.model_base.tokenizer.clone(),
Some(model.tokens_per_message),
model.tokens_per_name,
);
// Chat Template
let model = LocalLlmModel::default();
let prompt = LlmPrompt::new_local_prompt(
model.model_base.tokenizer.clone(),
&model.chat_template.chat_template,
model.chat_template.bos_token.as_deref(),
&model.chat_template.eos_token,
model.chat_template.unk_token.as_deref(),
model.chat_template.base_generation_prefix.as_deref(),
);
// There are three types of 'messages'
// Add system messages
prompt.add_system_message()?.set_content("You are a nice robot");
// User messages
prompt.add_user_message()?.set_content("Hello");
// LLM responses
prompt.add_assistant_message()?.set_content("Well, how do you do?");
// Builds with generation prefix. The llm will complete the response: 'Don't you think that is... cool?'
// Only Chat Template format supports this
prompt.set_generation_prefix("Don't you think that is...");
// Access (and build) the underlying prompt topography
let local_prompt: &LocalPrompt = prompt.local_prompt()?;
let api_prompt: &ApiPrompt = prompt.api_prompt()?;
// Get chat template formatted prompt
let local_prompt_as_string: String = prompt.local_prompt()?.get_built_prompt()?;
let local_prompt_as_tokens: Vec<u32> = prompt.local_prompt()?.get_built_prompt_as_tokens()?;
// Openai formatted prompt (Openai and Anthropic format)
let api_prompt_as_messages: Vec<HashMap<String, String>> = prompt.api_prompt()?.get_built_prompt()?;
// Get total tokens in prompt
let total_prompt_tokens: u64 = prompt.local_prompt()?.get_total_prompt_tokens();
let total_prompt_tokens: u64 = prompt.api_prompt()?.get_total_prompt_tokens();
// Validate requested max_tokens for a generation. If it exceeds the models limits, reduce max_tokens to a safe value
let actual_request_tokens = check_and_get_max_tokens(
model.context_length,
Some(model.max_tokens_output), // If using a GGUF model use either model.context_length or the ctx_size of the server
total_prompt_tokens,
Some(10), // Safety tokens
requested_max_tokens,
)?;
LlmPrompt
requires a tokenizer. You can use the llm_models
crate’s tokenizer, or implement the PromptTokenizer trait on your own tokenizer.
impl PromptTokenizer for LlmTokenizer {
fn tokenize(&self, input: &str) -> Vec<u32> {
self.tokenize(input)
}
fn count_tokens(&self, str: &str) -> u32 {
self.count_tokens(str)
}
}
Structs§
- ApiPrompt
- A prompt formatter for API-based language models that follow OpenAI’s message format.
- LlmPrompt
- A prompt management system that supports both API-based LLMs (like OpenAI) and local LLMs.
- Local
Prompt - A prompt formatter for local LLMs that use chat templates.
- MaxToken
State - Prompt
Message - An individual message within a prompt sequence.
- Prompt
Messages - A collection of prompt messages with thread-safe mutability.
Enums§
- Prompt
Message Type - Represents the type of message in a prompt sequence.
- Request
Token Limit Error - Text
Concatenator - Controls how text segments are joined together in prompt messages.
Traits§
- Prompt
Tokenizer - A trait for tokenizers that can be used with the prompt management system.
- Text
Concatenator Trait - Provides methods for managing text concatenation behavior.
Functions§
- apply_
chat_ template - Applies a chat template to a message, given a message and a chat template.
- check_
and_ get_ max_ tokens - Sets and validates the ‘max_tokens’ or ‘n_ctx’ or ‘n_predict’ parameter for a request. First, it checks that the total_prompt_tokens is less than the ctx_size - safety_tokens. Then returns ‘available_tokens’ as the lower of either: ctx_size - total_prompt_tokens - safety_tokens or if it’s provided, inference_ctx_size. If ‘requested_tokens’ is provided, ‘requested_tokens’ is returned if less than ‘available_tokens’. If ‘requested_tokens’ is ‘None’ or ‘requested_tokens’ is greater than ‘available_tokens’, ‘available_tokens’ is returned.