llm_prompt: Low Level Prompt System for API LLMs and local LLMs
The llm_prompt crate is a workspace member of the llm_client project.
Features
Local LLM Support
- Uses the LLM's chat template to properly format the prompt
- Wider model support than Llama.cpp, via Jinja chat templates and raw tokens:
  - Llama.cpp attempts to build a prompt from string input using community implementations of chat templates matched via model IDs. This does not always work, nor does it support all models.
  - Llama.cpp performs no manipulation of the input when sent raw tokens, so using this crate's get_built_prompt_as_tokens function is safer.
- Build with generation prefixes, which work for all local models, even those whose chat templates don't explicitly support them
API LLM Support
- OpenAI-formatted prompts (OpenAI, Anthropic, etc.)
- Outputs System/User/Assistant role keys and content strings (a rough sketch of the shape follows this list)
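For illustration, here is roughly the shape a built API prompt takes. This is a sketch only: the concrete return type of ApiPrompt's get_built_prompt is assumed to be a Vec<HashMap<String, String>> of role/content pairs matching the OpenAI chat format.
use std::collections::HashMap;
// Illustrative only: two OpenAI-style messages as role/content string maps.
let messages: Vec<HashMap<String, String>> = vec![
    HashMap::from([
        ("role".to_string(), "system".to_string()),
        ("content".to_string(), "You are a nice robot.".to_string()),
    ]),
    HashMap::from([
        ("role".to_string(), "user".to_string()),
        ("content".to_string(), "Hello".to_string()),
    ]),
];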
Accurate Token Counts
- Accurately count prompt tokens to ensure prompts stay within model limits
- Handles the counting rules unique to both API and local models
User Friendly
- A single struct with thread-safe interior mutability for ergonomic use
- Fails gracefully via Result if messages violate the turn-ordering rules (sketched below)
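As a rough sketch of the graceful-failure behavior: each add_*_message call returns a Result, so an ordering violation can be handled instead of panicking. The prompt value here is an LlmPrompt built as in the Use section below, and the exact conditions that trigger an error are left to the crate's turn-ordering rules.
// Hedged sketch: a turn-ordering violation surfaces as an Err.
if let Err(error) = prompt.add_user_message() {
    eprintln!("message rejected by turn-ordering rules: {error}");
}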
Serialization Support
- Serde is implemented for PromptMessages, enabling save/load from file (example below)
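A minimal save/load sketch, assuming serde_json as the on-disk format. The prompt value is an LlmPrompt built as in the Use section below, and prompt.messages() is a hypothetical accessor standing in for however you reach the PromptMessages it holds.
// Hedged sketch: persist and restore PromptMessages with serde_json.
let json = serde_json::to_string_pretty(&prompt.messages())?; // hypothetical accessor
std::fs::write("prompt_messages.json", &json)?;
let restored: PromptMessages = serde_json::from_str(&std::fs::read_to_string("prompt_messages.json")?)?;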
Use
The llm_models crate from the llm_client project is used here for example purposes. It is not required.
use llm_prompt::*;
use std::collections::HashMap;
// ApiLlmModel and LocalLlmModel come from the llm_models crate; the exact
// module paths may differ between llm_models versions.
use llm_models::api_model::ApiLlmModel;
use llm_models::local_model::LocalLlmModel;
// The constructors take the tokenizer plus format-specific data; the field
// names below follow llm_models and are shown as an assumption.
// OpenAI Format
let model = ApiLlmModel::gpt_3_5_turbo();
let prompt = LlmPrompt::new_api_prompt(
    model.model_base.tokenizer.clone(),
    Some(model.tokens_per_message),
    model.tokens_per_name,
);
// Chat Template
let model = LocalLlmModel::default();
let prompt = LlmPrompt::new_local_prompt(
    model.model_base.tokenizer.clone(),
    &model.chat_template.chat_template,
    model.chat_template.bos_token.as_deref(),
    &model.chat_template.eos_token,
    model.chat_template.unk_token.as_deref(),
    None,
);
// There are three types of 'messages'.
// Add system messages
prompt.add_system_message()?.set_content("You are a nice robot.");
// User messages
prompt.add_user_message()?.set_content("Hello");
// LLM responses
prompt.add_assistant_message()?.set_content("Well how do you do?");
// Builds with a generation prefix. The LLM will complete the response: "Don't you think that is... cool?"
// Only the chat template format supports this.
prompt.set_generation_prefix("Don't you think that is...");
// Access (and build) the underlying prompt formats
let local_prompt: &LocalPrompt = prompt.local_prompt()?;
let api_prompt: &ApiPrompt = prompt.api_prompt()?;
// Get the chat template formatted prompt
let local_prompt_as_string: String = prompt.local_prompt()?.get_built_prompt()?;
let local_prompt_as_tokens: Vec<u32> = prompt.local_prompt()?.get_built_prompt_as_tokens()?;
// OpenAI formatted prompt (OpenAI and Anthropic format)
let api_prompt_as_messages: Vec<HashMap<String, String>> = prompt.api_prompt()?.get_built_prompt()?;
// Get total tokens in prompt
let total_prompt_tokens: u64 = prompt.local_prompt()?.get_total_prompt_tokens();
let total_prompt_tokens: u64 = prompt.api_prompt()?.get_total_prompt_tokens();
// Validate requested max_tokens for a generation. If it exceeds the model's limits, reduce max_tokens to a safe value.
// The argument order below is an assumption; see the crate docs for check_and_get_max_tokens.
let requested_max_tokens = 512; // hypothetical value requested by the caller
let actual_request_tokens = check_and_get_max_tokens(
    model.context_length,
    Some(model.max_tokens_output), // for GGUF models the server's ctx_size may be the real limit
    total_prompt_tokens,
    Some(10), // safety buffer
    Some(requested_max_tokens),
)?;
LlmPrompt requires a tokenizer. You can use the llm_models crate's tokenizer, or implement the PromptTokenizer trait for your own tokenizer.
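If you bring your own tokenizer, a minimal sketch of a PromptTokenizer implementation might look like the following. The trait method names and signatures (tokenize returning Vec<u32>, count_tokens returning u32) are assumptions inferred from the token-oriented methods above, and MyTokenizer wrapping a Hugging Face tokenizers::Tokenizer is hypothetical.
use llm_prompt::PromptTokenizer;
use tokenizers::Tokenizer;
// Hypothetical wrapper around a tokenizer you already have.
struct MyTokenizer {
    inner: Tokenizer,
}
impl PromptTokenizer for MyTokenizer {
    // Assumed trait method: encode a string into token ids.
    fn tokenize(&self, input: &str) -> Vec<u32> {
        self.inner
            .encode(input, false)
            .expect("tokenization failed")
            .get_ids()
            .to_vec()
    }
    // Assumed trait method: count the tokens in a string.
    fn count_tokens(&self, input: &str) -> u32 {
        self.tokenize(input).len() as u32
    }
}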