llm_prompt: Low Level Prompt System for API LLMs (OpenAI) and local LLMs (Chat Template)
This crate is part of the llm_client project.
Local LLM Support
- Uses the LLM's chat template to properly format the prompt.
- Wider model support than Llama.cpp, by using Jinja chat templates and raw tokens.
  - Llama.cpp attempts to build a prompt from string input using community implementations of chat templates and matching via model IDs. This does not always work, nor does it support all models.
  - Llama.cpp performs no manipulation of the input when it is sent raw tokens, so using this crate's `get_built_prompt_as_tokens` function is safer (see the sketch after this list).
- Build with generation prefixes.
  - Supported for all local models, even those that don't explicitly support it.
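As a sketch of the raw-token point above, assuming a prompt already built and filled with messages as shown in the Use section below (the helper function, the ChatML-style output in the comments, and the `Box<dyn std::error::Error>` error handling are illustrative assumptions, not part of this crate):

```rust
use llm_prompt::{LlmPrompt, LocalPrompt};

// Illustrative sketch: `prompt` is assumed to have been built with
// `LlmPrompt::new_local_prompt` and to already contain messages.
fn tokens_for_llama_cpp(prompt: &LlmPrompt) -> Result<Vec<u32>, Box<dyn std::error::Error>> {
    let local_prompt: &LocalPrompt = prompt.local_prompt()?;

    // Rendered through the model's own Jinja chat template. For a ChatML-style model
    // this looks roughly like: "<|im_start|>system\n...<|im_end|>\n<|im_start|>user\n..."
    let _as_string: String = local_prompt.get_built_prompt()?;

    // Safer for Llama.cpp: the server performs no template matching or string
    // manipulation on raw tokens, so exactly what was built here is what runs.
    let as_tokens: Vec<u32> = local_prompt.get_built_prompt_as_tokens()?;
    Ok(as_tokens)
}
```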
API LLM Support
- OpenAI-formatted prompts (OpenAI, Anthropic, etc.)
- Outputs System/User/Assistant keys and content strings (see the sketch below).
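For illustration, a sketch of consuming that output; it assumes the `Vec<HashMap<String, String>>` return type shown in the Use section below, with "role" and "content" keys per message, and uses `Box<dyn std::error::Error>` purely for brevity:

```rust
use llm_prompt::LlmPrompt;

// Illustrative sketch: print the OpenAI-formatted messages built by ApiPrompt.
fn print_api_messages(prompt: &LlmPrompt) -> Result<(), Box<dyn std::error::Error>> {
    let messages = prompt.api_prompt()?.get_built_prompt()?;
    for message in &messages {
        // Each message is a map holding a role ("system", "user", or "assistant") and its content.
        println!("{}: {}", message["role"], message["content"]);
    }
    Ok(())
}
```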
Accurate Token Counts
- Accurately counts prompt tokens.
- Ensures a prompt stays within the model's limits.
- Handles the rules unique to API and local models (a sketch of the clamping arithmetic follows this list).
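The kind of arithmetic involved, as a standalone sketch; the crate's own `check_and_get_max_tokens` helper (shown at the end of the Use section) performs the real check, and the safety margin here is an assumed value:

```rust
// Illustrative only: clamp a requested generation budget so that
// prompt tokens + generated tokens stay within the model's context window.
fn clamp_max_tokens(context_length: u64, total_prompt_tokens: u64, requested: u64) -> u64 {
    let safety_margin = 10; // assumed buffer, not a crate constant
    let available = context_length
        .saturating_sub(total_prompt_tokens)
        .saturating_sub(safety_margin);
    requested.min(available)
}
```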
User friendly
- A single struct with thread-safe interior mutability for ergonomics.
- Fails gracefully via `Result` if the prompt does not match turn-ordering rules (see the sketch below).
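For example, a sketch of the graceful-failure behavior. The specific rule assumed here (that an assistant message cannot open a conversation) and the error's `Display` output are illustrative; consult the crate for the exact turn-ordering rules:

```rust
use llm_prompt::LlmPrompt;

// Illustrative sketch: ordering violations surface as an Err instead of a panic,
// and &self works because LlmPrompt uses thread-safe interior mutability.
fn try_assistant_first(prompt: &LlmPrompt) {
    match prompt.add_assistant_message() {
        Ok(message) => {
            message.set_content("This ordering was accepted.");
        }
        Err(e) => eprintln!("Turn ordering rejected: {e}"),
    }
}
```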
Save and load from file
- Serde (`Serialize`/`Deserialize`) is implemented for `PromptMessages` (see the sketch below).
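A sketch of round-tripping messages through a file; that `PromptMessages` is exported at the crate root and the use of serde_json are assumptions made for illustration:

```rust
use llm_prompt::PromptMessages;
use std::path::Path;

// Illustrative sketch: because PromptMessages implements Serialize/Deserialize,
// any serde format works; serde_json is used here for readability.
fn save_and_load(messages: &PromptMessages, path: &Path) -> Result<PromptMessages, Box<dyn std::error::Error>> {
    std::fs::write(path, serde_json::to_string_pretty(messages)?)?;
    let restored: PromptMessages = serde_json::from_str(&std::fs::read_to_string(path)?)?;
    Ok(restored)
}
```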
Use
The llm_models crate from the llm_client project is used here for example purposes; it is not required.
```rust
use std::collections::HashMap;
use llm_prompt::*;
// llm_models is used for example purposes only; the module paths and model fields shown here follow that crate and may differ by version.
use llm_models::api_model::ApiLlmModel;
use llm_models::local_model::LocalLlmModel;

// OpenAI format
let model = ApiLlmModel::gpt_3_5_turbo();
let prompt = LlmPrompt::new_api_prompt(
    model.model_base.tokenizer.clone(),
    Some(model.tokens_per_message),
    model.tokens_per_name,
);

// Chat Template format
let model = LocalLlmModel::default();
let prompt = LlmPrompt::new_local_prompt(
    model.model_base.tokenizer.clone(),
    &model.chat_template.chat_template,
    model.chat_template.bos_token.as_deref(),
    &model.chat_template.eos_token,
    model.chat_template.unk_token.as_deref(),
    model.chat_template.base_generation_prefix.as_deref(),
);

// There are three types of 'messages'.
// Add system messages
prompt.add_system_message()?.set_content("You are a nice robot.");

// User messages
prompt.add_user_message()?.set_content("Hello");

// LLM responses
prompt.add_assistant_message()?.set_content("Well how do you do?");

// Builds with a generation prefix. The LLM will complete the response: 'Don't you think that is... cool?'
// Only the Chat Template format supports this.
prompt.set_generation_prefix("Don't you think that is...");

// Access (and build) the underlying prompt topography
let local_prompt: &LocalPrompt = prompt.local_prompt()?;
let api_prompt: &ApiPrompt = prompt.api_prompt()?;

// Get the chat template formatted prompt
let local_prompt_as_string: String = prompt.local_prompt()?.get_built_prompt()?;
let local_prompt_as_tokens: Vec<u32> = prompt.local_prompt()?.get_built_prompt_as_tokens()?;

// OpenAI formatted prompt (OpenAI and Anthropic format)
let api_prompt_as_messages: Vec<HashMap<String, String>> = prompt.api_prompt()?.get_built_prompt()?;

// Get the total tokens in the prompt
let total_prompt_tokens: u64 = prompt.local_prompt()?.get_total_prompt_tokens();
let total_prompt_tokens: u64 = prompt.api_prompt()?.get_total_prompt_tokens();

// Validate the requested max_tokens for a generation. If it exceeds the model's limits, it is reduced to a safe value.
let actual_request_tokens = check_and_get_max_tokens(
    model.context_length,          // the model's total context window
    Some(model.max_tokens_output), // for a GGUF model, use the server's ctx_size
    total_prompt_tokens,
    Some(10),                      // safety buffer
    Some(requested_max_tokens),    // the caller's requested max_tokens
)?;
```
`LlmPrompt` requires a tokenizer. You can use the llm_models crate's tokenizer, or implement the `PromptTokenizer` trait on your own tokenizer.
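A minimal sketch of the second option; the method names and signatures here are assumptions about the `PromptTokenizer` trait (check the trait definition for the exact shape), and `MyTokenizer` is a hypothetical wrapper:

```rust
use llm_prompt::PromptTokenizer;

// Hypothetical wrapper around your own tokenizer, for illustration only.
struct MyTokenizer {
    // ... your tokenizer's state (vocab, merges, etc.)
}

impl MyTokenizer {
    fn encode(&self, input: &str) -> Vec<u32> {
        // Delegate to your real tokenizer here.
        unimplemented!("wrap your tokenizer's encode call: {input}")
    }
}

// Assumed trait shape: `tokenize` returns token ids, `count_tokens` returns how many.
impl PromptTokenizer for MyTokenizer {
    fn tokenize(&self, input: &str) -> Vec<u32> {
        self.encode(input)
    }

    fn count_tokens(&self, input: &str) -> u32 {
        self.encode(input).len() as u32
    }
}
```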