Crate llm_prompt

Source

Expand description

§llm_prompt: Low Level Prompt System for API LLMs and local LLMs

The llm_prompt crate is a workspace member of the llm_client project.

§Features

§Local LLM Support

Uses the LLM’s chat template to properly format the prompt
Wider support than Llama.cpp with Jinja chat templates and raw tokens:
- Llama.cpp attempts to build a prompt from string input using community implementations of chat templates and matching via model ids. It does not always work nor does it support all models
- Llama.cpp performs no manipulation of the input when sending just tokens, so using this crate’s get_built_prompt_as_tokens function is safer
Build with generation prefixes that support all local models, even those that don’t explicitly support it

§API LLM Support

OpenAI formatted prompts (OpenAI, Anthropic, etc.)
Outputs System/User/Assistant keys and content strings

§Accurate Token Counts

Accurately count prompt tokens to ensure prompts stay within model limits
Handles unique rules for both API and Local models

§User Friendly

A single struct with thread safe interior mutability for ergonomics
Fails gracefully via Result if the prompt does not match turn ordering rules

§Serialization Support

Serde implemented for PromptMessages enabling save/load from file

§Use

The llm_models crate from the llm_client project is used here for example purposes. It is not required.

use llm_prompt::*;
use llm_models::api_model::ApiLlmModel;
use llm_models::local_model::LocalLlmModel;

// OpenAI Format
let model = ApiLlmModel::gpt_3_5_turbo();
let prompt = LlmPrompt::new_api_prompt(
    model.model_base.tokenizer.clone(),
    Some(model.tokens_per_message),
    model.tokens_per_name,
);

// Chat Template
let model = LocalLlmModel::default();
let prompt = LlmPrompt::new_local_prompt(
    model.model_base.tokenizer.clone(),
    &model.chat_template.chat_template,
    model.chat_template.bos_token.as_deref(),
    &model.chat_template.eos_token,
    model.chat_template.unk_token.as_deref(),
    model.chat_template.base_generation_prefix.as_deref(),
);
// There are three types of 'messages'

// Add system messages
prompt.add_system_message()?.set_content("You are a nice robot");

// User messages
prompt.add_user_message()?.set_content("Hello");

// LLM responses
prompt.add_assistant_message()?.set_content("Well, how do you do?");

// Builds with generation prefix. The llm will complete the response: 'Don't you think that is... cool?'
// Only Chat Template format supports this
prompt.set_generation_prefix("Don't you think that is...");

// Access (and build) the underlying prompt topography
let local_prompt: &LocalPrompt = prompt.local_prompt()?;
let api_prompt: &ApiPrompt = prompt.api_prompt()?;

// Get chat template formatted prompt
let local_prompt_as_string: String = prompt.local_prompt()?.get_built_prompt()?;
let local_prompt_as_tokens: Vec<u32> = prompt.local_prompt()?.get_built_prompt_as_tokens()?;

// Openai formatted prompt (Openai and Anthropic format)
let api_prompt_as_messages: Vec<HashMap<String, String>> = prompt.api_prompt()?.get_built_prompt()?;

// Get total tokens in prompt
let total_prompt_tokens: u64 = prompt.local_prompt()?.get_total_prompt_tokens();
let total_prompt_tokens: u64 = prompt.api_prompt()?.get_total_prompt_tokens();

// Validate requested max_tokens for a generation. If it exceeds the models limits, reduce max_tokens to a safe value
let actual_request_tokens = check_and_get_max_tokens(
    model.context_length,
    Some(model.max_tokens_output), // If using a GGUF model use either model.context_length or the ctx_size of the server
    total_prompt_tokens,
    Some(10), // Safety tokens
    requested_max_tokens,
)?;

LlmPrompt requires a tokenizer. You can use the llm_models crate’s tokenizer, or implement the PromptTokenizer trait on your own tokenizer.

impl PromptTokenizer for LlmTokenizer {
    fn tokenize(&self, input: &str) -> Vec<u32> {
        self.tokenize(input)
    }

    fn count_tokens(&self, str: &str) -> u32 {
        self.count_tokens(str)
    }
}

Structs§

ApiPrompt: A prompt formatter for API-based language models that follow OpenAI’s message format.
LlmPrompt: A prompt management system that supports both API-based LLMs (like OpenAI) and local LLMs.
LocalPrompt: A prompt formatter for local LLMs that use chat templates.
MaxTokenState
PromptMessage: An individual message within a prompt sequence.
PromptMessages: A collection of prompt messages with thread-safe mutability.

Enums§

PromptMessageType: Represents the type of message in a prompt sequence.
RequestTokenLimitError
TextConcatenator: Controls how text segments are joined together in prompt messages.

Traits§

PromptTokenizer: A trait for tokenizers that can be used with the prompt management system.
TextConcatenatorTrait: Provides methods for managing text concatenation behavior.

Functions§

apply_chat_template: Applies a chat template to a message, given a message and a chat template.
check_and_get_max_tokens: Sets and validates the ‘max_tokens’ or ‘n_ctx’ or ‘n_predict’ parameter for a request. First, it checks that the total_prompt_tokens is less than the ctx_size - safety_tokens. Then returns ‘available_tokens’ as the lower of either: ctx_size - total_prompt_tokens - safety_tokens or if it’s provided, inference_ctx_size. If ‘requested_tokens’ is provided, ‘requested_tokens’ is returned if less than ‘available_tokens’. If ‘requested_tokens’ is ‘None’ or ‘requested_tokens’ is greater than ‘available_tokens’, ‘available_tokens’ is returned.

Crate llm_promptCopy item path