Struct LlamaModel

pub struct LlamaModel { /* private fields */ }

A safe wrapper around llama_model.

Implementations§

impl LlamaModel

pub fn get_vocab(&self) -> LlamaVocab

Retrieves the vocabulary associated with the current Llama model.

This method fetches the vocabulary from the underlying model using an unsafe FFI call. The returned LlamaVocab struct contains a non-null pointer to the vocabulary data, which is wrapped in a NonNull for safety.

§Safety

This method uses an unsafe block to call a C function (llama_model_get_vocab), which is assumed to return a valid pointer to the vocabulary. The caller should ensure that the model object is properly initialized and valid before calling this method, as dereferencing invalid pointers can lead to undefined behavior.

§Returns

A LlamaVocab struct containing the vocabulary of the model.

§Panics

Panics if the underlying C function returns a null pointer.

§Example
let vocab = model.get_vocab();

pub fn n_ctx_train(&self) -> u32

Get the size of the context window the model was trained with.

This function returns the training-time context length, represented as a u32.

§Panics

This function will panic if the training context length does not fit into a u32. This should be impossible on most platforms, since llama.cpp returns a c_int (i32 on most platforms) and the value is never negative.

pub fn tokens( &self, special: Special, ) -> impl Iterator<Item = (LlamaToken, Result<String, TokenToStringError>)> + '_

Get all tokens in the model.

This function returns an iterator over all the tokens in the model. Each item in the iterator is a tuple containing a LlamaToken and its corresponding string representation (or an error if the conversion fails).

§Parameters
  • special: The Special value that determines how special tokens (like BOS, EOS, etc.) are handled.
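§Example
A minimal sketch that prints the first few vocabulary entries, assuming a `model` loaded as in str_to_token’s example below (the import path for Special mirrors the other examples on this page and is an assumption):
use llama_cpp_4::model::Special;

// Each item pairs a token id with its text, or a conversion error
// for tokens that have no valid string form.
for (token, text) in model.tokens(Special::Plaintext).take(10) {
    match text {
        Ok(s) => println!("{token:?} => {s:?}"),
        Err(e) => eprintln!("{token:?}: {e}"),
    }
}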

pub fn token_bos(&self) -> LlamaToken

Get the beginning of stream token.

This function returns the token that represents the beginning of a stream (BOS token).

pub fn token_eos(&self) -> LlamaToken

Get the end of stream token.

This function returns the token that represents the end of a stream (EOS token).

pub fn token_nl(&self) -> LlamaToken

Get the newline token.

This function returns the token that represents a newline character.

pub fn is_eog_token(&self, token: LlamaToken) -> bool

Check if a token represents the end of generation (end of turn, end of sequence, etc.).

This function returns true if the provided token signifies the end of generation or end of sequence, such as EOS or other special tokens.

§Parameters
  • token: The LlamaToken to check.
§Returns
  • true if the token is an end-of-generation token, otherwise false.
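§Example
A minimal check, using only accessors documented on this page: the model’s own EOS token is by definition an end-of-generation token.
// token_eos() is documented above; EOS must satisfy is_eog_token.
let eos = model.token_eos();
assert!(model.is_eog_token(eos));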

pub fn token_cls(&self) -> LlamaToken

Get the classification token.

pub fn token_eot(&self) -> LlamaToken

Get the end-of-turn token.

pub fn token_pad(&self) -> LlamaToken

Get the padding token.

pub fn token_sep(&self) -> LlamaToken

Get the separator token.

pub fn token_fim_pre(&self) -> LlamaToken

Get the fill-in-the-middle prefix token.

pub fn token_fim_suf(&self) -> LlamaToken

Get the fill-in-the-middle suffix token.

pub fn token_fim_mid(&self) -> LlamaToken

Get the fill-in-the-middle middle token.

pub fn token_fim_pad(&self) -> LlamaToken

Get the fill-in-the-middle padding token.

pub fn token_fim_rep(&self) -> LlamaToken

Get the fill-in-the-middle repository token.

pub fn token_fim_sep(&self) -> LlamaToken

Get the fill-in-the-middle separator token.

pub fn token_is_control(&self, token: LlamaToken) -> bool

Check if a token is a control token.

pub fn token_get_score(&self, token: LlamaToken) -> f32

Get the score of a token.

pub fn token_get_text( &self, token: LlamaToken, ) -> Result<&str, StringFromModelError>

Get the raw text of a token.

§Errors

Returns an error if the token text is null or not valid UTF-8.

pub fn add_bos_token(&self) -> bool

Check if a BOS token should be added when tokenizing.

pub fn add_eos_token(&self) -> bool

Check if an EOS token should be added when tokenizing.

pub fn decode_start_token(&self) -> LlamaToken

Get the decoder start token.

This function returns the token used to signal the start of decoding (i.e., the token used at the start of a sequence generation).

pub fn token_to_str( &self, token: LlamaToken, special: Special, ) -> Result<String, TokenToStringError>

Convert a single token to a string.

This function converts a LlamaToken into its string representation.

§Errors

This function returns an error if the token cannot be converted to a string. For more details, refer to TokenToStringError.

§Parameters
  • token: The LlamaToken to convert.
  • special: The Special value used to handle special tokens.
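§Example
A minimal sketch, assuming a loaded `model`; LlamaToken(42) and Special::Plaintext follow the conventions of the other examples on this page, and the import path for Special is an assumption:
use llama_cpp_4::model::{LlamaToken, Special};

// Convert an arbitrary token id to its plain-text form.
let text = model.token_to_str(LlamaToken(42), Special::Plaintext)?;
println!("token 42 renders as {text:?}");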

pub fn token_to_bytes( &self, token: LlamaToken, special: Special, ) -> Result<Vec<u8>, TokenToStringError>

Convert a single token to bytes.

This function converts a LlamaToken into a byte representation.

§Errors

This function returns an error if the token cannot be converted to bytes. For more details, refer to TokenToStringError.

§Parameters
  • token: The LlamaToken to convert.
  • special: The Special value used to handle special tokens.
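§Example
A minimal sketch, assuming a loaded `model`; the byte form is useful when a single token is not valid UTF-8 on its own:
use llama_cpp_4::model::{LlamaToken, Special};

// Unlike token_to_str, the result does not have to be valid UTF-8.
let bytes = model.token_to_bytes(LlamaToken(42), Special::Plaintext)?;
println!("token 42 occupies {} byte(s)", bytes.len());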

pub fn tokens_to_str( &self, tokens: &[LlamaToken], special: Special, ) -> Result<String, TokenToStringError>

Convert a vector of tokens to a single string.

This function takes a slice of LlamaTokens and converts them into a single string, concatenating their string representations.

§Errors

This function returns an error if any token cannot be converted to a string. For more details, refer to TokenToStringError.

§Parameters
  • tokens: A slice of LlamaTokens to convert.
  • special: The Special value used to handle special tokens.
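§Example
A round-trip sketch built on str_to_token (documented below), assuming a loaded `model`:
use llama_cpp_4::model::{AddBos, Special};

let tokens = model.str_to_token("Hello, World!", AddBos::Always)?;
// Concatenate the pieces back into one string; for most vocabularies
// this reproduces the input (plus the BOS text, if rendered).
let text = model.tokens_to_str(&tokens, Special::Plaintext)?;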

pub fn str_to_token( &self, str: &str, add_bos: AddBos, ) -> Result<Vec<LlamaToken>, StringToTokenError>

Convert a string to a vector of tokens.

This function converts a string into a vector of LlamaTokens. The function will tokenize the string and return the corresponding tokens.

§Errors
  • This function will return an error if the input string contains a null byte.
§Panics
  • This function will panic if the number of tokens exceeds usize::MAX.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{AddBos, LlamaModel};
use std::path::Path;

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &Default::default())?;
let tokens = model.str_to_token("Hello, World!", AddBos::Always)?;

pub fn token_attr(&self, LlamaToken: LlamaToken) -> LlamaTokenAttrs

Get the attributes of a token.

This function retrieves the attributes associated with a given token. The attributes are typically used to understand whether the token represents a special type of token (e.g., beginning-of-sequence (BOS), end-of-sequence (EOS), control tokens, etc.).

§Panics
  • This function will panic if the token type is unknown or cannot be converted to a valid LlamaTokenAttrs.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel, LlamaToken};

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let token = LlamaToken(42);
let token_attrs = model.token_attr(token);

pub fn detokenize( &self, tokens: &[LlamaToken], remove_special: bool, unparse_special: bool, ) -> Result<String, StringFromModelError>

Detokenize a slice of tokens into a string.

This is the inverse of str_to_token.

§Parameters
  • tokens: The tokens to detokenize.
  • remove_special: If true, special tokens are removed from the output.
  • unparse_special: If true, special tokens are rendered as their text representation.
§Errors

Returns an error if the detokenized text is not valid UTF-8.
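
§Example
A minimal sketch pairing detokenize with str_to_token, assuming a loaded `model`:
use llama_cpp_4::model::AddBos;

let tokens = model.str_to_token("Hello, World!", AddBos::Always)?;
// Strip special tokens (the BOS added above) rather than rendering them.
let text = model.detokenize(&tokens, true, false)?;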

pub fn token_to_str_with_size( &self, token: LlamaToken, buffer_size: usize, special: Special, ) -> Result<String, TokenToStringError>

Convert a token to a string with a specified buffer size.

This function allows you to convert a token into a string, with the ability to specify a buffer size for the operation. It is generally recommended to use LlamaModel::token_to_str instead, as 8 bytes is typically sufficient for most tokens, and the extra buffer size doesn’t usually matter.

§Errors
  • If the token type is unknown, an error will be returned.
  • If the resultant token exceeds the provided buffer_size, an error will occur.
  • If the token string returned by llama-cpp is not valid UTF-8, it will return an error.
§Panics
  • This function will panic if the buffer_size does not fit into a c_int.
  • It will also panic if the size returned from llama-cpp does not fit into a usize, which should typically never happen.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel, LlamaToken, Special};

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let token = LlamaToken(42);
let token_string = model.token_to_str_with_size(token, 32, Special::Plaintext)?;

pub fn token_to_bytes_with_size( &self, token: LlamaToken, buffer_size: usize, special: Special, lstrip: Option<NonZeroU16>, ) -> Result<Vec<u8>, TokenToStringError>

Convert a token to bytes with a specified buffer size.

It is generally recommended to use LlamaModel::token_to_bytes instead, as 8 bytes is sufficient for most tokens and the extra buffer size rarely matters.

§Errors
  • If the token type is unknown, an error will be returned.
  • If the resultant token exceeds the provided buffer_size, an error will occur.
§Panics
  • This function will panic if buffer_size cannot fit into a c_int.
  • It will also panic if the size returned from llama-cpp cannot be converted to usize (which should not happen).
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel, LlamaToken, Special};

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let token = LlamaToken(42);
let token_bytes = model.token_to_bytes_with_size(token, 32, Special::Plaintext, None)?;

pub fn n_vocab(&self) -> i32

The number of tokens in the model’s vocabulary.

This function returns the vocabulary size as an i32 (the C c_int) for maximum compatibility with the underlying llama-cpp library.

§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel};

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let n_vocab = model.n_vocab();

pub fn vocab_type(&self) -> VocabType

The type of vocab the model was trained on.

This function returns the type of vocabulary used by the model, such as whether it is based on byte-pair encoding (BPE), word-level tokens, or another tokenization scheme.

§Panics
  • This function will panic if llama-cpp emits a vocab type that is not recognized or is invalid for this library.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel};

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let vocab_type = model.vocab_type();

pub fn n_embd(&self) -> c_int

Returns the number of embedding dimensions for the model.

This function retrieves the number of embeddings (or embedding dimensions) used by the model. It is typically used for analyzing model architecture and setting up context parameters or other model configuration aspects.

§Panics
  • This function may panic if the underlying llama-cpp library returns an invalid embedding dimension value.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel};

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let n_embd = model.n_embd();

pub fn n_layer(&self) -> c_int

Get the number of transformer layers in the model.

pub fn n_head(&self) -> c_int

Get the number of attention heads in the model.

pub fn n_head_kv(&self) -> c_int

Get the number of key-value attention heads in the model.

pub fn n_embd_inp(&self) -> c_int

Get the input embedding size of the model.

pub fn n_embd_out(&self) -> c_int

Get the output embedding size of the model.

pub fn n_swa(&self) -> c_int

Get the sliding window attention size of the model. Returns 0 if the model does not use sliding window attention.

pub fn rope_type(&self) -> i32

Get the RoPE type used by the model.

pub fn rope_freq_scale_train(&self) -> f32

Get the RoPE frequency scale used during training.

pub fn model_size(&self) -> u64

Get the model size in bytes.

pub fn n_params(&self) -> u64

Get the number of parameters in the model.

pub fn n_cls_out(&self) -> u32

Get the number of classification outputs.

pub fn cls_label(&self, index: u32) -> Result<&str, StringFromModelError>

Get the classification label for the given index.

§Errors

Returns an error if the label is null or not valid UTF-8.
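
§Example
A minimal sketch enumerating all classification labels via n_cls_out (documented above), assuming a loaded `model`:
for i in 0..model.n_cls_out() {
    println!("class {i}: {}", model.cls_label(i)?);
}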

pub fn meta_count(&self) -> c_int

Get the number of metadata key-value pairs.

pub fn desc(&self, buf_size: usize) -> Result<String, StringFromModelError>

Get a model description string.

The buf_size parameter specifies the maximum buffer size for the description. A default of 256 bytes is usually sufficient.

§Errors

Returns an error if the description could not be retrieved or is not valid UTF-8.
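
§Example
A minimal sketch using the suggested 256-byte buffer, assuming a loaded `model`:
let description = model.desc(256)?;
println!("{description}");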

pub fn meta_key_by_index( &self, index: i32, buf_size: usize, ) -> Result<String, StringFromModelError>

Get a metadata key by index.

The buf_size parameter specifies the maximum buffer size for the key. A default of 256 bytes is usually sufficient.

§Errors

Returns an error if the index is out of range or the key is not valid UTF-8.

pub fn meta_val_str_by_index( &self, index: i32, buf_size: usize, ) -> Result<String, StringFromModelError>

Get a metadata value string by index.

The buf_size parameter specifies the maximum buffer size for the value. Values can be large (e.g. chat templates, token lists), so 4096+ may be needed.

§Errors

Returns an error if the index is out of range or the value is not valid UTF-8.

pub fn meta_val_str( &self, key: &str, buf_size: usize, ) -> Result<String, StringFromModelError>

Get a metadata value by key name.

This is more convenient than iterating metadata by index when you know the key. The buf_size parameter specifies the maximum buffer size for the value.

§Errors

Returns an error if the key is not found, contains a null byte, or the value is not valid UTF-8.
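
§Example
A hedged sketch, assuming a loaded `model`; "general.architecture" is a conventional GGUF metadata key and may be absent from some models:
let arch = model.meta_val_str("general.architecture", 256)?;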

pub fn metadata(&self) -> Result<Vec<(String, String)>, StringFromModelError>

Get all metadata as a list of (key, value) pairs.

This is a convenience method that iterates over all metadata entries. Keys use a buffer of 256 bytes and values use 4096 bytes. For values that may be larger (e.g. token lists), use meta_val_str_by_index directly with a larger buffer.

§Errors

Returns an error if any key or value cannot be read or is not valid UTF-8.
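
§Example
A minimal sketch listing every metadata entry, assuming a loaded `model`:
for (key, value) in model.metadata()? {
    println!("{key} = {value}");
}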

pub fn has_encoder(&self) -> bool

Check if the model has an encoder.

pub fn has_decoder(&self) -> bool

Check if the model has a decoder.

pub fn is_recurrent(&self) -> bool

Check if the model is recurrent (e.g. Mamba, RWKV).

pub fn is_hybrid(&self) -> bool

Check if the model is a hybrid model.

pub fn is_diffusion(&self) -> bool

Check if the model is a diffusion model.

pub fn get_chat_template( &self, buf_size: usize, ) -> Result<String, ChatTemplateError>

Get chat template from model.

§Errors
  • If the model does not have a chat template, it will return an error.
  • If the chat template is not a valid CString, it will return an error.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel};

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let chat_template = model.get_chat_template(1024)?;

pub fn load_from_file( _: &LlamaBackend, path: impl AsRef<Path>, params: &LlamaModelParams, ) -> Result<Self, LlamaModelLoadError>

Loads a model from a file.

This function loads a model from a specified file path and returns the corresponding LlamaModel instance.

§Errors
  • If the path cannot be converted to a string or if the model file does not exist, it will return an error.
  • If the model cannot be loaded (e.g., due to an invalid or corrupted model file), it will return a LlamaModelLoadError.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel};
use std::path::Path;

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &LlamaModelParams::default())?;

pub fn load_from_splits( _: &LlamaBackend, paths: &[impl AsRef<Path>], params: &LlamaModelParams, ) -> Result<Self, LlamaModelLoadError>

Load a model from multiple split files.

This function loads a model that has been split across multiple files. This is useful for very large models that exceed filesystem limitations or need to be distributed across multiple storage devices.

§Arguments
  • paths - A slice of paths to the split model files
  • params - The model parameters
§Errors

Returns an error if:

  • Any of the paths cannot be converted to a C string
  • The model fails to load from the splits
  • Any path doesn’t exist or isn’t accessible
§Example
use llama_cpp_4::model::{LlamaModel, params::LlamaModelParams};
use llama_cpp_4::llama_backend::LlamaBackend;

let backend = LlamaBackend::init()?;
let params = LlamaModelParams::default();

let paths = vec![
    "model-00001-of-00003.gguf",
    "model-00002-of-00003.gguf",
    "model-00003-of-00003.gguf",
];

let model = LlamaModel::load_from_splits(&backend, &paths, &params)?;

pub unsafe fn load_from_file_ptr( file: *mut FILE, params: &LlamaModelParams, ) -> Result<Self, LlamaModelLoadError>

Load a model from a FILE pointer.

§Safety

The file pointer must be a valid, open FILE*.

§Errors

Returns an error if the model cannot be loaded.

pub unsafe fn init_from_user( metadata: *mut gguf_context, set_tensor_data: llama_model_set_tensor_data_t, set_tensor_data_ud: *mut c_void, params: &LlamaModelParams, ) -> Result<Self, LlamaModelLoadError>

Initialize a model from user-provided data.

§Safety

The metadata, callback, and user data must be valid.

§Errors

Returns an error if the model cannot be initialized.

pub fn save_to_file(&self, path: impl AsRef<Path>)

Save the model to a file.

§Panics

Panics if the path contains null bytes.

pub fn chat_builtin_templates() -> Vec<String>

Get the list of built-in chat templates.

Returns the names of all chat templates that are built into llama.cpp.

§Panics

Panics if any template name is not valid UTF-8.

pub fn lora_adapter_init( &self, path: impl AsRef<Path>, ) -> Result<LlamaLoraAdapter, LlamaLoraAdapterInitError>

Initializes a LoRA adapter from a file.

This function initializes a LoRA adapter, a lightweight model extension used to adapt or fine-tune the existing model for a specific domain or task. The adapter file is typically a binary or serialized file of weight updates that can be applied to the model for improved performance on specialized tasks.

§Errors
  • If the adapter file path cannot be converted to a string or if the adapter cannot be initialized, it will return an error.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel};

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let adapter = model.lora_adapter_init("path/to/lora/adapter")?;

pub fn new_context( &self, _: &LlamaBackend, params: LlamaContextParams, ) -> Result<LlamaContext<'_>, LlamaContextLoadError>

Create a new context from this model.

This function creates a new context for the model, which is used to manage and perform computations for inference, including token generation, embeddings, and other tasks that the model can perform. The context allows fine-grained control over model parameters for a specific task.

§Errors
  • There are various potential failures such as invalid parameters or a failure to allocate the context. See LlamaContextLoadError for more detailed error descriptions.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel};
use llama_cpp_4::LlamaContextParams;

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let context = model.new_context(&backend, LlamaContextParams::default())?;

pub fn apply_chat_template( &self, tmpl: Option<&str>, chat: &[LlamaChatMessage], add_ass: bool, ) -> Result<String, ApplyChatTemplateError>

Apply the model’s chat template to a sequence of messages.

This function applies the model’s chat template to the provided chat messages, formatting them accordingly. The chat template determines the structure and style of the conversation, such as token formatting and role separation. A custom template can be supplied as tmpl; if None is provided, the model’s own template (stored in its metadata) is used.

For more information on supported templates, visit: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template

§Arguments
  • tmpl: An optional custom template string. If None, the model’s own template will be used.
  • chat: A slice of LlamaChatMessage instances representing the conversation between the system, user, and assistant.
  • add_ass: A boolean flag indicating whether the template’s assistant-turn prefix should be appended, priming the model to produce an assistant response.
§Errors

There are several possible points of failure when applying the chat template:

  • Insufficient buffer size to hold the formatted chat (this will return ApplyChatTemplateError::BuffSizeError).
  • If the template or messages cannot be processed properly, various errors from ApplyChatTemplateError may occur.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaChatMessage, LlamaModel};

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let chat = vec![
    LlamaChatMessage::new("user", "Hello!"),
    LlamaChatMessage::new("assistant", "Hi! How can I assist you today?"),
];
let formatted_chat = model.apply_chat_template(None, &chat, true)?;
§Notes

The provided buffer is twice the length of the messages by default, which is recommended by the llama.cpp documentation.

§Panics

Panics if the buffer length exceeds i32::MAX.

pub fn split_path(path_prefix: &str, split_no: i32, split_count: i32) -> String

Build a split GGUF file path for a specific chunk.

This utility function creates the standardized filename for a split model chunk following the pattern: {prefix}-{split_no:05d}-of-{split_count:05d}.gguf

§Arguments
  • path_prefix - The base path and filename prefix
  • split_no - The split number (1-indexed)
  • split_count - The total number of splits
§Returns

Returns the formatted split path as a String

§Example
use llama_cpp_4::model::LlamaModel;

let path = LlamaModel::split_path("/models/llama", 2, 4);
assert_eq!(path, "/models/llama-00002-of-00004.gguf");
§Panics

Panics if the path prefix contains a null byte.

pub fn split_prefix( split_path: &str, split_no: i32, split_count: i32, ) -> Option<String>

Extract the path prefix from a split filename.

This function extracts the base path prefix from a split model filename, but only if the split_no and split_count match the pattern in the filename.

§Arguments
  • split_path - The full path to the split file
  • split_no - The expected split number
  • split_count - The expected total number of splits
§Returns

Returns the path prefix if the pattern matches, or None if it doesn’t

§Example
use llama_cpp_4::model::LlamaModel;

let prefix = LlamaModel::split_prefix("/models/llama-00002-of-00004.gguf", 2, 4);
assert_eq!(prefix, Some("/models/llama".to_string()));
§Panics

Panics if the split path contains a null byte.

Trait Implementations§

impl Debug for LlamaModel

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Display for LlamaModel

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Drop for LlamaModel

fn drop(&mut self)

Executes the destructor for this type.

impl Send for LlamaModel

impl Sync for LlamaModel