LlamaModel

Struct LlamaModel 

Source
pub struct LlamaModel { /* private fields */ }

A safe wrapper around llama_model.

Implementations§

Source§

impl LlamaModel

Source

pub fn get_vocab(&self) -> LlamaVocab

Retrieves the vocabulary associated with the current Llama model.

This method fetches the vocabulary from the underlying model using an unsafe FFI call. The returned LlamaVocab struct contains a non-null pointer to the vocabulary data, which is wrapped in a NonNull for safety.

§Safety

This method uses an unsafe block to call a C function (llama_model_get_vocab), which is assumed to return a valid pointer to the vocabulary. The caller should ensure that the model object is properly initialized and valid before calling this method, as dereferencing invalid pointers can lead to undefined behavior.

§Returns

A LlamaVocab struct containing the vocabulary of the model.

§Panics

Panics if the underlying C function returns a null pointer.

§Example
let vocab = model.get_vocab();
Source

pub fn n_ctx_train(&self) -> u32

Get the context size the model was trained with.

This function returns the maximum context length, in tokens, that the model was trained on, represented as a u32.

§Panics

This function will panic if the value does not fit into a u32. This should be impossible on most platforms, since llama.cpp returns a c_int (i32 on most platforms), which is non-negative here.

Source

pub fn tokens( &self, special: Special, ) -> impl Iterator<Item = (LlamaToken, Result<String, TokenToStringError>)> + '_

Get all tokens in the model.

This function returns an iterator over all the tokens in the model. Each item in the iterator is a tuple containing a LlamaToken and its corresponding string representation (or an error if the conversion fails).

§Parameters
  • special: The Special value that determines how special tokens (like BOS, EOS, etc.) are handled.
Source

pub fn token_bos(&self) -> LlamaToken

Get the beginning of stream token.

This function returns the token that represents the beginning of a stream (BOS token).

Source

pub fn token_eos(&self) -> LlamaToken

Get the end of stream token.

This function returns the token that represents the end of a stream (EOS token).

Source

pub fn token_nl(&self) -> LlamaToken

Get the newline token.

This function returns the token that represents a newline character.

Source

pub fn is_eog_token(&self, token: LlamaToken) -> bool

Check if a token represents the end of generation (end of turn, end of sequence, etc.).

This function returns true if the provided token signifies the end of generation or end of sequence, such as EOS or other special tokens.

§Parameters
  • token: The LlamaToken to check.
§Returns
  • true if the token is an end-of-generation token, otherwise false.
Source

pub fn decode_start_token(&self) -> LlamaToken

Get the decoder start token.

This function returns the token used to signal the start of decoding (i.e., the token used at the start of a sequence generation).

Source

pub fn token_to_str( &self, token: LlamaToken, special: Special, ) -> Result<String, TokenToStringError>

Convert a single token to a string.

This function converts a LlamaToken into its string representation.

§Errors

This function returns an error if the token cannot be converted to a string. For more details, refer to TokenToStringError.

§Parameters
  • token: The LlamaToken to convert.
  • special: The Special value used to handle special tokens.
Source

pub fn token_to_bytes( &self, token: LlamaToken, special: Special, ) -> Result<Vec<u8>, TokenToStringError>

Convert a single token to bytes.

This function converts a LlamaToken into a byte representation.

§Errors

This function returns an error if the token cannot be converted to bytes. For more details, refer to TokenToStringError.

§Parameters
  • token: The LlamaToken to convert.
  • special: The Special value used to handle special tokens.
Source

pub fn tokens_to_str( &self, tokens: &[LlamaToken], special: Special, ) -> Result<String, TokenToStringError>

Convert a vector of tokens to a single string.

This function takes a slice of LlamaTokens and converts them into a single string, concatenating their string representations.

§Errors

This function returns an error if any token cannot be converted to a string. For more details, refer to TokenToStringError.

§Parameters
  • tokens: A slice of LlamaTokens to convert.
  • special: The Special value used to handle special tokens.
Source

pub fn str_to_token( &self, str: &str, add_bos: AddBos, ) -> Result<Vec<LlamaToken>, StringToTokenError>

Convert a string to a vector of tokens.

This function converts a string into a vector of LlamaTokens. The function will tokenize the string and return the corresponding tokens.

§Errors
  • This function will return an error if the input string contains a null byte.
§Panics
  • This function will panic if the number of tokens exceeds usize::MAX.
§Example
use llama_cpp_4::model::LlamaModel;

use std::path::Path;
use llama_cpp_4::model::AddBos;
let backend = llama_cpp_4::llama_backend::LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &Default::default())?;
let tokens = model.str_to_token("Hello, World!", AddBos::Always)?;
Source

pub fn token_attr(&self, LlamaToken: LlamaToken) -> LlamaTokenAttrs

Get the attributes of a token.

This function retrieves the attributes associated with a given token. The attributes are typically used to understand whether the token represents a special type of token (e.g., beginning-of-sequence (BOS), end-of-sequence (EOS), control tokens, etc.).

§Panics
  • This function will panic if the token type is unknown or cannot be converted to a valid LlamaTokenAttrs.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel, LlamaToken};
use std::path::Path;

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &LlamaModelParams::default())?;
let token = LlamaToken(42);
let token_attrs = model.token_attr(token);
Source

pub fn token_to_str_with_size( &self, token: LlamaToken, buffer_size: usize, special: Special, ) -> Result<String, TokenToStringError>

Convert a token to a string with a specified buffer size.

This function allows you to convert a token into a string, with the ability to specify a buffer size for the operation. It is generally recommended to use LlamaModel::token_to_str instead, as 8 bytes is typically sufficient for most tokens, and the extra buffer size doesn’t usually matter.

§Errors
  • If the token type is unknown, an error will be returned.
  • If the resultant token exceeds the provided buffer_size, an error will occur.
  • If the token string returned by llama-cpp is not valid UTF-8, it will return an error.
§Panics
  • This function will panic if the buffer_size does not fit into a c_int.
  • It will also panic if the size returned from llama-cpp does not fit into a usize, which should typically never happen.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel, LlamaToken, Special};
use std::path::Path;

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &LlamaModelParams::default())?;
let token = LlamaToken(42);
let token_string = model.token_to_str_with_size(token, 32, Special::Plaintext)?;
Source

pub fn token_to_bytes_with_size( &self, token: LlamaToken, buffer_size: usize, special: Special, lstrip: Option<NonZeroU16>, ) -> Result<Vec<u8>, TokenToStringError>

Convert a token to bytes with a specified buffer size.

Generally you should use LlamaModel::token_to_bytes instead as 8 bytes is enough for most words and the extra bytes do not really matter.

§Errors
  • If the token type is unknown.
  • If the resultant token is larger than buffer_size.
§Panics
  • This function will panic if buffer_size cannot fit into a c_int.
  • It will also panic if the size returned from llama-cpp cannot be converted to usize (which should not happen).
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel, LlamaToken, Special};
use std::path::Path;

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &LlamaModelParams::default())?;
let token = LlamaToken(42);
let token_bytes = model.token_to_bytes_with_size(token, 32, Special::Plaintext, None)?;
Source

pub fn n_vocab(&self) -> i32

The number of tokens in the model's vocabulary.

This function returns the size of the model's vocabulary. It is returned as an i32, matching the c_int used by the underlying llama-cpp library, for maximum compatibility.

§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel};
use std::path::Path;

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &LlamaModelParams::default())?;
let n_vocab = model.n_vocab();
Source

pub fn vocab_type(&self) -> VocabType

The type of vocab the model was trained on.

This function returns the type of vocabulary used by the model, such as whether it is based on byte-pair encoding (BPE), word-level tokens, or another tokenization scheme.

§Panics
  • This function will panic if llama-cpp emits a vocab type that is not recognized or is invalid for this library.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel};
use std::path::Path;

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &LlamaModelParams::default())?;
let vocab_type = model.vocab_type();
Source

pub fn n_embd(&self) -> c_int

Returns the number of embedding dimensions for the model.

This function retrieves the number of embeddings (or embedding dimensions) used by the model. It is typically used for analyzing model architecture and setting up context parameters or other model configuration aspects.

§Panics
  • This function may panic if the underlying llama-cpp library returns an invalid embedding dimension value.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel};
use std::path::Path;

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &LlamaModelParams::default())?;
let n_embd = model.n_embd();
Source

pub fn get_chat_template( &self, buf_size: usize, ) -> Result<String, ChatTemplateError>

Get chat template from model.

§Errors
  • If the model does not have a chat template, it will return an error.
  • If the chat template is not a valid CString, it will return an error.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel};
use std::path::Path;

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &LlamaModelParams::default())?;
let chat_template = model.get_chat_template(1024)?;
Source

pub fn load_from_file( _: &LlamaBackend, path: impl AsRef<Path>, params: &LlamaModelParams, ) -> Result<Self, LlamaModelLoadError>

Loads a model from a file.

This function loads a model from a specified file path and returns the corresponding LlamaModel instance.

§Errors
  • If the path cannot be converted to a string or if the model file does not exist, it will return an error.
  • If the model cannot be loaded (e.g., due to an invalid or corrupted model file), it will return a LlamaModelLoadError.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel};
use std::path::Path;

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &LlamaModelParams::default())?;
Source

pub fn load_from_splits( _: &LlamaBackend, paths: &[impl AsRef<Path>], params: &LlamaModelParams, ) -> Result<Self, LlamaModelLoadError>

Load a model from multiple split files.

This function loads a model that has been split across multiple files. This is useful for very large models that exceed filesystem limitations or need to be distributed across multiple storage devices.

§Arguments
  • paths - A slice of paths to the split model files
  • params - The model parameters
§Errors

Returns an error if:

  • Any of the paths cannot be converted to a C string
  • The model fails to load from the splits
  • Any path doesn’t exist or isn’t accessible
§Example
use llama_cpp_4::model::{LlamaModel, params::LlamaModelParams};
use llama_cpp_4::llama_backend::LlamaBackend;

let backend = LlamaBackend::init()?;
let params = LlamaModelParams::default();

let paths = vec![
    "model-00001-of-00003.gguf",
    "model-00002-of-00003.gguf",
    "model-00003-of-00003.gguf",
];

let model = LlamaModel::load_from_splits(&backend, &paths, &params)?;
Source

pub fn lora_adapter_init( &self, path: impl AsRef<Path>, ) -> Result<LlamaLoraAdapter, LlamaLoraAdapterInitError>

Initializes a lora adapter from a file.

This function initializes a Lora adapter, which is a model extension used to adapt or fine-tune the existing model to a specific domain or task. The adapter file is typically in the form of a binary or serialized file that can be applied to the model for improved performance on specialized tasks.

§Errors
  • If the adapter file path cannot be converted to a string or if the adapter cannot be initialized, it will return an error.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel};
use std::path::Path;

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &LlamaModelParams::default())?;
let adapter = model.lora_adapter_init("path/to/lora/adapter")?;
Source

pub fn new_context( &self, _: &LlamaBackend, params: LlamaContextParams, ) -> Result<LlamaContext<'_>, LlamaContextLoadError>

Create a new context from this model.

This function creates a new context for the model, which is used to manage and perform computations for inference, including token generation, embeddings, and other tasks that the model can perform. The context allows fine-grained control over model parameters for a specific task.

§Errors
  • There are various potential failures such as invalid parameters or a failure to allocate the context. See LlamaContextLoadError for more detailed error descriptions.
§Example
use llama_cpp_4::context::params::LlamaContextParams;
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaModel};
use std::path::Path;

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &LlamaModelParams::default())?;
let context = model.new_context(&backend, LlamaContextParams::default())?;
Source

pub fn apply_chat_template( &self, tmpl: Option<&str>, chat: &[LlamaChatMessage], add_ass: bool, ) -> Result<String, ApplyChatTemplateError>

Apply the model’s chat template to a sequence of messages.

This function applies the model’s chat template to the provided chat messages, formatting them accordingly. The chat template determines the structure or style of conversation between the system and user, such as token formatting, role separation, and more. The template can be customized by providing an optional template string, or if None is provided, the default template used by llama.cpp will be applied.

For more information on supported templates, visit: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template

§Arguments
  • tmpl: An optional custom template string. If None, the default template will be used.
  • chat: A vector of LlamaChatMessage instances, which represent the conversation between the system and user.
  • add_ass: A boolean flag indicating whether the tokens that prompt an assistant reply (the assistant generation prompt) should be appended to the formatted output.
§Errors

There are several possible points of failure when applying the chat template:

  • Insufficient buffer size to hold the formatted chat (this will return ApplyChatTemplateError::BuffSizeError).
  • If the template or messages cannot be processed properly, various errors from ApplyChatTemplateError may occur.
§Example
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::model::{params::LlamaModelParams, LlamaChatMessage, LlamaModel};
use std::path::Path;

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &LlamaModelParams::default())?;
let chat = vec![
    LlamaChatMessage::new("user".to_string(), "Hello!".to_string())?,
    LlamaChatMessage::new("assistant".to_string(), "Hi! How can I assist you today?".to_string())?,
];
let formatted_chat = model.apply_chat_template(None, &chat, true)?;
§Notes

The internal buffer is sized to twice the total length of the messages, as recommended by the llama.cpp documentation.

§Panics

Panics if the buffer length exceeds i32::MAX.

Source

pub fn split_path(path_prefix: &str, split_no: i32, split_count: i32) -> String

Build a split GGUF file path for a specific chunk.

This utility function creates the standardized filename for a split model chunk following the pattern: {prefix}-{split_no:05d}-of-{split_count:05d}.gguf

§Arguments
  • path_prefix - The base path and filename prefix
  • split_no - The split number (1-indexed)
  • split_count - The total number of splits
§Returns

Returns the formatted split path as a String

§Example
use llama_cpp_4::model::LlamaModel;

let path = LlamaModel::split_path("/models/llama", 2, 4);
assert_eq!(path, "/models/llama-00002-of-00004.gguf");
§Panics

Panics if the path prefix contains a null byte.
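The documented naming pattern can be reproduced in plain Rust. This sketch mirrors the format described above without calling into llama.cpp; the helper name `split_path_pattern` is illustrative only, not part of the crate:

```rust
// Illustrative re-implementation of the documented split-file naming
// pattern: {prefix}-{split_no:05}-of-{split_count:05}.gguf
fn split_path_pattern(path_prefix: &str, split_no: i32, split_count: i32) -> String {
    // {:05} pads the number with leading zeros to a width of five digits.
    format!("{path_prefix}-{split_no:05}-of-{split_count:05}.gguf")
}

fn main() {
    let path = split_path_pattern("/models/llama", 2, 4);
    assert_eq!(path, "/models/llama-00002-of-00004.gguf");
    println!("{path}");
}
```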

Source

pub fn split_prefix( split_path: &str, split_no: i32, split_count: i32, ) -> Option<String>

Extract the path prefix from a split filename.

This function extracts the base path prefix from a split model filename, but only if the split_no and split_count match the pattern in the filename.

§Arguments
  • split_path - The full path to the split file
  • split_no - The expected split number
  • split_count - The expected total number of splits
§Returns

Returns the path prefix if the pattern matches, or None if it doesn’t

§Example
use llama_cpp_4::model::LlamaModel;

let prefix = LlamaModel::split_prefix("/models/llama-00002-of-00004.gguf", 2, 4);
assert_eq!(prefix, Some("/models/llama".to_string()));
§Panics

Panics if the split path contains a null byte.
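The matching behavior can be sketched in plain Rust as the inverse of the split naming pattern: build the expected `-NNNNN-of-MMMMM.gguf` suffix and strip it if present. The helper name `split_prefix_pattern` is illustrative only and does not reflect the crate's actual implementation:

```rust
// Illustrative sketch: extract the prefix by stripping the expected
// "-{split_no:05}-of-{split_count:05}.gguf" suffix, if it matches.
fn split_prefix_pattern(split_path: &str, split_no: i32, split_count: i32) -> Option<String> {
    let suffix = format!("-{split_no:05}-of-{split_count:05}.gguf");
    split_path.strip_suffix(&suffix).map(str::to_string)
}

fn main() {
    assert_eq!(
        split_prefix_pattern("/models/llama-00002-of-00004.gguf", 2, 4),
        Some("/models/llama".to_string())
    );
    // A mismatched split number does not match the pattern.
    assert_eq!(split_prefix_pattern("/models/llama-00002-of-00004.gguf", 3, 4), None);
}
```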

Trait Implementations§

Source§

impl Debug for LlamaModel

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl Drop for LlamaModel

Source§

fn drop(&mut self)

Executes the destructor for this type. Read more
Source§

impl Send for LlamaModel

Source§

impl Sync for LlamaModel

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more