pub struct LlamaModel { /* private fields */ }

A safe wrapper around llama_model.
§Implementations

impl LlamaModel
pub fn get_vocab(&self) -> LlamaVocab
Retrieves the vocabulary associated with the current Llama model.
This method fetches the vocabulary from the underlying model using an unsafe
FFI call. The returned LlamaVocab struct contains a non-null pointer to
the vocabulary data, which is wrapped in a NonNull for safety.
§Safety
This method uses an unsafe block to call a C function (llama_model_get_vocab),
which is assumed to return a valid pointer to the vocabulary. The caller should
ensure that the model object is properly initialized and valid before calling
this method, as dereferencing invalid pointers can lead to undefined behavior.
§Returns
A LlamaVocab struct containing the vocabulary of the model.
§Panics
Panics if the underlying C function returns a null pointer.
§Example
let vocab = model.get_vocab();

pub fn n_ctx_train(&self) -> u32
Get the context size the model was trained on.
This function returns the size of the context window (in tokens) that the model was trained with, represented as a u32.
§Panics
This function will panic if the training context size does not fit into a u32.
This should be impossible on most platforms, since llama.cpp returns a c_int (i32 on most platforms),
which is almost certainly positive.
pub fn tokens(
    &self,
    special: Special,
) -> impl Iterator<Item = (LlamaToken, Result<String, TokenToStringError>)> + '_
Get all tokens in the model.
This function returns an iterator over all the tokens in the model. Each item in the iterator is a tuple
containing a LlamaToken and its corresponding string representation (or an error if the conversion fails).
§Parameters
special: The Special value that determines how special tokens (like BOS, EOS, etc.) are handled.
pub fn token_bos(&self) -> LlamaToken
Get the beginning of stream token.
This function returns the token that represents the beginning of a stream (BOS token).
pub fn token_eos(&self) -> LlamaToken
Get the end of stream token.
This function returns the token that represents the end of a stream (EOS token).
pub fn token_nl(&self) -> LlamaToken
Get the newline token.
This function returns the token that represents a newline character.
pub fn is_eog_token(&self, token: LlamaToken) -> bool
Check if a token represents the end of generation (end of turn, end of sequence, etc.).
This function returns true if the provided token signifies the end of generation or end of sequence,
such as EOS or other special tokens.
§Parameters
token: The LlamaToken to check.
§Returns
true if the token is an end-of-generation token, otherwise false.
pub fn decode_start_token(&self) -> LlamaToken
Get the decoder start token.
This function returns the token used to signal the start of decoding (i.e., the token used at the start of a sequence generation).
pub fn token_to_str(
    &self,
    token: LlamaToken,
    special: Special,
) -> Result<String, TokenToStringError>
Convert a single token to a string.
This function converts a LlamaToken into its string representation.
§Errors
This function returns an error if the token cannot be converted to a string. For more details, refer to
TokenToStringError.
§Parameters
token: The LlamaToken to convert.
special: The Special value used to handle special tokens.
pub fn token_to_bytes(
    &self,
    token: LlamaToken,
    special: Special,
) -> Result<Vec<u8>, TokenToStringError>
Convert a single token to bytes.
This function converts a LlamaToken into a byte representation.
§Errors
This function returns an error if the token cannot be converted to bytes. For more details, refer to
TokenToStringError.
§Parameters
token: The LlamaToken to convert.
special: The Special value used to handle special tokens.
pub fn tokens_to_str(
    &self,
    tokens: &[LlamaToken],
    special: Special,
) -> Result<String, TokenToStringError>
Convert a vector of tokens to a single string.
This function takes a slice of LlamaTokens and converts them into a single string, concatenating their
string representations.
§Errors
This function returns an error if any token cannot be converted to a string. For more details, refer to
TokenToStringError.
§Parameters
tokens: A slice of LlamaTokens to convert.
special: The Special value used to handle special tokens.
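The "concatenate the pieces, or fail on the first bad token" behavior described above can be illustrated in plain Rust, with ordinary Results standing in for the library's per-token conversions. This is a simplified stand-in for exposition, not the crate's actual implementation:

```rust
// Each element models one token's string conversion, which may fail.
// `collect` on an iterator of `Result<String, E>` concatenates the `Ok`
// values into a single `String` and short-circuits on the first `Err`.
fn join_pieces<E>(pieces: Vec<Result<String, E>>) -> Result<String, E> {
    pieces.into_iter().collect()
}

fn main() {
    let ok: Vec<Result<String, ()>> = vec![Ok("Hello".into()), Ok(", world".into())];
    assert_eq!(join_pieces(ok), Ok("Hello, world".to_string()));

    // A single failed conversion fails the whole call.
    let bad: Vec<Result<String, ()>> = vec![Ok("Hello".into()), Err(())];
    assert_eq!(join_pieces(bad), Err(()));
}
```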
pub fn str_to_token(
    &self,
    str: &str,
    add_bos: AddBos,
) -> Result<Vec<LlamaToken>, StringToTokenError>
Convert a string to a vector of tokens.
This function converts a string into a vector of LlamaTokens. The function will tokenize the string
and return the corresponding tokens.
§Errors
- This function will return an error if the input string contains a null byte.
§Panics
- This function will panic if the number of tokens exceeds usize::MAX.
§Example
use llama_cpp_4::model::LlamaModel;
use std::path::Path;
use llama_cpp_4::model::AddBos;
let backend = llama_cpp_4::llama_backend::LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &Default::default())?;
let tokens = model.str_to_token("Hello, World!", AddBos::Always)?;

pub fn token_attr(&self, LlamaToken: LlamaToken) -> LlamaTokenAttrs
Get the attributes of a token.
This function retrieves the attributes associated with a given token. The attributes are typically used to understand whether the token represents a special type of token (e.g., beginning-of-sequence (BOS), end-of-sequence (EOS), control tokens, etc.).
§Panics
- This function will panic if the token type is unknown or cannot be converted to a valid LlamaTokenAttrs.
§Example
use llama_cpp_4::model::{LlamaModel, LlamaToken, params::LlamaModelParams};
use llama_cpp_4::llama_backend::LlamaBackend;
let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let token = LlamaToken(42);
let token_attrs = model.token_attr(token);

pub fn token_to_str_with_size(
    &self,
    token: LlamaToken,
    buffer_size: usize,
    special: Special,
) -> Result<String, TokenToStringError>
Convert a token to a string with a specified buffer size.
This function allows you to convert a token into a string, with the ability to specify a buffer size for the operation.
It is generally recommended to use LlamaModel::token_to_str instead, as 8 bytes is typically sufficient for most tokens,
and the extra buffer size doesn’t usually matter.
§Errors
- If the token type is unknown, an error will be returned.
- If the resulting string exceeds the provided buffer_size, an error will occur.
- If the token string returned by llama-cpp is not valid UTF-8, an error will be returned.
§Panics
- This function will panic if the buffer_size does not fit into a c_int.
- It will also panic if the size returned from llama-cpp does not fit into a usize, which should typically never happen.
§Example
use llama_cpp_4::model::{LlamaModel, LlamaToken, Special, params::LlamaModelParams};
use llama_cpp_4::llama_backend::LlamaBackend;
let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let token = LlamaToken(42);
let token_string = model.token_to_str_with_size(token, 32, Special::Plaintext)?;

pub fn token_to_bytes_with_size(
    &self,
    token: LlamaToken,
    buffer_size: usize,
    special: Special,
    lstrip: Option<NonZeroU16>,
) -> Result<Vec<u8>, TokenToStringError>
Convert a token to bytes with a specified buffer size.
Generally you should use LlamaModel::token_to_bytes instead as 8 bytes is enough for most words and
the extra bytes do not really matter.
§Errors
- If the token type is unknown.
- If the resulting token is larger than buffer_size.
§Panics
- This function will panic if buffer_size cannot fit into a c_int.
- It will also panic if the size returned from llama-cpp cannot be converted to a usize (which should not happen).
§Example
use llama_cpp_4::model::{LlamaModel, LlamaToken, Special, params::LlamaModelParams};
use llama_cpp_4::llama_backend::LlamaBackend;
let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let token = LlamaToken(42);
let token_bytes = model.token_to_bytes_with_size(token, 32, Special::Plaintext, None)?;

pub fn n_vocab(&self) -> i32
The number of tokens in the model's vocabulary.
This function returns the size of the model's vocabulary. It is returned as an i32 for maximum
compatibility with the underlying llama-cpp library, which reports it as a c_int.
§Example
use llama_cpp_4::model::{LlamaModel, params::LlamaModelParams};
use llama_cpp_4::llama_backend::LlamaBackend;
let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let n_vocab = model.n_vocab();

pub fn vocab_type(&self) -> VocabType
The type of vocab the model was trained on.
This function returns the type of vocabulary used by the model, such as whether it is based on byte-pair encoding (BPE), word-level tokens, or another tokenization scheme.
§Panics
- This function will panic if llama-cpp emits a vocab type that is not recognized or is invalid for this library.
§Example
use llama_cpp_4::model::{LlamaModel, params::LlamaModelParams};
use llama_cpp_4::llama_backend::LlamaBackend;
let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let vocab_type = model.vocab_type();

pub fn n_embd(&self) -> c_int
Returns the number of embedding dimensions for the model.
This function retrieves the number of embeddings (or embedding dimensions) used by the model. It is typically used for analyzing model architecture and setting up context parameters or other model configuration aspects.
§Panics
- This function may panic if the underlying llama-cpp library returns an invalid embedding dimension value.
§Example
use llama_cpp_4::model::{LlamaModel, params::LlamaModelParams};
use llama_cpp_4::llama_backend::LlamaBackend;
let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let n_embd = model.n_embd();

pub fn get_chat_template(
    &self,
    buf_size: usize,
) -> Result<String, ChatTemplateError>
Get chat template from model.
§Errors
- If the model does not have a chat template, it will return an error.
- If the chat template is not a valid CString, it will return an error.
§Example
use llama_cpp_4::model::{LlamaModel, params::LlamaModelParams};
use llama_cpp_4::llama_backend::LlamaBackend;
let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let chat_template = model.get_chat_template(1024)?;

pub fn load_from_file(
    _: &LlamaBackend,
    path: impl AsRef<Path>,
    params: &LlamaModelParams,
) -> Result<Self, LlamaModelLoadError>
Loads a model from a file.
This function loads a model from a specified file path and returns the corresponding LlamaModel instance.
§Errors
- If the path cannot be converted to a string or if the model file does not exist, it will return an error.
- If the model cannot be loaded (e.g., due to an invalid or corrupted model file), it will return a LlamaModelLoadError.
§Example
use llama_cpp_4::model::{LlamaModel, params::LlamaModelParams};
use llama_cpp_4::llama_backend::LlamaBackend;
use std::path::Path;
let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &LlamaModelParams::default())?;

pub fn load_from_splits(
    _: &LlamaBackend,
    paths: &[impl AsRef<Path>],
    params: &LlamaModelParams,
) -> Result<Self, LlamaModelLoadError>
Load a model from multiple split files.
This function loads a model that has been split across multiple files. This is useful for very large models that exceed filesystem limitations or need to be distributed across multiple storage devices.
§Arguments
paths - A slice of paths to the split model files.
params - The model parameters.
§Errors
Returns an error if:
- Any of the paths cannot be converted to a C string
- The model fails to load from the splits
- Any path doesn’t exist or isn’t accessible
§Example
use llama_cpp_4::model::{LlamaModel, params::LlamaModelParams};
use llama_cpp_4::llama_backend::LlamaBackend;
let backend = LlamaBackend::init()?;
let params = LlamaModelParams::default();
let paths = vec![
"model-00001-of-00003.gguf",
"model-00002-of-00003.gguf",
"model-00003-of-00003.gguf",
];
let model = LlamaModel::load_from_splits(&backend, &paths, &params)?;

pub fn lora_adapter_init(
    &self,
    path: impl AsRef<Path>,
) -> Result<LlamaLoraAdapter, LlamaLoraAdapterInitError>
Initializes a LoRA adapter from a file.
This function initializes a LoRA (Low-Rank Adaptation) adapter, a model extension used to adapt or fine-tune the existing model for a specific domain or task. The adapter file is typically a binary or serialized file that can be applied to the model for improved performance on specialized tasks.
§Errors
- If the adapter file path cannot be converted to a string or if the adapter cannot be initialized, it will return an error.
§Example
use llama_cpp_4::model::{LlamaModel, params::LlamaModelParams};
use llama_cpp_4::llama_backend::LlamaBackend;
let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let adapter = model.lora_adapter_init("path/to/lora/adapter")?;

pub fn new_context(
    &self,
    _: &LlamaBackend,
    params: LlamaContextParams,
) -> Result<LlamaContext<'_>, LlamaContextLoadError>
Create a new context from this model.
This function creates a new context for the model, which is used to manage and perform computations for inference, including token generation, embeddings, and other tasks that the model can perform. The context allows fine-grained control over model parameters for a specific task.
§Errors
- There are various potential failures such as invalid parameters or a failure to allocate the context. See LlamaContextLoadError for more detailed error descriptions.
§Example
use llama_cpp_4::model::{LlamaModel, params::LlamaModelParams};
use llama_cpp_4::llama_backend::LlamaBackend;
use llama_cpp_4::LlamaContextParams;
let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let context = model.new_context(&backend, LlamaContextParams::default())?;

pub fn apply_chat_template(
    &self,
    tmpl: Option<&str>,
    chat: &[LlamaChatMessage],
    add_ass: bool,
) -> Result<String, ApplyChatTemplateError>
Apply the model’s chat template to a sequence of messages.
This function applies the model’s chat template to the provided chat messages, formatting them accordingly. The chat
template determines the structure or style of conversation between the system and user, such as token formatting,
role separation, and more. The template can be customized by providing an optional template string, or if None
is provided, the default template used by llama.cpp will be applied.
For more information on supported templates, visit: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
§Arguments
tmpl: An optional custom template string. If None, the default template will be used.
chat: A slice of LlamaChatMessage instances representing the conversation between the system and user.
add_ass: A boolean flag indicating whether the assistant role prefix should be appended, prompting the model to generate the assistant's reply.
§Errors
There are several possible points of failure when applying the chat template:
- Insufficient buffer size to hold the formatted chat (this will return ApplyChatTemplateError::BuffSizeError).
- If the template or messages cannot be processed properly, various errors from ApplyChatTemplateError may occur.
§Example
use llama_cpp_4::model::{LlamaModel, LlamaChatMessage, params::LlamaModelParams};
use llama_cpp_4::llama_backend::LlamaBackend;
let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, "path/to/model", &LlamaModelParams::default())?;
let chat = vec![
    LlamaChatMessage::new("user", "Hello!"),
    LlamaChatMessage::new("assistant", "Hi! How can I assist you today?"),
];
let formatted_chat = model.apply_chat_template(None, &chat, true)?;

§Notes
The provided buffer is twice the length of the messages by default, which is recommended by the llama.cpp documentation.
§Panics
Panics if the buffer length exceeds i32::MAX.
pub fn split_path(path_prefix: &str, split_no: i32, split_count: i32) -> String
Build a split GGUF file path for a specific chunk.
This utility function creates the standardized filename for a split model chunk
following the pattern: {prefix}-{split_no:05d}-of-{split_count:05d}.gguf
§Arguments
path_prefix - The base path and filename prefix.
split_no - The split number (1-indexed).
split_count - The total number of splits.
§Returns
Returns the formatted split path as a String
§Example
use llama_cpp_4::model::LlamaModel;
let path = LlamaModel::split_path("/models/llama", 2, 4);
assert_eq!(path, "/models/llama-00002-of-00004.gguf");

§Panics
Panics if the path prefix contains a null byte.
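The documented naming pattern can be sketched in plain Rust using format! zero-padding. This is an illustrative stand-in; the real split_path delegates to llama.cpp rather than formatting the string itself:

```rust
// Sketch of the documented pattern `{prefix}-{split_no:05d}-of-{split_count:05d}.gguf`.
fn split_path_sketch(path_prefix: &str, split_no: i32, split_count: i32) -> String {
    // `{:05}` zero-pads each number to five digits, matching the pattern.
    format!("{path_prefix}-{split_no:05}-of-{split_count:05}.gguf")
}

fn main() {
    assert_eq!(
        split_path_sketch("/models/llama", 2, 4),
        "/models/llama-00002-of-00004.gguf"
    );
}
```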
pub fn split_prefix(
    split_path: &str,
    split_no: i32,
    split_count: i32,
) -> Option<String>
Extract the path prefix from a split filename.
This function extracts the base path prefix from a split model filename,
but only if the split_no and split_count match the pattern in the filename.
§Arguments
split_path - The full path to the split file.
split_no - The expected split number.
split_count - The expected total number of splits.
§Returns
Returns the path prefix if the pattern matches, or None if it doesn’t
§Example
use llama_cpp_4::model::LlamaModel;
let prefix = LlamaModel::split_prefix("/models/llama-00002-of-00004.gguf", 2, 4);
assert_eq!(prefix, Some("/models/llama".to_string()));

§Panics
Panics if the split path contains a null byte.
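The inverse operation can likewise be sketched in plain Rust: the prefix is recovered only when the expected split number and count match the -NNNNN-of-NNNNN.gguf suffix. Stand-in code for exposition, not the crate's implementation (which delegates to llama.cpp):

```rust
// Recover the prefix only when the expected suffix matches exactly.
fn split_prefix_sketch(split_path: &str, split_no: i32, split_count: i32) -> Option<String> {
    let suffix = format!("-{split_no:05}-of-{split_count:05}.gguf");
    // `strip_suffix` returns `None` when the pattern does not match.
    split_path.strip_suffix(&suffix).map(str::to_string)
}

fn main() {
    assert_eq!(
        split_prefix_sketch("/models/llama-00002-of-00004.gguf", 2, 4),
        Some("/models/llama".to_string())
    );
    // Mismatched split number: no prefix is returned.
    assert_eq!(split_prefix_sketch("/models/llama-00002-of-00004.gguf", 3, 4), None);
}
```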