LlamaModel

Struct LlamaModel 

Source
#[repr(transparent)]
pub struct LlamaModel { pub model: NonNull<llama_model>, }

A safe wrapper around llama_model.

Fields§

§model: NonNull<llama_model>

Raw pointer to the underlying llama_model.

Implementations§

Source§

impl LlamaModel

Source

pub fn vocab_ptr(&self) -> *const llama_vocab

Returns a raw pointer to the model’s vocabulary.

Source

pub fn n_ctx_train(&self) -> Result<u32, TryFromIntError>

Get the context size (in tokens) the model was trained on.

§Errors

Returns an error if the value returned by llama.cpp does not fit into a u32.

Source

pub fn tokens( &self, decode_special: bool, ) -> impl Iterator<Item = (LlamaToken, Result<String, TokenToStringError>)> + '_

Get all tokens in the model.

Source

pub fn token_bos(&self) -> LlamaToken

Get the beginning of stream token.

Source

pub fn token_eos(&self) -> LlamaToken

Get the end of stream token.

Source

pub fn token_nl(&self) -> LlamaToken

Get the newline token.

Source

pub fn is_eog_token(&self, token: LlamaToken) -> bool

Check if a token represents the end of generation (end of turn, end of sequence, etc.)

Source

pub fn decode_start_token(&self) -> LlamaToken

Get the decoder start token.

Source

pub fn token_sep(&self) -> LlamaToken

Get the separator token (SEP).

Source

pub fn str_to_token( &self, str: &str, add_bos: AddBos, ) -> Result<Vec<LlamaToken>, StringToTokenError>

Convert a string to a vector of tokens.

§Errors
  • if str contains a null byte
  • if an integer conversion fails during tokenization

§Examples
use std::path::Path;
use llama_cpp_bindings::llama_backend::LlamaBackend;
use llama_cpp_bindings::model::{AddBos, LlamaModel};

let backend = LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &Default::default())?;
let tokens = model.str_to_token("Hello, World!", AddBos::Always)?;
Source

pub fn token_attr( &self, LlamaToken: LlamaToken, ) -> Result<LlamaTokenAttrs, LlamaTokenTypeFromIntError>

Get the attributes of a token.

§Errors

Returns an error if the token type is not known to this library.

Source

pub fn token_to_piece( &self, token: LlamaToken, decoder: &mut Decoder, special: bool, lstrip: Option<NonZeroU16>, ) -> Result<String, TokenToStringError>

Convert a token to a string using the underlying llama.cpp llama_token_to_piece function.

This is the new default function for token decoding and provides direct access to the llama.cpp token decoding functionality without any special logic or filtering.

Decoding raw output requires a stateful decoder: tokens from language models do not always map to complete characters, so decoding each token in isolation can lose partial characters. Invalid byte sequences are mapped to the Unicode REPLACEMENT CHARACTER (U+FFFD), making this method safe to use even if the model produces invalid output.

§Errors
  • if the token type is unknown

  • if the returned size from llama.cpp does not fit into a usize
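
To illustrate why a stateful decoder is needed, the following standalone sketch (plain std Rust, not this crate's Decoder type) shows a multi-byte UTF-8 character split across two token pieces: decoding each piece in isolation produces replacement characters, while buffering the bytes first recovers the text.

```rust
// A multi-byte UTF-8 character ("é" = 0xC3 0xA9) can straddle two token pieces.

/// Decode each piece independently (what a stateless decoder would do).
fn decode_per_piece(pieces: &[&[u8]]) -> String {
    pieces
        .iter()
        .map(|p| String::from_utf8_lossy(p).into_owned())
        .collect()
}

/// Buffer all bytes, then decode once (what stateful decoding achieves).
fn decode_buffered(pieces: &[&[u8]]) -> String {
    String::from_utf8_lossy(&pieces.concat()).into_owned()
}

fn main() {
    let pieces: [&[u8]; 2] = [b"caf\xC3", b"\xA9!"];
    // Per-piece decoding mangles the split character into two U+FFFD:
    assert_eq!(decode_per_piece(&pieces), "caf\u{FFFD}\u{FFFD}!");
    // Buffered decoding recovers the intended text:
    assert_eq!(decode_buffered(&pieces), "café!");
}
```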

Source

pub fn token_to_piece_bytes( &self, token: LlamaToken, buffer_size: usize, special: bool, lstrip: Option<NonZeroU16>, ) -> Result<Vec<u8>, TokenToStringError>

Raw token decoding to bytes. Use this if you want to handle decoding of model output yourself.

Converts a token to bytes using the underlying llama.cpp llama_token_to_piece function. This is a thin wrapper that handles Rust <-> C type conversions while leaving error handling to the caller. For a safer interface that returns Rust strings directly, use token_to_piece instead.

§Errors
  • if the token type is unknown
  • if the resulting piece is larger than buffer_size
  • if an integer conversion fails
Source

pub fn n_vocab(&self) -> i32

The number of tokens in the model's vocabulary.

This returns a c_int for maximum compatibility. Most of the time it can be cast to an i32 without issue.

Source

pub fn vocab_type(&self) -> Result<VocabType, LlamaTokenTypeFromIntError>

The type of vocab the model was trained on.

§Errors

Returns an error if llama.cpp emits a vocab type that is not known to this library.

Source

pub fn n_embd(&self) -> c_int

Returns the embedding dimension of the model.

This returns a c_int for maximum compatibility. Most of the time it can be cast to an i32 without issue.

Source

pub fn size(&self) -> u64

Returns the total size of all the tensors in the model in bytes.

Source

pub fn n_params(&self) -> u64

Returns the number of parameters in the model.

Source

pub fn is_recurrent(&self) -> bool

Returns whether the model is a recurrent network (Mamba, RWKV, etc.).

Source

pub fn n_layer(&self) -> Result<u32, TryFromIntError>

Returns the number of layers within the model.

§Errors

Returns an error if the layer count returned by llama.cpp does not fit into a u32.

Source

pub fn n_head(&self) -> Result<u32, TryFromIntError>

Returns the number of attention heads within the model.

§Errors

Returns an error if the head count returned by llama.cpp does not fit into a u32.

Source

pub fn n_head_kv(&self) -> Result<u32, TryFromIntError>

Returns the number of KV attention heads.

§Errors

Returns an error if the KV head count returned by llama.cpp does not fit into a u32.

Source

pub fn is_hybrid(&self) -> bool

Returns whether the model is a hybrid network (Jamba, Granite, Qwen3xx, etc.)

Hybrid models have both attention layers and recurrent/SSM layers.

Source

pub fn meta_val_str(&self, key: &str) -> Result<String, MetaValError>

Get metadata value as a string by key name

§Errors

Returns an error if the key is not found or the value is not valid UTF-8.

Source

pub fn meta_count(&self) -> i32

Get the number of metadata key/value pairs

Source

pub fn meta_key_by_index(&self, index: i32) -> Result<String, MetaValError>

Get metadata key name by index

§Errors

Returns an error if the index is out of range or the key is not valid UTF-8.

Source

pub fn meta_val_str_by_index(&self, index: i32) -> Result<String, MetaValError>

Get metadata value as a string by index

§Errors

Returns an error if the index is out of range or the value is not valid UTF-8.
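
The three index-based metadata accessors are typically combined into a scan over all key/value pairs. The sketch below uses a hypothetical FakeModel stand-in (not this crate's type) purely to show the pattern: iterate indices 0..meta_count, fetching the key and value at each index.

```rust
// Hypothetical stand-in mimicking the shape of the meta_* accessors; the real
// methods call into llama.cpp and return Result<String, MetaValError>.
struct FakeModel {
    pairs: Vec<(String, String)>,
}

impl FakeModel {
    fn meta_count(&self) -> i32 {
        self.pairs.len() as i32
    }
    fn meta_key_by_index(&self, index: i32) -> Result<String, String> {
        self.pairs
            .get(index as usize)
            .map(|(k, _)| k.clone())
            .ok_or_else(|| format!("index {index} out of range"))
    }
    fn meta_val_str_by_index(&self, index: i32) -> Result<String, String> {
        self.pairs
            .get(index as usize)
            .map(|(_, v)| v.clone())
            .ok_or_else(|| format!("index {index} out of range"))
    }
}

/// Collect every metadata pair by walking 0..meta_count.
fn all_metadata(model: &FakeModel) -> Vec<(String, String)> {
    (0..model.meta_count())
        .filter_map(|i| {
            let key = model.meta_key_by_index(i).ok()?;
            let val = model.meta_val_str_by_index(i).ok()?;
            Some((key, val))
        })
        .collect()
}

fn main() {
    let model = FakeModel {
        pairs: vec![("general.name".to_string(), "demo".to_string())],
    };
    assert_eq!(
        all_metadata(&model),
        vec![("general.name".to_string(), "demo".to_string())]
    );
}
```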

Source

pub fn rope_type(&self) -> Option<RopeType>

Returns the rope type of the model.

Source

pub fn chat_template( &self, name: Option<&str>, ) -> Result<LlamaChatTemplate, ChatTemplateError>

Get chat template from model by name. If the name parameter is None, the default chat template will be returned.

Pass the result to Self::apply_chat_template to get back a string with the appropriate template substitution applied, converting a list of messages into a prompt the LLM can use to complete the chat.

You can also use an external Jinja parser, such as minijinja, for templates not supported by the llama.cpp template engine.

§Errors
  • If the model has no chat template by that name
§Panics

Panics if the C-returned chat template string contains interior null bytes (should never happen with valid model data).

Source

pub fn load_from_file( _: &LlamaBackend, path: impl AsRef<Path>, params: &LlamaModelParams, ) -> Result<Self, LlamaModelLoadError>

Loads a model from a file.

§Errors

See LlamaModelLoadError for more information.

§Panics

Panics if a valid UTF-8 path somehow contains interior null bytes (should never happen).

Source

pub fn lora_adapter_init( &self, path: impl AsRef<Path>, ) -> Result<LlamaLoraAdapter, LlamaLoraAdapterInitError>

Initializes a LoRA adapter from a file.

§Errors

See LlamaLoraAdapterInitError for more information.

Source

pub fn new_context<'model>( &'model self, _: &LlamaBackend, params: LlamaContextParams, ) -> Result<LlamaContext<'model>, LlamaContextLoadError>

Create a new context from this model.

§Errors

There are many ways this can fail. See LlamaContextLoadError for more information.

Source

pub fn apply_chat_template( &self, tmpl: &LlamaChatTemplate, chat: &[LlamaChatMessage], add_ass: bool, ) -> Result<String, ApplyChatTemplateError>

Apply the model's chat template to some messages. See https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template

Unlike llama.cpp's apply_chat_template, which silently falls back to the ChatML template when given a null pointer for the template, this requires an explicit template to be specified. If you want ChatML, use LlamaChatTemplate::new("chatml"), or pass any other model name or template string.

Use Self::chat_template to retrieve the template baked into the model (this is the preferred mechanism, as using the wrong chat template can produce very unexpected responses from the LLM).

You probably want to set add_ass to true so that the rendered string ends with the opening tag of the assistant turn. If you do not leave this hanging tag, the model will likely generate one itself, and the rest of the output may be malformed as well.

§Errors

There are many ways this can fail. See ApplyChatTemplateError for more information.
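
For a concrete picture of what the add_ass flag changes, here is an illustrative ChatML-style renderer in plain Rust. This is not the crate's template engine (apply_chat_template runs the model's actual Jinja template via llama.cpp); it only shows the shape of the output and the hanging assistant tag that add_ass appends.

```rust
/// Illustrative ChatML-style rendering; real templates are Jinja and vary by model.
fn render_chatml(messages: &[(&str, &str)], add_ass: bool) -> String {
    let mut out = String::new();
    for (role, content) in messages {
        out.push_str(&format!("<|im_start|>{role}\n{content}<|im_end|>\n"));
    }
    if add_ass {
        // Leave a hanging assistant tag so the model completes the assistant turn.
        out.push_str("<|im_start|>assistant\n");
    }
    out
}

fn main() {
    let msgs = [("system", "You are helpful."), ("user", "Hi!")];
    let prompt = render_chatml(&msgs, true);
    assert!(prompt.starts_with("<|im_start|>system\n"));
    assert!(prompt.ends_with("<|im_start|>assistant\n"));
}
```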

Source

pub fn apply_chat_template_with_tools_oaicompat( &self, tmpl: &LlamaChatTemplate, messages: &[LlamaChatMessage], tools_json: Option<&str>, json_schema: Option<&str>, add_generation_prompt: bool, ) -> Result<ChatTemplateResult, ApplyChatTemplateError>

Apply the model's chat template to some messages and return an optional tool grammar. tools_json should be an OpenAI-compatible tool-definition JSON array string; json_schema should be a JSON Schema string.

§Errors

Returns an error if the FFI call fails or the result contains invalid data.

Source

pub fn apply_chat_template_oaicompat( &self, tmpl: &LlamaChatTemplate, params: &OpenAIChatTemplateParams<'_>, ) -> Result<ChatTemplateResult, ApplyChatTemplateError>

Apply the model's chat template using OpenAI-compatible JSON messages.

§Errors

Returns an error if the FFI call fails or the result contains invalid data.

Trait Implementations§

Source§

impl Debug for LlamaModel

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl Drop for LlamaModel

Source§

fn drop(&mut self)

Executes the destructor for this type. Read more
Source§

impl Send for LlamaModel

Source§

impl Sync for LlamaModel

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more