pub struct LlamaModel { /* private fields */ }
A safe wrapper around llama_model.
Implementations
impl LlamaModel
pub fn n_ctx_train(&self) -> u32
Get the number of tokens the model was trained on.
§Panics
If the number of tokens the model was trained on does not fit into a u32. This should be impossible on most
platforms, since llama.cpp returns a c_int (i32 on most platforms), which is almost certainly positive.
pub fn tokens(
    &self,
    decode_special: bool,
) -> impl Iterator<Item = (LlamaToken, Result<String, TokenToStringError>)> + '_
Get all tokens in the model.
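For example, a minimal sketch that dumps the vocabulary, reusing the backend/model setup shown in the str_to_token example below; decode_special = true is an arbitrary choice here, and Debug formatting of LlamaToken is assumed:
use llama_cpp_2::model::LlamaModel;
use std::path::Path;
let backend = llama_cpp_2::llama_backend::LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &Default::default())?;
// Iterate every token in the vocabulary; decoding errors are printed rather than unwrapped.
for (token, piece) in model.tokens(true) {
    match piece {
        Ok(s) => println!("{token:?} => {s:?}"),
        Err(e) => eprintln!("{token:?} could not be decoded: {e}"),
    }
}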
pub fn token_bos(&self) -> LlamaToken
Get the beginning of stream token.
pub fn token_eos(&self) -> LlamaToken
Get the end of stream token.
pub fn token_nl(&self) -> LlamaToken
Get the newline token.
pub fn is_eog_token(&self, token: LlamaToken) -> bool
Check if a token represents the end of generation (end of turn, end of sequence, etc.)
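A hedged sketch of how these special tokens are typically used in a decode loop; next_token is a hypothetical token produced by your sampler (sampling is not part of this type), and Debug formatting of LlamaToken is assumed:
// `model` is a loaded LlamaModel and `next_token` a freshly sampled LlamaToken.
println!("bos = {:?}, eos = {:?}, nl = {:?}", model.token_bos(), model.token_eos(), model.token_nl());
if model.is_eog_token(next_token) {
    // End of generation (end of turn / end of sequence): stop decoding here.
    println!("generation finished");
}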
pub fn decode_start_token(&self) -> LlamaToken
Get the decoder start token.
pub fn token_sep(&self) -> LlamaToken
Get the separator token (SEP).
pub fn token_to_str(
    &self,
    token: LlamaToken,
    special: Special,
) -> Result<String, TokenToStringError>
Deprecated since 0.1.0: Use token_to_piece instead
Convert a single token to a string.
pub fn token_to_bytes(
    &self,
    token: LlamaToken,
    special: Special,
) -> Result<Vec<u8>, TokenToStringError>
Deprecated since 0.1.0: Use token_to_piece_bytes instead
Convert a single token to bytes.
§Errors
See TokenToStringError for more information.
§Panics
If a TokenToStringError::InsufficientBufferSpace error returned by
Self::token_to_bytes_with_size contains a positive nonzero value. This should never
happen.
pub fn tokens_to_str(
    &self,
    tokens: &[LlamaToken],
    special: Special,
) -> Result<String, TokenToStringError>
Deprecated since 0.1.0: Use token_to_piece for each token individually instead
Convert a slice of tokens to a single string.
pub fn str_to_token(
    &self,
    str: &str,
    add_bos: AddBos,
) -> Result<Vec<LlamaToken>, StringToTokenError>
Convert a string to a vector of tokens.
§Errors
- if str contains a null byte.
§Panics
- if there are more than usize::MAX LlamaTokens in str.
use llama_cpp_2::model::LlamaModel;
use std::path::Path;
use llama_cpp_2::model::AddBos;
let backend = llama_cpp_2::llama_backend::LlamaBackend::init()?;
let model = LlamaModel::load_from_file(&backend, Path::new("path/to/model"), &Default::default())?;
let tokens = model.str_to_token("Hello, World!", AddBos::Always)?;
pub fn token_attr(&self, LlamaToken: LlamaToken) -> LlamaTokenAttrs
Get the attributes of a token.
pub fn token_to_piece(
    &self,
    token: LlamaToken,
    decoder: &mut Decoder,
    special: bool,
    lstrip: Option<NonZeroU16>,
) -> Result<String, TokenToStringError>
Convert a token to a string using the underlying llama.cpp llama_token_to_piece function.
This is the new default function for token decoding and provides direct access to the llama.cpp token decoding functionality without any special logic or filtering.
Decoding a raw string requires a decoder: tokens from language models may not always map to complete characters, depending on the encoding, so stateful decoding is required; otherwise partial strings may be lost. Invalid characters are mapped to the REPLACEMENT CHARACTER, making this method safe to use even if the model inherently produces garbage.
§Errors
- if the token type is unknown
§Panics
- if the returned size from llama-cpp does not fit into a
usize. (this should never happen)
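A minimal sketch of stateful decoding with this method. It assumes the Decoder parameter is the encoding_rs::Decoder used by this crate and that a fresh UTF-8 decoder is appropriate; verify against the crate's re-exports before copying. model is a loaded LlamaModel as in the str_to_token example above.
let mut decoder = encoding_rs::UTF_8.new_decoder(); // assumed: Decoder = encoding_rs::Decoder
let tokens = model.str_to_token("Hello, World!", llama_cpp_2::model::AddBos::Never)?;
let mut text = String::new();
for token in tokens {
    // Stateful decoding: partial UTF-8 sequences carry over between calls,
    // and invalid bytes become U+FFFD REPLACEMENT CHARACTER.
    let piece = model.token_to_piece(token, &mut decoder, false, None)?;
    text.push_str(&piece);
}
println!("{text}");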
pub fn token_to_piece_bytes(
    &self,
    token: LlamaToken,
    buffer_size: usize,
    special: bool,
    lstrip: Option<NonZeroU16>,
) -> Result<Vec<u8>, TokenToStringError>
Raw token decoding to bytes; use this if you want to handle decoding the model output yourself.
Convert a token to bytes using the underlying llama.cpp llama_token_to_piece function. This is mostly
a thin wrapper around llama_token_to_piece that handles Rust <-> C type conversions while
letting the caller handle errors. For a safer interface that returns Rust strings directly, use token_to_piece instead!
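A hedged sketch of handling the raw bytes and the buffer-size error yourself; the buffer size of 32 is an arbitrary illustrative value and model is a loaded LlamaModel:
let token = model.token_bos();
match model.token_to_piece_bytes(token, 32, true, None) {
    Ok(bytes) => println!("token bytes: {bytes:?}"),
    // For example, the piece may be larger than buffer_size.
    Err(e) => eprintln!("decoding failed: {e}"),
}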
§Errors
- if the token type is unknown
- the resultant token is larger than
buffer_size.
§Panics
pub fn token_to_str_with_size(
    &self,
    token: LlamaToken,
    buffer_size: usize,
    special: Special,
) -> Result<String, TokenToStringError>
Deprecated since 0.1.0: Use token_to_piece instead
Convert a token to a string with a specified buffer size.
Generally you should use LlamaModel::token_to_str as it is able to decode tokens of
any length.
§Errors
- if the token type is unknown
- the resultant token is larger than buffer_size.
- the string returned by llama-cpp is not valid utf8.
§Panics
pub fn token_to_bytes_with_size(
    &self,
    token: LlamaToken,
    buffer_size: usize,
    special: Special,
    lstrip: Option<NonZeroU16>,
) -> Result<Vec<u8>, TokenToStringError>
Deprecated since 0.1.0: Use token_to_piece_bytes instead
Convert a token to bytes with a specified buffer size.
Generally you should use LlamaModel::token_to_bytes as it is able to handle tokens of
any length.
§Errors
- if the token type is unknown
- the resultant token is larger than
buffer_size.
§Panics
pub fn n_vocab(&self) -> i32
The number of tokens in the model's vocabulary.
This is returned as an i32 (the C int type on most platforms) for maximum compatibility with llama.cpp.
pub fn vocab_type(&self) -> VocabType
The type of vocab the model was trained on.
§Panics
If llama-cpp emits a vocab type that is not known to this library.
pub fn n_embd(&self) -> c_int
Get the embedding size (the dimension of the model's embedding vectors).
This returns a c_int for maximum compatibility. Most of the time it can be cast to an i32
without issue.
pub fn is_recurrent(&self) -> bool
Returns whether the model is a recurrent network (Mamba, RWKV, etc.).
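A small sketch that uses the getters above to print basic model information; model is a loaded LlamaModel and Debug formatting of VocabType is assumed:
println!("trained context length: {}", model.n_ctx_train());
println!("vocab size:             {}", model.n_vocab());
println!("embedding size:         {}", model.n_embd());
println!("vocab type:             {:?}", model.vocab_type());
println!("recurrent:              {}", model.is_recurrent());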
pub fn meta_val_str(&self, key: &str) -> Result<String, MetaValError>
Get a metadata value as a string by key name.
pub fn meta_count(&self) -> i32
Get the number of metadata key/value pairs.
pub fn meta_key_by_index(&self, index: i32) -> Result<String, MetaValError>
Get a metadata key name by index.
pub fn meta_val_str_by_index(&self, index: i32) -> Result<String, MetaValError>
Get a metadata value as a string by index.
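A minimal sketch walking the metadata with the methods above; model is a loaded LlamaModel, and the "general.architecture" key is only an example whose presence depends on the GGUF file:
for i in 0..model.meta_count() {
    let key = model.meta_key_by_index(i)?;
    let val = model.meta_val_str_by_index(i)?;
    println!("{key} = {val}");
}
// Direct lookup by key name; whether the key exists depends on the model file.
let arch = model.meta_val_str("general.architecture")?;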
pub fn chat_template(
    &self,
    name: Option<&str>,
) -> Result<LlamaChatTemplate, ChatTemplateError>
Get chat template from model by name. If the name parameter is None, the default chat template will be returned.
You pass this to Self::apply_chat_template to get back a string with the appropriate template
substitution applied, converting a list of messages into a prompt the LLM can use to complete
the chat.
You could also use an external jinja parser, like minijinja, to parse jinja templates not supported by the llama.cpp template engine.
§Errors
- If the model has no chat template by that name
- If the chat template is not a valid
CString.
pub fn load_from_file(
    _: &LlamaBackend,
    path: impl AsRef<Path>,
    params: &LlamaModelParams,
) -> Result<Self, LlamaModelLoadError>
Load a model from the file at the given path using the supplied parameters.
pub fn lora_adapter_init(
    &self,
    path: impl AsRef<Path>,
) -> Result<LlamaLoraAdapter, LlamaLoraAdapterInitError>
Initialize a LoRA adapter from the file at the given path for use with this model.
pub fn new_context<'a>(
    &'a self,
    _: &LlamaBackend,
    params: LlamaContextParams,
) -> Result<LlamaContext<'a>, LlamaContextLoadError>
Create a new context from this model.
§Errors
There are many ways this can fail. See LlamaContextLoadError for more information.
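A minimal sketch, assuming LlamaContextParams implements Default and lives at llama_cpp_2::context::params (verify both against the crate); backend and model come from load_from_file as in the earlier example:
use llama_cpp_2::context::params::LlamaContextParams; // assumed module path
let params = LlamaContextParams::default();           // assumed Default impl
let ctx = model.new_context(&backend, params)?;
// `ctx` borrows `model` ('a), so the model must outlive the context.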
pub fn apply_chat_template(
    &self,
    tmpl: &LlamaChatTemplate,
    chat: &[LlamaChatMessage],
    add_ass: bool,
) -> Result<String, ApplyChatTemplateError>
Apply the model's chat template to some messages. See https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
Unlike the llama.cpp apply_chat_template, which just uses the ChatML template when given
a null pointer for the template, this requires an explicit template to be specified. If you want to
use "chatml", just do LlamaChatTemplate::new("chatml") or pass any other model name or template
string.
Use Self::chat_template to retrieve the template baked into the model (this is the preferred
mechanism as using the wrong chat template can result in really unexpected responses from the LLM).
You probably want to set add_ass to true so that the generated template string ends with the
opening tag of the assistant turn. If you do not leave such a hanging chat tag, the model will likely
generate one into the output, and the output may contain other unexpected content as well.
§Errors
There are many ways this can fail. See ApplyChatTemplateError for more information.
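A hedged sketch tying Self::chat_template and this method together. The LlamaChatMessage::new(role, content) constructor and its import path are assumptions to verify against the crate; add_ass = true leaves the assistant turn open as recommended above.
use llama_cpp_2::model::LlamaChatMessage; // assumed path and constructor
let tmpl = model.chat_template(None)?; // default template baked into the model
let chat = vec![
    LlamaChatMessage::new("system".into(), "You are a helpful assistant.".into())?,
    LlamaChatMessage::new("user".into(), "Hello!".into())?,
];
let prompt = model.apply_chat_template(&tmpl, &chat, true)?; // add_ass = true
let tokens = model.str_to_token(&prompt, llama_cpp_2::model::AddBos::Always)?;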
pub fn apply_chat_template_with_tools_oaicompat(
    &self,
    tmpl: &LlamaChatTemplate,
    messages: &[LlamaChatMessage],
    tools_json: Option<&str>,
    json_schema: Option<&str>,
    add_generation_prompt: bool,
) -> Result<ChatTemplateResult, ApplyChatTemplateError>
Apply the model's chat template to some messages and return an optional tool grammar.
tools_json should be an OpenAI-compatible tool definition JSON array string.
json_schema should be a JSON schema string.
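A hedged sketch: tmpl comes from Self::chat_template, messages is assumed to be a pre-built &[LlamaChatMessage], and the tool definition below is an illustrative OpenAI-style tools array:
let tools_json = r#"[{"type":"function","function":{"name":"get_weather","description":"Get the weather for a city","parameters":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}}}]"#;
let result = model.apply_chat_template_with_tools_oaicompat(
    &tmpl,
    messages,
    Some(tools_json), // OpenAI-compatible tool definition JSON array string
    None,             // no additional JSON schema constraint
    true,             // add the generation prompt (assistant turn opener)
)?;
// Per the docs, `result` is a ChatTemplateResult that may carry an optional tool grammar.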
pub fn apply_chat_template_oaicompat(
    &self,
    tmpl: &LlamaChatTemplate,
    params: &OpenAIChatTemplateParams<'_>,
) -> Result<ChatTemplateResult, ApplyChatTemplateError>
Apply the model chat template using OpenAI-compatible JSON messages.