Struct LlamaModelParams

pub struct LlamaModelParams {
    pub params: llama_model_params,
    /* private fields */
}

A safe wrapper around llama_model_params.

Fields§

§params: llama_model_params

The underlying llama_model_params from the C API.

Implementations§

impl LlamaModelParams

pub fn add_cpu_moe_override(self: Pin<&mut Self>) -> Result<(), ModelParamsError>

Adds buffer type overrides to move all mixture-of-experts layers to CPU.

§Errors

Returns ModelParamsError if the internal override vector has no available slot, the slot is not empty, or the key contains invalid characters.

pub fn add_cpu_buft_override(self: Pin<&mut Self>, key: &CStr) -> Result<(), ModelParamsError>

Appends a buffer type override to the model parameters that moves layers matching the pattern key to the CPU. self must be pinned because this creates a self-referential struct.

§Errors

Returns ModelParamsError if the internal override vector has no available slot, the slot is not empty, or the key contains invalid characters.
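
A minimal sketch of why the receiver is Pin<&mut Self>, assuming the wrapper keeps its override slots inline and stores a pointer to them in the C struct. Params, OverrideError, CAPACITY, add_override, and table_points_into_self are hypothetical stand-ins, not the crate's types; the real slot layout and the ModelParamsError variants are not shown on this page, and treating "invalid characters" as "not valid UTF-8" is also an assumption.

```rust
use std::array;
use std::ffi::{CStr, CString};
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr;

/// Hypothetical capacity; the real limit is not documented on this page.
const CAPACITY: usize = 4;

/// Hypothetical error type standing in for `ModelParamsError`.
#[derive(Debug, PartialEq, Eq)]
enum OverrideError {
    NoFreeSlot,
    InvalidKey,
}

/// Stand-in for the wrapper: `table` plays the role of the pointer that the
/// C struct would hold, and it points at `slots` inside the same value.
struct Params {
    slots: [Option<CString>; CAPACITY],
    table: *const Option<CString>,
    _pinned: PhantomPinned,
}

impl Params {
    fn new() -> Self {
        Self {
            slots: array::from_fn(|_| None),
            table: ptr::null(),
            _pinned: PhantomPinned,
        }
    }

    fn add_override(self: Pin<&mut Self>, key: &CStr) -> Result<(), OverrideError> {
        // Assumption: a key that is not valid UTF-8 counts as invalid.
        if key.to_str().is_err() {
            return Err(OverrideError::InvalidKey);
        }
        // SAFETY: the value is never moved out of; only its fields are written.
        let this = unsafe { self.get_unchecked_mut() };
        let slot = this
            .slots
            .iter_mut()
            .find(|slot| slot.is_none())
            .ok_or(OverrideError::NoFreeSlot)?;
        *slot = Some(key.to_owned());
        // The struct now refers to its own storage, so it must not move again.
        this.table = this.slots.as_ptr();
        Ok(())
    }

    fn table_points_into_self(&self) -> bool {
        ptr::eq(self.table, self.slots.as_ptr())
    }
}
```

Pinning the value before the first call, for example with Box::pin(Params::new()), keeps the address stored in table valid for as long as the value lives.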

impl LlamaModelParams

pub const fn n_gpu_layers(&self) -> i32

Get the number of layers to offload to the GPU.

pub const fn main_gpu(&self) -> i32

The GPU that is used for scratch and small tensors.

pub const fn vocab_only(&self) -> bool

Whether to load only the vocabulary, without weights.

pub const fn use_mmap(&self) -> bool

Whether to use mmap if possible.

pub const fn use_mlock(&self) -> bool

Whether to force the system to keep the model in RAM.

pub fn split_mode(&self) -> Result<LlamaSplitMode, LlamaSplitModeParseError>

Gets the split mode.

§Errors

Returns LlamaSplitModeParseError if an unknown split mode is encountered.
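
The getter is fallible presumably because the underlying C value is a plain integer that may not name a known mode. A sketch of such a conversion, using the stand-in types SplitMode and SplitModeParseError (hypothetical, not the crate's definitions); the discriminants 0, 1, and 2 are assumed to match LLAMA_SPLIT_MODE_NONE, LLAMA_SPLIT_MODE_LAYER, and LLAMA_SPLIT_MODE_ROW in llama.h.

```rust
use std::convert::TryFrom;

/// Stand-in for `LlamaSplitMode`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SplitMode {
    None,
    Layer,
    Row,
}

/// Stand-in for `LlamaSplitModeParseError`; carries the unrecognized raw value.
#[derive(Debug, PartialEq, Eq)]
struct SplitModeParseError(i32);

impl TryFrom<i32> for SplitMode {
    type Error = SplitModeParseError;

    fn try_from(raw: i32) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(SplitMode::None),
            1 => Ok(SplitMode::Layer),
            2 => Ok(SplitMode::Row),
            // Any other value is reported instead of being silently mapped.
            other => Err(SplitModeParseError(other)),
        }
    }
}
```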

pub fn devices(&self) -> Vec<usize>

Gets the devices.

pub fn with_n_gpu_layers(self, n_gpu_layers: u32) -> Self

Sets the number of layers to offload to the GPU.

let params = LlamaModelParams::default();
let params = params.with_n_gpu_layers(1);
assert_eq!(params.n_gpu_layers(), 1);
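
The setter takes a u32 while the getter returns an i32, so a conversion has to happen somewhere. One plausible approach is to saturate at i32::MAX; this is an assumption about the behavior, not something this page states, and to_c_layer_count is a hypothetical helper.

```rust
use std::convert::TryFrom;

/// Converts a requested layer count to the signed type used by the C struct.
/// Assumption: values above `i32::MAX` saturate rather than wrap or panic.
fn to_c_layer_count(n_gpu_layers: u32) -> i32 {
    i32::try_from(n_gpu_layers).unwrap_or(i32::MAX)
}
```

Because the parameter is unsigned, the default value of -1 cannot be passed back in through this setter.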

pub const fn with_main_gpu(self, main_gpu: i32) -> Self

Sets the main GPU.

For this option to take effect, set split_mode to LlamaSplitMode::None, which enables single-GPU mode.

pub const fn with_vocab_only(self, vocab_only: bool) -> Self

Sets vocab_only.

pub const fn with_use_mmap(self, use_mmap: bool) -> Self

Sets use_mmap.

§Examples

let params = LlamaModelParams::default().with_use_mmap(false);
assert!(!params.use_mmap());

pub const fn no_alloc(&self) -> bool

Gets no_alloc.

pub const fn with_no_alloc(self, no_alloc: bool) -> Self

Sets no_alloc. When enabled, tensor data is not allocated. This is incompatible with use_mmap, so enabling it also disables mmap.

§Examples

let params = LlamaModelParams::default().with_no_alloc(true);
assert!(params.no_alloc());
assert!(!params.use_mmap());
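
The coupling between the two flags can be sketched with a stand-in builder. Params here is a hypothetical two-field struct, not the crate's type, and its initial values mirror the documented defaults only for use_mmap; no_alloc starting as false is an assumption.

```rust
/// Stand-in for the two flags involved; not the crate's `LlamaModelParams`.
#[derive(Debug, Clone, Copy)]
struct Params {
    use_mmap: bool,
    no_alloc: bool,
}

impl Params {
    fn new() -> Self {
        // use_mmap defaults to true per the docs; no_alloc = false is assumed.
        Self { use_mmap: true, no_alloc: false }
    }

    /// Consuming setter in the same builder style as the `with_*` methods.
    fn with_no_alloc(mut self, no_alloc: bool) -> Self {
        self.no_alloc = no_alloc;
        if no_alloc {
            // The two options are incompatible, so mmap is switched off.
            self.use_mmap = false;
        }
        self
    }
}
```

Passing false leaves use_mmap untouched in this sketch.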

pub const fn with_use_mlock(self, use_mlock: bool) -> Self

Sets use_mlock.

pub fn with_split_mode(self, split_mode: LlamaSplitMode) -> Self

Sets split_mode.

pub fn with_devices(self, devices: &[usize]) -> Result<Self, LlamaCppError>

Sets devices.

The devices are specified as indices that correspond to the ggml backend device indices.

The maximum number of devices is 16.

You don’t need to specify CPU or ACCEL devices.

§Errors

Returns LlamaCppError::BackendDeviceNotFound if any device index is invalid.
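
A caller-side pre-check along these lines can catch bad input before the call. check_devices and DeviceListError are hypothetical helpers, and rejecting lists longer than 16 is an assumption; this page does not say how the method itself treats an over-long list.

```rust
/// Maximum number of devices, as stated in the documentation above.
const MAX_DEVICES: usize = 16;

/// Hypothetical error type for the pre-check; not part of the crate.
#[derive(Debug, PartialEq, Eq)]
enum DeviceListError {
    TooMany(usize),
    NotFound(usize),
}

/// Checks a device list against the number of backend devices that exist.
/// `backend_device_count` is supplied by the caller in this sketch.
fn check_devices(devices: &[usize], backend_device_count: usize) -> Result<(), DeviceListError> {
    if devices.len() > MAX_DEVICES {
        return Err(DeviceListError::TooMany(devices.len()));
    }
    // Report the first index that does not name an existing backend device.
    match devices.iter().copied().find(|&index| index >= backend_device_count) {
        Some(bad) => Err(DeviceListError::NotFound(bad)),
        None => Ok(()),
    }
}
```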

impl LlamaModelParams

pub fn fit_params(self: Pin<&mut Self>, model_path: &CStr, context_params: &mut LlamaContextParams, margins: &mut [usize], n_ctx_min: u32, log_level: ggml_log_level) -> Result<FitResult, FitError>

Automatically fit model and context parameters to available device memory.

Wraps llama.cpp’s common_fit_params. Given a model path, available per-device memory margins, and a minimum context size, it fills in n_gpu_layers, tensor_split, and tensor_buft_overrides to fit the model to the available VRAM, and may reduce cparams.n_ctx if needed. On success the model and context params are updated in place.

§Requirements

Per the C API docstring, only parameters that still hold their default value are modified. In practice this means:

n_gpu_layers must be at its default (-1). Do not call with_n_gpu_layers before this.
No tensor_buft_overrides may be set. Do not call add_cpu_buft_override or add_cpu_moe_override before this.
cparams.n_ctx is only auto-selected if it is 0; otherwise it is left alone.

§Arguments

model_path — path to the GGUF model file as a C string.
context_params — context parameters; n_ctx may be modified (see above).
margins — memory margin per device in bytes. Must have at least crate::max_devices() elements.
n_ctx_min — minimum context size to preserve when reducing memory usage.
log_level — minimum log level for fitting output; lower levels go to the debug log.

§Thread safety

This function is not thread safe: the underlying C call mutates the global llama logger state.

§Errors

Returns one of the FitError variants matching the vendored wrapper’s status code.
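
The requirements above can be turned into a caller-side checklist. The sketch below is hypothetical: Preflight, PreflightError, preflight, and uniform_margins are illustrative names, not part of the crate, and the caller is assumed to know the value of crate::max_devices() and whether any buffer type overrides were added.

```rust
/// Default `n_gpu_layers`, as stated in the requirements above.
const DEFAULT_N_GPU_LAYERS: i32 = -1;

/// Hypothetical snapshot of the state that matters before fitting.
struct Preflight {
    n_gpu_layers: i32,
    has_buft_overrides: bool,
    margins_len: usize,
    max_devices: usize,
}

/// Hypothetical error type for the checklist.
#[derive(Debug, PartialEq, Eq)]
enum PreflightError {
    GpuLayersNotDefault(i32),
    OverridesAlreadySet,
    MarginsTooShort { have: usize, need: usize },
}

/// Verifies the documented preconditions in the order they are listed.
fn preflight(state: &Preflight) -> Result<(), PreflightError> {
    if state.n_gpu_layers != DEFAULT_N_GPU_LAYERS {
        return Err(PreflightError::GpuLayersNotDefault(state.n_gpu_layers));
    }
    if state.has_buft_overrides {
        return Err(PreflightError::OverridesAlreadySet);
    }
    if state.margins_len < state.max_devices {
        return Err(PreflightError::MarginsTooShort {
            have: state.margins_len,
            need: state.max_devices,
        });
    }
    Ok(())
}

/// Builds a margins buffer that reserves the same number of bytes per device.
fn uniform_margins(margin_bytes: usize, max_devices: usize) -> Vec<usize> {
    vec![margin_bytes; max_devices]
}
```

A buffer built with uniform_margins can be passed as the margins argument once the checklist passes.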

Trait Implementations§

impl Debug for LlamaModelParams

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for LlamaModelParams

Default parameters for LlamaModel, as defined in llama.cpp by llama_model_default_params.

use llama_cpp_bindings::model::split_mode::LlamaSplitMode;
let params = LlamaModelParams::default();
assert_eq!(params.n_gpu_layers(), -1, "n_gpu_layers should be -1");
assert_eq!(params.main_gpu(), 0, "main_gpu should be 0");
assert_eq!(params.vocab_only(), false, "vocab_only should be false");
assert_eq!(params.use_mmap(), true, "use_mmap should be true");
assert_eq!(params.use_mlock(), false, "use_mlock should be false");
assert_eq!(params.split_mode(), Ok(LlamaSplitMode::Layer), "split_mode should be LAYER");
assert_eq!(params.devices().len(), 0, "devices should be empty");

fn default() -> Self

Returns the “default value” for a type.

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl LlamaModelParams

pub const fn kv_overrides(&self) -> KvOverrides<'_>

pub fn append_kv_override(self: Pin<&mut Self>, key: &CStr, value: ParamOverrideValue) -> Result<(), ModelParamsError>

Auto Trait Implementations§

impl Freeze for LlamaModelParams

impl RefUnwindSafe for LlamaModelParams

impl !Send for LlamaModelParams

impl !Sync for LlamaModelParams

impl Unpin for LlamaModelParams

impl UnsafeUnpin for LlamaModelParams

impl UnwindSafe for LlamaModelParams
