Trait LLMLocalTrait

pub trait LLMLocalTrait {
    // Required method
    fn config(&mut self) -> &mut LocalLLMConfig;

    // Provided methods
    fn error_on_config_issue(self, error_on_config_issue: bool) -> Self
       where Self: Sized { ... }
    fn use_gpu(self, use_gpu: bool) -> Self
       where Self: Sized { ... }
    fn cpu_only(self) -> Self
       where Self: Sized { ... }
    fn threads(self, threads: i16) -> Self
       where Self: Sized { ... }
    fn threads_batch(self, threads_batch: i16) -> Self
       where Self: Sized { ... }
    fn batch_size(self, batch_size: u64) -> Self
       where Self: Sized { ... }
    fn inference_ctx_size(self, inference_ctx_size: u64) -> Self
       where Self: Sized { ... }
    fn use_ram_gb(self, available_ram_gb: f32) -> Self
       where Self: Sized { ... }
    fn use_ram_percentage(self, use_ram_percentage: f32) -> Self
       where Self: Sized { ... }
    fn cuda_config(self, cuda_config: CudaConfig) -> Self
       where Self: Sized { ... }
}

Required Methods§

fn config(&mut self) -> &mut LocalLLMConfig
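
A minimal implementation sketch, assuming a hypothetical builder type MyLocalBuilder that owns a LocalLLMConfig (the crate's concrete implementors may store their config differently). The provided builder methods below presumably use this accessor to adjust the config before returning self.

// Assumes LLMLocalTrait and LocalLLMConfig are imported from this crate.
struct MyLocalBuilder {
    config: LocalLLMConfig,
}

impl LLMLocalTrait for MyLocalBuilder {
    // Returns mutable access to the underlying config; the provided
    // methods build on this to set individual options.
    fn config(&mut self) -> &mut LocalLLMConfig {
        &mut self.config
    }
}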

Provided Methods§

fn error_on_config_issue(self, error_on_config_issue: bool) -> Self
where Self: Sized,

If enabled, any issues with the configuration will result in an error. Otherwise, fallbacks will be used. Useful if you have a specific configuration in mind and want to ensure it is used.

§Arguments
  • error_on_config_issue - A boolean indicating whether to error on configuration issues.
§Default

Defaults to false.
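
As a sketch, a hypothetical helper that enables strict configuration handling for any builder implementing this trait:

fn strict<B: LLMLocalTrait>(builder: B) -> B {
    // Any configuration problem now surfaces as an error instead of a fallback.
    builder.error_on_config_issue(true)
}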

fn use_gpu(self, use_gpu: bool) -> Self
where Self: Sized,

Enables or disables GPU usage for inference.

§Arguments
  • use_gpu - A boolean indicating whether to use GPU (true) or not (false).
§Notes

On macOS, this setting affects Metal usage. On other platforms, it typically affects CUDA usage.

§Default

Defaults to true. If set to false, CPU inference will be used.

fn cpu_only(self) -> Self
where Self: Sized,

Disables GPU usage and forces CPU-only inference.

§Notes

This is equivalent to calling use_gpu(false).

§Default

CPU-only mode is disabled by default; GPU usage defaults to true unless this method or use_gpu(false) is called.
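
A sketch combining the two GPU toggles; gpu_available is a hypothetical flag supplied by the caller:

fn configure_device<B: LLMLocalTrait>(builder: B, gpu_available: bool) -> B {
    if gpu_available {
        builder.use_gpu(true) // Metal on macOS, typically CUDA elsewhere
    } else {
        builder.cpu_only()    // equivalent to use_gpu(false)
    }
}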

fn threads(self, threads: i16) -> Self
where Self: Sized,

Sets the number of CPU threads to use for inference.

§Arguments
  • threads - The number of CPU threads to use.
§Notes

If loading purely in VRAM, this defaults to 1.

fn threads_batch(self, threads_batch: i16) -> Self
where Self: Sized,

Sets the number of CPU threads to use for batching and prompt processing.

§Arguments
  • threads_batch - The number of CPU threads to use for batching and prompt processing.
§Default

If not set, defaults to a percentage of the total system threads.
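
A sketch setting both thread counts together; the values shown are illustrative, not recommendations:

fn configure_threads<B: LLMLocalTrait>(builder: B) -> B {
    builder
        .threads(8)       // CPU threads for token generation
        .threads_batch(8) // CPU threads for batching and prompt processing
}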

fn batch_size(self, batch_size: u64) -> Self
where Self: Sized,

Sets the batch size for inference.

§Arguments
  • batch_size - The batch size to use.
§Default

If not set, defaults to 512.

fn inference_ctx_size(self, inference_ctx_size: u64) -> Self
where Self: Sized,

Sets the inference context size (maximum token limit for inference output).

§Arguments
  • inference_ctx_size - The maximum number of tokens the model can generate as output.
§Notes

This value is set when the model is loaded and cannot be changed afterwards. If not set, a default value is used.
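
A sketch setting both size options; the numbers are illustrative assumptions, not defaults verified beyond those documented above:

fn configure_sizes<B: LLMLocalTrait>(builder: B) -> B {
    builder
        .batch_size(512)          // matches the documented default
        .inference_ctx_size(4096) // fixed once the model is loaded
}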

fn use_ram_gb(self, available_ram_gb: f32) -> Self
where Self: Sized,

Sets the amount of RAM to use for inference.

§Arguments
  • available_ram_gb - The amount of RAM to use, in gigabytes.
§Effects
  • On macOS: Affects all inference operations.
  • On Windows and Linux: Affects CPU inference only.
§Default Behavior

If this method is not called, the amount of RAM used will default to a percentage of the total system RAM. See use_ram_percentage for details on setting this percentage.

§Notes

The input value is converted to bytes internally. Precision may be affected for very large values due to floating-point to integer conversion.

fn use_ram_percentage(self, use_ram_percentage: f32) -> Self
where Self: Sized,

Sets the percentage of total system RAM to use for inference.

§Arguments
  • use_ram_percentage - The percentage of total system RAM to use, expressed as a float between 0.0 and 1.0.
§Effects
  • On macOS: Affects all inference operations.
  • On Windows and Linux: Affects CPU inference only.
§Default Behavior

If neither this method nor use_ram_gb is called, the default is 70% (0.7) of available system RAM on Windows and Linux, or 90% (0.9) on macOS.

§Precedence

This setting is only used if use_ram_gb has not been called. If use_ram_gb has been set, that value takes precedence over the percentage set here.

§Notes

It’s recommended to set this value conservatively to avoid potential system instability or performance issues caused by memory pressure.
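
A sketch of the documented precedence between the two RAM settings; ram_gb is a hypothetical, caller-supplied value:

fn configure_ram<B: LLMLocalTrait>(builder: B, ram_gb: Option<f32>) -> B {
    match ram_gb {
        // An explicit amount takes precedence over any percentage.
        Some(gb) => builder.use_ram_gb(gb),
        // Otherwise use a conservative fraction of total system RAM.
        None => builder.use_ram_percentage(0.7),
    }
}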

fn cuda_config(self, cuda_config: CudaConfig) -> Self
where Self: Sized,

Sets the CUDA configuration for GPU inference.

§Arguments
  • cuda_config - The CUDA configuration to use.
§Notes

This method is only available on non-macOS platforms. If not set, CUDA devices will be automatically detected.
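
A sketch gated to non-macOS targets; the CudaConfig is taken as a parameter because its construction is crate-specific (see the CudaConfig docs for available options):

#[cfg(not(target_os = "macos"))]
fn configure_cuda<B: LLMLocalTrait>(builder: B, cuda: CudaConfig) -> B {
    // Applies an explicit CUDA setup instead of automatic device detection.
    builder.cuda_config(cuda)
}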

Implementors§