Trait LLMLocalTrait

pub trait LLMLocalTrait {
    // Required method
    fn config(&mut self) -> &mut LocalLLMConfig;

    // Provided methods
    fn error_on_config_issue(self, error_on_config_issue: bool) -> Self
       where Self: Sized { ... }
    fn use_gpu(self, use_gpu: bool) -> Self
       where Self: Sized { ... }
    fn cpu_only(self) -> Self
       where Self: Sized { ... }
    fn threads(self, threads: i16) -> Self
       where Self: Sized { ... }
    fn threads_batch(self, threads_batch: i16) -> Self
       where Self: Sized { ... }
    fn batch_size(self, batch_size: u64) -> Self
       where Self: Sized { ... }
    fn inference_ctx_size(self, inference_ctx_size: u64) -> Self
       where Self: Sized { ... }
    fn use_ram_gb(self, available_ram_gb: f32) -> Self
       where Self: Sized { ... }
    fn use_ram_percentage(self, use_ram_percentage: f32) -> Self
       where Self: Sized { ... }
    fn cuda_config(self, cuda_config: CudaConfig) -> Self
       where Self: Sized { ... }
}

Required Methods§

fn config(&mut self) -> &mut LocalLLMConfig
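
A minimal implementation sketch, assuming a hypothetical builder type MyLocalBuilder that owns a LocalLLMConfig (the crate's concrete implementors may store their config differently). The provided builder methods below presumably use this accessor to adjust the config before returning self.

// Assumes LLMLocalTrait and LocalLLMConfig are imported from this crate.
struct MyLocalBuilder {
    config: LocalLLMConfig,
}

impl LLMLocalTrait for MyLocalBuilder {
    // Returns mutable access to the underlying config; the provided
    // methods build on this to set individual options.
    fn config(&mut self) -> &mut LocalLLMConfig {
        &mut self.config
    }
}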

Provided Methods§

fn error_on_config_issue(self, error_on_config_issue: bool) -> Self
where Self: Sized,

If enabled, any issues with the configuration will result in an error. Otherwise, fallbacks will be used. Useful if you have a specific configuration in mind and want to ensure it is used.

§Arguments
  • error_on_config_issue - A boolean indicating whether to error on configuration issues.
§Default

Defaults to false.
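
As a sketch, a hypothetical helper that enables strict configuration handling for any builder implementing this trait:

fn strict<B: LLMLocalTrait>(builder: B) -> B {
    // Any configuration problem now surfaces as an error instead of a fallback.
    builder.error_on_config_issue(true)
}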

fn use_gpu(self, use_gpu: bool) -> Self
where Self: Sized,

Enables or disables GPU usage for inference.

§Arguments
  • use_gpu - A boolean indicating whether to use GPU (true) or not (false).
§Notes

On macOS, this setting affects Metal usage. On other platforms, it typically affects CUDA usage.

§Default

Defaults to true. If set to false, CPU inference will be used.

fn cpu_only(self) -> Self
where Self: Sized,

Disables GPU usage and forces CPU-only inference.

§Notes

This is equivalent to calling use_gpu(false).

§Default

CPU-only mode is disabled by default; GPU usage defaults to true unless this method or use_gpu(false) is called.
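
A sketch combining the two GPU toggles; gpu_available is a hypothetical flag supplied by the caller:

fn configure_device<B: LLMLocalTrait>(builder: B, gpu_available: bool) -> B {
    if gpu_available {
        builder.use_gpu(true) // Metal on macOS, typically CUDA elsewhere
    } else {
        builder.cpu_only()    // equivalent to use_gpu(false)
    }
}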

fn threads(self, threads: i16) -> Self
where Self: Sized,

Sets the number of CPU threads to use for inference.

§Arguments
  • threads - The number of CPU threads to use.
§Notes

If loading purely in VRAM, this defaults to 1.

fn threads_batch(self, threads_batch: i16) -> Self
where Self: Sized,

Sets the number of CPU threads to use for batching and prompt processing.

§Arguments
  • threads_batch - The number of CPU threads to use for batching and prompt processing.
§Default

If not set, defaults to a percentage of the total system threads.
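
A sketch setting both thread counts together; the values shown are illustrative, not recommendations:

fn configure_threads<B: LLMLocalTrait>(builder: B) -> B {
    builder
        .threads(8)       // CPU threads for token generation
        .threads_batch(8) // CPU threads for batching and prompt processing
}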

fn batch_size(self, batch_size: u64) -> Self
where Self: Sized,

Sets the batch size for inference.

§Arguments
  • batch_size - The batch size to use.
§Default

If not set, defaults to 512.

fn inference_ctx_size(self, inference_ctx_size: u64) -> Self
where Self: Sized,

Sets the inference context size (maximum token limit for inference output).

§Arguments
  • inference_ctx_size - The maximum number of tokens the model can generate as output.
§Notes

This value is set when the model is loaded and cannot be changed afterwards. If not set, a default value is used.
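
A sketch setting both size options; the numbers are illustrative assumptions, not defaults verified beyond those documented above:

fn configure_sizes<B: LLMLocalTrait>(builder: B) -> B {
    builder
        .batch_size(512)          // matches the documented default
        .inference_ctx_size(4096) // fixed once the model is loaded
}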

fn use_ram_gb(self, available_ram_gb: f32) -> Self
where Self: Sized,

Sets the amount of RAM to use for inference.

§Arguments
  • available_ram_gb - The amount of RAM to use, in gigabytes.
§Effects
  • On macOS: Affects all inference operations.
  • On Windows and Linux: Affects CPU inference only.
§Default Behavior

If this method is not called, the amount of RAM used will default to a percentage of the total system RAM. See use_ram_percentage for details on setting this percentage.

§Notes

The input value is converted to bytes internally. Precision may be affected for very large values due to floating-point to integer conversion.

fn use_ram_percentage(self, use_ram_percentage: f32) -> Self
where Self: Sized,

Sets the percentage of total system RAM to use for inference.

§Arguments
  • use_ram_percentage - The percentage of total system RAM to use, expressed as a float between 0.0 and 1.0.
§Effects
  • On macOS: Affects all inference operations.
  • On Windows and Linux: Affects CPU inference only.
§Default Behavior

If neither this method nor use_ram_gb is called, the default is 70% (0.7) of available system RAM on Windows and Linux, or 90% (0.9) on macOS.

§Precedence

This setting is only used if use_ram_gb has not been called. If use_ram_gb has been set, that value takes precedence over the percentage set here.

§Notes

It’s recommended to set this value conservatively to avoid potential system instability or performance issues caused by memory pressure.
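
A sketch of the documented precedence between the two RAM settings; ram_gb is a hypothetical, caller-supplied value:

fn configure_ram<B: LLMLocalTrait>(builder: B, ram_gb: Option<f32>) -> B {
    match ram_gb {
        // An explicit amount takes precedence over any percentage.
        Some(gb) => builder.use_ram_gb(gb),
        // Otherwise use a conservative fraction of total system RAM.
        None => builder.use_ram_percentage(0.7),
    }
}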

fn cuda_config(self, cuda_config: CudaConfig) -> Self
where Self: Sized,

Sets the CUDA configuration for GPU inference.

§Arguments
  • cuda_config - The CUDA configuration to use.
§Notes

This method is only available on non-macOS platforms. If not set, CUDA devices will be automatically detected.
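
A sketch gated to non-macOS targets; the CudaConfig is taken as a parameter because its construction is crate-specific (see the CudaConfig docs for available options):

#[cfg(not(target_os = "macos"))]
fn configure_cuda<B: LLMLocalTrait>(builder: B, cuda: CudaConfig) -> B {
    // Applies an explicit CUDA setup instead of automatic device detection.
    builder.cuda_config(cuda)
}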

Implementors§