pub trait LLMLocalTrait {
    // Required method
    fn config(&mut self) -> &mut LocalLLMConfig;

    // Provided methods
    fn error_on_config_issue(self, error_on_config_issue: bool) -> Self
       where Self: Sized { ... }
    fn use_gpu(self, use_gpu: bool) -> Self
       where Self: Sized { ... }
    fn cpu_only(self) -> Self
       where Self: Sized { ... }
    fn threads(self, threads: i16) -> Self
       where Self: Sized { ... }
    fn threads_batch(self, threads_batch: i16) -> Self
       where Self: Sized { ... }
    fn batch_size(self, batch_size: u64) -> Self
       where Self: Sized { ... }
    fn inference_ctx_size(self, inference_ctx_size: u64) -> Self
       where Self: Sized { ... }
    fn use_ram_gb(self, available_ram_gb: f32) -> Self
       where Self: Sized { ... }
    fn use_ram_percentage(self, use_ram_percentage: f32) -> Self
       where Self: Sized { ... }
    fn cuda_config(self, cuda_config: CudaConfig) -> Self
       where Self: Sized { ... }
}

Required Methods§
fn config(&mut self) -> &mut LocalLLMConfig
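Only config is required; implementing it unlocks all of the provided builder methods below. A minimal sketch, where MyLoader is a hypothetical implementor that already holds a LocalLLMConfig:

struct MyLoader {
    config: LocalLLMConfig,
}

impl LLMLocalTrait for MyLoader {
    // The required method gives the provided builder methods
    // mutable access to the underlying configuration.
    fn config(&mut self) -> &mut LocalLLMConfig {
        &mut self.config
    }
}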
Provided Methods§
fn error_on_config_issue(self, error_on_config_issue: bool) -> Self
where
    Self: Sized,
If enabled, any issues with the configuration will result in an error. Otherwise, fallbacks will be used. Useful if you have a specific configuration in mind and want to ensure it is used.
§Arguments
error_on_config_issue - A boolean indicating whether to error on configuration issues.
§Default
Defaults to false.
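For example, to fail fast rather than silently fall back (a sketch; loader is any value implementing LLMLocalTrait, such as the hypothetical MyLoader above):

// Error out on any configuration problem instead of substituting a fallback.
let loader = loader.error_on_config_issue(true);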
fn use_gpu(self, use_gpu: bool) -> Self
where
    Self: Sized,
Enables or disables GPU usage for inference.
§Arguments
use_gpu - A boolean indicating whether to use the GPU (true) or not (false).
§Notes
On macOS, this setting affects Metal usage. On other platforms, it typically affects CUDA usage.
§Default
Defaults to true. If set to false, CPU inference will be used.
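For example (illustrative; each builder method consumes self and returns it, so calls chain):

// Force CPU inference even if a GPU is available.
let loader = loader.use_gpu(false);

The trait listing above also shows a cpu_only convenience method, which by its name appears to accomplish the same thing.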
fn threads_batch(self, threads_batch: i16) -> Self
where
    Self: Sized,
fn batch_size(self, batch_size: u64) -> Self
where
    Self: Sized,
fn inference_ctx_size(self, inference_ctx_size: u64) -> Self
where
    Self: Sized,
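None of the three methods above carries a description here; their parameter names and types come from the signatures. A hedged chaining sketch (threads is taken from the trait listing at the top; the comments give the conventional llama.cpp-style meaning of each knob, which is an assumption, and the values are illustrative, not recommendations):

let loader = loader
    .threads(8)                // i16: threads used for single-token inference (assumed)
    .threads_batch(8)          // i16: threads used for batch/prompt processing (assumed)
    .batch_size(512)           // u64: logical batch size for prompt processing (assumed)
    .inference_ctx_size(4096); // u64: context window size, likely in tokens (assumed)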
fn use_ram_gb(self, available_ram_gb: f32) -> Self
where
    Self: Sized,
Sets the amount of RAM to use for inference.
§Arguments
available_ram_gb - The amount of RAM to use, in gigabytes.
§Effects
- On macOS: Affects all inference operations.
- On Windows and Linux: Affects CPU inference only.
§Default Behavior
If this method is not called, the amount of RAM used will default to a percentage
of the total system RAM. See use_ram_percentage for details on setting this percentage.
§Notes
The input value is converted to bytes internally. Precision may be affected for very large values due to floating-point to integer conversion.
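For example (a sketch; per the Effects note above, on Windows and Linux this caps CPU inference only):

// Allow inference to use up to 16 GB of RAM; converted to bytes internally.
let loader = loader.use_ram_gb(16.0);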
fn use_ram_percentage(self, use_ram_percentage: f32) -> Self
where
    Self: Sized,
Sets the percentage of total system RAM to use for inference.
§Arguments
use_ram_percentage - The percentage of total system RAM to use, expressed as a float between 0.0 and 1.0.
§Effects
- On macOS: Affects all inference operations.
- On Windows and Linux: Affects CPU inference only.
§Default Behavior
If neither this method nor use_ram_gb is called, the default is 70% (0.7) of
total system RAM on Windows and Linux, or 90% (0.9) on macOS.
§Precedence
This setting is only used if use_ram_gb has not been called. If use_ram_gb has been
set, that value takes precedence over the percentage set here.
§Notes
It’s recommended to set this value conservatively to avoid potential system instability or performance issues caused by memory pressure.
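For example (a sketch; per the Precedence note, this only takes effect if use_ram_gb was never called):

// Use at most half of total system RAM for inference.
let loader = loader.use_ram_percentage(0.5);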