Crate llm_base

This crate provides a unified interface for loading and using Large Language Models (LLMs).

This is the base crate that implementors can use to implement their own LLMs.

As a user, you probably want to use the llm crate instead.

Re-exports

InferenceParameters: The parameters for text generation.
InferenceRequest: Settings specific to InferenceSession::infer.
InferenceSession: An inference session represents the state of the text generation. This holds the full context window, as well as several additional parameters used during sampling.
InferenceSessionConfig: Configuration for an inference session.
InferenceSnapshot: A serializable snapshot of the inference process. Can be restored by calling InferenceSession::from_snapshot.
InferenceStats: Statistics about the inference process.
Loader: A GGML format loader for LLMs.
Mmap: A handle to an immutable memory mapped buffer.
TokenBias: A list of tokens to bias during inference.
Vocabulary: The vocabulary used by a model.

ContainerType: The format of the file containing the model.
ElementType: The type of a value in ggml.
FileType: How the tensors are stored in GGML LLM models.
InferenceError: Errors encountered during the inference process.
LoadError: Errors encountered during the loading process.
LoadProgress: Each variant represents a step within the process of loading the model. These can be used to report progress to the user.
ModelKVMemoryType: Allowed types for the model memory K/V tensors.
QuantizeError: Errors encountered during the quantization process.
QuantizeProgress: Progress of quantization.
SnapshotError: Errors encountered during the snapshot process.

load: Loads a GGML model from the given path and configures it according to params. Loading progress is reported through load_progress_callback.
load_progress_callback_stdout: An implementation of load_progress_callback that outputs to stdout.
quantize: Quantizes a model.
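
The progress-callback pattern described for load and LoadProgress can be illustrated with a small standalone sketch. Everything in it is hypothetical: the Progress enum, its variants, fake_load, and progress_stdout are stand-ins invented for illustration and are not the crate's actual types or signatures. It only shows the general shape, in which a loader reports each step to a caller-supplied closure and one ready-made callback prints to stdout.

```rust
/// Hypothetical stand-in for a progress enum; the variants are illustrative
/// only and do not mirror the crate's real LoadProgress definition.
#[derive(Debug, Clone, PartialEq)]
enum Progress {
    HeaderRead,
    TensorLoaded { current: usize, total: usize },
    Finished { tensor_count: usize },
}

/// Formats one progress event as a human-readable line.
fn describe(p: &Progress) -> String {
    match p {
        Progress::HeaderRead => "read header".to_string(),
        Progress::TensorLoaded { current, total } => {
            format!("loaded tensor {}/{}", current, total)
        }
        Progress::Finished { tensor_count } => {
            format!("done: {} tensors", tensor_count)
        }
    }
}

/// A ready-made callback that writes each event to stdout.
fn progress_stdout(p: Progress) {
    println!("{}", describe(&p));
}

/// Hypothetical loader: pretends to load `tensor_count` tensors and reports
/// every step through `on_progress`. No file is read.
fn fake_load(tensor_count: usize, mut on_progress: impl FnMut(Progress)) {
    on_progress(Progress::HeaderRead);
    for i in 1..=tensor_count {
        on_progress(Progress::TensorLoaded { current: i, total: tensor_count });
    }
    on_progress(Progress::Finished { tensor_count });
}

fn main() {
    // Report progress to stdout.
    fake_load(2, progress_stdout);

    // Or collect the events, e.g. to drive a progress bar.
    let mut events = Vec::new();
    fake_load(2, |p| events.push(p));
    println!("{} events", events.len());
}
```

Taking the callback as a closure lets the caller decide how progress is surfaced (stdout, a progress bar, a log) without the loader knowing about any of them.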