Crate llm_base


This crate provides a unified interface for loading and using Large Language Models (LLMs).

This is the base crate that implementors can use to implement their own LLMs.

As a user, you probably want to use the llm crate instead.

Re-exports§

pub use model::Hyperparameters;
pub use model::KnownModel;
pub use model::Model;
pub use model::ModelParameters;
pub use model::OutputRequest;
pub use util::TokenUtf8Buffer;
pub use ggml;
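
The name TokenUtf8Buffer suggests a buffer that holds token bytes until they decode as UTF-8, since a single token can end in the middle of a multi-byte character. This page does not show its API, so the sketch below is an independent illustration of that idea, not the crate's implementation. `Utf8Accumulator` and its `push` method are hypothetical names introduced only for this example.

```rust
/// Hypothetical helper (NOT the crate's `TokenUtf8Buffer`): collects raw token
/// bytes and yields text only once the buffered bytes form valid UTF-8.
#[derive(Default)]
pub struct Utf8Accumulator {
    /// Bytes received so far that do not yet decode as UTF-8.
    pending: Vec<u8>,
}

impl Utf8Accumulator {
    /// Creates an empty accumulator.
    pub fn new() -> Self {
        Self::default()
    }

    /// Appends the bytes of one token. Returns the decoded text when the whole
    /// buffer is valid UTF-8; otherwise keeps the bytes and returns `None`.
    ///
    /// Simplification: bytes that can never become valid UTF-8 stay buffered
    /// forever. A production version would need a recovery strategy.
    pub fn push(&mut self, token: &[u8]) -> Option<String> {
        self.pending.extend_from_slice(token);
        let decoded = std::str::from_utf8(&self.pending).ok().map(str::to_owned);
        if decoded.is_some() {
            self.pending.clear();
        }
        decoded
    }
}
```

With this sketch, pushing the two bytes of "é" (0xC3, then 0xA9) one at a time returns `None` for the first push and the complete character for the second.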

Modules§

model
Large language model traits and types
util
Utilities for interacting with LLMs and loading them.

Structs§

InferenceParameters
The parameters for text generation.
InferenceRequest
Settings specific to InferenceSession::infer.
InferenceSession
An inference session represents the state of text generation. It holds the full context window, as well as several additional parameters used during sampling.
InferenceSessionConfig
Configuration for an inference session.
InferenceSnapshot
A serializable snapshot of the inference process. Can be restored by calling InferenceSession::from_snapshot.
InferenceStats
Statistics about the inference process.
Loader
A GGML format loader for LLMs.
Mmap
A handle to an immutable memory mapped buffer.
TokenBias
A list of tokens to bias during inference.
Vocabulary
The vocabulary used by a model.
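
To make the TokenBias entry above more concrete, here is a minimal sketch of how a list of per-token biases could be applied to a logit vector before sampling. Everything in it is an assumption: `SketchTokenId` and `apply_bias` are hypothetical names, and the choice to have a bias override the logit (rather than be added to it) is made for illustration only; this page does not say which behavior the crate uses.

```rust
/// Hypothetical stand-in for a token identifier; the crate has its own `TokenId` alias.
pub type SketchTokenId = u32;

/// Hypothetical helper: for every `(token, value)` pair, overwrite that token's
/// logit with `value`. Token ids outside the logit vector are ignored.
///
/// Assumption: the bias REPLACES the logit. An additive bias is an equally
/// plausible design and would use `*logit += value` instead.
pub fn apply_bias(logits: &mut [f32], bias: &[(SketchTokenId, f32)]) {
    for &(id, value) in bias {
        if let Some(logit) = logits.get_mut(id as usize) {
            *logit = value;
        }
    }
}
```

Assigning a very low value to a token makes it unlikely to be sampled, while a very high value makes it close to certain.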

Enums§

ContainerType
The format of the file containing the model.
ElementType
The type of a value in ggml.
FileType
How the tensors are stored in GGML LLM models.
InferenceError
Errors encountered during the inference process.
LoadError
Errors encountered during the loading process.
LoadProgress
Each variant represents a step within the process of loading the model. These can be used to report progress to the user.
ModelKVMemoryType
Allowed types for the model memory K/V tensors.
QuantizeError
Errors encountered during the quantization process.
QuantizeProgress
Progress of quantization.
SnapshotError
Errors encountered during the snapshot process.

Traits§

TensorLoader
Used by models to fetch tensors from a loader.

Functions§

load
Load a GGML model from the path and configure it per the params. The status of the loading process will be reported through load_progress_callback.
load_progress_callback_stdout
An implementation of load_progress_callback that writes to stdout.
quantize
Quantizes a model.
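
The load entry above describes progress being reported through a callback. The sketch below illustrates that general pattern in isolation. It does not call this crate: `SketchProgress`, `load_with_progress`, and `print_progress` are hypothetical names, and the event variants are invented for the example because the real LoadProgress variants are not listed on this page.

```rust
/// Hypothetical progress events (NOT the crate's `LoadProgress` variants).
#[derive(Debug, Clone, PartialEq)]
pub enum SketchProgress {
    Started,
    PartLoaded { current: usize, total: usize },
    Finished,
}

/// Hypothetical loader: pretends to load `total` parts and reports each step
/// to the caller-supplied callback.
pub fn load_with_progress(total: usize, mut on_progress: impl FnMut(SketchProgress)) {
    on_progress(SketchProgress::Started);
    for current in 1..=total {
        on_progress(SketchProgress::PartLoaded { current, total });
    }
    on_progress(SketchProgress::Finished);
}

/// A callback that prints every event to stdout, similar in spirit to
/// `load_progress_callback_stdout`.
pub fn print_progress(event: SketchProgress) {
    println!("{event:?}");
}
```

Taking the callback as `impl FnMut` lets callers pass either a plain function such as `print_progress` or a closure that records events, for example to drive a progress bar.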

Type Aliases§

TokenId
The identifier of a token in a vocabulary.