Crate llm_base


LLaMA-rs is a Rust port of the llama.cpp project. It runs inference for Facebook’s LLaMA model on a CPU with good performance, using full-precision, f16, or 4-bit quantized versions of the model.

Re-exports

pub use ggml;
pub use model::Hyperparameters;
pub use model::KnownModel;
pub use model::Model;
pub use model::ModelParameters;
pub use util::TokenUtf8Buffer;

Modules

model
Large language model traits and types
util
Utilities for interacting with LLMs and loading them.

Structs

EvaluateOutputRequest
Used in a call to Model::evaluate or InferenceSession::infer to request information from the model. If a value is set to Some, the Vec will be cleared, resized, and filled with the related data.
InferenceParameters
The parameters that drive text generation.
InferenceSession
An inference session represents the state of the text generation. It holds the full context window, as well as several additional parameters used during sampling.
InferenceSessionParameters
Parameters for an inference session.
InferenceSnapshot
A serializable snapshot of the inference process. Can be restored by calling InferenceSession::from_snapshot.
InferenceStats
Statistics about the inference process.
InferenceWithPromptParameters
Settings specific to InferenceSession::infer and InferenceSession::infer_with_params.
Loader
A GGML format loader for LLMs.
Mmap
A handle to an immutable memory mapped buffer.
TokenBias
A list of tokens to bias during inference.
Vocabulary
The vocabulary used by a model.
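
The EvaluateOutputRequest entry above describes a "fill only if requested" pattern: a field set to Some has its Vec cleared, resized, and filled, while a field left as None is skipped. The sketch below illustrates that pattern in isolation. It does not use this crate: `OutputRequest`, its field names, `fill_if_requested`, and `fake_evaluate` are all hypothetical stand-ins, and the data is made up.

```rust
/// Hypothetical stand-in for a request struct; not the crate's definition.
struct OutputRequest {
    /// Set to `Some` to ask for logits; leave as `None` to skip the copy.
    logits: Option<Vec<f32>>,
    /// Set to `Some` to ask for embeddings.
    embeddings: Option<Vec<f32>>,
}

/// Clears, resizes, and fills `dst` from `src`, but only if the caller
/// asked for the data by setting `dst` to `Some`.
fn fill_if_requested(dst: &mut Option<Vec<f32>>, src: &[f32]) {
    if let Some(buf) = dst {
        buf.clear();
        buf.resize(src.len(), 0.0);
        buf.copy_from_slice(src);
    }
}

/// Toy "evaluation" that produces fixed data and honours the request.
fn fake_evaluate(request: &mut OutputRequest) {
    let logits = [0.1_f32, 0.7, 0.2];
    let embeddings = [1.0_f32, 2.0];
    fill_if_requested(&mut request.logits, &logits);
    fill_if_requested(&mut request.embeddings, &embeddings);
}

fn main() {
    let mut request = OutputRequest {
        logits: Some(vec![9.0; 8]), // stale contents are overwritten
        embeddings: None,           // not requested, stays `None`
    };
    fake_evaluate(&mut request);
    println!("logits: {:?}", request.logits);
    println!("embeddings: {:?}", request.embeddings);
}
```

One plausible reason for this design is that the caller can reuse the same Vec across calls, so repeated evaluations do not need a fresh allocation each time.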

Enums

ContainerType
The format of the file containing the model.
ElementType
The type of a value in ggml.
FileType
How the tensors are stored in GGML LLM models.
InferenceError
Errors encountered during the inference process.
LoadError
Errors encountered during the loading process.
LoadProgress
Each variant represents a step within the process of loading the model. These can be used to report progress to the user.
ModelKVMemoryType
Allowed types for the model memory K/V tensors.
QuantizeError
Errors encountered during the quantization process.
QuantizeProgress
Progress of quantization.
SnapshotError
Errors encountered during the snapshot process.

Traits

TensorLoader
Used by models to fetch tensors from a loader.

Functions

load
Load a GGML model from the path and configure it per the params. The status of the loading process will be reported through load_progress_callback.
load_progress_callback_stdout
An implementation of load_progress_callback that outputs to stdout.
quantize
Quantizes a model.
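
The load entry above reports its status through a callback, with each step represented as an enum variant. The sketch below shows that general shape using only the standard library. It is not this crate's API: the `Progress` enum, its variant names, `fake_load`, `describe`, and `progress_to_stdout` are hypothetical and chosen purely for illustration.

```rust
/// Hypothetical progress steps; the variant names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
enum Progress {
    HeaderRead,
    TensorLoaded { current: usize, total: usize },
    Done { tensor_count: usize },
}

/// Turns one progress step into a human-readable line.
fn describe(step: &Progress) -> String {
    match step {
        Progress::HeaderRead => "header read".to_string(),
        Progress::TensorLoaded { current, total } => {
            format!("loaded tensor {}/{}", current, total)
        }
        Progress::Done { tensor_count } => format!("done: {} tensors", tensor_count),
    }
}

/// Toy loader that reports each step through the supplied callback.
fn fake_load(tensor_count: usize, mut on_progress: impl FnMut(Progress)) {
    on_progress(Progress::HeaderRead);
    for i in 0..tensor_count {
        on_progress(Progress::TensorLoaded {
            current: i + 1,
            total: tensor_count,
        });
    }
    on_progress(Progress::Done { tensor_count });
}

/// A callback that writes each step to stdout.
fn progress_to_stdout(step: Progress) {
    println!("{}", describe(&step));
}

fn main() {
    fake_load(2, progress_to_stdout);
}
```

Because the callback is an ordinary closure or function, the same loader can drive a progress bar, a log file, or a test that collects the steps into a Vec.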

Type Definitions

TokenId
The identifier of a token in a vocabulary.
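
As a rough illustration of how a token identifier relates to a vocabulary, the sketch below treats an ID as an index into a table of byte strings. Everything here is an assumption made for the example: the `u32` alias, the `ToyVocabulary` type, and its methods are hypothetical and do not reflect this crate's actual definitions.

```rust
/// Assumed alias for illustration; the crate's underlying type may differ.
type TokenId = u32;

/// Hypothetical minimal vocabulary: a token's ID is its index in the table.
struct ToyVocabulary {
    id_to_token: Vec<Vec<u8>>,
}

impl ToyVocabulary {
    /// Returns the raw bytes of a token, or `None` for an unknown ID.
    fn token(&self, id: TokenId) -> Option<&[u8]> {
        self.id_to_token.get(id as usize).map(|t| t.as_slice())
    }

    /// Concatenates the bytes of all known tokens, then decodes them as UTF-8.
    /// Unknown IDs are skipped; invalid UTF-8 is replaced with U+FFFD.
    fn decode(&self, ids: &[TokenId]) -> String {
        let bytes: Vec<u8> = ids
            .iter()
            .filter_map(|&id| self.token(id))
            .flatten()
            .copied()
            .collect();
        String::from_utf8_lossy(&bytes).into_owned()
    }
}

fn main() {
    // "é" is the two bytes 0xC3 0xA9 in UTF-8; here it is split across two tokens.
    let vocab = ToyVocabulary {
        id_to_token: vec![b"h".to_vec(), vec![0xC3], vec![0xA9]],
    };
    println!("{}", vocab.decode(&[0, 1, 2]));
}
```

Storing tokens as bytes rather than strings matters in this sketch because a single character can span several tokens, so decoding is only safe once the bytes have been joined.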