LLaMA-rs is a Rust port of the llama.cpp project. It allows running inference for Facebook's LLaMA model on a CPU with good performance, using full-precision, f16, or 4-bit quantized versions of the model.
Re-exports
- pub use ggml;
- pub use model::Hyperparameters;
- pub use model::KnownModel;
- pub use model::Model;
- pub use model::ModelParameters;
- pub use util::TokenUtf8Buffer;
Modules
- Large language model traits and types
- Utilities for interacting with LLMs and loading them.
Structs
- Used in a call to Model::evaluate or InferenceSession::infer to request information from the model. If a value is set to Some, the Vec will be cleared, resized, and filled with the related data.
- The parameters that drive text generation.
- An inference session represents the state of text generation. It holds the full context window, as well as several additional parameters used during sampling.
- Parameters for an inference session.
- A serializable snapshot of the inference process. Can be restored by calling InferenceSession::from_snapshot.
- Statistics about the inference process.
- Settings specific to InferenceSession::infer and InferenceSession::infer_with_params.
- A GGML format loader for LLMs.
- A handle to an immutable memory mapped buffer.
- A list of tokens to bias during inference.
- The vocabulary used by a model.
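The request struct described above follows an opt-in pattern: each output slot is an Option<Vec<_>>, and only slots set to Some are cleared, resized, and filled. The sketch below illustrates that pattern in isolation. OutputRequestSketch, its all_logits and embeddings fields, and evaluate_sketch are hypothetical names for illustration, not the crate's actual API; the "model output" is simply passed in as slices.

```rust
/// Hypothetical stand-in for an output-request struct: `None` means
/// "don't return this", `Some(vec)` means "fill `vec` with this data".
#[derive(Debug, Default)]
pub struct OutputRequestSketch {
    pub all_logits: Option<Vec<f32>>,
    pub embeddings: Option<Vec<f32>>,
}

/// Copies `src` into `slot` only if the caller asked for it.
/// The existing Vec is cleared, resized to fit, and then filled,
/// so its allocation can be reused across evaluations.
fn fill_if_requested(slot: &mut Option<Vec<f32>>, src: &[f32]) {
    if let Some(buf) = slot.as_mut() {
        buf.clear();
        buf.resize(src.len(), 0.0);
        buf.copy_from_slice(src);
    }
}

/// Hypothetical evaluation step: in a real model, `logits` and `embedding`
/// would be computed here; in this sketch they are supplied by the caller.
pub fn evaluate_sketch(request: &mut OutputRequestSketch, logits: &[f32], embedding: &[f32]) {
    fill_if_requested(&mut request.all_logits, logits);
    fill_if_requested(&mut request.embeddings, embedding);
}
```

A caller that only wants logits would construct the request with all_logits: Some(Vec::new()) and embeddings: None, then read the Vec back after evaluation; any stale contents of a reused Vec are discarded by the clear.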
Enums
- The format of the file containing the model.
- The type of a value in ggml.
- How the tensors are stored in GGML LLM models.
- Errors encountered during the inference process.
- Errors encountered during the loading process.
- Each variant represents a step within the process of loading the model. These can be used to report progress to the user.
- Allowed types for the model memory K/V tensors.
- Errors encountered during the quantization process.
- Progress of quantization.
- Errors encountered during the snapshot process.
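One of the enums above has one variant per step of model loading, so a loader can report progress through a callback. The sketch below shows that shape under stated assumptions: LoadProgressSketch, its variants, load_sketch, and print_progress are hypothetical and do not reproduce the crate's real variant names or loader signature.

```rust
/// Hypothetical progress enum: each variant marks one stage of loading.
#[derive(Debug, Clone, PartialEq)]
pub enum LoadProgressSketch {
    HyperparametersLoaded,
    TensorLoaded { current: usize, total: usize },
    Loaded { tensor_count: usize },
}

/// Hypothetical loader that reports every stage to the caller's callback.
/// Taking `impl FnMut` lets callers either print or record progress.
pub fn load_sketch(tensor_names: &[&str], mut progress: impl FnMut(LoadProgressSketch)) {
    progress(LoadProgressSketch::HyperparametersLoaded);
    let total = tensor_names.len();
    for (i, _name) in tensor_names.iter().enumerate() {
        // A real loader would read the tensor named `_name` here.
        progress(LoadProgressSketch::TensorLoaded { current: i + 1, total });
    }
    progress(LoadProgressSketch::Loaded { tensor_count: total });
}

/// A callback that writes each stage to stdout.
pub fn print_progress(p: LoadProgressSketch) {
    match p {
        LoadProgressSketch::HyperparametersLoaded => println!("loaded hyperparameters"),
        LoadProgressSketch::TensorLoaded { current, total } => {
            println!("loaded tensor {current}/{total}")
        }
        LoadProgressSketch::Loaded { tensor_count } => {
            println!("done: {tensor_count} tensors")
        }
    }
}
```

Because print_progress is a plain function, it can be passed directly where the callback is expected, while tests or GUIs can pass a closure that collects the events instead.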
Traits
- Used by models to fetch tensors from a loader.
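The trait above decouples a model's construction from where its tensors come from. The following is a minimal sketch of that idea, assuming a loader that hands out tensors by name and checks the expected shape. TensorSketch, TensorLoaderSketch, and InMemoryLoader are hypothetical names; the crate's actual trait and method signatures may differ.

```rust
use std::collections::HashMap;

/// Hypothetical tensor: a shape plus flat f32 data.
#[derive(Debug, Clone, PartialEq)]
pub struct TensorSketch {
    pub shape: Vec<usize>,
    pub data: Vec<f32>,
}

/// Hypothetical trait a model uses to request named tensors.
/// `&mut self` allows implementations that read a file sequentially.
pub trait TensorLoaderSketch {
    fn load(&mut self, name: &str, shape: &[usize]) -> Result<TensorSketch, String>;
}

/// An in-memory implementation, convenient for tests.
pub struct InMemoryLoader {
    pub tensors: HashMap<String, TensorSketch>,
}

impl TensorLoaderSketch for InMemoryLoader {
    fn load(&mut self, name: &str, shape: &[usize]) -> Result<TensorSketch, String> {
        let t = self
            .tensors
            .get(name)
            .ok_or_else(|| format!("unknown tensor: {name}"))?;
        // Reject tensors whose stored shape differs from what the model expects.
        if t.shape.as_slice() != shape {
            return Err(format!("shape mismatch for {name}: {:?} vs {:?}", t.shape, shape));
        }
        Ok(t.clone())
    }
}
```

A model written against the trait can then be built from a file-backed loader in production and from InMemoryLoader in tests without changing its code.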
Functions
- Load a GGML model from the path and configure it per the params. The status of the loading process will be reported through load_progress_callback.
- An implementation of load_progress_callback that outputs to stdout.
- Quantizes a model.
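To give a sense of what 4-bit quantization does to the weights, here is a simplified block quantizer. It is an illustrative scheme similar in spirit to ggml's 4-bit formats, not the exact on-disk layout: each block of 32 weights (block size assumed here) shares one f32 scale equal to max|x| / 7, and each weight is rounded to a signed 4-bit integer stored as two nibbles per byte. Q4Block, quantize_block, and dequantize_block are hypothetical names.

```rust
/// Number of weights that share one scale factor (assumed for this sketch).
pub const BLOCK: usize = 32;

/// One quantized block: a scale plus 32 four-bit values packed into 16 bytes.
#[derive(Debug, Clone)]
pub struct Q4Block {
    pub scale: f32,
    pub nibbles: [u8; BLOCK / 2],
}

pub fn quantize_block(x: &[f32; BLOCK]) -> Q4Block {
    // Largest magnitude in the block maps to +/-7.
    let amax = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = amax / 7.0;
    let inv = if scale != 0.0 { 1.0 / scale } else { 0.0 };
    // Round to a signed value in -8..=7, then shift to 0..=15 for storage.
    let q = |v: f32| ((v * inv).round().clamp(-8.0, 7.0) as i8 + 8) as u8;
    let mut nibbles = [0u8; BLOCK / 2];
    for i in 0..BLOCK / 2 {
        // Low nibble holds the even element, high nibble the odd one.
        nibbles[i] = q(x[2 * i]) | (q(x[2 * i + 1]) << 4);
    }
    Q4Block { scale, nibbles }
}

pub fn dequantize_block(b: &Q4Block) -> [f32; BLOCK] {
    let mut out = [0.0f32; BLOCK];
    for i in 0..BLOCK / 2 {
        let lo = (b.nibbles[i] & 0x0F) as i8 - 8;
        let hi = (b.nibbles[i] >> 4) as i8 - 8;
        out[2 * i] = lo as f32 * b.scale;
        out[2 * i + 1] = hi as f32 * b.scale;
    }
    out
}
```

In this sketch a block costs 20 bytes (a 4-byte f32 scale plus 16 bytes of nibbles) versus 128 bytes for 32 f32 weights, and each reconstructed weight is within half a scale step of the original.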
Type Definitions
- The identifier of a token in a vocabulary.
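A token identifier is just an index into the vocabulary, which maps in both directions between IDs and token pieces. The sketch below assumes a signed 32-bit ID type and stores pieces as raw bytes, since a single token may hold only part of a multi-byte UTF-8 character (the reason a UTF-8 buffering utility such as the re-exported TokenUtf8Buffer is useful). TokenIdSketch and VocabSketch are hypothetical names, not the crate's definitions.

```rust
use std::collections::HashMap;

/// Assumed token ID type for this sketch.
pub type TokenIdSketch = i32;

/// Hypothetical two-way vocabulary: ID -> bytes and bytes -> ID.
#[derive(Debug, Default)]
pub struct VocabSketch {
    id_to_token: Vec<Vec<u8>>,
    token_to_id: HashMap<Vec<u8>, TokenIdSketch>,
}

impl VocabSketch {
    /// Appends a token piece and returns its newly assigned ID.
    pub fn push(&mut self, token: &[u8]) -> TokenIdSketch {
        let id = self.id_to_token.len() as TokenIdSketch;
        self.id_to_token.push(token.to_vec());
        self.token_to_id.insert(token.to_vec(), id);
        id
    }

    /// Looks up the ID for a byte sequence, if it is in the vocabulary.
    pub fn id(&self, token: &[u8]) -> Option<TokenIdSketch> {
        self.token_to_id.get(token).copied()
    }

    /// Looks up the bytes for an ID; negative or out-of-range IDs yield None.
    pub fn token(&self, id: TokenIdSketch) -> Option<&[u8]> {
        usize::try_from(id)
            .ok()
            .and_then(|i| self.id_to_token.get(i))
            .map(|t| t.as_slice())
    }
}
```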