Bindings to the llama.cpp library.
As llama.cpp is a very fast-moving target, this crate does not attempt to create a stable API with all the Rust idioms. Instead, it provides safe wrappers around nearly direct bindings to llama.cpp. This makes it easier to keep up with changes in llama.cpp, but it does mean that the API is not as nice as it could be.
§Feature Flags
- cuda enables CUDA GPU support.
- sampler adds the [context::sample::sampler] struct for a more rusty way of sampling.
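Cargo features like the two above are resolved at compile time. As a minimal sketch, assuming a downstream crate declares features with the same names and forwards them to this crate, it could report which ones are active as follows. The helper `enabled_features` is hypothetical and is not part of this crate's API.

```rust
/// Hypothetical helper (not part of this crate): lists which of the two
/// documented feature names are enabled for the crate being compiled.
pub fn enabled_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    // `cfg!` expands to a compile-time boolean based on the current
    // crate's enabled Cargo features.
    if cfg!(feature = "cuda") {
        features.push("cuda");
    }
    if cfg!(feature = "sampler") {
        features.push("sampler");
    }
    features
}
```

Built without either feature, the returned vector is empty; building with `--features cuda` would add the `cuda` entry, provided the feature is declared in that crate's manifest.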
Re-exports§
pub use error::ApplyChatTemplateError;
pub use error::ChatTemplateError;
pub use error::DecodeError;
pub use error::EmbeddingsError;
pub use error::EncodeError;
pub use error::EvalMultimodalChunksError;
pub use error::GrammarError;
pub use error::LlamaContextLoadError;
pub use error::LlamaCppError;
pub use error::LlamaLoraAdapterInitError;
pub use error::LlamaLoraAdapterRemoveError;
pub use error::LlamaLoraAdapterSetError;
pub use error::LlamaModelLoadError;
pub use error::LogitsError;
pub use error::MarkerDetectionError;
pub use error::MetaValError;
pub use error::ModelParamsError;
pub use error::NewLlamaChatMessageError;
pub use error::ParseChatMessageError;
pub use error::Result;
pub use error::SampleError;
pub use error::SamplerAcceptError;
pub use error::SamplingError;
pub use error::StringToTokenError;
pub use error::TokenSamplingError;
pub use error::TokenToStringError;
pub use chat_message_parse_outcome::ChatMessageParseOutcome;
pub use llama_backend_device::LlamaBackendDevice;
pub use llama_backend_device::LlamaBackendDeviceType;
pub use llama_backend_device::list_llama_ggml_backend_devices;
pub use raw_chat_message::RawChatMessage;
pub use sampled_token::SampledToken;
pub use sampled_token_classifier::SampledTokenClassifier;
pub use sampled_token_classifier::SampledTokenSection;
pub use ffi_status_is_ok::status_is_ok;
pub use ffi_status_to_i32::status_to_i32;
pub use ggml_time_us::ggml_time_us;
pub use ingest_prompt_chunk::ingest_prompt_chunk;
pub use json_schema_to_grammar::json_schema_to_grammar;
pub use llama_time_us::llama_time_us;
pub use max_devices::max_devices;
pub use mlock_supported::mlock_supported;
pub use mmap_supported::mmap_supported;
pub use log::send_logs_to_tracing;
pub use log_options::LogOptions;
Modules§
- batch_add_error
- chat_message_parse_outcome
- context: Safe wrapper around llama_context.
- error
- extract_tool_call_markers_from_haystack
- ffi_error_reader
- ffi_status_is_ok
- ffi_status_to_i32
- ggml_time_us
- gguf_context: Safe wrapper around gguf_context for reading GGUF file metadata.
- gguf_context_error: Error types for GGUF context operations.
- gguf_type: GGUF value types.
- ingest_prompt_chunk
- json_schema_to_grammar
- llama_backend: Representation of an initialized llama backend.
- llama_backend_device
- llama_backend_numa_strategy
- llama_batch: Safe wrapper around llama_batch.
- llama_time_us
- llama_token_attr
- llama_token_attrs
- llama_token_attrs_from_int_error
- llguidance_sampler: Pure Rust llguidance sampler for constrained decoding.
- log
- log_options
- max_devices
- mlock_supported
- mmap_supported
- model: A safe wrapper around llama_model.
- mtmd: Safe wrapper around multimodal (MTMD) functionality in llama.cpp.
- raw_chat_message
- resolved_tool_call_markers
- sampled_token
- sampled_token_classifier
- sampling: Safe wrapper around llama_sampler.
- streaming_json_probe
- timing: Safe wrapper around llama_timings.
- token: Safe wrappers around llama_token_data and llama_token_data_array.
- tool_call_format
- tool_call_marker_pair
- tool_call_template_overrides
Structs§
- BracketedJsonShape
- KeyValueXmlTagsShape
- PairedQuoteShape
- ParsedChatMessage
- ParsedToolCall
- ReasoningMarkers
- TokenUsage
- ToolCallMarkers
- ToolCallValueQuote
- XmlTagsShape