
Crate llama_cpp_bindings


Bindings to the llama.cpp library.

Because llama.cpp is a very fast-moving target, this crate does not attempt to provide a stable API that follows every Rust idiom. Instead, it provides safe wrappers around nearly direct bindings to llama.cpp. This makes it easier to keep up with changes in llama.cpp, but it does mean the API is not as ergonomic as it could be.
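
To illustrate what a "safe wrapper around a nearly direct binding" tends to look like, here is a minimal, self-contained sketch. Everything in it is hypothetical: `raw_sum_tokens`, `sum_tokens`, and `StatusError` are invented for illustration and are not part of this crate or of llama.cpp. The raw function stands in for a C entry point that takes pointers and returns an integer status; the wrapper accepts a slice, confines the `unsafe` call to one place, and converts the status into a `Result`.

```rust
use std::fmt;

/// Hypothetical stand-in for a raw C function (NOT a real llama.cpp symbol).
/// Returns 0 on success and -1 if a pointer is null or `len` is 0.
unsafe fn raw_sum_tokens(tokens: *const i32, len: usize, out: *mut i64) -> i32 {
    if tokens.is_null() || out.is_null() || len == 0 {
        return -1;
    }
    // SAFETY: the caller guarantees `tokens` points to `len` readable i32 values
    // and `out` points to a writable i64.
    unsafe {
        let slice = std::slice::from_raw_parts(tokens, len);
        *out = slice.iter().map(|&t| i64::from(t)).sum();
    }
    0
}

/// Hypothetical error type carrying the raw status code.
#[derive(Debug, PartialEq, Eq)]
pub struct StatusError(pub i32);

impl fmt::Display for StatusError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "raw call failed with status {}", self.0)
    }
}

impl std::error::Error for StatusError {}

/// Safe wrapper: takes a slice instead of a pointer/length pair and
/// maps the integer status onto `Result`.
pub fn sum_tokens(tokens: &[i32]) -> Result<i64, StatusError> {
    let mut out: i64 = 0;
    // SAFETY: the pointer and length come from a valid slice, and `out`
    // is a live local variable for the duration of the call.
    let status = unsafe { raw_sum_tokens(tokens.as_ptr(), tokens.len(), &mut out) };
    if status == 0 {
        Ok(out)
    } else {
        Err(StatusError(status))
    }
}

fn main() {
    println!("{:?}", sum_tokens(&[1, 2, 3]));
    println!("{:?}", sum_tokens(&[]));
}
```

The wrapper stays thin on purpose: it mirrors the raw call one-to-one, so an upstream signature change needs only a local fix. The cost is that callers still see a C-shaped API.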

Feature Flags

cuda enables CUDA GPU support.
sampler adds the [context::sample::sampler] struct for a more idiomatic Rust way of sampling.
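
Feature flags such as these are ordinary Cargo features, so code can check at compile time whether one is enabled. The sketch below shows that general mechanism only. The helper `compiled_backend` is hypothetical and not part of this crate, and the feature name `cuda` is reused purely for illustration; in a real project the feature must also be declared in the consuming crate's manifest.

```rust
/// Hypothetical helper (not part of this crate): reports which backend the
/// binary was compiled for, based on whether a `cuda` feature is enabled.
#[allow(unexpected_cfgs)] // the feature is not declared in this standalone sketch
pub fn compiled_backend() -> &'static str {
    // `cfg!` is evaluated at compile time and expands to `true` or `false`.
    if cfg!(feature = "cuda") {
        "cuda"
    } else {
        "cpu"
    }
}

fn main() {
    println!("compiled backend: {}", compiled_backend());
}
```

Built without the feature, the function takes the `"cpu"` branch. The `#[cfg(feature = "...")]` attribute applies the same check to whole items, which is how an optional struct can be left out of a build entirely.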

Re-exports

pub use error::ApplyChatTemplateError;
pub use error::ChatTemplateError;
pub use error::DecodeError;
pub use error::EmbeddingsError;
pub use error::EncodeError;
pub use error::EvalMultimodalChunksError;
pub use error::GrammarError;
pub use error::LlamaContextLoadError;
pub use error::LlamaCppError;
pub use error::LlamaLoraAdapterInitError;
pub use error::LlamaLoraAdapterRemoveError;
pub use error::LlamaLoraAdapterSetError;
pub use error::LlamaModelLoadError;
pub use error::LogitsError;
pub use error::MarkerDetectionError;
pub use error::MetaValError;
pub use error::ModelParamsError;
pub use error::NewLlamaChatMessageError;
pub use error::ParseChatMessageError;
pub use error::Result;
pub use error::SampleError;
pub use error::SamplerAcceptError;
pub use error::SamplingError;
pub use error::StringToTokenError;
pub use error::TokenSamplingError;
pub use error::TokenToStringError;
pub use chat_message_parse_outcome::ChatMessageParseOutcome;
pub use llama_backend_device::LlamaBackendDevice;
pub use llama_backend_device::LlamaBackendDeviceType;
pub use llama_backend_device::list_llama_ggml_backend_devices;
pub use raw_chat_message::RawChatMessage;
pub use sampled_token::SampledToken;
pub use sampled_token_classifier::SampledTokenClassifier;
pub use sampled_token_classifier::SampledTokenSection;
pub use ffi_status_is_ok::status_is_ok;
pub use ffi_status_to_i32::status_to_i32;
pub use ggml_time_us::ggml_time_us;
pub use ingest_prompt_chunk::ingest_prompt_chunk;
pub use json_schema_to_grammar::json_schema_to_grammar;
pub use llama_time_us::llama_time_us;
pub use max_devices::max_devices;
pub use mlock_supported::mlock_supported;
pub use mmap_supported::mmap_supported;
pub use log::send_logs_to_tracing;
pub use log_options::LogOptions;

Modules

batch_add_error
chat_message_parse_outcome
context: Safe wrapper around llama_context.
error
extract_tool_call_markers_from_haystack
ffi_error_reader
ffi_status_is_ok
ffi_status_to_i32
ggml_time_us
gguf_context: Safe wrapper around gguf_context for reading GGUF file metadata.
gguf_context_error: Error types for GGUF context operations.
gguf_type: GGUF value types.
ingest_prompt_chunk
json_schema_to_grammar
llama_backend: Representation of an initialized llama backend.
llama_backend_device
llama_backend_numa_strategy
llama_batch: Safe wrapper around llama_batch.
llama_time_us
llama_token_attr
llama_token_attrs
llama_token_attrs_from_int_error
llguidance_sampler: Pure Rust llguidance sampler for constrained decoding.
log
log_options
max_devices
mlock_supported
mmap_supported
model: A safe wrapper around llama_model.
mtmd: Safe wrapper around multimodal (MTMD) functionality in llama.cpp.
raw_chat_message
resolved_tool_call_markers
sampled_token
sampled_token_classifier
sampling: Safe wrapper around llama_sampler.
streaming_json_probe
timing: Safe wrapper around llama_timings.
token: Safe wrappers around llama_token_data and llama_token_data_array.
tool_call_format
tool_call_marker_pair
tool_call_template_overrides

Structs

BracketedJsonShape
KeyValueXmlTagsShape
PairedQuoteShape
ParsedChatMessage
ParsedToolCall
ReasoningMarkers
TokenUsage
ToolCallMarkers
ToolCallValueQuote
XmlTagsShape

Enums

TokenUsageError
ToolCallArgsShape
ToolCallArguments