Crate llama_crab

llama-crab — safe, ergonomic Rust bindings to llama.cpp.

Quickstart

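use llama_crab::{Llama, LlamaParams};

fn main() -> llama_crab::Result<()> {
    // Load a GGUF model with a 2048-token context window.
    let mut llama = Llama::load(LlamaParams::new("model.gguf").with_n_ctx(2048))?;
    // Complete the prompt (64 is the maximum number of new tokens).
    let resp = llama.create_completion("Hello, world!", 64)?;
    println!("{}", resp.text);
    Ok(())
}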

Re-exports

pub use crate::backend::LlamaBackend;
pub use crate::backend::NumaStrategy;
pub use crate::batch::BatchAddError;
pub use crate::batch::LlamaBatch;
pub use crate::chat::Role;
pub use crate::context::LlamaContext;
pub use crate::context::LlamaContextParams;
pub use crate::error::LlamaError;
pub use crate::error::Result;
pub use crate::high_level::chat_completion::ChatMessage;
pub use crate::high_level::completion::Completion;
pub use crate::high_level::completion::StopReason;
pub use crate::high_level::tokenizer::FimTokens;
pub use crate::high_level::tokenizer::LlamaTokenizer;
pub use crate::high_level::tokenizer::Tokenizer;
pub use crate::high_level::Llama;
pub use crate::high_level::LlamaParams;
pub use crate::log::send_logs_to_tracing;
pub use crate::log::LogOptions;
pub use crate::logit_bias::LlamaLogitBias;
pub use crate::model::params::LlamaModelParams;
pub use crate::model::LlamaModel;
pub use crate::sampling::LlamaSampler;
pub use crate::sampling::SamplerChain;
pub use crate::token::LlamaToken;
pub use crate::token_data::LlamaTokenData;
pub use crate::token_data::LlamaTokenDataArray;

Modules

backend
Global backend initialization, NUMA strategy, device enumeration.
batch
Reusable batching primitive: LlamaBatch, which holds the tokens submitted to a context for decoding, and BatchAddError.
cache
Caching for previously-computed prompt prefixes.
chat
Chat message types, templates and tool calling.
context
LlamaContext and its parameters.
error
Error types for llama-crab.
high_level
High-level orchestrator: load model, create context, generate tokens.
json_schema
JSON Schema → GBNF grammar converter.
log
Forwarding llama.cpp/ggml logs into the tracing ecosystem.
logit_bias
LlamaLogitBias — a single (token, bias) pair, used by the logit-bias sampler.
model
LlamaModel and its parameters.
multimodal (requires the `mtmd` feature)
Multimodal (vision + audio) support via mtmd.
sampling
Sampling strategies and the LlamaSampler wrapper.
speculative
Speculative decoding: a draft model proposes candidate tokens, the main model validates them in a single forward pass.
token
LlamaToken newtype and LlamaTokenAttr bitflags.
token_data
LlamaTokenData and LlamaTokenDataArray.
util
Small utilities used across the crate.
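The `speculative` module is summarized in a single sentence. The sketch below illustrates one round of greedy draft-and-verify in plain std Rust. It is not llama-crab's API: both "models" are hypothetical closures mapping a token prefix to a next token, the `Token` alias is an assumption, and the target model's single batched forward pass is simulated by querying it once per draft position.

```rust
/// Hypothetical token id type for this sketch (not llama_crab::LlamaToken).
type Token = u32;

/// One round of greedy speculative decoding (illustrative sketch).
/// The draft model proposes `k` tokens autoregressively; the target model is
/// then queried at every draft position (a real engine does this in one
/// batched forward pass). The longest agreeing prefix is accepted, plus one
/// token from the target: its correction at the first mismatch, or a bonus
/// token when every draft token was accepted.
fn speculative_step<D, T>(prefix: &[Token], k: usize, draft: D, target: T) -> Vec<Token>
where
    D: Fn(&[Token]) -> Token,
    T: Fn(&[Token]) -> Token,
{
    // Draft phase: the cheap model proposes k candidate tokens.
    let mut ctx = prefix.to_vec();
    let mut proposed = Vec::with_capacity(k);
    for _ in 0..k {
        let t = draft(&ctx);
        proposed.push(t);
        ctx.push(t);
    }

    // Verify phase: compare each proposal with the target model's choice.
    let mut ctx = prefix.to_vec();
    let mut accepted = Vec::new();
    for &t in &proposed {
        let expected = target(&ctx);
        if expected != t {
            // First mismatch: keep the target's token and discard the rest.
            accepted.push(expected);
            return accepted;
        }
        accepted.push(t);
        ctx.push(t);
    }
    // All drafts accepted: the target's next prediction comes for free.
    accepted.push(target(&ctx));
    accepted
}

fn main() {
    // Toy target model: next token = last token + 1.
    let target = |ctx: &[Token]| ctx.last().map_or(0, |t| t + 1);
    // Toy draft model: agrees with the target except right after token 3.
    let draft = |ctx: &[Token]| match ctx.last() {
        Some(&3) => 99,
        Some(&t) => t + 1,
        None => 0,
    };
    let out = speculative_step(&[1], 4, draft, target);
    println!("{:?}", out); // [2, 3, 4]
}
```

With greedy verification the output is exactly what the target model alone would have produced; the speed-up comes from validating several draft tokens per target pass.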
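The `cache` module reuses previously computed prompt prefixes. A minimal sketch of the underlying idea, assuming the cached state corresponds to a known token sequence that can be truncated to any prefix: only the tokens after the longest common prefix need a new forward pass. The functions and the `u32` token type are hypothetical illustrations, not llama-crab's API.

```rust
/// Number of leading tokens shared by the cached sequence and the new prompt.
fn common_prefix_len(cached: &[u32], prompt: &[u32]) -> usize {
    cached.iter().zip(prompt).take_while(|(a, b)| a == b).count()
}

/// Tokens of `prompt` that still need to be evaluated (hypothetical helper).
/// Design choice: at least one prompt token is always re-evaluated so that
/// fresh logits exist for sampling, even when the prompt is fully cached.
fn tokens_to_evaluate<'a>(cached: &[u32], prompt: &'a [u32]) -> &'a [u32] {
    let reusable = common_prefix_len(cached, prompt).min(prompt.len().saturating_sub(1));
    &prompt[reusable..]
}

fn main() {
    let cached: [u32; 4] = [1, 2, 3, 4];
    let prompt: [u32; 5] = [1, 2, 3, 9, 10];
    // Only the divergent tail needs a forward pass.
    println!("{:?}", tokens_to_evaluate(&cached, &prompt)); // [9, 10]
}
```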
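`LlamaLogitBias` is described as a single (token, bias) pair consumed by the logit-bias sampler. The sketch below shows what such a sampler does conceptually: each bias is added to its token's logit before sampling, and a bias of negative infinity effectively bans the token. The `LogitBias` struct, `apply_logit_bias` and the greedy `argmax` are hypothetical stand-ins, not the crate's definitions.

```rust
/// Hypothetical stand-in for a (token, bias) pair; not llama_crab's type.
#[derive(Clone, Copy, Debug)]
struct LogitBias {
    token: usize,
    bias: f32,
}

/// Adds each bias to the logit of its token; out-of-range tokens are ignored.
fn apply_logit_bias(logits: &mut [f32], biases: &[LogitBias]) {
    for b in biases {
        if let Some(l) = logits.get_mut(b.token) {
            *l += b.bias;
        }
    }
}

/// Greedy pick of the highest logit (None for an empty slice).
fn argmax(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}

fn main() {
    let mut logits = vec![1.0_f32, 3.0, 2.0];
    let biases = [
        LogitBias { token: 1, bias: f32::NEG_INFINITY }, // ban token 1
        LogitBias { token: 2, bias: 0.5 },               // nudge token 2 up
    ];
    apply_logit_bias(&mut logits, &biases);
    println!("{:?} -> {:?}", logits, argmax(&logits)); // [1.0, -inf, 2.5] -> Some(2)
}
```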