llama-crab — safe, ergonomic Rust bindings to llama.cpp.
§Quickstart
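use llama_crab::{Llama, LlamaParams};
// Load the model from a GGUF file with a 2048-token context window.
let mut llama = Llama::load(LlamaParams::new("model.gguf").with_n_ctx(2048))?;
// Request a completion for the prompt; the second argument (64) is presumably the maximum number of tokens to generate.
let resp = llama.create_completion("Hello, world!", 64)?;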
println!("{}", resp.text);
Re-exports§
pub use crate::backend::LlamaBackend;
pub use crate::backend::NumaStrategy;
pub use crate::batch::BatchAddError;
pub use crate::batch::LlamaBatch;
pub use crate::chat::Role;
pub use crate::context::LlamaContext;
pub use crate::context::LlamaContextParams;
pub use crate::error::LlamaError;
pub use crate::error::Result;
pub use crate::high_level::chat_completion::ChatMessage;
pub use crate::high_level::completion::Completion;
pub use crate::high_level::completion::StopReason;
pub use crate::high_level::tokenizer::FimTokens;
pub use crate::high_level::tokenizer::LlamaTokenizer;
pub use crate::high_level::tokenizer::Tokenizer;
pub use crate::high_level::Llama;
pub use crate::high_level::LlamaParams;
pub use crate::log::send_logs_to_tracing;
pub use crate::log::LogOptions;
pub use crate::logit_bias::LlamaLogitBias;
pub use crate::model::params::LlamaModelParams;
pub use crate::model::LlamaModel;
pub use crate::sampling::LlamaSampler;
pub use crate::sampling::SamplerChain;
pub use crate::token::LlamaToken;
pub use crate::token_data::LlamaTokenData;
pub use crate::token_data::LlamaTokenDataArray;
Modules§
- backend
- Global backend initialization, NUMA strategy, device enumeration.
- batch
- Reusable batching primitive.
- cache
- Caching for previously-computed prompt prefixes.
- chat
- Chat message types, templates and tool calling.
- context
- LlamaContext and its parameters.
- error
- Error types for llama-crab.
- high_level
- High-level orchestrator: load model, create context, generate tokens.
- json_schema
- JSON Schema → GBNF grammar converter.
- log
- Forwarding llama.cpp/ggml logs into the tracing ecosystem.
- logit_bias
- LlamaLogitBias: a single (token, bias) pair, used by the logit-bias sampler.
- model
- LlamaModel and its parameters.
- multimodal (feature: mtmd)
- Multimodal (vision + audio) support via mtmd.
- sampling
- Sampling strategies and the LlamaSampler wrapper.
- speculative
- Speculative decoding: a draft model proposes candidate tokens, the main model validates them in a single forward pass.
- token
- LlamaToken newtype and LlamaTokenAttr bitflags.
- token_data
- LlamaTokenData and LlamaTokenDataArray.
- util
- Small utilities used across the crate.
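The propose-then-validate loop described for the speculative module can be illustrated with a toy sketch. It does not use llama-crab's API: every name below (NextToken, speculative_step, draft_model, target_model, generate) is hypothetical, the "models" are plain functions over token ids, and acceptance is the simple greedy rule (keep a draft token only if the main model would have picked the same one). A real implementation would score all draft positions in one batched forward pass and may use a probabilistic acceptance rule instead.

```rust
/// Hypothetical stand-in for a model: maps a token context to its greedy next token.
type NextToken = fn(&[u32]) -> u32;

/// Toy main model: always continues with (last + 1) mod 10.
fn target_model(ctx: &[u32]) -> u32 {
    (ctx.last().copied().unwrap_or(0) + 1) % 10
}

/// Toy draft model: agrees with the main model except right after token 4.
fn draft_model(ctx: &[u32]) -> u32 {
    let last = ctx.last().copied().unwrap_or(0);
    if last == 4 { 0 } else { (last + 1) % 10 }
}

/// One round of greedy speculative decoding.
/// Appends the validated tokens to `ctx` and returns how many draft tokens were accepted.
fn speculative_step(ctx: &mut Vec<u32>, draft: NextToken, target: NextToken, k: usize) -> usize {
    let base = ctx.len();

    // 1. The draft model proposes k tokens autoregressively.
    let mut proposal = ctx.clone();
    for _ in 0..k {
        let t = draft(&proposal);
        proposal.push(t);
    }

    // 2. The main model checks each proposed position.
    //    (Simulated with a loop here; conceptually this is a single forward pass.)
    let mut accepted = 0;
    for i in 0..k {
        let expected = target(&proposal[..base + i]);
        ctx.push(expected);
        if proposal[base + i] != expected {
            // First mismatch: keep the main model's token, drop the rest of the draft.
            return accepted;
        }
        accepted += 1;
    }

    // 3. Every draft token matched, so the main model contributes one extra token.
    let bonus = target(ctx);
    ctx.push(bonus);
    accepted
}

/// Generate `n_new` tokens after `prompt`, proposing `k` draft tokens per round.
fn generate(prompt: &[u32], n_new: usize, k: usize) -> Vec<u32> {
    let mut ctx = prompt.to_vec();
    let end = prompt.len() + n_new;
    while ctx.len() < end {
        speculative_step(&mut ctx, draft_model, target_model, k);
    }
    ctx.truncate(end);
    ctx
}
```

Because a mismatching draft token is always replaced by the main model's own choice, the output of generate is identical to what the main model would produce on its own with greedy decoding; the draft model only affects how many tokens are confirmed per round.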