An accessible, low-overhead wrapper over llama.cpp, powered by the `llama-cpp-2` crate and supporting most .gguf-formatted chat and embedding models.
## Examples
These examples assume the corresponding .gguf model files have already been downloaded into the working directory. File paths and argument shapes shown below are illustrative; consult the crate documentation for exact signatures.
### Chat (via Llama 3.2 Instruct)

```rust
// Create a new local model registry and load
// a chat model into it with a system prompt
// of "You are a cupcake." (The model path is illustrative.)
let registry = LocalModelRegistry::new().unwrap();
let mut model = registry
    .load_chat_model("llama-3.2-instruct.gguf", "You are a cupcake.")
    .unwrap();

// Run ("infer") the model with the prompt
// "What are you?", capturing its output
// as UTF-8 encoded bytes.
let mut output = vec![];
model.infer("What are you?", &mut output).unwrap();
let output = String::from_utf8_lossy(&output);

// Hopefully, the model thinks it's a cupcake due
// to the system prompt.
assert!(output.contains("cupcake"));
```
### Embedding (via Nomic Embed Text v1.5)

```rust
// Create a new local model registry and load
// an embedding model into it. (The model path is illustrative.)
let registry = LocalModelRegistry::new().unwrap();
let mut model = registry
    .load_text_embedding_model("nomic-embed-text-v1.5.gguf")
    .unwrap();

// Embed some fanciful document titles with the model.
// (The titles are illustrative.)
let embeddings = model
    .embed(&[
        "The Lighthouse at the Edge of Dreams",
        "A Field Guide to Invisible Dragons",
        "Thermodynamics of Stellar Interiors",
    ])
    .unwrap();
assert_eq!(embeddings.len(), 3);

// Embed a search query with the model.
let query_embeddings = model.embed(&["whimsical fantasy tales"]).unwrap();
assert_eq!(query_embeddings.len(), 1);

// Calculate the cosine distance (or "similarity") between the embeddings.
let distance_a = cosine_distance(&query_embeddings[0], &embeddings[0]);
let distance_b = cosine_distance(&query_embeddings[0], &embeddings[1]);
let distance_c = cosine_distance(&query_embeddings[0], &embeddings[2]);

// The fantasy embeddings should be more similar
// (i.e., a smaller distance) than the scientific embedding.
assert!(distance_a < distance_c);
assert!(distance_b < distance_c);
```
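The `cosine_distance` helper used above is not defined in the snippet; a minimal sketch, assuming embeddings are plain `f32` slices (the crate may ship its own helper under a different signature):

```rust
/// Cosine distance between two embedding vectors, defined as
/// 1 - cosine similarity. Lower values mean more similar.
/// (Hypothetical helper for illustration.)
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

fn main() {
    // Identical vectors have distance 0; orthogonal vectors, distance 1.
    println!("{}", cosine_distance(&[1.0, 0.0], &[1.0, 0.0]));
    println!("{}", cosine_distance(&[1.0, 0.0], &[0.0, 1.0]));
}
```

Because the distance is derived from the angle between vectors, it ignores magnitude, which is why parallel vectors of different lengths still compare as identical.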
## License
Copyright © 2025 With Caer, LLC.
Licensed under the MIT license. Refer to the license file for more information.