Safe wrappers for the libmtmd multimodal support library.
libmtmd extends llama.cpp with the ability to encode image and audio
inputs (bitmaps) into token embeddings that can then be fed into a
standard llama_decode call alongside normal text tokens.
§Quick-start
use std::path::Path;
use llama_cpp_4::{
    llama_backend::LlamaBackend,
    model::{LlamaModel, params::LlamaModelParams, AddBos},
    context::params::LlamaContextParams,
    mtmd::{MtmdContext, MtmdContextParams, MtmdBitmap, MtmdInputChunks, MtmdInputText},
};
let backend = LlamaBackend::init().unwrap();
let model = LlamaModel::load_from_file(&backend, Path::new("model.gguf"),
    &LlamaModelParams::default()).unwrap();
let mut lctx = model.new_context(&backend, LlamaContextParams::default()).unwrap();
// Load the multimodal projector (mmproj) model.
let ctx_params = MtmdContextParams::default();
let mtmd_ctx = MtmdContext::init_from_file(Path::new("mmproj.gguf"), &model, ctx_params)
    .unwrap();
// Load an image from a file.
let bitmap = MtmdBitmap::from_file(&mtmd_ctx, Path::new("image.jpg")).unwrap();
// Tokenize a prompt that contains the media marker.
let marker = MtmdContext::default_marker();
let prompt = format!("Describe this image: {marker}");
let text = MtmdInputText::new(&prompt, true, true);
let bitmaps = [&bitmap];
let mut chunks = MtmdInputChunks::new();
mtmd_ctx.tokenize(&text, &bitmaps, &mut chunks).unwrap();
// Evaluate / decode all chunks.
let n_batch = lctx.n_batch() as i32;
let mut n_past = 0i32;
mtmd_ctx.eval_chunks(lctx.as_ptr(), &chunks, 0, 0, n_batch, true, &mut n_past).unwrap();
§Feature flag
This module is only compiled when the mtmd Cargo feature is enabled.
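Enabling the feature from a downstream crate might look like the Cargo.toml fragment below. This is a sketch under two assumptions: the package name llama-cpp-4 is inferred from the llama_cpp_4 import path in the quick-start, and the version requirement is a placeholder rather than a recommended release.

```toml
[dependencies]
# Assumed package name; "*" is a placeholder, so pin the release you actually use.
llama-cpp-4 = { version = "*", features = ["mtmd"] }
```

Without the feature, the mtmd module and every type listed on this page are absent from the build, so imports from llama_cpp_4::mtmd will fail to resolve.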
Structs§
- MtmdBitmap - An image or audio bitmap ready for multimodal encoding.
- MtmdContext - The main multimodal context.
- MtmdContextParams - Parameters used when creating an MtmdContext.
- MtmdImageTokens - Image/audio token metadata attached to a non-text MtmdInputChunk.
- MtmdInputChunk - A single tokenized input chunk (text, image, or audio).
- MtmdInputChunks - A list of tokenized input chunks produced by MtmdContext::tokenize.
- MtmdInputText - Text input for MtmdContext::tokenize.
Enums§
- MtmdError - All errors that can be returned by the mtmd module.
- MtmdInputChunkType - The type of an MtmdInputChunk.
Type Aliases§
- Result - A convenience Result alias for this module.