Module mtmd

Safe wrappers for the libmtmd multimodal support library.

libmtmd extends llama.cpp so that image and audio inputs (bitmaps) can be encoded into token embeddings, which are then fed into a standard llama_decode call alongside ordinary text tokens.

§Quick-start

use std::path::Path;
use llama_cpp_4::{
    llama_backend::LlamaBackend,
    model::{LlamaModel, params::LlamaModelParams, AddBos},
    context::params::LlamaContextParams,
    mtmd::{MtmdContext, MtmdContextParams, MtmdBitmap, MtmdInputChunks, MtmdInputText},
};

let backend  = LlamaBackend::init().unwrap();
let model    = LlamaModel::load_from_file(&backend, Path::new("model.gguf"),
                                           &LlamaModelParams::default()).unwrap();
let mut lctx = model.new_context(&backend, LlamaContextParams::default()).unwrap();

// Load the multimodal projector (mmproj) model.
let ctx_params = MtmdContextParams::default();
let mtmd_ctx   = MtmdContext::init_from_file(Path::new("mmproj.gguf"), &model, ctx_params)
                              .unwrap();

// Load an image from a file.
let bitmap = MtmdBitmap::from_file(&mtmd_ctx, Path::new("image.jpg")).unwrap();

// Tokenize a prompt that contains the media marker.
let marker  = MtmdContext::default_marker();
let prompt  = format!("Describe this image: {marker}");
let text    = MtmdInputText::new(&prompt, true, true);
let bitmaps = [&bitmap];

let mut chunks = MtmdInputChunks::new();
mtmd_ctx.tokenize(&text, &bitmaps, &mut chunks).unwrap();

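// Evaluate / decode all chunks.
// The two literal zeros and `true` are assumed to map to the starting n_past,
// the seq_id, and logits_last, following mtmd_helper_eval_chunks in the C API;
// the position after evaluation is written back into `n_past`.
let n_batch = lctx.n_batch() as i32;
let mut n_past = 0i32;
mtmd_ctx.eval_chunks(lctx.as_ptr(), &chunks, 0, 0, n_batch, true, &mut n_past).unwrap();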
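
The tokenize step in the quick-start pairs each media marker in the prompt with one entry of the bitmaps slice, in order. The following is a minimal, self-contained sketch of that idea only. It does not use this crate and is not how libmtmd tokenizes: `Piece` and `split_on_marker` are hypothetical names invented for illustration, and the marker literal is an assumption about what `MtmdContext::default_marker()` returns.

```rust
/// Simplified stand-in for a chunk kind. This is NOT the crate's
/// `MtmdInputChunkType`; it only models "text" versus "media slot".
#[derive(Debug, PartialEq)]
enum Piece<'a> {
    /// A run of ordinary prompt text.
    Text(&'a str),
    /// A media slot; the value is the index into the caller's bitmap slice.
    Media(usize),
}

/// Split `prompt` at every occurrence of `marker`.
/// The n-th marker (counting from zero) becomes `Piece::Media(n)`.
/// Empty text runs between or around markers are dropped.
fn split_on_marker<'a>(prompt: &'a str, marker: &str) -> Vec<Piece<'a>> {
    assert!(!marker.is_empty(), "marker must not be empty");
    let mut pieces = Vec::new();
    for (i, text) in prompt.split(marker).enumerate() {
        // Every segment after the first is preceded by one marker.
        if i > 0 {
            pieces.push(Piece::Media(i - 1));
        }
        if !text.is_empty() {
            pieces.push(Piece::Text(text));
        }
    }
    pieces
}

fn main() {
    // Assumed marker value, used here purely as an example string.
    let marker = "<__media__>";
    let prompt = format!("Describe this image: {marker}");
    let pieces = split_on_marker(&prompt, marker);
    println!("{pieces:?}");
}
```

In this model, a prompt with two markers yields two media slots, so the bitmaps slice would need two entries in matching order. The real `MtmdContext::tokenize` additionally produces text tokens and media embeddings metadata, which this sketch leaves out.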

§Feature flag

This module is only compiled when the mtmd Cargo feature is enabled.
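
As a sketch of how the feature might be enabled from a dependent crate's Cargo.toml: the feature name mtmd comes from the text above, while the package name `llama-cpp-4` is an assumption inferred from the `llama_cpp_4` import path, and the version is a placeholder to replace with the release you actually use.

```toml
[dependencies]
# Package name assumed from the `llama_cpp_4` import path; version is a placeholder.
llama-cpp-4 = { version = "*", features = ["mtmd"] }
```

Without the feature, paths such as `llama_cpp_4::mtmd::MtmdContext` would not resolve, because the module is left out of the build.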

Structs§

MtmdBitmap: An image or audio bitmap ready for multimodal encoding.
MtmdContext: The main multimodal context.
MtmdContextParams: Parameters used when creating an MtmdContext.
MtmdImageTokens: Image/audio token metadata attached to a non-text MtmdInputChunk.
MtmdInputChunk: A single tokenized input chunk (text, image, or audio).
MtmdInputChunks: A list of tokenized input chunks produced by MtmdContext::tokenize.
MtmdInputText: Text input for MtmdContext::tokenize.

Enums§

MtmdError: All errors that can be returned by the mtmd module.
MtmdInputChunkType: The type of an MtmdInputChunk.

Type Aliases§

Result: A convenience Result alias for this module.
