Module mtmd

Safe wrappers for the libmtmd multimodal support library.

libmtmd extends llama.cpp with the ability to encode image and audio inputs (both represented as bitmaps) into token embeddings, which can then be fed into a standard llama_decode call alongside ordinary text tokens.

§Quick-start

use std::path::Path;
use llama_cpp_4::{
    llama_backend::LlamaBackend,
    model::{LlamaModel, params::LlamaModelParams, AddBos},
    context::params::LlamaContextParams,
    mtmd::{MtmdContext, MtmdContextParams, MtmdBitmap, MtmdInputChunks, MtmdInputText},
};

let backend  = LlamaBackend::init().unwrap();
let model    = LlamaModel::load_from_file(&backend, Path::new("model.gguf"),
                                           &LlamaModelParams::default()).unwrap();
let mut lctx = model.new_context(&backend, LlamaContextParams::default()).unwrap();

// Load the multimodal projector (mmproj) model.
let ctx_params = MtmdContextParams::default();
let mtmd_ctx   = MtmdContext::init_from_file(Path::new("mmproj.gguf"), &model, ctx_params)
                              .unwrap();

// Load an image from a file.
let bitmap = MtmdBitmap::from_file(&mtmd_ctx, Path::new("image.jpg")).unwrap();

// Tokenize a prompt that contains the media marker.
let marker  = MtmdContext::default_marker();
let prompt  = format!("Describe this image: {marker}");
let text    = MtmdInputText::new(&prompt, true, true);
let bitmaps = [&bitmap];

let mut chunks = MtmdInputChunks::new();
mtmd_ctx.tokenize(&text, &bitmaps, &mut chunks).unwrap();

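// Evaluate (decode) all chunks in order.
// The arguments after `&chunks` appear to mirror libmtmd's `mtmd_helper_eval_chunks`:
// starting position, sequence id, batch size, whether to compute logits for the
// last token, and an out-parameter that receives the position after evaluation.
let n_batch = lctx.n_batch() as i32;
let mut n_past = 0i32;
mtmd_ctx.eval_chunks(lctx.as_ptr(), &chunks, 0, 0, n_batch, true, &mut n_past).unwrap();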
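
The tokenize step in the quick-start relies on a media marker inside the prompt. As a rough mental model only, the prompt can be pictured as being cut at each marker, with the n-th marker standing in for the n-th bitmap. The sketch below illustrates that idea in plain Rust without touching the crate: `SketchChunk` and `split_at_markers` are hypothetical names, the marker string `<__media__>` is an assumed value (real code should call `MtmdContext::default_marker()`), and the actual tokenizer does considerably more work, such as producing token ids and image or audio token metadata.

```rust
/// Illustrative stand-in for a tokenized chunk; NOT the crate's `MtmdInputChunk`.
#[derive(Debug, PartialEq)]
enum SketchChunk<'a> {
    /// A run of ordinary prompt text.
    Text(&'a str),
    /// A media slot; the index refers to the bitmap slice passed with the prompt.
    Media(usize),
}

/// Split `prompt` at every occurrence of `marker`, emitting one media chunk
/// per marker and skipping empty text pieces.
fn split_at_markers<'a>(prompt: &'a str, marker: &str) -> Vec<SketchChunk<'a>> {
    let mut chunks = Vec::new();
    let mut media_index = 0;
    let mut pieces = prompt.split(marker).peekable();
    while let Some(piece) = pieces.next() {
        if !piece.is_empty() {
            chunks.push(SketchChunk::Text(piece));
        }
        // Every piece except the last one was followed by a marker.
        if pieces.peek().is_some() {
            chunks.push(SketchChunk::Media(media_index));
            media_index += 1;
        }
    }
    chunks
}

fn main() {
    // Assumed marker text, used here only for illustration.
    let marker = "<__media__>";
    let prompt = format!("Describe this image: {marker}");
    let chunks = split_at_markers(&prompt, marker);
    assert_eq!(
        chunks,
        vec![SketchChunk::Text("Describe this image: "), SketchChunk::Media(0)]
    );
    println!("{chunks:?}");
}
```

Under this model the number of markers in the prompt should match the number of bitmaps supplied, which is why the quick-start passes exactly one bitmap for its single marker.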

§Feature flag

This module is only compiled when the mtmd Cargo feature is enabled.
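
A downstream crate that wants multimodal support to be optional could gate its own code in the same way. The sketch below assumes the downstream crate declares its own `mtmd` feature that forwards to this crate's feature (for example `mtmd = ["llama-cpp-4/mtmd"]` in its Cargo.toml, where the package name is an assumption). `supported_inputs` is a hypothetical helper, and `cfg!(feature = "mtmd")` checks the downstream crate's feature, not the dependency's directly.

```rust
/// Reports which input kinds this build can handle.
///
/// Assumes the enclosing crate defines an `mtmd` feature that forwards to the
/// dependency's feature of the same name (an assumption, not part of this module).
pub fn supported_inputs() -> &'static str {
    if cfg!(feature = "mtmd") {
        // Multimodal wrappers are compiled in.
        "text, image, audio"
    } else {
        // Only the text-only API is available.
        "text"
    }
}

fn main() {
    println!("supported inputs: {}", supported_inputs());
}
```

Code that actually names types from this module would need `#[cfg(feature = "mtmd")]` on the item itself, since those types do not exist when the feature is off.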

Structs§

MtmdBitmap
An image or audio bitmap ready for multimodal encoding.
MtmdContext
The main multimodal context.
MtmdContextParams
Parameters used when creating an MtmdContext.
MtmdImageTokens
Image/audio token metadata attached to a non-text MtmdInputChunk.
MtmdInputChunk
A single tokenized input chunk (text, image, or audio).
MtmdInputChunks
A list of tokenized input chunks produced by MtmdContext::tokenize.
MtmdInputText
Text input for MtmdContext::tokenize.

Enums§

MtmdError
All errors that can be returned by the mtmd module.
MtmdInputChunkType
The type of an MtmdInputChunk.

Type Aliases§

Result
A convenience Result alias for this module.