pub struct MtmdContext { /* private fields */ }
The main multimodal context.
Wraps a mtmd_context *. This context is tied to a specific mmproj model
file and a loaded LlamaModel. It is safe to share across threads for
tokenize calls (read-only), but encode_chunk / eval helpers mutate
internal state and must not be called concurrently.
Implementations

impl MtmdContext
pub fn default_marker() -> &'static str
Returns the default media marker string used in prompts
(currently "<__media__>").
pub fn init_from_file(
    mmproj_path: impl AsRef<Path>,
    text_model: &LlamaModel,
    params: MtmdContextParams,
) -> Result<Self>
Initialise a multimodal context from an mmproj GGUF file.
§Parameters
- mmproj_path – path to the mmproj .gguf file
- text_model – the already-loaded text model
- params – context parameters (use MtmdContextParams::default())
§Errors
Returns MtmdError::ContextCreateFailed if the underlying C call
returns a null pointer.
pub fn void_logs()
Silence all clip/mtmd log output by installing a no-op callback.
Call this right after init_from_file to
suppress the verbose clip_model_loader: tensor[N]… lines that
clip.cpp emits to its own private logger (separate from llama_log_set).
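A minimal initialisation sketch combining the two calls above. The import paths, the MtmdError return type of the Result alias, and the mmproj file path are assumptions, not part of this crate's documented surface:

```rust
// NOTE: adjust the `use` paths to this crate's actual module layout.
// use <crate>::{MtmdContext, MtmdContextParams, MtmdError, LlamaModel};

fn init_multimodal(model: &LlamaModel) -> Result<MtmdContext, MtmdError> {
    // Default params use the standard "<__media__>" marker.
    let params = MtmdContextParams::default();
    // Path to the mmproj GGUF is a placeholder.
    let ctx = MtmdContext::init_from_file("models/mmproj.gguf", model, params)?;
    // Install the no-op clip/mtmd log callback right after creation
    // to suppress the verbose clip_model_loader output.
    MtmdContext::void_logs();
    Ok(ctx)
}
```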
pub fn supports_vision(&self) -> bool
Returns true if the model supports vision (image) input.
pub fn supports_audio(&self) -> bool
Returns true if the model supports audio input.
pub fn audio_bitrate(&self) -> i32
👎 Deprecated: use audio_sample_rate() instead
Returns the audio sample rate in Hz (e.g. 16 000 for Whisper), or
-1 if audio is not supported.
pub fn audio_sample_rate(&self) -> i32
Returns the audio sample rate in Hz.
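A small capability probe using the accessors above; useful before deciding which media types a prompt may contain:

```rust
// Query modality support before building a multimodal prompt.
fn print_capabilities(ctx: &MtmdContext) {
    println!("vision supported: {}", ctx.supports_vision());
    if ctx.supports_audio() {
        // Only meaningful when audio is supported (otherwise -1).
        println!("audio sample rate: {} Hz", ctx.audio_sample_rate());
    }
}
```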
pub fn decode_use_non_causal(&self) -> bool
Whether llama_decode must use a non-causal attention mask when
decoding image embeddings for this model.
pub fn decode_use_mrope(&self) -> bool
Whether the model uses M-RoPE for llama_decode.
pub fn tokenize(
    &self,
    text: &MtmdInputText<'_>,
    bitmaps: &[&MtmdBitmap],
    output: &mut MtmdInputChunks,
) -> Result<()>
Tokenize a text prompt that contains one or more media markers.
The number of bitmaps must equal the number of media markers in the
prompt text; otherwise MtmdError::TokenizeError(1) is returned.
This call is thread-safe (shared &self).
§Parameters
- text – text + tokenisation options
- bitmaps – slice of MtmdBitmap references, one per media marker
- output – an MtmdInputChunks that will be populated with the result
§Errors
Returns MtmdError::TokenizeError if tokenization fails.
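A tokenization sketch for a prompt containing exactly one image. The MtmdInputChunks constructor (Default here) and the MtmdError error type are assumptions; the one-marker-per-bitmap rule comes from the docs above:

```rust
fn tokenize_image_prompt(
    ctx: &MtmdContext,
    text: &MtmdInputText<'_>, // prompt must contain one default_marker() per bitmap
    image: &MtmdBitmap,
) -> Result<MtmdInputChunks, MtmdError> {
    let mut chunks = MtmdInputChunks::default(); // constructor assumed
    // One media marker in the prompt <=> exactly one bitmap here,
    // or tokenize returns MtmdError::TokenizeError(1).
    ctx.tokenize(text, &[image], &mut chunks)?;
    Ok(chunks)
}
```

Since tokenize takes &self and is documented as thread-safe, this helper can be called from multiple threads over a shared context.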
pub fn encode_chunk(&self, chunk: &MtmdInputChunk<'_>) -> Result<()>
Encode a single input chunk (image or audio) and store the resulting embeddings inside the context.
After a successful call, the embeddings can be retrieved with
MtmdContext::output_embd.
This call is NOT thread-safe.
§Errors
Returns MtmdError::EncodeError if encoding fails.
pub fn output_embd(&self, n_elements: usize) -> &[f32]
Return a slice over the embeddings produced by the last
encode_chunk call.
The length (in f32 elements) is:
n_embd_inp(model) * chunk.n_tokens()
§Safety
The returned slice is valid until the next call that mutates the
context (e.g. another encode_chunk).
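A sketch pairing encode_chunk with output_embd. The way to obtain n_embd_inp from the text model is an assumption; the length formula and the copy-before-next-mutation rule come from the docs above:

```rust
// Encode one media chunk and immediately copy its embeddings out.
fn encode_to_vec(
    ctx: &MtmdContext,
    chunk: &MtmdInputChunk<'_>,
    n_embd_inp: usize, // assumed to be queried from the text model
) -> Result<Vec<f32>, MtmdError> {
    ctx.encode_chunk(chunk)?; // NOT thread-safe: serialise callers
    // Slice length per the output_embd docs: n_embd_inp * n_tokens.
    let n = n_embd_inp * chunk.n_tokens();
    // Copy now: the slice is invalidated by the next mutating call
    // (e.g. another encode_chunk).
    Ok(ctx.output_embd(n).to_vec())
}
```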
pub fn eval_chunks(
    &self,
    lctx: *mut llama_context,
    chunks: &MtmdInputChunks,
    n_past: i32,
    seq_id: i32,
    n_batch: i32,
    logits_last: bool,
    new_n_past: &mut i32,
) -> Result<()>
High-level helper: evaluate (decode) all chunks in sequence.
- Text chunks are decoded via llama_decode.
- Image/audio chunks are first encoded with mtmd_encode_chunk and then decoded via llama_decode.
On success new_n_past is updated with the new past position.
This call is NOT thread-safe.
§Parameters
- lctx – raw pointer to the llama context (from LlamaContext::as_ptr)
- chunks – the tokenized chunks to evaluate
- n_past – current KV-cache position
- seq_id – sequence ID
- n_batch – maximum batch size (must be ≥ 1)
- logits_last – if true, compute logits only for the final token
- new_n_past – updated KV-cache position after the call
§Errors
Returns MtmdError::EvalError if evaluation fails.
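A typical prefill sketch using eval_chunks; the batch size of 512 is an arbitrary example value, and obtaining the raw pointer via LlamaContext::as_ptr follows the parameter list above:

```rust
// Prefill the llama context with all tokenized chunks.
fn prefill(
    mctx: &MtmdContext,
    lctx: *mut llama_context, // e.g. from LlamaContext::as_ptr
    chunks: &MtmdInputChunks,
) -> Result<i32, MtmdError> {
    let mut new_n_past = 0;
    mctx.eval_chunks(
        lctx,
        chunks,
        0,    // n_past: starting from an empty KV cache
        0,    // seq_id
        512,  // n_batch: example value, must be >= 1
        true, // logits_last: logits for the final token only
        &mut new_n_past,
    )?;
    Ok(new_n_past) // continue autoregressive sampling from this position
}
```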
pub fn eval_chunk_single(
    &self,
    lctx: *mut llama_context,
    chunk: &MtmdInputChunk<'_>,
    n_past: i32,
    seq_id: i32,
    n_batch: i32,
    logits_last: bool,
    new_n_past: &mut i32,
) -> Result<()>
High-level helper: evaluate a single chunk.
Works identically to eval_chunks but operates on
one chunk at a time.
§Errors
Returns MtmdError::EvalError if evaluation fails.
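A per-chunk variant of the prefill above, threading n_past between calls. The iter() method on MtmdInputChunks is an assumption; the n_past bookkeeping mirrors what eval_chunks does internally:

```rust
// Manual chunk-by-chunk evaluation (e.g. to report progress per chunk).
fn prefill_stepwise(
    mctx: &MtmdContext,
    lctx: *mut llama_context,
    chunks: &MtmdInputChunks,
) -> Result<i32, MtmdError> {
    let mut n_past = 0;
    for chunk in chunks.iter() { // iterator assumed
        let mut next = n_past;
        // logits_last = false here; request logits only where needed.
        mctx.eval_chunk_single(lctx, &chunk, n_past, 0, 512, false, &mut next)?;
        n_past = next; // carry the updated KV position forward
    }
    Ok(n_past)
}
```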
pub fn as_ptr(&self) -> *mut mtmd_context
Returns a raw pointer to the underlying mtmd_context.
§Safety
The returned pointer is valid for the lifetime of this MtmdContext.
The caller must not free it.