Module models

Structs

BenchTuneConfig
BenchTuneMetrics
BenchTuneParam
BenchTuneParamValue
BenchTuneResult
DiscoveredModel
A discovered model file.
DownloadState
Download progress information.
GPUBuffer
GPU device buffer reported by llama-server during model loading.
GgufMetadata
Parsed GGUF metadata for a model, cached to avoid re-parsing the file.
LoadProgress
Progress information during model loading, parsed from llama-server log output.
ModelSettings
Settings for loading a model via llama.cpp server.
Samplers
Sampler order string (semicolon-separated). Common types: penalties, dry, top_n_sigma, top_k, typ_p, top_p, min_p, xtc, temperature
SearchResult
A model found via HuggingFace search.
ServerMetrics
Metrics reported by the llama.cpp server.
WsMetrics
WebSocket-friendly metrics snapshot (serializable, no internal state).
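
The Samplers entry above describes the sampler order as a semicolon-separated string. As a rough illustration of that format only, the sketch below splits such a string into individual sampler names. The function name `parse_sampler_order` is hypothetical and is not part of this module; trimming whitespace and skipping empty segments are assumptions, not documented behavior.

```rust
/// Hypothetical helper (not part of the `models` module): split a
/// semicolon-separated sampler order string into sampler names.
/// Assumption: surrounding whitespace and empty segments are ignored.
fn parse_sampler_order(order: &str) -> Vec<&str> {
    order
        .split(';')
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .collect()
}
```

For example, calling `parse_sampler_order("top_k;top_p;min_p;temperature")` yields the four names in the order they were written, which is the order in which the samplers would be applied.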

Enums

Backend
Backend used to run the llama.cpp server.
BenchTuneMode
BenchTuneProgress
Progress status for benchmark tuning.
BenchTuneStatus
CacheQuantType
KV cache quantization type.
CacheType
Main KV cache data type.
DownloadStatus
GpuLayersMode
How to handle GPU layer offloading.
Mirostat
Mirostat version.
ModelState
The state of a model in the manager.
NumMode
NUMA optimization mode.
RopeScaling
RoPE frequency scaling method.
SearchSort
Sort order for search results.
ServerMode
Server mode: normal (single model) or router (multiple models).
SplitMode
Split mode for multi-GPU.

Constants

BENCHMARK_PROMPT
Default benchmark prompt used when starting a tuning session.

Functions

clean_host
Ensure a host string is valid for URL construction and CLI arguments. Defaults an empty string to 127.0.0.1, strips display suffixes, and wraps IPv6 addresses in brackets.
estimate_vram_mib
Estimate VRAM usage (in MiB) for a model with the given settings.
format_host
Format a host string for display (e.g. an empty string or “127.0.0.1” becomes “localhost (127.0.0.1)”).
strip_gguf
Strip the .gguf extension from a model name.
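
The behavior described for `clean_host`, `format_host`, and `strip_gguf` can be approximated with the sketch below. This is not the module's implementation, and the real signatures are not shown on this page. The `_sketch` names are hypothetical. Three details are assumptions: a "display suffix" is taken to be a trailing parenthesised note such as ` (127.0.0.1)`, any other host is returned unchanged by the display formatter, and the `.gguf` match is case-sensitive.

```rust
use std::net::Ipv6Addr;

/// Sketch of the documented `clean_host` behavior (hypothetical name).
fn clean_host_sketch(host: &str) -> String {
    // Assumption: a display suffix starts at the first " (" and runs to the end.
    let base = match host.find(" (") {
        Some(idx) => &host[..idx],
        None => host,
    };
    let base = base.trim();

    // An empty host defaults to the loopback address.
    if base.is_empty() {
        return "127.0.0.1".to_string();
    }

    // Bare IPv6 literals are wrapped in brackets so they can be used in URLs.
    // Already-bracketed input fails to parse and is returned unchanged.
    if base.parse::<Ipv6Addr>().is_ok() {
        return format!("[{}]", base);
    }

    base.to_string()
}

/// Sketch of the documented `format_host` behavior (hypothetical name).
/// Assumption: hosts other than "" and "127.0.0.1" are returned as-is.
fn format_host_sketch(host: &str) -> String {
    match host {
        "" | "127.0.0.1" => "localhost (127.0.0.1)".to_string(),
        other => other.to_string(),
    }
}

/// Sketch of the documented `strip_gguf` behavior (hypothetical name).
/// Assumption: only a lowercase ".gguf" suffix is removed.
fn strip_gguf_sketch(name: &str) -> &str {
    name.strip_suffix(".gguf").unwrap_or(name)
}
```

Under these assumptions, an empty host passed to `clean_host_sketch` becomes `127.0.0.1`, `::1` becomes `[::1]`, and `strip_gguf_sketch` leaves names without the extension untouched.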

Type Aliases

CacheTypeK
CacheTypeV