Crate hanzo_engine

Re-exports

pub use files::format_from_name;
pub use files::is_text_mime;
pub use files::mime_for_format;
pub use files::File;
pub use files::FileContent;
pub use files::FileSource;
pub use files::FileStore;
pub use files::RequestedFile;
pub use files::MODEL_INLINE_BYTES;
pub use files::WIRE_EMBED_LIMIT_BYTES;
pub use speculative::MtpConfig;
pub use speculative::SpeculativeConfig;
pub use llguidance;

Modules

disk_kv_cache
distributed
files
Typed file outputs from agentic runs.
layers
matformer
reasoning_parsers
Unified reasoning/thinking parser framework.
speculative
speech_utils

Structs

AddModelConfig
Configuration for adding a model to Hanzo
AgentToolApproval
AgentToolApprovalDecision
AgentToolApprovalRequest
AgentToolMetadata
AgenticSessionStore
Agentic conversation state, keyed by session ID. Also supports content-based matching for clients that don’t pass an ID.
AgenticToolCallRecord
AnyMoeConfig
AnyMoeLoader
AnyMoePipeline
ApproximateUserLocation
AudioInput
Raw audio input consisting of PCM samples and a sample rate.
AutoLoader
Automatically selects the appropriate loader based on repository/config metadata.
AutoLoaderBuilder
AutoTuneRequest
AutoTuneResult
BuildInfo
CalledFunction
Called function with name and arguments
ChatCompletionChunkResponse
Chat completion streaming response chunk.
ChatCompletionResponse
An OpenAI compatible chat completion response.
ChatTemplate
Template for chat models including bos/eos/unk as well as the chat template.
Choice
Chat completion choice.
ChunkChoice
Chat completion streaming chunk choice.
CodeExecutionApproval
CodeExecutionApprovalRequest
CodeExecutionConfig
Python code execution config.
CompletionChoice
Completion request choice.
CompletionChunkChoice
Completion streaming chunk choice.
CompletionChunkResponse
Completion streaming response chunk.
CompletionResponse
An OpenAI compatible completion response.
CpuInfo
Delta
Delta in content for streaming response.
DetokenizationRequest
Request to detokenize some tokens into text.
DeviceInfo
DeviceLayerMapMetadata
DeviceMapMetadata
Metadata to initialize the device mapper.
DiffusionGenerationParams
DiffusionLoader
A loader for a diffusion (non-quantized) model.
DiffusionLoaderBuilder
A builder for a loader for a diffusion (non-quantized) model.
DoctorCheck
DoctorReport
DrySamplingParams
Parameters for DRY (Don’t Repeat Yourself) sampling to reduce repetition.
EmbeddingLoader
A loader for an embedding (non-quantized) model.
EmbeddingLoaderBuilder
A builder for a loader for an embedding (non-quantized) model.
EmbeddingModelPaths
All local paths and metadata necessary to load an embedding model.
EmbeddingSpecificConfig
Config specific to loading an embedding model.
EngineConfig
Configuration for creating an engine instance
Function
Function definition for a tool
GGMLLoader
A loader for a GGML model.
GGMLLoaderBuilder
A builder for a GGML loader.
GGMLSpecificConfig
Config for a GGML loader.
GGUFLoader
Loader for a GGUF model.
GGUFLoaderBuilder
A builder for a GGUF loader.
GGUFSpecificConfig
Config for a GGUF loader.
GemmaLoader
NormalLoader for a Gemma model.
Hanzo
The Hanzo struct handles sending requests to multiple engines. It is the core multi-threaded component of hanzo, and uses mpsc Sender and Receiver primitives to send and receive requests to the appropriate engine based on model ID.
HanzoBuilder
The HanzoBuilder takes the pipeline and a scheduler method and constructs an Engine and a Hanzo instance. The Engine runs on a separate thread, and the Hanzo instance stays on the calling thread.
HanzoConfig
HfConnectivityInfo
Idefics2Loader
MultimodalLoader for an Idefics 2 Vision model.
ImageChoice
ImageGenerationResponse
IntervalLogger
LLaVALoader
MultimodalLoader for an LLaVA Vision model.
LLaVANextLoader
MultimodalLoader for an LLaVANext Vision model.
LayerDeviceMapper
A device mapper which does device mapping per hidden layer.
LayerTopology
LlamaLoader
NormalLoader for a Llama model.
LoaderBuilder
A builder for a loader using the selected model.
LocalModelPaths
All local paths and metadata necessary to load a model.
Logprobs
Logprobs per token.
LoraAdapterPaths
McpClient
MCP client that manages connections to multiple MCP servers
McpClientConfig
Configuration for MCP client integration
McpServerConfig
Configuration for an individual MCP server
McpToolInfo
Information about a tool discovered from an MCP server
MemoryInfo
MemoryUsage
MistralLoader
MixtralLoader
Modalities
ModelGenerationDefaults
Optional generation defaults parsed from a model’s generation_config.json.
ModelLoaderConfig
Configuration for recreating a model loader when reloading an unloaded model. This captures the essential parameters needed to reconstruct a loader.
MultimodalLoader
A loader for a multimodal (non-quantized) model.
MultimodalLoaderBuilder
A builder for a loader for a multimodal (non-quantized) model.
MultimodalSpecificConfig
Config specific to loading a multimodal model.
NormalLoader
A loader for a “normal” (non-quantized) model.
NormalLoaderBuilder
A builder for a loader for a “normal” (non-quantized) model.
NormalRequest
A normal request to Hanzo.
NormalSpecificConfig
Config specific to loading a normal model.
Ordering
Adapter model ordering information.
PagedAttentionConfig
All memory amounts are in MB. The default block size is 32.
Phi2Loader
NormalLoader for a Phi 2 model.
Phi3Loader
NormalLoader for a Phi 3 model.
Phi3VLoader
MultimodalLoader for a Phi 3 Vision model.
Qwen2Loader
NormalLoader for a Qwen 2 model.
ResponseLogprob
A logprob with the top logprobs for this token.
ResponseMessage
Chat completion response message.
SamplingParams
Sampling params are used to control sampling.
SandboxPolicy
Policy applied to a sandboxed process.
SearchFunctionParameters
SearchResult
SerializedSession
Wire format. Images and video frames are base64 PNGs.
SerializedVideo
SpeechLoader
SpeechPipeline
Starcoder2Loader
NormalLoader for a Starcoder2 model.
SystemInfo
TokenizationRequest
Request to tokenize some messages or some text.
Tool
Tool definition
ToolCallContext
Context provided to tool callbacks by the agentic loop.
ToolCallResponse
ToolCallbackWithTool
A tool callback with its associated Tool definition.
TopLogprob
Top-n logprobs element
Topology
TuneCandidate
A tuning candidate with all calculated metrics
UnloadedModelState
State preserved when a model is unloaded. This contains all the information needed to reload the model on demand.
Usage
OpenAI compatible (superset) usage during a request.
VideoInput
Decoded video input: a sequence of frames with metadata for timestamp generation.
WebSearchOptions
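
PagedAttentionConfig gives memory amounts in MB and a default block size of 32. To make those numbers concrete, here is a rough, hypothetical estimate of how many KV-cache blocks fit in a memory budget. The function name and parameters, the reading of "MB" as MiB, and the cost model (one key and one value vector per KV head in every layer) are assumptions for illustration only, not this crate's API.

```rust
/// Hypothetical helper: number of KV-cache blocks that fit in `mem_mb`.
/// Assumes "MB" means MiB (1024 * 1024 bytes).
fn num_kv_blocks(
    mem_mb: usize,
    block_size: usize,   // tokens per block; the documented default is 32
    num_layers: usize,
    num_kv_heads: usize,
    head_dim: usize,
    dtype_bytes: usize,  // e.g. 2 for f16 or bf16
) -> usize {
    // Each token stores a key and a value vector per KV head in every layer.
    let bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes;
    let bytes_per_block = bytes_per_token * block_size;
    let budget_bytes = mem_mb * 1024 * 1024;
    // Integer division: partial blocks are not usable.
    budget_bytes / bytes_per_block
}
```

For a model with 32 layers, 8 KV heads, a head dimension of 128 and a 2-byte dtype, each token costs 128 KiB, so `num_kv_blocks(4096, 32, 32, 8, 128, 2)` yields 1024 blocks of 32 tokens.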

Enums

AdapterPaths
AgentPermission
AgentToolApprovalHandler
AgentToolKind
AgentToolSource
AgenticToolCallData
Tool-specific structured progress data for agentic tool calls.
AgenticToolCallPhase
Phase of an agentic tool call.
AnyMoeExpertType
AutoDeviceMapParams
CodeExecutionPermission
Constraint
Control the constraint with llguidance.
DefaultSchedulerMethod
The scheduler method controls how sequences are scheduled during each step of the engine. At each scheduling step, the scheduler method is consulted only when there are both running and waiting sequences; it is not used when there are only running sequences, only waiting sequences, or none. When it is consulted, it decides which waiting sequences are allowed to run.
DeviceMapSetting
DiffusionLoaderType
The architecture to load the diffusion model as.
DoctorStatus
EmbeddingLoaderType
The architecture to load the embedding model as.
EngineInstruction
FitStatus
Fit status for a quantization candidate
GGUFArchitecture
HanzoError
ImageGenerationResponseFormat
Image generation response format
IsqBits
Target bit width for automatic ISQ quantization.
IsqOrganization
IsqType
In-situ quantization type specifying the format to apply to model weights.
McpServerSource
Supported MCP server transport sources
MemoryGpuConfig
ModelCategory
Category of the model. This can also be used to extract model-category specific tools, such as the multimodal model prompt prefixer.
ModelDType
DType for the model.
ModelKind
The kind of model to build.
ModelSelected
ModelStatus
Model status for loaded/unloaded state
MultimodalLoaderType
The architecture to load the multimodal model as.
NetworkMode
Network access permitted to sandboxed processes.
NormalLoaderType
The architecture to load the normal model as.
PagedCacheType
QualityTier
Quality tier for a quantization level
ReasoningEffort
Reasoning effort level for models that support it (e.g., GPT-OSS with Harmony format). Controls the depth of reasoning/analysis in the model’s response.
Request
A request to the Engine, encapsulating the various parameters as well as the mpsc response Sender used to return the Response.
RequestMessage
Message or messages for a Request.
Response
The response enum contains 3 types of variants:
ResponseErr
ResponseOk
SchedulerConfig
SearchContextSize
SearchEmbeddingModel
Embedding model used for ranking web search results internally.
SpeechGenerationConfig
SpeechLoaderType
StopTokens
Stop sequences or ids.
SupportedModality
TokenSource
The source of the HF token.
ToolCallType
The type of a tool call (currently only function calls).
ToolCallbackKind
Wraps either a text-only or multimodal tool callback.
ToolChoice
ToolOutput
Tool output: text-only or multimodal.
ToolType
Type of tool
TuneProfile
WebSearchUserLocation
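
The DefaultSchedulerMethod description amounts to a decision rule over the counts of running and waiting sequences. A minimal sketch of that reading follows. `StepPlan`, `plan_step` and the fixed-capacity `admit_count` policy are hypothetical names chosen for illustration; they are not the crate's types or its actual scheduling policy.

```rust
/// Hypothetical outcome of one scheduling step.
#[derive(Debug, PartialEq)]
enum StepPlan {
    Idle,             // no sequences at all
    RunAll,           // only running sequences: keep running them
    AdmitWaiting,     // only waiting sequences: start them
    ConsultScheduler, // both present: the scheduler method decides
}

/// The scheduler method is only consulted when both kinds of sequences exist.
fn plan_step(running: usize, waiting: usize) -> StepPlan {
    match (running > 0, waiting > 0) {
        (false, false) => StepPlan::Idle,
        (true, false) => StepPlan::RunAll,
        (false, true) => StepPlan::AdmitWaiting,
        (true, true) => StepPlan::ConsultScheduler,
    }
}

/// Hypothetical fixed-capacity policy: admit as many waiting sequences
/// as free slots allow, never more than are actually waiting.
fn admit_count(capacity: usize, running: usize, waiting: usize) -> usize {
    capacity.saturating_sub(running).min(waiting)
}
```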

Constants

DEFAULT_MAX_TOOL_ROUNDS
Default cap on tool-use rounds when the request doesn’t set one.
GGUF_MULTI_FILE_DELIMITER
HANZO_GIT_REVISION
HF_HUB_OFFLINE_ENV
Env variable that, when set to a truthy value, disables all network calls to the Hugging Face Hub. Only cached files are used.
MULTI_LORA_DELIMITER
SYSTEM_FINGERPRINT
UQFF_MULTI_FILE_DELIMITER
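
HF_HUB_OFFLINE_ENV presumably names the `HF_HUB_OFFLINE` variable that is_hf_hub_offline reads, and the documented rule is that `1`, `true`, `yes` or `on` (case-insensitive) mean offline while anything else, or an unset variable, means online. The sketch below mirrors that rule; `is_truthy` and `hub_offline_from_env` are hypothetical helpers, not functions exported by this crate.

```rust
/// Hypothetical mirror of the documented truthiness rule.
/// `None` models an unset variable, which counts as online.
fn is_truthy(value: Option<&str>) -> bool {
    match value {
        Some(v) => matches!(
            v.to_ascii_lowercase().as_str(),
            "1" | "true" | "yes" | "on"
        ),
        None => false,
    }
}

/// Reads the named variable from the process environment.
/// Unset or non-UTF-8 values are treated as online.
fn hub_offline_from_env(var_name: &str) -> bool {
    is_truthy(std::env::var(var_name).ok().as_deref())
}
```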

Statics

ENGINE_INSTRUCTIONS
Engine instructions, per Engine (Hanzo) ID.
GLOBAL_HF_CACHE
TERMINATE_ALL_NEXT_STEP
Terminate all sequences on the next scheduling step. Be sure to reset this. This is a global flag for terminating all engines at once (e.g., Ctrl+C).
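
TERMINATE_ALL_NEXT_STEP is described as a global flag that every engine observes and that must be reset after use. A common way to model such a flag in Rust is an `AtomicBool`; the sketch below assumes that representation and uses hypothetical names (`TERMINATE_ALL`, `request_terminate_all`, and so on) rather than the crate's own. The atomic `Ordering` is imported under an alias to avoid confusion with this crate's `Ordering` struct.

```rust
use std::sync::atomic::{AtomicBool, Ordering as AtomicOrdering};

/// Hypothetical stand-in for a process-wide "terminate everything" flag.
static TERMINATE_ALL: AtomicBool = AtomicBool::new(false);

/// Set the flag, e.g. from a Ctrl+C handler.
fn request_terminate_all() {
    TERMINATE_ALL.store(true, AtomicOrdering::SeqCst);
}

/// Checked by each engine at the start of a scheduling step.
fn terminate_all_requested() -> bool {
    TERMINATE_ALL.load(AtomicOrdering::SeqCst)
}

/// Clear the flag once sequences have been terminated, so later
/// requests are not cancelled as well.
fn reset_terminate_all() {
    TERMINATE_ALL.store(false, AtomicOrdering::SeqCst);
}
```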

Traits

CustomLogitsProcessor
Customizable logits processor.
Loader
The Loader trait abstracts the loading process. The primary entrypoint is the load_model method.
ModelPaths
ModelPaths abstracts the mechanism to get all necessary files for running a model. For example LocalModelPaths implements ModelPaths when all files are in the local file system.
MultimodalPromptPrefixer
Prepend a vision tag appropriate for the model to the prompt. Image indices are assumed to start at 0.
Pipeline
TryIntoDType
Type which can be converted to a DType
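
MultimodalPromptPrefixer prepends a model-specific vision tag to the prompt and receives 0-based image indices. The sketch below uses a hypothetical local trait with a similar purpose; its signature and the `<|image_N|>` tag format (1-based, as some vision models use) are assumptions for illustration, not the crate's trait definition.

```rust
/// Hypothetical local trait modelled on the documented behavior.
trait PromptPrefixer {
    fn prefix_image(&self, image_indices: &[usize], prompt: &str) -> String;
}

/// Hypothetical model whose image tags count from 1.
struct NumberedTagPrefixer;

impl PromptPrefixer for NumberedTagPrefixer {
    fn prefix_image(&self, image_indices: &[usize], prompt: &str) -> String {
        let mut out = String::new();
        for &idx in image_indices {
            // Indices arrive 0-based; this tag format is 1-based.
            out.push_str(&format!("<|image_{}|>", idx + 1));
        }
        out.push_str(prompt);
        out
    }
}
```

Keeping the indices 0-based at the trait boundary lets each implementation apply its own numbering convention in one place.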

Functions

auto_tune
check_hf_gated_access
Check HuggingFace connectivity and token validity by accessing a gated model
collect_system_info
expand_isq_value
Expand an ISQ specifier into concrete IsqType variants. Numeric shorthands (2-8) produce both the non-Metal and Metal variants; explicit method names resolve to a single variant.
get_auto_device_map_params
get_engine_terminate_flag
Get or create a termination flag for the current engine thread.
get_model_dtype
get_tgt_non_granular_index
get_toml_selected_model_device_map_params
get_toml_selected_model_dtype
hf_home_dir
Resolve the Hugging Face home directory.
hf_hub_cache_dir
Resolve the Hugging Face Hub cache directory.
hf_token_path
Resolve the Hugging Face token file path.
initialize_logging
This should be called to initialize the debug flag and logging. This should not be called in hanzo-engine code due to Rust usage.
is_hf_hub_offline
Returns true when the user has requested fully-offline operation via HF_HUB_OFFLINE. Accepted truthy values: 1, true, yes, on (case-insensitive). Anything else, or unset, is treated as online.
paged_attn_supported
true if built with CUDA (requires Unix) or Metal support.
parse_isq_value
Parse ISQ value.
parse_uqff_shard
Given a UQFF filename like "q4k-0.uqff", returns Some(("q4k", 0)). Returns None for non-sharded filenames like "model.uqff" where the suffix after the last - is not a number.
probe_hf_repo_files
Best-effort file listing for a HF repo. Returns None on 404, API failure, or offline-without-cache. Quiet by design: callers choose what to log.
reset_engine_terminate_flag
Reset termination flags for the current engine.
resolve_uqff_shorthand
Resolve a UQFF shorthand (numeric like "8" or ISQ name like "q4k") to an actual UQFF filename from the available files list.
run_doctor
sample_frame_indices
Sample num_frames frame indices uniformly from a video with total_frames frames.
should_terminate_engine_sequences
Check if the current engine should terminate sequences.
using_flash_attn
true if built with the flash-attn or flash-attn-v3 features, false otherwise.
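
Two of the helpers above have behavior precise enough to sketch. The versions below are hypothetical re-implementations, not the crate's code: `parse_uqff_shard_sketch` follows the documented examples and assumes the `.uqff` extension is required, and `sample_frame_indices_sketch` uses one common uniform scheme, `floor(i * total / num)`, which may differ from the crate's exact rounding.

```rust
/// "q4k-0.uqff" -> Some(("q4k", 0)); "model.uqff" -> None.
/// Returns None when the text after the last '-' is not a number.
fn parse_uqff_shard_sketch(filename: &str) -> Option<(&str, usize)> {
    let stem = filename.strip_suffix(".uqff")?;
    let (prefix, index) = stem.rsplit_once('-')?;
    let index = index.parse::<usize>().ok()?;
    Some((prefix, index))
}

/// Pick `num_frames` indices spread evenly over `total_frames` frames.
/// If `num_frames > total_frames`, some indices repeat.
fn sample_frame_indices_sketch(total_frames: usize, num_frames: usize) -> Vec<usize> {
    if total_frames == 0 || num_frames == 0 {
        return Vec::new();
    }
    (0..num_frames)
        .map(|i| i * total_frames / num_frames)
        .collect()
}
```

For example, sampling 4 frames from a 10-frame video with this scheme gives indices `[0, 2, 5, 7]`.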

Type Aliases

AgentToolApprovalAsyncCallback
AgentToolApprovalCallback
AgentToolApprovalFuture
AgentToolApprovalNotifier
CodeExecutionApprovalCallback
CodeExecutionApprovalNotifier
LlguidanceGrammar
MessageContent
MultimodalToolCallback
Callback that can return multimodal output (text + images).
SearchCallback
Callback used to override how search results are gathered. The returned vector must be sorted in decreasing order of relevance.
ToolCallback
Custom tool callback. Receives the called function and returns the tool output as a string.
ToolCallbacks
Collection of callbacks keyed by tool name.
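
ToolCallback receives the called function and returns the tool output as a string, and ToolCallbacks keys those callbacks by tool name. The sketch below shows how such a registry might be dispatched. `CalledFunctionSketch`, the `Result<String, String>` return type and `dispatch` are hypothetical stand-ins; the crate's real aliases may use different error and pointer types.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for CalledFunction: a name plus JSON-encoded arguments.
struct CalledFunctionSketch {
    name: String,
    arguments: String,
}

/// Hypothetical shapes mirroring the documented aliases.
type ToolCallbackSketch =
    Arc<dyn Fn(&CalledFunctionSketch) -> Result<String, String> + Send + Sync>;
type ToolCallbacksSketch = HashMap<String, ToolCallbackSketch>;

/// Look up the callback by tool name and run it; unknown tools are an error.
fn dispatch(
    callbacks: &ToolCallbacksSketch,
    call: &CalledFunctionSketch,
) -> Result<String, String> {
    let cb = callbacks
        .get(&call.name)
        .ok_or_else(|| format!("no callback registered for tool `{}`", call.name))?;
    cb(call)
}
```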