Crate hanzo_engine

Re-exports§

pub use files::format_from_name;
pub use files::is_text_mime;
pub use files::mime_for_format;
pub use files::File;
pub use files::FileContent;
pub use files::FileSource;
pub use files::FileStore;
pub use files::RequestedFile;
pub use files::MODEL_INLINE_BYTES;
pub use files::WIRE_EMBED_LIMIT_BYTES;
pub use speculative::MtpConfig;
pub use speculative::SpeculativeConfig;
pub use llguidance;

Modules§

disk_kv_cache
distributed
files: Typed file outputs from agentic runs.
layers
matformer
reasoning_parsers: Unified reasoning/thinking parser framework.
speculative
speech_utils

Structs§

AddModelConfig: Configuration for adding a model to Hanzo
AgentToolApproval
AgentToolApprovalDecision
AgentToolApprovalRequest
AgentToolMetadata
AgenticSessionStore: Agentic conversation state, keyed by session ID. Also supports content-based matching for clients that don’t pass an ID.
AgenticToolCallRecord
AnyMoeConfig
AnyMoeLoader
AnyMoePipeline
ApproximateUserLocation
AudioInput: Raw audio input consisting of PCM samples and a sample rate.
AutoLoader: Automatically selects the appropriate loader based on repository/config metadata.
AutoLoaderBuilder
AutoTuneRequest
AutoTuneResult
BuildInfo
CalledFunction: Called function with name and arguments
ChatCompletionChunkResponse: Chat completion streaming request chunk.
ChatCompletionResponse: An OpenAI compatible chat completion response.
ChatTemplate: Template for chat models including bos/eos/unk as well as the chat template.
Choice: Chat completion choice.
ChunkChoice: Chat completion streaming chunk choice.
CodeExecutionApproval
CodeExecutionApprovalRequest
CodeExecutionConfig: Python code execution config.
CompletionChoice: Completion request choice.
CompletionChunkChoice: Completion streaming chunk choice.
CompletionChunkResponse: Completion streaming chunk response.
CompletionResponse: An OpenAI compatible completion response.
CpuInfo
Delta: Delta in content for streaming response.
DetokenizationRequest: Request to detokenize some text.
DeviceInfo
DeviceLayerMapMetadata
DeviceMapMetadata: Metadata to initialize the device mapper.
DiffusionGenerationParams
DiffusionLoader: A loader for a diffusion (non-quantized) model.
DiffusionLoaderBuilder: A builder for a loader for a diffusion (non-quantized) model.
DoctorCheck
DoctorReport
DrySamplingParams: Parameters for DRY (Don’t Repeat Yourself) sampling to reduce repetition.
EmbeddingLoader: A loader for an embedding (non-quantized) model.
EmbeddingLoaderBuilder: A builder for a loader for an embedding (non-quantized) model.
EmbeddingModelPaths: All local paths and metadata necessary to load an embedding model.
EmbeddingSpecificConfig: Config specific to loading an embedding model.
EngineConfig: Configuration for creating an engine instance
Function: Function definition for a tool
GGMLLoader: A loader for a GGML model.
GGMLLoaderBuilder: A builder for a GGML loader.
GGMLSpecificConfig: Config for a GGML loader.
GGUFLoader: Loader for a GGUF model.
GGUFLoaderBuilder: A builder for a GGUF loader.
GGUFSpecificConfig: Config for a GGUF loader.
GemmaLoader: NormalLoader for a Gemma model.
Hanzo: The Hanzo struct handles sending requests to multiple engines. It is the core multi-threaded component of hanzo, and uses mpsc Sender and Receiver primitives to send and receive requests to the appropriate engine based on model ID.
HanzoBuilder: The HanzoBuilder takes the pipeline and a scheduler method and constructs an Engine and a Hanzo instance. The Engine runs on a separate thread, and the Hanzo instance stays on the calling thread.
HanzoConfig
HfConnectivityInfo
Idefics2Loader: MultimodalLoader for an Idefics 2 Vision model.
ImageChoice
ImageGenerationResponse
IntervalLogger
LLaVALoader: MultimodalLoader for an LLaVA Vision model.
LLaVANextLoader: MultimodalLoader for an LLaVANext Vision model.
LayerDeviceMapper: A device mapper which does device mapping per hidden layer.
LayerTopology
LlamaLoader: NormalLoader for a Llama model.
LoaderBuilder: A builder for a loader using the selected model.
LocalModelPaths: All local paths and metadata necessary to load a model.
Logprobs: Logprobs per token.
LoraAdapterPaths
McpClient: MCP client that manages connections to multiple MCP servers
McpClientConfig: Configuration for MCP client integration
McpServerConfig: Configuration for an individual MCP server
McpToolInfo: Information about a tool discovered from an MCP server
MemoryInfo
MemoryUsage
MistralLoader
MixtralLoader
Modalities
ModelGenerationDefaults: Optional generation defaults parsed from a model’s generation_config.json.
ModelLoaderConfig: Configuration for recreating a model loader when reloading an unloaded model. This captures the essential parameters needed to reconstruct a loader.
MultimodalLoader: A loader for a multimodal (non-quantized) model.
MultimodalLoaderBuilder: A builder for a loader for a multimodal (non-quantized) model.
MultimodalSpecificConfig: Config specific to loading a multimodal model.
NormalLoader: A loader for a “normal” (non-quantized) model.
NormalLoaderBuilder: A builder for a loader for a “normal” (non-quantized) model.
NormalRequest: A normal request to Hanzo.
NormalSpecificConfig: Config specific to loading a normal model.
Ordering: Adapter model ordering information.
PagedAttentionConfig: All memory counts in MB. Default for block size is 32.
Phi2Loader: NormalLoader for a Phi 2 model.
Phi3Loader: NormalLoader for a Phi 3 model.
Phi3VLoader: MultimodalLoader for a Phi 3 Vision model.
Qwen2Loader: NormalLoader for a Qwen 2 model.
ResponseLogprob: A logprob with the top logprobs for this token.
ResponseMessage: Chat completion response message.
SamplingParams: Sampling params are used to control sampling.
SandboxPolicy: Policy applied to a sandboxed process.
SearchFunctionParameters
SearchResult
SerializedSession: Wire format. Images and video frames are base64 PNGs.
SerializedVideo
SpeechLoader
SpeechPipeline
Starcoder2Loader: NormalLoader for a Starcoder2 model.
SystemInfo
TokenizationRequest: Request to tokenize some messages or some text.
Tool: Tool definition
ToolCallContext: Context provided to tool callbacks by the agentic loop.
ToolCallResponse
ToolCallbackWithTool: A tool callback with its associated Tool definition.
TopLogprob: Top-n logprobs element
Topology
TuneCandidate: A tuning candidate with all calculated metrics
UnloadedModelState: State preserved when a model is unloaded. This contains all the information needed to reload the model on demand.
Usage: OpenAI compatible (superset) usage during a request.
VideoInput: Decoded video input: a sequence of frames with metadata for timestamp generation.
WebSearchOptions
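
The Hanzo and Request entries describe routing requests to per-model engines over mpsc channels, with each request carrying the Sender used to return its response. The following is a minimal sketch of that pattern only, written against std::sync::mpsc. The names SketchRequest and SketchRouter are hypothetical and are not part of this crate, and the crate's actual channel type and request fields may differ.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical stand-in for a request: a prompt plus the sender
/// that the engine thread uses to return its reply.
struct SketchRequest {
    prompt: String,
    reply_to: Sender<String>,
}

/// Hypothetical router holding one request channel per model ID.
struct SketchRouter {
    engines: HashMap<String, Sender<SketchRequest>>,
}

impl SketchRouter {
    fn new() -> Self {
        Self { engines: HashMap::new() }
    }

    /// Spawn a worker thread for `model_id`. The worker here only echoes
    /// the prompt back, tagged with the model ID.
    fn add_engine(&mut self, model_id: &str) {
        let (tx, rx) = channel::<SketchRequest>();
        let id = model_id.to_string();
        thread::spawn(move || {
            // The loop ends once every sender for this engine is dropped.
            for req in rx {
                let _ = req.reply_to.send(format!("[{id}] {}", req.prompt));
            }
        });
        self.engines.insert(model_id.to_string(), tx);
    }

    /// Route a prompt to the engine registered for `model_id` and block
    /// until it replies. Returns None if no such engine exists.
    fn send(&self, model_id: &str, prompt: &str) -> Option<String> {
        let engine = self.engines.get(model_id)?;
        let (reply_to, reply_rx) = channel();
        engine
            .send(SketchRequest { prompt: prompt.to_string(), reply_to })
            .ok()?;
        reply_rx.recv().ok()
    }
}
```

Creating a fresh reply channel per request keeps responses from different callers separate without any shared state on the engine side.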

Enums§

AdapterPaths
AgentPermission
AgentToolApprovalHandler
AgentToolKind
AgentToolSource
AgenticToolCallData: Tool-specific structured progress data for agentic tool calls.
AgenticToolCallPhase: Phase of an agentic tool call.
AnyMoeExpertType
AutoDeviceMapParams
CodeExecutionPermission
Constraint: Control the constraint with llguidance.
DefaultSchedulerMethod: The scheduler method controls how sequences are scheduled during each step of the engine. At each scheduling step, the scheduler method is used only when both running and waiting sequences are present (that is, not when all sequences are running, all are waiting, or there are none). When it is used, it determines whether waiting sequences are allowed to run.
DeviceMapSetting
DiffusionLoaderType: The architecture to load the diffusion model as.
DoctorStatus
EmbeddingLoaderType: The architecture to load the embedding model as.
EngineInstruction
FitStatus: Fit status for a quantization candidate
GGUFArchitecture
HanzoError
ImageGenerationResponseFormat: Image generation response format
IsqBits: Target bit width for automatic ISQ quantization.
IsqOrganization
IsqType: In-situ quantization type specifying the format to apply to model weights.
McpServerSource: Supported MCP server transport sources
MemoryGpuConfig
ModelCategory: Category of the model. This can also be used to extract model-category specific tools, such as the multimodal model prompt prefixer.
ModelDType: DType for the model.
ModelKind: The kind of model to build.
ModelSelected
ModelStatus: Model status for loaded/unloaded state
MultimodalLoaderType: The architecture to load the multimodal model as.
NetworkMode: Network access permitted to sandboxed processes.
NormalLoaderType: The architecture to load the normal model as.
PagedCacheType
QualityTier: Quality tier for a quantization level
ReasoningEffort: Reasoning effort level for models that support it (e.g., GPT-OSS with Harmony format). Controls the depth of reasoning/analysis in the model’s response.
Request: A request to the Engine, encapsulating the various parameters as well as the mpsc response Sender used to return the Response.
RequestMessage: Message or messages for a Request.
Response: The response enum contains 3 types of variants:
ResponseErr
ResponseOk
SchedulerConfig
SearchContextSize
SearchEmbeddingModel: Embedding model used for ranking web search results internally.
SpeechGenerationConfig
SpeechLoaderType
StopTokens: Stop sequences or ids.
SupportedModality
TokenSource: The source of the HF token.
ToolCallType: The type of a tool call (currently only function calls).
ToolCallbackKind: Wraps either a text-only or multimodal tool callback.
ToolChoice
ToolOutput: Tool output: text-only or multimodal.
ToolType: Type of tool
TuneProfile
WebSearchUserLocation

Constants§

DEFAULT_MAX_TOOL_ROUNDS: Default cap on tool-use rounds when the request doesn’t set one.
GGUF_MULTI_FILE_DELIMITER
HANZO_GIT_REVISION
HF_HUB_OFFLINE_ENV: Env variable that, when set to a truthy value, disables all network calls to the Hugging Face Hub. Only cached files are used.
MULTI_LORA_DELIMITER
SYSTEM_FINGERPRINT
UQFF_MULTI_FILE_DELIMITER

Statics§

ENGINE_INSTRUCTIONS: Engine instructions, per Engine (Hanzo) ID.
GLOBAL_HF_CACHE
TERMINATE_ALL_NEXT_STEP: Terminate all sequences on the next scheduling step. Be sure to reset this. This is a global flag for terminating all engines at once (e.g., Ctrl+C).

Traits§

CustomLogitsProcessor: Customizable logits processor.
Loader: The Loader trait abstracts the loading process. The primary entrypoint is the load_model method.
ModelPaths: ModelPaths abstracts the mechanism to get all necessary files for running a model. For example LocalModelPaths implements ModelPaths when all files are in the local file system.
MultimodalPromptPrefixer: Prepend a vision tag appropriate for the model to the prompt. Image indexing is assumed to start at 0.
Pipeline
TryIntoDType: Type which can be converted to a DType

Functions§

auto_tune
check_hf_gated_access: Check HuggingFace connectivity and token validity by accessing a gated model
collect_system_info
expand_isq_value: Expand an ISQ specifier into concrete IsqType variants. Numeric shorthands (2-8) produce both the non-Metal and Metal variants; explicit method names resolve to a single variant.
get_auto_device_map_params
get_engine_terminate_flag: Get or create a termination flag for the current engine thread.
get_model_dtype
get_tgt_non_granular_index
get_toml_selected_model_device_map_params
get_toml_selected_model_dtype
hf_home_dir: Resolve the Hugging Face home directory.
hf_hub_cache_dir: Resolve the Hugging Face Hub cache directory.
hf_token_path: Resolve the Hugging Face token file path.
initialize_logging: This should be called to initialize the debug flag and logging. This should not be called in hanzo-engine code due to Rust usage.
is_hf_hub_offline: Returns true when the user has requested fully-offline operation via HF_HUB_OFFLINE. Accepted truthy values: 1, true, yes, on (case-insensitive). Anything else, or unset, is treated as online.
paged_attn_supported: true if built with CUDA (requires Unix) or Metal.
parse_isq_value: Parse ISQ value.
parse_uqff_shard: Given a UQFF filename like "q4k-0.uqff", returns Some(("q4k", 0)). Returns None for non-sharded filenames like "model.uqff" where the suffix after the last - is not a number.
probe_hf_repo_files: Best-effort file listing for a HF repo. Returns None on 404, API failure, or offline-without-cache. Quiet by design: callers choose what to log.
reset_engine_terminate_flag: Reset termination flags for the current engine.
resolve_uqff_shorthand: Resolve a UQFF shorthand (numeric like "8" or ISQ name like "q4k") to an actual UQFF filename from the available files list.
run_doctor
sample_frame_indices: Sample num_frames frame indices uniformly from a video with total_frames frames.
should_terminate_engine_sequences: Check if the current engine should terminate sequences.
using_flash_attn: true if built with the flash-attn or flash-attn-v3 features, false otherwise.
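
The behaviour documented for parse_uqff_shard and is_hf_hub_offline can be illustrated with standalone functions. This is a sketch of the documented rules, not the crate's implementation. The names parse_shard_sketch and is_truthy_sketch are hypothetical, the usize index type is an assumption, and the real is_hf_hub_offline reads the environment variable itself rather than taking the value as an argument.

```rust
/// Sketch of the documented shard rule: "q4k-0.uqff" yields Some(("q4k", 0)),
/// while a name whose suffix after the last '-' is not a number yields None.
/// The `.uqff` extension check and the `usize` index type are assumptions.
fn parse_shard_sketch(filename: &str) -> Option<(&str, usize)> {
    let stem = filename.strip_suffix(".uqff")?;
    // Split on the last '-' so prefixes may themselves contain dashes.
    let (prefix, index) = stem.rsplit_once('-')?;
    let index = index.parse::<usize>().ok()?;
    Some((prefix, index))
}

/// Sketch of the documented truthy rule for HF_HUB_OFFLINE:
/// 1, true, yes, on (case-insensitive) mean offline; anything else,
/// or an unset variable (None here), means online.
fn is_truthy_sketch(value: Option<&str>) -> bool {
    match value {
        Some(v) => matches!(
            v.to_ascii_lowercase().as_str(),
            "1" | "true" | "yes" | "on"
        ),
        None => false,
    }
}
```

In real use the second function would typically be fed from std::env::var on the variable named by HF_HUB_OFFLINE_ENV; passing the value in explicitly keeps the sketch free of process-wide state.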

Type Aliases§

AgentToolApprovalAsyncCallback
AgentToolApprovalCallback
AgentToolApprovalFuture
AgentToolApprovalNotifier
CodeExecutionApprovalCallback
CodeExecutionApprovalNotifier
LlguidanceGrammar
MessageContent
MultimodalToolCallback: Callback that can return multimodal output (text + images).
SearchCallback: Callback used to override how search results are gathered. The returned vector must be sorted in decreasing order of relevance.
ToolCallback: Custom tool callback. Receives the called function and returns the tool output as a string.
ToolCallbacks: Collection of callbacks keyed by tool name.
