Crate hanzo_engine

Re-exports
pub use files::format_from_name;
pub use files::is_text_mime;
pub use files::mime_for_format;
pub use files::File;
pub use files::FileContent;
pub use files::FileSource;
pub use files::FileStore;
pub use files::RequestedFile;
pub use files::MODEL_INLINE_BYTES;
pub use files::WIRE_EMBED_LIMIT_BYTES;
pub use speculative::MtpConfig;
pub use speculative::SpeculativeConfig;
pub use llguidance;

Modules
disk_kv_cache
distributed
files: Typed file outputs from agentic runs.
layers
matformer
reasoning_parsers: Unified reasoning/thinking parser framework.
speculative
speech_utils

Structs
AddModelConfig: Configuration for adding a model to Hanzo.
AgentToolApproval
AgentToolApprovalDecision
AgentToolApprovalRequest
AgentToolMetadata
AgenticSessionStore: Agentic conversation state, keyed by session ID. Also supports content-based matching for clients that don’t pass an ID.
AgenticToolCallRecord
AnyMoeConfig
AnyMoeLoader
AnyMoePipeline
ApproximateUserLocation
AudioInput: Raw audio input consisting of PCM samples and a sample rate.
AutoLoader: Automatically selects the appropriate loader based on repository/config metadata.
AutoLoaderBuilder
AutoTuneRequest
AutoTuneResult
BuildInfo
CalledFunction: Called function with name and arguments.
ChatCompletionChunkResponse: Chat completion streaming request chunk.
ChatCompletionResponse: An OpenAI-compatible chat completion response.
ChatTemplate: Template for chat models including bos/eos/unk as well as the chat template.
Choice: Chat completion choice.
ChunkChoice: Chat completion streaming chunk choice.
CodeExecutionApproval
CodeExecutionApprovalRequest
CodeExecutionConfig: Python code execution config.
CompletionChoice: Completion request choice.
CompletionChunkChoice: Completion streaming chunk choice.
CompletionChunkResponse: Completion streaming chunk response.
CompletionResponse: An OpenAI-compatible completion response.
CpuInfo
Delta: Delta in content for streaming response.
DetokenizationRequest: Request to detokenize some text.
DeviceInfo
DeviceLayerMapMetadata
DeviceMapMetadata: Metadata to initialize the device mapper.
DiffusionGenerationParams
DiffusionLoader: A loader for a diffusion (non-quantized) model.
DiffusionLoaderBuilder: A builder for a loader for a diffusion (non-quantized) model.
DoctorCheck
DoctorReport
DrySamplingParams: Parameters for DRY (Don’t Repeat Yourself) sampling to reduce repetition.
EmbeddingLoader: A loader for an embedding (non-quantized) model.
EmbeddingLoaderBuilder: A builder for a loader for an embedding (non-quantized) model.
EmbeddingModelPaths: All local paths and metadata necessary to load an embedding model.
EmbeddingSpecificConfig: Config specific to loading an embedding model.
EngineConfig: Configuration for creating an engine instance.
Function: Function definition for a tool.
GGMLLoader: A loader for a GGML model.
GGMLLoaderBuilder: A builder for a GGML loader.
GGMLSpecificConfig: Config for a GGML loader.
GGUFLoader: Loader for a GGUF model.
GGUFLoaderBuilder: A builder for a GGUF loader.
GGUFSpecificConfig: Config for a GGUF loader.
GemmaLoader: NormalLoader for a Gemma model.
Hanzo: The Hanzo struct handles sending requests to multiple engines. It is the core multi-threaded component of Hanzo, and uses mpsc Sender and Receiver primitives to send and receive requests to the appropriate engine based on model ID.
HanzoBuilder: The HanzoBuilder takes the pipeline and a scheduler method and constructs an Engine and a Hanzo instance. The Engine runs on a separate thread, and the Hanzo instance stays on the calling thread.
HanzoConfig
HfConnectivityInfo
Idefics2Loader: MultimodalLoader for an Idefics 2 Vision model.
ImageChoice
ImageGenerationResponse
IntervalLogger
LLaVALoader: MultimodalLoader for an LLaVA Vision model.
LLaVANextLoader: MultimodalLoader for an LLaVANext Vision model.
LayerDeviceMapper: A device mapper which does device mapping per hidden layer.
LayerTopology
LlamaLoader: NormalLoader for a Llama model.
LoaderBuilder: A builder for a loader using the selected model.
LocalModelPaths: All local paths and metadata necessary to load a model.
Logprobs: Logprobs per token.
LoraAdapterPaths
McpClient: MCP client that manages connections to multiple MCP servers.
McpClientConfig: Configuration for MCP client integration.
McpServerConfig: Configuration for an individual MCP server.
McpToolInfo: Information about a tool discovered from an MCP server.
MemoryInfo
MemoryUsage
MistralLoader
MixtralLoader
Modalities
ModelGenerationDefaults: Optional generation defaults parsed from a model’s generation_config.json.
ModelLoaderConfig: Configuration for recreating a model loader when reloading an unloaded model. This captures the essential parameters needed to reconstruct a loader.
MultimodalLoader: A loader for a multimodal (non-quantized) model.
MultimodalLoaderBuilder: A builder for a loader for a multimodal (non-quantized) model.
MultimodalSpecificConfig: Config specific to loading a multimodal model.
NormalLoader: A loader for a “normal” (non-quantized) model.
NormalLoaderBuilder: A builder for a loader for a “normal” (non-quantized) model.
NormalRequest: A normal request to Hanzo.
NormalSpecificConfig: Config specific to loading a normal model.
Ordering: Adapter model ordering information.
PagedAttentionConfig: All memory counts are in MB. The default block size is 32.
Phi2Loader: NormalLoader for a Phi 2 model.
Phi3Loader: NormalLoader for a Phi 3 model.
Phi3VLoader: MultimodalLoader for a Phi 3 Vision model.
Qwen2Loader: NormalLoader for a Qwen 2 model.
ResponseLogprob: A logprob with the top logprobs for this token.
ResponseMessage: Chat completion response message.
SamplingParams: Sampling params are used to control sampling.
SandboxPolicy: Policy applied to a sandboxed process.
SearchFunctionParameters
SearchResult
SerializedSession: Wire format. Images and video frames are base64 PNGs.
SerializedVideo
SpeechLoader
SpeechPipeline
Starcoder2Loader: NormalLoader for a Starcoder2 model.
SystemInfo
TokenizationRequest: Request to tokenize some messages or some text.
Tool: Tool definition.
ToolCallContext: Context provided to tool callbacks by the agentic loop.
ToolCallResponse
ToolCallbackWithTool: A tool callback with its associated Tool definition.
TopLogprob: Top-n logprobs element.
Topology
TuneCandidate: A tuning candidate with all calculated metrics.
UnloadedModelState: State preserved when a model is unloaded. This contains all the information needed to reload the model on demand.
Usage: OpenAI-compatible (superset) usage during a request.
VideoInput: Decoded video input: a sequence of frames with metadata for timestamp generation.
WebSearchOptions

Enums
AdapterPaths
AgentPermission
AgentToolApprovalHandler
AgentToolKind
AgentToolSource
AgenticToolCallData: Tool-specific structured progress data for agentic tool calls.
AgenticToolCallPhase: Phase of an agentic tool call.
AnyMoeExpertType
AutoDeviceMapParams
CodeExecutionPermission
Constraint: Control the constraint with llguidance.
DefaultSchedulerMethod: The scheduler method controls how sequences are scheduled during each step of the engine. At each scheduling step, the scheduler method is applied unless there are only running sequences, only waiting sequences, or no sequences at all. When it is applied, it is used to allow waiting sequences to run.
DeviceMapSetting
DiffusionLoaderType: The architecture to load the diffusion model as.
DoctorStatus
EmbeddingLoaderType: The architecture to load the embedding model as.
EngineInstruction
FitStatus: Fit status for a quantization candidate.
GGUFArchitecture
HanzoError
ImageGenerationResponseFormat: Image generation response format.
IsqBits: Target bit width for automatic ISQ quantization.
IsqOrganization
IsqType: In-situ quantization type specifying the format to apply to model weights.
McpServerSource: Supported MCP server transport sources.
MemoryGpuConfig
ModelCategory: Category of the model. This can also be used to extract model-category-specific tools, such as the multimodal model prompt prefixer.
ModelDType: DType for the model.
ModelKind: The kind of model to build.
ModelSelected
ModelStatus: Model status for the loaded/unloaded state.
MultimodalLoaderType: The architecture to load the multimodal model as.
NetworkMode: Network access permitted to sandboxed processes.
NormalLoaderType: The architecture to load the normal model as.
PagedCacheType
QualityTier: Quality tier for a quantization level.
ReasoningEffort: Reasoning effort level for models that support it (e.g., GPT-OSS with Harmony format). Controls the depth of reasoning/analysis in the model’s response.
Request: A request to the Engine, encapsulating the various parameters as well as the mpsc response Sender used to return the Response.
RequestMessage: Message or messages for a Request.
Response: The response enum contains three types of variants.
ResponseErr
ResponseOk
SchedulerConfig
SearchContextSize
SearchEmbeddingModel: Embedding model used for ranking web search results internally.
SpeechGenerationConfig
SpeechLoaderType
StopTokens: Stop sequences or ids.
SupportedModality
TokenSource: The source of the HF token.
ToolCallType: The type of a tool call (currently only function calls).
ToolCallbackKind: Wraps either a text-only or multimodal tool callback.
ToolChoice
ToolOutput: Tool output: text-only or multimodal.
ToolType: Type of tool.
TuneProfile
WebSearchUserLocation

Constants
DEFAULT_MAX_TOOL_ROUNDS: Default cap on tool-use rounds when the request doesn’t set one.
GGUF_MULTI_FILE_DELIMITER
HANZO_GIT_REVISION
HF_HUB_OFFLINE_ENV: Env variable that, when set to a truthy value, disables all network calls to the Hugging Face Hub. Only cached files are used.
MULTI_LORA_DELIMITER
SYSTEM_FINGERPRINT
UQFF_MULTI_FILE_DELIMITER

Statics
ENGINE_INSTRUCTIONS: Engine instructions, per Engine (Hanzo) ID.
GLOBAL_HF_CACHE
TERMINATE_ALL_NEXT_STEP: Terminate all sequences on the next scheduling step. Be sure to reset this. This is a global flag for terminating all engines at once (e.g., Ctrl+C).

Traits
CustomLogitsProcessor: Customizable logits processor.
Loader: The Loader trait abstracts the loading process. The primary entrypoint is the load_model method.
ModelPaths: ModelPaths abstracts the mechanism to get all necessary files for running a model. For example, LocalModelPaths implements ModelPaths when all files are in the local file system.
MultimodalPromptPrefixer: Prepend a vision tag appropriate for the model to the prompt. Image indices are assumed to start at 0.
Pipeline
TryIntoDType: Type which can be converted to a DType.

Functions
auto_tune
check_hf_gated_access: Check HuggingFace connectivity and token validity by accessing a gated model.
collect_system_info
expand_isq_value: Expand an ISQ specifier into concrete IsqType variants. Numeric shorthands (2-8) produce both the non-Metal and Metal variants; explicit method names resolve to a single variant.
get_auto_device_map_params
get_engine_terminate_flag: Get or create a termination flag for the current engine thread.
get_model_dtype
get_tgt_non_granular_index
get_toml_selected_model_device_map_params
get_toml_selected_model_dtype
hf_home_dir: Resolve the Hugging Face home directory.
hf_hub_cache_dir: Resolve the Hugging Face Hub cache directory.
hf_token_path: Resolve the Hugging Face token file path.
initialize_logging: This should be called to initialize the debug flag and logging. This should not be called in hanzo-engine code due to Rust usage.
is_hf_hub_offline: Returns true when the user has requested fully-offline operation via HF_HUB_OFFLINE. Accepted truthy values: 1, true, yes, on (case-insensitive). Anything else, or unset, is treated as online.
paged_attn_supported: true if built with CUDA (requires Unix) or Metal.
parse_isq_value: Parse ISQ value.
parse_uqff_shard: Given a UQFF filename like "q4k-0.uqff", returns Some(("q4k", 0)). Returns None for non-sharded filenames like "model.uqff" where the suffix after the last - is not a number.
probe_hf_repo_files: Best-effort file listing for a HF repo. Returns None on 404, API failure, or offline-without-cache. Quiet by design: callers choose what to log.
reset_engine_terminate_flag: Reset termination flags for the current engine.
resolve_uqff_shorthand: Resolve a UQFF shorthand (numeric like "8" or ISQ name like "q4k") to an actual UQFF filename from the available files list.
run_doctor
sample_frame_indices: Sample num_frames frame indices uniformly from a video with total_frames frames.
should_terminate_engine_sequences: Check if the current engine should terminate sequences.
using_flash_attn: true if built with the flash-attn or flash-attn-v3 features, false otherwise.

Type Aliases
AgentToolApprovalAsyncCallback
AgentToolApprovalCallback
AgentToolApprovalFuture
AgentToolApprovalNotifier
CodeExecutionApprovalCallback
CodeExecutionApprovalNotifier
LlguidanceGrammar
MessageContent
MultimodalToolCallback: Callback that can return multimodal output (text + images).
SearchCallback: Callback used to override how search results are gathered. The returned vector must be sorted in decreasing order of relevance.
ToolCallback: Custom tool callback. Receives the called function and returns the tool output as a string.
ToolCallbacks: Collection of callbacks keyed by tool name.
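The Hanzo and HanzoBuilder entries describe engines that each run on their own thread. Those engines are reached over mpsc channels by model ID, and each Request carries the Sender used to return its Response. The following is a minimal standard-library sketch of that routing pattern, not hanzo_engine's implementation. SketchRouter, SketchRequest and the echoing engine loop are hypothetical stand-ins, and responses are plain strings.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical request: a prompt plus the Sender the engine replies on.
pub struct SketchRequest {
    pub prompt: String,
    pub response_tx: Sender<String>,
}

/// Hypothetical router holding one request Sender per model ID.
pub struct SketchRouter {
    engines: HashMap<String, Sender<SketchRequest>>,
}

impl SketchRouter {
    pub fn new() -> Self {
        Self { engines: HashMap::new() }
    }

    /// Spawn a stand-in engine thread for `model_id`. It only echoes the
    /// prompt; a real engine would schedule and run the sequence.
    pub fn add_engine(&mut self, model_id: &str) {
        let (tx, rx) = channel::<SketchRequest>();
        let id = model_id.to_string();
        thread::spawn(move || {
            // The loop ends once every Sender for this engine is dropped.
            for req in rx {
                let _ = req.response_tx.send(format!("{id}: {}", req.prompt));
            }
        });
        self.engines.insert(model_id.to_string(), tx);
    }

    /// Route `prompt` to the engine registered for `model_id` and wait for the reply.
    pub fn send(&self, model_id: &str, prompt: &str) -> Result<String, String> {
        let engine = self
            .engines
            .get(model_id)
            .ok_or_else(|| format!("unknown model ID: {model_id}"))?;
        // Each request gets its own response channel.
        let (response_tx, response_rx) = channel();
        engine
            .send(SketchRequest { prompt: prompt.to_string(), response_tx })
            .map_err(|e| e.to_string())?;
        response_rx.recv().map_err(|e| e.to_string())
    }
}
```

Because every request brings its own response channel, concurrent callers never receive each other's replies. The router only needs to hold one Sender per engine.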
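TERMINATE_ALL_NEXT_STEP is documented as a global flag that ends all sequences on the next scheduling step and must be reset afterwards. Below is a minimal single-engine sketch of that check-then-reset pattern using an AtomicBool. TERMINATE_ALL, run_steps and request_terminate_all are hypothetical names, and the per-step work is only a placeholder counter.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical global "terminate everything" flag (e.g. set on Ctrl+C).
static TERMINATE_ALL: AtomicBool = AtomicBool::new(false);

/// Ask all sequences to stop at the next scheduling step.
pub fn request_terminate_all() {
    TERMINATE_ALL.store(true, Ordering::SeqCst);
}

/// Run up to `max_steps` scheduling steps, checking the flag before each one.
/// Returns the number of steps actually executed.
pub fn run_steps(max_steps: usize) -> usize {
    let mut steps = 0;
    while steps < max_steps {
        if TERMINATE_ALL.load(Ordering::SeqCst) {
            // Stop here, then reset the flag so later runs are unaffected.
            TERMINATE_ALL.store(false, Ordering::SeqCst);
            break;
        }
        steps += 1; // placeholder for one real scheduling step
    }
    steps
}
```

This sketch resets the flag as soon as it is observed, which only works with a single engine. When several engines share the flag, the reset has to wait until every engine has seen it.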
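The is_hf_hub_offline entry states its parsing rule precisely. The sketch below restates that rule in plain Rust. is_truthy_offline and offline_from_env are hypothetical helpers, not the crate's API. The sketch does not trim surrounding whitespace, because the summary does not say whether the real function does.

```rust
use std::env;

/// Hypothetical restatement of the documented rule: "1", "true", "yes" or
/// "on" (any letter case) mean offline; anything else, or unset, means online.
pub fn is_truthy_offline(value: Option<&str>) -> bool {
    match value {
        Some(v) => matches!(v.to_ascii_lowercase().as_str(), "1" | "true" | "yes" | "on"),
        None => false,
    }
}

/// Read HF_HUB_OFFLINE from the process environment and apply the rule above.
pub fn offline_from_env() -> bool {
    is_truthy_offline(env::var("HF_HUB_OFFLINE").ok().as_deref())
}
```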
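The parse_uqff_shard summary can be illustrated with a hypothetical re-implementation, parse_uqff_shard_sketch. It assumes two things: the name ends in .uqff, and the shard index is everything after the last -. These assumptions match both documented examples, but the crate may behave differently in edge cases.

```rust
/// Hypothetical sketch: split "<prefix>-<n>.uqff" at the last '-' and parse <n>.
pub fn parse_uqff_shard_sketch(filename: &str) -> Option<(&str, usize)> {
    // Only names ending in ".uqff" are considered (an assumption).
    let stem = filename.strip_suffix(".uqff")?;
    // The shard index is whatever follows the last '-'; no '-' means not sharded.
    let (prefix, index) = stem.rsplit_once('-')?;
    // A non-numeric suffix (e.g. "my-model") also means the file is not sharded.
    let index = index.parse::<usize>().ok()?;
    Some((prefix, index))
}
```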
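The sample_frame_indices entry does not document its exact spacing rule. The hypothetical sample_frame_indices_sketch below shows only one plausible uniform scheme. It takes evenly spaced indices i * total_frames / num_frames, and it returns every frame when more frames are requested than exist.

```rust
/// Hypothetical uniform sampler over 0..total_frames.
pub fn sample_frame_indices_sketch(total_frames: usize, num_frames: usize) -> Vec<usize> {
    if total_frames == 0 || num_frames == 0 {
        return Vec::new();
    }
    if num_frames >= total_frames {
        // Not enough frames to subsample: keep all of them.
        return (0..total_frames).collect();
    }
    // Integer division keeps indices strictly increasing and below total_frames.
    (0..num_frames).map(|i| i * total_frames / num_frames).collect()
}
```

Other schemes, such as sampling the center of each segment, are equally "uniform". Consult the crate's source before relying on specific indices.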
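A SearchCallback must return results sorted in decreasing order of relevance. The sketch below assumes each result carries a numeric relevance score. The fields of the crate's SearchResult are not listed in this index, so Hit is a hypothetical type. With that assumption, the ordering can be enforced like this:

```rust
/// Hypothetical scored search hit (not the crate's SearchResult).
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub url: String,
    pub score: f32,
}

/// Sort hits so the most relevant comes first. `total_cmp` is a total order,
/// so a NaN score cannot cause a panic or an inconsistent sort.
pub fn sort_by_relevance(mut hits: Vec<Hit>) -> Vec<Hit> {
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    hits
}
```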
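A ToolCallback receives the called function and returns the tool output as a string, and ToolCallbacks keys such callbacks by tool name. The sketch below shows that dispatch shape using hypothetical types: CalledFunctionSketch, ToolCallbackSketch and dispatch. The crate's actual aliases may use different error and argument types.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical mirror of CalledFunction: a tool name plus raw JSON arguments.
pub struct CalledFunctionSketch {
    pub name: String,
    pub arguments: String,
}

/// Hypothetical callback shape: called function in, tool output string out.
pub type ToolCallbackSketch =
    Arc<dyn Fn(&CalledFunctionSketch) -> Result<String, String> + Send + Sync>;

/// Look up the callback registered under the call's tool name and run it.
pub fn dispatch(
    callbacks: &HashMap<String, ToolCallbackSketch>,
    call: &CalledFunctionSketch,
) -> Result<String, String> {
    let callback = callbacks
        .get(&call.name)
        .ok_or_else(|| format!("no callback registered for tool `{}`", call.name))?;
    callback(call)
}
```

Storing callbacks behind Arc with Send + Sync bounds lets the same map be shared across engine threads without cloning the closures themselves.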