
Module utils

Utility functions and abstractions.

Modules

fp8: FP8 (float8_e4m3fn) dequantization support.
hf: HuggingFace Hub integration for automatic model downloading.
models: Scan local directories for cached or downloaded models and report their status.
split: Model-splitting utility that creates per-worker model bundles from a full model.
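The fp8 module's description names the float8_e4m3fn format. As a rough illustration of what dequantizing that format involves, the sketch below decodes e4m3fn bytes (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, NaN only when exponent and mantissa bits are all ones) and multiplies each value by a scale. This is a std-only sketch, not this crate's code: the names `fp8_e4m3fn_to_f32` and `dequantize_fp8` and the single per-tensor `scale` are hypothetical, and real checkpoints may store per-block scales instead.

```rust
/// Decode one float8_e4m3fn byte into an f32.
/// Bit layout: S EEEE MMM, exponent bias 7.
/// e4m3fn has no infinities; only S.1111.111 encodes NaN.
fn fp8_e4m3fn_to_f32(byte: u8) -> f32 {
    let sign = if byte & 0x80 != 0 { -1.0f32 } else { 1.0f32 };
    let exp = ((byte >> 3) & 0x0F) as i32;
    let mant_bits = byte & 0x07;
    let mant = mant_bits as f32;

    if exp == 0x0F && mant_bits == 0x07 {
        return f32::NAN;
    }
    if exp == 0 {
        // Subnormal: 0.MMM * 2^(1 - 7)
        sign * (mant / 8.0) * 2f32.powi(-6)
    } else {
        // Normal: 1.MMM * 2^(exp - 7)
        sign * (1.0 + mant / 8.0) * 2f32.powi(exp - 7)
    }
}

/// Hypothetical scheme: dequantize a buffer with one per-tensor scale.
fn dequantize_fp8(bytes: &[u8], scale: f32) -> Vec<f32> {
    bytes.iter().map(|&b| fp8_e4m3fn_to_f32(b) * scale).collect()
}

fn main() {
    // 0x38 = 1.0, 0xC0 = -2.0, 0x7E = 448.0 (largest finite), 0x01 = 2^-9
    let raw = [0x38u8, 0xC0, 0x7E, 0x01];
    let out = dequantize_fp8(&raw, 0.5);
    println!("{:?}", out);
}
```

Decoding per byte keeps the example readable; a production path would more likely use a 256-entry lookup table or a tensor-level cast.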

Functions

get_inference_device: Returns the best available device at the given ordinal index (relevant when multiple GPUs are present), or the CPU if force_cpu is true.
load_safetensors_from_model
load_safetensors_paths_from_index: Loads the safetensors files for a model from the Hub, based on a JSON index file.
load_var_builder_for_local_layers: Creates a VarBuilder that loads only the safetensors shards needed for the given local layers. Shards containing only remote-worker tensors are excluded, reducing GPU memory usage on the master.
load_var_builder_for_specific_layers: Creates a VarBuilder that loads only the safetensors shards containing tensors for the given layer prefixes. Workers use this to skip shards that contain only layers assigned to other nodes.
load_var_builder_from_index: Creates a VarBuilder with the tensors loaded from the index.
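The layer-selective loaders (load_var_builder_for_local_layers, load_var_builder_for_specific_layers) depend on knowing which shard holds which tensor. Below is a minimal std-only sketch of that selection step, assuming the HuggingFace `model.safetensors.index.json` convention, where a `weight_map` object maps each tensor name to its shard file (shown here already parsed into a `HashMap` to avoid a JSON dependency). The helper `shards_for_prefixes`, the tensor names, and the shard names are hypothetical; the crate's actual selection logic may differ.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper: return the sorted, de-duplicated set of shard files
/// that hold at least one tensor whose name starts with one of `prefixes`.
/// Prefixes should end with '.', otherwise "model.layers.1" also matches
/// "model.layers.10.*".
fn shards_for_prefixes(weight_map: &HashMap<String, String>, prefixes: &[&str]) -> BTreeSet<String> {
    weight_map
        .iter()
        .filter(|(name, _)| prefixes.iter().any(|p| name.starts_with(*p)))
        .map(|(_, shard)| shard.clone())
        .collect()
}

/// Illustrative weight_map for a three-shard checkpoint (made-up layout).
fn example_weight_map() -> HashMap<String, String> {
    [
        ("model.embed_tokens.weight", "model-00001-of-00003.safetensors"),
        ("model.layers.0.self_attn.q_proj.weight", "model-00001-of-00003.safetensors"),
        ("model.layers.1.self_attn.q_proj.weight", "model-00002-of-00003.safetensors"),
        ("model.layers.1.mlp.up_proj.weight", "model-00002-of-00003.safetensors"),
        ("model.layers.10.mlp.up_proj.weight", "model-00003-of-00003.safetensors"),
        ("model.norm.weight", "model-00003-of-00003.safetensors"),
        ("lm_head.weight", "model-00003-of-00003.safetensors"),
    ]
    .into_iter()
    .map(|(k, v)| (k.to_string(), v.to_string()))
    .collect()
}

fn main() {
    let wm = example_weight_map();
    // A worker assigned only layer 1 needs a single shard.
    println!("{:?}", shards_for_prefixes(&wm, &["model.layers.1."]));
    // A master holding layer 0 plus the embeddings and LM head.
    println!(
        "{:?}",
        shards_for_prefixes(&wm, &["model.embed_tokens.", "model.layers.0.", "lm_head."])
    );
}
```

The trailing dot in each prefix matters: without it, `model.layers.1` is also a prefix of `model.layers.10.mlp.up_proj.weight`, and the worker would load an extra shard it does not need.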