Module utils

Utility functions and abstractions.

Modules§

fp8: FP8 (float8_e4m3fn) dequantization support.
hf: HuggingFace Hub integration for automatic model downloading.
models: Scan local directories for cached/downloaded models and report their status.
split: Model splitting utility — creates per-worker model bundles from a full model.

Functions§

get_inference_device: Returns the best available device at the given ordinal index (relevant when multiple GPUs are present), or the CPU if force_cpu is true.
load_safetensors_from_model
load_safetensors_paths_from_index: Load the safetensors files for a model from the hub based on a JSON index file.
load_var_builder_for_local_layers: Create a VarBuilder that only loads safetensors shards needed for the given local layers. Shards containing only remote-worker tensors are excluded, reducing GPU memory usage on the master.
load_var_builder_for_specific_layers: Create a VarBuilder that only loads safetensors shards containing tensors for the given layer prefixes. Workers use this to skip shards that only contain layers assigned to other nodes.
load_var_builder_from_index: Create a VarBuilder with the tensors loaded from the index.
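
The shard filtering described for load_var_builder_for_local_layers and load_var_builder_for_specific_layers can be illustrated with a minimal sketch. It is not this crate's implementation: `shards_for_layers`, `has_layer_prefix`, and `sample_weight_map` are hypothetical names, the tensor and shard names are made up, and the map is assumed to be the already-parsed `weight_map` object of a safetensors index file (tensor name to shard file name). Matching on a whole path component is also an assumption, chosen so that `model.layers.1` does not also select `model.layers.10`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: true if `tensor` equals `prefix` or continues it
/// with a `.` separator, so "model.layers.1" does not match "model.layers.10".
fn has_layer_prefix(tensor: &str, prefix: &str) -> bool {
    tensor
        .strip_prefix(prefix)
        .map_or(false, |rest| rest.is_empty() || rest.starts_with('.'))
}

/// Hypothetical helper: returns the shard files that hold at least one
/// tensor belonging to one of `layer_prefixes`. Shards that only hold
/// tensors for other layers are left out and never need to be opened.
fn shards_for_layers(
    weight_map: &BTreeMap<String, String>,
    layer_prefixes: &[&str],
) -> BTreeSet<String> {
    weight_map
        .iter()
        .filter(|(tensor, _)| layer_prefixes.iter().any(|p| has_layer_prefix(tensor, p)))
        .map(|(_, shard)| shard.clone())
        .collect()
}

/// Illustrative data only: a tiny stand-in for a parsed `weight_map`.
fn sample_weight_map() -> BTreeMap<String, String> {
    [
        ("model.embed_tokens.weight", "model-00001-of-00003.safetensors"),
        ("model.layers.0.self_attn.q_proj.weight", "model-00001-of-00003.safetensors"),
        ("model.layers.1.self_attn.q_proj.weight", "model-00002-of-00003.safetensors"),
        ("model.layers.10.mlp.up_proj.weight", "model-00003-of-00003.safetensors"),
        ("lm_head.weight", "model-00003-of-00003.safetensors"),
    ]
    .into_iter()
    .map(|(tensor, shard)| (tensor.to_string(), shard.to_string()))
    .collect()
}
```

With the sample map, a node that owns only `model.layers.1` would need just the second shard, while a node that owns layers 0 and 1 would need the first two. The resulting set of file names is what a loader would then resolve to paths and hand to the VarBuilder.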