Utility functions and abstractions.
Modules
- fp8 - FP8 (float8_e4m3fn) dequantization support.
- hf - HuggingFace Hub integration for automatic model downloading.
- models - Scan local directories for cached/downloaded models and report their status.
- split - Model splitting utility: creates per-worker model bundles from a full model.
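The fp8 module's internals are not shown here, so the following is only a minimal sketch of what float8_e4m3fn dequantization involves. The format itself is standard: 1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, and NaN only when all exponent and mantissa bits are set (largest finite value 448). The single per-tensor `scale` in `dequantize` is an assumption for illustration; real checkpoints may store per-block scales instead, and both function names are hypothetical.

```rust
/// Decode one float8_e4m3fn byte into an f32.
/// Layout: S EEEE MMM (sign, 4-bit exponent with bias 7, 3-bit mantissa).
fn fp8_e4m3fn_to_f32(byte: u8) -> f32 {
    let sign = if byte & 0x80 != 0 { -1.0f32 } else { 1.0f32 };
    let exp = ((byte >> 3) & 0x0F) as i32;
    let mant_bits = byte & 0x07;

    // The "fn" variant has no infinities: only S.1111.111 is NaN.
    if exp == 0x0F && mant_bits == 0x07 {
        return f32::NAN;
    }

    let mant = mant_bits as f32 / 8.0;
    let magnitude = if exp == 0 {
        // Subnormal: no implicit leading 1, exponent fixed at 1 - 7 = -6.
        mant * 2f32.powi(-6)
    } else {
        // Normal: implicit leading 1, unbiased exponent = exp - 7.
        (1.0 + mant) * 2f32.powi(exp - 7)
    };
    sign * magnitude
}

/// Hypothetical helper: dequantize raw FP8 bytes using one per-tensor scale
/// (an assumption; the crate may use a different scaling scheme).
fn dequantize(bytes: &[u8], scale: f32) -> Vec<f32> {
    bytes.iter().map(|&b| fp8_e4m3fn_to_f32(b) * scale).collect()
}
```

For example, `0x38` decodes to 1.0 and `0x7E` to 448.0. Since an FP8 input has only 256 possible values, a precomputed 256-entry lookup table is a common way to speed up the per-element decode.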
Functions
- get_inference_device - Returns the best available device at index `ordinal` (when multiple GPUs are present), or the CPU if `force_cpu` is true.
- load_safetensors_from_model
- load_safetensors_paths_from_index - Load the safetensors files for a model from the hub based on a JSON index file.
- load_var_builder_for_local_layers - Create a VarBuilder that only loads the safetensors shards needed for the given local layers. Shards containing only remote-worker tensors are excluded, reducing GPU memory usage on the master.
- load_var_builder_for_specific_layers - Create a VarBuilder that only loads the safetensors shards containing tensors for the given layer prefixes. Workers use this to skip shards that only contain layers assigned to other nodes.
- load_var_builder_from_index - Create a VarBuilder with the tensors loaded from the index.
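The selection policy described for get_inference_device can be sketched without any GPU dependency. This is a hypothetical illustration, not the crate's code: `InferenceDevice` and `Backends` are stand-ins invented here, and the CUDA-before-Metal preference is an assumption. In a candle-based build, the availability flags could plausibly come from `candle_core::utils::cuda_is_available()` and `metal_is_available()`.

```rust
/// Hypothetical stand-in for a tensor library's device type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InferenceDevice {
    Cpu,
    Cuda(usize),
    Metal(usize),
}

/// Hypothetical result of probing which accelerator backends are usable.
struct Backends {
    cuda: bool,
    metal: bool,
}

/// Pick the best available device at `ordinal`, or the CPU when `force_cpu` is set.
/// Assumed preference order: CUDA, then Metal, then CPU.
fn select_device(ordinal: usize, force_cpu: bool, backends: &Backends) -> InferenceDevice {
    if force_cpu {
        return InferenceDevice::Cpu;
    }
    if backends.cuda {
        InferenceDevice::Cuda(ordinal)
    } else if backends.metal {
        InferenceDevice::Metal(ordinal)
    } else {
        // No accelerator available: fall back to the CPU.
        InferenceDevice::Cpu
    }
}
```

Passing the probe results in as data, rather than probing inside the function, keeps the policy testable on machines without a GPU.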
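The shard-selection idea behind load_safetensors_paths_from_index, load_var_builder_for_specific_layers and load_var_builder_for_local_layers can be sketched against the `weight_map` of a HuggingFace `model.safetensors.index.json`, which maps each tensor name to the shard file that stores it. This is a minimal sketch under stated assumptions: the map is assumed to be already deserialized (e.g. with serde_json), the `model.layers.N.` naming follows the common Llama-style convention, and all function names here are hypothetical. If VarBuilder is candle_nn's, the selected paths would presumably then be passed to something like `VarBuilder::from_mmaped_safetensors`.

```rust
use std::collections::{BTreeSet, HashMap};

/// Parsed `weight_map`: tensor name -> shard file name.
type WeightMap = HashMap<String, String>;

/// Distinct shard files for tensors accepted by `keep`, sorted for a
/// deterministic load order.
fn select_shards<F: Fn(&str) -> bool>(weight_map: &WeightMap, keep: F) -> Vec<String> {
    let shards: BTreeSet<&String> = weight_map
        .iter()
        .filter(|(tensor, _)| keep(tensor.as_str()))
        .map(|(_, shard)| shard)
        .collect();
    shards.into_iter().cloned().collect()
}

/// Every shard referenced by the index.
fn all_shards(weight_map: &WeightMap) -> Vec<String> {
    select_shards(weight_map, |_| true)
}

/// Shards holding at least one tensor under the given layer prefixes
/// (what a worker needs for its assigned layers).
fn shards_for_layers(weight_map: &WeightMap, prefixes: &[&str]) -> Vec<String> {
    select_shards(weight_map, |t| prefixes.iter().any(|p| t.starts_with(p)))
}

/// Shards holding at least one tensor NOT owned by a remote worker, so shards
/// containing only remote tensors are skipped (what the master needs).
fn shards_excluding_remote(weight_map: &WeightMap, remote_prefixes: &[&str]) -> Vec<String> {
    select_shards(weight_map, |t| !remote_prefixes.iter().any(|p| t.starts_with(p)))
}
```

Including the trailing dot in each prefix (`"model.layers.1."`) matters: without it, a prefix for layer 1 would also match layers 10 to 19. Note that a shard is kept by the master whenever it contains any non-remote tensor, such as embeddings or `lm_head`, even if most of its contents belong to a worker.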