pub struct LegacyMultiStorageSource { /* private fields */ }
Legacy multi-storage source for old PyTorch format (pre-1.6)
§Format Analysis
Based on research into PyTorch’s serialization.py and the legacy TAR format:
- Storage Layout: PyTorch legacy format (0.1.10-1.5) stores data as:
  - Pickle metadata containing tensor definitions
  - A list of storage keys in order
  - Raw binary data with all storages concatenated
- Boundary Detection Challenge: After extensive research, I found that:
  - PyTorch does NOT store explicit storage boundaries in the file
  - Storages are concatenated in the order specified by the storage keys list
  - Each tensor references its storage by key and specifies offset/size
- Why True Lazy Loading is Difficult:
  - To determine storage boundaries, we would need to:
    a. Parse ALL tensor metadata to find which storage each tensor uses
    b. Track the maximum extent of each storage based on tensor usage
    c. Infer boundaries from the order and extents of the storages
  - However, the TensorSnapshot abstraction hides storage keys in closures
  - Exposing those keys would require deep modifications to the pickle parsing logic
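Steps a to c above can be sketched as follows. This is a minimal illustration, not this crate's implementation: `TensorUsage` and `infer_boundaries` are hypothetical names, and offsets and sizes are assumed to be in bytes already (actual PyTorch metadata may express them in elements, in which case they would first need scaling by the element size).

```rust
use std::collections::HashMap;

/// Hypothetical record of one tensor's view into a storage (values in bytes).
pub struct TensorUsage {
    pub storage_key: String,
    pub offset: usize,
    pub size: usize,
}

/// Returns `key -> (start, length)` relative to the start of the concatenated blob.
pub fn infer_boundaries(
    ordered_keys: &[String],
    usages: &[TensorUsage],
) -> HashMap<String, (usize, usize)> {
    // Step b: the furthest byte any tensor touches is the storage's minimum length.
    let mut extents: HashMap<&str, usize> = HashMap::new();
    for usage in usages {
        let end = usage.offset + usage.size;
        let extent = extents.entry(usage.storage_key.as_str()).or_insert(0);
        *extent = (*extent).max(end);
    }

    // Step c: storages are concatenated in key order, so each one starts
    // where the previous one ends.
    let mut boundaries = HashMap::new();
    let mut cursor = 0usize;
    for key in ordered_keys {
        let len = extents.get(key.as_str()).copied().unwrap_or(0);
        boundaries.insert(key.clone(), (cursor, len));
        cursor += len;
    }
    boundaries
}
```

The inferred length is only a lower bound. If a storage contains trailing bytes that no tensor references, every boundary after it shifts, which is one reason this approach remains a heuristic.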
§Current Implementation
This implementation provides a best-effort approach:
- Supports setting a storage map if boundaries can be determined externally
- Falls back to loading the entire blob if boundaries are unknown
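The fallback behaviour described above might look roughly like this. It is a sketch under stated assumptions: `BlobLayout`, its fields, and `read_range` are hypothetical stand-ins for the private state of `LegacyMultiStorageSource`, chosen to mirror the `data_offset` and `data_size` arguments of `new`.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical stand-in for the source's internal state.
pub struct BlobLayout {
    /// Byte offset of the concatenated storage blob within the file.
    pub data_offset: u64,
    /// Total size of the blob in bytes.
    pub data_size: u64,
    /// `key -> (offset within blob, length)`; `None` while boundaries are unknown.
    pub storage_map: Option<HashMap<String, (u64, u64)>>,
}

impl BlobLayout {
    /// File byte range that must be read to obtain the storage for `key`.
    pub fn read_range(&self, key: &str) -> Range<u64> {
        match self.storage_map.as_ref().and_then(|map| map.get(key)) {
            // Boundaries known: read only this storage's slice of the blob.
            Some(&(start, len)) => {
                let begin = self.data_offset + start;
                begin..begin + len
            }
            // Boundaries unknown (or key missing): fall back to the whole blob.
            None => self.data_offset..self.data_offset + self.data_size,
        }
    }
}
```

With a storage map in place, a request for one key touches only that storage's bytes; without one, every request resolves to the full blob range, which is correct but not lazy.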
Implementations§
impl LegacyMultiStorageSource
pub fn new(path: PathBuf, data_offset: u64, data_size: u64) -> Self
Create a new legacy multi-storage source
pub fn set_storage_keys(&self, keys: Vec<String>)
Set the ordered storage keys from the pickle
pub fn track_storage_usage(&self, storage_key: &str, offset: usize, size: usize)
Track storage usage from tensor access. This is called from within tensor loading closures.
Auto Trait Implementations§
impl !Freeze for LegacyMultiStorageSource
impl RefUnwindSafe for LegacyMultiStorageSource
impl Send for LegacyMultiStorageSource
impl Sync for LegacyMultiStorageSource
impl Unpin for LegacyMultiStorageSource
impl UnwindSafe for LegacyMultiStorageSource
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.