xet-data
Data processing pipeline for chunking, deduplication, and file reconstruction. It is intended to be used through the API in the hf-xet package.
Overview
- Content-defined chunking — Gear-hash based chunking for deduplication
- Deduplication — Probe and register chunks against metadata shards
- File reconstruction — Reassemble files from deduplicated chunk references
- Progress tracking — Hooks for upload/download progress reporting
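To make the first three items concrete, here is a minimal, self-contained sketch of the general technique. It is not this crate's API: `gear_table`, `chunk_boundaries`, and `ChunkStore` are hypothetical names, the gear table is generated from an arbitrary seed, and an in-memory map keyed by raw chunk bytes stands in for metadata shards and content hashes.

```rust
use std::collections::HashMap;

/// Build a 256-entry gear table from a fixed seed using SplitMix64.
/// Assumption: real implementations ship their own table constants.
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Split `data` into content-defined chunks, returned as (offset, length) pairs.
/// A boundary is cut when the rolling gear hash matches `mask` (after at least
/// `min_size` bytes) or when the chunk reaches `max_size`.
fn chunk_boundaries(data: &[u8], min_size: usize, max_size: usize, mask: u64) -> Vec<(usize, usize)> {
    let table = gear_table();
    let mut chunks = Vec::new();
    let mut start = 0usize;
    let mut hash = 0u64;
    for (i, &byte) in data.iter().enumerate() {
        // Gear hash update: shift out old bytes, mix in the new one.
        hash = (hash << 1).wrapping_add(table[byte as usize]);
        let len = i + 1 - start;
        if (len >= min_size && hash & mask == 0) || len >= max_size {
            chunks.push((start, len));
            start = i + 1;
            hash = 0;
        }
    }
    if start < data.len() {
        chunks.push((start, data.len() - start)); // trailing partial chunk
    }
    chunks
}

/// Toy in-memory chunk store (hypothetical stand-in for shard lookups).
#[derive(Default)]
struct ChunkStore {
    chunks: Vec<Vec<u8>>,
    index: HashMap<Vec<u8>, usize>,
}

impl ChunkStore {
    /// Chunk `data`, store only unseen chunks, and return chunk ids in file order.
    fn add_file(&mut self, data: &[u8]) -> Vec<usize> {
        let mut refs = Vec::new();
        // The mask uses the top 8 bits, so the average chunk is roughly
        // min_size + 256 bytes; these parameters are illustrative only.
        for (offset, len) in chunk_boundaries(data, 64, 1024, 0xFF00_0000_0000_0000) {
            let bytes = &data[offset..offset + len];
            let existing = self.index.get(bytes).copied();
            let id = match existing {
                Some(id) => id, // already stored: deduplicated
                None => {
                    let id = self.chunks.len();
                    self.chunks.push(bytes.to_vec());
                    self.index.insert(bytes.to_vec(), id);
                    id
                }
            };
            refs.push(id);
        }
        refs
    }

    /// Reassemble a file by concatenating the chunks its references point to.
    fn reconstruct(&self, refs: &[usize]) -> Vec<u8> {
        refs.iter().flat_map(|&id| self.chunks[id].iter().copied()).collect()
    }
}
```

In this sketch, calling `add_file` twice with the same bytes returns the same reference list and stores no new chunks, and `reconstruct` on that list yields the original bytes. A production pipeline would key chunks by a cryptographic hash instead of the raw bytes.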
This crate is part of xet-core.
License
Apache-2.0