xet-data 1.5.1

Data processing pipeline for chunking, deduplication, and file reconstruction; used in the Hugging Face Xet client tools. Intended to be used through the API in the hf-xet package.


Overview

  • Content-defined chunking: Gear-hash-based chunking for deduplication
  • Deduplication: Probe and register chunks against metadata shards
  • File reconstruction: Reassemble files from deduplicated chunk references
  • Progress tracking: Hooks for upload/download progress reporting
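To illustrate what Gear-hash content-defined chunking means, here is a minimal Rust sketch. It is not the xet-data implementation or API. `gear_table` and `gear_chunks` are hypothetical names, and the xorshift-generated table, the high-bit boundary mask, and the min/max chunk sizes are illustrative assumptions. A cut point is declared where the rolling hash matches the mask, so boundaries follow the content rather than fixed offsets.

```rust
/// Hypothetical helper: build a 256-entry gear table from a nonzero seed
/// using xorshift64. Real chunkers ship a fixed table of random constants.
fn gear_table(seed: u64) -> [u64; 256] {
    assert!(seed != 0, "xorshift64 needs a nonzero seed");
    let mut table = [0u64; 256];
    let mut state = seed;
    for entry in table.iter_mut() {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        *entry = state;
    }
    table
}

/// Hypothetical helper: split `data` into content-defined chunks and return
/// half-open (start, end) byte ranges that cover `data` contiguously.
fn gear_chunks(
    data: &[u8],
    table: &[u64; 256],
    mask_bits: u32,
    min_len: usize,
    max_len: usize,
) -> Vec<(usize, usize)> {
    assert!((1..64).contains(&mask_bits) && max_len > 0 && min_len <= max_len);
    // Test the top `mask_bits` bits: in a Gear hash the high bits depend on
    // the last 64 bytes, while the low bits only see the last few bytes.
    let mask = u64::MAX << (64 - mask_bits);
    let mut chunks = Vec::new();
    let mut start = 0;
    let mut hash: u64 = 0;
    for (i, &byte) in data.iter().enumerate() {
        // Gear rolling update: shift left one bit, add the byte's table entry.
        hash = (hash << 1).wrapping_add(table[byte as usize]);
        let len = i + 1 - start;
        // Cut on a hash match once the chunk is long enough, or force a cut
        // when the chunk reaches the maximum size.
        if (len >= min_len && hash & mask == 0) || len >= max_len {
            chunks.push((start, i + 1));
            start = i + 1;
            hash = 0;
        }
    }
    // Whatever remains after the last cut becomes the final chunk.
    if start < data.len() {
        chunks.push((start, data.len()));
    }
    chunks
}
```

With `mask_bits = k`, a boundary fires with probability roughly 2^-k per byte once `min_len` is reached, so `k` sets the approximate average chunk size. Because cut points depend only on nearby bytes, an edit early in a file tends to disturb only the chunks around it, which is what lets unchanged regions deduplicate.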
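The deduplication, reconstruction, and progress-hook steps fit together roughly as in the following minimal Rust sketch. It assumes an in-memory map standing in for metadata shards and caller-supplied chunk boundaries. `ChunkStore`, `probe_or_register`, `dedup_file`, `reconstruct`, and the `on_progress` callback are hypothetical names, not the xet-data or hf-xet API. The store is keyed by the chunk bytes themselves so the example is exact; a production store would key by a cryptographic chunk hash instead.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for metadata shards: maps chunk content to an id.
#[derive(Default)]
struct ChunkStore {
    ids: HashMap<Vec<u8>, usize>,
    chunks: Vec<Vec<u8>>,
}

impl ChunkStore {
    /// Probe for an existing chunk and register it if absent.
    /// Returns (chunk id, whether the chunk was newly stored).
    fn probe_or_register(&mut self, chunk: &[u8]) -> (usize, bool) {
        if let Some(&id) = self.ids.get(chunk) {
            return (id, false);
        }
        let id = self.chunks.len();
        self.chunks.push(chunk.to_vec());
        self.ids.insert(chunk.to_vec(), id);
        (id, true)
    }
}

/// Deduplicate one file given its chunk ranges. Returns the file's
/// reconstruction recipe (ordered chunk ids) and the number of new bytes
/// stored. `on_progress` is a hypothetical hook called with
/// (bytes processed, total bytes) after each chunk.
fn dedup_file(
    store: &mut ChunkStore,
    data: &[u8],
    ranges: &[(usize, usize)],
    mut on_progress: impl FnMut(usize, usize),
) -> (Vec<usize>, usize) {
    let mut recipe = Vec::with_capacity(ranges.len());
    let mut new_bytes = 0;
    for &(start, end) in ranges {
        let (id, was_new) = store.probe_or_register(&data[start..end]);
        if was_new {
            new_bytes += end - start;
        }
        recipe.push(id);
        on_progress(end, data.len());
    }
    (recipe, new_bytes)
}

/// Reassemble a file by concatenating its referenced chunks in order.
fn reconstruct(store: &ChunkStore, recipe: &[usize]) -> Vec<u8> {
    recipe
        .iter()
        .flat_map(|&id| store.chunks[id].iter().copied())
        .collect()
}
```

For example, deduplicating `b"abcdefabcdefXYZ123"` with 6-byte ranges yields the recipe `[0, 0, 1]` and stores only 12 new bytes; a second file `b"XYZ123abcdef"` then reuses both chunks and stores nothing new.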

This crate is part of xet-core.

License

Apache-2.0