# xet-data
[crates.io](https://crates.io/crates/xet-data) | [docs.rs](https://docs.rs/xet-data) | [License](https://github.com/huggingface/xet-core/blob/main/LICENSE)

A data processing pipeline for chunking, deduplication, and file reconstruction. This crate is intended to be used through the API in the `hf-xet` package.
## Overview
- **Content-defined chunking** — Gear-hash based chunking for deduplication
- **Deduplication** — Probe and register chunks against metadata shards
- **File reconstruction** — Reassemble files from deduplicated chunk references
- **Progress tracking** — Hooks for upload/download progress reporting
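
To illustrate the content-defined chunking idea, here is a minimal, dependency-free sketch of gear-hash chunking. It is not this crate's API: `gear_table`, `ChunkParams`, and `chunk_boundaries` are hypothetical names, and the seed, mask, and size limits are illustrative assumptions rather than the values xet-data uses.

```rust
/// Builds a 256-entry gear table from a fixed seed with the SplitMix64 mixer.
/// The seed is an arbitrary choice for this sketch.
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Hypothetical chunking parameters (assumes `min_size <= max_size`).
struct ChunkParams {
    min_size: usize,
    max_size: usize,
    /// A cut is allowed where `hash & mask == 0`. High bits are used because
    /// they depend on the most preceding bytes (up to 64) in a gear hash.
    mask: u64,
}

/// Returns the exclusive end offset of every chunk in `data`.
fn chunk_boundaries(data: &[u8], params: &ChunkParams) -> Vec<usize> {
    let table = gear_table();
    let mut boundaries = Vec::new();
    let mut start = 0usize;
    let mut hash = 0u64;
    for (i, &byte) in data.iter().enumerate() {
        // Gear update: shift out old influence, mix in the new byte.
        hash = (hash << 1).wrapping_add(table[byte as usize]);
        let len = i + 1 - start;
        let at_cut = len >= params.min_size && (hash & params.mask) == 0;
        if at_cut || len >= params.max_size {
            boundaries.push(i + 1);
            start = i + 1;
            hash = 0;
        }
    }
    // The trailing bytes form a final, possibly short, chunk.
    if start < data.len() {
        boundaries.push(data.len());
    }
    boundaries
}
```

Because a cut depends only on nearby bytes, an edit early in a file tends to shift only the chunks around it, so later chunks can still match previously stored ones. With a mask covering the top 13 bits (`0xFFF8_0000_0000_0000`), a cut point occurs on average about once every 8 KiB past `min_size`.
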

This crate is part of [xet-core](https://github.com/huggingface/xet-core).
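
The deduplication and file reconstruction steps listed in the Overview can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the crate's implementation: `ChunkStore`, `ChunkId`, and every method name below are hypothetical, a `HashMap` stands in for metadata shards and chunk storage, and the standard library's 64-bit `DefaultHasher` stands in for the collision-resistant hash a real system would need.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in chunk identifier (assumption: 64-bit, non-cryptographic).
type ChunkId = u64;

fn chunk_id(chunk: &[u8]) -> ChunkId {
    let mut hasher = DefaultHasher::new();
    chunk.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical in-memory stand-in for shard metadata plus chunk storage.
#[derive(Default)]
struct ChunkStore {
    chunks: HashMap<ChunkId, Vec<u8>>,
}

impl ChunkStore {
    /// Probe: is a chunk with this identifier already known?
    fn probe(&self, id: ChunkId) -> bool {
        self.chunks.contains_key(&id)
    }

    /// Register: store the chunk only if the probe misses.
    /// Returns the identifier and whether new bytes were stored.
    fn register(&mut self, chunk: &[u8]) -> (ChunkId, bool) {
        let id = chunk_id(chunk);
        if self.probe(id) {
            return (id, false);
        }
        self.chunks.insert(id, chunk.to_vec());
        (id, true)
    }

    /// Deduplicates a file given as chunks and returns its list of
    /// chunk references, in order.
    fn add_file(&mut self, chunks: &[&[u8]]) -> Vec<ChunkId> {
        let mut recipe = Vec::with_capacity(chunks.len());
        for chunk in chunks {
            let (id, _is_new) = self.register(chunk);
            recipe.push(id);
        }
        recipe
    }

    /// Reassembles a file from chunk references; `None` if any is unknown.
    fn reconstruct(&self, recipe: &[ChunkId]) -> Option<Vec<u8>> {
        let mut out = Vec::new();
        for id in recipe {
            out.extend_from_slice(self.chunks.get(id)?);
        }
        Some(out)
    }

    /// Number of distinct chunks currently stored.
    fn stored_chunks(&self) -> usize {
        self.chunks.len()
    }
}
```

In this model, adding a file whose chunks are `alpha`, `beta`, `alpha` stores only two chunks, and `reconstruct` on the returned references yields the original bytes. A second file that shares `beta` adds only its new chunks.
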
## License
Apache-2.0