<div align="center">

<img src="docs/banner.png" alt="Clast Banner"/>

<br/>

A high-performance, extensible **Content-Defined Chunking (CDC)** library for Rust.

<br/>

English │ [한국어](docs/README-ko_kr.md)

</div>

<br/>

## Overview

**Clast** is a modular library for **Content-Defined Chunking (CDC)**. It splits data into variable-sized chunks at boundaries derived from the content itself, rather than at fixed offsets (Fixed-Size Chunking, FSC). Because an insertion or deletion only disturbs the chunks near the edit, CDC is a common building block for data deduplication, backup systems, and efficient storage solutions.
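To make the difference between FSC and CDC concrete, here is a minimal, std-only sketch. It is not Clast's API: `gear_table`, `fixed_cuts`, `content_cuts`, `split`, and `shared_chunks` are hypothetical names, the gear table is filled by an arbitrary PRNG, and the toy chunker enforces no minimum or maximum chunk size.

```rust
use std::collections::HashSet;

/// SplitMix64 step, used only to produce reproducible pseudo-random values.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Illustrative gear table: one well-mixed 64-bit value per byte value.
fn gear_table() -> [u64; 256] {
    let mut seed = 1u64;
    let mut table = [0u64; 256];
    for slot in table.iter_mut() {
        *slot = splitmix64(&mut seed);
    }
    table
}

/// Fixed-size chunking: cut every `size` bytes regardless of content.
/// Returns the end offset of each chunk; `size` must be non-zero.
fn fixed_cuts(data: &[u8], size: usize) -> Vec<usize> {
    let mut cuts: Vec<usize> = (size..data.len()).step_by(size).collect();
    if !data.is_empty() {
        cuts.push(data.len());
    }
    cuts
}

/// Toy content-defined chunking: cut wherever the rolling gear hash has
/// all `mask` bits clear. The hash only "remembers" the last 64 bytes,
/// so a cut decision depends on nearby content, not on the offset.
fn content_cuts(data: &[u8], gear: &[u64; 256], mask: u64) -> Vec<usize> {
    let mut cuts = Vec::new();
    let mut hash = 0u64;
    for (i, &byte) in data.iter().enumerate() {
        hash = (hash << 1).wrapping_add(gear[byte as usize]);
        if hash & mask == 0 && i + 1 < data.len() {
            cuts.push(i + 1);
        }
    }
    if !data.is_empty() {
        cuts.push(data.len());
    }
    cuts
}

/// Turns a list of end offsets into the chunks themselves.
fn split<'a>(data: &'a [u8], cuts: &[usize]) -> Vec<&'a [u8]> {
    let mut start = 0;
    cuts.iter()
        .map(|&end| {
            let chunk = &data[start..end];
            start = end;
            chunk
        })
        .collect()
}

/// Counts how many chunks of `after` already appear in `before`.
fn shared_chunks(before: &[&[u8]], after: &[&[u8]]) -> usize {
    let known: HashSet<&[u8]> = before.iter().copied().collect();
    after.iter().filter(|chunk| known.contains(*chunk)).count()
}
```

Prepending a single byte to a buffer shifts every `fixed_cuts` boundary relative to the content, so the old and new chunk lists generally stop matching. With `content_cuts`, every boundary at least 64 bytes into the original reappears one byte later in the edited buffer, and the chunks after the first such boundary are byte-for-byte identical. Comparing the two strategies with `shared_chunks` on your own data shows the effect directly.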

**Clast** is designed to support multiple chunking algorithms, so developers can choose the strategy that best fits their use case.

<br/>

## Key Features

- **High Performance**: Optimized for throughput and low CPU overhead.
- **Modular Architecture**: Designed to support various CDC algorithms.
- **Async & Sync**: Supports both synchronous `std::io` and asynchronous `tokio` I/O.
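As a rough picture of what chunking over synchronous `std::io` involves, the sketch below feeds any `Read` source through a toy gear-hash chunker. It uses only the standard library and is not Clast's API: `chunk_reader`, its parameters, and the gear table are assumptions made for illustration. The point it shows is that the rolling hash and the partially built chunk must survive across `read` calls, so the chunk boundaries do not depend on the buffer size.

```rust
use std::io::{self, Read};

/// SplitMix64 step, used only to produce reproducible pseudo-random values.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Illustrative gear table: one well-mixed 64-bit value per byte value.
fn gear_table() -> [u64; 256] {
    let mut seed = 1u64;
    let mut table = [0u64; 256];
    for slot in table.iter_mut() {
        *slot = splitmix64(&mut seed);
    }
    table
}

/// Reads `reader` to the end in `buf_size`-byte steps (must be non-zero)
/// and hands each finished chunk to `on_chunk`.
fn chunk_reader<R: Read>(
    mut reader: R,
    buf_size: usize,
    mask: u64,
    mut on_chunk: impl FnMut(Vec<u8>),
) -> io::Result<()> {
    let gear = gear_table();
    let mut buf = vec![0u8; buf_size];
    // State carried across reads: the chunk being built and its hash.
    let mut current = Vec::new();
    let mut hash = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        for &byte in &buf[..n] {
            current.push(byte);
            hash = (hash << 1).wrapping_add(gear[byte as usize]);
            if hash & mask == 0 {
                // Boundary found: emit the chunk and start a fresh one.
                on_chunk(std::mem::take(&mut current));
                hash = 0;
            }
        }
    }
    if !current.is_empty() {
        on_chunk(current); // trailing bytes form the last chunk
    }
    Ok(())
}
```

An asynchronous variant would follow the same shape with an async reader in place of `Read`; it is left out here because it needs `tokio` rather than the standard library alone.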

<br/>

## Supported Algorithms

### FastCDC
An implementation of the **FastCDC** algorithm as described in *[The Design of Fast Content-Defined Chunking for Data Deduplication Based Storage Systems](https://doi.org/10.1109/TPDS.2020.2984632)*.

It incorporates five key optimizations:
- Gear-based Rolling Hashing
- Optimized Hash Judgment
- Sub-minimum Chunk Cut-Point Skipping
- Normalized Chunking
- Rolling Two Bytes
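
The first four optimizations can be sketched in a few lines. This is a simplified illustration written for this README, not Clast's implementation: `Params`, `demo_params`, `next_cut`, and `chunk` are hypothetical names, the sizes and mask constants are small demo values rather than those used in the paper, and the "Rolling Two Bytes" loop optimization is left out for brevity.

```rust
/// SplitMix64 step, used only to produce reproducible pseudo-random values.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Illustrative gear table: one well-mixed 64-bit value per byte value.
fn gear_table() -> [u64; 256] {
    let mut seed = 1u64;
    let mut table = [0u64; 256];
    for slot in table.iter_mut() {
        *slot = splitmix64(&mut seed);
    }
    table
}

/// Chunking parameters (hypothetical struct for this sketch).
struct Params {
    min_size: usize,
    normal_size: usize,
    max_size: usize,
    mask_s: u64, // stricter mask, used before `normal_size`
    mask_l: u64, // looser mask, used after `normal_size`
}

/// Small demo values so the behaviour is visible on a few KiB of data.
fn demo_params() -> Params {
    Params {
        min_size: 64,
        normal_size: 256,
        max_size: 1024,
        mask_s: 0x0000_1903_0351_0000, // 10 set bits, spread out
        mask_l: 0x0000_1001_0151_0000, // 6 set bits, spread out
    }
}

/// Returns the length of the next chunk at the start of `data`.
fn next_cut(data: &[u8], gear: &[u64; 256], p: &Params) -> usize {
    let n = data.len().min(p.max_size);
    if n <= p.min_size {
        return n; // too short to search: the rest is one chunk
    }
    let normal = p.normal_size.min(n);
    let mut hash = 0u64;
    // Sub-minimum skipping: bytes before `min_size` are never hashed.
    for (i, &byte) in data[..n].iter().enumerate().skip(p.min_size) {
        // Gear-based rolling hash: one shift, one add, one table lookup.
        hash = (hash << 1).wrapping_add(gear[byte as usize]);
        // Normalized chunking: harder to cut early, easier to cut late.
        let mask = if i < normal { p.mask_s } else { p.mask_l };
        // Hash judgment: a single AND against a mask with spread-out bits.
        if hash & mask == 0 {
            return i + 1;
        }
    }
    n // no boundary found: cut at `max_size` or at the end of the data
}

/// Splits `data` into chunks by calling `next_cut` repeatedly.
fn chunk<'a>(mut data: &'a [u8], gear: &[u64; 256], p: &Params) -> Vec<&'a [u8]> {
    assert!(p.min_size < p.max_size, "min_size must be below max_size");
    let mut chunks = Vec::new();
    while !data.is_empty() {
        let (head, tail) = data.split_at(next_cut(data, gear, p));
        chunks.push(head);
        data = tail;
    }
    chunks
}
```

With these parameters every chunk except possibly the last is longer than `min_size` and no chunk exceeds `max_size`. Using two masks pulls the chunk-size distribution toward `normal_size`, which is what lets the minimum size be raised without hurting deduplication much.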

<br/>

## Installation

Add the crate to your project with `cargo`:

```bash
cargo add clast
```


### Feature Flags

**Clast** uses feature flags to minimize the compiled binary size. You can selectively enable the features you need.

- `fastcdc`: Enables the FastCDC algorithm implementation. (Enabled by default)
- `async`: Enables asynchronous support using `tokio`.

To use only `fastcdc` (the default), no extra flags are needed:

```bash
cargo add clast
```

To enable `async` support in addition to the default `fastcdc`:

```bash
cargo add clast --features async
```

Or in your `Cargo.toml`:

```toml
[dependencies]
clast = { version = "1.0.0", features = ["async"] }
```

<br/>

## Usage

Please refer to the [Tutorials](docs/tutorials/MENU.md) for detailed usage examples.

<br/>

## Reference

* **FastCDC**: Wen Xia et al., "The Design of Fast Content-Defined Chunking for Data Deduplication Based Storage Systems," *IEEE Transactions on Parallel and Distributed Systems*, 2020.

<br/>

## Contributing

Contributions are welcome! Please feel free to open a Pull Request.

1. Fork the repository.
2. Create your feature branch (`git checkout -b feature/new-feature`).
3. Commit your changes (`git commit -m 'Add some feature'`).
4. Push to the branch (`git push origin feature/new-feature`).
5. Open a Pull Request.

Please make sure to run `cargo fmt` and `cargo test` before opening a Pull Request.

<br/>

## License

MIT © Arcadia Softs. See [LICENSE](LICENSE) for details.