Expand description
Implementation of Lock-Free Deduplication in Rust.
This crates provides an implementation of the algorithm used by tools such as Duplicacy as described in their paper.
§Overview
The most important object is the repository, which stores chunks, clients
and manifests and is available to a number of clients.
It is represented by the Repository
trait, which is designed to be
implemented by you.
Clients represent individual users which may perform operations on the
repository at the same time as other clients, like creating chunks or
manifests.
A single client can only perform some operations like manifest creation
sequentially, while multiple clients can perform them in parallel.
Furthermore, clients have to be registered with the repository before
they perform any operations and should be removed when they won’t create
new manifests for a long time.
Manifests represent data uploaded by a client which has been divided into chunks and stored in the repository. Manifests only store references to their chunks and may share them with other manifests, which requires periodic chunk removals after some manifests have been deleted.
Chunks are pieces of data uploaded to the repository and may be referenced by manifests. They allow for deduplicating manifest contents by associating them with an id based on their content, causing chunks with the same content to only be stored once in the repository.
Chunk creation can be done in parallel (even by a single client), but chunk
removal is more complicated because other clients can be in the process of
creating manifests referencing them, which would cause invalid references
should these chunks be deleted.
To prevent this chunks should be first turned into a special type of chunk
called “fossils” by using FossilCollectionBuilder
, which produces a
FossilCollection
.
This collection can be deleted when every client has created a manifest
after the fossil collection was created, which will either permanently
delete fossils or turn them back into chunks.
It is possible to combine the deletion of a fossil collection with the
creation of the next one using PipelinedFossilCollectionBuilder
.
§Concurrency
While this crate does not depend on a particular async runtime it performs
actions concurrently when possible, which can be controlled by the concurrency
argument of some functions.
This is achieved by using futures::stream::FuturesUnordered
, which in
turn requires that any blocking operations (for example filesystem operations
or hashing large amounts of data) are queued on an external thread pool should
parallel execution be desired.
§Features
serde
: implementsserde::Serialize
andserde::Deserialize
forFossilCollection
.
Structs§
- Builder
Seen Manifests - Iterator produced by
FossilCollectionBuilder::iter_seen_manifests
. - Collection
Seen Manifests - Iterator produced by
FossilCollection::iter_seen_manifests
. - Fossil
Candidates - Iterator produced by
FossilCollectionBuilder::iter_fossil_candidates
. - Fossil
Collection - A set of fossils await either recovery or deletion.
- Fossil
Collection Builder - Builder for creating fossil collections.
- Fossil
Collection Error - Error of a failed fossil collection operation.
- Fossils
- Iterator produced by
FossilCollection::iter_fossils
. - Pipelined
Fossil Collection Builder - Combine fossil deletion with fossil collection.
- Pipelined
Fossil Deletion Error - Error of a failed call to
PipelinedFossilCollectionBuilder::delete
. - Pipelined
Seen Manifests - Iterator produced by
PipelinedFossilCollectionBuilder::iter_seen_manifests
. - Referenced
Chunks - Iterator produced by
FossilCollectionBuilder::iter_referenced_chunks
.
Enums§
- Fallible
Fossil Collection Error - Error produced by methods of
RepositoryExt
which may fail before collecting fossils. - Fossil
Deletion Error - Error of a failed fossil deletion operation.
Traits§
- Manifest
- Data uploaded by a client, represented as a set of chunks and metadata.
- Repository
- A repository storing clients, chunks and manifests.
- Repository
Ext - Shortscuts for using
FossilCollectionBuilder
.