pub struct Repository<ObjectID: FsVerityHashValue> { /* private fields */ }
A content-addressable repository for composefs objects.
Stores content-addressed objects, splitstreams, and images with fsverity verification. Objects are stored by their fsverity digest, streams by SHA256 content hash, and both support named references for persistence across garbage collection.
Implementations§
impl<ObjectID: FsVerityHashValue> Repository<ObjectID>
pub fn objects_dir(&self) -> ErrnoResult<&OwnedFd>
Return the objects directory.
pub fn set_write_concurrency(&mut self, n: usize)
Override the maximum number of concurrent object writes.
Must be called before the first use of write_semaphore;
has no effect if the semaphore has already been initialized.
pub fn write_semaphore(&self) -> Arc<Semaphore>
Return a shared semaphore for limiting concurrent object writes.
This semaphore is lazily initialized with available_parallelism() permits
(or the value set via set_write_concurrency),
and shared across all operations on this repository. Use this to limit
concurrent I/O when processing multiple files or layers in parallel.
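The "override only before first use" rule can be sketched with a `OnceLock`. This is a minimal illustration, not the crate's implementation: `WriteLimit` and its methods are hypothetical names, and the sketch models only the permit count rather than an actual `Arc<Semaphore>`.

```rust
use std::sync::OnceLock;
use std::thread::available_parallelism;

/// Hypothetical stand-in for the repository's write limiter. It models only
/// the permit count, not a real semaphore.
#[derive(Default)]
pub struct WriteLimit {
    override_permits: Option<usize>,
    permits: OnceLock<usize>,
}

impl WriteLimit {
    /// Record an override. Ignored once `permits()` has been called.
    pub fn set_concurrency(&mut self, n: usize) {
        if self.permits.get().is_none() {
            self.override_permits = Some(n);
        }
    }

    /// Pick the permit count on first use and reuse it afterwards.
    pub fn permits(&self) -> usize {
        *self.permits.get_or_init(|| {
            self.override_permits.unwrap_or_else(|| {
                // Fall back to 1 if the parallelism cannot be queried.
                available_parallelism().map(|n| n.get()).unwrap_or(1)
            })
        })
    }
}
```

Calling `set_concurrency(3)` and then `permits()` yields 3; a later `set_concurrency(8)` leaves the value unchanged because the cell is already initialized.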
pub fn init_path(
    dirfd: impl AsFd,
    path: impl AsRef<Path>,
    config: RepositoryConfig,
) -> Result<(Self, bool)>
Initialize a new repository at the target path and open it.
Creates the directory (mode 0700) if it does not exist, writes
meta.json using the parameters from config, and returns the opened
repository together with a flag indicating whether this was a
fresh initialization (true) or an idempotent open of an
existing repository with the same algorithm (false).
The config.algorithm must be compatible with this repository’s
ObjectID type (e.g. Algorithm::Sha512 for
Repository<Sha512HashValue>).
Unless config has been made insecure via RepositoryConfig::set_insecure,
fs-verity is enabled on meta.json, signaling that all objects must also have verity.
If meta.json already exists with a different algorithm, an
error is returned.
pub fn open_path(
    dirfd: impl AsFd,
    path: impl AsRef<Path>,
) -> Result<Self, RepositoryOpenError>
Open a repository at the target directory and path.
meta.json is read, parsed, and validated against this
repository’s ObjectID type. Parsing or compatibility errors
are propagated immediately so that broken metadata is never
silently ignored.
The repository’s security mode is auto-detected: if meta.json
has fs-verity enabled the repo requires verity on all objects
(secure mode). Otherwise the repository operates in insecure
mode. Use set_insecure to override after opening.
pub fn open_upgrade(
    dirfd: impl AsFd,
    path: impl AsRef<Path>,
) -> Result<(Self, bool)>
Open a repository, upgrading old-format repos that lack meta.json.
This method first tries open_path. If that fails
with OldFormatRepository,
it infers the algorithm and verity mode from existing objects,
writes meta.json, and retries the open.
This is the non-destructive upgrade path for repositories created
by composefs-rs versions that predated meta.json.
Returns (repo, upgraded) where upgraded is true if meta.json
was written.
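The try, upgrade, retry flow described above might look roughly like the following. This is a sketch over an in-memory model: `RepoDir`, `OpenError`, and the free functions are hypothetical, and the step that infers the algorithm and verity mode from existing objects is omitted.

```rust
use std::collections::HashSet;

/// Hypothetical error type; only the old-format case triggers an upgrade.
#[derive(Debug, PartialEq)]
pub enum OpenError {
    OldFormat,
    Invalid(String),
}

/// Hypothetical in-memory model of a repository directory.
#[derive(Default)]
pub struct RepoDir {
    pub files: HashSet<String>,
}

fn open(dir: &RepoDir) -> Result<(), OpenError> {
    if dir.files.contains("meta.json") {
        Ok(())
    } else if dir.files.is_empty() {
        Err(OpenError::Invalid("not a repository".into()))
    } else {
        // Contents exist but there is no metadata file: an old-format repo.
        Err(OpenError::OldFormat)
    }
}

/// Try a plain open; on the old-format error, write metadata and retry.
/// Returns `true` when the metadata had to be written.
pub fn open_upgrade(dir: &mut RepoDir) -> Result<bool, OpenError> {
    match open(dir) {
        Ok(()) => Ok(false),
        Err(OpenError::OldFormat) => {
            dir.files.insert("meta.json".to_string());
            open(dir).map(|()| true)
        }
        // Any other failure is propagated unchanged.
        Err(other) => Err(other),
    }
}
```

A second call on the same directory returns `Ok(false)`, mirroring the `upgraded` flag in the real return value.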
pub fn open_system() -> Result<Self>
Open the default system-global composefs repository.
pub async fn ensure_object_async(
    self: &Arc<Self>,
    data: Vec<u8>,
) -> Result<ObjectID>
Asynchronously ensures an object exists in the repository.
Same as ensure_object but runs the operation on a blocking thread pool
to avoid blocking async tasks. Returns the fsverity digest of the object.
For performance reasons, this function does not call fsync() or similar. After you’re
done with everything, call Repository::sync_async().
pub fn create_object_tmpfile(&self) -> Result<OwnedFd>
Create an O_TMPFILE in the objects directory for streaming writes.
Returns the file descriptor for writing. The caller should write data to this fd,
then call finalize_object_tmpfile to compute
the verity digest, enable fs-verity, and link the file into the objects directory.
pub fn ensure_object_from_file(
    &self,
    src: &File,
    size: u64,
    ctx: &mut ImportContext,
) -> Result<(ObjectID, ObjectStoreMethod)>
Ensure an object exists by reflinking or hardlinking from a source file.
The fallback chain is: reflink -> hardlink -> copy.
- Reflink (FICLONE): zero-copy clone on btrfs/XFS. Uses a tmpfile.
- Hardlink: enables fs-verity on the source file in-place, then hardlinks it directly into the objects directory. This avoids all data copying on filesystems like ext4 that don’t support reflinks.
- Copy: regular data copy into a tmpfile as last resort.
The ctx argument accumulates knowledge across calls in the same
import operation. After the first reflink attempt fails with
EOPNOTSUPP / EXDEV, the context records this so that subsequent
calls skip straight to the hardlink path.
This is particularly useful for importing from containers-storage where we already have the file on disk and want to avoid copying data.
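The fallback chain and the role of the context can be sketched as follows. Everything here is illustrative: `StoreMethod`, `ImportCtx`, and `store_with_fallback` are hypothetical names, the three operations are passed in as closures instead of performing real I/O, the errno constants are the Linux values, and treating any hardlink failure as a reason to copy is a simplifying assumption.

```rust
use std::io;

/// Linux errno values, hard-coded so the sketch needs no extra crates.
pub const EXDEV: i32 = 18;
pub const EOPNOTSUPP: i32 = 95;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StoreMethod {
    Reflinked,
    Hardlinked,
    Copied,
}

/// Hypothetical per-import state that remembers a failed reflink attempt.
#[derive(Default)]
pub struct ImportCtx {
    pub reflink_unsupported: bool,
}

pub fn store_with_fallback(
    ctx: &mut ImportCtx,
    reflink: impl FnOnce() -> io::Result<()>,
    hardlink: impl FnOnce() -> io::Result<()>,
    copy: impl FnOnce() -> io::Result<()>,
) -> io::Result<StoreMethod> {
    if !ctx.reflink_unsupported {
        match reflink() {
            Ok(()) => return Ok(StoreMethod::Reflinked),
            // Remember that reflinks do not work here so later calls skip them.
            Err(e) if matches!(e.raw_os_error(), Some(EOPNOTSUPP) | Some(EXDEV)) => {
                ctx.reflink_unsupported = true;
            }
            Err(e) => return Err(e),
        }
    }
    if hardlink().is_ok() {
        return Ok(StoreMethod::Hardlinked);
    }
    // Last resort: a regular data copy.
    copy().map(|()| StoreMethod::Copied)
}
```

Once `reflink_unsupported` is set, the reflink closure is never invoked again for that context, which is the saving the `ctx` argument is meant to provide.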
pub fn ensure_object_from_file_zerocopy(
    &self,
    src: &File,
    size: u64,
    ctx: &mut ImportContext,
) -> Result<(ObjectID, ObjectStoreMethod)>
Like ensure_object_from_file but
errors if neither reflink nor hardlink succeeds, instead of falling back
to a regular copy.
Intended for bootc’s unified storage path where the composefs repo and containers-storage are always on the same filesystem, so zero-copy should always be possible.
pub fn finalize_object_tmpfile(
    &self,
    file: File,
    size: u64,
) -> Result<(ObjectID, ObjectStoreMethod)>
Finalize a tmpfile as an object.
This method should be called from a blocking context (e.g., spawn_blocking)
as it performs synchronous I/O operations.
This method:
- Re-opens the file as read-only
- Enables fs-verity on the file (kernel computes digest)
- Reads the digest from the kernel
- Checks if object already exists (deduplication)
- Links the file into the objects directory
By letting the kernel compute the digest during verity enable, we avoid reading the file an extra time in userspace.
pub fn ensure_object(&self, data: &[u8]) -> Result<ObjectID>
Given a blob of data, store it in the repository.
For performance reasons, this function does not call fsync() or similar. After you’re
done with everything, call Repository::sync().
pub fn is_insecure(&self) -> bool
Returns whether the repository is in insecure mode.
This is auto-detected from whether meta.json has fs-verity
enabled, but can be overridden with set_insecure.
pub fn set_erofs_version(&mut self, version: FormatVersion) -> &mut Self
Override the EROFS format version for this repository session.
Changes the in-memory default used by FileSystem::commit_image
and FileSystem::compute_image_id for the lifetime of this
Repository instance only.
Does not rewrite meta.json. Intended for CLI tools that accept a
per-invocation --erofs-version flag to override the repository’s stored default.
pub fn set_insecure(&mut self) -> &mut Self
Mark this repository as insecure, disabling verification of fs-verity digests. This allows operation on filesystems without verity support.
pub fn require_verity(&self) -> Result<()>
Require that this repository has fs-verity enabled.
Returns an error if the repository was not initialized with
verity on meta.json, since there is no mechanism to
retroactively enable verity on existing objects.
pub fn ensure_writable(&self) -> Result<()>
Fast pre-flight check that the repository is writable.
Uses faccessat(W_OK) to catch read-only mounts and permission
issues before starting expensive network or I/O work. Callers
that want to fail early (e.g. before downloading an image) should
call this; individual write methods already check internally.
pub fn create_stream(
    self: &Arc<Self>,
    content_type: u64,
) -> Result<SplitStreamWriter<ObjectID>>
Creates a SplitStreamWriter for writing a split stream. Write the data to the returned writer, then pass it to write_stream to store the result.
The writable check is performed here so that callers cannot obtain
a writer without first verifying the repository is writable.
The WritableRepo token is carried by the writer so that
subsequent object writes skip redundant checks.
pub fn has_stream(&self, content_identifier: &str) -> Result<Option<ObjectID>>
Check if the provided splitstream is present in the repository; if so, return its fsverity digest.
pub fn write_stream(
    &self,
    writer: SplitStreamWriter<ObjectID>,
    content_identifier: &str,
    reference: Option<&str>,
) -> Result<ObjectID>
Write the given splitstream to the repository with the provided content identifier and optional reference name.
This call contains an internal barrier that guarantees that, in the event of a crash, either:
- the named stream (by content_identifier) will not be available; or
- the stream and all of its linked data will be available
In other words: it will not be possible to boot a system which contained a stream named
content_identifier but is missing linked streams or objects from that stream.
pub async fn register_stream(
    self: &Arc<Self>,
    object_id: &ObjectID,
    content_identifier: &str,
    reference: Option<&str>,
) -> Result<()>
Register an already-stored object as a named stream.
This is useful when using SplitStreamBuilder which stores the splitstream
directly via finish(). After calling finish(), call this method to
sync all data to disk and create the stream symlink.
This method ensures atomicity: the stream symlink is only created after all objects have been synced to disk.
pub async fn write_stream_async(
    self: &Arc<Self>,
    writer: SplitStreamWriter<ObjectID>,
    content_identifier: &str,
    reference: Option<&str>,
) -> Result<ObjectID>
Async version of write_stream for use with parallel object storage.
This method awaits any pending parallel object storage tasks before
finalizing the stream. Use this when you’ve called write_external_parallel()
on the writer.
pub fn has_named_stream(&self, name: &str) -> Result<bool>
Check if a splitstream with a given name exists in the “refs” in the repository.
pub fn name_stream(&self, content_identifier: &str, name: &str) -> Result<()>
Assign a named reference to a stream, making it a GC root.
Creates a symlink at streams/refs/{name} pointing to the stream identified
by content_identifier. The stream must already exist in the repository.
Named references serve two purposes:
- They provide human-readable names for streams
- They act as GC roots - streams reachable from refs are not garbage collected
The name can include path separators to organize refs hierarchically
(e.g., myapp/layer1), and intermediate directories are created automatically.
pub fn ensure_stream<T: Default>(
    self: &Arc<Self>,
    content_identifier: &str,
    content_type: u64,
    callback: impl FnOnce(&mut SplitStreamWriter<ObjectID>) -> Result<T>,
    reference: Option<&str>,
) -> Result<(ObjectID, T)>
Ensures that the stream with a given content identifier digest exists in the repository.
This tries to find the stream by the content identifier. If the stream is already in the
repository, the object ID (fs-verity digest) is read from the symlink. If the stream is
not already in the repository, a SplitStreamWriter is created and passed to callback.
On return, the object ID of the stream will be calculated and it will be written to disk
(if it wasn’t already created by someone else in the meantime).
In both cases, if reference is provided, it is used to provide a fixed name for the
object. Any object that doesn’t have a fixed reference to it is subject to garbage
collection. It is an error if this reference already exists.
On success, the object ID of the new object is returned. It is expected that this object ID will be used when referring to the stream from other linked streams.
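The get-or-create behavior can be sketched over an in-memory index. This is an assumption-laden model, not the crate's code: `StreamIndex` is a hypothetical type, object IDs are plain strings, the callback returns the finished object ID directly instead of filling a writer, and the point at which the duplicate-reference error is raised is a guess.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the streams directory.
#[derive(Default)]
pub struct StreamIndex {
    /// content identifier -> object ID
    by_content_id: HashMap<String, String>,
    /// reference name -> content identifier
    refs: HashMap<String, String>,
}

impl StreamIndex {
    pub fn ensure_stream(
        &mut self,
        content_id: &str,
        build: impl FnOnce() -> Result<String, String>,
        reference: Option<&str>,
    ) -> Result<String, String> {
        let object_id = match self.by_content_id.get(content_id).cloned() {
            // Already present: reuse the stored ID and skip the callback.
            Some(existing) => existing,
            None => {
                let id = build()?;
                self.by_content_id.insert(content_id.to_string(), id.clone());
                id
            }
        };
        if let Some(name) = reference {
            // An existing reference of the same name is an error.
            if self.refs.contains_key(name) {
                return Err(format!("reference {name} already exists"));
            }
            self.refs.insert(name.to_string(), content_id.to_string());
        }
        Ok(object_id)
    }
}
```

The callback runs at most once per content identifier, so an expensive import is skipped whenever the stream is already stored.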
pub fn open_stream(
    &self,
    content_identifier: &str,
    verity: Option<&ObjectID>,
    expected_content_type: Option<u64>,
) -> Result<SplitStreamReader<ObjectID>>
Open a splitstream with the given name.
pub fn open_object(&self, id: &ObjectID) -> Result<OwnedFd>
Given an object identifier (a digest), return a read-only file descriptor
for its contents. The fsverity digest is verified (if the repository is not in insecure mode).
pub fn read_object(&self, id: &ObjectID) -> Result<Vec<u8>>
Read the contents of an object into a Vec<u8>.
pub fn merge_splitstream(
    &self,
    content_identifier: &str,
    verity: Option<&ObjectID>,
    expected_content_type: Option<u64>,
    output: &mut impl Write,
) -> Result<()>
Merges a splitstream into a single continuous stream.
Opens the named splitstream, resolves all object references, and writes the complete merged content to the provided writer. Optionally verifies the splitstream’s fsverity digest matches the expected value.
pub fn write_image(&self, name: Option<&str>, data: &[u8]) -> Result<ObjectID>
Write data into the repository as an image with the given name.
The fsverity digest is returned.
§Integrity
This function is not safe for untrusted users.
pub fn import_image<R: Read>(
    &self,
    name: &str,
    image: &mut R,
) -> Result<ObjectID>
Import the data from the provided read into the repository as an image.
The fsverity digest is returned.
§Integrity
This function is not safe for untrusted users.
pub fn open_image(&self, name: &str) -> Result<(OwnedFd, bool)>
Returns the fd of the image and whether or not verity should be enabled when mounting it.
pub fn mount_with_options(
    &self,
    name: &str,
    options: &MountOptions,
) -> Result<OwnedFd>
Create a detached mount of an image. This file descriptor can then
be attached via e.g. move_mount.
pub fn mount(&self, name: &str) -> Result<OwnedFd>
Create a detached read-only mount of an image.
This file descriptor can then be attached via e.g. move_mount.
pub fn mount_at(
    &self,
    name: &str,
    mountpoint: impl AsRef<Path>,
    options: &MountOptions,
) -> Result<()>
Mount the image with the provided digest at the target path.
pub fn symlink(
    &self,
    name: impl AsRef<Path> + Debug,
    target: impl AsRef<Path> + Debug,
) -> Result<()>
Creates a relative symlink within the repository.
Computes the correct relative path from the symlink location to the target, creating any necessary intermediate directories. Atomically replaces any existing symlink at the specified name.
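One way to compute such a relative target is to strip the common prefix and climb out of the link's directory. The helper below is a hypothetical sketch, not the crate's code; it assumes both paths are repository-relative and already normalized (no `.` or `..` components), and the example paths in the usage note are illustrative.

```rust
use std::path::{Component, Path, PathBuf};

/// Relative path from the directory containing `link` to `target`.
/// Both arguments are assumed to be normalized, repository-relative paths.
pub fn relative_target(link: &Path, target: &Path) -> PathBuf {
    let link_dir: Vec<Component> = link
        .parent()
        .unwrap_or(Path::new(""))
        .components()
        .collect();
    let target_parts: Vec<Component> = target.components().collect();

    // Count the leading components shared by both paths.
    let common = link_dir
        .iter()
        .zip(&target_parts)
        .take_while(|(a, b)| a == b)
        .count();

    let mut rel = PathBuf::new();
    // Climb out of the non-shared part of the link's directory...
    for _ in common..link_dir.len() {
        rel.push("..");
    }
    // ...then descend into the non-shared part of the target.
    for part in &target_parts[common..] {
        rel.push(part);
    }
    rel
}
```

For a link at `streams/refs/myapp/layer1` and a target of `streams/abc`, the result is `../../abc`. Because the target is relative, the link keeps working if the repository directory is moved or accessed through a different mount point.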
pub fn objects_for_image(&self, name: &str) -> Result<HashSet<ObjectID>>
Given an image, return the set of all objects referenced by it.
pub fn sync(&self) -> Result<()>
Makes sure all content is written to the repository.
This is currently just syncfs() on the repository’s root directory because we don’t have any better options at present. This blocks until the data is written out.
pub async fn sync_async(self: &Arc<Self>) -> Result<()>
Makes sure all content is written to the repository.
This is currently just syncfs() on the repository’s root directory because we don’t have any better options at present. This won’t return until the data is written out.
pub fn gc(&self, additional_roots: &[&str]) -> Result<GcResult>
Perform garbage collection, removing unreferenced objects.
Objects reachable from images/refs/ or streams/refs/ are preserved,
plus any additional_roots (looked up in both images and streams).
Returns statistics about what was removed.
§Locking
An exclusive lock is held for the duration of this operation.
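The reachability rule can be illustrated with a small mark-and-sweep over in-memory sets. This is a simplified model under stated assumptions: `collect_garbage` is a hypothetical function, each named entry lists its objects directly (streams that reference other streams are not followed), and locking is left out.

```rust
use std::collections::{HashMap, HashSet};

/// Remove every object not referenced by a rooted entry.
/// `entries` maps an image or stream name to the objects it references,
/// `refs` holds the names kept alive by a named reference.
/// Returns the number of objects removed.
pub fn collect_garbage(
    objects: &mut HashSet<String>,
    entries: &HashMap<String, Vec<String>>,
    refs: &HashSet<String>,
    additional_roots: &[&str],
) -> usize {
    // Mark: gather the objects referenced by every root.
    let mut live: HashSet<&str> = HashSet::new();
    let roots = refs
        .iter()
        .map(String::as_str)
        .chain(additional_roots.iter().copied());
    for root in roots {
        if let Some(referenced) = entries.get(root) {
            live.extend(referenced.iter().map(String::as_str));
        }
    }

    // Sweep: drop everything that was not marked.
    let before = objects.len();
    objects.retain(|id| live.contains(id.as_str()));
    before - objects.len()
}
```

Entries that have no reference and are not listed in `additional_roots` contribute nothing to the live set, so their objects are swept along with orphans.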
pub fn gc_dry_run(&self, additional_roots: &[&str]) -> Result<GcResult>
pub async fn fsck(&self) -> Result<FsckResult>
Check the structural integrity of the repository.
Walks all objects, streams, and images in the repository, verifying:
- Object fsverity digests match their path-derived identifiers
- Stream and image symlinks resolve to existing objects
- Stream/image refs resolve to valid entries
- Splitstreams have valid headers and reference only existing objects
Object directories are checked in parallel using spawn_blocking,
with concurrency bounded by available_parallelism().
Returns a FsckResult summarizing the findings. Does not modify
any repository contents.
pub async fn fsck_metadata_only(&self) -> Result<FsckResult>
Run a metadata-only consistency check.
This validates meta.json and the stream/image symlinks (including
splitstream structure and referenced-object existence) but skips the
expensive per-object fs-verity digest verification done by fsck.
Useful for a fast structural check on large repositories.
pub fn repo_fd(&self) -> BorrowedFd<'_>
Returns a borrowed file descriptor for the repository root.
This allows low-level operations on the repository directory.
pub fn metadata(&self) -> &RepoMetadata
Return the repository metadata parsed from meta.json at open time.
The metadata was already validated against this repository’s
ObjectID type when the repository was opened, so no further
compatibility check is needed.
pub fn erofs_version(&self) -> FormatVersion
Returns the effective EROFS format version for this repository.
Returns the per-invocation override set by set_erofs_version
if one is active, otherwise returns the default version from the stored
FormatConfig (see format_config).
pub fn format_config(&self) -> FormatConfig
Returns the effective FormatConfig for this repository.
When a per-invocation version override is active (set via
set_erofs_version), returns a single-version
config for that override — the override narrows generation to exactly one
format, discarding any extra versions from meta.json.
Otherwise returns the full config from meta.json, including any extra
versions. For repositories created before the erofs_formats field was
added, the config is derived from the legacy "v1_erofs" ro_compat flag.
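How the override narrows the effective configuration can be sketched as below. All types are hypothetical stand-ins: the real `FormatVersion` and `FormatConfig` are not shown in this page, so `Version`, `Formats`, and `Session` (including the `V1`/`V2` variants) are assumptions made for illustration.

```rust
/// Hypothetical stand-in for the format version enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Version {
    V1,
    V2,
}

/// Hypothetical stand-in for the stored format configuration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Formats {
    pub default: Version,
    pub extra: Vec<Version>,
}

/// Hypothetical model of one opened repository instance.
pub struct Session {
    pub stored: Formats,
    pub override_version: Option<Version>,
}

impl Session {
    /// With an override active, return a single-version config;
    /// otherwise return the stored config, extras included.
    pub fn format_config(&self) -> Formats {
        match self.override_version {
            Some(v) => Formats { default: v, extra: Vec::new() },
            None => self.stored.clone(),
        }
    }

    /// The effective version is the default of the effective config.
    pub fn erofs_version(&self) -> Version {
        self.format_config().default
    }
}
```

Setting `override_version` changes only what this instance reports; the stored configuration is untouched, matching the note that the override does not rewrite meta.json.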
pub fn default_format_config(&self) -> FormatConfig
Returns the primary FormatConfig configured for this repository.
Alias for format_config.