Struct Repository 

pub struct Repository<ObjectID: FsVerityHashValue> { /* private fields */ }

A content-addressable repository for composefs objects.

Stores content-addressed objects, splitstreams, and images with fsverity verification. Objects are stored by their fsverity digest, streams by SHA256 content hash, and both support named references for persistence across garbage collection.

Implementations§

impl<ObjectID: FsVerityHashValue> Repository<ObjectID>

pub fn objects_dir(&self) -> ErrnoResult<&OwnedFd>

Return the objects directory.

pub fn set_write_concurrency(&mut self, n: usize)

Override the maximum number of concurrent object writes.

Must be called before the first use of write_semaphore; has no effect if the semaphore has already been initialized.

pub fn write_semaphore(&self) -> Arc<Semaphore>

Return a shared semaphore for limiting concurrent object writes.

This semaphore is lazily initialized with available_parallelism() permits (or the value set via set_write_concurrency), and shared across all operations on this repository. Use this to limit concurrent I/O when processing multiple files or layers in parallel.
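
The lazy initialization and the "override only before first use" rule can be illustrated with a small std-only sketch. The names Limiter and RepoSketch are hypothetical stand-ins, and Limiter only records a permit count rather than acting as a real semaphore; this shows the pattern, not the crate's implementation.

```rust
use std::num::NonZeroUsize;
use std::sync::{Arc, OnceLock};
use std::thread::available_parallelism;

/// Hypothetical stand-in for a semaphore: it only records its permit count.
#[derive(Debug)]
pub struct Limiter {
    pub permits: usize,
}

/// Hypothetical repository-like type holding a lazily created limiter.
#[derive(Default)]
pub struct RepoSketch {
    write_concurrency: Option<usize>,
    semaphore: OnceLock<Arc<Limiter>>,
}

impl RepoSketch {
    /// Records the override; it is only consulted on first initialization.
    pub fn set_write_concurrency(&mut self, n: usize) {
        self.write_concurrency = Some(n);
    }

    /// Creates the limiter on first call and returns the same Arc afterwards.
    pub fn write_semaphore(&self) -> Arc<Limiter> {
        self.semaphore
            .get_or_init(|| {
                let permits = self.write_concurrency.unwrap_or_else(|| {
                    available_parallelism().map(NonZeroUsize::get).unwrap_or(1)
                });
                Arc::new(Limiter { permits })
            })
            .clone()
    }
}
```

Because the limiter is created once, a later call to set_write_concurrency in this sketch changes the stored override but not the already-shared limiter.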

pub fn init_path( dirfd: impl AsFd, path: impl AsRef<Path>, config: RepositoryConfig, ) -> Result<(Self, bool)>

Initialize a new repository at the target path and open it.

Creates the directory (mode 0700) if it does not exist, writes meta.json using the parameters from config, and returns the opened repository together with a flag indicating whether this was a fresh initialization (true) or an idempotent open of an existing repository with the same algorithm (false).

The config.algorithm must be compatible with this repository’s ObjectID type (e.g. Algorithm::Sha512 for Repository<Sha512HashValue>).

Unless config has been made insecure via RepositoryConfig::set_insecure, fs-verity is enabled on meta.json, signaling that all objects must also have verity.

If meta.json already exists with a different algorithm, an error is returned.
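
The idempotent-initialization contract (fresh init returns true, re-opening with the same algorithm returns false, a mismatch is an error) can be sketched with plain std::fs. This is an illustration under simplifying assumptions: init_repo_sketch is a hypothetical helper, meta.json here holds only the algorithm name as plain text, and directory mode and fs-verity handling are omitted.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns Ok(true) on fresh initialization and Ok(false) when an existing
/// repository with the same algorithm is found. A different algorithm is an error.
pub fn init_repo_sketch(path: &Path, algorithm: &str) -> io::Result<bool> {
    fs::create_dir_all(path)?;
    let meta = path.join("meta.json");
    match fs::read_to_string(&meta) {
        // Existing metadata with a matching algorithm: idempotent open.
        Ok(existing) if existing.trim() == algorithm => Ok(false),
        // Existing metadata with a different algorithm: refuse.
        Ok(existing) => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("repository uses {}, requested {}", existing.trim(), algorithm),
        )),
        // No metadata yet: write it and report a fresh initialization.
        Err(err) if err.kind() == io::ErrorKind::NotFound => {
            fs::write(&meta, algorithm)?;
            Ok(true)
        }
        Err(err) => Err(err),
    }
}
```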

pub fn open_path( dirfd: impl AsFd, path: impl AsRef<Path>, ) -> Result<Self, RepositoryOpenError>

Open a repository at the target directory and path.

meta.json is read, parsed, and validated against this repository’s ObjectID type. Parsing or compatibility errors are propagated immediately so that broken metadata is never silently ignored.

The repository’s security mode is auto-detected: if meta.json has fs-verity enabled the repo requires verity on all objects (secure mode). Otherwise the repository operates in insecure mode. Use set_insecure to override after opening.

pub fn open_upgrade( dirfd: impl AsFd, path: impl AsRef<Path>, ) -> Result<(Self, bool)>

Open a repository, upgrading old-format repos that lack meta.json.

This method first tries open_path. If that fails with OldFormatRepository, it infers the algorithm and verity mode from existing objects, writes meta.json, and retries the open.

This is the non-destructive upgrade path for repositories created by composefs-rs versions that predated meta.json.

Returns (repo, upgraded) where upgraded is true if meta.json was written.
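
The try, upgrade, retry control flow can be modeled in memory. In this sketch DiskState, OpenError::Corrupt, and the free functions are hypothetical; only the OldFormatRepository name comes from the description above, and setting has_meta stands in for inferring the settings and writing meta.json.

```rust
#[derive(Debug, PartialEq)]
pub enum OpenError {
    OldFormatRepository,
    Corrupt,
}

/// Hypothetical in-memory stand-in for what is found on disk.
pub struct DiskState {
    pub has_objects: bool,
    pub has_meta: bool,
}

/// Models a plain open: metadata is required.
fn open(state: &DiskState) -> Result<(), OpenError> {
    if state.has_meta {
        Ok(())
    } else if state.has_objects {
        Err(OpenError::OldFormatRepository)
    } else {
        Err(OpenError::Corrupt)
    }
}

/// Returns Ok(true) if metadata had to be written, Ok(false) otherwise.
pub fn open_upgrade(state: &mut DiskState) -> Result<bool, OpenError> {
    match open(state) {
        Ok(()) => Ok(false),
        Err(OpenError::OldFormatRepository) => {
            state.has_meta = true; // stands in for writing meta.json
            open(state)?; // retry; any remaining error is propagated
            Ok(true)
        }
        Err(other) => Err(other),
    }
}
```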

pub fn open_user() -> Result<Self>

Open the default user-owned composefs repository.

pub fn open_system() -> Result<Self>

Open the default system-global composefs repository.

pub async fn ensure_object_async( self: &Arc<Self>, data: Vec<u8>, ) -> Result<ObjectID>

Asynchronously ensures an object exists in the repository.

Same as ensure_object but runs the operation on a blocking thread pool to avoid blocking async tasks. Returns the fsverity digest of the object.

For performance reasons, this function does not call fsync() or similar. After you’re done with everything, call Repository::sync_async().

pub fn create_object_tmpfile(&self) -> Result<OwnedFd>

Create an O_TMPFILE in the objects directory for streaming writes.

Returns the file descriptor for writing. The caller should write data to this fd, then call finalize_object_tmpfile to compute the verity digest, enable fs-verity, and link the file into the objects directory.

pub fn ensure_object_from_file( &self, src: &File, size: u64, ctx: &mut ImportContext, ) -> Result<(ObjectID, ObjectStoreMethod)>

Ensure an object exists by reflinking or hardlinking from a source file.

The fallback chain is: reflink -> hardlink -> copy.

  • Reflink (FICLONE): zero-copy clone on btrfs/XFS. Uses a tmpfile.
  • Hardlink: enables fs-verity on the source file in-place, then hardlinks it directly into the objects directory. This avoids all data copying on filesystems like ext4 that don’t support reflinks.
  • Copy: regular data copy into a tmpfile as last resort.

The ctx argument accumulates knowledge across calls in the same import operation. After the first reflink attempt fails with EOPNOTSUPP / EXDEV, the context records this so that subsequent calls skip straight to the hardlink path.

This is particularly useful for importing from containers-storage where we already have the file on disk and want to avoid copying data.
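
The fallback chain and the way the context remembers a failed reflink can be sketched with closures standing in for the three strategies. Method, ImportCtxSketch, and store_with_fallback are hypothetical names, and an io::ErrorKind::Unsupported error is used here as a stand-in for EOPNOTSUPP / EXDEV; the real method works on file descriptors.

```rust
use std::io;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    Reflink,
    Hardlink,
    Copy,
}

/// Hypothetical context: remembers that reflinks do not work here.
#[derive(Debug, Default)]
pub struct ImportCtxSketch {
    pub reflink_unsupported: bool,
}

/// Tries reflink, then hardlink, then copy, skipping reflink once it is
/// known to be unsupported.
pub fn store_with_fallback(
    ctx: &mut ImportCtxSketch,
    try_reflink: impl FnOnce() -> io::Result<()>,
    try_hardlink: impl FnOnce() -> io::Result<()>,
    copy: impl FnOnce() -> io::Result<()>,
) -> io::Result<Method> {
    if !ctx.reflink_unsupported {
        match try_reflink() {
            Ok(()) => return Ok(Method::Reflink),
            // Record the failure so later calls skip the reflink attempt.
            Err(e) if e.kind() == io::ErrorKind::Unsupported => ctx.reflink_unsupported = true,
            Err(e) => return Err(e),
        }
    }
    if try_hardlink().is_ok() {
        return Ok(Method::Hardlink);
    }
    // Last resort: a regular data copy.
    copy()?;
    Ok(Method::Copy)
}
```

A zero-copy-only variant would return an error instead of calling copy.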

pub fn ensure_object_from_file_zerocopy( &self, src: &File, size: u64, ctx: &mut ImportContext, ) -> Result<(ObjectID, ObjectStoreMethod)>

Like ensure_object_from_file but errors if neither reflink nor hardlink succeeds, instead of falling back to a regular copy.

Intended for bootc’s unified storage path where the composefs repo and containers-storage are always on the same filesystem, so zero-copy should always be possible.

pub fn finalize_object_tmpfile( &self, file: File, size: u64, ) -> Result<(ObjectID, ObjectStoreMethod)>

Finalize a tmpfile as an object.

This method should be called from a blocking context (e.g., spawn_blocking) as it performs synchronous I/O operations.

This method:

  1. Re-opens the file as read-only
  2. Enables fs-verity on the file (kernel computes digest)
  3. Reads the digest from the kernel
  4. Checks if object already exists (deduplication)
  5. Links the file into the objects directory

By letting the kernel compute the digest during verity enable, we avoid reading the file an extra time in userspace.
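
The deduplicate-then-link part of these steps can be sketched with ordinary files. Several things here are assumptions made for illustration: finalize_sketch and Outcome are hypothetical, std's DefaultHasher stands in for the fs-verity digest (it is not cryptographic, and unlike the real method this sketch reads the file in userspace), and the two-character fan-out directory is an assumed layout.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::Path;

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Stored,
    AlreadyPresent,
}

/// Moves a temporary file into a content-addressed location, or discards it
/// if an object with the same identifier already exists.
pub fn finalize_sketch(objects_dir: &Path, tmp: &Path) -> io::Result<(String, Outcome)> {
    // Stand-in digest: NOT fs-verity, for illustration only.
    let data = fs::read(tmp)?;
    let mut hasher = DefaultHasher::new();
    hasher.write(&data);
    let id = format!("{:016x}", hasher.finish());

    // Assumed layout: objects/<first two hex chars>/<remaining chars>.
    let dir = objects_dir.join(&id[..2]);
    let dest = dir.join(&id[2..]);

    // Deduplication: an existing object wins and the tmpfile is dropped.
    if dest.exists() {
        fs::remove_file(tmp)?;
        return Ok((id, Outcome::AlreadyPresent));
    }
    fs::create_dir_all(&dir)?;
    fs::rename(tmp, &dest)?;
    Ok((id, Outcome::Stored))
}
```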

pub fn ensure_object(&self, data: &[u8]) -> Result<ObjectID>

Given a blob of data, store it in the repository.

For performance reasons, this function does not call fsync() or similar. After you’re done with everything, call Repository::sync().

pub fn is_insecure(&self) -> bool

Returns whether the repository is in insecure mode.

This is auto-detected from whether meta.json has fs-verity enabled, but can be overridden with set_insecure.

pub fn set_erofs_version(&mut self, version: FormatVersion) -> &mut Self

Override the EROFS format version for this repository session.

Changes the in-memory default used by FileSystem::commit_image and FileSystem::compute_image_id for the lifetime of this Repository instance only.

Does not rewrite meta.json. Intended for CLI tools that accept a per-invocation --erofs-version flag to override the repository’s stored default.

pub fn set_insecure(&mut self) -> &mut Self

Mark this repository as insecure, disabling verification of fs-verity digests. This allows operation on filesystems without verity support.

pub fn require_verity(&self) -> Result<()>

Require that this repository has fs-verity enabled.

Returns an error if the repository was not initialized with verity on meta.json, since there is no mechanism to retroactively enable verity on existing objects.

pub fn ensure_writable(&self) -> Result<()>

Fast pre-flight check that the repository is writable.

Uses faccessat(W_OK) to catch read-only mounts and permission issues before starting expensive network or I/O work. Callers that want to fail early (e.g. before downloading an image) should call this; individual write methods already check internally.

pub fn create_stream( self: &Arc<Self>, content_type: u64, ) -> Result<SplitStreamWriter<ObjectID>>

Creates a SplitStreamWriter for writing a split stream. Write the data to the returned writer, then pass it to write_stream to store the result.

The writable check is performed here so that callers cannot obtain a writer without first verifying the repository is writable. The WritableRepo token is carried by the writer so that subsequent object writes skip redundant checks.

pub fn has_stream(&self, content_identifier: &str) -> Result<Option<ObjectID>>

Check if the provided splitstream is present in the repository; if so, return its fsverity digest.

pub fn write_stream( &self, writer: SplitStreamWriter<ObjectID>, content_identifier: &str, reference: Option<&str>, ) -> Result<ObjectID>

Write the given splitstream to the repository with the provided content identifier and optional reference name.

This call contains an internal barrier that guarantees that, in the event of a crash, either:

  • the named stream (by content_identifier) will not be available; or
  • the stream and all of its linked data will be available

In other words: it will not be possible to boot a system which contained a stream named content_identifier but is missing linked streams or objects from that stream.
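
The general "make data durable first, publish the name last" ordering behind such a barrier can be sketched with std::fs. This is a swapped-in, simplified technique: publish_after_sync is a hypothetical helper that syncs a single file and publishes it with an atomic rename, whereas the repository itself deals with symlinks and many objects.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes data under a temporary name, flushes it to disk, and only then
/// makes it visible under its final name.
pub fn publish_after_sync(dir: &Path, name: &str, data: &[u8]) -> io::Result<()> {
    let tmp = dir.join(format!(".{name}.tmp"));
    let mut file = File::create(&tmp)?;
    file.write_all(data)?;
    // Barrier: the content must reach the disk before the name exists.
    file.sync_all()?;
    // Publish: rename is atomic within a single filesystem.
    fs::rename(&tmp, dir.join(name))
}
```

With this ordering, a crash before the rename leaves no entry under the final name, and a crash after it leaves an entry whose content was already synced.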

pub async fn register_stream( self: &Arc<Self>, object_id: &ObjectID, content_identifier: &str, reference: Option<&str>, ) -> Result<()>

Register an already-stored object as a named stream.

This is useful when using SplitStreamBuilder which stores the splitstream directly via finish(). After calling finish(), call this method to sync all data to disk and create the stream symlink.

This method ensures atomicity: the stream symlink is only created after all objects have been synced to disk.

pub async fn write_stream_async( self: &Arc<Self>, writer: SplitStreamWriter<ObjectID>, content_identifier: &str, reference: Option<&str>, ) -> Result<ObjectID>

Async version of write_stream for use with parallel object storage.

This method awaits any pending parallel object storage tasks before finalizing the stream. Use this when you’ve called write_external_parallel() on the writer.

pub fn has_named_stream(&self, name: &str) -> Result<bool>

Check if a splitstream with a given name exists in the “refs” in the repository.

pub fn name_stream(&self, content_identifier: &str, name: &str) -> Result<()>

Assign a named reference to a stream, making it a GC root.

Creates a symlink at streams/refs/{name} pointing to the stream identified by content_identifier. The stream must already exist in the repository.

Named references serve two purposes:

  1. They provide human-readable names for streams
  2. They act as GC roots - streams reachable from refs are not garbage collected

The name can include path separators to organize refs hierarchically (e.g., myapp/layer1), and intermediate directories are created automatically.
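
When refs may be nested (for example myapp/layer1), a symlink stored at streams/refs/{name} needs a target that climbs back out of the intermediate directories. The sketch below computes such a relative target from two paths given relative to the repository root. It assumes refs are stored as relative symlinks; relative_link_target is a hypothetical helper, not the crate's function.

```rust
use std::path::{Component, Path, PathBuf};

/// Computes the path from the directory containing `link` to `target`,
/// where both are expressed relative to the same root.
pub fn relative_link_target(link: &Path, target: &Path) -> PathBuf {
    let link_dir: Vec<Component> = link
        .parent()
        .unwrap_or(Path::new(""))
        .components()
        .collect();
    let target_parts: Vec<Component> = target.components().collect();

    // Number of leading components the two paths share.
    let common = link_dir
        .iter()
        .zip(&target_parts)
        .take_while(|(a, b)| a == b)
        .count();

    let mut out = PathBuf::new();
    // Climb out of every directory that is not shared with the target.
    for _ in common..link_dir.len() {
        out.push("..");
    }
    // Then descend into the remaining target components.
    for part in &target_parts[common..] {
        out.push(part.as_os_str());
    }
    out
}
```

For a link at streams/refs/myapp/layer1 and a target at streams/abc, the shared prefix is streams, so the result climbs two levels before naming abc.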

pub fn ensure_stream<T: Default>( self: &Arc<Self>, content_identifier: &str, content_type: u64, callback: impl FnOnce(&mut SplitStreamWriter<ObjectID>) -> Result<T>, reference: Option<&str>, ) -> Result<(ObjectID, T)>

Ensures that the stream with a given content identifier digest exists in the repository.

This tries to find the stream by the content identifier. If the stream is already in the repository, the object ID (fs-verity digest) is read from the symlink. If the stream is not already in the repository, a SplitStreamWriter is created and passed to callback. On return, the object ID of the stream will be calculated and it will be written to disk (if it wasn’t already created by someone else in the meantime).

In both cases, if reference is provided, it is used to provide a fixed name for the object. Any object that doesn’t have a fixed reference to it is subject to garbage collection. It is an error if this reference already exists.

On success, the object ID of the new object is returned. It is expected that this object ID will be used when referring to the stream from other linked streams.
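
The look-up-or-build behavior can be sketched with a map from content identifier to object ID. ensure_stream_sketch is a hypothetical helper, and returning T::default() when the stream already exists is an assumption inferred from the T: Default bound, not something the description above states.

```rust
use std::collections::HashMap;

/// Returns the stored object ID if the stream is known; otherwise runs
/// `build` once, records its object ID, and returns it with the built value.
pub fn ensure_stream_sketch<T: Default>(
    streams: &mut HashMap<String, String>,
    content_identifier: &str,
    build: impl FnOnce() -> (String, T),
) -> (String, T) {
    // Already present: the callback is never invoked (assumed behavior).
    if let Some(object_id) = streams.get(content_identifier) {
        return (object_id.clone(), T::default());
    }
    // Not present: build the stream and register it under its identifier.
    let (object_id, value) = build();
    streams.insert(content_identifier.to_string(), object_id.clone());
    (object_id, value)
}
```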

pub fn open_stream( &self, content_identifier: &str, verity: Option<&ObjectID>, expected_content_type: Option<u64>, ) -> Result<SplitStreamReader<ObjectID>>

Open a splitstream with the given name.

pub fn open_object(&self, id: &ObjectID) -> Result<OwnedFd>

Given an object identifier (a digest), return a read-only file descriptor for its contents. The fsverity digest is verified (if the repository is not in insecure mode).

pub fn read_object(&self, id: &ObjectID) -> Result<Vec<u8>>

Read the contents of an object into a Vec.

pub fn merge_splitstream( &self, content_identifier: &str, verity: Option<&ObjectID>, expected_content_type: Option<u64>, output: &mut impl Write, ) -> Result<()>

Merges a splitstream into a single continuous stream.

Opens the named splitstream, resolves all object references, and writes the complete merged content to the provided writer. Optionally verifies the splitstream’s fsverity digest matches the expected value.

pub fn write_image(&self, name: Option<&str>, data: &[u8]) -> Result<ObjectID>

Write data into the repository as an image with the given name.

The fsverity digest is returned.

§Integrity

This function is not safe for untrusted users.

pub fn import_image<R: Read>( &self, name: &str, image: &mut R, ) -> Result<ObjectID>

Import the data from the provided reader into the repository as an image.

The fsverity digest is returned.

§Integrity

This function is not safe for untrusted users.

pub fn open_image(&self, name: &str) -> Result<(OwnedFd, bool)>

Returns the fd of the image and whether or not verity should be enabled when mounting it.

pub fn mount_with_options( &self, name: &str, options: &MountOptions, ) -> Result<OwnedFd>

Create a detached mount of an image. This file descriptor can then be attached via e.g. move_mount.

pub fn mount(&self, name: &str) -> Result<OwnedFd>

Create a detached read-only mount of an image. This file descriptor can then be attached via e.g. move_mount.

pub fn mount_at( &self, name: &str, mountpoint: impl AsRef<Path>, options: &MountOptions, ) -> Result<()>

Mount the image with the provided digest at the target path.

Creates a relative symlink within the repository.

Computes the correct relative path from the symlink location to the target, creating any necessary intermediate directories. Atomically replaces any existing symlink at the specified name.

pub fn objects_for_image(&self, name: &str) -> Result<HashSet<ObjectID>>

Given an image, return the set of all objects referenced by it.

pub fn sync(&self) -> Result<()>

Makes sure all content is written to the repository.

This is currently just syncfs() on the repository’s root directory because we don’t have any better options at present. This blocks until the data is written out.

pub async fn sync_async(self: &Arc<Self>) -> Result<()>

Makes sure all content is written to the repository.

This is currently just syncfs() on the repository’s root directory because we don’t have any better options at present. This won’t return until the data is written out.

pub fn gc(&self, additional_roots: &[&str]) -> Result<GcResult>

Perform garbage collection, removing unreferenced objects.

Objects reachable from images/refs/ or streams/refs/ are preserved, plus any additional_roots (looked up in both images and streams). Returns statistics about what was removed.

§Locking

An exclusive lock is held for the duration of this operation.
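
The reachability rule that gc describes (keep everything reachable from the roots, remove the rest) is a mark-and-sweep pass. The sketch below models it over in-memory sets; gc_sketch and its parameters are hypothetical, and locking and on-disk layout are left out.

```rust
use std::collections::{HashMap, HashSet};

/// Removes every object not reachable from `roots` and returns how many
/// objects were removed. `references` maps an object to the objects it uses.
pub fn gc_sketch(
    objects: &mut HashSet<String>,
    references: &HashMap<String, Vec<String>>,
    roots: &[&str],
) -> usize {
    // Mark: walk the reference graph starting from the roots.
    let mut live: HashSet<String> = HashSet::new();
    let mut pending: Vec<String> = roots.iter().map(|r| r.to_string()).collect();
    while let Some(id) = pending.pop() {
        // insert returns false for already-visited ids, which ends cycles.
        if live.insert(id.clone()) {
            if let Some(children) = references.get(&id) {
                pending.extend(children.iter().cloned());
            }
        }
    }
    // Sweep: drop everything that was not marked.
    let before = objects.len();
    objects.retain(|id| live.contains(id));
    before - objects.len()
}
```

A dry run in this model would compute the same live set and report the count without calling retain.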

pub fn gc_dry_run(&self, additional_roots: &[&str]) -> Result<GcResult>

Preview what garbage collection would remove, without deleting.

Returns the same statistics that gc would return, but no files are actually deleted.

§Locking

A shared lock is held for the duration of this operation (readers are not blocked).

pub async fn fsck(&self) -> Result<FsckResult>

Check the structural integrity of the repository.

Walks all objects, streams, and images in the repository, verifying:

  • Object fsverity digests match their path-derived identifiers
  • Stream and image symlinks resolve to existing objects
  • Stream/image refs resolve to valid entries
  • Splitstreams have valid headers and reference only existing objects

Object directories are checked in parallel using spawn_blocking, with concurrency bounded by available_parallelism().

Returns a FsckResult summarizing the findings. Does not modify any repository contents.

pub async fn fsck_metadata_only(&self) -> Result<FsckResult>

Run a metadata-only consistency check.

This validates meta.json and the stream/image symlinks (including splitstream structure and referenced-object existence) but skips the expensive per-object fs-verity digest verification done by fsck. Useful for a fast structural check on large repositories.

pub fn repo_fd(&self) -> BorrowedFd<'_>

Returns a borrowed file descriptor for the repository root.

This allows low-level operations on the repository directory.

pub fn metadata(&self) -> &RepoMetadata

Return the repository metadata parsed from meta.json at open time.

The metadata was already validated against this repository’s ObjectID type when the repository was opened, so no further compatibility check is needed.

pub fn erofs_version(&self) -> FormatVersion

Returns the effective EROFS format version for this repository.

Returns the per-invocation override set by set_erofs_version if one is active, otherwise returns the default version from the stored FormatConfig (see format_config).

pub fn format_config(&self) -> FormatConfig

Returns the effective FormatConfig for this repository.

When a per-invocation version override is active (set via set_erofs_version), returns a single-version config for that override — the override narrows generation to exactly one format, discarding any extra versions from meta.json.

Otherwise returns the full config from meta.json, including any extra versions. For repositories created before the erofs_formats field was added, the config is derived from the legacy "v1_erofs" ro_compat flag.
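
The interaction between the stored configuration and a per-invocation override can be modeled with an Option. Version, Config, and FormatRepoSketch below are hypothetical simplifications of FormatVersion, FormatConfig, and Repository; the legacy-flag derivation is not modeled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Version {
    V1,
    V2,
}

/// Hypothetical config: one default version plus optional extra versions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Config {
    pub default: Version,
    pub extra: Vec<Version>,
}

pub struct FormatRepoSketch {
    pub stored: Config,
    pub override_version: Option<Version>,
}

impl FormatRepoSketch {
    /// The override wins; otherwise the stored default applies.
    pub fn erofs_version(&self) -> Version {
        self.override_version.unwrap_or(self.stored.default)
    }

    /// An active override narrows the config to exactly one version.
    pub fn format_config(&self) -> Config {
        match self.override_version {
            Some(version) => Config { default: version, extra: Vec::new() },
            None => self.stored.clone(),
        }
    }
}
```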

pub fn default_format_config(&self) -> FormatConfig

Returns the primary FormatConfig configured for this repository.

Alias for format_config.

pub fn list_stream_refs(&self, prefix: &str) -> Result<Vec<(String, String)>>

Lists all named stream references under a given prefix.

Returns (name, target) pairs where name is relative to the prefix.

Trait Implementations§

impl<ObjectID: FsVerityHashValue> Debug for Repository<ObjectID>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<ObjectID: FsVerityHashValue> Drop for Repository<ObjectID>

fn drop(&mut self)

Executes the destructor for this type.

fn pin_drop(self: Pin<&mut Self>)

🔬This is a nightly-only experimental API. (pin_ergonomics)
Execute the destructor for this type, but different to Drop::drop, it requires self to be pinned.

impl<ObjectID: FsVerityHashValue> ObjectStore<ObjectID> for Repository<ObjectID>

fn ensure_object_from_fd(&self, fd: OwnedFd, size: u64) -> Result<ObjectID>

Store fd as an object, returning its verity digest.

fn write_semaphore(&self) -> Arc<Semaphore>

Return a semaphore that gates concurrent object writes.

Auto Trait Implementations§

impl<ObjectID> !Freeze for Repository<ObjectID>

impl<ObjectID> RefUnwindSafe for Repository<ObjectID>
where ObjectID: RefUnwindSafe,

impl<ObjectID> Send for Repository<ObjectID>

impl<ObjectID> Sync for Repository<ObjectID>

impl<ObjectID> Unpin for Repository<ObjectID>

impl<ObjectID> UnsafeUnpin for Repository<ObjectID>

impl<ObjectID> UnwindSafe for Repository<ObjectID>
where ObjectID: UnwindSafe,
