Crate omnibor


OmniBOR is a specification for a reproducible software identifier, called an “Artifact ID”, plus a compact record of build inputs, called an “Input Manifest”. Together, they let anyone precisely identify software binaries and text files and track the exact inputs used to build them. They also form a Merkle tree: any change in a dependency causes every artifact built from it to get a new, different Artifact ID, and anyone can use the Input Manifests to detect exactly which dependency changed.
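The Merkle-tree property can be illustrated with a toy model. This is a minimal sketch, not the crate's API: it uses FNV-1a, a fast non-cryptographic hash, as a stand-in for SHA-256, and it folds each input's ID directly into the hashed bytes, standing in for an artifact that embeds a reference to its Input Manifest. The names `toy_hash`, `toy_artifact_id`, and `build_chain` are hypothetical.

```rust
/// Toy 64-bit FNV-1a hash. NOT cryptographic; stands in for SHA-256 here.
fn toy_hash(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical toy ID: hash of the artifact's own bytes followed by the
/// sorted IDs of its inputs (modeling an embedded Input Manifest reference).
fn toy_artifact_id(content: &[u8], input_ids: &[u64]) -> u64 {
    let mut ids = input_ids.to_vec();
    ids.sort_unstable();
    let mut bytes = content.to_vec();
    for id in ids {
        bytes.extend_from_slice(&id.to_be_bytes());
    }
    toy_hash(&bytes)
}

/// Builds source -> object file -> binary and returns all three toy IDs.
/// Only the source content varies; the object and binary bytes stay fixed,
/// so any downstream change comes purely from the changed dependency.
fn build_chain(source: &[u8]) -> (u64, u64, u64) {
    let src = toy_artifact_id(source, &[]);
    let obj = toy_artifact_id(b"object bytes", &[src]);
    let bin = toy_artifact_id(b"binary bytes", &[obj]);
    (src, obj, bin)
}
```

Calling `build_chain` with two different sources yields three different IDs at every level, while calling it twice with the same source yields identical IDs.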

This crate exposes APIs for producing and consuming both Artifact IDs and Input Manifests.

If you just want documentation for the API of this crate, check out these two types:

  • ArtifactId
  • InputManifest

We also provide a CLI, based on this crate, as another option for working with Artifact IDs and Input Manifests.

§Table of Contents

  1. Examples
  2. Crate Overview
    1. Creating Artifact IDs
    2. Creating Input Manifests
    3. Hash Algorithms and Hash Providers
    4. Storing Input Manifests
  3. Foreign Function Interface
  4. Specification Compliance
  5. Comparison with Other Software Identifiers

§Examples

The Artifact ID of one of the test files in this repo, hello_world.txt, is:

gitoid:blob:sha256:fee53a18d32820613c0527aa79be5cb30173c823a9b448fa4817767cc84c6f03
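An Artifact ID string is colon-separated: the `gitoid` scheme, the object type (`blob`), the hash algorithm (`sha256`), and the hex digest. The following minimal sketch splits such a string into its parts. `ParsedArtifactId` and `parse_artifact_id` are hypothetical names, not part of this crate, and requiring exactly 64 lowercase hex digits is an assumption based on SHA-256 producing 32-byte digests and on the example above.

```rust
/// Hypothetical parsed form of an Artifact ID string.
#[derive(Debug, PartialEq)]
pub struct ParsedArtifactId<'a> {
    pub object_type: &'a str,
    pub algorithm: &'a str,
    pub hex_digest: &'a str,
}

/// Splits `gitoid:<type>:<algorithm>:<hex>` into its components.
pub fn parse_artifact_id(s: &str) -> Result<ParsedArtifactId<'_>, String> {
    let mut parts = s.splitn(4, ':');
    let scheme = parts.next().unwrap_or("");
    if scheme != "gitoid" {
        return Err(format!("expected `gitoid` scheme, found `{scheme}`"));
    }
    let object_type = parts.next().ok_or("missing object type")?;
    let algorithm = parts.next().ok_or("missing hash algorithm")?;
    let hex_digest = parts.next().ok_or("missing digest")?;
    // Assumption: SHA-256 digests are 32 bytes, i.e. 64 lowercase hex digits.
    let is_lower_hex = |c: char| c.is_ascii_digit() || ('a'..='f').contains(&c);
    if algorithm == "sha256"
        && (hex_digest.len() != 64 || !hex_digest.chars().all(is_lower_hex))
    {
        return Err(format!("malformed sha256 digest `{hex_digest}`"));
    }
    Ok(ParsedArtifactId { object_type, algorithm, hex_digest })
}
```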

An Input Manifest for a C object file (.o) with three inputs might look like:

gitoid:blob:sha256
09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772
230f3515d1306690815bd9c3da0d15d8b6fcf43894d17100eb44b6d329a92f61
2f4a51b16b76bbc87c4c27af8ae062b1b50b280f1ab78e3eec155334588dc88e
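The manifest above has a one-line header naming the object type and hash algorithm, followed by one input's hex digest per line; the example's entries also appear in ascending order. A minimal reader for that layout might look like the sketch below. `ParsedManifest` and `parse_manifest` are hypothetical, and keeping only the first whitespace-separated token of each entry line is an assumption, since the example shows no other fields.

```rust
/// Hypothetical in-memory view of an Input Manifest's text form.
#[derive(Debug)]
pub struct ParsedManifest {
    pub algorithm: String,
    pub inputs: Vec<String>,
    pub is_sorted: bool,
}

pub fn parse_manifest(text: &str) -> Result<ParsedManifest, String> {
    let mut lines = text.lines();
    let header = lines.next().ok_or("empty manifest")?;
    let algorithm = header
        .strip_prefix("gitoid:blob:")
        .ok_or_else(|| format!("unexpected header `{header}`"))?
        .to_owned();
    let mut inputs = Vec::new();
    for line in lines {
        // Skip blank lines; keep only the first token of each entry.
        let Some(id) = line.split_whitespace().next() else { continue };
        if !id.chars().all(|c| c.is_ascii_hexdigit()) {
            return Err(format!("non-hex entry `{id}`"));
        }
        inputs.push(id.to_owned());
    }
    // The example lists entries in ascending order; report whether this input does.
    let is_sorted = inputs.windows(2).all(|pair| pair[0] <= pair[1]);
    Ok(ParsedManifest { algorithm, inputs, is_sorted })
}
```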

§Crate Overview

This crate is built around two central types, ArtifactId and InputManifest. The rest of the crate is in service of producing, consuming, and storing these types.

§Creating Artifact IDs

ArtifactIds are created with an ArtifactIdBuilder. You can get a builder with either ArtifactId::builder or a constructor on ArtifactIdBuilder directly. There are convenience constructors for each of the three built-in HashProviders:

Artifact IDs can be made from many different kinds of input types, and both synchronously and asynchronously. The following builder methods are available:
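One common way to accept many input kinds is to route them all through `std::io::Read`: byte slices, in-memory cursors, and `std::fs::File` all implement it. The sketch below illustrates that idea with hypothetical helpers (`collect_input`, `input_from_str`, `input_from_vec`); it does not reproduce the crate's builder methods and omits the asynchronous variants.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper: drain any synchronous `Read` source into a buffer.
/// Files, byte slices, and cursors all implement `Read`.
pub fn collect_input<R: Read>(mut reader: R) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    Ok(buf)
}

/// Strings are treated as their UTF-8 bytes.
pub fn input_from_str(s: &str) -> io::Result<Vec<u8>> {
    collect_input(s.as_bytes())
}

/// Owned buffers can be wrapped in a `Cursor` to get a `Read` impl.
pub fn input_from_vec(bytes: Vec<u8>) -> io::Result<Vec<u8>> {
    collect_input(Cursor::new(bytes))
}
```

Because every input kind reduces to the same bytes, identical content should yield the same Artifact ID no matter how it was supplied.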

§Creating Input Manifests

InputManifests are created with an InputManifestBuilder. This type is parameterized over three things: the hash algorithm to use, the hash provider to use, and the storage to use. The usual flow of constructing an InputManifest is to create a new InputManifestBuilder, add entries with InputManifestBuilder::add_relation, and complete the build with InputManifestBuilder::finish.
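The add-then-finish flow can be sketched with a simplified, hypothetical builder. `ManifestBuilderSketch` is not the crate's InputManifestBuilder: it takes plain hex strings, has no hash provider or storage parameters, and renders text in the layout of the example manifest in the Examples section. A `BTreeSet` keeps entries sorted and deduplicated regardless of insertion order; whether the real builder deduplicates is not asserted here.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified Input Manifest builder.
pub struct ManifestBuilderSketch {
    algorithm: &'static str,
    // BTreeSet keeps entries in ascending order and drops duplicates.
    inputs: BTreeSet<String>,
}

impl ManifestBuilderSketch {
    pub fn new(algorithm: &'static str) -> Self {
        Self { algorithm, inputs: BTreeSet::new() }
    }

    /// Records one input by the hex digest of its Artifact ID.
    pub fn add_relation(&mut self, input_hex: &str) -> &mut Self {
        self.inputs.insert(input_hex.to_owned());
        self
    }

    /// Renders the header line followed by one sorted entry per line.
    pub fn finish(&self) -> String {
        let mut out = format!("gitoid:blob:{}\n", self.algorithm);
        for id in &self.inputs {
            out.push_str(id);
            out.push('\n');
        }
        out
    }
}
```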

§Hash Algorithms and Hash Providers

Artifact IDs and Input Manifests are based on a chosen hash algorithm. Today, OmniBOR only supports SHA-256, though alternatives may be added in the future if SHA-256’s cryptographic properties are broken.

The HashAlgorithm trait is implemented by the Sha256 type, and both ArtifactIds and InputManifests are parameterized over their HashAlgorithm.
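Parameterizing types over a hash algorithm is commonly done with a zero-sized marker type plus a trait carrying per-algorithm constants. The sketch below shows that pattern; `HashAlgorithmSketch`, `Sha256Alg`, and `ArtifactIdSketch` are hypothetical stand-ins, not the crate's HashAlgorithm, Sha256, or ArtifactId. With this design, IDs built on different algorithms have different types and cannot be mixed by accident, and `from_digest` rejects digests of the wrong length.

```rust
use std::marker::PhantomData;

/// Hypothetical trait describing a hash algorithm at the type level.
pub trait HashAlgorithmSketch {
    const NAME: &'static str;
    const DIGEST_LEN: usize;
}

/// Zero-sized marker type for SHA-256.
pub struct Sha256Alg;

impl HashAlgorithmSketch for Sha256Alg {
    const NAME: &'static str = "sha256";
    const DIGEST_LEN: usize = 32;
}

/// Hypothetical identifier tagged with its algorithm via `PhantomData`.
pub struct ArtifactIdSketch<H: HashAlgorithmSketch> {
    digest: Vec<u8>,
    _algorithm: PhantomData<H>,
}

impl<H: HashAlgorithmSketch> ArtifactIdSketch<H> {
    /// Accepts only digests of the algorithm's expected length.
    pub fn from_digest(digest: Vec<u8>) -> Option<Self> {
        (digest.len() == H::DIGEST_LEN).then(|| Self { digest, _algorithm: PhantomData })
    }

    /// Formats as `gitoid:blob:<algorithm>:<lowercase hex>`.
    pub fn to_gitoid_string(&self) -> String {
        let hex: String = self.digest.iter().map(|b| format!("{b:02x}")).collect();
        format!("gitoid:blob:{}:{}", H::NAME, hex)
    }
}
```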

We also support plugging in arbitrary “hash providers,” libraries which provide implementations of cryptographic hashes. Today, we support RustCrypto, OpenSSL, and BoringSSL.

Each hash provider is represented by a type implementing the HashProvider trait: RustCrypto, OpenSsl, and BoringSsl, respectively. All APIs in the crate for creating ArtifactIds or InputManifests are parameterized over the HashProvider, and convenience methods are provided for choosing one of the built-in providers.
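Because every provider implements the same hash function, swapping providers should never change an Artifact ID. The sketch below shows that contract with a hypothetical `HashProviderSketch` trait and two interchangeable providers. To stay dependency-free, it uses FNV-1a (a non-cryptographic hash) in place of SHA-256, implemented two different ways; none of these names belong to the crate.

```rust
/// Hypothetical provider trait: one hash function, many implementations.
pub trait HashProviderSketch {
    fn digest(&self, data: &[u8]) -> [u8; 8];
}

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Provider A: FNV-1a with an explicit loop.
pub struct LoopProvider;

impl HashProviderSketch for LoopProvider {
    fn digest(&self, data: &[u8]) -> [u8; 8] {
        let mut hash = FNV_OFFSET;
        for &byte in data {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(FNV_PRIME);
        }
        hash.to_be_bytes()
    }
}

/// Provider B: the same FNV-1a function written as a fold.
pub struct FoldProvider;

impl HashProviderSketch for FoldProvider {
    fn digest(&self, data: &[u8]) -> [u8; 8] {
        data.iter()
            .fold(FNV_OFFSET, |hash, &byte| (hash ^ u64::from(byte)).wrapping_mul(FNV_PRIME))
            .to_be_bytes()
    }
}

/// Callers stay generic over the provider, mirroring how the crate's
/// APIs are parameterized over HashProvider.
pub fn identify_with<P: HashProviderSketch>(provider: &P, data: &[u8]) -> [u8; 8] {
    provider.digest(data)
}
```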

Providers are compiled in conditionally, based on crate features. By default, only the backend-rustcrypto feature is enabled, but any combination of the provider features may be enabled. In all cases, the providers are vendored and do not link to any system installation of these libraries.

In the future we plan to support linking to system instances of OpenSSL and BoringSSL.

§Storing Input Manifests

We expose a Storage trait, representing the abstract interface needed for interacting with Input Manifests in memory or on-disk. There are two types, FileSystemStorage and InMemoryStorage, that implement it.

If you want to persist Input Manifests in any way, we recommend using FileSystemStorage, as it correctly complies with the OmniBOR specification’s requirements for where manifests should be stored.

If you do not need to persist Input Manifests, use InMemoryStorage.
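Coding against a storage trait lets the same logic run with an in-memory backend in tests and a file-system backend in production. The following minimal sketch illustrates the idea. `ManifestStoreSketch`, its methods, and `InMemoryStoreSketch` are hypothetical and do not mirror the crate's Storage trait, whose actual methods are not listed here; keying manifests by the Artifact ID of the artifact they describe is likewise an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical storage interface for Input Manifests.
pub trait ManifestStoreSketch {
    /// Saves `manifest` as the Input Manifest for the artifact `target_id`.
    fn write_manifest(&mut self, target_id: &str, manifest: &str);
    /// Looks up the Input Manifest recorded for `target_id`, if any.
    fn manifest_for(&self, target_id: &str) -> Option<String>;
}

/// Hypothetical in-memory backend; nothing survives the process.
#[derive(Default)]
pub struct InMemoryStoreSketch {
    by_target: HashMap<String, String>,
}

impl ManifestStoreSketch for InMemoryStoreSketch {
    fn write_manifest(&mut self, target_id: &str, manifest: &str) {
        self.by_target.insert(target_id.to_owned(), manifest.to_owned());
    }

    fn manifest_for(&self, target_id: &str) -> Option<String> {
        self.by_target.get(target_id).cloned()
    }
}

/// Generic code works with any backend implementing the trait.
pub fn record_build<S: ManifestStoreSketch>(store: &mut S, target_id: &str, manifest: &str) -> bool {
    store.write_manifest(target_id, manifest);
    store.manifest_for(target_id).as_deref() == Some(manifest)
}
```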

§Foreign Function Interface

This crate experimentally exposes a Foreign Function Interface (FFI) so that it can be used from languages other than Rust. Today, the FFI only covers working with Artifact IDs using the RustCrypto hash provider. The interface is unstable; we plan to grow it to cover the crate's complete API surface, including all built-in hash providers and arbitrary other hash providers, and eventually to stabilize it.

§Specification Compliance

OmniBOR is a draft specification, and this crate is its primary implementation.

Currently, this implementation follows the draft 0.2 version of the specification, which includes two major differences relative to version 0.1:

  • Limitation of supported hashes to SHA-256 only, and
  • Universal newline normalization.
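“Universal newline normalization” usually means treating CRLF (`\r\n`) and lone CR (`\r`) line endings as LF (`\n`) before hashing, presumably so that the same text gets the same Artifact ID regardless of the platform it was saved on. The sketch below shows only that transformation; when the crate applies it (for example, whether binary files are exempt) is not covered here, and `normalize_newlines` is a hypothetical name.

```rust
/// Hypothetical helper: rewrite CRLF and lone CR line endings as LF.
pub fn normalize_newlines(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        match input[i] {
            b'\r' => {
                out.push(b'\n');
                // Skip the LF of a CRLF pair so the pair becomes a single LF.
                if i + 1 < input.len() && input[i + 1] == b'\n' {
                    i += 1;
                }
            }
            other => out.push(other),
        }
        i += 1;
    }
    out
}
```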

All differences between the specification and this library are being resolved through updates to the specification.

§Comparison with Other Software Identifiers

OmniBOR Artifact IDs are just one scheme for identifying software. Others include the Common Platform Enumeration (CPE), Package URLs (pURLs), Software Hash IDs (SWHIDs), Software ID tags (SWID tags), and Nix Derivation Store Paths.

Each of these has its own strengths and weaknesses, and each was created for a different purpose. In many cases, these schemes can complement each other. The following table summarizes some major points of comparison between them:

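| Scheme  | Derivation | Architecture | Defined By          | Based On                          |
|---------|------------|--------------|---------------------|-----------------------------------|
| CPE     | Defined    | Centralized  | NIST                | -                                 |
| pURL    | Defined    | Federated    | ECMA + Package Hosts | -                                |
| SWID    | Defined    | Distributed  | Software Producer   | -                                 |
| SWHID   | Inherent   | Distributed  | -                   | Artifact content                  |
| Nix     | Inherent   | Distributed  | -                   | Package build inputs              |
| OmniBOR | Inherent   | Distributed  | -                   | Artifact content and build inputs |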

Let’s explain this a bit:

  1. Derivation: Whether an identifier comes from an authority who defines it, or can be derived from the thing being identified inherently.
  2. Architecture: What level of authority delegation exists for producing the identifier. For example, CPE relies on a central dictionary only NIST can edit, so it is “centralized,” while pURL has a central list of “types” listing package hosts, but the names of packages on those hosts are controlled separately, so it’s “federated”. All inherent schemes are considered “distributed”.
  3. Defined By: Who has authority to produce the identifiers. CPE’s dictionary is controlled by NIST. pURL’s list of types is standardized under ECMA’s TC54 (a Technical Committee on “Software and System Transparency”), but each package host controls its own namespace of names. SWID tags are provided by the producer of the software. Inherent identifiers do not require a “definer”.
  4. Based On: What materials are used to produce the identifier. This is not relevant for defined identifiers. For inherent identifiers, SWHID uses a hash of a file (it also has variant types of identifiers for things like directories, but all are content-based); Nix derives its Derivation Store Path from the inputs to a package’s build; OmniBOR derives Artifact IDs from an artifact’s contents, which may embed a reference to the Artifact ID of the file’s input manifest and thus also depend on the identities of its build dependencies.

In 2023, CISA, the Cybersecurity and Infrastructure Security Agency, published a report titled “Software Identification Ecosystem Option Analysis” that surveyed the state of the software identification ecosystem and made recommendations for which schemes to prefer, when to consider them, and the challenges facing each of them. OmniBOR was one of three schemes recommended by this report, alongside CPE and pURL.

We recommend using OmniBOR as a complement to defined identifiers like CPE or pURL, with CPE or pURL identifying the relevant product or package and OmniBOR identifying specific software artifacts.

We recommend using OmniBOR instead of other inherent identifiers, unless you are in an ecosystem which already uses an alternative, for the following reasons:

  • Length Extension Protection: OmniBOR uses the “Git Object Identifier” scheme used by the Git Version Control System (VCS), which includes the length of the artifact as an input to the hash. This makes engineered hash collisions harder, because an attacker must also account for how changing an artifact’s length changes the hashed input.
  • Use of SHA-256: OmniBOR only supports SHA-256 today, while Software Hash IDs use SHA-1 and Nix Derivation Store Paths support MD5, SHA-1, SHA-256, or SHA-512. Nix also truncates its hashes, which OmniBOR does not do.
  • Inclusion of both artifact contents and build inputs: OmniBOR Artifact IDs are based on an artifact’s contents, and if the artifact has embedded the Artifact ID of its Input Manifest, that Input Manifest (and by extension all build inputs) influences the resulting Artifact ID. This makes an Artifact ID the strongest commitment among the inherent identifiers. SWHIDs incorporate only the artifact itself; Nix Derivation Store Paths are based only on build inputs (Nix tries to enforce reproducibility in practice, but the same inputs are not guaranteed to produce the same output, and a non-reproducible build cannot be detected from the Derivation Store Path alone).
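The Git Object Identifier scheme mentioned in the first point hashes a short header before the content: the object type (`blob`), a space, the content length in decimal, and a NUL byte. The sketch below builds those bytes. `git_blob_preimage` is a hypothetical helper, and it deliberately stops before hashing because the Rust standard library has no SHA-256; whether the crate measures the length before or after newline normalization is not shown here.

```rust
use std::io::{self, Read};

/// Hypothetical helper: builds the bytes hashed for a Git-style blob ID,
/// i.e. b"blob <decimal length>\0" followed by the content itself.
pub fn git_blob_preimage<R: Read>(mut content: R) -> io::Result<Vec<u8>> {
    let mut body = Vec::new();
    content.read_to_end(&mut body)?;
    let mut preimage = format!("blob {}\0", body.len()).into_bytes();
    preimage.extend_from_slice(&body);
    Ok(preimage)
}
```

For the 5-byte input `hello`, the hashed bytes are `blob 5\0hello`. Any change to the content's length also changes the header, so a collision attempt has to account for the header as well as the body.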

Modules§

hash_algorithm
Hash algorithms supported for Artifact IDs.
hash_provider
Cryptography libraries providing hash function implementations.
storage
How manifests are stored and accessed.

Structs§

ArtifactId
A universally reproducible software identifier.
ArtifactIdBuilder
A builder for ArtifactIds.
InputManifest
A manifest describing the inputs used to build an artifact.
InputManifestBuilder
A builder for InputManifests.
InputManifestRelation
A single row in an InputManifest.

Enums§

EmbeddingMode
Indicate whether to embed the identifier for an input manifest in an artifact.
Error
Represents any errors from the omnibor crate.