OmniBOR is a specification for a reproducible software identifier we call an “Artifact ID” plus a compact record of build inputs called an “Input Manifest”. Together, they let anyone precisely identify software binaries and text files and track the exact inputs used to build them. They also form a Merkle tree, so any change in a dependency gives every artifact built from it a new, different Artifact ID, and anyone can use the Input Manifests to detect exactly which dependency changed.
This crate exposes APIs for producing and consuming both Artifact IDs and Input Manifests.
If you just want documentation for the API of this crate, start with the two central types, ArtifactId and InputManifest.
We also provide a CLI, based on this crate, as another option for working with Artifact IDs and Input Manifests.
§Table of Contents
- Examples
- Crate Overview
- Foreign Function Interface
- Specification Compliance
- Comparison with Other Software Identifiers
§Examples
The Artifact ID of one of the test files in this repo, hello_world.txt, is:
gitoid:blob:sha256:fee53a18d32820613c0527aa79be5cb30173c823a9b448fa4817767cc84c6f03
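As a minimal sketch of how the string form above breaks down, the following std-only Rust splits an Artifact ID into its colon-separated parts and runs basic shape checks. `ParsedArtifactId` and `parse_artifact_id` are hypothetical names used for illustration, not part of this crate's API, and the lowercase-hex requirement is an assumption based on the example shown.

```rust
/// The parts of an Artifact ID string such as
/// `gitoid:blob:sha256:<64 hex digits>`.
/// Hypothetical type for illustration; not the crate's `ArtifactId`.
#[derive(Debug, PartialEq, Eq)]
struct ParsedArtifactId<'a> {
    object_type: &'a str,
    hash_algorithm: &'a str,
    hex_digest: &'a str,
}

/// Split an Artifact ID string into its components.
/// Returns `None` if the string does not have the expected shape.
fn parse_artifact_id(s: &str) -> Option<ParsedArtifactId<'_>> {
    let mut parts = s.split(':');
    let scheme = parts.next()?;
    let object_type = parts.next()?;
    let hash_algorithm = parts.next()?;
    let hex_digest = parts.next()?;

    // Exactly four colon-separated fields are expected.
    if parts.next().is_some() {
        return None;
    }
    // Only the combination shown in the example is accepted here.
    if scheme != "gitoid" || object_type != "blob" || hash_algorithm != "sha256" {
        return None;
    }
    // A SHA-256 digest is 32 bytes, so 64 hex characters.
    // Assumption: digests are written in lowercase hex, as in the example.
    let is_lower_hex = |c: char| c.is_ascii_digit() || ('a'..='f').contains(&c);
    if hex_digest.len() != 64 || !hex_digest.chars().all(is_lower_hex) {
        return None;
    }

    Some(ParsedArtifactId {
        object_type,
        hash_algorithm,
        hex_digest,
    })
}
```

Calling `parse_artifact_id` on the hello_world.txt identifier above yields `Some(..)` with `hash_algorithm` equal to `"sha256"`; a string with a short digest or a different scheme yields `None`.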
An Input Manifest for a .o file built from three C inputs might look like:
gitoid:blob:sha256
09c825ac02df9150e4f93d12ba1da5d1ff5846c3e62503c814aa3a300c535772
230f3515d1306690815bd9c3da0d15d8b6fcf43894d17100eb44b6d329a92f61
2f4a51b16b76bbc87c4c27af8ae062b1b50b280f1ab78e3eec155334588dc88e
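The layout above (a header line followed by one hex digest per input) can be read with a few lines of std-only Rust. This is a sketch that assumes exactly the shape shown in the example; the full manifest format is defined by the OmniBOR specification, and `ParsedManifest`, `parse_manifest`, and `added_inputs` are hypothetical names, not this crate's API.

```rust
/// A parsed manifest: the header line plus one hex digest per input.
/// Hypothetical type for illustration; not the crate's `InputManifest`.
#[derive(Debug, PartialEq, Eq)]
struct ParsedManifest {
    header: String,
    inputs: Vec<String>,
}

/// Parse manifest text shaped like the example: a `gitoid:blob:sha256`
/// header followed by one 64-character hex digest per line.
fn parse_manifest(text: &str) -> Result<ParsedManifest, String> {
    let mut lines = text.lines();
    let header = lines
        .next()
        .ok_or_else(|| "empty manifest".to_string())?;
    if header != "gitoid:blob:sha256" {
        return Err(format!("unexpected header: {header}"));
    }

    let mut inputs = Vec::new();
    for line in lines {
        if line.len() != 64 || !line.chars().all(|c| c.is_ascii_hexdigit()) {
            return Err(format!("not a SHA-256 hex digest: {line}"));
        }
        inputs.push(line.to_string());
    }

    Ok(ParsedManifest {
        header: header.to_string(),
        inputs,
    })
}

/// Digests present in `new` but absent from `old`, which is one simple way
/// to see which inputs changed between two builds.
fn added_inputs<'a>(old: &ParsedManifest, new: &'a ParsedManifest) -> Vec<&'a str> {
    new.inputs
        .iter()
        .filter(|d| !old.inputs.contains(*d))
        .map(String::as_str)
        .collect()
}
```

Parsing the example manifest gives three inputs; comparing it against a manifest where one digest differs returns just that digest from `added_inputs`.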
§Crate Overview
This crate is built around two central types, ArtifactId and InputManifest. The rest of the crate is in service of producing, consuming, and storing these types.
§Creating Artifact IDs
ArtifactIds are created with an ArtifactIdBuilder. You can get a builder with either ArtifactId::builder or a constructor on ArtifactIdBuilder directly. There are convenience constructors for each of the three built-in HashProviders:
- ArtifactIdBuilder::with_rustcrypto: Build Artifact IDs with RustCrypto.
- ArtifactIdBuilder::with_boringssl: Build Artifact IDs with BoringSSL.
- ArtifactIdBuilder::with_openssl: Build Artifact IDs with OpenSSL.
Artifact IDs can be made from many different kinds of input types, and both synchronously and asynchronously. The following builder methods are available:
Method | Input Type | Sync or Async? |
---|---|---|
ArtifactIdBuilder::identify_bytes | &[u8] | Sync |
ArtifactIdBuilder::identify_string | &str | Sync |
ArtifactIdBuilder::identify_file | &mut File | Sync |
ArtifactIdBuilder::identify_path | &Path | Sync |
ArtifactIdBuilder::identify_reader | R: Read + Seek | Sync |
ArtifactIdBuilder::identify_async_file | &mut tokio::fs::File | Async |
ArtifactIdBuilder::identify_async_path | &Path | Async |
ArtifactIdBuilder::identify_async_reader | R: AsyncRead + AsyncSeek + Unpin | Async |
ArtifactIdBuilder::identify_manifest | &InputManifest | Sync |
§Creating Input Manifests
InputManifests are created with an InputManifestBuilder. This type is parameterized over three things: the hash algorithm to use, the hash provider to use, and the storage to use. The usual flow of constructing an InputManifest is to create a new InputManifestBuilder, add entries with InputManifestBuilder::add_relation, and complete the build with InputManifestBuilder::finish.
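The create, add, finish flow described above follows the usual builder pattern. The sketch below mirrors only the shape of that flow with toy types; `ToyManifest` and `ToyManifestBuilder` are hypothetical, take plain strings rather than Artifact IDs, and ignore the hash algorithm, hash provider, and storage parameters that the real InputManifestBuilder carries. Sorting and de-duplicating entries is an assumption made for the illustration.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a manifest: a sorted list of input identifiers.
#[derive(Debug, PartialEq, Eq)]
struct ToyManifest {
    inputs: Vec<String>,
}

/// Hypothetical builder that mirrors the create / add / finish flow.
#[derive(Default)]
struct ToyManifestBuilder {
    // A BTreeSet keeps entries sorted and collapses duplicates.
    inputs: BTreeSet<String>,
}

impl ToyManifestBuilder {
    /// Start with no recorded inputs.
    fn new() -> Self {
        Self::default()
    }

    /// Record one input; returns `&mut Self` so calls can be chained.
    fn add_relation(&mut self, input_id: &str) -> &mut Self {
        self.inputs.insert(input_id.to_string());
        self
    }

    /// Consume the builder and produce the finished manifest.
    fn finish(self) -> ToyManifest {
        ToyManifest {
            inputs: self.inputs.into_iter().collect(),
        }
    }
}
```

With this toy builder, adding "bbb", "aaa", and "bbb" again and then calling `finish` produces a manifest whose inputs are "aaa" followed by "bbb".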
§Hash Algorithms and Hash Providers
Artifact IDs and Input Manifests are based on a chosen hash algorithm. Today, OmniBOR only supports SHA-256, though alternatives may be added in the future if SHA-256’s cryptographic properties are broken.
The HashAlgorithm trait is implemented by the Sha256 type, and both ArtifactIds and InputManifests are parameterized over their HashAlgorithm.
We also support plugging in arbitrary “hash providers,” libraries which provide implementations of cryptographic hashes. Today, we support RustCrypto, OpenSSL, and BoringSSL.
All hash providers are represented by a type implementing the HashProvider trait; so we have RustCrypto, OpenSsl, and BoringSsl, respectively.
All APIs in the crate for creating ArtifactIds or InputManifests are parameterized over the HashProvider. We also provide convenience methods to choose one of the built-in providers.
Providers are conditionally compiled in based on crate features. By default, only the backend-rustcrypto feature is turned on. Any combination of these may be included. In all cases, they are vendored in and do not link to any system instances of these libraries.
In the future we plan to support linking to system instances of OpenSSL and BoringSSL.
§Storing Input Manifests
We expose a Storage trait, representing the abstract interface needed for interacting with Input Manifests in memory or on-disk. There are two types, FileSystemStorage and InMemoryStorage, that implement it.
If you want to persist Input Manifests in any way, we recommend using FileSystemStorage, as it correctly complies with the OmniBOR specification’s requirements for where manifests should be stored.
If you do not need to persist Input Manifests, use InMemoryStorage.
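To illustrate the idea of one storage interface with interchangeable backends, here is a toy version of the pattern. `ToyStorage`, `ToyInMemoryStorage`, and `store_all` are hypothetical and much simpler than the crate's Storage trait; the method names and the choice to key manifests by a target identifier string are assumptions made for the sketch.

```rust
use std::collections::HashMap;

/// Hypothetical storage interface: keep manifest text keyed by the
/// identifier of the artifact it describes.
trait ToyStorage {
    fn put(&mut self, target_id: &str, manifest: &str);
    fn get(&self, target_id: &str) -> Option<&str>;
}

/// Hypothetical in-memory backend; nothing is written to disk.
#[derive(Default)]
struct ToyInMemoryStorage {
    manifests: HashMap<String, String>,
}

impl ToyStorage for ToyInMemoryStorage {
    fn put(&mut self, target_id: &str, manifest: &str) {
        self.manifests
            .insert(target_id.to_string(), manifest.to_string());
    }

    fn get(&self, target_id: &str) -> Option<&str> {
        self.manifests.get(target_id).map(String::as_str)
    }
}

/// Code written against the trait works with any backend that implements it.
/// Returns how many entries were stored.
fn store_all<S: ToyStorage>(storage: &mut S, entries: &[(&str, &str)]) -> usize {
    for (target_id, manifest) in entries {
        storage.put(target_id, manifest);
    }
    entries.len()
}
```

A file-backed implementation of the same toy trait could be swapped in without changing `store_all`, which is the property the trait-based design is meant to provide.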
§Foreign Function Interface
This crate experimentally exposes a Foreign Function Interface (FFI), to make it usable from languages besides Rust. Today this only includes working with Artifact IDs when using the RustCrypto hash provider. This interface is unstable, though we plan to grow it to cover the complete API surface of the crate, including the built-in hash providers and arbitrary other hash providers, and to stabilize it.
§Specification Compliance
OmniBOR is a draft specification, and this implementation is the primary implementation of it.
Currently, this implementation follows the draft 0.2 version of the specification, which includes two major differences relative to version 0.1:
- Limitation of supported hashes to SHA-256 only, and
- Universal newline normalization.
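As a rough sketch of what newline normalization can look like, the function below rewrites every CRLF pair as a single LF before the bytes would be hashed. This assumes that normalization means CRLF to LF only; the precise rule, including how a lone carriage return is treated, is defined by the specification, and `normalize_newlines` is a hypothetical helper rather than part of this crate's API.

```rust
/// Replace every CRLF pair with a single LF; all other bytes are untouched.
/// Assumption: a lone CR (not followed by LF) is left as-is.
fn normalize_newlines(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i] == b'\r' && input.get(i + 1) == Some(&b'\n') {
            // Skip the CR; the LF is copied on the next iteration.
            i += 1;
            continue;
        }
        out.push(input[i]);
        i += 1;
    }
    out
}
```

With this rule, a file saved with Windows line endings and the same file saved with Unix line endings produce identical byte sequences, and therefore would hash to the same identifier.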
All differences between the specification and this library are in the process of being resolved through specification updates.
§Comparison with Other Software Identifiers
OmniBOR Artifact IDs are just one scheme for identifying software. Others include the Common Platform Enumeration (CPE), Package URLs (pURLs), Software Hash IDs (SWHIDs), Software ID tags (SWID tags), and Nix Derivation Store Paths.
Each of these has its own strengths and weaknesses, and the creation of each was motivated by a different purpose. In many cases, these schemes can be complementary to each other. The following table tries to break down some major points of comparison between them:
Scheme | Derivation | Architecture | Defined By | Based On |
---|---|---|---|---|
CPE | Defined | Centralized | NIST | - |
pURL | Defined | Federated | ECMA + Package Hosts | - |
SWID | Defined | Distributed | Software Producer | - |
SWHID | Inherent | Distributed | - | Artifact content |
Nix | Inherent | Distributed | - | Package build inputs |
OmniBOR | Inherent | Distributed | - | Artifact content and build inputs |
Let’s explain this a bit:
- Derivation: Whether an identifier comes from an authority who defines it, or can be derived from the thing being identified inherently.
- Architecture: What level of authority delegation exists for producing the identifier. For example, CPE relies on a central dictionary only NIST can edit, so it is “centralized,” while pURL has a central list of “types” listing package hosts, but the names of packages on those hosts are controlled separately, so it’s “federated”. All inherent schemes are considered “distributed”.
- Defined By: Who has authority to produce the identifiers. CPE’s dictionary is controlled by NIST. pURL’s list of types is standardized under ECMA’s TC54 (a Technical Committee on “Software and System Transparency”) but each package host controls its own namespace of names. SWID tags are provided by the producer of the software. Inherent identifiers do not require a “definer”.
- Based On: What materials are used to produce the identifier. This is not relevant for defined identifiers. For inherent identifiers, SWHID uses a hash of a file (it also has variant types of identifiers for things like directories, but all are content-based); Nix derives its Derivation Store Path from the inputs to a package’s build; OmniBOR derives Artifact IDs from an artifact’s contents, which may embed a reference to the Artifact ID of the file’s input manifest and thus also depend on the identities of its build dependencies.
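The way an embedded manifest reference makes an identifier depend on build inputs can be shown with a toy model. The sketch below uses the standard library's `DefaultHasher` purely as a stand-in: it is not SHA-256, not collision resistant, and not how OmniBOR computes identifiers. The functions `toy_id`, `toy_manifest`, and `toy_artifact_id` are hypothetical, and appending the manifest ID to the contents is a simplification of real embedding.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a cryptographic hash. NOT SHA-256; for illustration only.
fn toy_id(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Toy manifest: the IDs of all inputs, sorted, one per line.
fn toy_manifest(input_ids: &[String]) -> String {
    let mut ids = input_ids.to_vec();
    ids.sort();
    ids.join("\n")
}

/// Toy artifact ID: hash the artifact's contents together with the ID of
/// its manifest, so the result depends on both contents and inputs.
fn toy_artifact_id(contents: &[u8], input_ids: &[String]) -> String {
    let manifest_id = toy_id(toy_manifest(input_ids).as_bytes());
    let mut embedded = contents.to_vec();
    embedded.extend_from_slice(manifest_id.as_bytes());
    toy_id(&embedded)
}
```

In this model, two artifacts with identical contents and identical inputs get the same ID, while changing any one input changes the manifest, the manifest's ID, and therefore the artifact's ID.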
In 2023, CISA, the Cybersecurity and Infrastructure Security Agency, published a report titled “Software Identification Ecosystem Option Analysis” that surveyed the state of the software identification ecosystem and made recommendations for which schemes to prefer, when to consider them, and the challenges facing each of them. OmniBOR was one of three schemes recommended by this report, alongside CPE and pURL.
We recommend using OmniBOR as a complement to defined identifiers like CPE or pURL, with CPE or pURL identifying the relevant product or package and OmniBOR identifying specific software artifacts.
We recommend using OmniBOR instead of other inherent identifiers, unless you are in an ecosystem which already uses an alternative, for the following reasons:
- Length Extension Protection: OmniBOR uses the “Git Object Identifier” scheme used by the Git Version Control System (VCS), which includes the length of the artifact as an input to the hash. This helps protect against attempts to engineer hash collisions by requiring attackers to manage the influence of changing the length of an artifact.
- Use of SHA-256: OmniBOR only supports SHA-256 today, while Software Hash IDs use SHA-1 and Nix Derivation Store Paths support MD5, SHA-1, SHA-256, or SHA-512. Nix also truncates its hashes, which OmniBOR does not do.
- Inclusion of both artifact contents and build inputs: OmniBOR Artifact IDs are based on an artifact’s contents, and if the artifact has embedded the Artifact ID of its Input Manifest, that Input Manifest (and by extension all build inputs) influences the resulting Artifact ID. This makes an Artifact ID the strongest commitment among the inherent identifiers. SWHIDs only incorporate the artifact itself; Nix Derivation Store Paths are based only on build inputs (the Nix system tries to enforce reproducibility in practice, but identical inputs are not guaranteed to produce identical outputs, and such a divergence cannot be detected from the Derivation Store Path alone).
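The length prefix mentioned in the first point comes from how Git frames a blob before hashing it: the ASCII header `blob <length>` and a NUL byte, followed by the content. The sketch below builds only that byte sequence; the SHA-256 step is left out because the Rust standard library has no SHA-256 implementation, and `gitoid_preimage` is a hypothetical helper, not part of this crate's API.

```rust
/// Build the byte sequence that a Git-style blob hash is computed over:
/// the ASCII header `blob <len>\0` followed by the content itself.
/// Hashing this sequence (with SHA-256, for OmniBOR) is not shown here.
fn gitoid_preimage(content: &[u8]) -> Vec<u8> {
    // The decimal length of the content is part of the hashed input.
    let header = format!("blob {}\0", content.len());
    let mut preimage = Vec::with_capacity(header.len() + content.len());
    preimage.extend_from_slice(header.as_bytes());
    preimage.extend_from_slice(content);
    preimage
}
```

Because the length appears in the header, two inputs of different lengths differ in the hashed bytes from the very start, not only in their tails.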
Modules§
- hash_algorithm: Hash algorithms supported for Artifact IDs.
- hash_provider: Cryptography libraries providing hash function implementations.
- storage: How manifests are stored and accessed.
Structs§
- ArtifactId: A universally reproducible software identifier.
- ArtifactIdBuilder: A builder for ArtifactIds.
- InputManifest: A manifest describing the inputs used to build an artifact.
- InputManifestBuilder: A builder for InputManifests.
- InputManifestRelation: A single row in an InputManifest.
Enums§
- EmbeddingMode: Indicate whether to embed the identifier for an input manifest in an artifact.
- Error: Represents any errors from the omnibor crate.