Provenant
A high-performance Rust rewrite of ScanCode Toolkit for scanning codebases for licenses, package metadata, file metadata, and related provenance data.
Overview
Provenant is built as a ScanCode-compatible replacement project with a strong focus on correctness, feature parity, safety, and performance.
Today the repository covers high-level scanning workflows for:
- License detection and license reference output
- Package and dependency metadata extraction across many ecosystems
- Package assembly for related manifests and lockfiles
- File metadata and scan environment metadata
- Optional copyright, holder, and author detection
- Optional email and URL extraction
- Multiple output formats, including ScanCode-style JSON, YAML, SPDX, CycloneDX, HTML, and custom templates
For architecture, supported formats, testing, and contributor guidance, start with the Documentation Index.
Features
- Parallel scanning with Rust-native performance
- ScanCode-compatible JSON output and broad output-format support
- Broad package-manifest and lockfile coverage across many ecosystems
- Package assembly for sibling, nested, and workspace-style inputs
- Include/exclude filtering, path normalization, and scan-result filtering
- Persistent scan-cache controls for repeated runs
- Security-first parsing with explicit safeguards and compatibility-focused tradeoffs where needed
Installation
From Crates.io
Install the Provenant package from crates.io under the crate name provenant-cli by running cargo install provenant-cli.
This installs the provenant binary.
Download Precompiled Binary (Recommended)
Download the appropriate binary for your platform from the GitHub Releases page:
- Linux (x64): provenant-x86_64-unknown-linux-gnu.tar.gz
- Linux (ARM64): provenant-aarch64-unknown-linux-gnu.tar.gz
- macOS (Apple Silicon): provenant-aarch64-apple-darwin.tar.gz
  - Intel Macs cannot run this ARM build (Rosetta 2 only translates x86_64 binaries on Apple Silicon, not the reverse); build from source instead
- Windows: provenant-x86_64-pc-windows-msvc.zip
Extract and place the binary in your system's PATH:
# Example for Linux/macOS
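A small helper can pick the archive name for the current machine. This is only a sketch and not part of Provenant: the function name provenant_asset is hypothetical, the archive names are the ones listed above, and the archive layout and install directory mentioned in the trailing comments are assumptions.

```shell
#!/bin/sh
# Map an OS/architecture pair to one of the release archive names listed above.
# Arguments default to the current machine (uname -s / uname -m).
# Windows is left out: pick the .zip manually there.
provenant_asset() {
  os="${1:-$(uname -s)}"
  arch="${2:-$(uname -m)}"
  case "$os/$arch" in
    Linux/x86_64)
      echo "provenant-x86_64-unknown-linux-gnu.tar.gz" ;;
    Linux/aarch64|Linux/arm64)
      echo "provenant-aarch64-unknown-linux-gnu.tar.gz" ;;
    Darwin/arm64)
      echo "provenant-aarch64-apple-darwin.tar.gz" ;;
    *)
      echo "no prebuilt archive listed for $os/$arch" >&2
      return 1 ;;
  esac
}

# Possible follow-up once the archive has been downloaded
# (assumes the archive unpacks a `provenant` binary into the current directory):
#   tar -xzf "$(provenant_asset)"
#   install -m 755 provenant "$HOME/.local/bin/"
```

Calling provenant_asset with no arguments prints the archive name for the machine it runs on, or fails with a message when no prebuilt archive is listed for that platform.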
Build from Source
After cloning the repository, run cargo build --release from its root. The compiled binary will be available at target/release/provenant.
Usage
At least one output option is required.
For the complete CLI surface, run provenant --help.
Commonly used options include:
- --json, --json-pp, --json-lines, --yaml, --html, --csv
- --spdx-tv, --spdx-rdf, --cyclonedx, --cyclonedx-xml
- --custom-output, --custom-template
- --exclude / --ignore, --include, --max-depth, --processes
- --cache-dir, --cache-clear, --from-json, --no-assemble
- --filter-clues, --only-findings, --mark-source
- --copyright, --email, --url
Example
Use - as FILE to write an output stream to stdout (for example: --json-pp -).
Multiple output flags can be used in a single run, matching ScanCode CLI behavior.
When using --from-json, you can pass multiple JSON inputs; directory scan mode currently supports one input path.
Cache location can also be controlled with the PROVENANT_CACHE environment variable.
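The notes above can be sketched as a few invocations. This is a hedged illustration, not the project's documented examples: the order of arguments (output options first, input path last) follows the ScanCode convention and is an assumption, as are the output file names, the scan target, and the cache path. The run helper only prints each command so the sketch is safe to execute without Provenant installed.

```shell
#!/bin/sh
# Dry-run helper: prints the command instead of executing it.
# Set DRY_RUN=0 to run for real (requires `provenant` on PATH).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

TARGET="${TARGET:-.}"   # hypothetical scan target (a single directory)

# Pretty-printed JSON written to stdout via the "-" FILE value.
run provenant --json-pp - "$TARGET"

# Two output formats in one run; the file names are placeholders.
run provenant --json scan.json --spdx-tv scan.spdx "$TARGET"

# Same scan with the cache location taken from PROVENANT_CACHE
# (the directory path is an assumption).
run env PROVENANT_CACHE=/tmp/provenant-cache provenant --json scan.json "$TARGET"
```

Check the CLI help output for the authoritative argument syntax before relying on any of these forms.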
For the generated package-format support matrix, see Supported Formats.
Performance
Provenant is designed to be significantly faster than the Python-based ScanCode Toolkit, especially for large codebases, thanks to native Rust performance and parallel processing. See Architecture: Performance Characteristics for details.
Output Formats
Implemented output formats:
- JSON (ScanCode-compatible baseline)
- YAML
- JSON Lines
- CSV
- SPDX (Tag-Value, RDF/XML)
- CycloneDX (JSON, XML)
- HTML report
- Custom template rendering
Additional parity-oriented outputs such as the HTML app surface are present in the codebase, but the README focuses on the primary user-facing formats above.
Output architecture and compatibility approach are documented in:
Documentation
- Documentation Index - Best starting point for navigating the docs set
- Architecture - System design, processing pipeline, and design decisions
- Supported Formats - Generated support matrix for package ecosystems and file formats
- How to Add a Parser - Step-by-step guide for adding new parsers
- Testing Strategy - Testing approach and guidelines
- ADRs - Architectural decision records
- Beyond-Parity Improvements - Features where Rust exceeds the Python original
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Setting Up for Local Development
To contribute to Provenant, follow these steps to set up the repository for local development:
- Install Rust
  Ensure you have Rust installed on your system. You can install it using rustup.
- Clone the Repository
  Clone the Provenant repository to your local machine.
- Initialize and Update Project Submodules
  Run ./setup.sh to initialize submodules, configure sparse checkout, and update the SPDX license-data submodule to the latest upstream state. If pre-commit is installed, this script also installs Git pre-commit hooks automatically.
- Build the Project
  Build the project with Cargo (cargo build).
- Run Tests
  Run the test suite to ensure everything is working correctly (cargo test).
- Install Pre-commit (if needed)
  This repository uses pre-commit to run checks before each commit. For documentation hooks and commands, install Node.js and npm first (package.json currently requires Node >=24). If you install pre-commit after running ./setup.sh, install it with pip (pip install pre-commit) or, on macOS, with brew (brew install pre-commit), then run pre-commit install once to install the hooks.
  Common documentation quality commands:
- Start Developing
  You can now make changes and test them locally. Use cargo run --bin provenant to execute the tool.
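Before starting, it can help to confirm that the tools named in the steps above are present. The following preflight check is only a sketch: the script and the function name node_version_ok are hypothetical and not part of the repository. The tool names and the Node 24 minimum come from the steps above.

```shell
#!/bin/sh
# Preflight check for the tools named in the setup steps.

# Succeeds when a `node --version` style string (e.g. "v24.1.0")
# has a major version of at least 24.
node_version_ok() {
  major="${1#v}"        # strip the leading "v"
  major="${major%%.*}"  # keep everything before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;   # not a plain number
  esac
  [ "$major" -ge 24 ]
}

# Report which of the required tools are on PATH.
for tool in git cargo node npm pre-commit; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done

# Warn when node is installed but too old for the documentation tooling.
if command -v node >/dev/null 2>&1 && ! node_version_ok "$(node --version)"; then
  echo "node is older than the required major version 24"
fi
```

The script only reports; it does not install anything or change the checkout.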
Publishing a Release (Maintainers Only)
Releases are automated using cargo-release and GitHub Actions.
Prerequisites
One-time setup:
- Install the cargo-release CLI tool (cargo install cargo-release).
- Authenticate with crates.io (one-time only) by running cargo login.
  Enter your crates.io API token when prompted. This is stored in ~/.cargo/credentials.toml and persists across sessions.
Release Process
Use the release.sh script: do a dry run first (recommended), then execute the actual release.
Available release types:
- patch: Increments X.Y.Z to X.Y.(Z+1)
- minor: Increments X.Y.Z to X.(Y+1).0
- major: Increments X.Y.Z to (X+1).0.0
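The three release types follow the usual semantic-versioning bump rules. As an illustration only (this is not how release.sh or cargo-release computes versions, and the function name bump_version is hypothetical), the arithmetic can be written as:

```shell
#!/bin/sh
# Compute the next version for a release type, following the rules above.
# Usage: bump_version 1.4.2 minor   -> prints 1.5.0
bump_version() {
  version="$1"
  kind="$2"
  major="${version%%.*}"   # text before the first dot
  rest="${version#*.}"     # text after the first dot
  minor="${rest%%.*}"
  patch="${rest#*.}"
  case "$kind" in
    patch) patch=$((patch + 1)) ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    major) major=$((major + 1)); minor=0; patch=0 ;;
    *) echo "unknown release type: $kind" >&2; return 1 ;;
  esac
  echo "$major.$minor.$patch"
}
```

The sketch assumes a plain X.Y.Z version with no pre-release or build suffix.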
Registry note: the published crate name is provenant-cli, while the installed binary and product name remain provenant / Provenant.
What happens automatically:
- Updates SPDX license data to the latest version from upstream
- Commits the license data update (if changes detected)
- cargo-release updates the version in Cargo.toml and Cargo.lock
- Creates a git commit: chore: release vX.Y.Z
- Creates a GPG-signed git tag: vX.Y.Z
- Publishes the provenant-cli crate to crates.io
- GitHub Actions workflow is triggered by the tag
- Builds binaries for all published targets:
  - Linux: x64 and ARM64
  - macOS: ARM64 (Apple Silicon only; Rosetta 2 does not let Intel Macs run ARM builds)
  - Windows: x64
- Creates archives (.tar.gz/.zip) and SHA256 checksums
- Creates a GitHub Release with all artifacts and auto-generated release notes
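Anyone downloading a release archive can check it against its published SHA256 checksum. The name and layout of the checksum files are not specified here, so this sketch takes the expected digest as an argument. The function name verify_sha256 is hypothetical.

```shell
#!/bin/sh
# Compare a file's SHA-256 digest with an expected hex string.
# Usage: verify_sha256 <file> <expected-digest>
verify_sha256() {
  file="$1"
  expected="$2"
  if command -v sha256sum >/dev/null 2>&1; then
    actual="$(sha256sum "$file" | awk '{print $1}')"
  else
    # Fallback for systems that ship shasum instead (for example macOS).
    actual="$(shasum -a 256 "$file" | awk '{print $1}')"
  fi
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}
```

A non-zero exit status signals a mismatch, so the function can gate an install step in a script.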
Note: The release script ensures every release ships with the latest SPDX license definitions. It also handles a sparse checkout workaround for cargo-release.
Monitor the GitHub Actions workflow to verify completion.
Credits
Provenant is an independent Rust rewrite of ScanCode Toolkit. It uses the upstream ScanCode Toolkit project by nexB Inc. and the AboutCode community as a reference for compatibility, behavior, and parity validation. We are grateful to nexB Inc. and the AboutCode community for the reference implementation and the extensive license and copyright research behind it. See NOTICE for preserved upstream attribution notices applicable to materials included in this repository and to distributions that include ScanCode-derived data.
License
The Provenant project code is licensed under the Apache License 2.0. See NOTICE for preserved upstream attribution notices for included ScanCode Toolkit materials.