provenant-cli 0.0.5

Provenant is a high-performance Rust scanner for licenses, packages, and source provenance.
Documentation

Provenant

A high-performance Rust rewrite of ScanCode Toolkit for scanning codebases for licenses, package metadata, file metadata, and related provenance data.

Overview

Provenant is built as a ScanCode-compatible replacement project with a strong focus on correctness, feature parity, safety, and performance.

Today the repository covers high-level scanning workflows for:

  • License detection and license reference output
  • Package and dependency metadata extraction across many ecosystems
  • Package assembly for related manifests and lockfiles
  • File metadata and scan environment metadata
  • Optional copyright, holder, and author detection
  • Optional email and URL extraction
  • Multiple output formats, including ScanCode-style JSON, YAML, SPDX, CycloneDX, HTML, and custom templates

For architecture, supported formats, testing, and contributor guidance, start with the Documentation Index.

Features

  • Parallel scanning with Rust-native performance
  • ScanCode-compatible JSON output and broad output-format support
  • Broad package-manifest and lockfile coverage across many ecosystems
  • Package assembly for sibling, nested, and workspace-style inputs
  • Include/exclude filtering, path normalization, and scan-result filtering
  • Persistent scan-cache controls for repeated runs
  • Security-first parsing with explicit safeguards and compatibility-focused tradeoffs where needed

Installation

From Crates.io

Install the Provenant package from crates.io under the crate name provenant-cli:

cargo install provenant-cli

This installs the provenant binary.

Download Precompiled Binary (Recommended)

Download the appropriate binary for your platform from the GitHub Releases page:

  • Linux (x64): provenant-x86_64-unknown-linux-gnu.tar.gz
  • Linux (ARM64): provenant-aarch64-unknown-linux-gnu.tar.gz
  • macOS (Apple Silicon): provenant-aarch64-apple-darwin.tar.gz
    • Intel Macs can use the ARM build via Rosetta 2
  • Windows: provenant-x86_64-pc-windows-msvc.zip

Extract and place the binary in your system's PATH:

# Example for Linux/macOS
tar xzf provenant-*.tar.gz
sudo mv provenant /usr/local/bin/

Build from Source

git clone https://github.com/mstykow/provenant.git
cd provenant
./setup.sh  # Initialize submodules and configure sparse checkout for project inputs
cargo build --release

The compiled binary will be available at target/release/provenant.

Usage

provenant --json-pp <FILE> [OPTIONS] <DIR_PATH>...

At least one output option is required.

For the complete CLI surface, run:

provenant --help

Commonly used options include:

  • --json, --json-pp, --json-lines, --yaml, --html, --csv
  • --spdx-tv, --spdx-rdf, --cyclonedx, --cyclonedx-xml
  • --custom-output, --custom-template
  • --exclude/--ignore, --include, --max-depth, --processes
  • --cache-dir, --cache-clear, --from-json, --no-assemble
  • --filter-clues, --only-findings, --mark-source
  • --copyright, --email, --url

Example

provenant --json-pp scan-results.json ~/projects/my-codebase --ignore "*.git*" --ignore "target/*" --ignore "node_modules/*"

Use - as FILE to write an output stream to stdout (for example: --json-pp -). Multiple output flags can be used in a single run, matching ScanCode CLI behavior. When using --from-json, you can pass multiple JSON inputs; directory scan mode currently supports one input path. Cache location can also be controlled with the PROVENANT_CACHE environment variable.

For the generated package-format support matrix, see Supported Formats.

Performance

Provenant is designed to be significantly faster than the Python-based ScanCode Toolkit, especially for large codebases, thanks to native Rust performance and parallel processing. See Architecture: Performance Characteristics for details.

Output Formats

Implemented output formats:

  • JSON (ScanCode-compatible baseline)
  • YAML
  • JSON Lines
  • CSV
  • SPDX (Tag-Value, RDF/XML)
  • CycloneDX (JSON, XML)
  • HTML report
  • Custom template rendering

Additional parity-oriented outputs such as the HTML app surface are present in the codebase, but the README focuses on the primary user-facing formats above.

Output architecture and compatibility approach are documented in:

Documentation

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Setting Up for Local Development

To contribute to Provenant, follow these steps to set up the repository for local development:

  1. Install Rust
    Ensure you have Rust installed on your system. You can install it using rustup:

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    
  2. Clone the Repository
    Clone the Provenant repository to your local machine:

    git clone https://github.com/mstykow/provenant.git
    cd provenant
    
  3. Initialize and Update Project Submodules
    Use the following script to initialize submodules, configure sparse checkout, and update the SPDX license-data submodule to the latest upstream state.
    If pre-commit is installed, this script also installs Git pre-commit hooks automatically:

    ./setup.sh
    
  4. Build the Project
    Build the project with Cargo:

    cargo build
    
  5. Run Tests
    Run the test suite to ensure everything is working correctly:

    cargo test
    
  6. Install Pre-commit (if needed)
    This repository uses pre-commit to run checks before each commit.
    For documentation hooks and commands, install Node.js and npm first (package.json currently requires Node >=24). If you install pre-commit after running ./setup.sh, run pre-commit install once:

    # Using pip
    pip install pre-commit
    
    # Or using brew on macOS
    brew install pre-commit
    
    # Install the hooks
    pre-commit install
    

    Common documentation quality commands:

    npm run check:docs  # markdownlint + prettier check
    npm run fix:docs    # markdownlint auto-fix + prettier write
    
  7. Start Developing
    You can now make changes and test them locally. Use cargo run --bin provenant to execute the tool:

    cargo run --bin provenant -- [OPTIONS] <DIR_PATH>
    

Publishing a Release (Maintainers Only)

Releases are automated using cargo-release and GitHub Actions.

Prerequisites

One-time setup:

  1. Install cargo-release CLI tool:

    cargo install cargo-release
    
  2. Authenticate with crates.io (one-time only):

    cargo login
    

    Enter your crates.io API token when prompted. This is stored in ~/.cargo/credentials.toml and persists across sessions.

Release Process

Use the release.sh script:

# Dry-run first (recommended)
./release.sh patch

# Then execute the actual release
./release.sh patch --execute

Available release types:

  • patch: Increments X.Y.Z to X.Y.(Z+1)
  • minor: Increments X.Y.Z to X.(Y+1).0
  • major: Increments X.Y.Z to (X+1).0.0

Registry note: the published crate name is provenant-cli, while the installed binary and product name remain provenant / Provenant.

What happens automatically:

  1. Updates SPDX license data to the latest version from upstream
  2. Commits the license data update (if changes detected)
  3. cargo-release updates the version in Cargo.toml and Cargo.lock
  4. Creates a git commit: chore: release vX.Y.Z
  5. Creates a GPG-signed git tag: vX.Y.Z
  6. Publishes the provenant-cli crate to crates.io
  7. Pushes commits and tag to GitHub
  8. GitHub Actions workflow is triggered by the tag
  9. Builds binaries for all published targets:
    • Linux: x64 and ARM64
    • macOS: ARM64 (Apple Silicon; Intel Macs can use Rosetta 2 with the ARM build)
    • Windows: x64
  10. Creates archives (.tar.gz/.zip) and SHA256 checksums
  11. Creates a GitHub Release with all artifacts and auto-generated release notes

Note: The release script ensures every release ships with the latest SPDX license definitions. It also handles a sparse checkout workaround for cargo-release.

Monitor the GitHub Actions workflow to verify completion.

Credits

Provenant is an independent Rust rewrite of ScanCode Toolkit. It uses the upstream ScanCode Toolkit project by nexB Inc. and the AboutCode community as a reference for compatibility, behavior, and parity validation. We are grateful to nexB Inc. and the AboutCode community for the reference implementation and the extensive license and copyright research behind it. See NOTICE for preserved upstream attribution notices applicable to materials included in this repository and to distributions that include ScanCode-derived data.

License

The Provenant project code is licensed under the Apache License 2.0. See NOTICE for preserved upstream attribution notices for included ScanCode Toolkit materials.