scancode-rust
A high-performance code scanning tool written in Rust that detects licenses, copyrights, and other relevant metadata in source code.
Overview
scancode-rust is designed to be a faster alternative to the Python-based ScanCode Toolkit, aiming to produce compatible output formats while delivering significantly improved performance. In its current form, the tool scans a codebase and reports:
- License information
- File metadata
- System information
More ScanCode features coming soon!
Features
- Efficient file scanning with multi-threading
- Output format compatible with ScanCode Toolkit
- Progress indication for large scans
- Configurable scan depth
- File/directory exclusion patterns
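The scan-depth and exclusion features can be pictured as a depth-limited directory walk. The sketch below is not scancode-rust's implementation and makes several assumptions: `collect_files` and `walk` are hypothetical names, depth 0 means only the files directly inside the root, and exclusions are matched by exact file or directory name rather than by whatever glob syntax the real tool accepts.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: collects regular files under `root`, descending at most
/// `max_depth` directory levels and skipping entries whose name is in `excluded`.
fn collect_files(root: &Path, max_depth: usize, excluded: &[&str]) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    walk(root, 0, max_depth, excluded, &mut files)?;
    files.sort(); // deterministic order for reporting
    Ok(files)
}

fn walk(
    dir: &Path,
    depth: usize,
    max_depth: usize,
    excluded: &[&str],
    out: &mut Vec<PathBuf>,
) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // Exact-name exclusion (a stand-in for real glob patterns).
        if excluded.iter().any(|e| entry.file_name().to_str() == Some(*e)) {
            continue;
        }
        let path = entry.path();
        // file_type() does not follow symlinks, so symlinks are skipped.
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            // Only descend while still within the depth limit.
            if depth < max_depth {
                walk(&path, depth + 1, max_depth, excluded, out)?;
            }
        } else if file_type.is_file() {
            out.push(path);
        }
    }
    Ok(())
}
```

With `max_depth = 1` and `excluded = &["target"]`, a file in `src/` is collected while files in `src/deep/` and anything under `target/` are not.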
Installation
From Crates.io (Recommended)
Install the published crate with Cargo (requires a Rust toolchain): cargo install scancode-rust
Download Precompiled Binary
Download the appropriate binary for your platform from the GitHub Releases page:
- Linux (x64): scancode-rust-x86_64-unknown-linux-gnu.tar.gz
- Linux (ARM64): scancode-rust-aarch64-unknown-linux-gnu.tar.gz
- macOS (Intel): scancode-rust-x86_64-apple-darwin.tar.gz
- macOS (Apple Silicon): scancode-rust-aarch64-apple-darwin.tar.gz
- Windows: scancode-rust-x86_64-pc-windows-msvc.zip
Extract and place the binary in your system's PATH:
# Example for Linux/macOS
Build from Source
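Clone the repository, initialize the license submodule (see Setting Up for Local Development), and run cargo build --release. The compiled binary will be available at target/release/scancode-rust.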
Usage
Options
Example
Performance
scancode-rust is designed to be significantly faster than the Python-based ScanCode Toolkit, especially for large codebases. Performance improvements come from:
- Native Rust implementation
- Efficient parallel processing
- Optimized file handling
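A minimal sketch of parallel scanning with a shared progress counter, using only the standard library's scoped threads. It rests on stated assumptions: `scan_parallel` and `spdx_tag` are hypothetical names, the input is already-read `(path, contents)` pairs split into fixed contiguous chunks, and "detection" here only reads `SPDX-License-Identifier:` tags, which is far simpler than matching full license texts. scancode-rust may use a different threading strategy entirely.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical stand-in for license detection: returns the trimmed text after
/// the first `SPDX-License-Identifier:` tag, if any.
fn spdx_tag(text: &str) -> Option<String> {
    text.lines().find_map(|line| {
        line.split_once("SPDX-License-Identifier:")
            .map(|(_, rest)| rest.trim().to_string())
    })
}

/// Scans in-memory `(path, contents)` pairs on up to `workers` scoped threads.
/// Returns per-file results in input order plus the final progress count.
fn scan_parallel(
    files: &[(String, String)],
    workers: usize,
) -> (Vec<(String, Option<String>)>, usize) {
    let workers = workers.max(1);
    // Split the input into at most `workers` contiguous chunks (never size 0).
    let chunk_size = ((files.len() + workers - 1) / workers).max(1);
    // Shared counter that a progress bar could poll while the scan runs.
    let progress = AtomicUsize::new(0);

    let results = thread::scope(|s| {
        let handles: Vec<_> = files
            .chunks(chunk_size)
            .map(|chunk| {
                let progress = &progress;
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|(path, text)| {
                            let found = spdx_tag(text);
                            progress.fetch_add(1, Ordering::Relaxed);
                            (path.clone(), found)
                        })
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Joining in spawn order keeps results aligned with the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("scan thread panicked"))
            .collect::<Vec<_>>()
    });

    (results, progress.load(Ordering::Relaxed))
}
```

Scoped threads let the workers borrow the input slice and the counter without `Arc`; a work-stealing pool would balance uneven file sizes better than fixed chunks.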
Output Format
The tool produces JSON output compatible with ScanCode Toolkit, including:
- Scan headers with timestamp information
- File-level data with license and metadata information
- System environment details
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Setting Up for Local Development
To contribute to scancode-rust, follow these steps to set up the repository for local development:
- Install Rust: Ensure you have Rust installed on your system. You can install it using rustup.
- Clone the Repository: Clone the scancode-rust repository to your local machine.
- Initialize the License Submodule: Run the setup.sh script to initialize the submodule and configure sparse checkout.
- Install Dependencies: Fetch and build the required Rust dependencies with cargo build.
- Run Tests: Run the test suite with cargo test to ensure everything is working correctly.
- Set Up Pre-commit Hooks: This repository uses pre-commit to run checks before each commit. Install pre-commit with pip (pip install pre-commit) or, on macOS, with Homebrew (brew install pre-commit), then install the hooks with pre-commit install.
- Start Developing: You can now make changes and test them locally. Use cargo run to execute the tool, passing the tool's own arguments after --.
Publishing a Release (Maintainers Only)
Releases are automated using cargo-release and GitHub Actions.
Prerequisites
One-time setup:
- Install the cargo-release CLI tool: cargo install cargo-release
- Authenticate with crates.io by running cargo login and entering your crates.io API token when prompted. The token is stored in ~/.cargo/credentials.toml and persists across sessions.
Release Process
Use the release.sh script with one of the release types listed below. Do a dry run first (recommended), then execute the actual release.
Available release types:
- patch: Increments the patch version (0.0.4 → 0.0.5)
- minor: Increments the minor version (0.0.4 → 0.1.0)
- major: Increments the major version (0.0.4 → 1.0.0)
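The version arithmetic behind these release types follows Semantic Versioning: incrementing a component resets every component to its right to zero. cargo-release performs this computation itself, so the sketch below only restates the rule and reproduces the examples above. `Bump` and `bump_version` are hypothetical names, and pre-release or build-metadata suffixes are deliberately not handled.

```rust
/// The three release types, mirroring patch / minor / major above.
#[derive(Clone, Copy, Debug)]
enum Bump {
    Patch,
    Minor,
    Major,
}

/// Hypothetical helper: computes the next `MAJOR.MINOR.PATCH` version for
/// `bump`, resetting every component to the right of the incremented one.
fn bump_version(current: &str, bump: Bump) -> Result<String, String> {
    let parts = current
        .split('.')
        .map(|p| p.parse::<u64>().map_err(|e| format!("invalid component {p:?}: {e}")))
        .collect::<Result<Vec<u64>, String>>()?;
    // Require exactly three numeric components.
    let &[major, minor, patch] = parts.as_slice() else {
        return Err(format!("expected MAJOR.MINOR.PATCH, got {current:?}"));
    };
    let (major, minor, patch) = match bump {
        Bump::Patch => (major, minor, patch + 1),
        Bump::Minor => (major, minor + 1, 0),
        Bump::Major => (major + 1, 0, 0),
    };
    Ok(format!("{major}.{minor}.{patch}"))
}
```

For example, bump_version("0.0.4", Bump::Minor) yields "0.1.0", matching the minor example above.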
What happens automatically:
- Updates the SPDX license data to the latest upstream version
- Commits the license data update (if changes are detected)
- cargo-release updates the version in Cargo.toml and Cargo.lock
- Creates a git commit: chore: release vX.Y.Z
- Creates a GPG-signed git tag: vX.Y.Z
- Publishes to crates.io
- Pushes commits and tag to GitHub
- GitHub Actions workflow is triggered by the tag
- Builds binaries for all platforms (Linux, macOS, Windows on x64 and ARM64)
- Creates archives (.tar.gz/.zip) and SHA256 checksums
- Creates a GitHub Release with all artifacts and auto-generated release notes
Note: The release script ensures every release ships with the latest SPDX license definitions. It also handles a sparse checkout workaround for cargo-release.
Monitor the GitHub Actions workflow to verify completion.
License Data Architecture
How License Detection Works
This tool uses the SPDX License List Data for license detection. The license data is:
- Stored in a Git submodule at resources/licenses/ (a sparse checkout of json/details/ only)
- Embedded at compile time using Rust's include_dir! macro (see src/main.rs)
- Built into the binary, with no runtime dependencies on external files
This means:
- For users: The binary is self-contained and portable
- For developers: The submodule must be initialized before building
- Package size: Only the needed JSON files are included in the published crate
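Because the license data is embedded at compile time, a missing or empty submodule checkout only surfaces when building. A hypothetical pre-build check like the one below could report the problem with a clearer hint. It assumes the JSON files sit directly in a directory such as resources/licenses/json/details/ (inferred from the sparse-checkout path above, not confirmed), and `license_data_status` is an illustrative name, not part of the repository.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical check: counts the `*.json` files directly inside `details_dir`
/// and returns an error with a hint if the directory is missing or empty.
fn license_data_status(details_dir: &Path) -> Result<usize, String> {
    let hint = "initialize the license submodule (for example with setup.sh) before building";
    // A missing directory usually means the submodule was never initialized.
    let entries = fs::read_dir(details_dir)
        .map_err(|e| format!("cannot read {}: {e}; {hint}", details_dir.display()))?;
    // Count only entries with a .json extension, ignoring unreadable entries.
    let count = entries
        .filter_map(Result::ok)
        .filter(|entry| entry.path().extension().is_some_and(|ext| ext == "json"))
        .count();
    if count == 0 {
        Err(format!("no license JSON files in {}; {hint}", details_dir.display()))
    } else {
        Ok(count)
    }
}
```

Such a function could be called from a build script or a test; whether the project wants that extra step is a design choice left to the maintainers.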
Updating the License Data
For Releases: The release.sh script automatically updates the license data to the latest version before publishing. No manual action needed.
For Development:
To initialize the submodule or update it to the latest SPDX license definitions, run the setup.sh script.
The script reports whether the license data was updated. If it was, commit the updated resources/licenses submodule.
The setup.sh script:
- Initializes the submodule with a shallow clone (--depth=1)
- Configures sparse checkout to include only json/details/ (saving ~90% of disk space)
- Updates to the latest upstream version
The build process then embeds these files directly into the compiled binary.
License
This project is licensed under the Apache License 2.0.