picca 1.1.0

A Parallel Implementation of Common Checksum Algorithms

picca

picca (an acronym for "a Parallel Implementation of Common Checksum Algorithms") is a relatively small Rust program and wrapper library intended to speed up hashing by reading multiple files at once.

Awesome Features!

What CAN picca do?

picca can:

  1. Hash any file using any one of 50 algorithms! Current options can be found in the source code here.
  2. Read checksum files in BSD and GNU Tagged formats, and verify the files listed.
  3. Hash several files in parallel, up to the number of cores/threads your CPU has, or a custom number.
  4. Be a potential replacement for the UNIX cksum utility and the individual algorithmic checksum utilities (sha256sum, b2sum, etc.)
  5. Ignore common errors, such as missing files, and produce quiet output.
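
As a rough illustration of item 3, the sketch below shows one way a worker-thread count could be chosen: start from an explicit count or the parallelism the CPU reports, then cap it at the number of files. The name effective_threads and its parameters are hypothetical and are not part of picca's API; picca's actual logic may differ.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper (not part of picca's API): decide how many worker
/// threads to spawn for `file_count` files.
///
/// `requested` models an explicit thread count; `None` means "use the
/// number of cores/threads the CPU reports".
fn effective_threads(requested: Option<usize>, file_count: usize) -> usize {
    // Fall back to 1 if the platform cannot report its parallelism.
    let available = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    let wanted = requested.unwrap_or(available);
    // Never spawn more threads than files, and always at least one.
    wanted.min(file_count).max(1)
}

fn main() {
    // Four threads requested, but only one file: one thread is enough.
    println!("{}", effective_threads(Some(4), 1));
    // Four threads requested, ten files: all four are used.
    println!("{}", effective_threads(Some(4), 10));
    // No explicit count: bounded by the CPU and by the number of files.
    println!("{}", effective_threads(None, 10));
}
```

The final `.max(1)` keeps one worker alive even when no files are given, which is the case when input comes from standard input.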

What is picca NOT?

  1. A custom implementation of each hashing algorithm. The hash algorithms are sourced from RustCrypto's hashes libraries.

How do I use such an awesome program?

picca operates mostly the same as (if not identically to) the UNIX cksum program. Here is a non-exhaustive list of examples:

  • On its own, picca will hash whatever comes in from standard input using the SHA256 algorithm.
    $ picca
    ^D
    e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  -
    $ echo hi | picca
    98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4  -
    
  • Using the -a flag, you can change the algorithm (single binary only; see the section Installing the Binaries for what that means).
    $ echo hi | picca -a md5
    764efa883dda1e11db47671c4a3bbd9e  -
    
  • Instead of reading from standard input, you can specify one or more files as arguments to hash their contents instead.
    $ picca LICENSE
    3972dc9744f6499f0f9b2dbf76696f2ae7ad8af9b23dde66d6af86c9dfb36986  LICENSE
    
  • By default, picca will read and hash up to n files simultaneously in separate threads, where n is the number of cores your CPU has. You can adjust the number of threads using the -t flag.
    # use only 4 threads
    $ picca -t4 LICENSE
    3972dc9744f6499f0f9b2dbf76696f2ae7ad8af9b23dde66d6af86c9dfb36986  LICENSE
    
    • If fewer files are specified than threads, picca will ignore the thread count you specify and spawn only as many threads as there are files.
  • You can also use picca to verify the hash of files using the -c flag.
    $ cat files.sha256
    3972dc9744f6499f0f9b2dbf76696f2ae7ad8af9b23dde66d6af86c9dfb36986  LICENSE
    $ picca -c files.sha256
    LICENSE: OK
    
    • picca supports BSD-style and GNU-style checksum formats.
    • For BSD-style checksums, if the hashes in your file were produced with an algorithm other than SHA256, you must specify the algorithm with -a.
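
For reference, the two line layouts conventionally look like "SHA256 (LICENSE) = <hex>" (BSD-style) and "<hex>  LICENSE" (GNU-style). The sketch below is a minimal parser for both, written under that assumption. The Entry type and the parse_line and is_hex functions are hypothetical names used for illustration; this is not picca's parser or API, and picca may accept or reject lines differently.

```rust
/// One entry of a checksum file. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
struct Entry {
    /// Algorithm tag; only BSD-style lines carry one.
    algorithm: Option<String>,
    file: String,
    digest: String,
}

/// True if `s` is a non-empty string of hexadecimal digits.
fn is_hex(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_hexdigit())
}

/// Parse a single line in either layout:
///   BSD-style: `SHA256 (LICENSE) = 3972dc...`
///   GNU-style: `3972dc...  LICENSE`
fn parse_line(line: &str) -> Option<Entry> {
    let line = line.trim_end();

    // BSD-style: split at the first " (" and the last ") = ".
    if let Some((algorithm, rest)) = line.split_once(" (") {
        if let Some((file, digest)) = rest.rsplit_once(") = ") {
            if !algorithm.is_empty() && !algorithm.contains(' ') && is_hex(digest) {
                return Some(Entry {
                    algorithm: Some(algorithm.to_string()),
                    file: file.to_string(),
                    digest: digest.to_lowercase(),
                });
            }
        }
    }

    // GNU-style: hex digest, a space, a mode character (' ' or '*'), file name.
    let (digest, rest) = line.split_once(' ')?;
    let file = rest.strip_prefix(' ').or_else(|| rest.strip_prefix('*'))?;
    if is_hex(digest) && !file.is_empty() {
        return Some(Entry {
            algorithm: None,
            file: file.to_string(),
            digest: digest.to_lowercase(),
        });
    }
    None
}

fn main() {
    let digest = "3972dc9744f6499f0f9b2dbf76696f2ae7ad8af9b23dde66d6af86c9dfb36986";
    let bsd = format!("SHA256 (LICENSE) = {}", digest);
    let gnu = format!("{}  LICENSE", digest);
    println!("{:?}", parse_line(&bsd));
    println!("{:?}", parse_line(&gnu));
}
```

Trying the BSD-style layout first and falling back to the GNU-style one lets a single pass handle either kind of file. A verifier would then hash each listed file and compare the result with the stored digest.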

picca is also available for use in your Rust program as a crate! See the examples to learn how to use it.

Project hosting distribution

The picca project is hosted on two separate Forgejo instances: Codeberg and my own (hereafter referred to as "my forge" or "forge"). This is mainly because I enjoy self-hosting, but it is also an experiment to see whether projects like this can survive being hosted this way. Here is what lives on each:

Both Codeberg and My Forge

  • The raw Git repository
    • All changes that occur on either platform are synced to the other immediately
  • Forgejo releases

Codeberg only

  • Issue management and pull requests (to welcome community contributions)
  • Project management (see the picca project)

My Forge only

Installing the Binaries

picca offers two categories of installs, referred to hereafter as "single" and "standalone" respectively:

  1. The singular picca binary that can hash with any of the supported algorithms (defaults to SHA256)
  2. Individual binaries named after the only algorithm they can hash with, such as md5sum, blake3sum, etc.

The single and standalone binaries can be installed in several different ways depending on what you prefer and/or what your operating system offers.

  • Platform Agnostic
    • Cargo (building and installing from source)
    • Docker/Podman
  • Linux
    • Alpine 3.x
    • Arch
    • Debian 12, 13/Ubuntu 22, 24
    • Enterprise Linux (Rocky/Red Hat) 8, 9, 10
    • Gentoo
    • Ubuntu 22, 24
    • Individual Binaries
  • macOS Binaries
  • Windows Binaries

Cargo

This is essentially compiling picca yourself and placing the result in Cargo's bin directory (~/.cargo/bin by default), which must be in your $PATH for picca to be invoked.

# clone the repository
# you can use forge.steck.dev instead of codeberg.org if you wish.
git clone https://codeberg.org/bryson/picca && cd picca
# to install the single binary:
cargo install --features bin --bin picca --path .
# to install all the standalone binaries
for i in singles/*/; do
  cargo install --path "$i"
done
# to install only a certain standalone binary, replace <NAME> with the name of the binary:
cargo install --path singles/<NAME>

Docker/Podman

This is not really "installing" per se, but it lets you use picca in an isolated environment.

An OCI image is available for use in an OCI-compatible container environment like Docker or Podman. The following examples use Docker.

# Run in a bare minimum container. -i is required to read from stdin, but I like to add --rm so I can skip the step of removing the container after.
docker run --rm -i forge.steck.dev/bryson/picca
# Mount a file and hash it. -i is no longer needed since we won't read from stdin, and all arguments after the image name are passed to picca.
# An absolute host path is used because older Docker versions reject relative bind-mount paths.
docker run --rm -v "$PWD/file:/tmp/file" forge.steck.dev/bryson/picca /tmp/file

Adding to your image

FROM forge.steck.dev/bryson/picca:latest AS picca
# ... your own image's FROM line and build steps go here ...
COPY --from=picca /usr/local/bin/picca /usr/local/bin

Alpine Linux

Follow these instructions to set up the Alpine repository on your system.

Arch Linux

Follow these instructions to set up the Arch repository on your system.

Debian/Ubuntu Linux

Follow these instructions to set up the Apt repository on your system.

Enterprise Linux (Rocky, Red Hat)

Follow these instructions to set up the DNF repository on your system.

Gentoo Linux

The Gentoo package is hosted in a Git-based overlay on my forge.

# Use eselect to add the overlay
sudo eselect repository add bryson-steck git https://forge.steck.dev/pkg/gentoo.git
# Use emaint to sync the overlay
sudo emaint sync -r bryson-steck
# Finally, emerge picca
sudo emerge -av sys-apps/picca

Individual Binaries

If your distro doesn't appear above or has different requirements (architecture, libc, etc.), you can download one of the release binaries and install it in a location that is in your $PATH.

# assuming that ~/bin is in $PATH:
cd ~
# -L follows redirects; -J tells tar that the piped archive is xz-compressed
curl -L https://codeberg.org/bryson/picca/releases/download/v0.15.2/picca-v0.15.2-x86_64-unknown-linux-gnu.tar.xz | tar xJvf - bin/picca

macOS

macOS binaries aren't packaged in an installer format; you must install them in a location that is in your $PATH.

# assuming that ~/bin is in $PATH:
cd ~
# -L follows redirects
curl -L https://codeberg.org/bryson/picca/releases/download/v0.15.2/picca-v0.15.2-aarch64-apple-darwin.tar.xz | tar xvf - bin/picca

Windows

Windows binaries aren't packaged in an installer format; you must install them in a location that is in your PATH.

# assuming that ~\bin is in $env:PATH:
Set-Location ~\Downloads
Invoke-WebRequest "https://codeberg.org/bryson/picca/releases/download/v0.15.2/picca-v0.15.2-x86_64-pc-windows-msvc.zip" -OutFile "picca-v0.15.2-x86_64-pc-windows-msvc.zip"
Expand-Archive ".\picca-v0.15.2-x86_64-pc-windows-msvc.zip"
Move-Item ".\picca-v0.15.2-x86_64-pc-windows-msvc\bin\picca.exe" ~\bin

Installing the Library

picca is also available for use as a crate in your Rust program. Simply add it to your dependencies using Cargo:

cargo add picca
# you can also use the git repo if you wish.
cargo add --git https://codeberg.org/bryson/picca picca

Benchmarks

Some benchmarks are available so you can see how picca stands up against your coreutils binaries. Run cargo bench to run the following benchmarks:

  • blake3 - This tests an algorithm that is already multithreaded against picca, which adds the ability to read multiple files at once (UNIX only; b3 must be available on your system).
  • sha256 - This tests a conventional, single-threaded algorithm, which lets picca use the processing power that would otherwise sit idle by reading and hashing several files at once.

Each benchmark creates sets of 100, 300, and 500 random 4 MB files and runs both the local binary and picca against them to see whether picca provides any speed benefit. The more files there are, the easier it is to see a difference.
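
The comparison these benchmarks make can be sketched with the standard library alone. The example below is not picca's benchmark code and does not use picca's API. It swaps in a 64-bit FNV-1a checksum as a stand-in for a real hash algorithm so that no external crates are needed, and it writes a few small, non-random files instead of hundreds of 4 MB ones. The names fnv1a, checksum_file, and checksum_all are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::Instant;

/// Stand-in checksum (64-bit FNV-1a). NOT one of picca's algorithms; it
/// only keeps this sketch free of external crates.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Read a whole file and checksum its contents.
fn checksum_file(path: &Path) -> io::Result<u64> {
    Ok(fnv1a(&fs::read(path)?))
}

/// Checksum every file with `threads` workers. Each worker claims the next
/// unprocessed index from a shared counter until none are left. Results
/// are returned in the same order as `paths`.
fn checksum_all(paths: &[PathBuf], threads: usize) -> Vec<io::Result<u64>> {
    let next = AtomicUsize::new(0);
    let mut indexed: Vec<(usize, io::Result<u64>)> = thread::scope(|scope| {
        let mut workers = Vec::new();
        for _ in 0..threads.max(1) {
            workers.push(scope.spawn(|| {
                let mut done = Vec::new();
                loop {
                    let i = next.fetch_add(1, Ordering::Relaxed);
                    if i >= paths.len() {
                        break;
                    }
                    done.push((i, checksum_file(&paths[i])));
                }
                done
            }));
        }
        workers
            .into_iter()
            .flat_map(|worker| worker.join().expect("worker panicked"))
            .collect()
    });
    indexed.sort_by_key(|entry| entry.0);
    indexed.into_iter().map(|(_, result)| result).collect()
}

fn main() -> io::Result<()> {
    // Small stand-in for the benchmark's file sets: 8 files of 1 MiB each.
    let dir = std::env::temp_dir().join(format!("picca-sketch-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let paths: Vec<PathBuf> = (0..8)
        .map(|i| dir.join(format!("file-{}.bin", i)))
        .collect();
    for (i, path) in paths.iter().enumerate() {
        fs::write(path, vec![i as u8; 1 << 20])?;
    }

    let start = Instant::now();
    let sequential = checksum_all(&paths, 1);
    println!("1 thread:  {:?}", start.elapsed());

    let start = Instant::now();
    let parallel = checksum_all(&paths, 4);
    println!("4 threads: {:?}", start.elapsed());

    // Both runs must agree file by file.
    for (a, b) in sequential.iter().zip(&parallel) {
        assert_eq!(a.as_ref().ok(), b.as_ref().ok());
    }
    fs::remove_dir_all(&dir)
}
```

With only a handful of small files the timing difference may be lost in noise, which is the same reason the real benchmarks scale up to hundreds of files.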