rcp-tools-filegen 0.31.0

RCP TOOLS

This repo contains tools to efficiently copy, remove and link large filesets, both locally and across remote hosts.

rcp is for copying files; similar to cp but generally MUCH faster when dealing with large filesets.

Supports both local and remote copying using host:/path syntax (similar to scp).

Inspired by tools like dsync(1) and pcp(2).
rrm is for removing large filesets.
rlink allows hard-linking filesets with optional update path; typically used for hard-linking datasets with a delta.
rcmp is for comparing filesets.
filegen generates sample filesets, useful for testing.

Documentation

API documentation for the command-line tools is available on docs.rs:

rcp-tools-rcp - File copying tool (rcp & rcpd)
rcp-tools-rrm - File removal tool
rcp-tools-rlink - Hard-linking tool
rcp-tools-rcmp - File comparison tool
rcp-tools-filegen - Test file generation utility

For contributors, the internal library crates used by the tools above:

rcp-tools-common - Shared utilities and types
rcp-tools-remote - Remote operation protocol
rcp-tools-throttle - Resource throttling

Design and reference documents (in the docs/ directory):

Security - Threat model and security architecture
Remote Copy - rcpd deployment, version checking, troubleshooting
Remote Protocol - Wire protocol specification
Testing - Test infrastructure and Docker multi-host testing

Examples

Basic local copy with a progress bar and a summary at the end:

> rcp <foo> <bar> --progress --summary

Copy while preserving metadata, overwrite/update destination if it already exists:

> rcp <foo> <bar> --preserve-settings=all --progress --summary --overwrite

Remote copy from one host to another:

> rcp user@host1:/path/to/source user@host2:/path/to/dest --progress --summary

Copies files from host1 to host2. The rcpd process is automatically started on both hosts via SSH.

Copy from remote host to local machine:

> rcp host:/remote/path /local/path --progress --summary

Copy from local machine to remote host and preserve metadata:

> rcp /local/path host:/remote/path --progress --summary --preserve-settings=all

Remote copy with automatic rcpd deployment:

> rcp /local/data remote-host:/backup --auto-deploy-rcpd --progress

Automatically deploys rcpd to the remote host if not already installed. Useful for dynamic infrastructure, development environments, or when rcpd is not pre-installed on remote hosts.

Log tool output to a file while using progress bar:

> rcp <foo> <bar> --progress --summary > copy.log

The progress bar is sent to stderr while log messages go to stdout. This lets you redirect stdout to a file to preserve the tool output while still viewing the interactive progress bar. This works for all RCP tools.
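The stream separation described above can be illustrated with a minimal Python sketch. It is not rcp code: the function name `copy_with_progress` and the "copied ..." log lines are hypothetical and only show why redirecting stdout leaves the progress display on the terminal.

```python
import io
import sys

def copy_with_progress(items, log=sys.stdout, progress=sys.stderr):
    """Write log lines to one stream and progress updates to another."""
    total = len(items)
    for done, item in enumerate(items, start=1):
        # Progress goes to the "stderr" stream; "\r" redraws the same line.
        progress.write(f"\rprogress: {done}/{total}")
        # Log messages go to the "stdout" stream (hypothetical message text).
        log.write(f"copied {item}\n")
    progress.write("\n")

# Capture both streams in memory to show that they never mix.
log_buf, progress_buf = io.StringIO(), io.StringIO()
copy_with_progress(["a.txt", "b.txt"], log=log_buf, progress=progress_buf)
```

Because the two kinds of output never share a stream, a shell redirect such as `> copy.log` captures only the log lines.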

Path handling (tilde `~` support)

Local paths: leading ~ or ~/... expands to your local $HOME.
Remote paths: leading ~/... expands to the remote user’s $HOME (resolved over SSH). Other ~user forms are not supported.
Remote paths may be absolute, start with ~/, or be relative. Relative remote paths are resolved against the local current working directory before being used remotely.
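The remote path rules above can be sketched as a small Python function. This is an illustration under stated assumptions, not the rcp implementation: the name `resolve_remote_path` is hypothetical, and rejecting a bare `~` is a choice made for this sketch because the text only documents the `~/...` form for remote paths.

```python
import posixpath

def resolve_remote_path(path, local_cwd):
    """Return the path string that would be handed to the remote side."""
    if path.startswith("~/"):
        # Left untouched here; the remote side expands it to the remote $HOME.
        return path
    if path.startswith("~"):
        # ~user forms are not supported (bare "~" is rejected in this sketch).
        raise ValueError(f"unsupported tilde form: {path}")
    if posixpath.isabs(path):
        return path
    # Relative remote paths are resolved against the LOCAL working directory.
    return posixpath.normpath(posixpath.join(local_cwd, path))
```

For example, with a local working directory of /home/me, the remote path `data/x` becomes /home/me/data/x before it is used on the remote host.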

Remove a path recursively:

> rrm <bar> --progress --summary

Hard-link contents of one path to another:

> rlink <foo> <bar> --progress --summary

Roughly equivalent to: cp -p --link <foo> <bar>.

Hard-link contents of <foo> to <baz> if they are identical to <bar>:

> rlink <foo> --update <bar> <baz> --update-exclusive --progress --summary

Using --update-exclusive means that if a file is present in <foo> but not in <bar> it will be ignored. Roughly equivalent to: rsync -a --link-dest=<foo> <bar> <baz>.

Control which metadata is preserved with --preserve-settings:

# preserve nothing (directories get default mode, no uid/gid/time)
> rlink <foo> <bar> --preserve-settings=none

# custom: preserve uid, gid, and time on files and dirs
> rlink <foo> <bar> --preserve-settings="f:uid,gid,time,0777 d:uid,gid,time,0777 l:uid,gid,time"

Hard-linked files always share metadata with their source via the inode. Preserve settings therefore affect directories and symlinks in all modes, and additionally files that are copied (not linked) during --update operations. By default rlink uses --preserve-settings=all.

When --update is used with a --preserve-settings value that does not cover all compared attributes (e.g. --preserve-settings=none while --update-compare=size,mtime), rlink will exit with an error to prevent silent data integrity issues. Use --allow-lossy-update to override.

Compare <foo> vs. <bar>:

# differences are printed to stdout by default
> rcmp <foo> <bar> --progress --summary

# use --log to write differences to a file instead
> rcmp <foo> <bar> --progress --summary --log compare.log

# use --quiet to suppress stdout output (exit code only)
> rcmp <foo> <bar> --quiet

# --quiet with --log: silent stdout but differences still written to file
> rcmp <foo> <bar> --quiet --log compare.log

Installation

nixpkgs

All tools are available via nixpkgs under the rcp package name.

The following command will install all the tools on your system:

> nix-env -iA nixpkgs.rcp

crates.io

All tools are available on crates.io. Individual tools can be installed using cargo install:

> cargo install rcp-tools-rcp

debian / rhel

Starting with release v0.10.1, .deb and .rpm packages are available as part of each release.

Static musl builds

The repository is configured to build static musl binaries by default via .cargo/config.toml. Simply run cargo build or cargo build --release to produce fully static binaries. To build glibc binaries instead, use cargo build --target x86_64-unknown-linux-gnu.

For development, enter the nix environment (nix develop) to get all required tools including the musl toolchain. Outside nix shell, install the musl target with rustup target add x86_64-unknown-linux-musl and ensure you have musl-tools installed (e.g., apt-get install musl-tools on Ubuntu/Debian).

General controls

Copy semantics

The copy semantics of RCP tools differ slightly from those of tools such as cp. The goal is to avoid an ambiguity in the result of a cp operation.

Specifically, the result of cp foo/x bar/x depends on whether bar/x already exists as a directory. If it does, the resulting path is bar/x/x (which is usually undesired); otherwise it is bar/x.

To avoid this confusion, RCP tools:

will NOT overwrite data by default (use --overwrite to change this)
assume that a path WITHOUT a trailing slash is the final name of the destination
assume that a path ending in a slash is a directory into which the sources are copied (without renaming)

The following examples illustrate this (these rules apply to both rcp and rlink):

rcp A/B C/D - copy A/B into C/ and name it D; if C/D exists fail immediately
rcp A/B C/D/ - copy B into D WITHOUT renaming, i.e., the resulting path will be C/D/B; if C/D/B exists fail immediately

Using rcp it's also possible to copy multiple sources into a single destination, but the destination MUST have a trailing slash (/):

rcp A B C D/ - copy A, B and C into D WITHOUT renaming, i.e., the resulting paths will be D/A, D/B and D/C; if any of these exist fail immediately
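The trailing-slash rule above can be sketched in Python. This is a minimal model of the documented behavior, not the rcp source; the function name `destination_paths` is hypothetical, and the existence and --overwrite checks are left out.

```python
import posixpath

def destination_paths(sources, dst):
    """Map each source to its final destination path under the rules above."""
    if dst.endswith("/"):
        # Destination is a directory: keep each source's own name.
        return [posixpath.join(dst, posixpath.basename(s.rstrip("/")))
                for s in sources]
    if len(sources) != 1:
        # Multiple sources are only allowed with a trailing-slash destination.
        raise ValueError("multiple sources require a destination ending in '/'")
    # No trailing slash: dst is the final name of the single source.
    return [dst]
```

With this model, `destination_paths(["A/B"], "C/D")` yields `["C/D"]`, while `destination_paths(["A/B"], "C/D/")` yields `["C/D/B"]`. The result never depends on what already exists at the destination, which is the ambiguity the rules are meant to remove.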

Throttling

set --ops-throttle to limit the maximum number of operations per second
- useful if you want to avoid interfering with other work on the storage / host
set --iops-throttle to limit the maximum number of I/O operations per second
- MUST be used with --chunk-size, which is used to calculate I/O operations per file
set --max-open-files to limit the maximum number of open files
- RCP tools will automatically adjust the maximum based on the system limits; however, this setting can be used if there are additional constraints
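As a rough illustration of how --iops-throttle and --chunk-size interact, the sketch below assumes that a file counts as ceil(file_size / chunk_size) I/O operations. That formula is an assumption made for this example; the text above only states that --chunk-size is used to calculate I/O operations per file. The function names are hypothetical.

```python
def io_ops_for_file(file_size, chunk_size):
    """Assumed accounting: one I/O operation per started chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return -(-file_size // chunk_size)  # integer ceiling division

def min_seconds_at_limit(file_size, chunk_size, iops_limit):
    """Lower bound on the time for one file under an I/O-operations-per-second cap."""
    return io_ops_for_file(file_size, chunk_size) / iops_limit

MIB = 1 << 20
GIB = 1 << 30
ops = io_ops_for_file(GIB, MIB)                  # 1024 operations
seconds = min_seconds_at_limit(GIB, MIB, 256)    # 4.0 seconds
```

Under this assumption, a smaller chunk size makes the same file cost more operations, so the same IOPS cap translates to lower throughput.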

Error handling

rcp tools will log non-terminal errors and continue by default
to fail immediately on any error use the --fail-early flag

Remote copy configuration

When using remote paths (host:/path syntax), rcp automatically starts rcpd daemons on remote hosts via SSH.

Requirements:

SSH access to remote hosts (uses your SSH config and keys)
rcpd binary available on remote hosts (see Auto-deployment below for automatic setup)

Auto-deployment: Starting with v0.22.0, rcp can automatically deploy rcpd to remote hosts using the --auto-deploy-rcpd flag. This eliminates the need to manually install rcpd on each remote host.

# automatic deployment - no manual setup required
> rcp --auto-deploy-rcpd host1:/source host2:/dest --progress

When auto-deployment is enabled:

rcp finds the local rcpd binary (same directory or PATH)
Deploys it to ~/.cache/rcp/bin/rcpd-{version} on remote hosts via SSH
Verifies integrity using SHA-256 checksums
Keeps the last 3 versions and cleans up older ones
Reuses deployed binaries for subsequent operations (cached until version changes)
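Two of the steps above, checksum verification and retention of the last 3 versions, can be sketched in Python. This is an illustration only: the function names are hypothetical, and ordering versions by their numeric components is an assumption (the text does not say how "last 3" is determined). Only the cache path pattern is taken from the text.

```python
import hashlib
import hmac

def remote_cache_path(version):
    """Cache location pattern quoted in the list above."""
    return f"~/.cache/rcp/bin/rcpd-{version}"

def sha256_hex(data):
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

def verify_binary(data, expected_hex):
    """Compare digests in constant time; True if the binary is intact."""
    return hmac.compare_digest(sha256_hex(data), expected_hex)

def versions_to_remove(deployed, keep=3):
    """Assumed policy: keep the `keep` highest versions, remove the rest."""
    ordered = sorted(deployed, key=lambda v: tuple(int(p) for p in v.split(".")))
    return ordered[:-keep] if len(ordered) > keep else []
```

For example, with deployed versions 0.9.0, 0.28.0, 0.29.0, 0.30.0 and 0.31.0, this policy would remove 0.9.0 and 0.28.0.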

Manual deployment is still supported and may be preferred for:

Air-gapped environments where auto-deployment is not feasible
Production systems with strict change control
Situations where you want to verify the binary before deployment

Configuration options:

--port-ranges - restrict TCP data ports to specific ranges (e.g., "8000-8999")
--remote-copy-conn-timeout-sec - connection timeout in seconds (default: 15)

Architecture: The remote copy uses a three-node architecture:

Master (rcp) orchestrates the copy operation
Source rcpd reads files from source host
Destination rcpd writes files to destination host
Data flows directly from source to destination (not through master)

For detailed network connectivity and troubleshooting information, see docs/remote_copy.md.

Security

Remote copy uses SSH for authentication and TLS 1.3 for encrypted data transfer.

Security Model:

SSH Authentication: All remote operations require SSH authentication first
TLS Encryption: Data transfers are encrypted by default using TLS 1.3 with certificate pinning
Mutual Authentication: Both source and destination verify each other's certificates

What's Protected:

✅ Unauthorized access (SSH authentication required)
✅ Data encryption (TLS 1.3 with AES-256-GCM)
✅ Man-in-the-middle attacks (certificate fingerprint verification)

Performance Option: For trusted networks where encryption overhead is undesirable, use --no-encryption:

> rcp --no-encryption source:/path dest:/path

Warning: This disables both encryption and authentication on the data path.

Best Practices:

Use SSH key-based authentication
Keep encryption enabled (default) for sensitive data
Only use --no-encryption on isolated, trusted networks

For detailed security architecture and threat model, see docs/security.md.

Terminal output

Log messages

sent to stdout
by default only errors are logged
verbosity controlled using -v/-vv/-vvv for INFO/DEBUG/TRACE and -q/--quiet to disable

Progress

sent to stderr (both ProgressBar and TextUpdates)
by default disabled
enabled using -p/--progress with optional --progress-type=... override

Summary

sent to stdout
by default disabled
enabled using --summary

Filtering

All tools support pattern-based filtering with --include and --exclude flags.

Pattern Syntax

* matches anything except /
** matches anything including / (crosses directories)
? matches a single character (except /)
[...] character classes (e.g., [abc], [0-9])
Leading / anchors the pattern to the source root
Trailing / matches only directories
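The wildcard rules above can be made concrete by translating a pattern into a regular expression. The Python sketch below is not the matcher rcp uses; `glob_to_regex` and `matches` are hypothetical names, and the sketch covers only `*`, `**`, `?` and simple `[...]` classes. Anchoring with a leading `/`, directory-only matching with a trailing `/`, and the question of which part of a path a pattern is tested against are not modeled.

```python
import re

def glob_to_regex(pattern):
    """Translate the wildcard subset described above into a regex string."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if pattern.startswith("**", i):
            out.append(".*")        # ** crosses directory separators
            i += 2
        elif c == "*":
            out.append("[^/]*")     # * stops at /
            i += 1
        elif c == "?":
            out.append("[^/]")      # ? is one character, never /
            i += 1
        elif c == "[" and pattern.find("]", i + 1) != -1:
            end = pattern.find("]", i + 1)
            out.append(pattern[i:end + 1])  # pass simple classes through
            i = end + 1
        else:
            out.append(re.escape(c))
            i += 1
    return "".join(out)

def matches(pattern, path):
    """True if the whole string `path` matches `pattern`."""
    return re.fullmatch(glob_to_regex(pattern), path) is not None
```

In this model `src/*.rs` does not match `src/a/b.rs` because `*` cannot cross `/`, while `src/**` does.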

Precedence

Only --exclude: include everything except matches
Only --include: include only matches (exclude everything else)
Both: excludes are checked first, then includes
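The precedence rules above amount to a short decision function. The Python sketch below uses the standard library's `fnmatchcase` as a stand-in matcher on plain file names (no `/`), so it does not reproduce rcp's path-aware pattern syntax; the name `is_included` is hypothetical.

```python
from fnmatch import fnmatchcase

def is_included(name, includes, excludes):
    """Apply the documented order: excludes first, then includes."""
    # Any matching exclude wins, regardless of includes.
    if any(fnmatchcase(name, p) for p in excludes):
        return False
    # If includes are given, only matching names are kept.
    if includes:
        return any(fnmatchcase(name, p) for p in includes)
    # Only excludes (or no filters): everything else is kept.
    return True
```

For example, `is_included("gen.rs", ["*.rs"], ["gen.*"])` is False: the file matches an include pattern, but the exclude is checked first.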

Examples

# Copy only .rs files
> rcp --include '*.rs' src/ dst/

# Copy everything except log files and the target directory
> rcp --exclude '*.log' --exclude 'target/' src/ dst/

# Copy only files in the src directory at root level
> rcp --include '/src/**' project/ backup/

# Use a filter file for complex patterns
> rcp --filter-file=filters.txt src/ dst/

Filter file format (filters.txt):

# Comments start with #
--include *.rs
--include Cargo.toml
--exclude target/
--exclude *.log

Pattern Semantics

Simple patterns (like *.txt, *_dir/) apply to all files at any level, including the source root itself.

Anchored patterns (starting with /, like /src/** or /bar/*.txt) match paths inside the source, not the source root itself. This allows you to copy a directory while filtering its contents:

# Copy project/ but only include the src subdirectory and its contents
> rcp --include '/src/**' project/ backup/
# Results in: backup/src/...

# Exclude the build directory inside the source
> rcp --exclude '/build/' project/ backup/

Note: /src matches only the src directory entry itself, while /src/** matches the directory and all files inside it.

Path patterns (containing / but not starting with /, like bar/*.txt) match relative paths inside the source directory.

Dry-Run Mode

Preview what would happen without making changes:

# Show only what would be copied
> rcp --dry-run=brief --exclude '*.log' src/ dst/

# Also show skipped files
> rcp --dry-run=all --exclude '*.log' src/ dst/

# Show skipped files with the pattern that caused the skip
> rcp --dry-run=explain --exclude '*.log' src/ dst/

Note: Dry-run mode is primarily useful for previewing --include/--exclude filtering. It bypasses --overwrite checks and does not check whether files already exist at the destination. --progress and --summary are suppressed in dry-run mode (use -v to still see summary output).

Overwrite

rcp tools will not overwrite pre-existing data unless used with the --overwrite flag.

Performance Tuning

For maximum throughput, especially with remote copies over high-speed networks, consider these optimizations.

System-Level Tuning (Linux)

TCP Socket Buffers

rcp automatically requests larger TCP socket buffers for high-throughput transfers, but the kernel caps these to system limits. Increase the limits to allow full utilization of high-bandwidth links:

# Check current limits
sysctl net.core.rmem_max net.core.wmem_max

# Increase to 16 MiB (requires root, temporary until reboot)
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216

# Make permanent (add to /etc/sysctl.d/99-rcp-perf.conf)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

The default is often insufficient for 10+ Gbps links.
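One way to judge whether a buffer limit is large enough is the bandwidth-delay product (BDP): the number of bytes that must be in flight to keep a link full. The Python sketch below is a back-of-the-envelope calculation, not something rcp computes; the function name and the example round-trip times are assumptions chosen for illustration.

```python
def bdp_bytes(bits_per_second, rtt_ms):
    """Bandwidth-delay product in bytes: bandwidth (bit/s) * RTT (ms) / 8000."""
    return bits_per_second * rtt_ms // 8000

TEN_GBPS = 10_000_000_000
SIXTEEN_MIB = 16 * 1024 * 1024

bdp_1ms = bdp_bytes(TEN_GBPS, 1)     # 1_250_000 bytes
bdp_10ms = bdp_bytes(TEN_GBPS, 10)   # 12_500_000 bytes
fits = bdp_10ms <= SIXTEEN_MIB       # True: 16 MiB covers 10 Gbps at 10 ms RTT
```

A single connection on a 10 Gbps link with a 10 ms round trip needs roughly 12.5 MB in flight, which is why limits of a few hundred kilobytes become a bottleneck and 16 MiB is a reasonable starting point. Compare the result with the output of the sysctl check above for your own hosts.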

Open File Limits

When copying large filesets with many concurrent operations, you may hit the open file limit:

# Check current limit
ulimit -n

# Increase for current session
ulimit -n 65536

# Make permanent (add to /etc/security/limits.conf)
* soft nofile 65536
* hard nofile 65536

rcp automatically queries the system limit and uses --max-open-files to self-throttle, but higher limits allow more parallelism.

Network Backlog (10+ Gbps)

For very high-speed networks, increase the kernel's packet processing capacity:

sudo sysctl -w net.core.netdev_max_backlog=16384
sudo sysctl -w net.core.netdev_budget=600

Application-Level Tuning

Worker Threads

Control parallelism with --max-workers:

# Use all CPU cores (default)
rcp --max-workers=0 /source /dest

# Limit to 4 workers (reduce I/O contention)
rcp --max-workers=4 /source /dest

Remote Copy Buffer Size

For remote copies, the --remote-copy-buffer-size flag controls the size of data chunks sent over TCP:

# Larger buffers for high-bandwidth links (default: 16 MiB for datacenter)
rcp --remote-copy-buffer-size=32MiB host1:/data host2:/data

# Smaller buffers for constrained memory
rcp --remote-copy-buffer-size=4MiB host1:/data host2:/data

Network Profile

Use --network-profile to optimize for your network type:

# For datacenter/local networks (aggressive settings)
rcp --network-profile=datacenter host1:/data host2:/data

# For internet transfers (conservative settings)
rcp --network-profile=internet host1:/data host2:/data

Concurrent Connections

Control concurrent TCP connections for file transfers (default: 100):

# Increase for many small files on high-bandwidth links
rcp --max-connections=200 host1:/many-small-files host2:/dest

# Decrease to reduce resource usage
rcp --max-connections=16 host1:/data host2:/dest

Diagnosing Performance Issues

Check TCP Buffer Sizes

When rcp starts, it logs the actual buffer sizes achieved (visible with -v). If the actual sizes are much smaller than requested, increase your system's rmem_max/wmem_max.

Check for Network Issues

# Network interface drops
ip -s link show eth0 | grep -A1 RX | grep dropped

# TCP retransmissions (high values indicate network congestion)
ss -ti | grep retrans

Quick Checklist

For optimal performance on high-speed networks:

☐ Increase rmem_max/wmem_max to 16+ MiB
☐ Increase ulimit -n if copying many files
☐ Use --network-profile=datacenter for local/datacenter networks
☐ Use --progress to monitor throughput in real-time
☐ Check -v output to verify buffer sizes and connection setup

filegen Performance

The filegen tool generates random test data, which is CPU-intensive. Unlike other rcp tools that are typically I/O-bound, filegen's bottleneck is often the CPU generating random bytes.

Default behavior: filegen defaults --max-open-files to the number of physical CPU cores, rather than 80% of the system's open file limit used by other tools. This matches concurrency to compute capacity, avoiding excessive parallelism that would cause CPU contention.

Tuning for your workload:

# Use default (physical cores) - optimal for fast storage
filegen /tmp 3,2 10 1M --progress

# Increase for slow storage where I/O latency dominates
filegen /tmp 3,2 10 1M --max-open-files=64 --progress

# No limit (unlimited concurrency)
filegen /tmp 3,2 10 1M --max-open-files=0 --progress

Profiling

rcp supports several profiling and debugging options.

Chrome Tracing

Produces JSON trace files viewable in Perfetto UI or chrome://tracing.

# Profile a local copy
rcp --chrome-trace=/tmp/trace /source /dest

# Profile a remote copy (traces produced on all hosts)
rcp --chrome-trace=/tmp/trace host1:/path host2:/path

Output files are named: {prefix}-{identifier}-{hostname}-{pid}-{timestamp}.json

Example output:

/tmp/trace-rcp-master-myhost-12345-2025-01-15T10:30:45.json
/tmp/trace-rcpd-source-host1-23456-2025-01-15T10:30:46.json
/tmp/trace-rcpd-destination-host2-34567-2025-01-15T10:30:46.json
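The naming scheme above can be reproduced with a one-line formatter, which is handy when scripting around the trace files. This is a sketch: the function name `trace_file_name` is hypothetical and the field values are taken from the example output.

```python
def trace_file_name(prefix, identifier, hostname, pid, timestamp, ext="json"):
    """Build a name following {prefix}-{identifier}-{hostname}-{pid}-{timestamp}.{ext}."""
    return f"{prefix}-{identifier}-{hostname}-{pid}-{timestamp}.{ext}"

name = trace_file_name("/tmp/trace", "rcp-master", "myhost", 12345,
                       "2025-01-15T10:30:45")
```

Note that identifiers and hostnames can themselves contain hyphens, so splitting a file name on `-` to recover the fields is not reliable; matching on the known prefix and extension is safer.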

View traces by opening https://ui.perfetto.dev and dragging the JSON file into the browser.

Flamegraph

Produces folded stack files convertible to SVG flamegraphs using inferno.

# Profile and generate flamegraph data
rcp --flamegraph=/tmp/flame /source /dest

# Convert to SVG (requires: cargo install inferno)
cat /tmp/flame-*.folded | inferno-flamegraph > flamegraph.svg

# Or use inferno-flamechart to preserve chronological order
cat /tmp/flame-*.folded | inferno-flamechart > flamechart.svg

Output files are named: {prefix}-{identifier}-{hostname}-{pid}-{timestamp}.folded

Profile Level

Control which spans are captured with --profile-level (default: trace):

# Capture only info-level and above spans
rcp --chrome-trace=/tmp/trace --profile-level=info /source /dest

Only spans from rcp crates are captured (not tokio internals).

Tokio Console

Enable tokio-console for real-time async task inspection:

# Start rcp with tokio-console enabled
rcp --tokio-console /source /dest

# Or specify a custom port
rcp --tokio-console --tokio-console-port=6670 /source /dest

# Connect with tokio-console CLI (use the port rcp is listening on)
tokio-console http://127.0.0.1:6669

Trace events are retained for 60s by default. This can be changed by setting RCP_TOKIO_TRACING_CONSOLE_RETENTION_SECONDS, e.g. RCP_TOKIO_TRACING_CONSOLE_RETENTION_SECONDS=120.

Combined profiling

All profiling options can be used together:

rcp --chrome-trace=/tmp/trace --flamegraph=/tmp/flame --tokio-console /source /dest
