RCP TOOLS
This repo contains tools to efficiently copy, remove and link large filesets, both locally and across remote hosts.

- rcp is for copying files; similar to cp but generally MUCH faster when dealing with large filesets. Supports both local and remote copying using host:/path syntax (similar to scp). Inspired by tools like dsync(1) and pcp(2).
- rrm is for removing large filesets.
- rlink allows hard-linking filesets with optional update path; typically used for hard-linking datasets with a delta.
- rcmp tool is for comparing filesets.
- filegen tool generates sample filesets, useful for testing.
Documentation
API documentation for the command-line tools is available on docs.rs:
- rcp-tools-rcp - File copying tool (rcp & rcpd)
- rcp-tools-rrm - File removal tool
- rcp-tools-rlink - Hard-linking tool
- rcp-tools-rcmp - File comparison tool
- rcp-tools-filegen - Test file generation utility
For contributors: Internal library crates used by the tools above:
- rcp-tools-common - Shared utilities and types
- rcp-tools-remote - Remote operation protocol
- rcp-tools-throttle - Resource throttling
Design and reference documents (in the docs/ directory):
- Security - Threat model and security architecture
- Remote Copy - rcpd deployment, version checking, troubleshooting
- Remote Protocol - Wire protocol specification
- Testing - Test infrastructure and Docker multi-host testing
Examples
Basic local copy with a progress bar and a summary at the end:
> rcp <foo> <bar> --progress --summary
Copy while preserving metadata, overwrite/update destination if it already exists:
> rcp <foo> <bar> --preserve-settings=all --progress --summary --overwrite
Remote copy from one host to another:
> rcp user@host1:/path/to/source user@host2:/path/to/dest --progress --summary
Copies files from host1 to host2. The rcpd process is automatically started on both hosts via SSH.
Copy from remote host to local machine:
> rcp host:/remote/path /local/path --progress --summary
Copy from local machine to remote host and preserve metadata:
> rcp /local/path host:/remote/path --progress --summary --preserve-settings=all
Remote copy with automatic rcpd deployment:
> rcp /local/data remote-host:/backup --auto-deploy-rcpd --progress
Automatically deploys rcpd to the remote host if not already installed. Useful for dynamic infrastructure, development environments, or when rcpd is not pre-installed on remote hosts.
Log tool output to a file while using progress bar:
> rcp <foo> <bar> --progress --summary > copy.log
The progress bar is sent to stderr while log messages go to stdout. Redirecting stdout to a file therefore preserves the tool output while the interactive progress bar stays visible in the terminal. This works for all RCP tools.
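The stream split can be tried without any RCP tool installed. The sketch below uses a hypothetical stand-in function, fake_tool (not part of this repo), that writes one line to stdout and one to stderr, and shows that a plain `>` redirect captures only the stdout line:

```shell
# fake_tool is a hypothetical stand-in, NOT an RCP tool:
# it writes a "log" line to stdout and a "progress" line to stderr.
fake_tool() {
  echo "log: copied 3 files"
  echo "progress: 100%" >&2
}

log_file=$(mktemp)
fake_tool > "$log_file"   # the stderr line still reaches the terminal
cat "$log_file"           # only the stdout line was captured
```

To capture the progress stream as well, redirect stderr separately (for example `2> progress.log`).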
Path handling (tilde ~ support)
- Local paths: leading ~ or ~/... expands to your local $HOME.
- Remote paths: leading ~/... expands to the remote user’s $HOME (resolved over SSH). Other ~user forms are not supported.
- Remote paths may be absolute, start with ~/, or be relative. Relative remote paths are resolved against the local current working directory before being used remotely.
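As a rough illustration of the local-path rule only, the sketch below expands a leading ~ or ~/ against $HOME. The function name expand_local is hypothetical, and this is not how rcp implements expansion; remote expansion happens on the remote host over SSH and is not modelled here.

```shell
# expand_local is a hypothetical helper illustrating the local-path rule.
expand_local() {
  case "$1" in
    "~")
      printf '%s\n' "$HOME" ;;
    "~/"*)
      rest=${1#"~/"}                      # strip the leading "~/"
      printf '%s/%s\n' "$HOME" "$rest" ;;
    *)
      printf '%s\n' "$1" ;;               # anything else is left untouched in this sketch
  esac
}

expand_local '~'
expand_local '~/data/set1'
expand_local '/abs/path'
```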
Remove a path recursively:
> rrm <bar> --progress --summary
Hard-link contents of one path to another:
> rlink <foo> <bar> --progress --summary
Roughly equivalent to: cp -p --link <foo> <bar>.
Hard-link contents of <foo> to <baz> if they are identical to <bar>:
> rlink <foo> --update <bar> <baz> --update-exclusive --progress --summary
Using --update-exclusive means that if a file is present in <foo> but not in <bar> it will be ignored.
Roughly equivalent to: rsync -a --link-dest=<foo> <bar> <baz>.
Control which metadata is preserved with --preserve-settings:
# preserve nothing (directories get default mode, no uid/gid/time)
> rlink <foo> <bar> --preserve-settings=none
# custom: preserve uid, gid, and time on files and dirs
> rlink <foo> <bar> --preserve-settings="f:uid,gid,time,0777 d:uid,gid,time,0777 l:uid,gid,time"
Hard-linked files always share metadata with their source via the inode -- preserve settings affect directories and symlinks in all modes, and additionally files that are copied (not linked) during --update operations. By default rlink uses --preserve-settings=all.
When using --update with --preserve-settings that does not cover all compared attributes (e.g. --preserve-settings=none while --update-compare=size,mtime), rlink will error to prevent silent data integrity issues. Use --allow-lossy-update to override.
Compare <foo> vs. <bar>:
# differences are printed to stdout by default
> rcmp <foo> <bar> --progress --summary
# use --log to write differences to a file instead
> rcmp <foo> <bar> --progress --summary --log compare.log
# use --quiet to suppress stdout output (exit code only)
> rcmp <foo> <bar> --quiet
# --quiet with --log: silent stdout but differences still written to file
> rcmp <foo> <bar> --quiet --log compare.log
Installation
nixpkgs
All tools are available via nixpkgs under the rcp package name.
The following command will install all the tools on your system:
> nix-env -iA nixpkgs.rcp
crates.io
All tools are available on crates.io. Individual tools can be installed using cargo install:
> cargo install rcp-tools-rcp
debian / rhel
Starting with release v0.10.1, .deb and .rpm packages are available as part of each release.
Static musl builds
The repository is configured to build static musl binaries by default via .cargo/config.toml. Simply run cargo build or cargo build --release to produce fully static binaries. To build glibc binaries instead, use cargo build --target x86_64-unknown-linux-gnu.
For development, enter the nix environment (nix develop) to get all required tools including the musl toolchain. Outside nix shell, install the musl target with rustup target add x86_64-unknown-linux-musl and ensure you have musl-tools installed (e.g., apt-get install musl-tools on Ubuntu/Debian).
General controls
Copy semantics
The copy semantics for RCP tools differ slightly from how e.g. the cp tool works. This is because of the ambiguity in the result of a cp operation that we wanted to avoid.
Specifically, the result of cp foo/x bar/x depends on whether bar/x is already a directory. If it is, the resulting path will be bar/x/x (which is usually undesired); otherwise it will be bar/x.
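The ambiguity is easy to reproduce with plain cp in a scratch directory. The directory names bar1 and bar2 below are made up for the demonstration:

```shell
# Reproduce the cp ambiguity in a throwaway directory.
work=$(mktemp -d)
mkdir -p "$work/foo" "$work/bar1" "$work/bar2/x"
echo data > "$work/foo/x"

cp "$work/foo/x" "$work/bar1/x"   # bar1/x does not exist: result is the file bar1/x
cp "$work/foo/x" "$work/bar2/x"   # bar2/x is a directory: result is bar2/x/x

# List the files that were created under each destination.
(cd "$work" && find bar1 bar2 -type f | sort)
```

The same command line produces two different layouts depending only on the prior state of the destination.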
To avoid this confusion, RCP tools:
- will NOT overwrite data by default (use --overwrite to change)
- treat a path WITHOUT a trailing slash as the final name of the destination
- treat a path ending in a slash as a directory into which the sources are copied (without renaming)
The following examples illustrate this (those rules apply to both rcp and rlink):
- rcp A/B C/D - copy A/B into C/ and name it D; if C/D exists, fail immediately
- rcp A/B C/D/ - copy B into D WITHOUT renaming, i.e., the resulting path will be C/D/B; if C/D/B exists, fail immediately
Using rcp it's also possible to copy multiple sources into a single destination, but the destination MUST have a trailing slash (/):
- rcp A B C D/ - copy A, B and C into D WITHOUT renaming, i.e., the resulting paths will be D/A, D/B and D/C; if any of these exist, fail immediately
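The trailing-slash rule can be summarised as a small path computation. The sketch below is only a model of the rule as stated above; resolve_dest is a hypothetical helper, not a function from the RCP codebase, and it ignores the existence and --overwrite checks.

```shell
# resolve_dest SRC DST: print the final path a source would land at,
# following the trailing-slash rule described above (hypothetical helper).
resolve_dest() {
  src=$1
  dst=$2
  case "$dst" in
    */) printf '%s%s\n' "$dst" "$(basename "$src")" ;;  # copy INTO the directory, keep the name
    *)  printf '%s\n' "$dst" ;;                         # DST is the final name
  esac
}

resolve_dest A/B C/D     # prints C/D
resolve_dest A/B C/D/    # prints C/D/B
for s in A B C; do resolve_dest "$s" D/; done   # prints D/A, D/B, D/C
```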
Throttling
- set --ops-throttle to limit the maximum number of operations per second
  - useful if you want to avoid interfering with other work on the storage / host
- set --iops-throttle to limit the maximum number of I/O operations per second
  - MUST be used with --chunk-size, which is used to calculate I/O operations per file
- set --max-open-files to limit the maximum number of open files
  - RCP tools will automatically adjust the maximum based on the system limits; however, this setting can be used if there are additional constraints
Error handling
- rcp tools will log non-terminal errors and continue by default
- to fail immediately on any error use the --fail-early flag
Remote copy configuration
When using remote paths (host:/path syntax), rcp automatically starts rcpd daemons on remote hosts via SSH.
Requirements:
- SSH access to remote hosts (uses your SSH config and keys)
- rcpd binary available on remote hosts (see Auto-deployment below for automatic setup)
Auto-deployment:
Starting with v0.22.0, rcp can automatically deploy rcpd to remote hosts using the --auto-deploy-rcpd flag. This eliminates the need to manually install rcpd on each remote host.
# automatic deployment - no manual setup required
> rcp --auto-deploy-rcpd host1:/source host2:/dest --progress
When auto-deployment is enabled:
- rcp finds the local rcpd binary (same directory or PATH)
- Deploys it to ~/.cache/rcp/bin/rcpd-{version} on remote hosts via SSH
- Verifies integrity using SHA-256 checksums
- Keeps the last 3 versions and cleans up older ones
- Reuses deployed binaries for subsequent operations (cached until version changes)
Manual deployment is still supported and may be preferred for:
- Air-gapped environments where auto-deployment is not feasible
- Production systems with strict change control
- Situations where you want to verify the binary before deployment
Configuration options:
- --port-ranges - restrict TCP data ports to specific ranges (e.g., "8000-8999")
- --remote-copy-conn-timeout-sec - connection timeout in seconds (default: 15)
Architecture: The remote copy uses a three-node architecture:
- Master (rcp) orchestrates the copy operation
- Source rcpd reads files from source host
- Destination rcpd writes files to destination host
- Data flows directly from source to destination (not through master)
For detailed network connectivity and troubleshooting information, see docs/remote_copy.md.
Security
Remote copy uses SSH for authentication and TLS 1.3 for encrypted data transfer.
Security Model:
- SSH Authentication: All remote operations require SSH authentication first
- TLS Encryption: Data transfers are encrypted by default using TLS 1.3 with certificate pinning
- Mutual Authentication: Both source and destination verify each other's certificates
What's Protected:
- ✅ Unauthorized access (SSH authentication required)
- ✅ Data encryption (TLS 1.3 with AES-256-GCM)
- ✅ Man-in-the-middle attacks (certificate fingerprint verification)
Performance Option:
For trusted networks where encryption overhead is undesirable, use --no-encryption:
> rcp --no-encryption source:/path dest:/path
Warning: This disables both encryption and authentication on the data path.
Best Practices:
- Use SSH key-based authentication
- Keep encryption enabled (default) for sensitive data
- Only use --no-encryption on isolated, trusted networks
For detailed security architecture and threat model, see docs/security.md.
Terminal output
Log messages
- sent to stdout
- by default only errors are logged
- verbosity controlled using -v/-vv/-vvv for INFO/DEBUG/TRACE and -q/--quiet to disable
Progress
- sent to stderr (both ProgressBar and TextUpdates)
- by default disabled
- enabled using -p/--progress with optional --progress-type=... override
Summary
- sent to stdout
- by default disabled
- enabled using --summary
Filtering
All tools support pattern-based filtering with --include and --exclude flags.
Pattern Syntax
- * matches anything except /
- ** matches anything including / (crosses directories)
- ? matches a single character (except /)
- [...] character classes (e.g., [abc], [0-9])
- Leading / anchors the pattern to the source root
- Trailing / matches only directories
Precedence
- Only --exclude: include everything except matches
- Only --include: include only matches (exclude everything else)
- Both: excludes are checked first, then includes
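One way to read the precedence rules is as a two-step decision, sketched below. This is an interpretation, not the tools' actual code: the helper names (matches_any, decide) and the pattern lists are invented, and matching uses the shell's own case globbing on bare file names, where * would also match /, unlike the pattern syntax described above.

```shell
# Hypothetical pattern lists (space-separated globs); either may be empty.
EXCLUDES='*.log generated_*'
INCLUDES='*.rs *.toml'

# matches_any NAME "PAT1 PAT2 ...": succeed if NAME matches any glob.
matches_any() {
  name=$1
  set -f                      # keep the patterns from expanding against the cwd
  for pat in $2; do
    case "$name" in
      $pat) set +f; return 0 ;;
    esac
  done
  set +f
  return 1
}

# decide NAME: print "copy" or "skip" (excludes first, then includes).
decide() {
  if [ -n "$EXCLUDES" ] && matches_any "$1" "$EXCLUDES"; then
    echo skip; return
  fi
  if [ -n "$INCLUDES" ] && ! matches_any "$1" "$INCLUDES"; then
    echo skip; return
  fi
  echo copy
}

decide main.rs            # copy: not excluded, matches an include
decide build.log          # skip: excluded
decide notes.md           # skip: includes are set and none match
decide generated_api.rs   # skip: the exclude is checked before the include
```

Use --dry-run=explain to see how the real tools classify a given path.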
Examples
# Copy only .rs files
> rcp --include '*.rs' src/ dst/
# Copy everything except log files and the target directory
> rcp --exclude '*.log' --exclude 'target/' src/ dst/
# Copy only files in the src directory at root level
> rcp --include '/src/**' project/ backup/
# Use a filter file for complex patterns
> rcp --filter-file=filters.txt src/ dst/
Filter file format (filters.txt):
# Comments start with #
--include *.rs
--include Cargo.toml
--exclude target/
--exclude *.log
Pattern Semantics
Simple patterns (like *.txt, *_dir/) apply to all files at any level, including the source root itself.
Anchored patterns (starting with /, like /src/** or /bar/*.txt) match paths inside the source, not the source root itself. This allows you to copy a directory while filtering its contents:
# Copy project/ but only include the src subdirectory and its contents
> rcp --include '/src/**' project/ backup/
# Results in: backup/src/...
# Exclude the build directory inside the source
> rcp --exclude '/build/' project/ backup/
Note: /src matches only the src directory entry itself, while /src/** matches the directory and all files inside it.
Path patterns (containing / but not starting with /, like bar/*.txt) match relative paths inside the source directory.
Dry-Run Mode
Preview what would happen without making changes:
# Show only what would be copied
> rcp --dry-run=brief --exclude '*.log' src/ dst/
# Also show skipped files
> rcp --dry-run=all --exclude '*.log' src/ dst/
# Show skipped files with the pattern that caused the skip
> rcp --dry-run=explain --exclude '*.log' src/ dst/
Note: Dry-run mode is primarily useful for previewing --include/--exclude filtering. It bypasses --overwrite checks and does not check whether files already exist at the destination. --progress and --summary are suppressed in dry-run mode (use -v to still see summary output).
Overwrite
rcp tools will not overwrite pre-existing data unless used with the --overwrite flag.
Performance Tuning
For maximum throughput, especially with remote copies over high-speed networks, consider these optimizations.
System-Level Tuning (Linux)
TCP Socket Buffers
rcp automatically requests larger TCP socket buffers for high-throughput transfers, but the kernel caps these to system limits. Increase the limits to allow full utilization of high-bandwidth links:
# Check current limits
# Increase to 16 MiB (requires root, temporary until reboot)
# Make permanent (add to /etc/sysctl.d/99-rcp-perf.conf)
The default is often insufficient for 10+ Gbps links.
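A minimal sketch of the steps named in the comments above, assuming the standard Linux keys net.core.rmem_max and net.core.wmem_max are the limits in question. It only reads the current values and prints candidate settings; nothing is changed on the system.

```shell
# 16 MiB expressed in bytes, the unit these kernel keys use.
buf_bytes=$((16 * 1024 * 1024))

# Print the current limits where the host exposes them (Linux only).
for key in rmem_max wmem_max; do
  f=/proc/sys/net/core/$key
  if [ -r "$f" ]; then
    echo "net.core.$key = $(cat "$f")"
  fi
done

# Lines suitable for /etc/sysctl.d/99-rcp-perf.conf (printed, not applied):
printf 'net.core.rmem_max = %s\nnet.core.wmem_max = %s\n' "$buf_bytes" "$buf_bytes"

# To apply temporarily as root (lost on reboot):
#   sysctl -w net.core.rmem_max=16777216
#   sysctl -w net.core.wmem_max=16777216
```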
Open File Limits
When copying large filesets with many concurrent operations, you may hit the open file limit:
# Check current limit
# Increase for current session
# Make permanent (add to /etc/security/limits.conf)
rcp automatically queries the system limit and uses --max-open-files to self-throttle, but higher limits allow more parallelism.
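The current limits can be inspected with the shell's ulimit builtin. The sketch below also computes 80% of the soft limit, the fraction mentioned in the filegen section of this README; whether the tools base that figure on the soft or the hard limit is an assumption here.

```shell
soft=$(ulimit -Sn)   # soft limit: what processes started from this shell get
hard=$(ulimit -Hn)   # hard limit: ceiling the soft limit can be raised to
echo "open files: soft=$soft hard=$hard"

# ASSUMPTION: the 80% figure is taken from the soft limit.
case "$soft" in
  ''|*[!0-9]*) echo "soft limit is not numeric ($soft); nothing to compute" ;;
  *)           echo "80% of soft limit: $((soft * 80 / 100))" ;;
esac

# To raise the soft limit for the current session (up to the hard limit):
#   ulimit -n <new-limit>
```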
Network Backlog (10+ Gbps)
For very high-speed networks, increase the kernel's packet processing capacity:
Application-Level Tuning
Worker Threads
Control parallelism with --max-workers:
# Use all CPU cores (default)
# Limit to 4 workers (reduce I/O contention)
Remote Copy Buffer Size
For remote copies, the --remote-copy-buffer-size flag controls the size of data chunks sent over TCP:
# Larger buffers for high-bandwidth links (default: 16 MiB for datacenter)
# Smaller buffers for constrained memory
Network Profile
Use --network-profile to optimize for your network type:
# For datacenter/local networks (aggressive settings)
# For internet transfers (conservative settings)
Concurrent Connections
Control concurrent TCP connections for file transfers (default: 100):
# Increase for many small files on high-bandwidth links
# Decrease to reduce resource usage
Diagnosing Performance Issues
Check TCP Buffer Sizes
When rcp starts, it logs the actual buffer sizes achieved (visible with -v). If the actual sizes are much smaller than requested, increase your system's rmem_max/wmem_max.
Check for Network Issues
# Network interface drops
# TCP retransmissions (high values indicate network congestion)
Quick Checklist
For optimal performance on high-speed networks:
- ☐ Increase rmem_max/wmem_max to 16+ MiB
- ☐ Increase ulimit -n if copying many files
- ☐ Use --network-profile=datacenter for local/datacenter networks
- ☐ Use --progress to monitor throughput in real-time
- ☐ Check -v output to verify buffer sizes and connection setup
filegen Performance
The filegen tool generates random test data, which is CPU-intensive. Unlike other rcp tools that are typically I/O-bound, filegen's bottleneck is often the CPU generating random bytes.
Default behavior: filegen defaults --max-open-files to the number of physical CPU cores, rather than 80% of the system's open file limit used by other tools. This matches concurrency to compute capacity, avoiding excessive parallelism that would cause CPU contention.
Tuning for your workload:
# Use default (physical cores) - optimal for fast storage
# Increase for slow storage where I/O latency dominates
# No limit (unlimited concurrency)
Profiling
rcp supports several profiling and debugging options.
Chrome Tracing
Produces JSON trace files viewable in Perfetto UI or chrome://tracing.
# Profile a local copy
# Profile a remote copy (traces produced on all hosts)
Output files are named: {prefix}-{identifier}-{hostname}-{pid}-{timestamp}.json
Example output:
- /tmp/trace-rcp-master-myhost-12345-2025-01-15T10:30:45.json
- /tmp/trace-rcpd-source-host1-23456-2025-01-15T10:30:46.json
- /tmp/trace-rcpd-destination-host2-34567-2025-01-15T10:30:46.json
View traces by opening https://ui.perfetto.dev and dragging the JSON file into the browser.
Flamegraph
Produces folded stack files convertible to SVG flamegraphs using inferno.
# Profile and generate flamegraph data
# Convert to SVG (requires: cargo install inferno)
# Or use inferno-flamechart to preserve chronological order
Output files are named: {prefix}-{identifier}-{hostname}-{pid}-{timestamp}.folded
Profile Level
Control which spans are captured with --profile-level (default: trace):
# Capture only info-level and above spans
Only spans from rcp crates are captured (not tokio internals).
Tokio Console
Enable tokio-console for real-time async task inspection:
# Start rcp with tokio-console enabled
# Or specify a custom port
# Connect with tokio-console CLI
Trace events are retained for 60s by default. This can be modified with RCP_TOKIO_TRACING_CONSOLE_RETENTION_SECONDS=120.
Combined profiling
All profiling options can be used together: