RCP TOOLS
This repo contains tools to efficiently copy, remove and link large filesets, both locally and across remote hosts.
- rcp is for copying files; similar to cp but generally MUCH faster when dealing with large filesets. Supports both local and remote copying using host:/path syntax (similar to scp). Inspired by tools like dsync(1) and pcp(2).
- rrm is for removing large filesets.
- rlink allows hard-linking filesets with optional update path; typically used for hard-linking datasets with a delta.
- rcmp is for comparing filesets.
- filegen generates sample filesets, useful for testing.
Documentation
API documentation for the command-line tools is available on docs.rs:
- rcp-tools-rcp - File copying tool (rcp & rcpd)
- rcp-tools-rrm - File removal tool
- rcp-tools-rlink - Hard-linking tool
- rcp-tools-rcmp - File comparison tool
- rcp-tools-filegen - Test file generation utility
For contributors: Internal library crates used by the tools above:
- rcp-tools-common - Shared utilities and types
- rcp-tools-remote - Remote operation protocol
- rcp-tools-throttle - Resource throttling
Examples
Basic local copy with progress-bar and summary at the end:
> rcp <foo> <bar> --progress --summary
Copy while preserving metadata, overwrite/update destination if it already exists:
> rcp <foo> <bar> --preserve --progress --summary --overwrite
Remote copy from one host to another:
> rcp user@host1:/path/to/source user@host2:/path/to/dest --progress --summary
Copies files from host1 to host2. The rcpd process is automatically started on both hosts via SSH.
Copy from remote host to local machine:
> rcp host:/remote/path /local/path --progress --summary
Copy from local machine to remote host and preserve metadata:
> rcp /local/path host:/remote/path --progress --summary --preserve
Log tool output to a file while using progress bar:
> rcp <foo> <bar> --progress --summary > copy.log
The progress bar is sent to stderr while log messages go to stdout. Redirecting stdout to a file therefore preserves the tool output while the interactive progress bar stays visible in the terminal. This works for all RCP tools.
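The stream separation can be tried without any RCP tool installed. The sketch below uses a hypothetical shell function, fake_tool, as a stand-in that writes one line to each stream; it is not part of RCP and only mimics the stdout/stderr split described above.

```shell
# fake_tool is a hypothetical stand-in for an RCP tool: it writes a
# "progress" line to stderr and a "log" line to stdout.
fake_tool() {
    echo "progress: 50%" >&2
    echo "log: copied 10 files"
}

log=$(mktemp)

# Only stdout is redirected; the progress line still reaches the terminal.
fake_tool > "$log"

# The file contains the log line only.
cat "$log"
```

Adding 2> progress.log to the command would capture the progress stream separately, since both redirections are plain shell features.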
Remove a path recursively:
> rrm <bar> --progress --summary
Hard-link contents of one path to another:
> rlink <foo> <bar> --progress --summary
Roughly equivalent to: cp -p --link <foo> <bar>.
Hard-link contents of <foo> to <baz> if they are identical to <bar>:
> rlink <foo> --update <bar> <baz> --update-exclusive --progress --summary
Using --update-exclusive means that if a file is present in <foo> but not in <bar> it will be ignored.
Roughly equivalent to: rsync -a --link-dest=<foo> <bar> <baz>.
Compare <foo> vs. <bar>:
> rcmp <foo> <bar> --progress --summary --log compare.log
Installation
nixpkgs
All tools are available via nixpkgs under the rcp package name.
The following command will install all the tools on your system:
> nix-env -iA nixpkgs.rcp
crates.io
All tools are available on crates.io. Individual tools can be installed using cargo install:
> cargo install rcp-tools-rcp
debian / rhel
Starting with release v0.10.1, .deb and .rpm packages are available as part of each release.
General controls
Copy semantics
The copy semantics for RCP tools differ slightly from how e.g. the cp tool works. This is because of the ambiguity in the result of a cp operation that we wanted to avoid.
Specifically, the result of cp foo/x bar/x depends on whether bar/x already exists as a directory. If it does, the resulting path will be bar/x/x (which is usually undesired); otherwise it will be bar/x.
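The ambiguity can be reproduced with plain cp in a scratch directory. This is a minimal sketch; the directory names case1 and case2 are made up for illustration.

```shell
work=$(mktemp -d)
mkdir -p "$work/foo" "$work/case1/bar" "$work/case2/bar/x"
echo data > "$work/foo/x"

# Case 1: bar/x does not exist yet, so cp creates the file bar/x.
cp "$work/foo/x" "$work/case1/bar/x"

# Case 2: bar/x is already a directory, so cp copies INTO it,
# producing bar/x/x.
cp "$work/foo/x" "$work/case2/bar/x"

# List the resulting files relative to the scratch directory.
(cd "$work" && find case1 case2 -type f | sort)
```

The same command line yields case1/bar/x in the first case and case2/bar/x/x in the second, purely because of what already existed at the destination.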
To avoid this confusion, RCP tools:
- will NOT overwrite data by default (use --overwrite to change)
- do assume that a path WITHOUT a trailing slash is the final name of the destination
- do assume that a path ending in a slash is a directory into which we want to copy the sources (without renaming)
The following examples illustrate this (those rules apply to both rcp and rlink):
- rcp A/B C/D - copy A/B into C/ and name it D; if C/D exists, fail immediately
- rcp A/B C/D/ - copy B into D WITHOUT renaming, i.e., the resulting path will be C/D/B; if C/D/B exists, fail immediately
Using rcp it's also possible to copy multiple sources into a single destination, but the destination MUST have a trailing slash (/):
- rcp A B C D/ - copy A, B and C into D WITHOUT renaming, i.e., the resulting paths will be D/A, D/B and D/C; if any of these exist, fail immediately
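The trailing-slash rule above can be modeled in a few lines of shell. resolve_dest is a hypothetical helper written only to illustrate the rule; it is not part of the RCP tools and does not touch the filesystem.

```shell
# resolve_dest SRC DST prints the final path that SRC would get under the
# trailing-slash rule (hypothetical helper, illustration only).
resolve_dest() {
    src=$1
    dst=$2
    case "$dst" in
        # DST ends in a slash: it is a directory, keep the source name.
        */) printf '%s%s\n' "$dst" "$(basename "$src")" ;;
        # No trailing slash: DST is the final name.
        *)  printf '%s\n' "$dst" ;;
    esac
}

resolve_dest A/B C/D     # prints C/D
resolve_dest A/B C/D/    # prints C/D/B

# Multiple sources into one destination directory.
for s in A B C; do
    resolve_dest "$s" D/ # prints D/A, D/B, D/C
done
```

Because the outcome depends only on how the destination is written, not on what already exists there, the result is predictable before the copy starts.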
Throttling
- set --ops-throttle to limit the maximum number of operations per second
  - useful if you want to avoid interfering with other work on the storage / host
- set --iops-throttle to limit the maximum number of I/O operations per second
  - MUST be used with --chunk-size, which is used to calculate I/O operations per file
- set --max-open-files to limit the maximum number of open files
  - RCP tools will automatically adjust the maximum based on the system limits; however, this setting can be used if there are additional constraints
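How --chunk-size enters the I/O accounting is not spelled out here. As a rough illustration only, the sketch below assumes each file is charged ceil(file_size / chunk_size) I/O operations. That formula, the helper name io_ops_for_file, and the sample numbers are assumptions made for this example, not documented behavior of the tools.

```shell
# io_ops_for_file SIZE CHUNK (hypothetical helper).
# ASSUMPTION: a file of SIZE bytes costs ceil(SIZE / CHUNK) I/O operations.
io_ops_for_file() {
    size=$1
    chunk=$2
    echo $(( (size + chunk - 1) / chunk ))
}

chunk=$((1024 * 1024))                           # 1 MiB chunks (sample value)

io_ops_for_file $((10 * 1024 * 1024)) "$chunk"   # 10 MiB file: prints 10
io_ops_for_file 1 "$chunk"                       # 1-byte file: prints 1
```

Under this model, a budget of 100 I/O operations per second would admit roughly ten 10 MiB files per second with 1 MiB chunks, which is why the two settings have to be chosen together.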
Error handling
- rcp tools will log non-terminal errors and continue by default
- to fail immediately on any error use the --fail-early flag
Remote copy configuration
When using remote paths (host:/path syntax), rcp automatically starts rcpd daemons on remote hosts via SSH.
Requirements:
- SSH access to remote hosts (uses your SSH config and keys)
- rcpd binary must be available in the same directory as rcp on remote hosts
Configuration options:
- --quic-port-ranges - restrict QUIC to specific port ranges (e.g., "8000-8999")
- --remote-copy-conn-timeout-sec - connection timeout in seconds (default: 15)
Architecture: The remote copy uses a three-node architecture with QUIC protocol:
- Master (rcp) orchestrates the copy operation
- Source rcpd reads files from source host
- Destination rcpd writes files to destination host
- Data flows directly from source to destination (not through master)
For detailed network connectivity and troubleshooting information, see docs/network_connectivity.md.
Security
Remote copy operations are secured against man-in-the-middle (MITM) attacks using a combination of SSH authentication and certificate pinning.
Security Model:
- SSH Authentication: All remote operations require SSH authentication first
- TLS 1.3 Encryption: Data transfer uses QUIC with TLS 1.3 for encryption
- Certificate Pinning: SHA-256 fingerprints prevent endpoint impersonation
- No Configuration Required: Security features work automatically
How It Works:
- SSH authenticates and launches rcpd on remote hosts
- Certificate fingerprints are transmitted via the secure SSH channel
- QUIC connections validate certificates against these fingerprints
- Connections fail if fingerprints don't match (MITM detected)
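The pinning idea in the steps above can be sketched with ordinary shell tools. This is a conceptual illustration, not the tools' implementation: the helpers fingerprint and verify_pin are hypothetical, the "certificates" are stand-in byte strings, and sha256sum (GNU coreutils) is assumed to be available.

```shell
# fingerprint FILE prints the SHA-256 digest of FILE as hex.
fingerprint() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# verify_pin FILE PINNED succeeds only if FILE hashes to PINNED.
verify_pin() {
    [ "$(fingerprint "$1")" = "$2" ]
}

cert=$(mktemp)
forged=$(mktemp)
printf 'stand-in certificate bytes' > "$cert"
printf 'attacker certificate bytes' > "$forged"

# In the flow described above the fingerprint travels over the SSH channel;
# here it is simply kept in a variable.
pinned=$(fingerprint "$cert")

if verify_pin "$cert" "$pinned"; then
    echo "match: accept connection"
fi
if ! verify_pin "$forged" "$pinned"; then
    echo "mismatch: reject connection"
fi
```

Because the expected fingerprint arrives over a channel that is already authenticated, an endpoint presenting any other certificate is rejected without needing a certificate authority.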
What's Protected:
- ✅ Man-in-the-middle attacks
- ✅ Eavesdropping (all data encrypted)
- ✅ Data tampering (cryptographic integrity)
- ✅ Connection hijacking
- ✅ Unauthorized access (SSH authentication required)
Trust Model:
- SSH is the root of trust (use SSH best practices)
- Certificate fingerprints are ephemeral (generated per session)
- No PKI or long-term certificate management needed
For detailed security architecture and threat model, see docs/security.md.
Terminal output
Log messages
- sent to stdout
- by default only errors are logged
- verbosity controlled using -v/-vv/-vvv for INFO/DEBUG/TRACE and -q/--quiet to disable
Progress
- sent to stderr (both ProgressBar and TextUpdates)
- by default disabled
- enabled using -p/--progress with optional --progress-type=... override
Summary
- sent to stdout
- by default disabled
- enabled using --summary
Overwrite
rcp tools will not overwrite pre-existing data unless used with the --overwrite flag.
Tracing and tokio-console
The rcp tools now use the tracing crate for logging and support sending data to the tokio-console subscriber.
Enabling
To enable the console-subscriber, set the environment variable RCP_TOKIO_TRACING_CONSOLE_ENABLED=1 (or true, in any letter case).
Server port
By default port 6669 is used (tokio-console default) but this can be changed by setting RCP_TOKIO_TRACING_CONSOLE_SERVER_PORT=1234.
Retention time
The trace events are retained for 60s. This can be modified by setting RCP_TOKIO_TRACING_CONSOLE_RETENTION_SECONDS=120.
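Putting the three variables together, a shell session might be prepared as follows. The values 1234 and 120 are the sample values used above; the commented-out commands are illustrative and assume rcp and tokio-console are installed.

```shell
# Enable the console subscriber for any RCP tool started from this shell.
export RCP_TOKIO_TRACING_CONSOLE_ENABLED=1

# Optional overrides (sample values from the text above).
export RCP_TOKIO_TRACING_CONSOLE_SERVER_PORT=1234
export RCP_TOKIO_TRACING_CONSOLE_RETENTION_SECONDS=120

# Then run a tool as usual, for example:
#   rcp <foo> <bar> --progress --summary
# and, from another terminal, point tokio-console at the chosen port
# (assuming it runs on the same machine):
#   tokio-console http://127.0.0.1:1234
```

Exporting the variables keeps them set for every tool launched from the same shell; prefixing a single command with VAR=value limits them to that one run.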