hunkpick
Non-interactive unified-diff hunk picker and splitter — a pure stdin→stdout filter for staging subsets of changes without interactive prompts.
Table of Contents
- Why / Motivation
- Installation
- Usage
- Selectors
- Verification
- Input handling
- Auto-split and non-overlap
- Exit codes
- Comparison to filterdiff
- Development
- License
Why / Motivation
The standard non-interactive approach for staging a subset of hunks uses
filterdiff from the
patchutils suite:
| |
filterdiff works at the granularity of whole hunks as they appear in the diff.
If a single hunk contains multiple independent change runs separated by context
lines, filterdiff cannot address them individually — the entire hunk is either
included or excluded.
hunkpick fills this gap:
- Auto-split: each hunk is automatically decomposed into minimal sub-hunks, one per contiguous change run. The resulting sub-hunks are individually addressable by a stable 1-based per-file index.
- Per-file addressing: selectors use path:1,3 syntax, which is unambiguous in multi-file diffs and composable in scripts.
- Built-in verification: the result diff is checked for internal consistency by default; an optional git apply --check run is available on demand.
- Git-agnostic: hunkpick reads a diff from stdin and writes to stdout. It does not call git diff itself and works with any diff source (git, Mercurial, SVN, or plain diff -u output). Application to the index is left to the caller via git apply --cached.
- Encoding-agnostic: the diff is processed as raw bytes end to end. Content in any encoding, including invalid UTF-8, round-trips byte-for-byte; only the path and preview shown by list are decoded lossily for display.
- Cross-platform, including Windows: filterdiff/patchutils is a Unix toolchain that is awkward to obtain and run on Windows. hunkpick is a single self-contained binary built for Linux, macOS, and Windows (x86_64-pc-windows-msvc), with no runtime dependencies.
- AI-agent integration: the first consumer is an automated coding agent. Staging a precise subset of a diff programmatically needs non-interactive operation (no git add -p prompts), a stable machine-readable --json listing, deterministic per-file sub-hunk addressing, and structured exit codes, none of which the interactive git add -p or the whole-hunk-only filterdiff provides.
Installation
From crates.io:
Prebuilt binary via cargo-binstall (downloads the release artifact from GitHub instead of compiling):
Prebuilt binaries are published for x86_64-unknown-linux-gnu, aarch64-apple-darwin, x86_64-apple-darwin, and x86_64-pc-windows-msvc. On other targets cargo binstall falls back to a source build.
From source:
# binary is at target/release/hunkpick
Minimum supported Rust version: 1.85.
Usage
All subcommands read a unified diff from stdin by default and write to stdout.
Use -i, --input FILE to read from a file instead (- means stdin). See
Input handling for the size limit.
list
Parse the diff, auto-split each hunk into minimal sub-hunks, and list them per file with their 1-based per-file index.
# Human-readable output (default)
git diff | hunkpick list
# Machine-readable JSON
git diff | hunkpick list --json
# Control colorisation
|
Example human output:
src/main.rs
[1] @@ -10,4 +10,4 @@ +1 -1 +let x = 1;
[2] @@ -20,6 +20,6 @@ +1 -1 +fn bar() {
JSON schema (--json): array of file objects, each with path, binary, and
hunks (array of sub-hunk objects with index, old_start, old_lines,
new_start, new_lines, added, deleted, header, preview).
Binary files are listed with "binary": true and an empty hunks array.
select
Emit only the chosen sub-hunks as a valid unified diff.
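# Select sub-hunks 1 and 3 from a single-file diff
git diff | hunkpick select 1,3 | git apply --cached
# Select sub-hunks from specific files in a multi-file diff
git diff | hunkpick select src/a.rs:1 src/b.rs:2,3 | git apply --cached
# Select a range
git diff | hunkpick select 2-4 | git apply --cached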
A binary file referenced by any selector index is emitted whole.
split
Split one original hunk (addressed by its 1-based index over the file's original hunks, before auto-splitting) at specified new-file line numbers. The line numbers must fall on context lines. The result is the complete patch with that hunk replaced by the pieces.
# Split original hunk 1 in a single-file diff at new-file line 5
|
# Same for a named file in a multi-file diff
|
# With git verification
|
Staging recipe
# 1. Inspect what sub-hunks are available
git diff | hunkpick list
# 2. Stage only sub-hunks 1 and 3
git diff | hunkpick select 1,3 | git apply --cached
Selectors
Selectors are passed as positional arguments to select. Each selector addresses
sub-hunks within one file by their 1-based per-file index as reported by list.
| Form | Meaning |
|---|---|
| 1,3 | Sub-hunks 1 and 3 (bare list, single-file diffs only) |
| 2-4 | Sub-hunks 2, 3, and 4 (bare range, single-file diffs only) |
| src/foo.rs:1,3 | Sub-hunks 1 and 3 within src/foo.rs |
| src/foo.rs:2-4 | Sub-hunks 2 through 4 within src/foo.rs |
Multiple selectors can be combined: hunkpick select src/a.rs:1 src/b.rs:2,3.
Path matching checks both the old and new path of a file diff entry. A bare index
list (no path: prefix) is accepted only when the diff contains exactly one file;
otherwise hunkpick exits with code 2.
For the split subcommand the hunk address uses the same path:N / N form, but
N refers to the 1-based index over the file's original hunks (not auto-split
sub-hunks).
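The selector grammar above is small enough to sketch in a few lines. The following is an illustrative parser only, not hunkpick's actual implementation: the names Selector, parse_selector, and parse_index are hypothetical, and details the text does not specify (splitting on the last colon, rejecting descending ranges, keeping duplicate indices) are assumptions.

```rust
/// A parsed selector: an optional path plus 1-based sub-hunk indices.
/// Hypothetical type for illustration; not taken from hunkpick's source.
#[derive(Debug, PartialEq)]
pub struct Selector {
    pub path: Option<String>,
    pub indices: Vec<usize>,
}

/// Parse `1,3`, `2-4`, `src/foo.rs:1,3`, or `src/foo.rs:2-4`.
pub fn parse_selector(s: &str) -> Result<Selector, String> {
    // Assumption: split on the LAST ':' so the index list is always the
    // final component, even if the path itself contains a colon.
    let (path, list) = match s.rsplit_once(':') {
        Some((p, l)) if !p.is_empty() => (Some(p.to_string()), l),
        Some(_) => return Err(format!("empty path in selector '{s}'")),
        None => (None, s),
    };
    let mut indices = Vec::new();
    for part in list.split(',') {
        // Each comma-separated part is either a single index or `lo-hi`.
        let (lo, hi) = match part.split_once('-') {
            Some((a, b)) => (parse_index(a)?, parse_index(b)?),
            None => {
                let n = parse_index(part)?;
                (n, n)
            }
        };
        if lo > hi {
            // Assumption: descending ranges are treated as errors.
            return Err(format!("descending range '{part}'"));
        }
        indices.extend(lo..=hi);
    }
    Ok(Selector { path, indices })
}

/// Indices are 1-based, so 0 and non-numeric input are both rejected.
fn parse_index(s: &str) -> Result<usize, String> {
    match s.trim().parse::<usize>() {
        Ok(0) | Err(_) => Err(format!("invalid 1-based index '{s}'")),
        Ok(n) => Ok(n),
    }
}
```

With this sketch, parse_selector("src/foo.rs:2-4") yields the path src/foo.rs with indices 2, 3, and 4, while a bare 1,3 yields no path. Rejecting a bare selector when the diff contains more than one file would be a separate step, because it needs the parsed diff.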
Verification
Internal consistency check (default)
After select or split, hunkpick verifies the result diff for internal
consistency: @@ header counts match the body line counts, hunks within each file
are ordered, and their old-file ranges do not overlap. This check runs by default and
requires no git repository.
To disable it:
|
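For readers who want to see what the internal consistency check amounts to, here is a minimal sketch of the three conditions named above (header counts match the body, hunks are ordered, old-file ranges do not overlap). The Hunk type and check_file function are hypothetical names for illustration; the real check may cover more cases, and this sketch does not use new_start at all.

```rust
/// One hunk of a result diff (hypothetical model, not hunkpick's own types).
pub struct Hunk<'a> {
    pub old_start: usize,
    pub old_lines: usize,
    pub new_start: usize,
    pub new_lines: usize,
    /// Body lines, each starting with ' ', '+', '-', or '\'.
    pub body: Vec<&'a str>,
}

/// Check the hunks of a single file, in the order they appear in the diff.
pub fn check_file(hunks: &[Hunk<'_>]) -> Result<(), String> {
    // First old-file line not yet covered by an earlier hunk.
    let mut prev_old_end = 0usize;
    for (i, h) in hunks.iter().enumerate() {
        let (mut old, mut new) = (0usize, 0usize);
        for line in &h.body {
            match line.as_bytes().first().copied() {
                // "\ No newline at end of file" counts toward neither side.
                Some(b'\\') => {}
                Some(b'+') => new += 1,
                Some(b'-') => old += 1,
                // Context lines (including empty ones) count toward both.
                _ => {
                    old += 1;
                    new += 1;
                }
            }
        }
        if old != h.old_lines || new != h.new_lines {
            return Err(format!("hunk {}: header counts do not match body", i + 1));
        }
        // Ordered and non-overlapping: each hunk must start at or after
        // the end of the previous hunk's old-file range.
        if h.old_start < prev_old_end {
            return Err(format!("hunk {}: out of order or overlapping", i + 1));
        }
        prev_old_end = h.old_start + h.old_lines;
    }
    Ok(())
}
```

A check like this needs only the diff text, which is why it can run without a git repository.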
Git apply check (optional)
Pass --verify-result-diff-git to additionally run git apply --check on the result
diff before emitting it. This confirms the diff applies cleanly to the working tree.
|
Use -C <DIR> to specify the working tree directory (default: current directory).
-C requires --verify-result-diff-git; passing -C alone is a usage error.
|
Verification failure
On any verification failure, hunkpick writes a diagnostic to stderr, writes
nothing to stdout, and exits with code 70.
Input handling
Source
By default the diff is read from stdin. -i, --input FILE reads from a file instead;
-i - is an explicit stdin. The flag is available on every subcommand and may appear
after it:
|
|
Size limit
Input (from stdin or a file) is capped at 64 MiB by default to guard against an
accidentally unbounded stream. Exceeding the limit is a usage error (exit code 2).
Override with --max-input-bytes N; 0 disables the limit.
Note: the working-set memory is several times the input size (the input buffer, the parsed model, and the emitted diff coexist), so a 64 MiB input corresponds to a few hundred MiB of peak RAM. Lower the limit if you run in a memory-constrained environment.
Validation
hunkpick reads the input as raw bytes and validates it before parsing:
- Empty or whitespace-only input is a no-op: nothing is written and the exit code is 0, for every subcommand.
- Binary input (any NUL byte) is rejected with a diagnostic and exit code 2.
- Text with no diff marker (no line starting with diff --git, ---, +++, @@, or Binary files) is rejected with exit code 2.
Valid diff content is never decoded as UTF-8 internally, so lines in any byte encoding (or with invalid UTF-8) pass through unchanged.
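The validation rules above can be expressed directly on raw bytes, with no UTF-8 decoding. The sketch below is an illustration under assumptions, not the tool's actual code: InputKind and classify are hypothetical names, "whitespace" is taken to mean ASCII whitespace, and the order of the checks is a guess.

```rust
/// Result of pre-parse validation (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub enum InputKind {
    /// Empty or whitespace-only: a no-op with exit code 0.
    Empty,
    /// Contains a NUL byte: rejected with exit code 2.
    Binary,
    /// Text without any diff marker: rejected with exit code 2.
    NotADiff,
    /// Looks like a diff and can be handed to the parser.
    Diff,
}

/// Classify raw input bytes without ever decoding them as UTF-8.
pub fn classify(input: &[u8]) -> InputKind {
    // Assumption: "whitespace" means ASCII whitespace.
    if input.iter().all(|b| b.is_ascii_whitespace()) {
        return InputKind::Empty;
    }
    if input.contains(&0) {
        return InputKind::Binary;
    }
    // Line prefixes that mark the input as a diff, as listed above.
    const MARKERS: [&[u8]; 5] = [b"diff --git", b"---", b"+++", b"@@", b"Binary files"];
    let has_marker = input
        .split(|&b| b == b'\n')
        .any(|line| MARKERS.iter().any(|m| line.starts_with(*m)));
    if has_marker {
        InputKind::Diff
    } else {
        InputKind::NotADiff
    }
}
```

Because the check works on byte slices, input containing invalid UTF-8 is still classified as a diff as long as a marker line is present.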
Auto-split and non-overlap
hunkpick decomposes each hunk into sub-hunks automatically at boundaries between
adjacent change runs. A "change run" is a maximal contiguous sequence of +/-
lines. Context lines between change runs become the split boundary.
Non-overlap guarantee: sub-hunk old-file ranges are strictly non-overlapping. The boundary context (lines between two change runs) becomes the trailing context of the earlier sub-hunk. The later sub-hunk starts directly at its change run, with no leading copy of the boundary context.
This differs from git add -p, which can share context between adjacent hunks
because it applies each hunk individually. hunkpick select emits all selected
sub-hunks as a single combined patch applied in one git apply call; overlapping
old-file ranges would cause git apply to reject the patch.
Round-trip property: selecting all sub-hunks for a file produces a diff that applies equivalently to the original hunk. The output is not byte-identical to the original (one hunk becomes several), but the applied result is the same.
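To make the splitting rule concrete, here is a minimal sketch of auto-splitting one hunk body under the rules described above: boundary context stays with the earlier sub-hunk, and the later sub-hunk starts directly at its change run. SubHunk and auto_split are hypothetical names, not hunkpick's API. The sketch assumes every body line begins with ' ', '+', or '-', ignores "\ No newline at end of file" markers, and does not apply the unified-diff convention for ranges of length zero.

```rust
/// A sub-hunk produced by splitting (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub struct SubHunk {
    pub old_start: usize,
    pub old_lines: usize,
    pub new_start: usize,
    pub new_lines: usize,
    pub body: Vec<String>,
}

/// Split one hunk body into sub-hunks, one per contiguous change run.
pub fn auto_split(old_start: usize, new_start: usize, body: &[&str]) -> Vec<SubHunk> {
    let mut out = Vec::new();
    let mut cur = SubHunk { old_start, old_lines: 0, new_start, new_lines: 0, body: Vec::new() };
    // `seen_change`: a change line has been seen at some point.
    // `trailing`: context has followed the current sub-hunk's change run.
    let (mut seen_change, mut trailing) = (false, false);
    for &line in body {
        let is_change = line.starts_with('+') || line.starts_with('-');
        if is_change && seen_change && trailing {
            // A new change run begins after boundary context: close the
            // current sub-hunk (which keeps that context as its trailing
            // context) and start the next one exactly where it ends.
            let next = SubHunk {
                old_start: cur.old_start + cur.old_lines,
                old_lines: 0,
                new_start: cur.new_start + cur.new_lines,
                new_lines: 0,
                body: Vec::new(),
            };
            out.push(std::mem::replace(&mut cur, next));
            trailing = false;
        }
        if is_change {
            seen_change = true;
        } else if seen_change {
            trailing = true;
        }
        // '+' lines exist only in the new file, '-' lines only in the old.
        if !line.starts_with('+') {
            cur.old_lines += 1;
        }
        if !line.starts_with('-') {
            cur.new_lines += 1;
        }
        cur.body.push(line.to_string());
    }
    if !cur.body.is_empty() {
        out.push(cur);
    }
    out
}
```

For a hunk starting at line 10 on both sides with the body " a", "-b", "+B", " c", " d", "+e", " f", the sketch produces two sub-hunks. The first covers old lines 10 to 13 and keeps " c" and " d" as trailing context; the second starts at old line 14 with "+e". The old-file ranges therefore never overlap. Adjusting new_start when only some sub-hunks are selected is a separate concern that this sketch does not cover.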
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 2 | Usage error: bad flag, bad selector, parse error, binary/non-diff input, input over size limit |
| 70 | Verification failure (internal consistency or git apply --check) |
| 74 | I/O error (reading stdin or writing stdout) |
| 130 | Interrupted (SIGINT or SIGTERM, default signal disposition) |
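A caller that drives hunkpick programmatically may want to branch on these codes. The mapping below is a small sketch of how that could look; Outcome and outcome_from_code are hypothetical names, and only the numeric codes come from the table above.

```rust
/// Caller-side view of a hunkpick exit code (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Success,
    /// Bad flag or selector, parse error, rejected or oversized input.
    Usage,
    /// Internal consistency check or git apply --check failed.
    VerificationFailed,
    Io,
    Interrupted,
    /// Any code not listed in the table.
    Other(i32),
}

/// Map a numeric exit code onto the documented meanings.
pub fn outcome_from_code(code: i32) -> Outcome {
    match code {
        0 => Outcome::Success,
        2 => Outcome::Usage,
        70 => Outcome::VerificationFailed,
        74 => Outcome::Io,
        130 => Outcome::Interrupted,
        other => Outcome::Other(other),
    }
}
```

In a Rust caller, the code would typically come from std::process::ExitStatus::code(), which returns an Option<i32> and is None on Unix when the child was terminated by a signal.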
Comparison to filterdiff
| Capability | filterdiff | hunkpick |
|---|---|---|
| Select whole hunks from a diff | yes | yes |
| Auto-split hunks at change-run boundaries | no | yes |
| Address sub-hunks by per-file index | no | yes |
| Explicit hunk split at a named line | no | yes |
| Machine-readable listing (JSON) | no | yes |
| Works with any diff source (not git-specific) | yes | yes |
| Built-in result verification | no | yes |
| Binary file pass-through | yes | yes |
Development
Contributions are welcome. The crate has no build-time code generation and no external runtime dependencies, so the standard cargo workflow applies.
# Run the full test suite (unit + integration + doc tests).
# Lint with all warnings denied (the CI gate).
# Check formatting (CI verifies this; use `cargo fmt --all` to apply).
# Verify the code still builds on the minimum supported Rust version (1.85).
The CI workflow (.github/workflows/ci.yml) runs the same
checks, using cargo-nextest for the unit/integration tests and
cargo test --doc for doc tests. Test runner limits (per-test timeout and thread count)
live in .config/nextest.toml; please keep tests fast and
hermetic — several tests shell out to git apply --check and require git on PATH.
License
MIT. See LICENSE.