hunkpick 0.3.1

hunkpick

Non-interactive unified-diff hunk picker and splitter — a pure stdin→stdout filter for staging subsets of changes without interactive prompts.

Why / Motivation

The standard non-interactive approach for staging a subset of hunks uses filterdiff from the patchutils suite:

git diff path | filterdiff --hunks=1,3 | git apply --cached

filterdiff works at the granularity of whole hunks as they appear in the diff. If a single hunk contains multiple independent change runs separated by context lines, filterdiff cannot address them individually — the entire hunk is either included or excluded.

hunkpick fills this gap:

  • Auto-split: each hunk is automatically decomposed into minimal sub-hunks, one per contiguous change run. The resulting sub-hunks are addressable individually by a stable 1-based per-file index.
  • Per-file addressing: selectors use path:1,3 syntax, which is unambiguous in multi-file diffs and composable in scripts. A * selects every sub-hunk of a file.
  • Content ids: each sub-hunk also carries a content-derived @<id>. It hashes only the file paths and the sub-hunk's changed (+/-) lines — not its context or the @@ line numbers — so the id stays the same across a re-diff even when an edit elsewhere shifts its line numbers or staging a neighbour rewrites its surrounding context. An agent can capture @<id> once and keep using it across a staging loop. (Byte-identical changes share an id; list --json reports id_count. See Content ids.)
  • Built-in verification: the result diff is checked for internal consistency by default; an optional git apply --check run is available on demand.
  • Git-agnostic: hunkpick reads a diff from stdin and writes to stdout. It does not call git diff itself and works with any diff source (git, Mercurial, SVN, or plain diff -u output). Application to the index is left to the caller via git apply --cached.
  • Encoding-agnostic: the diff is processed as raw bytes end to end. Content in any encoding — including invalid UTF-8 — round-trips byte-for-byte; only the path and preview shown by list are decoded lossily for display.
  • Cross-platform, including Windows: filterdiff/patchutils is a Unix toolchain that is awkward to obtain and run on Windows. hunkpick is a single self-contained binary built for Linux, macOS, and Windows (x86_64-pc-windows-msvc), with no runtime dependencies.
  • AI-agent integration: the first consumer is an automated coding agent. Staging a precise subset of a diff programmatically needs non-interactive operation (no git add -p prompts), a stable machine-readable --json listing, deterministic per-file sub-hunk addressing, and structured exit codes — none of which the interactive git add -p or the whole-hunk-only filterdiff provides.

Installation

From crates.io:

cargo install hunkpick

Prebuilt binary via cargo-binstall (downloads the release artifact from GitHub instead of compiling):

cargo binstall hunkpick

Prebuilt binaries are published for x86_64-unknown-linux-gnu, aarch64-apple-darwin, x86_64-apple-darwin, and x86_64-pc-windows-msvc. On other targets cargo binstall falls back to a source build.

From source:

git clone https://github.com/VitalyOstanin/hunkpick.git
cd hunkpick
cargo build --release
# binary is at target/release/hunkpick

Minimum supported Rust version: 1.85.

Usage

All subcommands read a unified diff from stdin by default and write to stdout. Use -i, --input FILE to read from a file instead (- means stdin). See Input handling for the size limit.

list

Parse the diff, auto-split each hunk into minimal sub-hunks, and list them per file with their 1-based per-file index.

# Human-readable output (default)
git diff src/main.rs | hunkpick list

# Machine-readable JSON
git diff src/main.rs | hunkpick list --json

# Control colorisation
git diff src/main.rs | hunkpick list --color always

Example human output:

src/main.rs
  [1] 114ccaaa7ce6c0f1 @@ -10,4 +10,4 @@  +1 -1  +let x = 1;
  [2] 8002dd73f0dfd2f4 @@ -20,6 +20,6 @@  +1 -1  +fn bar() {

The 16-hex token after the index is the sub-hunk's content id (see Content ids).

JSON schema (--json): array of file objects, each with path, binary, and hunks (array of sub-hunk objects with index, id, id_count, old_start, old_lines, new_start, new_lines, added, deleted, header, preview). id_count is how many sub-hunks across the whole patch share that id (1 = unique). The addition_only marker described under Splitting an addition-only block is also reported per sub-hunk.

Binary files are listed with "binary": true and an empty hunks array.
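
As a rough illustration of consuming the listing, the sketch below picks one selector per sub-hunk: @id when id_count is 1, path:N otherwise. The field names follow the schema above, but the sample document is abbreviated and every value in it is made up; selectors_for is a hypothetical helper, not part of hunkpick.

```python
import json

# Hypothetical, abbreviated `hunkpick list --json` output. Field names follow
# the documented schema; all values are invented for illustration.
SAMPLE = """
[
  {"path": "src/main.rs", "binary": false, "hunks": [
    {"index": 1, "id": "114ccaaa7ce6c0f1", "id_count": 1, "added": 1, "deleted": 1},
    {"index": 2, "id": "8002dd73f0dfd2f4", "id_count": 2, "added": 1, "deleted": 1}
  ]},
  {"path": "logo.png", "binary": true, "hunks": []}
]
"""


def selectors_for(listing):
    """Return one selector per sub-hunk: @id when unique, path:N otherwise."""
    out = []
    for file_entry in listing:
        for hunk in file_entry["hunks"]:
            if hunk["id_count"] == 1:
                out.append("@" + hunk["id"])
            else:
                # A shared id would select several sub-hunks, so fall back
                # to the per-file index.
                out.append(f"{file_entry['path']}:{hunk['index']}")
    return out


print(selectors_for(json.loads(SAMPLE)))
# prints ['@114ccaaa7ce6c0f1', 'src/main.rs:2']
```

The binary entry contributes nothing here because its hunks array is empty.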

select

Emit only the chosen sub-hunks as a valid unified diff.

# Select sub-hunks 1 and 3 from a single-file diff
git diff src/main.rs | hunkpick select 1,3 | git apply --cached

# Select sub-hunks from specific files in a multi-file diff
git diff | hunkpick select src/main.rs:1,3 src/lib.rs:2 | git apply --cached

# Same when the diff is taken over an explicit file list (git diff file1 file2 fileN).
# With more than one file, every selector must carry a path: prefix (a bare index is
# only allowed for a single-file diff).
git diff src/a.rs src/b.rs src/c.rs | hunkpick select src/a.rs:1,3 src/c.rs:2-4 | git apply --cached

# Select a range
git diff path | hunkpick select path:2-4 | git apply --cached

# Select every sub-hunk of a file (or the whole single-file diff)
git diff | hunkpick select 'src/main.rs:*' | git apply --cached
git diff src/main.rs | hunkpick select '*' | git apply --cached

# Select by content id (from `list --json`), stable across re-diffs
git diff | hunkpick select @8002dd73f0dfd2f4 | git apply --cached

# Content ids work across a multi-file diff too: the file path is part of the id, so
# an id addresses the change in its own file (the same edit elsewhere gets another id).
git diff src/a.rs src/b.rs src/c.rs | hunkpick select @8002dd73f0dfd2f4 | git apply --cached

# Several ids at once, mixed with path: selectors. Read the ids from `list --json` first
# (the machine-readable form, intended for tooling):
git diff | hunkpick list --json
git diff | hunkpick select @8002dd73f0dfd2f4 @bf7bdaaf30c1e2d4 src/lib.rs:2 | git apply --cached

A binary file referenced by any selector index is emitted whole.

split

Split one original hunk (addressed by its 1-based index over the file's original hunks, before auto-splitting) at specified new-file line numbers. The line numbers must fall on context lines. The result is the complete patch with that hunk replaced by the pieces.

# Split original hunk 1 in a single-file diff at new-file line 5
git diff src/lib.rs | hunkpick split 1 --at 5

# Same for a named file in a multi-file diff
git diff | hunkpick split src/lib.rs:1 --at 5,12

# With git verification
git diff src/lib.rs | hunkpick split 1 --at 5 --verify-result-diff-git -C /path/to/repo

Staging recipe

# 1. Inspect what sub-hunks are available
git diff path/to/file.rs | hunkpick list --json

# 2. Stage only sub-hunks 1 and 3
git diff path/to/file.rs | hunkpick select 1,3 | git apply --cached

To split one file's mixed changes into several semantic commits, address sub-hunks by content id. Bare indices renumber after each staging, but an @<id> stays valid across the re-diff (see Content ids), so the listing is captured once and never re-read:

# 1. Capture the ids once. `id_count` flags any id that selects more than one.
git diff src/indicator.js | hunkpick list --json

# 2. Stage and commit each group by @id (one or more ids each), re-running git
#    diff each round. The ids from step 1 remain valid even though staging
#    renumbers the bare indices.
git diff src/indicator.js | hunkpick select @bf7bdaaf30c1e2d4 | git apply --cached
git commit -m "fix: ..."

git diff src/indicator.js | hunkpick select @058b36528575a870 @399e1cd421e268cc | git apply --cached
git commit -m "feat: ..."

# 3. Whatever is left is the last group; `*` takes the remaining sub-hunks.
git diff src/indicator.js | hunkpick select '*' | git apply --cached
git commit -m "chore: ..."
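
For an automated caller, one round of the recipe above could be wrapped roughly as follows. This is a sketch under assumptions: stage is a hypothetical helper, git and hunkpick are assumed to be on PATH, and the run parameter exists only so the pipeline can be exercised without either tool.

```python
import subprocess


def stage(path, selectors, run=subprocess.run):
    """Run `git diff -- PATH | hunkpick select ... | git apply --cached`.

    Returns the selected patch bytes. check=True turns any non-zero exit
    (for example hunkpick's 2 or 70) into CalledProcessError.
    """
    diff = run(["git", "diff", "--", path], capture_output=True, check=True)
    picked = run(
        ["hunkpick", "select", *selectors],
        input=diff.stdout,
        capture_output=True,
        check=True,
    )
    if picked.stdout:  # nothing selected means nothing to apply
        run(["git", "apply", "--cached"], input=picked.stdout, check=True)
    return picked.stdout


# Example round, with ids captured once from `hunkpick list --json`:
#   stage("src/indicator.js", ["@bf7bdaaf30c1e2d4"])
#   subprocess.run(["git", "commit", "-m", "fix: ..."], check=True)
```

Re-running git diff inside each call mirrors the recipe: the diff is taken fresh every round, while the ids passed in stay the same.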

Selectors

Selectors are passed as positional arguments to select. An index selector addresses sub-hunks within one file by their 1-based per-file index as reported by list; an @<id> selector addresses sub-hunks by content id.

| Form | Meaning |
|------|---------|
| 1,3 | Sub-hunks 1 and 3 (bare list, only for single-file diffs) |
| 2-4 | Sub-hunks 2, 3, and 4 (bare range, single-file only) |
| * | Every sub-hunk (bare *, single-file only) |
| src/foo.rs:1,3 | Sub-hunks 1 and 3 within src/foo.rs |
| src/foo.rs:2-4 | Sub-hunks 2 through 4 within src/foo.rs |
| src/foo.rs:* | Every sub-hunk of src/foo.rs |
| @<id> | Every sub-hunk whose content id equals <id> |

Multiple selectors can be combined: hunkpick select src/a.rs:1 src/b.rs:2,3.

Path matching checks both the old and new path of a file diff entry. A bare index list or * (no path: prefix) is accepted only when the diff contains exactly one file; otherwise hunkpick exits with code 2.

Selectors are matched in order of precedence: a path:set form is recognised first (so a file literally named @foo is still addressable as @foo:1), then @id, then a bare set.
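
The precedence rule can be pictured with a small classifier. This is only a sketch: splitting path:set at the last colon and the exact index-set grammar are assumptions, the INDEX@RANGE form is not handled, and classify is a hypothetical name rather than hunkpick's parser.

```python
import re

# One index set: "*" or comma-separated indices/ranges such as "1,3" or "2-4".
SET_RE = re.compile(r"\*|\d+(-\d+)?(,\d+(-\d+)?)*")


def classify(selector):
    """Return ("path", path, set), ("id", id) or ("bare", set)."""
    # 1. path:set is tried first, so "@foo:1" is a path selector.
    path, sep, tail = selector.rpartition(":")
    if sep and path and SET_RE.fullmatch(tail):
        return ("path", path, tail)
    # 2. Then @id; ids are matched case-insensitively, so normalise here.
    if selector.startswith("@") and len(selector) > 1:
        return ("id", selector[1:].lower())
    # 3. Finally a bare set, valid only for single-file diffs.
    if SET_RE.fullmatch(selector):
        return ("bare", selector)
    raise ValueError(f"bad selector: {selector!r}")


print(classify("@foo:1"))             # ('path', '@foo', '1')
print(classify("@8002DD73F0DFD2F4"))  # ('id', '8002dd73f0dfd2f4')
print(classify("2-4"))                # ('bare', '2-4')
```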

Content ids

list reports a 16-hex content id for every sub-hunk, also accepted by select as @<id>. The id is a hash of the file paths and the sub-hunk's changed (+/-) lines only, not its context lines, the @@ line numbers, or the section header. Ids are matched case-insensitively. Because the file path is part of the hash, ids work across a multi-file diff: an @<id> addresses the change in its own file, and the same edit applied to a different file gets a different id.

Because only the changed lines feed the id, it is stable across a re-diff in every common case of an iterative diff → stage → re-diff loop:

  • An unrelated edit elsewhere that only shifts this change's line numbers leaves its id unchanged.
  • Staging a neighbouring sub-hunk — which rewrites this change's surrounding context, or causes the enclosing hunk to be re-split — also leaves its id unchanged, because the context is not part of the id.

So positional indices renumber as you stage changes, but a change's @<id> does not: capture it once from list and keep using it across the loop without re-reading the listing. The id changes only when the change's own +/- lines change.
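
To make the stability argument concrete, here is a minimal sketch of a context-free id. The hash function and byte layout are assumptions (SHA-256 truncated to 16 hex digits), so the values will not match hunkpick's real ids; only the property matters: context and line numbers never reach the hash.

```python
import hashlib


def content_id(old_path, new_path, body_lines):
    """Hash the file paths plus only the +/- lines of a sub-hunk body.

    Assumption: SHA-256 truncated to 16 hex digits stands in for whatever
    hash hunkpick actually uses.
    """
    h = hashlib.sha256()
    for part in (old_path, new_path):
        h.update(part + b"\0")
    for line in body_lines:
        if line[:1] in (b"+", b"-"):  # context lines (b" ...") are skipped
            h.update(line + b"\n")
    return h.hexdigest()[:16]


before = [b" ctx one", b"-let x = 0;", b"+let x = 1;", b" ctx two"]
# After staging a neighbour the context differs, but the change does not.
after = [b" other ctx", b"-let x = 0;", b"+let x = 1;"]

same = content_id(b"src/a.rs", b"src/a.rs", before) == content_id(
    b"src/a.rs", b"src/a.rs", after
)
print(same)  # True
```

The same body under a different path hashes to a different id, which matches the per-file behaviour described above.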

Because context is excluded, two changes with identical +/- lines share an id even if their surrounding context differs; @<id> then selects all of them. list --json reports id_count (how many sub-hunks share the id), so a consumer can tell up front whether @<id> is unique (id_count == 1) or would select several; to address just one of several identical changes, use path:N. If an id is ever shared by sub-hunks whose changed lines actually differ (an accidental hash collision), select reports it and exits with code 2 — address those by path:N.

For the split subcommand the hunk address uses the same path:N / N form, but N refers to the 1-based index over the file's original hunks (not auto-split sub-hunks). split does not accept * or @id.

Splitting an addition-only block: INDEX@RANGE

A sub-hunk that is all additions — a block of new functions appended to a file, or a file-creation diff (@@ -0,0 +1,N @@) — is one atomic sub-hunk: auto-split has no context line inside it to cut at. To stage part of such a block, address it with a per-line range:

[path:]INDEX@RANGE

INDEX is the 1-based sub-hunk index from list. Only a numeric index may precede @ — content ids (@id) and * are not accepted here. RANGE numbers the sub-hunk's added (+) lines, 1-based:

| Form | Meaning |
|------|---------|
| lo-hi | added lines lo through hi |
| lo- | from lo to the last added line |
| -hi | from the first added line to hi |
| N | a single added line (N-N) |

The cut is allowed only between two added lines; cutting where the boundary is a context or deletion line is an error. list marks freely-splittable sub-hunks (addition_only in --json, [+range] in the human listing).
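
The four RANGE forms resolve to an inclusive pair of added-line numbers. A sketch, assuming out-of-bounds or empty ranges are rejected (the text above does not say how hunkpick treats them); parse_range is a hypothetical helper.

```python
def parse_range(text, added):
    """Resolve RANGE to an inclusive (lo, hi) pair over `added` + lines."""
    if not text.strip("-"):
        raise ValueError("range needs at least one bound")
    lo_s, sep, hi_s = text.partition("-")
    if not sep:  # "N" is shorthand for "N-N"
        lo_s = hi_s = text
    lo = int(lo_s) if lo_s else 1      # "-hi": start at the first added line
    hi = int(hi_s) if hi_s else added  # "lo-": run to the last added line
    if not 1 <= lo <= hi <= added:
        raise ValueError(f"range {text!r} outside 1..{added}")
    return lo, hi


# For a sub-hunk with 120 added lines:
print(parse_range("1-90", 120))  # (1, 90)
print(parse_range("91-", 120))   # (91, 120)
print(parse_range("-10", 120))   # (1, 10)
print(parse_range("7", 120))     # (7, 7)
```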

Example — split a new file across two commits:

git diff src/lib.rs | hunkpick list                       # the block shows +N and the [+range] marker
git diff src/lib.rs | hunkpick select 1@1-90 | git apply --cached && git commit -m 'feat: part one'
git diff src/lib.rs | hunkpick select 1@91-  | git apply --cached && git commit -m 'feat: part two'

Verification

Internal consistency check (default)

After select or split, hunkpick verifies the result diff for internal consistency: @@ header counts match the body line counts, hunks within each file are ordered, and their old-file ranges do not overlap. This check runs by default and requires no git repository.
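
The first of those checks, header counts against body lines, can be sketched as below. It is an illustration rather than hunkpick's implementation: it works on decoded strings, check_hunk is a hypothetical name, and the ordering and non-overlap checks are left out.

```python
import re

HEADER = re.compile(r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def check_hunk(header, body):
    """True if the @@ counts match the number of old/new lines in `body`."""
    m = HEADER.match(header)
    if m is None:
        return False
    old_want = int(m.group(2) or 1)  # an omitted count means 1
    new_want = int(m.group(4) or 1)
    # Context lines count on both sides; "\ No newline ..." lines on neither.
    old_have = sum(1 for line in body if line[:1] in (" ", "-"))
    new_have = sum(1 for line in body if line[:1] in (" ", "+"))
    return (old_have, new_have) == (old_want, new_want)


body = [" a", " b", "-x", "+y", " c"]
print(check_hunk("@@ -10,4 +10,4 @@", body))  # True
print(check_hunk("@@ -10,4 +10,5 @@", body))  # False
```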

To disable it:

git diff path | hunkpick select 1 --no-verify-result-diff-internal

Git apply check (optional)

Pass --verify-result-diff-git to additionally run git apply --check on the result diff before emitting it. This confirms the diff applies cleanly to the working tree.

git diff path | hunkpick select 1 --verify-result-diff-git

Use -C <DIR> to specify the working tree directory (default: current directory). -C requires --verify-result-diff-git; passing -C alone is a usage error.

git diff path | hunkpick select 1 --verify-result-diff-git -C /path/to/repo

Verification failure

On any verification failure, hunkpick writes a diagnostic to stderr, writes nothing to stdout, and exits with code 70.

Input handling

Source

By default the diff is read from stdin. -i, --input FILE reads from a file instead; -i - is an explicit stdin. The flag is available on every subcommand and may appear after it:

hunkpick list --input changes.diff
hunkpick select 1,3 -i changes.diff | git apply --cached
git diff | hunkpick select 1,3            # stdin (default)

Size limit

Input (from stdin or a file) is capped at 64 MiB by default to guard against an accidentally unbounded stream. Exceeding the limit is a usage error (exit code 2). Override with --max-input-bytes N; 0 disables the limit.

hunkpick list --max-input-bytes 268435456 -i huge.diff   # raise to 256 MiB
hunkpick list --max-input-bytes 0 -i huge.diff           # no limit

Note: the working-set memory is several times the input size (the input buffer, the parsed model, and the emitted diff coexist), so a 64 MiB input corresponds to a few hundred MiB of peak RAM. Lower the limit if you run in a memory-constrained environment.

Validation

hunkpick reads the input as raw bytes and validates it before parsing:

  • Empty or whitespace-only input is a no-op: nothing is written and the exit code is 0, for every subcommand.
  • Binary input (any NUL byte) is rejected with a diagnostic and exit code 2.
  • Text with no diff marker (no line starting with diff --git , --- , +++ , @@ , or Binary files ) is rejected with exit code 2.

Valid diff content is never decoded as UTF-8 internally, so lines in any byte encoding (or with invalid UTF-8) pass through unchanged.
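
The three validation rules can be summarised in a few lines. This sketch assumes the checks run in the order listed and that "whitespace" means ASCII whitespace; validate is a hypothetical name.

```python
MARKERS = (b"diff --git ", b"--- ", b"+++ ", b"@@ ", b"Binary files ")


def validate(data):
    """Classify raw input bytes as "empty", "binary", "not-a-diff" or "diff"."""
    if not data.strip():
        return "empty"       # no-op, exit code 0
    if b"\0" in data:
        return "binary"      # rejected, exit code 2
    if not any(line.startswith(MARKERS) for line in data.splitlines()):
        return "not-a-diff"  # rejected, exit code 2
    return "diff"


print(validate(b" \n\t"))                              # empty
print(validate(b"hello\nworld\n"))                     # not-a-diff
print(validate(b"--- a/x\n+++ b/x\n@@ -1 +1 @@\n-a\n+b\n"))  # diff
```

The input stays as bytes throughout, in line with the raw-byte handling described above.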

Auto-split and non-overlap

hunkpick decomposes each hunk into sub-hunks automatically at boundaries between adjacent change runs. A "change run" is a maximal contiguous sequence of +/- lines. Context lines between change runs become the split boundary.

Non-overlap guarantee: sub-hunk old-file ranges are strictly non-overlapping. The boundary context (lines between two change runs) becomes the trailing context of the earlier sub-hunk. The later sub-hunk starts directly at its change run, with no leading copy of the boundary context.

This differs from git add -p, which can share context between adjacent hunks because it applies each hunk individually. hunkpick select emits all selected sub-hunks as a single combined patch applied in one git apply call; overlapping old-file ranges would cause git apply to reject the patch.

Round-trip property: selecting all sub-hunks for a file produces a diff that applies equivalently to the original hunk. The output is not byte-identical to the original (one hunk becomes several), but the applied result is the same.
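
The splitting rule can be sketched as follows. It is a simplified model, not hunkpick's code: lines are decoded strings, "\ No newline at end of file" markers are not treated specially, and the start-minus-one header convention for a side with zero lines is left out, so it suits hunks where every sub-hunk keeps at least one line on each side.

```python
def auto_split(old_start, new_start, body):
    """Split one hunk body into sub-hunks, one per contiguous +/- run.

    Returns (old_start, old_lines, new_start, new_lines, lines) tuples.
    """
    groups, seen_change = [[]], False
    for prev, line in zip([None] + body, body):
        is_change = line[0] in "+-"
        starts_run = is_change and (prev is None or prev[0] == " ")
        if starts_run and seen_change:
            groups.append([])    # later sub-hunk starts at its change run
        seen_change = seen_change or is_change
        groups[-1].append(line)  # boundary context trails the earlier one
    out = []
    for lines in groups:
        old_n = sum(1 for l in lines if l[0] in " -")
        new_n = sum(1 for l in lines if l[0] in " +")
        out.append((old_start, old_n, new_start, new_n, lines))
        old_start, new_start = old_start + old_n, new_start + new_n
    return out


hunk = [" a", "-b", "+B", " c", " d", "-e", "+E", " f"]  # @@ -10,6 +10,6 @@
for sub in auto_split(10, 10, hunk):
    print(sub[:4])
# prints (10, 4, 10, 4) then (14, 2, 14, 2)
```

The old-file ranges 10..13 and 14..15 do not overlap, and together they cover the original range, which is the round-trip property in miniature.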

Exit codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 2 | Usage error: bad flag, bad selector, parse error, binary/non-diff input, input over size limit |
| 70 | Verification failure (internal consistency or git apply --check) |
| 74 | I/O error (reading stdin or writing stdout) |
| 130 | Interrupted (SIGINT or SIGTERM, default signal disposition) |
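
A caller might map these codes to messages along the following lines. The names are hypothetical. One detail is Python behaviour rather than hunkpick's: on POSIX, subprocess reports a child killed by signal N as returncode -N, whereas a shell reports 128+N, so the sketch handles negative values separately.

```python
EXIT_MEANING = {
    0: "success",
    2: "usage error",
    70: "verification failure",
    74: "I/O error",
    130: "interrupted",
}


def describe(returncode):
    """Translate a hunkpick exit status into a short message."""
    if returncode < 0:
        # Python's subprocess convention for a signal-terminated child.
        return f"terminated by signal {-returncode}"
    return EXIT_MEANING.get(returncode, f"unexpected exit code {returncode}")


print(describe(70))  # verification failure
print(describe(-2))  # terminated by signal 2
```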

Comparison to filterdiff

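| Capability | filterdiff | hunkpick |
|------------|------------|----------|
| Binary file pass-through | yes | yes |
| Select whole hunks from a diff | yes | yes |
| Works with any diff source (not git-specific) | yes | yes |
| Address sub-hunks by per-file index | no | yes |
| Auto-split hunks at change-run boundaries | no | yes |
| Built-in result verification | no | yes |
| Explicit hunk split at a named line | no | yes |
| Machine-readable listing (JSON) | no | yes |
| Split an addition-only block by line range | no | yes |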

Development

Contributions are welcome. The crate has no build-time code generation and no external runtime dependencies, so the standard cargo workflow applies.

# Run the full test suite (unit + integration + doc tests).
cargo test --all-features

# Lint with all warnings denied (the CI gate).
cargo clippy --all-targets --all-features -- -D warnings

# Check formatting (CI verifies this; use `cargo fmt --all` to apply).
cargo fmt --all --check

# Verify the code still builds on the minimum supported Rust version (1.85).
cargo +1.85 build --all-features

The CI workflow (.github/workflows/ci.yml) runs the same checks, using cargo-nextest for the unit/integration tests and cargo test --doc for doc tests. Test runner limits (per-test timeout and thread count) live in .config/nextest.toml. Please keep tests fast and hermetic; note that several tests shell out to git apply --check and therefore require git on PATH.

License

MIT. See LICENSE.