worktrunk 0.37.1

A CLI for Git worktree management, designed for parallel AI agent workflows
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
# Testing Guidelines

## Running `wt` Commands in Tests

**Use the correct helper to ensure test isolation.** Tests that spawn `wt` must
be isolated from the host environment to prevent:

- **Directive leakage**: Test commands writing to the user's shell directive file
- **Config pollution**: Tests reading/writing the user's real config
- **Git interference**: Host GIT_* environment variables affecting test behavior

### With a TestRepo fixture (most tests)

Use `repo.wt_command()` which returns a pre-configured Command:

```rust
// ✅ GOOD: Simple case
let output = repo.wt_command()
    .args(["switch", "--create", "feature"])
    .output()?;

// ✅ GOOD: With additional configuration (piped stdin, etc.)
let mut cmd = repo.wt_command();
cmd.args(["switch", "--create", "feature"])
    .stdin(Stdio::piped());
```

```rust
// ❌ BAD: Missing isolation - inherits host environment
let output = Command::new(env!("CARGO_BIN_EXE_wt"))
    .args(["switch", "--create", "feature"])
    .current_dir(repo.root_path())
    .output()?;
```

### Without a TestRepo (e.g., readme_sync tests)

Use the free function `wt_command()`:

```rust
use crate::common::wt_command;

// ✅ GOOD: Isolated from host environment
let output = wt_command()
    .args(["--help"])
    .current_dir(project_root)
    .output()?;
```

### Method reference

| Method | Returns | Use when |
|--------|---------|----------|
| `repo.wt_command()` | `Command` | Running wt commands with a TestRepo |
| `wt_command()` | `Command` | Running wt without a TestRepo (free function) |
| `repo.git_command()` | `Cmd` | Running git commands (use `.run()` not `.output()`) |

## Timing Tests: Long Timeouts with Fast Polling

**Core principle:** Use long timeouts (5+ seconds) for reliability on slow CI, but poll frequently (10-50ms) so tests complete quickly when things work.

This achieves both goals:
- **No flaky failures** on slow machines - generous timeout accommodates worst-case
- **Fast tests** on normal machines - frequent polling means no unnecessary waiting

```rust
// ✅ GOOD: Long timeout, fast polling
let timeout = Duration::from_secs(5);
let poll_interval = Duration::from_millis(10);
let start = Instant::now();
while start.elapsed() < timeout {
    if condition_met() { break; }
    thread::sleep(poll_interval);
}

// ❌ BAD: Fixed sleep (always slow, might still fail)
thread::sleep(Duration::from_millis(500));
assert!(condition_met());

// ❌ BAD: Short timeout (flaky on slow CI)
let timeout = Duration::from_millis(100);
```

Use the helpers in `tests/common/mod.rs`:

```rust
use crate::common::{wait_for_file, wait_for_file_count, wait_for_file_content};

// ✅ Poll for file existence (30-second default timeout)
wait_for_file(&log_file);

// ✅ Poll for multiple files
wait_for_file_count(&log_dir, "log", 3);

// ✅ Poll for file with non-empty content
wait_for_file_content(&marker_file);
```

These use exponential backoff (10ms → 500ms cap) for fast initial checks that back off on slow CI. The 30-second default timeout is generous enough to avoid flakiness under CI load.

### Event-driven code: drive the scenario from the callback

When the system under test exposes a callback, channel, or event hook, drive the scenario **causally** through that hook instead of racing wall-clock timers. The callback gives you a happens-before edge into the loop — use it to inject inputs and terminate the run, so the test's timing depends on the event ordering, not on CPU scheduling.

```rust
// ✅ GOOD: causally driven — first Stall event injects a result; a Stall
// observed after the result drops tx to end the drain via Disconnected.
// Runs at threshold speed on any hardware; the 5s deadline is only a
// safety net.
let mut sender = Some(tx);
let mut saw_result = false;
let outcome = drain_results_with_timings(
    rx, /* ... */,
    Instant::now() + Duration::from_secs(5),
    StallTimings { threshold: ms(20), tick: ms(10) },
    |event| match event {
        DrainEvent::Stall { .. } if !saw_result => {
            sender.as_ref().unwrap().send(result).unwrap();
        }
        DrainEvent::Stall { .. } => { sender.take(); } // end drain
        DrainEvent::Result { .. } => { saw_result = true; }
        _ => {}
    },
);

// ❌ BAD: producer sleeps to land a result "partway through" a window
// whose size is itself a wall-clock deadline. Every extension of the
// deadline just makes the race wider, not correct.
std::thread::spawn(move || {
    std::thread::sleep(Duration::from_millis(80));
    tx.send(result).unwrap();
    std::thread::sleep(Duration::from_millis(2000));
    drop(tx);
});
let outcome = drain_results_with_timings(
    rx, /* ... */,
    Instant::now() + Duration::from_millis(1000),
    /* ... */,
);
```

**Rule of thumb:** if your producer thread needs `thread::sleep` to line up with a deadline in the code under test, you're racing the scheduler. Reach for the callback, a channel, or a condvar instead. Fixed deadlines belong only in the safety-net role — "stop if something has truly hung" — not in the assertion path.

**Exception - testing absence:** When verifying something did NOT happen, polling doesn't work. Use a fixed 500ms+ sleep:

```rust
thread::sleep(Duration::from_millis(500));
assert!(!marker_file.exists(), "Command should NOT have run");
```

## Testing with --execute Commands

Use `--yes` to skip interactive prompts in tests. Don't pipe input to stdin.

## Feature Flags, Not Runtime Skipping

**Never skip tests based on runtime availability checks.** Use Cargo feature flags instead.

```rust
// ❌ BAD: Runtime skip - test silently passes when tool unavailable
#[test]
fn test_fish_integration() {
    if !shell_available("fish") {
        eprintln!("Skipping: fish not available");
        return;
    }
    // test code...
}

// ✅ GOOD: Feature flag - test excluded from compilation
#[cfg(feature = "shell-integration-tests")]
#[test]
fn test_fish_integration() {
    // test code...
}
```

**Why:**
- Runtime skips hide missing test coverage in CI logs
- Feature flags make dependencies explicit in `Cargo.toml`
- `cargo test` output clearly shows which tests ran vs were compiled out
- CI can enable features when dependencies are installed

**Existing feature flags:**
- `shell-integration-tests` — Tests requiring bash/zsh/fish shells and PTY

## README Examples and Snapshot Testing

### Problem: Separated stdout/stderr in Standard Snapshots

README examples need to show output as users see it in their terminal - with stdout and stderr interleaved in the order they appear. However, the standard `insta_cmd` snapshot testing (used in most integration tests) separates stdout and stderr into different sections:

```yaml
----- stdout -----
🔄 Running pre-merge test:
  uv run pytest

----- stderr -----
============================= test session starts ==============================
collected 18 items
...
```

This makes snapshots **not directly copyable** into README.md because:
1. The output is split into two sections
2. We lose the temporal ordering (which output appeared first)
3. Users never see this separation - their terminal shows combined output

### Solution: Use PTY-based Testing for README Examples

For tests that generate README examples, use the PTY-based execution pattern from `tests/integration_tests/shell_wrapper.rs`:

**Key functions** in `tests/common/pty.rs`:
- `build_pty_command()` — builds a `CommandBuilder` with standard PTY isolation
- `exec_cmd_in_pty()` — executes in a PTY, writing all input immediately (non-interactive)
- `exec_cmd_in_pty_prompted()` — executes in a PTY, waiting for prompts before sending input

These use `portable_pty` to execute commands in a pseudo-terminal, returning
combined stdout+stderr as a single `String` with ANSI color codes and proper
temporal ordering.

**Pattern to use:**

```rust
use crate::common::pty::{build_pty_command, exec_cmd_in_pty};

let cmd = build_pty_command("wt", &["merge"], &repo_path, &env_vars, None);
let (combined_output, exit_code) = exec_cmd_in_pty(cmd, "");
assert_snapshot!("readme_example_name", combined_output);
```

**Benefits:**
- Output is directly copyable to README.md
- Shows actual user experience (interleaved stdout/stderr)
- Preserves temporal ordering of output
- No manual merging of stdout/stderr needed

**Example:** See `tests/integration_tests/shell_wrapper.rs`:
- `ShellOutput` struct with `combined: String`
- `exec_in_pty_interactive()` — shell-wrapper-specific PTY helper

### When to Use Each Approach

**Use `insta_cmd` (standard snapshots):**
- Unit and integration tests focused on correctness
- Tests that need to verify stdout/stderr separately
- Tests checking exit codes and specific error messages
- Most tests in the codebase

**Use PTY-based execution (PTY-based snapshots):**
- Tests generating output for README.md examples
- Tests verifying shell integration (`wt` function, directives)
- Tests needing to verify complete user experience
- Any test where temporal ordering of stdout/stderr matters

### Current Status

**README examples using PTY-based approach:**
- Shell wrapper tests (all of `tests/integration_tests/shell_wrapper.rs`)

**README examples using standard snapshots (working, but require manual editing):**
- `test_readme_example_simple()` - Quick start merge example
- `test_readme_example_complex()` - LLM commit example
- `test_readme_example_hooks_pre_start()` - Pre-start hooks
- `test_readme_example_hooks_pre_merge()` - Pre-merge hooks

**Current workflow:** These tests work correctly and generate accurate snapshots. However, the snapshots separate stdout and stderr into different sections, which means they cannot be directly copied into README.md. Instead, the README examples are manually edited versions that merge stdout/stderr in the correct temporal order and remove ANSI codes.

**Future improvement:** Migrate README example tests to use PTY execution so snapshots are directly copyable into README.md without manual editing. This is an enhancement for developer convenience, not a bug fix.

### Migration Checklist

When converting a README example test from `insta_cmd` to PTY-based:

1. ✅ Import `portable_pty` dependencies
2. ✅ Use `build_pty_command()` + `exec_cmd_in_pty()` from `tests/common/pty.rs`
3. ✅ Replace `make_snapshot_cmd()` + `assert_cmd_snapshot!()` with PTY execution + `assert_snapshot!()`
4. ✅ Ensure environment variables include `CLICOLOR_FORCE=1` for ANSI codes
5. ✅ Update snapshot file format (file snapshot, not inline)
6. ✅ Verify output matches expected README format
7. ✅ Update README.md to reference new snapshot location

### Implementation Note

The PTY approach is specifically for **user-facing output documentation**. It's not a replacement for standard integration tests - both approaches serve different purposes and should coexist in the test suite.

## Coverage in PTY Tests

PTY tests use `cmd.env_clear()` for isolation. To enable coverage, pass through LLVM env vars:

```rust
// Standard setup (most PTY tests)
crate::common::configure_pty_command(&mut cmd);

// Custom env setup (shell tests needing USER, SHELL, ZDOTDIR)
cmd.env_clear();
cmd.env("HOME", ...);
// ... custom env ...
crate::common::pass_coverage_env_to_pty_cmd(&mut cmd);
```

## No Global State Mutations in Tests

**Never mutate process-global state in tests.** Rust's test runner executes tests in parallel within the same process, so global mutations leak across tests and cause non-deterministic behavior.

Forbidden patterns:
- `log::set_max_level()` — affects all concurrent and subsequent tests
- `std::env::set_var()` — process-wide, races with other tests
- Setting global `static` variables without synchronization

If coverage tools flag uncovered `log::debug!()` format args, accept the gap — it's not meaningful coverage and not worth global side effects.

```rust
// ❌ BAD: Global mutation leaks across parallel tests
#[test]
fn test_something() {
    log::set_max_level(log::LevelFilter::Debug);
    // ...
}

// ❌ BAD: Environment variable race condition
#[test]
fn test_config_loading() {
    std::env::set_var("MY_CONFIG", "test_value");
    // ...
}
```

For environment-dependent tests, use `Command::new()` with `.env()` to set variables in a subprocess, or use the test isolation helpers (`repo.wt_command()`, `wt_command()`).

## Test Style

### Snapshot env drift is cosmetic

`insta_cmd` snapshots record the test's environment variables in an `env:` block.
When test infrastructure changes add or reorder env vars (e.g., `NO_COLOR: ""`
appearing in a snapshot that didn't have it before), the snapshot diff includes
those lines even though the test output is unchanged. This is cosmetic drift —
accept it without comment during review.

### Inline snapshots over multi-assert

When a test checks formatted output, use `insta::assert_snapshot!` with an
inline snapshot instead of multiple `assert!(x.contains(...))` calls. Snapshots
capture the complete output, so a single snapshot replaces many contains checks
and catches regressions that spot-checks miss.

```rust
use insta::assert_snapshot;

// ✅ GOOD: One snapshot captures all formatting
assert_snapshot!(format_message("hello"), @"  │ hello");

// ❌ BAD: Spot-checks that miss structural regressions
assert!(result.contains("│"));
assert!(result.contains("hello"));
assert!(!result.contains("error"));
```

Import `assert_snapshot` directly (`use insta::assert_snapshot;`) rather than
using the qualified `insta::assert_snapshot!` form.

For first-time snapshot creation, leave the inline value empty (`@""`), then
run `cargo insta test --accept` to fill it.

To update existing file-based snapshots (e.g., after editing CLI help text),
use `cargo insta test --accept`:

```bash
cargo insta test --accept -- --test integration "test_help"
```

Do not manually edit `.snap` files — they contain ANSI escape sequences that
are difficult to reproduce by hand.

### One test per belief

Group related inputs into a single test when they verify the same belief about
the code. A test named `test_wrap_text_at_width` that exercises short text, long
text, single words, and edge cases is better than five separate test functions
testing each input individually.

```rust
// ✅ GOOD: One test for the belief "wrapping respects word boundaries"
#[test]
fn test_wrap_text_at_width() {
    assert_eq!(wrap_text_at_width("short text", 20), vec!["short text"]);
    assert_eq!(wrap_text_at_width("hello world foo bar", 10), vec!["hello", "world foo", "bar"]);
    assert_eq!(wrap_text_at_width("superlongword", 5), vec!["superlongword"]);
    assert_eq!(wrap_text_at_width("", 20), vec![""]);
}
```

Table-driven tests work well for functions that map inputs to expected outputs:

```rust
#[test]
fn test_bash_token_styles() {
    let cases = [
        ("function", AnsiColor::Blue),
        ("keyword", AnsiColor::Magenta),
        ("string", AnsiColor::Green),
    ];
    for (name, expected) in cases {
        let style = bash_token_style(name).expect(name);
        assert_eq!(style.get_fg_color(), Some(Color::Ansi(expected)), "{name}");
    }
}
```

### Don't test constructors or dependencies

Tests that verify `Style::new().bold()` produces a bold style, or that
`StyledString::raw("x")` stores `"x"`, are testing the dependency — not our
code. Delete these. Test the behavior that uses these types instead.

## Deterministic Time in Tests

Tests use `TEST_EPOCH` (2025-01-01) for reproducible timestamps. The constant is defined in `tests/common/mod.rs` and automatically set as `WORKTRUNK_TEST_EPOCH` in the test environment.

**For test data with timestamps** (cache entries, etc.), use the constant:

```rust
use crate::common::TEST_EPOCH;

repo.run_git(&[
    "config", "worktrunk.state.feature.ci-status",
    &format!(r#"{{"checked_at":{TEST_EPOCH},"head":"abc123"}}"#),
]);
```

**For production code** that needs timestamps, use `worktrunk::utils::epoch_now()` which respects `WORKTRUNK_TEST_EPOCH`. Using `SystemTime::now()` directly causes flaky tests.