worktrunk 0.37.1

# Testing Guidelines

## Running `wt` Commands in Tests

**Use the correct helper to ensure test isolation.** Tests that spawn `wt` must
be isolated from the host environment to prevent:

- **Directive leakage**: Test commands writing to the user's shell directive file
- **Config pollution**: Tests reading/writing the user's real config
- **Git interference**: Host GIT_* environment variables affecting test behavior

### With a TestRepo fixture (most tests)

Use `repo.wt_command()` which returns a pre-configured Command:

```rust
// ✅ GOOD: Simple case
let output = repo.wt_command()
    .args(["switch", "--create", "feature"])
    .output()?;

// ✅ GOOD: With additional configuration (piped stdin, etc.)
let mut cmd = repo.wt_command();
cmd.args(["switch", "--create", "feature"])
    .stdin(Stdio::piped());
```

```rust
// ❌ BAD: Missing isolation - inherits host environment
let output = Command::new(env!("CARGO_BIN_EXE_wt"))
    .args(["switch", "--create", "feature"])
    .current_dir(repo.root_path())
    .output()?;
```

### Without a TestRepo (e.g., readme_sync tests)

Use the free function `wt_command()`:

```rust
use crate::common::wt_command;

// ✅ GOOD: Isolated from host environment
let output = wt_command()
    .args(["--help"])
    .current_dir(project_root)
    .output()?;
```

### Method reference

| Method | Returns | Use when |
|--------|---------|----------|
| `repo.wt_command()` | `Command` | Running wt commands with a TestRepo |
| `wt_command()` | `Command` | Running wt without a TestRepo (free function) |
| `repo.git_command()` | `Cmd` | Running git commands (use `.run()` not `.output()`) |

## Timing Tests: Long Timeouts with Fast Polling

**Core principle:** Use long timeouts (5+ seconds) for reliability on slow CI, but poll frequently (10-50ms) so tests complete quickly when things work.

This achieves both goals:
- **No flaky failures** on slow machines - generous timeout accommodates worst-case
- **Fast tests** on normal machines - frequent polling means no unnecessary waiting

```rust
// ✅ GOOD: Long timeout, fast polling
let timeout = Duration::from_secs(5);
let poll_interval = Duration::from_millis(10);
let start = Instant::now();
while start.elapsed() < timeout {
    if condition_met() { break; }
    thread::sleep(poll_interval);
}

// ❌ BAD: Fixed sleep (always slow, might still fail)
thread::sleep(Duration::from_millis(500));
assert!(condition_met());

// ❌ BAD: Short timeout (flaky on slow CI)
let timeout = Duration::from_millis(100);
```

Use the helpers in `tests/common/mod.rs`:

```rust
use crate::common::{wait_for_file, wait_for_file_count, wait_for_file_content};

// ✅ Poll for file existence (30-second default timeout)
wait_for_file(&log_file);

// ✅ Poll for multiple files
wait_for_file_count(&log_dir, "log", 3);

// ✅ Poll for file with non-empty content
wait_for_file_content(&marker_file);
```

These use exponential backoff (10ms → 500ms cap) for fast initial checks that back off on slow CI. The 30-second default timeout is generous enough to avoid flakiness under CI load.

### Event-driven code: drive the scenario from the callback

When the system under test exposes a callback, channel, or event hook, drive the scenario **causally** through that hook instead of racing wall-clock timers. The callback gives you a happens-before edge into the loop — use it to inject inputs and terminate the run, so the test's timing depends on the event ordering, not on CPU scheduling.

```rust
// ✅ GOOD: causally driven — first Stall event injects a result; a Stall
// observed after the result drops tx to end the drain via Disconnected.
// Runs at threshold speed on any hardware; the 5s deadline is only a
// safety net.
let mut sender = Some(tx);
let mut saw_result = false;
let outcome = drain_results_with_timings(
    rx, /* ... */,
    Instant::now() + Duration::from_secs(5),
    StallTimings { threshold: ms(20), tick: ms(10) },
    |event| match event {
        DrainEvent::Stall { .. } if !saw_result => {
            sender.as_ref().unwrap().send(result).unwrap();
        }
        DrainEvent::Stall { .. } => { sender.take(); } // end drain
        DrainEvent::Result { .. } => { saw_result = true; }
        _ => {}
    },
);

// ❌ BAD: producer sleeps to land a result "partway through" a window
// whose size is itself a wall-clock deadline. Every extension of the
// deadline just makes the race wider, not correct.
std::thread::spawn(move || {
    std::thread::sleep(Duration::from_millis(80));
    tx.send(result).unwrap();
    std::thread::sleep(Duration::from_millis(2000));
    drop(tx);
});
let outcome = drain_results_with_timings(
    rx, /* ... */,
    Instant::now() + Duration::from_millis(1000),
    /* ... */,
);
```

**Rule of thumb:** if your producer thread needs `thread::sleep` to line up with a deadline in the code under test, you're racing the scheduler. Reach for the callback, a channel, or a condvar instead. Fixed deadlines belong only in the safety-net role — "stop if something has truly hung" — not in the assertion path.

**Exception - testing absence:** When verifying something did NOT happen, polling doesn't work. Use a fixed 500ms+ sleep:

```rust
thread::sleep(Duration::from_millis(500));
assert!(!marker_file.exists(), "Command should NOT have run");
```

## Testing with --execute Commands

Use `--yes` to skip interactive prompts in tests. Don't pipe input to stdin.

## Feature Flags, Not Runtime Skipping

**Never skip tests based on runtime availability checks.** Use Cargo feature flags instead.

```rust
// ❌ BAD: Runtime skip - test silently passes when tool unavailable
#[test]
fn test_fish_integration() {
    if !shell_available("fish") {
        eprintln!("Skipping: fish not available");
        return;
    }
    // test code...
}

// ✅ GOOD: Feature flag - test excluded from compilation
#[cfg(feature = "shell-integration-tests")]
#[test]
fn test_fish_integration() {
    // test code...
}
```

**Why:**
- Runtime skips hide missing test coverage in CI logs
- Feature flags make dependencies explicit in `Cargo.toml`
- `cargo test` output clearly shows which tests ran vs were compiled out
- CI can enable features when dependencies are installed

**Existing feature flags:**
- `shell-integration-tests` — Tests requiring bash/zsh/fish shells and PTY

## README Examples and Snapshot Testing

### Problem: Separated stdout/stderr in Standard Snapshots

README examples need to show output as users see it in their terminal - with stdout and stderr interleaved in the order they appear. However, the standard `insta_cmd` snapshot testing (used in most integration tests) separates stdout and stderr into different sections:

```yaml
----- stdout -----
🔄 Running pre-merge test:
  uv run pytest

----- stderr -----
============================= test session starts ==============================
collected 18 items
...
```

This makes snapshots **not directly copyable** into README.md because:
1. The output is split into two sections
2. We lose the temporal ordering (which output appeared first)
3. Users never see this separation - their terminal shows combined output

### Solution: Use PTY-based Testing for README Examples

For tests that generate README examples, use the PTY-based execution pattern from `tests/integration_tests/shell_wrapper.rs`:

**Key functions** in `tests/common/pty.rs`:
- `build_pty_command()` — builds a `CommandBuilder` with standard PTY isolation
- `exec_cmd_in_pty()` — executes in a PTY, writing all input immediately (non-interactive)
- `exec_cmd_in_pty_prompted()` — executes in a PTY, waiting for prompts before sending input

These use `portable_pty` to execute commands in a pseudo-terminal, returning
combined stdout+stderr as a single `String` with ANSI color codes and proper
temporal ordering.

**Pattern to use:**

```rust
use crate::common::pty::{build_pty_command, exec_cmd_in_pty};

let cmd = build_pty_command("wt", &["merge"], &repo_path, &env_vars, None);
let (combined_output, exit_code) = exec_cmd_in_pty(cmd, "");
assert_snapshot!("readme_example_name", combined_output);
```

**Benefits:**
- Output is directly copyable to README.md
- Shows actual user experience (interleaved stdout/stderr)
- Preserves temporal ordering of output
- No manual merging of stdout/stderr needed

**Example:** See `tests/integration_tests/shell_wrapper.rs`:
- `ShellOutput` struct with `combined: String`
- `exec_in_pty_interactive()` — shell-wrapper-specific PTY helper

### When to Use Each Approach

**Use `insta_cmd` (standard snapshots):**
- Unit and integration tests focused on correctness
- Tests that need to verify stdout/stderr separately
- Tests checking exit codes and specific error messages
- Most tests in the codebase

**Use PTY-based execution (PTY-based snapshots):**
- Tests generating output for README.md examples
- Tests verifying shell integration (`wt` function, directives)
- Tests needing to verify complete user experience
- Any test where temporal ordering of stdout/stderr matters

### Current Status

**README examples using PTY-based approach:**
- Shell wrapper tests (all of `tests/integration_tests/shell_wrapper.rs`)

**README examples using standard snapshots (working, but require manual editing):**
- `test_readme_example_simple()` - Quick start merge example
- `test_readme_example_complex()` - LLM commit example
- `test_readme_example_hooks_pre_start()` - Pre-start hooks
- `test_readme_example_hooks_pre_merge()` - Pre-merge hooks

**Current workflow:** These tests work correctly and generate accurate snapshots. However, the snapshots separate stdout and stderr into different sections, which means they cannot be directly copied into README.md. Instead, the README examples are manually edited versions that merge stdout/stderr in the correct temporal order and remove ANSI codes.

**Future improvement:** Migrate README example tests to use PTY execution so snapshots are directly copyable into README.md without manual editing. This is an enhancement for developer convenience, not a bug fix.

### Migration Checklist

When converting a README example test from `insta_cmd` to PTY-based:

1. ✅ Import `portable_pty` dependencies
2. ✅ Use `build_pty_command()` + `exec_cmd_in_pty()` from `tests/common/pty.rs`
3. ✅ Replace `make_snapshot_cmd()` + `assert_cmd_snapshot!()` with PTY execution + `assert_snapshot!()`
4. ✅ Ensure environment variables include `CLICOLOR_FORCE=1` for ANSI codes
5. ✅ Update snapshot file format (file snapshot, not inline)
6. ✅ Verify output matches expected README format
7. ✅ Update README.md to reference new snapshot location

### Implementation Note

The PTY approach is specifically for **user-facing output documentation**. It's not a replacement for standard integration tests - both approaches serve different purposes and should coexist in the test suite.

## Coverage in PTY Tests

PTY tests use `cmd.env_clear()` for isolation. To enable coverage, pass through LLVM env vars:

```rust
// Standard setup (most PTY tests)
crate::common::configure_pty_command(&mut cmd);

// Custom env setup (shell tests needing USER, SHELL, ZDOTDIR)
cmd.env_clear();
cmd.env("HOME", ...);
// ... custom env ...
crate::common::pass_coverage_env_to_pty_cmd(&mut cmd);
```

## No Global State Mutations in Tests

**Never mutate process-global state in tests.** Rust's test runner executes tests in parallel within the same process, so global mutations leak across tests and cause non-deterministic behavior.

Forbidden patterns:
- `log::set_max_level()` — affects all concurrent and subsequent tests
- `std::env::set_var()` — process-wide, races with other tests
- Setting global `static` variables without synchronization

If coverage tools flag uncovered `log::debug!()` format args, accept the gap — it's not meaningful coverage and not worth global side effects.

```rust
// ❌ BAD: Global mutation leaks across parallel tests
#[test]
fn test_something() {
    log::set_max_level(log::LevelFilter::Debug);
    // ...
}

// ❌ BAD: Environment variable race condition
#[test]
fn test_config_loading() {
    std::env::set_var("MY_CONFIG", "test_value");
    // ...
}
```

For environment-dependent tests, use `Command::new()` with `.env()` to set variables in a subprocess, or use the test isolation helpers (`repo.wt_command()`, `wt_command()`).

## Test Style

### Snapshot env drift is cosmetic

`insta_cmd` snapshots record the test's environment variables in an `env:` block.
When test infrastructure changes add or reorder env vars (e.g., `NO_COLOR: ""`
appearing in a snapshot that didn't have it before), the snapshot diff includes
those lines even though the test output is unchanged. This is cosmetic drift —
accept it without comment during review.

### Inline snapshots over multi-assert

When a test checks formatted output, use `insta::assert_snapshot!` with an
inline snapshot instead of multiple `assert!(x.contains(...))` calls. Snapshots
capture the complete output, so a single snapshot replaces many contains checks
and catches regressions that spot-checks miss.

```rust
use insta::assert_snapshot;

// ✅ GOOD: One snapshot captures all formatting
assert_snapshot!(format_message("hello"), @"  │ hello");

// ❌ BAD: Spot-checks that miss structural regressions
assert!(result.contains("│"));
assert!(result.contains("hello"));
assert!(!result.contains("error"));
```

Import `assert_snapshot` directly (`use insta::assert_snapshot;`) rather than
using the qualified `insta::assert_snapshot!` form.

For first-time snapshot creation, leave the inline value empty (`@""`), then
run `cargo insta test --accept` to fill it.

To update existing file-based snapshots (e.g., after editing CLI help text),
use `cargo insta test --accept`:

```bash
cargo insta test --accept -- --test integration "test_help"
```

Do not manually edit `.snap` files — they contain ANSI escape sequences that
are difficult to reproduce by hand.

### One test per belief

Group related inputs into a single test when they verify the same belief about
the code. A test named `test_wrap_text_at_width` that exercises short text, long
text, single words, and edge cases is better than five separate test functions
testing each input individually.

```rust
// ✅ GOOD: One test for the belief "wrapping respects word boundaries"
#[test]
fn test_wrap_text_at_width() {
    assert_eq!(wrap_text_at_width("short text", 20), vec!["short text"]);
    assert_eq!(wrap_text_at_width("hello world foo bar", 10), vec!["hello", "world foo", "bar"]);
    assert_eq!(wrap_text_at_width("superlongword", 5), vec!["superlongword"]);
    assert_eq!(wrap_text_at_width("", 20), vec![""]);
}
```

Table-driven tests work well for functions that map inputs to expected outputs:

```rust
#[test]
fn test_bash_token_styles() {
    let cases = [
        ("function", AnsiColor::Blue),
        ("keyword", AnsiColor::Magenta),
        ("string", AnsiColor::Green),
    ];
    for (name, expected) in cases {
        let style = bash_token_style(name).expect(name);
        assert_eq!(style.get_fg_color(), Some(Color::Ansi(expected)), "{name}");
    }
}
```

### Don't test constructors or dependencies

Tests that verify `Style::new().bold()` produces a bold style, or that
`StyledString::raw("x")` stores `"x"`, are testing the dependency — not our
code. Delete these. Test the behavior that uses these types instead.

## Deterministic Time in Tests

Tests use `TEST_EPOCH` (2025-01-01) for reproducible timestamps. The constant is defined in `tests/common/mod.rs` and automatically set as `WORKTRUNK_TEST_EPOCH` in the test environment.

**For test data with timestamps** (cache entries, etc.), use the constant:

```rust
use crate::common::TEST_EPOCH;

repo.run_git(&[
    "config", "worktrunk.state.feature.ci-status",
    &format!(r#"{{"checked_at":{TEST_EPOCH},"head":"abc123"}}"#),
]);
```

**For production code** that needs timestamps, use `worktrunk::utils::epoch_now()` which respects `WORKTRUNK_TEST_EPOCH`. Using `SystemTime::now()` directly causes flaky tests.