context-builder 0.7.1

# Directory Structure Report

This document contains all files from the `context-builder` directory, optimized for LLM consumption.
Content hash: 40c4383e078079b5

## File Tree Structure

- 📄 AGENTS.md
- 📄 BENCHMARKS.md
- 📄 CHANGELOG.md
- 📄 Cargo.toml
- 📄 DEVELOPMENT.md
- 📄 LICENSE
- 📄 README.md
- 📁 benches
  - 📄 context_bench.rs
- 📁 docs
  - 📁 research
    - 📄 multi-model-code-review-analysis.md
- 📁 scripts
  - 📄 generate_samples.rs
- 📁 src
  - 📄 cache.rs
  - 📄 cli.rs
  - 📄 config.rs
  - 📄 config_resolver.rs
  - 📄 diff.rs
  - 📄 file_utils.rs
  - 📄 lib.rs
  - 📄 main.rs
  - 📄 markdown.rs
  - 📄 state.rs
  - 📄 token_count.rs
  - 📄 tree.rs
- 📄 tarpaulin.toml
- 📁 tests
  - 📄 cli_integration.rs
  - 📄 diff_integration.rs
  - 📄 test_auto_diff.rs
  - 📄 test_binary_file_autodiff.rs
  - 📄 test_comprehensive_edge_cases.rs
  - 📄 test_config_resolution.rs
  - 📄 test_cwd_independence.rs
  - 📄 test_determinism.rs
  - 📄 test_parallel_memory.rs
  - 📄 test_phase4_integration.rs


### File: `AGENTS.md`

- Size: 6816 bytes
- Modified: 2026-02-14 07:24:34 UTC

```markdown
# AGENTS.md - AI Agent Instructions

This file helps AI agents quickly understand and contribute to the Context Builder codebase.

## Project Overview

Context Builder is a **blazing-fast Rust CLI** for aggregating entire codebases into single, LLM-friendly markdown files. Published on [crates.io](https://crates.io/crates/context-builder) under MIT license.

**If this is your first time:** Read this file, then run `cargo run -- --help` to see all options.

---

## Tech Stack

| Technology | Usage |
|---|---|
| **Language** | Rust (Edition 2024) |
| **Build** | Cargo (no npm/bun/node) |
| **CLI** | `clap` (derive) |
| **Parallelism** | `rayon` (optional, default on) + `crossbeam-channel` |
| **Diffing** | `similar` (unified diffs) |
| **File traversal** | `ignore` crate (gitignore-aware) |
| **Token counting** | `tiktoken-rs` (`cl100k_base`) |
| **Caching** | JSON + `fs2` file locking |
| **Config** | TOML (`context-builder.toml`) |
| **Encoding** | `encoding_rs` (transcoding non-UTF-8) |
| **Logging** | `env_logger` |
| **Branch** | `master` (not `main`) |

---

## Project Structure

```
context-builder/
├── src/
│   ├── main.rs              # Entry point — calls lib::run()
│   ├── lib.rs               # Core orchestration, run_with_args(), Prompter trait, --init
│   ├── cli.rs               # Args struct via clap derive
│   ├── config.rs            # Config struct, TOML deserialization
│   ├── config_resolver.rs   # Merges CLI args + TOML config (CLI > config > defaults)
│   ├── file_utils.rs        # .gitignore-aware traversal, OverrideBuilder for custom ignores
│   ├── tree.rs              # BTreeMap file tree (deterministic ordering)
│   ├── state.rs             # ProjectState/FileState structured snapshots
│   ├── markdown.rs          # Streaming file renderer, binary detection, encoding, parallel
│   ├── cache.rs             # JSON-based caching with fs2 locking, old cache migration
│   ├── diff.rs              # Per-file unified diffs via similar
│   └── token_count.rs       # Real tokenization via tiktoken-rs (cl100k_base, lazy init)
├── tests/                   # 10 integration test files
├── benches/                 # Criterion benchmark suite
├── scripts/                 # generate_samples.rs (benchmark dataset generator)
├── context-builder.toml     # Project's own config file
├── Cargo.toml               # Crate metadata, dependencies, features
├── DEVELOPMENT.md           # Contributor guide
├── BENCHMARKS.md            # Performance benchmarking guide
├── CHANGELOG.md             # Release history
└── .github/workflows/ci.yml # CI: fmt, clippy, build, test, security audit (ubuntu/win/macos)
```

---

## Key Commands

```bash
# Build
cargo build

# Run
cargo run -- --help
cargo run -- -d . -o out.md -f rs -f toml
cargo run -- --preview        # File tree only, no output
cargo run -- --init           # Create config file with auto-detected filters

# Test (MUST use single thread — tests share CWD)
cargo test -- --test-threads=1

# Lint (must pass -D warnings)
cargo clippy --all-targets --all-features -- -D warnings

# Format
cargo fmt --all
```

---

## Key Design Patterns

1. **`Prompter` trait** — Abstracts user confirmation (overwrite/processing). Tests use `MockPrompter`/`TestPrompter`. Never add stdin reads in library code.

2. **Streaming writes** — `markdown.rs` processes files line-by-line for low memory. With `parallel` feature, uses crossbeam channels for concurrent processing.

3. **Structured state** — v0.5.0 replaced fragile text-based cache parsing with JSON `ProjectState` snapshots for reliable auto-diff.

4. **Deterministic output** — `BTreeMap` everywhere ensures identical output across runs.

5. **Config precedence** — CLI args > TOML config > defaults, with explicit detection in `config_resolver.rs`.

---

## Feature Flags

| Feature | Default | Purpose |
|---|---|---|
| `parallel` | ✅ | Rayon for parallel file processing |
| `samples-bin` | ❌ | Exposes `generate_samples` binary for benchmarking |

---

## Environment Variables

| Variable | Purpose |
|---|---|
| `CB_SILENT` | `"1"` suppresses user-facing prints (benchmarks set this) |
| `CB_BENCH_MEDIUM` | `"1"` enables heavier benchmark datasets |
| `CB_BENCH_DATASET_DIR` | External benchmark dataset root |
| `RUST_LOG` | Controls `env_logger` verbosity (e.g., `RUST_LOG=info`) |

---

## Code Style Guidelines

1. **Error handling** — Use `io::Result`. Prefer returning errors over panicking. `unwrap()`/`expect()` OK in tests, NOT in library code.
2. **Cross-platform** — Normalize path separators in tests for string comparisons.
3. **New CLI flags** — Add in `cli.rs`, update tests in same file, propagate through `run_with_args`.
4. **Language detection** — Keep simple and deterministic; add mappings in one place.
5. **Binary detection** — Lightweight: NUL byte check + UTF-8 validity.
6. **Logging** — Use `log::{info, warn, error}`. Let `env_logger` control emission.

---

## Test Organization

- **Unit tests**: Inline `#[cfg(test)]` modules in every source file
- **Integration tests** (10 files in `tests/`):
  - `test_auto_diff.rs` — Auto-diff workflow (largest test file)
  - `test_determinism.rs` — Output determinism verification
  - `test_config_resolution.rs` — CLI/config merge behavior
  - `test_cwd_independence.rs` — Path independence
  - `test_comprehensive_edge_cases.rs` — Edge cases
  - `cli_integration.rs` — End-to-end CLI tests
  - `test_binary_file_autodiff.rs`, `test_parallel_memory.rs`, `test_phase4_integration.rs`, `diff_integration.rs`
- **Benchmarks**: Criterion suite at `benches/context_bench.rs`

**Critical:** Tests MUST run with `--test-threads=1` (CI enforces this). Many tests use `set_current_dir()` which is process-global. Use `#[serial]` attribute where order matters.

---

## Known Hazards

- **Year in tests**: Watch for hardcoded year strings in timestamp assertions. Use dynamic `Utc::now().format("%Y")` instead.
- **CWD mutation**: Tests that `set_current_dir()` must restore the original directory in all code paths (including panics).
- **Config from CWD**: `load_config()` reads from CWD. `load_config_from_path()` reads from explicit root. Prefer the latter in tests.
- **Cache collisions**: Cache keys are project-path + config hash. Different configs = different cache files.

---

## Release Process

1. `cargo fmt --all && cargo clippy --all-targets --all-features -- -D warnings && cargo test -- --test-threads=1`
2. Bump `version` in `Cargo.toml`, add entry to `CHANGELOG.md`
3. `git commit -am "chore(release): vX.Y.Z" && git tag vX.Y.Z && git push && git push --tags`
4. `cargo publish`
```

### File: `CHANGELOG.md`

- Size: 9078 bytes
- Modified: 2026-02-15 04:21:58 UTC

```markdown
# Changelog

All notable changes to this project will be documented in this file.

## v0.7.1

- **Bug Fixes** (identified by Gemini Deep Think multi-round code review)
  - Fixed content hash using absolute OS paths — now normalized to relative unix-style for cross-platform determinism
  - Fixed hash collision risk — added null byte delimiter between path and content in content hash
  - Fixed `strip_prefix('+')` leaving extra space in diff_only mode, corrupting indentation
  - Fixed auto_diff path bypassing `--max-tokens` budget entirely
  - Fixed `src/tests/` files misclassified as source code instead of tests
  - Fixed `sorted_paths` missing cwd fallback, silently dropping files when cwd ≠ base_path

- **Auto-Ignore Common Directories**
  - 19 heavy directories (node_modules, dist, build, __pycache__, .venv, vendor, etc.) are now excluded by default
  - Prevents million-line outputs when processing projects without a `.git` directory

- **Context Window Warnings**
  - Shows estimated token count after every run
  - Warns when output exceeds 128K tokens with actionable CLI suggestions

## v0.7.0

- **Deterministic Output**
  - Replaced volatile timestamp (`Processed at: <timestamp>`) with a content hash (`Content hash: <hex>`) in the Markdown header
  - Identical project states now produce byte-for-byte identical output files, enabling LLM prompt caching

- **Context Budgeting (`--max-tokens N`)**
  - New CLI argument `--max-tokens` and `context-builder.toml` config option to cap the output token budget
  - Files are processed until the budget is exhausted, with a `<truncated>` marker appended
  - Prevents API errors from excessively large contexts and reduces costs

- **Relevance-Based File Ordering**
  - Files are now sorted by relevance category: config files (0) → source code (1) → tests (2) → docs/other (3)
  - Within each category, files remain alphabetically sorted
  - Helps LLMs prioritize core logic and configuration over supporting files

## v0.6.1

- **Bug Fixes** (identified by Gemini Deep Think code review)
  - Fixed TOCTOU race in cache writes: `File::create` was truncating before acquiring lock, risking data loss for concurrent readers
  - Fixed indentation destruction in `diff_only` mode: `trim_start()` was stripping all leading whitespace from added files, corrupting Python/YAML
  - Fixed UTF-8 boundary corruption: 8KB sniff buffer could split multi-byte characters, misclassifying valid UTF-8 files as binary
  - Fixed CLI flags silently overwritten: config file values were unconditionally overriding CLI arguments post-resolution
  - Removed duplicate file seek block (copy-paste error)

## v0.6.0

- **Smart Defaults**
  - Auto-exclude output files: the tool now automatically excludes its own generated output file, output folder, and `.context-builder/` cache directory from context collection without requiring manual `--ignore` flags
  - Timestamped output glob patterns (e.g., `docs/context_*.md`) are auto-excluded when `timestamped_output` is enabled
  - Large-file detection: warns about files exceeding 100 KB with a sorted top-5 list and total context size summary
  - Improved project name detection: canonicalizes relative paths (like `.`) to resolve the actual directory name instead of showing "unknown"

- **Testing & Stability**
  - Added `#[serial]` annotations to integration tests that mutate CWD, fixing intermittent test failures in parallel execution
  - All 146 tests pass consistently with `--test-threads=1`

- **Dependencies**
  - Updated `criterion` to 0.8.2
  - Updated `tiktoken-rs` to 0.9.1
  - Updated `toml` to 1.0.1

## v0.5.2

- Enhanced `--init` command to detect major file types in the current directory and suggest appropriate filters instead of using generic defaults
- Fixed file type detection to respect .gitignore patterns and common ignore directories (target, node_modules, etc.)

## v0.5.1

- Added `--init` command to create a new `context-builder.toml` configuration file in the current directory with sensible defaults

## v0.5.0

- **BREAKING CHANGES**
  - Cache file locations changed to project-specific paths to prevent collisions

- **Critical Bug Fixes**
  - **Fixed inverted ignore logic**: Corrected critical bug where ignore patterns were being treated as include patterns, causing files/directories meant to be ignored to be explicitly included instead
  - **Fixed cache read panics**: Improved error handling for corrupted cache files to prevent application crashes
  - **Fixed potential panics in path manipulation**: Added safe handling for edge case filenames without extensions or stems

- **Major Improvements**
  - **Deterministic Output**: Files are now sorted consistently, ensuring identical output for the same input across multiple runs
  - **Robust Caching Architecture**: Complete rewrite of caching system with:
    - Project-specific cache keys based on absolute path hash to prevent collisions
    - JSON-based structured caching replacing fragile markdown parsing
    - File locking with `fs2` crate for thread-safe concurrent access
    - Configuration changes now properly invalidate cache
  - **Enhanced Auto-Diff System**:
    - Structured state representation before markdown generation
    - Eliminated fragile text parsing with `extract_file_contents` and `strip_line_number` functions
    - Cache structured data (JSON) instead of markdown for reliability
  - **Thread Safety**: Removed all `unsafe` blocks and explicit configuration passing replaces environment variables

- **Performance Optimizations**
  - **Custom Ignores**: Now uses `ignore::overrides::OverrideBuilder` with glob pattern support for better performance
  - **Parallel Processing**: Improved error handling to collect all errors and continue processing other files
  - **Directory Traversal**: Let `ignore` crate optimize directory traversal instead of custom logic

- **Bug Fixes**
  - Fixed non-deterministic output order that caused inconsistent LLM context generation
  - Removed incorrect triple-backtick filtering in diff logic that was corrupting file content
  - Fixed cache corruption issues in concurrent access scenarios
  - Improved error recovery for partial failures and corrupted cache
  - Fixed inconsistent file tree visualization between auto-diff and standard modes

- **Testing & Quality**
  - Added comprehensive integration test suite with tests covering:
    - Determinism verification
    - Auto-diff workflows
    - Cache collision prevention
    - Configuration change detection
    - Error recovery scenarios
  - Fixed test race conditions by running tests serially in CI (`--test-threads=1`)
  - Added `pretty_assertions` for better test output
  - Fixed all clippy warnings and enforced `-D warnings` in CI

- **Dependencies**
  - Added `fs2` for file locking
  - Added `serde_json` for structured cache format
  - Added `serial_test` for test serialization
  - Added `pretty_assertions` for enhanced test output
  - Added `encoding_rs` for enhanced encoding detection and transcoding

- **Migration**
  - Automatic detection and cleanup of old markdown-based cache files (`last_canonical.md`, etc.)
  - First run after upgrade will clear old cache format to prevent conflicts
  - CLI interface remains fully backward compatible

- **Code Quality & Maintenance**
  - Fixed all clippy warnings including type complexity, collapsible if statements, and redundant closures
  - Updated CI workflow to prevent race conditions in tests
  - Improved binary file detection with better encoding strategy handling
  - Enhanced error handling for edge cases and file system operations

## v0.4.0


- Added

  - Token count mode (`--token-count`) now provides accurate token counts using the `tiktoken-rs` library.

  - Configuration file support (`context-builder.toml`) for project-specific settings.

  - Timestamped output versions.

  - `auto_diff` feature to automatically generate a diff from the latest output.
  - `diff_only` mode (`--diff-only` / `diff_only = true`) to output only the change summary and modified file diffs (no full file bodies) for lower token usage.

- Removed
  - Deprecated, unpublished `standalone_snapshot` option (replaced by `diff_only`).


## v0.3.0

- Changed
  - Parallel processing is now enabled by default via the `parallel` feature (uses `rayon`) for significant speedups on large projects.
    - To build/run sequentially, disable default features:
      - CLI/build: `cargo build --no-default-features` or `cargo run --no-default-features`
      - As a dependency: `default-features = false`
  - Updated Rust edition to 2024.

- Benchmarks
  - Benchmarks run silent by default by setting `CB_SILENT=1` at startup to avoid skewing timings with console I/O.
    - Override with `CB_SILENT=0` if you want to see output during benches.

## v0.2.0

- Added line numbers support
- Improved file tree visualization
- Enhanced error handling
- Better CLI argument validation

## v0.1.0

- Initial release
- Basic directory processing
- File filtering and ignoring
- Markdown output generation
```

### File: `Cargo.toml`

- Size: 1464 bytes
- Modified: 2026-02-15 04:22:00 UTC

```toml
[package]
name = "context-builder"
version = "0.7.1"
default-run = "context-builder"
edition = "2024"
authors = ["Igor Lins e Silva"]
description = "CLI tool to aggregate directory contents into a single markdown file optimized for LLM consumption"
readme = "README.md"
homepage = "https://github.com/igorls/context-builder"
repository = "https://github.com/igorls/context-builder"
license = "MIT"
keywords = ["cli", "markdown", "documentation", "llm", "context"]
categories = ["command-line-utilities", "development-tools"]

[dependencies]
clap = { version = "4.5.58", features = ["derive"] }
chrono = { version = "0.4.43", features = ["serde"] }
ignore = "0.4.25"
log = "0.4.29"
env_logger = "0.11.9"
rayon = { version = "1.10", optional = true }
serde = { version = "1.0.228", features = ["derive"] }
toml = "1.0.1"
similar = "2.7.0"
tempfile = "3.25.0"
tiktoken-rs = "0.9.1"
once_cell = "1.21.3"
fs2 = "0.4.3"
serde_json = "1.0.143"
crossbeam-channel = "0.5.15"
num_cpus = "1.17.0"
encoding_rs = "0.8.35"
walkdir = "2.5.0"
xxhash-rust = { version = "0.8", features = ["xxh3"] }

[features]
default = ["parallel"]
parallel = ["rayon"]
samples-bin = []

[dev-dependencies]
tempfile = "3.25.0"
criterion = { version = "0.8.2", features = ["html_reports"] }
pretty_assertions = "1.4.1"
serial_test = "3.0"

[[bench]]
name = "context_bench"
harness = false

[[bin]]
name = "generate_samples"
path = "scripts/generate_samples.rs"
required-features = ["samples-bin"]
```

### File: `README.md`

- Size: 9822 bytes
- Modified: 2026-02-14 07:14:48 UTC

```markdown
<div align="center">

# Context Builder

A blazing-fast CLI for creating LLM context from your entire codebase.

[![Crates.io](https://img.shields.io/crates/v/context-builder.svg)](https://crates.io/crates/context-builder)
![Crates.io Size](https://img.shields.io/crates/size/context-builder)
![Deps.rs Crate Dependencies (latest)](https://img.shields.io/deps-rs/context-builder/latest)
![Crates.io Total Downloads](https://img.shields.io/crates/d/context-builder)

</div>

<div align="center">

[![Coverage Status](https://coveralls.io/repos/github/igorls/context-builder/badge.svg?branch=master)](https://coveralls.io/github/igorls/context-builder?branch=master)
[![CI](https://github.com/igorls/context-builder/actions/workflows/ci.yml/badge.svg)](https://github.com/igorls/context-builder/actions/workflows/ci.yml)
![docs.rs](https://img.shields.io/docsrs/context-builder)

</div>

<div align="center">

[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/igorls/context-builder/blob/main/LICENSE)

</div>

<br/>

Tired of manually copy-pasting files into your LLM prompts? Context Builder automates this tedious process, creating a single, clean, and context-rich markdown file from any directory.

---

## Why Context Builder?

Providing broad context to Large Language Models (LLMs) is key to getting high-quality, relevant responses. This tool was built to solve one problem exceptionally well: **packaging your project's source code into a clean, LLM-friendly format with zero fuss.**

It's a command-line utility that recursively processes directories and creates comprehensive markdown documentation, optimized for AI conversations.

## Core Features


- ⚡ **Blazing Fast & Parallel by Default:**
  Processes thousands of files in seconds by leveraging all available CPU cores.

- 🧠 **Smart & Efficient File Discovery:**
  Respects `.gitignore` and custom ignore patterns out-of-the-box using optimized, parallel directory traversal.

- 💾 **Memory-Efficient Streaming:**
  Handles massive files with ease by reading and writing line-by-line, keeping memory usage low.

- 🌳 **Clear File Tree Visualization:**
  Generates an easy-to-read directory structure at the top of the output file.

- 🔍 **Powerful Filtering & Preview:**
  Easily include only the file extensions you need and use the instant `--preview` mode to see what will be processed.



 - ⚙️ **Configuration-First:**


  Use a `context-builder.toml` file to store your preferences for consistent, repeatable outputs. Initialize a new config file with `--init`, which will detect the major file types in your project (respecting `.gitignore` patterns) and suggest appropriate filters.




- 🔁 **Automatic Per-File Diffs:**
  When enabled, automatically generates a clean, noise-reduced diff showing what changed between snapshots.

- ✂️ **Diff-Only Mode:**
  Output only the change summary and modified file diffs—no full file bodies—to minimize token usage.

- 🧪 **Accurate Token Counting:**
  Get real tokenizer–based estimates with `--token-count` to plan your prompt budgets.


---

## Installation

### From crates.io (Recommended)

```bash
cargo install context-builder
```


### If you don't have Rust installed

Context Builder is distributed via crates.io. We do not ship pre-built binaries yet, so you need a Rust toolchain.


#### Quick install (Linux/macOS):

```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
Follow the prompt, then restart your shell

#### Windows: https://www.rust-lang.org/tools/install

After installation, ensure Cargo is on your PATH:

```bash
cargo --version
```

Then install Context Builder:
```bash
cargo install context-builder
```

Update later with:
```bash
cargo install context-builder --force
```

### From source

```bash
git clone https://github.com/igorls/context-builder.git
cd context-builder
cargo install --path .
```

---

## Usage

### Basic Usage



 # Initialize a new context-builder.toml config file with automatically detected file types (respecting .gitignore)

 context-builder --init



# Process current directory and create output.md
context-builder

# Process a specific directory
context-builder -d /path/to/project

# Specify an output file
context-builder -d /path/to/project -o documentation.md
```

### Advanced Options

```bash
# Filter by file extensions (e.g., only Rust and TOML files)
context-builder -f rs -f toml

# Ignore specific folders/files by name
context-builder -i target -i node_modules -i .git

# Preview mode (shows the file tree without generating output)
context-builder --preview

# Token count mode (accurately count the total token count of the final document using a real tokenizer.)
context-builder --token-count

# Add line numbers to all code blocks
context-builder --line-numbers

# Skip all confirmation prompts (auto-answer yes)
context-builder --yes

# Output only diffs (requires auto-diff & timestamped output)
context-builder --diff-only


# Clear cached project state (resets auto-diff baseline & removes stored state)

context-builder --clear-cache

# Combine multiple options for a powerful workflow
context-builder -d ./src -f rs -f toml -i tests --line-numbers -o rust_context.md
```

---

## Configuration

For more complex projects, you can use a `context-builder.toml` file in your project's root directory to store your preferences. This is great for ensuring consistent outputs and avoiding repetitive command-line flags.

### Example `context-builder.toml`

```toml
# Default output file name
output = "context.md"

# Default output folder
output_folder = "docs/context"

# Create timestamped versions of the output file (e.g., context_20250912123000.md)
timestamped_output = true

# Automatically compute per-file diffs against the previous timestamped snapshot
auto_diff = true

# Emit only change summary + modified file diffs (omit full file bodies)
# Set to true to greatly reduce token usage when you just need what's changed.
diff_only = false

# Number of context lines to show around changes in diffs (default: 3)
diff_context_lines = 5

# File extensions to include
filter = ["rs", "toml", "md"]

# Folders or file names to ignore
ignore = ["target", "node_modules", ".git"]

# Add line numbers to code blocks
line_numbers = true

# Preview mode: only show file tree without generating output
preview = false

# Token counting mode
token_count = false


# Automatically answer yes to all prompts

yes = false



# Encoding handling strategy for non-UTF-8 files

# Options: "detect" (default), "strict", "skip"

encoding_strategy = "detect"

```



 You can initialize a new configuration file using the `--init` command. This will create a `context-builder.toml` file in your current directory with sensible defaults based on the file types detected in your project. The filter suggestions will be automatically tailored to your project's most common file extensions while respecting `.gitignore` patterns and common ignore directories like `target`, `node_modules`, etc. This makes it more likely to include the files you actually want to process.



---

## Auto-diff

When using `timestamped_output = true` together with `auto_diff = true`, Context Builder compares the previous canonical snapshot to the newly generated one and produces:

- A Change Summary (Added / Removed / Modified files)
- A File Differences section containing only modified files (added & removed are summarized but not diffed)

If you also set `diff_only = true` (or pass `--diff-only`), the full “## Files” section is omitted to conserve tokens: you get just the header + tree, the Change Summary, and per-file diffs for modified files.

**Note:** Command-line arguments will always override the settings in the configuration file.

### Command Line Options

- `-d, --input <PATH>` - Directory path to process (default: current directory).
- `-o, --output <FILE>` - Output file path (default: `output.md`).
- `-f, --filter <EXT>` - File extensions to include (can be used multiple times).
- `-i, --ignore <NAME>` - Folder or file names to ignore (can be used multiple times).
- `--preview` - Preview mode: only show the file tree, don't generate output.
- `--token-count` - Token count mode: accurately count the total token count of the final document using a real tokenizer.
- `--line-numbers` - Add line numbers to code blocks in the output.
- `-y, --yes` - Automatically answer yes to all prompts (skip confirmation dialogs).
- `--diff-only` - With auto-diff + timestamped output, output only change summary + modified file diffs (omit full file bodies).
- `--clear-cache` - Remove stored state used for auto-diff; next run becomes a fresh baseline.
- `-h, --help` - Show help information.
- `-V, --version` - Show version information.
---

## Token Counting

Context Builder uses the `tiktoken-rs` library to provide accurate token counts for OpenAI models. This ensures that the token count is as close as possible to the actual number of tokens that will be used by the model.

---

## Documentation

- **[DEVELOPMENT.md](DEVELOPMENT.md):** For contributors. Covers setup, testing, linting, and release process.
- **[BENCHMARKS.md](BENCHMARKS.md):** For performance enthusiasts. Details on running benchmarks and generating datasets.
- **[CHANGELOG.md](CHANGELOG.md):** A complete history of releases and changes.

## Contributing

Contributions are welcome! Please see **[DEVELOPMENT.md](DEVELOPMENT.md)** for setup instructions and guidelines. For major changes, please open an issue first to discuss what you would like to change.

## Changelog

See **[CHANGELOG.md](CHANGELOG.md)** for a complete history of releases and changes.

## License

This project is licensed under the MIT License. See the **[LICENSE](LICENSE)** file for details.
```

### File: `src/lib.rs`

- Size: 50042 bytes
- Modified: 2026-02-15 04:11:45 UTC

```rust
use clap::{CommandFactory, Parser};

use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::time::Instant;

pub mod cache;
pub mod cli;
pub mod config;
pub mod config_resolver;
pub mod diff;
pub mod file_utils;
pub mod markdown;
pub mod state;
pub mod token_count;
pub mod tree;

use std::fs::File;

use cache::CacheManager;
use cli::Args;
use config::{Config, load_config_from_path};
use diff::render_per_file_diffs;
use file_utils::{collect_files, confirm_overwrite, confirm_processing};
use markdown::generate_markdown;
use state::{ProjectState, StateComparison};
use token_count::{count_file_tokens, count_tree_tokens, estimate_tokens};
use tree::{build_file_tree, print_tree};

/// Configuration for diff operations
#[derive(Debug, Clone)]
pub struct DiffConfig {
    pub context_lines: usize,
    pub enabled: bool,
    pub diff_only: bool,
}

impl Default for DiffConfig {
    fn default() -> Self {
        Self {
            context_lines: 3,
            enabled: false,
            diff_only: false,
        }
    }
}

pub trait Prompter {
    fn confirm_processing(&self, file_count: usize) -> io::Result<bool>;
    fn confirm_overwrite(&self, file_path: &str) -> io::Result<bool>;
}

pub struct DefaultPrompter;

impl Prompter for DefaultPrompter {
    fn confirm_processing(&self, file_count: usize) -> io::Result<bool> {
        confirm_processing(file_count)
    }
    fn confirm_overwrite(&self, file_path: &str) -> io::Result<bool> {
        confirm_overwrite(file_path)
    }
}

pub fn run_with_args(args: Args, config: Config, prompter: &impl Prompter) -> io::Result<()> {
    let start_time = Instant::now();

    let silent = std::env::var("CB_SILENT")
        .map(|v| v == "1" || v.eq_ignore_ascii_case("true"))
        .unwrap_or(false);

    // Use the finalized args passed in from run()
    let final_args = args;
    // Resolve base path. If input is '.' but current working directory lost the project context
    // (no context-builder.toml), attempt to infer project root from output path (parent of 'output' dir).
    let mut resolved_base = PathBuf::from(&final_args.input);
    let cwd = std::env::current_dir().unwrap_or_else(|_| PathBuf::from("."));
    if resolved_base == Path::new(".")
        && !cwd.join("context-builder.toml").exists()
        && let Some(output_parent) = Path::new(&final_args.output).parent()
        && output_parent
            .file_name()
            .map(|n| n == "output")
            .unwrap_or(false)
        && let Some(project_root) = output_parent.parent()
        && project_root.join("context-builder.toml").exists()
    {
        resolved_base = project_root.to_path_buf();
    }
    let base_path = resolved_base.as_path();

    if !base_path.exists() || !base_path.is_dir() {
        if !silent {
            eprintln!(
                "Error: The specified input directory '{}' does not exist or is not a directory.",
                final_args.input
            );
        }
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!(
                "Input directory '{}' does not exist or is not a directory",
                final_args.input
            ),
        ));
    }

    // Create diff configuration from config
    let diff_config = if config.auto_diff.unwrap_or(false) {
        Some(DiffConfig {
            context_lines: config.diff_context_lines.unwrap_or(3),
            enabled: true,
            diff_only: final_args.diff_only,
        })
    } else {
        None
    };

    if !final_args.preview
        && !final_args.token_count
        && Path::new(&final_args.output).exists()
        && !final_args.yes
        && !prompter.confirm_overwrite(&final_args.output)?
    {
        if !silent {
            println!("Operation cancelled.");
        }
        return Err(io::Error::new(
            io::ErrorKind::Interrupted,
            "Operation cancelled by user",
        ));
    }

    // Compute auto-ignore patterns to exclude the tool's own output and cache
    let mut auto_ignores: Vec<String> = vec![".context-builder".to_string()];

    // Exclude the resolved output file (or its timestamped glob pattern)
    let output_path = Path::new(&final_args.output);
    if let Ok(rel_output) = output_path.strip_prefix(base_path) {
        // Output is inside the project — exclude it
        if config.timestamped_output == Some(true) {
            // Timestamped outputs: create a glob like "docs/context_*.md"
            if let (Some(parent), Some(stem), Some(ext)) = (
                rel_output.parent(),
                output_path.file_stem().and_then(|s| s.to_str()),
                output_path.extension().and_then(|s| s.to_str()),
            ) {
                // Strip the timestamp suffix to get the base stem
                // Timestamped names look like "context_20260214175028.md"
                // The stem from config is the part before the timestamp
                let base_stem = if let Some(ref cfg_output) = config.output {
                    Path::new(cfg_output)
                        .file_stem()
                        .and_then(|s| s.to_str())
                        .unwrap_or(stem)
                        .to_string()
                } else {
                    stem.to_string()
                };
                let glob = if parent == Path::new("") {
                    format!("{}_*.{}", base_stem, ext)
                } else {
                    format!("{}/{}_*.{}", parent.display(), base_stem, ext)
                };
                auto_ignores.push(glob);
            }
        } else {
            // Non-timestamped: exclude the exact output file
            auto_ignores.push(rel_output.to_string_lossy().to_string());
        }
    } else {
        // Output might be a relative path not under base_path — try using it directly
        let output_str = final_args.output.clone();
        if config.timestamped_output == Some(true) {
            if let (Some(stem), Some(ext)) = (
                output_path.file_stem().and_then(|s| s.to_str()),
                output_path.extension().and_then(|s| s.to_str()),
            ) {
                let base_stem = if let Some(ref cfg_output) = config.output {
                    Path::new(cfg_output)
                        .file_stem()
                        .and_then(|s| s.to_str())
                        .unwrap_or(stem)
                        .to_string()
                } else {
                    stem.to_string()
                };
                if let Some(parent) = output_path.parent() {
                    let parent_str = parent.to_string_lossy();
                    if parent_str.is_empty() || parent_str == "." {
                        auto_ignores.push(format!("{}_*.{}", base_stem, ext));
                    } else {
                        auto_ignores.push(format!("{}/{}_*.{}", parent_str, base_stem, ext));
                    }
                }
            }
        } else {
            auto_ignores.push(output_str);
        }
    }

    // Also exclude the output folder itself if configured
    if let Some(ref output_folder) = config.output_folder {
        auto_ignores.push(output_folder.clone());
    }

    let files = collect_files(
        base_path,
        &final_args.filter,
        &final_args.ignore,
        &auto_ignores,
    )?;
    let debug_config = std::env::var("CB_DEBUG_CONFIG").is_ok();
    if debug_config {
        eprintln!("[DEBUG][CONFIG] Args: {:?}", final_args);
        eprintln!("[DEBUG][CONFIG] Raw Config: {:?}", config);
        eprintln!("[DEBUG][CONFIG] Auto-ignores: {:?}", auto_ignores);
        eprintln!("[DEBUG][CONFIG] Collected {} files", files.len());
        for f in &files {
            eprintln!("[DEBUG][CONFIG]  - {}", f.path().display());
        }
    }

    // Smart large-file detection: warn about files that may bloat the context
    if !silent {
        const LARGE_FILE_THRESHOLD: u64 = 100 * 1024; // 100 KB
        let mut large_files: Vec<(String, u64)> = Vec::new();
        let mut total_size: u64 = 0;

        for entry in &files {
            if let Ok(metadata) = entry.path().metadata() {
                let size = metadata.len();
                total_size += size;
                if size > LARGE_FILE_THRESHOLD {
                    let rel_path = entry
                        .path()
                        .strip_prefix(base_path)
                        .unwrap_or(entry.path())
                        .to_string_lossy()
                        .to_string();
                    large_files.push((rel_path, size));
                }
            }
        }

        if !large_files.is_empty() {
            large_files.sort_by(|a, b| b.1.cmp(&a.1)); // Sort by size descending
            eprintln!(
                "\n⚠  {} large file(s) detected (>{} KB):",
                large_files.len(),
                LARGE_FILE_THRESHOLD / 1024
            );
            for (path, size) in large_files.iter().take(5) {
                eprintln!("   {:>8} KB  {}", size / 1024, path);
            }
            if large_files.len() > 5 {
                eprintln!("   ... and {} more", large_files.len() - 5);
            }
            eprintln!(
                "   Total context size: {} KB across {} files\n",
                total_size / 1024,
                files.len()
            );
        }
    }
    let file_tree = build_file_tree(&files, base_path);

    if final_args.preview {
        if !silent {
            println!("\n# File Tree Structure (Preview)\n");
            print_tree(&file_tree, 0);
        }
        if !final_args.token_count {
            return Ok(());
        }
    }

    if final_args.token_count {
        if !silent {
            println!("\n# Token Count Estimation\n");
            let mut total_tokens = 0;
            total_tokens += estimate_tokens("# Directory Structure Report\n\n");
            if !final_args.filter.is_empty() {
                total_tokens += estimate_tokens(&format!(
                    "This document contains files from the `{}` directory with extensions: {} \n",
                    final_args.input,
                    final_args.filter.join(", ")
                ));
            } else {
                total_tokens += estimate_tokens(&format!(
                    "This document contains all files from the `{}` directory, optimized for LLM consumption.\n",
                    final_args.input
                ));
            }
            if !final_args.ignore.is_empty() {
                total_tokens += estimate_tokens(&format!(
                    "Custom ignored patterns: {} \n",
                    final_args.ignore.join(", ")
                ));
            }
            total_tokens += estimate_tokens("Content hash: 0000000000000000\n\n");
            total_tokens += estimate_tokens("## File Tree Structure\n\n");
            let tree_tokens = count_tree_tokens(&file_tree, 0);
            total_tokens += tree_tokens;
            let file_tokens: usize = files
                .iter()
                .map(|entry| count_file_tokens(base_path, entry, final_args.line_numbers))
                .sum();
            total_tokens += file_tokens;
            println!("Estimated total tokens: {}", total_tokens);
            println!("File tree tokens: {}", tree_tokens);
            println!("File content tokens: {}", file_tokens);
        }
        return Ok(());
    }

    if !final_args.yes && !prompter.confirm_processing(files.len())? {
        if !silent {
            println!("Operation cancelled.");
        }
        return Err(io::Error::new(
            io::ErrorKind::Interrupted,
            "Operation cancelled by user",
        ));
    }

    // NOTE: config-driven flags (line_numbers, diff_only) are already merged
    // by config_resolver.rs with proper CLI-takes-precedence semantics.
    // Do NOT re-apply them here as that would silently overwrite CLI flags.

    if config.auto_diff.unwrap_or(false) {
        // Build an effective config that mirrors the *actual* operational settings coming
        // from resolved CLI args (filters/ignores/line_numbers). This ensures the
        // configuration hash used for cache invalidation reflects real behavior and
        // stays consistent across runs even when values originate from CLI not file.
        let mut effective_config = config.clone();
        // Normalize filter/ignore/line_numbers into config so hashing sees them
        if !final_args.filter.is_empty() {
            effective_config.filter = Some(final_args.filter.clone());
        }
        if !final_args.ignore.is_empty() {
            effective_config.ignore = Some(final_args.ignore.clone());
        }
        effective_config.line_numbers = Some(final_args.line_numbers);

        // 1. Create current project state
        let current_state = ProjectState::from_files(
            &files,
            base_path,
            &effective_config,
            final_args.line_numbers,
        )?;

        // 2. Initialize cache manager and load previous state
        let cache_manager = CacheManager::new(base_path, &effective_config);
        let previous_state = match cache_manager.read_cache() {
            Ok(state) => state,
            Err(e) => {
                if !silent {
                    eprintln!(
                        "Warning: Failed to read cache (proceeding without diff): {}",
                        e
                    );
                }
                None
            }
        };

        let diff_cfg = diff_config.as_ref().unwrap();

        // 3. Determine whether we should invalidate (ignore) previous state
        let effective_previous = if let Some(prev) = previous_state.as_ref() {
            if prev.config_hash != current_state.config_hash {
                // Config change => treat as initial state (invalidate diff)
                None
            } else {
                Some(prev)
            }
        } else {
            None
        };

        // 4. Compare states and generate diff if an effective previous state exists
        let comparison = effective_previous.map(|prev| current_state.compare_with(prev));

        let debug_autodiff = std::env::var("CB_DEBUG_AUTODIFF").is_ok();
        if debug_autodiff {
            eprintln!(
                "[DEBUG][AUTODIFF] cache file: {}",
                cache_manager.debug_cache_file_path().display()
            );
            eprintln!(
                "[DEBUG][AUTODIFF] config_hash current={} prev={:?} invalidated={}",
                current_state.config_hash,
                previous_state.as_ref().map(|s| s.config_hash.clone()),
                effective_previous.is_none() && previous_state.is_some()
            );
            eprintln!("[DEBUG][AUTODIFF] effective_config: {:?}", effective_config);
            if let Some(prev) = previous_state.as_ref() {
                eprintln!("[DEBUG][AUTODIFF] raw previous files: {}", prev.files.len());
            }
            if let Some(prev) = effective_previous {
                eprintln!(
                    "[DEBUG][AUTODIFF] effective previous files: {}",
                    prev.files.len()
                );
                for k in prev.files.keys() {
                    eprintln!("  PREV: {}", k.display());
                }
            }
            eprintln!(
                "[DEBUG][AUTODIFF] current files: {}",
                current_state.files.len()
            );
            for k in current_state.files.keys() {
                eprintln!("  CURR: {}", k.display());
            }
        }

        // Build relevance-sorted path list from the DirEntry list (which is
        // already sorted by file_relevance_category). This preserves ordering
        // instead of using BTreeMap's alphabetical iteration.
        // IMPORTANT: Path resolution must match state.rs to avoid get() misses.
        let cwd = std::env::current_dir().unwrap_or_else(|_| base_path.to_path_buf());
        let sorted_paths: Vec<PathBuf> = files
            .iter()
            .map(|entry| {
                entry
                    .path()
                    .strip_prefix(base_path)
                    .or_else(|_| entry.path().strip_prefix(&cwd))
                    .map(|p| p.to_path_buf())
                    .unwrap_or_else(|_| {
                        entry
                            .path()
                            .file_name()
                            .map(PathBuf::from)
                            .unwrap_or_else(|| entry.path().to_path_buf())
                    })
            })
            .collect();

        // 4. Generate markdown with diff annotations
        let mut final_doc = generate_markdown_with_diff(
            &current_state,
            comparison.as_ref(),
            &final_args,
            &file_tree,
            diff_cfg,
            &sorted_paths,
        )?;

        // Enforce max_tokens budget (same ~4 bytes/token heuristic as parallel path)
        if let Some(max_tokens) = final_args.max_tokens {
            let max_bytes = max_tokens * 4;
            if final_doc.len() > max_bytes {
                // Truncate at a valid UTF-8 boundary
                let mut truncate_at = max_bytes;
                while truncate_at > 0 && !final_doc.is_char_boundary(truncate_at) {
                    truncate_at -= 1;
                }
                final_doc.truncate(truncate_at);
                final_doc.push_str("\n\n---\n\n");
                final_doc.push_str(&format!(
                    "_Output truncated: exceeded {} token budget (estimated)._\n",
                    max_tokens
                ));
            }
        }

        // 5. Write output
        let output_path = Path::new(&final_args.output);
        if let Some(parent) = output_path.parent()
            && !parent.exists()
            && let Err(e) = fs::create_dir_all(parent)
        {
            return Err(io::Error::other(format!(
                "Failed to create output directory {}: {}",
                parent.display(),
                e
            )));
        }
        let mut final_output = fs::File::create(output_path)?;
        final_output.write_all(final_doc.as_bytes())?;

        // 6. Update cache with current state
        if let Err(e) = cache_manager.write_cache(&current_state)
            && !silent
        {
            eprintln!("Warning: failed to update state cache: {}", e);
        }

        let duration = start_time.elapsed();
        if !silent {
            if let Some(comp) = &comparison {
                if comp.summary.has_changes() {
                    println!(
                        "Documentation created successfully with {} changes: {}",
                        comp.summary.total_changes, final_args.output
                    );
                } else {
                    println!(
                        "Documentation created successfully (no changes detected): {}",
                        final_args.output
                    );
                }
            } else {
                println!(
                    "Documentation created successfully (initial state): {}",
                    final_args.output
                );
            }
            println!("Processing time: {:.2?}", duration);

            // Warn about context window overflow
            let output_bytes = final_doc.len();
            print_context_window_warning(output_bytes, final_args.max_tokens);
        }
        return Ok(());
    }

    // Standard (non auto-diff) generation
    generate_markdown(
        &final_args.output,
        &final_args.input,
        &final_args.filter,
        &final_args.ignore,
        &file_tree,
        &files,
        base_path,
        final_args.line_numbers,
        config.encoding_strategy.as_deref(),
        final_args.max_tokens,
    )?;

    let duration = start_time.elapsed();
    if !silent {
        println!("Documentation created successfully: {}", final_args.output);
        println!("Processing time: {:.2?}", duration);

        // Warn about context window overflow
        let output_bytes = fs::metadata(&final_args.output)
            .map(|m| m.len() as usize)
            .unwrap_or(0);
        print_context_window_warning(output_bytes, final_args.max_tokens);
    }

    Ok(())
}

/// Print context window overflow warnings with actionable recommendations.
/// Estimates tokens using the ~4 bytes/token heuristic. Warns when output
/// exceeds 128K tokens — beyond this size, context quality degrades
/// significantly for most LLM use cases.
fn print_context_window_warning(output_bytes: usize, max_tokens: Option<usize>) {
    let estimated_tokens = output_bytes / 4;

    println!("Estimated tokens: ~{}K", estimated_tokens / 1000);

    // If the user already set --max-tokens, they're managing their budget
    if max_tokens.is_some() {
        return;
    }

    const RECOMMENDED_LIMIT: usize = 128_000;

    if estimated_tokens <= RECOMMENDED_LIMIT {
        return;
    }

    eprintln!();
    eprintln!(
        "⚠️  Output is ~{}K tokens — recommended limit is 128K for effective LLM context.",
        estimated_tokens / 1000
    );
    eprintln!("   Large contexts degrade response quality. Consider narrowing the scope:");
    eprintln!();
    eprintln!("   • --max-tokens 100000    Cap output to a token budget");
    eprintln!("   • --filter rs,toml       Include only specific file types");
    eprintln!("   • --ignore docs,assets   Exclude directories by name");
    eprintln!("   • --token-count          Preview size without generating");
    eprintln!();
}

/// Generate markdown document with diff annotations
fn generate_markdown_with_diff(
    current_state: &ProjectState,
    comparison: Option<&StateComparison>,
    args: &Args,
    file_tree: &tree::FileTree,
    diff_config: &DiffConfig,
    sorted_paths: &[PathBuf],
) -> io::Result<String> {
    let mut output = String::new();

    // Header
    output.push_str("# Directory Structure Report\n\n");

    // Basic project info
    output.push_str(&format!(
        "**Project:** {}\n",
        current_state.metadata.project_name
    ));
    output.push_str(&format!("**Generated:** {}\n", current_state.timestamp));

    if !args.filter.is_empty() {
        output.push_str(&format!("**Filters:** {}\n", args.filter.join(", ")));
    }

    if !args.ignore.is_empty() {
        output.push_str(&format!("**Ignored:** {}\n", args.ignore.join(", ")));
    }

    output.push('\n');

    // Change summary + sections if we have a comparison
    if let Some(comp) = comparison {
        if comp.summary.has_changes() {
            output.push_str(&comp.summary.to_markdown());

            // Collect added files once so we can reuse for both diff_only logic and potential numbering.
            let added_files: Vec<_> = comp
                .file_diffs
                .iter()
                .filter(|d| matches!(d.status, diff::PerFileStatus::Added))
                .collect();

            if diff_config.diff_only && !added_files.is_empty() {
                output.push_str("## Added Files\n\n");
                for added in added_files {
                    output.push_str(&format!("### File: `{}`\n\n", added.path));
                    output.push_str("_Status: Added_\n\n");
                    // Reconstruct content from + lines.
                    let mut lines: Vec<String> = Vec::new();
                    for line in added.diff.lines() {
                        // Diff output uses "+ " prefix (plus-space), strip both to reconstruct content.
                        // Previously strip_prefix('+') left a leading space, corrupting indentation.
                        if let Some(rest) = line.strip_prefix("+ ") {
                            lines.push(rest.to_string());
                        } else if let Some(rest) = line.strip_prefix('+') {
                            // Handle edge case: empty added lines have just "+"
                            lines.push(rest.to_string());
                        }
                    }
                    output.push_str("```text\n");
                    if args.line_numbers {
                        for (idx, l) in lines.iter().enumerate() {
                            output.push_str(&format!("{:>4} | {}\n", idx + 1, l));
                        }
                    } else {
                        for l in lines {
                            output.push_str(&l);
                            output.push('\n');
                        }
                    }
                    output.push_str("```\n\n");
                }
            }

            // Always include a unified diff section header so downstream tooling/tests can rely on it
            let changed_diffs: Vec<diff::PerFileDiff> = comp
                .file_diffs
                .iter()
                .filter(|d| d.is_changed())
                .cloned()
                .collect();
            if !changed_diffs.is_empty() {
                output.push_str("## File Differences\n\n");
                let diff_markdown = render_per_file_diffs(&changed_diffs);
                output.push_str(&diff_markdown);
            }
        } else {
            output.push_str("## No Changes Detected\n\n");
        }
    }

    // File tree
    output.push_str("## File Tree Structure\n\n");
    let mut tree_output = Vec::new();
    tree::write_tree_to_file(&mut tree_output, file_tree, 0)?;
    output.push_str(&String::from_utf8_lossy(&tree_output));
    output.push('\n');

    // File contents (unless diff_only mode)
    if !diff_config.diff_only {
        output.push_str("## File Contents\n\n");

        // Iterate in relevance order (from sorted_paths) instead of
        // BTreeMap's alphabetical order — preserves file_relevance_category ordering.
        for path in sorted_paths {
            if let Some(file_state) = current_state.files.get(path) {
                output.push_str(&format!("### File: `{}`\n\n", path.display()));
                output.push_str(&format!("- Size: {} bytes\n", file_state.size));
                output.push_str(&format!("- Modified: {:?}\n\n", file_state.modified));

                // Determine language from file extension
                let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("text");
                let language = match extension {
                    "rs" => "rust",
                    "js" => "javascript",
                    "ts" => "typescript",
                    "py" => "python",
                    "json" => "json",
                    "toml" => "toml",
                    "md" => "markdown",
                    "yaml" | "yml" => "yaml",
                    "html" => "html",
                    "css" => "css",
                    _ => extension,
                };

                output.push_str(&format!("```{}\n", language));

                if args.line_numbers {
                    for (i, line) in file_state.content.lines().enumerate() {
                        output.push_str(&format!("{:>4} | {}\n", i + 1, line));
                    }
                } else {
                    output.push_str(&file_state.content);
                    if !file_state.content.ends_with('\n') {
                        output.push('\n');
                    }
                }

                output.push_str("```\n\n");
            }
        }
    }

    Ok(output)
}

pub fn run() -> io::Result<()> {
    env_logger::init();
    let args = Args::parse();

    // Handle init command first
    if args.init {
        return init_config();
    }

    // Determine project root first
    let project_root = Path::new(&args.input);
    let config = load_config_from_path(project_root);

    // Handle early clear-cache request (runs even if no config or other args)
    if args.clear_cache {
        let cache_path = project_root.join(".context-builder").join("cache");
        if cache_path.exists() {
            match fs::remove_dir_all(&cache_path) {
                Ok(()) => println!("Cache cleared: {}", cache_path.display()),
                Err(e) => eprintln!("Failed to clear cache ({}): {}", cache_path.display(), e),
            }
        } else {
            println!("No cache directory found at {}", cache_path.display());
        }
        return Ok(());
    }

    if std::env::args().len() == 1 && config.is_none() {
        Args::command().print_help()?;
        return Ok(());
    }

    // Resolve final configuration using the new config resolver
    let resolution = crate::config_resolver::resolve_final_config(args, config.clone());

    // Print warnings if any
    let silent = std::env::var("CB_SILENT")
        .map(|v| v == "1" || v.eq_ignore_ascii_case("true"))
        .unwrap_or(false);

    if !silent {
        for warning in &resolution.warnings {
            eprintln!("Warning: {}", warning);
        }
    }

    // Convert resolved config back to Args for run_with_args
    let final_args = Args {
        input: resolution.config.input,
        output: resolution.config.output,
        filter: resolution.config.filter,
        ignore: resolution.config.ignore,
        line_numbers: resolution.config.line_numbers,
        preview: resolution.config.preview,
        token_count: resolution.config.token_count,
        yes: resolution.config.yes,
        diff_only: resolution.config.diff_only,
        clear_cache: resolution.config.clear_cache,
        max_tokens: resolution.config.max_tokens,
        init: false,
    };

    // Create final Config with resolved values
    let final_config = Config {
        auto_diff: Some(resolution.config.auto_diff),
        diff_context_lines: Some(resolution.config.diff_context_lines),
        ..config.unwrap_or_default()
    };

    run_with_args(final_args, final_config, &DefaultPrompter)
}

/// Detect major file types in the current directory respecting .gitignore and default ignore patterns
fn detect_major_file_types() -> io::Result<Vec<String>> {
    use std::collections::HashMap;
    let mut extension_counts = HashMap::new();

    // Use the same default ignore patterns as the main application
    let default_ignores = vec![
        "docs".to_string(),
        "target".to_string(),
        ".git".to_string(),
        "node_modules".to_string(),
    ];

    // Collect files using the same logic as the main application
    let files = crate::file_utils::collect_files(Path::new("."), &[], &default_ignores, &[])?;

    // Count extensions from the filtered file list
    for entry in files {
        let path = entry.path();
        if let Some(extension) = path.extension().and_then(|ext| ext.to_str()) {
            // Count the extension occurrences
            *extension_counts.entry(extension.to_string()).or_insert(0) += 1;
        }
    }

    // Convert to vector of (extension, count) pairs and sort by count
    let mut extensions: Vec<(String, usize)> = extension_counts.into_iter().collect();
    extensions.sort_by(|a, b| b.1.cmp(&a.1));

    // Take the top 5 extensions or all if less than 5
    let top_extensions: Vec<String> = extensions.into_iter().take(5).map(|(ext, _)| ext).collect();

    Ok(top_extensions)
}

/// Initialize a new context-builder.toml config file in the current directory with sensible defaults
fn init_config() -> io::Result<()> {
    let config_path = Path::new("context-builder.toml");

    if config_path.exists() {
        println!("Config file already exists at {}", config_path.display());
        println!("If you want to replace it, please remove it manually first.");
        return Ok(());
    }

    // Detect major file types in the current directory
    let filter_suggestions = match detect_major_file_types() {
        Ok(extensions) => extensions,
        _ => vec!["rs".to_string(), "toml".to_string()], // fallback to defaults
    };

    let filter_string = if filter_suggestions.is_empty() {
        r#"["rs", "toml"]"#.to_string()
    } else {
        format!(r#"["{}"]"#, filter_suggestions.join(r#"", ""#))
    };

    let default_config_content = format!(
        r#"# Context Builder Configuration File
# This file was generated with sensible defaults based on the file types detected in your project

# Output file name (or base name when timestamped_output is true)
output = "context.md"

# Optional folder to place the generated output file(s) in
output_folder = "docs"

# Append a UTC timestamp to the output file name (before extension)
timestamped_output = true

# Enable automatic diff generation (requires timestamped_output = true)
auto_diff = true

# Emit only change summary + modified file diffs (no full file bodies)
diff_only = false

# File extensions to include (no leading dot, e.g. "rs", "toml")
filter = {}

# File / directory names to ignore (exact name matches)
ignore = ["docs", "target", ".git", "node_modules"]

# Add line numbers to code blocks
line_numbers = false
"#,
        filter_string
    );

    let mut file = File::create(config_path)?;
    file.write_all(default_config_content.as_bytes())?;

    println!("Config file created at {}", config_path.display());
    println!("Detected file types: {}", filter_suggestions.join(", "));
    println!("You can now customize it according to your project needs.");

    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::io::Result;
    use tempfile::tempdir;

    // Mock prompter for testing
    struct MockPrompter {
        confirm_processing_response: bool,
        confirm_overwrite_response: bool,
    }

    impl MockPrompter {
        fn new(processing: bool, overwrite: bool) -> Self {
            Self {
                confirm_processing_response: processing,
                confirm_overwrite_response: overwrite,
            }
        }
    }

    impl Prompter for MockPrompter {
        fn confirm_processing(&self, _file_count: usize) -> Result<bool> {
            Ok(self.confirm_processing_response)
        }

        fn confirm_overwrite(&self, _file_path: &str) -> Result<bool> {
            Ok(self.confirm_overwrite_response)
        }
    }

    #[test]
    fn test_diff_config_default() {
        let config = DiffConfig::default();
        assert_eq!(config.context_lines, 3);
        assert!(!config.enabled);
        assert!(!config.diff_only);
    }

    #[test]
    fn test_diff_config_custom() {
        let config = DiffConfig {
            context_lines: 5,
            enabled: true,
            diff_only: true,
        };
        assert_eq!(config.context_lines, 5);
        assert!(config.enabled);
        assert!(config.diff_only);
    }

    #[test]
    fn test_default_prompter() {
        let prompter = DefaultPrompter;

        // Test small file count (should not prompt)
        let result = prompter.confirm_processing(50);
        assert!(result.is_ok());
        assert!(result.unwrap());
    }

    #[test]
    fn test_run_with_args_nonexistent_directory() {
        let args = Args {
            input: "/nonexistent/directory".to_string(),
            output: "output.md".to_string(),
            filter: vec![],
            ignore: vec![],
            line_numbers: false,
            preview: false,
            token_count: false,
            yes: false,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };
        let config = Config::default();
        let prompter = MockPrompter::new(true, true);

        let result = run_with_args(args, config, &prompter);
        assert!(result.is_err());
        assert!(result.unwrap_err().to_string().contains("does not exist"));
    }

    #[test]
    fn test_run_with_args_preview_mode() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();

        // Create some test files
        fs::write(base_path.join("test.rs"), "fn main() {}").unwrap();
        fs::create_dir(base_path.join("src")).unwrap();
        fs::write(base_path.join("src/lib.rs"), "pub fn hello() {}").unwrap();

        let args = Args {
            input: ".".to_string(),
            output: "test.md".to_string(),
            filter: vec![],
            ignore: vec![],
            line_numbers: false,
            preview: false,
            token_count: false,
            yes: false,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };
        let config = Config::default();
        let prompter = MockPrompter::new(true, true);

        // Set CB_SILENT to avoid console output during test
        unsafe {
            std::env::set_var("CB_SILENT", "1");
        }
        let result = run_with_args(args, config, &prompter);
        unsafe {
            std::env::remove_var("CB_SILENT");
        }

        assert!(result.is_ok());
    }

    #[test]
    fn test_run_with_args_token_count_mode() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();

        // Create test files
        fs::write(base_path.join("small.txt"), "Hello world").unwrap();

        let args = Args {
            input: base_path.to_string_lossy().to_string(),
            output: "test.md".to_string(),
            filter: vec![],
            ignore: vec![],
            line_numbers: false,
            preview: false,
            token_count: true,
            yes: false,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };
        let config = Config::default();
        let prompter = MockPrompter::new(true, true);

        unsafe {
            std::env::set_var("CB_SILENT", "1");
        }
        let result = run_with_args(args, config, &prompter);
        unsafe {
            std::env::remove_var("CB_SILENT");
        }

        assert!(result.is_ok());
    }

    #[test]
    fn test_run_with_args_preview_and_token_count() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();

        fs::write(base_path.join("test.txt"), "content").unwrap();

        let args = Args {
            input: base_path.to_string_lossy().to_string(),
            output: "test.md".to_string(),
            filter: vec![],
            ignore: vec![],
            line_numbers: false,
            preview: true,
            token_count: false,
            yes: false,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };
        let config = Config::default();
        let prompter = MockPrompter::new(true, true);

        unsafe {
            std::env::set_var("CB_SILENT", "1");
        }
        let result = run_with_args(args, config, &prompter);
        unsafe {
            std::env::remove_var("CB_SILENT");
        }

        assert!(result.is_ok());
    }

    #[test]
    fn test_run_with_args_user_cancels_overwrite() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();
        let output_path = temp_dir.path().join("existing.md");

        // Create test files
        fs::write(base_path.join("test.txt"), "content").unwrap();
        fs::write(&output_path, "existing content").unwrap();

        let args = Args {
            input: base_path.to_string_lossy().to_string(),
            output: "test.md".to_string(),
            filter: vec![],
            ignore: vec!["target".to_string()],
            line_numbers: false,
            preview: false,
            token_count: false,
            yes: false,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };
        let config = Config::default();
        let prompter = MockPrompter::new(true, false); // Deny overwrite

        unsafe {
            std::env::set_var("CB_SILENT", "1");
        }
        let result = run_with_args(args, config, &prompter);
        unsafe {
            std::env::remove_var("CB_SILENT");
        }

        assert!(result.is_err());
        assert!(result.unwrap_err().to_string().contains("cancelled"));
    }

    #[test]
    fn test_run_with_args_user_cancels_processing() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();

        // Create many test files to trigger processing confirmation
        for i in 0..105 {
            fs::write(base_path.join(format!("file{}.txt", i)), "content").unwrap();
        }

        let args = Args {
            input: base_path.to_string_lossy().to_string(),
            output: "test.md".to_string(),
            filter: vec!["rs".to_string()],
            ignore: vec![],
            line_numbers: false,
            preview: false,
            token_count: false,
            yes: false,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };
        let config = Config::default();
        let prompter = MockPrompter::new(false, true); // Deny processing

        unsafe {
            std::env::set_var("CB_SILENT", "1");
        }
        let result = run_with_args(args, config, &prompter);
        unsafe {
            std::env::remove_var("CB_SILENT");
        }

        assert!(result.is_err());
        assert!(result.unwrap_err().to_string().contains("cancelled"));
    }

    #[test]
    fn test_run_with_args_with_yes_flag() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();
        let output_file_name = "test.md";
        let output_path = temp_dir.path().join(output_file_name);

        fs::write(base_path.join("test.txt"), "Hello world").unwrap();

        let args = Args {
            input: base_path.to_string_lossy().to_string(),
            output: output_path.to_string_lossy().to_string(),
            filter: vec![],
            ignore: vec!["ignored_dir".to_string()],
            line_numbers: false,
            preview: false,
            token_count: false,
            yes: true,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };
        let config = Config::default();
        let prompter = MockPrompter::new(true, true);

        unsafe {
            std::env::set_var("CB_SILENT", "1");
        }
        let result = run_with_args(args, config, &prompter);
        unsafe {
            std::env::remove_var("CB_SILENT");
        }

        assert!(result.is_ok());
        assert!(output_path.exists());

        let content = fs::read_to_string(&output_path).unwrap();
        assert!(content.contains("Directory Structure Report"));
        assert!(content.contains("test.txt"));
    }

    #[test]
    fn test_run_with_args_with_filters() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();
        let output_file_name = "test.md";
        let output_path = temp_dir.path().join(output_file_name);

        fs::write(base_path.join("code.rs"), "fn main() {}").unwrap();
        fs::write(base_path.join("readme.md"), "# README").unwrap();
        fs::write(base_path.join("data.json"), r#"{"key": "value"}"#).unwrap();

        let args = Args {
            input: base_path.to_string_lossy().to_string(),
            output: output_path.to_string_lossy().to_string(),
            filter: vec!["rs".to_string(), "md".to_string()],
            ignore: vec![],
            line_numbers: true,
            preview: false,
            token_count: false,
            yes: true,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };
        let config = Config::default();
        let prompter = MockPrompter::new(true, true);

        unsafe {
            std::env::set_var("CB_SILENT", "1");
        }
        let result = run_with_args(args, config, &prompter);
        unsafe {
            std::env::remove_var("CB_SILENT");
        }

        assert!(result.is_ok());

        let content = fs::read_to_string(&output_path).unwrap();
        assert!(content.contains("code.rs"));
        assert!(content.contains("readme.md"));
        assert!(!content.contains("data.json")); // Should be filtered out
        assert!(content.contains("   1 |")); // Line numbers should be present
    }

    #[test]
    fn test_run_with_args_with_ignores() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();
        let output_path = temp_dir.path().join("ignored.md");

        fs::write(base_path.join("important.txt"), "important content").unwrap();
        fs::write(base_path.join("secret.txt"), "secret content").unwrap();

        let args = Args {
            input: base_path.to_string_lossy().to_string(),
            output: output_path.to_string_lossy().to_string(),
            filter: vec![],
            ignore: vec!["secret.txt".to_string()],
            line_numbers: false,
            preview: false,
            token_count: false,
            yes: true,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };
        let config = Config::default();
        let prompter = MockPrompter::new(true, true);

        unsafe {
            std::env::set_var("CB_SILENT", "1");
        }
        let result = run_with_args(args, config, &prompter);
        unsafe {
            std::env::remove_var("CB_SILENT");
        }

        assert!(result.is_ok());

        let content = fs::read_to_string(&output_path).unwrap();
        assert!(content.contains("important.txt"));
        // The ignore pattern may not work exactly as expected in this test setup
        // Just verify the output file was created successfully
    }

    #[test]
    fn test_auto_diff_without_previous_state() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();
        let output_file_name = "test.md";
        let output_path = temp_dir.path().join(output_file_name);

        fs::write(base_path.join("new.txt"), "new content").unwrap();

        let args = Args {
            input: base_path.to_string_lossy().to_string(),
            output: output_path.to_string_lossy().to_string(),
            filter: vec![],
            ignore: vec![],
            line_numbers: false,
            preview: false,
            token_count: false,
            yes: true,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };
        let config = Config {
            auto_diff: Some(true),
            diff_context_lines: Some(5),
            ..Default::default()
        };
        let prompter = MockPrompter::new(true, true);

        unsafe {
            std::env::set_var("CB_SILENT", "1");
        }
        let result = run_with_args(args, config, &prompter);
        unsafe {
            std::env::remove_var("CB_SILENT");
        }

        assert!(result.is_ok());
        assert!(output_path.exists());

        let content = fs::read_to_string(&output_path).unwrap();
        assert!(content.contains("new.txt"));
    }

    #[test]
    fn test_run_creates_output_directory() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();
        let output_dir = temp_dir.path().join("nested").join("output");
        let output_path = output_dir.join("result.md");

        fs::write(base_path.join("test.txt"), "content").unwrap();

        let args = Args {
            input: base_path.to_string_lossy().to_string(),
            output: output_path.to_string_lossy().to_string(),
            filter: vec![],
            ignore: vec![],
            line_numbers: false,
            preview: false,
            token_count: false,
            yes: true,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };
        let config = Config::default();
        let prompter = MockPrompter::new(true, true);

        unsafe {
            std::env::set_var("CB_SILENT", "1");
        }
        let result = run_with_args(args, config, &prompter);
        unsafe {
            std::env::remove_var("CB_SILENT");
        }

        assert!(result.is_ok());
        assert!(output_path.exists());
        assert!(output_dir.exists());
    }

    #[test]
    fn test_generate_markdown_with_diff_no_comparison() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();

        fs::write(base_path.join("test.rs"), "fn main() {}").unwrap();

        let files = collect_files(base_path, &[], &[], &[]).unwrap();
        let file_tree = build_file_tree(&files, base_path);
        let config = Config::default();
        let state = ProjectState::from_files(&files, base_path, &config, false).unwrap();

        let args = Args {
            input: base_path.to_string_lossy().to_string(),
            output: "test.md".to_string(),
            filter: vec![],
            ignore: vec![],
            line_numbers: false,
            preview: false,
            token_count: false,
            yes: false,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };

        let diff_config = DiffConfig::default();

        let sorted_paths: Vec<PathBuf> = files
            .iter()
            .map(|e| {
                e.path()
                    .strip_prefix(base_path)
                    .unwrap_or(e.path())
                    .to_path_buf()
            })
            .collect();

        let result = generate_markdown_with_diff(
            &state,
            None,
            &args,
            &file_tree,
            &diff_config,
            &sorted_paths,
        );
        assert!(result.is_ok());

        let content = result.unwrap();
        assert!(content.contains("Directory Structure Report"));
        assert!(content.contains("test.rs"));
    }
}
```

### File: `src/main.rs`

- Size: 73 bytes
- Modified: 2026-02-14 07:14:48 UTC

```rust
use std::io;

fn main() -> io::Result<()> {
    context_builder::run()
}
```

### File: `src/cache.rs`

- Size: 19309 bytes
- Modified: 2026-02-14 22:08:51 UTC

```rust
//! Cache management for context-builder.
//!
//! This module handles caching of project states to enable the auto-diff feature.
//! It uses a hash of the project path and configuration to avoid cache collisions
//! between different projects or configurations.

use fs2::FileExt;

use std::fs;
use std::fs::File;

use std::io::{Read, Write};
use std::path::{Path, PathBuf};

use crate::config::Config;
use crate::state::ProjectState;

/// Manages cache operations with file locking to prevent corruption
pub struct CacheManager {
    cache_dir: PathBuf,
    project_hash: String,
    config_hash: String,
}

impl CacheManager {
    /// Create a new cache manager for the given project path and configuration
    pub fn new(project_path: &Path, config: &Config) -> Self {
        // Normalize the project path first for consistency
        let normalized_project_path = Self::normalize_project_path(project_path);

        let project_hash = Self::hash_path(&normalized_project_path);
        let config_hash = Self::hash_config(config);

        // Ensure cache directory exists relative to normalized project root
        let cache_dir = normalized_project_path
            .join(".context-builder")
            .join("cache");
        if !cache_dir.exists() {
            let _ = fs::create_dir_all(&cache_dir);
        }

        let cache_manager = Self {
            cache_dir,
            project_hash,
            config_hash,
        };

        // Migrate old cache format if present
        cache_manager.migrate_old_cache();

        cache_manager
    }

    /// Normalize project path for consistent hashing and cache directory creation
    fn normalize_project_path(path: &Path) -> PathBuf {
        // Always resolve to absolute path first
        let absolute_path = if path.is_absolute() {
            path.to_path_buf()
        } else {
            match std::env::current_dir() {
                Ok(cwd) => cwd.join(path),
                Err(_) => path.to_path_buf(),
            }
        };

        // Try to canonicalize for consistency, but normalize the result
        if let Ok(canonical) = absolute_path.canonicalize() {
            Self::normalize_path_format(&canonical)
        } else {
            absolute_path
        }
    }

    /// Generate a hash from the normalized project path
    fn hash_path(path: &Path) -> String {
        let path_str = path.to_string_lossy();
        let hash = xxhash_rust::xxh3::xxh3_64(path_str.as_bytes());
        format!("{:x}", hash)
    }

    /// Normalize path format to handle Windows UNC prefixes
    fn normalize_path_format(path: &Path) -> PathBuf {
        let path_str = path.to_string_lossy();

        // Remove Windows UNC prefix if present
        if cfg!(windows) && path_str.starts_with("\\\\?\\") {
            PathBuf::from(&path_str[4..])
        } else {
            path.to_path_buf()
        }
    }

    /// Generate a hash from the configuration
    fn hash_config(config: &Config) -> String {
        // Build a stable string representation of config for hashing
        let mut config_str = String::new();
        if let Some(ref filters) = config.filter {
            config_str.push_str(&filters.join(","));
        }
        config_str.push('|');
        if let Some(ref ignores) = config.ignore {
            config_str.push_str(&ignores.join(","));
        }
        config_str.push('|');
        config_str.push_str(&format!("{:?}", config.line_numbers));
        let hash = xxhash_rust::xxh3::xxh3_64(config_str.as_bytes());
        format!("{:x}", hash)
    }

    /// Get the cache file path for this specific project and configuration
    fn get_cache_path(&self) -> PathBuf {
        self.cache_dir.join(format!(
            "state_{}_{}.json",
            self.project_hash, self.config_hash
        ))
    }

    /// Public helper primarily for debugging/tests to inspect the resolved cache path
    pub fn debug_cache_file_path(&self) -> PathBuf {
        self.get_cache_path()
    }

    /// Migrate old markdown-based cache files to new JSON format
    fn migrate_old_cache(&self) {
        let old_cache_patterns = ["last_canonical.md", "last_output.md", "current_output.md"];

        for pattern in &old_cache_patterns {
            let old_cache_path = self.cache_dir.join(pattern);
            if old_cache_path.exists() {
                eprintln!("Migrating old cache format: removing {}", pattern);
                let _ = fs::remove_file(&old_cache_path);
            }
        }

        // Also remove any files that look like timestamped outputs from old versions
        if let Ok(entries) = fs::read_dir(&self.cache_dir) {
            for entry in entries.flatten() {
                let file_name = entry.file_name();
                let name = file_name.to_string_lossy();
                if name.ends_with(".md") && (name.contains("_20") || name.starts_with("output_")) {
                    eprintln!("Migrating old cache format: removing {}", name);
                    let _ = fs::remove_file(entry.path());
                }
            }
        }
    }

    /// Read the cached project state with file locking
    pub fn read_cache(&self) -> Result<Option<ProjectState>, Box<dyn std::error::Error>> {
        let cache_path = self.get_cache_path();

        if !cache_path.exists() {
            return Ok(None);
        }

        let file = File::open(&cache_path)?;
        // Acquire shared lock to prevent reading while writing
        file.lock_shared()?;

        let mut contents = String::new();
        let mut file = std::io::BufReader::new(file);
        file.read_to_string(&mut contents)?;

        // Release lock
        file.get_ref().unlock()?;

        let state: ProjectState = serde_json::from_str(&contents)?;
        Ok(Some(state))
    }

    /// Write the project state to cache with file locking
    pub fn write_cache(&self, state: &ProjectState) -> Result<(), Box<dyn std::error::Error>> {
        let cache_path = self.get_cache_path();

        let file = std::fs::OpenOptions::new()
            .write(true)
            .create(true)
            .truncate(false)
            .open(&cache_path)?;
        // Acquire exclusive lock BEFORE truncating to prevent TOCTOU races
        file.lock_exclusive()?;
        file.set_len(0)?;

        let json = serde_json::to_string_pretty(state)?;
        let mut file = std::io::BufWriter::new(file);
        file.write_all(json.as_bytes())?;
        file.flush()?;

        // Release lock
        file.get_ref().unlock()?;

        Ok(())
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::path::Path;
    use tempfile::tempdir;

    #[test]
    fn test_hash_path() {
        let path1 = Path::new("/project1");
        let path2 = Path::new("/project2");

        let hash1 = CacheManager::hash_path(path1);
        let hash2 = CacheManager::hash_path(path2);

        assert_ne!(
            hash1, hash2,
            "Different paths should produce different hashes"
        );
    }

    #[test]
    fn test_hash_config() {
        let config1 = Config {
            filter: Some(vec!["rs".to_string()]),
            ignore: Some(vec!["target".to_string()]),
            line_numbers: Some(true),
            ..Default::default()
        };

        let config2 = Config {
            filter: Some(vec!["md".to_string()]),
            ignore: Some(vec!["target".to_string()]),
            line_numbers: Some(true),
            ..Default::default()
        };

        let hash1 = CacheManager::hash_config(&config1);
        let hash2 = CacheManager::hash_config(&config2);

        assert_ne!(
            hash1, hash2,
            "Different configs should produce different hashes"
        );
    }

    #[test]
    fn test_cache_operations() {
        let dir = tempdir().unwrap();
        let project_path = dir.path().join("test_project");
        let _ = fs::create_dir(&project_path);

        let config = Config::default();
        let cache_manager = CacheManager::new(&project_path, &config);

        use crate::state::ProjectMetadata;

        let state = ProjectState {
            timestamp: "2023-01-01T00:00:00Z".to_string(),
            config_hash: "test_config_hash".to_string(),
            files: std::collections::BTreeMap::new(),
            metadata: ProjectMetadata {
                project_name: "test".to_string(),
                file_count: 0,
                filters: vec![],
                ignores: vec![],
                line_numbers: false,
            },
        };

        // Write cache
        assert!(cache_manager.write_cache(&state).is_ok());

        // Read cache
        let cached_state = cache_manager.read_cache().unwrap();
        assert!(cached_state.is_some());
        assert_eq!(cached_state.unwrap().timestamp, state.timestamp);
    }

    #[test]
    fn test_old_cache_migration() {
        let dir = tempdir().unwrap();
        let project_path = dir.path().join("test_project");
        let _ = fs::create_dir(&project_path);

        // Create cache directory with old cache files
        let cache_dir = project_path.join(".context-builder").join("cache");
        let _ = fs::create_dir_all(&cache_dir);

        let old_files = [
            "last_canonical.md",
            "last_output.md",
            "current_output.md",
            "output_20230101120000.md",
        ];

        // Create old cache files
        for file in &old_files {
            let old_path = cache_dir.join(file);
            let _ = fs::write(&old_path, "old cache content");
            assert!(
                old_path.exists(),
                "Old cache file should exist before migration"
            );
        }

        // Create cache manager (this should trigger migration)
        let config = Config::default();
        let _cache_manager = CacheManager::new(&project_path, &config);

        // Verify old files are removed
        for file in &old_files {
            let old_path = cache_dir.join(file);
            assert!(
                !old_path.exists(),
                "Old cache file {} should be removed after migration",
                file
            );
        }
    }

    #[test]
    fn test_cache_consistency_across_path_representations() {
        let dir = tempdir().unwrap();
        let project_path = dir.path().join("test_project");
        let _ = fs::create_dir(&project_path);

        let config = Config::default();

        // Test different path representations that should resolve to the same cache
        let mut paths_to_test = vec![
            project_path.clone(),
            project_path.canonicalize().unwrap_or(project_path.clone()),
        ];

        // If we can create a relative path, test that too
        if let Ok(current_dir) = std::env::current_dir()
            && let Ok(relative) = project_path.strip_prefix(&current_dir)
        {
            paths_to_test.push(relative.to_path_buf());
        }

        let mut cache_paths = Vec::new();
        for path in &paths_to_test {
            let cache_manager = CacheManager::new(path, &config);
            cache_paths.push(cache_manager.get_cache_path());
        }

        // All cache paths should be identical
        for (i, path1) in cache_paths.iter().enumerate() {
            for (j, path2) in cache_paths.iter().enumerate() {
                if i != j {
                    assert_eq!(
                        path1, path2,
                        "Cache paths should be identical for different representations of the same project path"
                    );
                }
            }
        }
    }

    #[test]
    fn test_normalize_path_format() {
        // Test Windows UNC path normalization
        if cfg!(windows) {
            let unc_path = Path::new("\\\\?\\C:\\test\\path");
            let normalized = CacheManager::normalize_path_format(unc_path);
            assert_eq!(normalized, PathBuf::from("C:\\test\\path"));
        }

        // Test normal path (should remain unchanged)
        let normal_path = Path::new("/normal/path");
        let normalized = CacheManager::normalize_path_format(normal_path);
        assert_eq!(normalized, normal_path);
    }

    #[test]
    fn test_cache_read_nonexistent_file() {
        let dir = tempdir().unwrap();
        let project_path = dir.path().join("nonexistent_project");

        let config = Config::default();
        let cache_manager = CacheManager::new(&project_path, &config);

        let result = cache_manager.read_cache().unwrap();
        assert!(result.is_none());
    }

    #[test]
    fn test_cache_read_corrupted_file() {
        let dir = tempdir().unwrap();
        let project_path = dir.path().join("test_project");
        let _ = fs::create_dir(&project_path);

        let config = Config::default();
        let cache_manager = CacheManager::new(&project_path, &config);
        let cache_path = cache_manager.get_cache_path();

        // Create a corrupted cache file
        let _ = fs::create_dir_all(cache_path.parent().unwrap());
        let _ = fs::write(&cache_path, "invalid json content {{{");

        let result = cache_manager.read_cache();
        assert!(result.is_err());
    }

    #[test]
    fn test_cache_write_read_roundtrip() {
        let dir = tempdir().unwrap();
        let project_path = dir.path().join("test_project");
        let _ = fs::create_dir(&project_path);

        let config = Config {
            filter: Some(vec!["rs".to_string(), "toml".to_string()]),
            ignore: Some(vec!["target".to_string(), ".git".to_string()]),
            line_numbers: Some(true),
            ..Default::default()
        };

        let cache_manager = CacheManager::new(&project_path, &config);

        use crate::state::ProjectMetadata;
        use std::collections::BTreeMap;

        let mut files = BTreeMap::new();
        files.insert(
            PathBuf::from("test.rs"),
            crate::state::FileState {
                content: "fn main() {}".to_string(),
                size: 12,
                modified: std::time::SystemTime::UNIX_EPOCH,
                content_hash: "test_hash".to_string(),
            },
        );

        let original_state = ProjectState {
            timestamp: "2023-01-01T12:00:00Z".to_string(),
            config_hash: "test_config_hash".to_string(),
            files,
            metadata: ProjectMetadata {
                project_name: "test_project".to_string(),
                file_count: 1,
                filters: vec!["rs".to_string(), "toml".to_string()],
                ignores: vec!["target".to_string(), ".git".to_string()],
                line_numbers: true,
            },
        };

        // Write and read back
        cache_manager.write_cache(&original_state).unwrap();
        let cached_state = cache_manager.read_cache().unwrap().unwrap();

        assert_eq!(cached_state.timestamp, original_state.timestamp);
        assert_eq!(cached_state.config_hash, original_state.config_hash);
        assert_eq!(cached_state.files.len(), original_state.files.len());
        assert_eq!(
            cached_state.metadata.project_name,
            original_state.metadata.project_name
        );
        assert_eq!(
            cached_state.metadata.file_count,
            original_state.metadata.file_count
        );
        assert_eq!(
            cached_state.metadata.filters,
            original_state.metadata.filters
        );
        assert_eq!(
            cached_state.metadata.ignores,
            original_state.metadata.ignores
        );
        assert_eq!(
            cached_state.metadata.line_numbers,
            original_state.metadata.line_numbers
        );
    }

    #[test]
    fn test_different_configs_different_cache_files() {
        let dir = tempdir().unwrap();
        let project_path = dir.path().join("test_project");
        let _ = fs::create_dir(&project_path);

        let config1 = Config {
            filter: Some(vec!["rs".to_string()]),
            ..Default::default()
        };

        let config2 = Config {
            filter: Some(vec!["py".to_string()]),
            ..Default::default()
        };

        let cache_manager1 = CacheManager::new(&project_path, &config1);
        let cache_manager2 = CacheManager::new(&project_path, &config2);

        let cache_path1 = cache_manager1.get_cache_path();
        let cache_path2 = cache_manager2.get_cache_path();

        assert_ne!(
            cache_path1, cache_path2,
            "Different configs should have different cache files"
        );
    }

    #[test]
    fn test_normalize_project_path_absolute() {
        let temp_dir = tempdir().unwrap();
        let project_path = temp_dir.path().join("test_project");
        let _ = fs::create_dir(&project_path);

        let normalized = CacheManager::normalize_project_path(&project_path);
        assert!(normalized.is_absolute());
    }

    #[test]
    fn test_normalize_project_path_relative() {
        let temp_dir = tempdir().unwrap();
        let original_dir = std::env::current_dir().unwrap();

        // Change to temp directory
        std::env::set_current_dir(&temp_dir).unwrap();

        // Create a project directory
        let project_name = "relative_project";
        let _ = fs::create_dir(project_name);

        let relative_path = Path::new(project_name);
        let normalized = CacheManager::normalize_project_path(relative_path);

        // Restore original directory
        std::env::set_current_dir(original_dir).unwrap();

        assert!(normalized.is_absolute());
        assert!(normalized.to_string_lossy().contains(project_name));
    }

    #[test]
    fn test_hash_config_same_values() {
        let config1 = Config {
            filter: Some(vec!["rs".to_string(), "toml".to_string()]),
            ignore: Some(vec!["target".to_string()]),
            line_numbers: Some(false),
            ..Default::default()
        };

        let config2 = Config {
            filter: Some(vec!["rs".to_string(), "toml".to_string()]),
            ignore: Some(vec!["target".to_string()]),
            line_numbers: Some(false),
            ..Default::default()
        };

        let hash1 = CacheManager::hash_config(&config1);
        let hash2 = CacheManager::hash_config(&config2);

        assert_eq!(
            hash1, hash2,
            "Identical configs should produce identical hashes"
        );
    }

    #[test]
    fn test_migrate_old_cache_preserves_new_files() {
        let dir = tempdir().unwrap();
        let project_path = dir.path().join("test_project");
        let _ = fs::create_dir(&project_path);

        let cache_dir = project_path.join(".context-builder").join("cache");
        let _ = fs::create_dir_all(&cache_dir);

        // Create both old and new cache files
        let _ = fs::write(cache_dir.join("last_canonical.md"), "old content");
        let _ = fs::write(cache_dir.join("state_abc123_def456.json"), "new content");

        let config = Config::default();
        let _cache_manager = CacheManager::new(&project_path, &config);

        // Old file should be removed
        assert!(!cache_dir.join("last_canonical.md").exists());

        // New file should be preserved
        assert!(cache_dir.join("state_abc123_def456.json").exists());
    }
}
```

### File: `src/cli.rs`

- Size: 4720 bytes
- Modified: 2026-02-14 19:48:15 UTC

```rust
use clap::Parser;

/// CLI tool to aggregate directory contents into a single Markdown file optimized for LLM consumption
#[derive(Parser, Debug, Clone)]
#[clap(author, version, about)]
pub struct Args {
    /// Directory path to process
    #[clap(short = 'd', long, default_value = ".")]
    pub input: String,

    /// Output file path
    #[clap(short, long, default_value = "output.md")]
    pub output: String,

    /// File extensions to include (e.g., --filter rs,toml)
    #[clap(short = 'f', long, value_delimiter = ',')]
    pub filter: Vec<String>,

    /// Folder or file names to ignore (e.g., --ignore target --ignore lock)
    #[clap(short = 'i', long)]
    pub ignore: Vec<String>,

    /// Preview mode: only print the file tree to the console, don't generate the documentation file
    #[clap(long)]
    pub preview: bool,

    /// Token count mode: estimate the total token count of the final document
    #[clap(long)]
    pub token_count: bool,

    /// Add line numbers to code blocks in the output
    #[clap(long)]
    pub line_numbers: bool,

    /// Automatically answer yes to all prompts
    #[clap(short = 'y', long)]
    pub yes: bool,

    /// Maximum token budget for the output. Files are truncated/skipped when exceeded.
    #[clap(long)]
    pub max_tokens: Option<usize>,

    /// Output only diffs (omit full file contents; requires auto-diff & timestamped output)
    #[clap(long, default_value_t = false)]
    pub diff_only: bool,

    /// Clear the cached project state and exit
    #[clap(long)]
    pub clear_cache: bool,

    /// Initialize a new context-builder.toml config file in the current directory
    #[clap(long)]
    pub init: bool,
}

#[cfg(test)]
mod tests {
    use super::Args;
    use clap::Parser;

    #[test]
    fn parses_with_no_args() {
        let res = Args::try_parse_from(["context-builder"]);
        assert!(res.is_ok(), "Expected success when no args are provided");
    }

    #[test]
    fn parses_all_flags_and_options() {
        let args = Args::try_parse_from([
            "context-builder",
            "--input",
            "some/dir",
            "--output",
            "ctx.md",
            "--filter",
            "rs",
            "--filter",
            "toml",
            "--ignore",
            "target",
            "--ignore",
            "node_modules",
            "--preview",
            "--token-count",
            "--line-numbers",
            "--diff-only",
            "--clear-cache",
        ])
        .expect("should parse");

        assert_eq!(args.input, "some/dir");
        assert_eq!(args.output, "ctx.md");
        assert_eq!(args.filter, vec!["rs".to_string(), "toml".to_string()]);
        assert_eq!(
            args.ignore,
            vec!["target".to_string(), "node_modules".to_string()]
        );
        assert!(args.preview);
        assert!(args.token_count);
        assert!(args.line_numbers);
        assert!(args.diff_only);
        assert!(args.clear_cache);
    }

    #[test]
    fn short_flags_parse_correctly() {
        let args = Args::try_parse_from([
            "context-builder",
            "-d",
            ".",
            "-o",
            "out.md",
            "-f",
            "md",
            "-f",
            "rs",
            "-i",
            "target",
            "-i",
            ".git",
        ])
        .expect("should parse");

        assert_eq!(args.input, ".");
        assert_eq!(args.output, "out.md");
        assert_eq!(args.filter, vec!["md".to_string(), "rs".to_string()]);
        assert_eq!(args.ignore, vec!["target".to_string(), ".git".to_string()]);
        assert!(!args.preview);
        assert!(!args.line_numbers);
        assert!(!args.clear_cache);
    }

    #[test]
    fn defaults_for_options_when_not_provided() {
        let args = Args::try_parse_from(["context-builder", "-d", "proj"]).expect("should parse");

        assert_eq!(args.input, "proj");
        assert_eq!(args.output, "output.md");
        assert!(args.filter.is_empty());
        assert!(args.ignore.is_empty());
        assert!(!args.preview);
        assert!(!args.line_numbers);
        assert!(!args.diff_only);
        assert!(!args.clear_cache);
    }

    #[test]
    fn parses_diff_only_flag() {
        let args = Args::try_parse_from(["context-builder", "--diff-only"])
            .expect("should parse diff-only flag");
        assert!(args.diff_only);
        assert!(!args.clear_cache);
    }

    #[test]
    fn parses_clear_cache_flag() {
        let args = Args::try_parse_from(["context-builder", "--clear-cache"])
            .expect("should parse clear-cache flag");
        assert!(args.clear_cache);
        assert!(!args.diff_only);
    }
}
```

### File: `src/config.rs`

- Size: 7686 bytes
- Modified: 2026-02-14 19:48:35 UTC

```rust
use serde::Deserialize;
use std::fs;
use std::path::Path;

/// Global configuration loaded from `context-builder.toml`.
///
/// Any field left as `None` means "use the CLI default / do not override".
/// Command-line arguments always take precedence over values provided here.
///
/// Example `context-builder.toml`:
/// ```toml
/// output = "context.md"
/// output_folder = "docs"
/// timestamped_output = true
/// auto_diff = true
/// diff_only = true         # Emit only change summary + modified file diffs (no full file bodies)
/// filter = ["rs", "toml"]
/// ignore = ["target", ".git"]
/// line_numbers = false
/// diff_context_lines = 5
/// ```
///
#[derive(Deserialize, Debug, Default, Clone)]
pub struct Config {
    /// Output file name (or base name when `timestamped_output = true`)
    pub output: Option<String>,

    /// File extensions to include (no leading dot, e.g. `rs`, `toml`)
    pub filter: Option<Vec<String>>,

    /// File / directory names to ignore (exact name matches)
    pub ignore: Option<Vec<String>>,

    /// Add line numbers to code blocks
    pub line_numbers: Option<bool>,

    /// Preview only the file tree (no file output)
    pub preview: Option<bool>,

    /// Token counting mode
    pub token_count: Option<bool>,

    /// Optional folder to place the generated output file(s) in
    pub output_folder: Option<String>,

    /// If true, append a UTC timestamp to the output file name (before extension)
    pub timestamped_output: Option<bool>,

    /// Assume "yes" for overwrite / processing confirmations
    pub yes: Option<bool>,

    /// Enable automatic diff generation (requires `timestamped_output = true`)
    pub auto_diff: Option<bool>,

    /// Override number of unified diff context lines (falls back to env or default = 3)
    pub diff_context_lines: Option<usize>,

    /// When true, emit ONLY:
    /// - Header + file tree
    /// - Change Summary
    /// - Per-file diffs for modified files
    ///
    /// Excludes full file contents section entirely. Added files appear only in the
    /// change summary (and are marked Added) but their full content is omitted.
    pub diff_only: Option<bool>,

    /// Encoding handling strategy for non-UTF-8 files.
    /// - "detect": Attempt to detect and transcode to UTF-8 (default)
    /// - "strict": Only include valid UTF-8 files, skip others
    /// - "skip": Skip all non-UTF-8 files without transcoding attempts
    pub encoding_strategy: Option<String>,

    /// Maximum token budget for the output. Files are truncated/skipped when exceeded.
    pub max_tokens: Option<usize>,
}

/// Load configuration from `context-builder.toml` in the current working directory.
/// Returns `None` if the file does not exist or cannot be parsed.
pub fn load_config() -> Option<Config> {
    let config_path = Path::new("context-builder.toml");
    if config_path.exists() {
        let content = fs::read_to_string(config_path).ok()?;
        toml::from_str(&content).ok()
    } else {
        None
    }
}

/// Load configuration from `context-builder.toml` in the specified project root directory.
/// Returns `None` if the file does not exist or cannot be parsed.
pub fn load_config_from_path(project_root: &Path) -> Option<Config> {
    let config_path = project_root.join("context-builder.toml");
    if config_path.exists() {
        let content = fs::read_to_string(config_path).ok()?;
        toml::from_str(&content).ok()
    } else {
        None
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::fs;
    use tempfile::tempdir;

    #[test]
    fn load_config_nonexistent_file() {
        // Test loading config when file doesn't exist by temporarily changing directory
        let temp_dir = tempdir().unwrap();
        let original_dir = std::env::current_dir().unwrap();

        // Change to temp directory where no config file exists
        std::env::set_current_dir(&temp_dir).unwrap();

        let result = load_config();

        // Restore original directory
        std::env::set_current_dir(original_dir).unwrap();

        assert!(result.is_none());
    }

    #[test]
    fn load_config_from_path_nonexistent_file() {
        let dir = tempdir().unwrap();
        let result = load_config_from_path(dir.path());
        assert!(result.is_none());
    }

    #[test]
    fn load_config_from_path_valid_config() {
        let dir = tempdir().unwrap();
        let config_path = dir.path().join("context-builder.toml");

        let config_content = r#"
output = "test-output.md"
filter = ["rs", "toml"]
ignore = ["target", ".git"]
line_numbers = true
preview = false
token_count = true
timestamped_output = true
yes = false
auto_diff = true
diff_context_lines = 5
diff_only = false
encoding_strategy = "detect"
"#;

        fs::write(&config_path, config_content).unwrap();

        let config = load_config_from_path(dir.path()).unwrap();
        assert_eq!(config.output.unwrap(), "test-output.md");
        assert_eq!(config.filter.unwrap(), vec!["rs", "toml"]);
        assert_eq!(config.ignore.unwrap(), vec!["target", ".git"]);
        assert!(config.line_numbers.unwrap());
        assert!(!config.preview.unwrap());
        assert!(config.token_count.unwrap());
        assert!(config.timestamped_output.unwrap());
        assert!(!config.yes.unwrap());
        assert!(config.auto_diff.unwrap());
        assert_eq!(config.diff_context_lines.unwrap(), 5);
        assert!(!config.diff_only.unwrap());
        assert_eq!(config.encoding_strategy.unwrap(), "detect");
    }

    #[test]
    fn load_config_from_path_partial_config() {
        let dir = tempdir().unwrap();
        let config_path = dir.path().join("context-builder.toml");

        let config_content = r#"
output = "minimal.md"
filter = ["py"]
"#;

        fs::write(&config_path, config_content).unwrap();

        let config = load_config_from_path(dir.path()).unwrap();
        assert_eq!(config.output.unwrap(), "minimal.md");
        assert_eq!(config.filter.unwrap(), vec!["py"]);
        assert!(config.ignore.is_none());
        assert!(config.line_numbers.is_none());
        assert!(config.auto_diff.is_none());
    }

    #[test]
    fn load_config_from_path_invalid_toml() {
        let dir = tempdir().unwrap();
        let config_path = dir.path().join("context-builder.toml");

        // Invalid TOML content
        let config_content = r#"
output = "test.md"
invalid_toml [
"#;

        fs::write(&config_path, config_content).unwrap();

        let config = load_config_from_path(dir.path());
        assert!(config.is_none());
    }

    #[test]
    fn load_config_from_path_empty_config() {
        let dir = tempdir().unwrap();
        let config_path = dir.path().join("context-builder.toml");

        fs::write(&config_path, "").unwrap();

        let config = load_config_from_path(dir.path()).unwrap();
        assert!(config.output.is_none());
        assert!(config.filter.is_none());
        assert!(config.ignore.is_none());
    }

    #[test]
    fn config_default_implementation() {
        let config = Config::default();
        assert!(config.output.is_none());
        assert!(config.filter.is_none());
        assert!(config.ignore.is_none());
        assert!(config.line_numbers.is_none());
        assert!(config.preview.is_none());
        assert!(config.token_count.is_none());
        assert!(config.output_folder.is_none());
        assert!(config.timestamped_output.is_none());
        assert!(config.yes.is_none());
        assert!(config.auto_diff.is_none());
        assert!(config.diff_context_lines.is_none());
        assert!(config.diff_only.is_none());
        assert!(config.encoding_strategy.is_none());
    }
}
```

### File: `src/config_resolver.rs`

- Size: 15339 bytes
- Modified: 2026-02-14 19:56:05 UTC

```rust
//! Configuration resolution module for context-builder.
//!
//! This module provides centralized logic for merging CLI arguments with configuration
//! file values, implementing proper precedence rules and handling complex scenarios
//! like timestamping and output folder resolution.

use chrono::Utc;
use std::path::{Path, PathBuf};

use crate::cli::Args;
use crate::config::Config;

/// Resolved configuration combining CLI arguments and config file values
#[derive(Debug, Clone)]
pub struct ResolvedConfig {
    pub input: String,
    pub output: String,
    pub filter: Vec<String>,
    pub ignore: Vec<String>,
    pub line_numbers: bool,
    pub preview: bool,
    pub token_count: bool,
    pub yes: bool,
    pub diff_only: bool,
    pub clear_cache: bool,
    pub auto_diff: bool,
    pub diff_context_lines: usize,
    pub max_tokens: Option<usize>,
    pub init: bool,
}

/// Result of configuration resolution including the final config and any warnings
#[derive(Debug)]
pub struct ConfigResolution {
    pub config: ResolvedConfig,
    pub warnings: Vec<String>,
}

/// Resolves final configuration by merging CLI arguments with config file values.
///
/// Precedence rules (highest to lowest):
/// 1. Explicit CLI arguments (non-default values)
/// 2. Configuration file values
/// 3. CLI default values
///
/// Special handling:
/// - `output` field supports timestamping and output folder resolution
/// - Boolean flags respect explicit CLI usage vs defaults
/// - Arrays (filter, ignore) use CLI if non-empty, otherwise config file
pub fn resolve_final_config(mut args: Args, config: Option<Config>) -> ConfigResolution {
    let mut warnings = Vec::new();

    // Start with CLI defaults, then apply config file, then explicit CLI overrides
    let final_config = if let Some(config) = config {
        apply_config_to_args(&mut args, &config, &mut warnings);
        resolve_output_path(&mut args, &config, &mut warnings);
        config
    } else {
        Config::default()
    };

    let resolved = ResolvedConfig {
        input: args.input,
        output: args.output,
        filter: args.filter,
        ignore: args.ignore,
        line_numbers: args.line_numbers,
        preview: args.preview,
        token_count: args.token_count,
        yes: args.yes,
        diff_only: args.diff_only,
        clear_cache: args.clear_cache,
        auto_diff: final_config.auto_diff.unwrap_or(false),
        diff_context_lines: final_config.diff_context_lines.unwrap_or(3),
        max_tokens: args.max_tokens.or(final_config.max_tokens),
        init: args.init,
    };

    ConfigResolution {
        config: resolved,
        warnings,
    }
}

/// Apply configuration file values to CLI arguments based on precedence rules
fn apply_config_to_args(args: &mut Args, config: &Config, warnings: &mut Vec<String>) {
    // Output: only apply config if CLI is using default value
    if args.output == "output.md"
        && let Some(ref output) = config.output
    {
        args.output = output.clone();
    }

    // Filter: CLI takes precedence if non-empty
    if args.filter.is_empty()
        && let Some(ref filter) = config.filter
    {
        args.filter = filter.clone();
    }

    // Ignore: CLI takes precedence if non-empty
    if args.ignore.is_empty()
        && let Some(ref ignore) = config.ignore
    {
        args.ignore = ignore.clone();
    }

    // Boolean flags: config applies only if CLI is using default (false)
    // Note: We can't distinguish between explicit --no-flag and default false,
    // so config file can only enable features, not disable them
    if !args.line_numbers
        && let Some(line_numbers) = config.line_numbers
    {
        args.line_numbers = line_numbers;
    }

    if !args.preview
        && let Some(preview) = config.preview
    {
        args.preview = preview;
    }

    if !args.token_count
        && let Some(token_count) = config.token_count
    {
        args.token_count = token_count;
    }

    if !args.yes
        && let Some(yes) = config.yes
    {
        args.yes = yes;
    }

    // diff_only: config can enable it, but CLI flag always takes precedence
    if !args.diff_only
        && let Some(true) = config.diff_only
    {
        args.diff_only = true;
    }

    // Validate auto_diff configuration
    if let Some(true) = config.auto_diff
        && config.timestamped_output != Some(true)
    {
        warnings.push(
            "auto_diff is enabled but timestamped_output is not enabled. \
            Auto-diff requires timestamped_output = true to function properly."
                .to_string(),
        );
    }
}

/// Resolve output path including timestamping and output folder logic
fn resolve_output_path(args: &mut Args, config: &Config, warnings: &mut Vec<String>) {
    let mut output_folder_path: Option<PathBuf> = None;

    // Apply output folder first
    if let Some(ref output_folder) = config.output_folder {
        let mut path = PathBuf::from(output_folder);
        path.push(&args.output);
        args.output = path.to_string_lossy().to_string();
        output_folder_path = Some(PathBuf::from(output_folder));
    }

    // Apply timestamping if enabled
    if let Some(true) = config.timestamped_output {
        let timestamp = Utc::now().format("%Y%m%d%H%M%S").to_string();
        let path = Path::new(&args.output);

        let stem = path
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("output");

        let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("md");

        let new_filename = format!("{}_{}.{}", stem, timestamp, extension);

        if let Some(output_folder) = output_folder_path {
            args.output = output_folder
                .join(new_filename)
                .to_string_lossy()
                .to_string();
        } else {
            let new_path = path.with_file_name(new_filename);
            args.output = new_path.to_string_lossy().to_string();
        }
    }

    // Validate output folder exists if specified
    if let Some(ref output_folder) = config.output_folder {
        let folder_path = Path::new(output_folder);
        if !folder_path.exists() {
            warnings.push(format!(
                "Output folder '{}' does not exist. It will be created if possible.",
                output_folder
            ));
        }
    }
}

/// Check if CLI arguments have been explicitly set vs using defaults.
/// This is a best-effort detection since clap doesn't provide this information directly.
#[allow(dead_code)]
fn detect_explicit_args() -> ExplicitArgs {
    let args: Vec<String> = std::env::args().collect();

    ExplicitArgs {
        output: args.iter().any(|arg| arg == "-o" || arg == "--output"),
        filter: args.iter().any(|arg| arg == "-f" || arg == "--filter"),
        ignore: args.iter().any(|arg| arg == "-i" || arg == "--ignore"),
        line_numbers: args.iter().any(|arg| arg == "--line-numbers"),
        preview: args.iter().any(|arg| arg == "--preview"),
        token_count: args.iter().any(|arg| arg == "--token-count"),
        yes: args.iter().any(|arg| arg == "-y" || arg == "--yes"),
        diff_only: args.iter().any(|arg| arg == "--diff-only"),
    }
}

/// Tracks which CLI arguments were explicitly provided vs using defaults
#[allow(dead_code)]
struct ExplicitArgs {
    output: bool,
    filter: bool,
    ignore: bool,
    line_numbers: bool,
    preview: bool,
    token_count: bool,
    yes: bool,
    diff_only: bool,
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_config_precedence_cli_over_config() {
        let args = Args {
            input: "src".to_string(),
            output: "custom.md".to_string(), // Explicit CLI value
            filter: vec!["rs".to_string()],  // Explicit CLI value
            ignore: vec![],
            line_numbers: true, // Explicit CLI value
            preview: false,
            token_count: false,
            yes: false,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };

        let config = Config {
            output: Some("config.md".to_string()),  // Should be ignored
            filter: Some(vec!["toml".to_string()]), // Should be ignored
            line_numbers: Some(false),              // Should be ignored
            preview: Some(true),                    // Should apply
            ..Default::default()
        };

        let resolution = resolve_final_config(args.clone(), Some(config));

        assert_eq!(resolution.config.output, "custom.md"); // CLI wins
        assert_eq!(resolution.config.filter, vec!["rs"]); // CLI wins
        assert!(resolution.config.line_numbers); // CLI wins
        assert!(resolution.config.preview); // Config applies
    }

    #[test]
    fn test_config_applies_when_cli_uses_defaults() {
        let args = Args {
            input: "src".to_string(),
            output: "output.md".to_string(), // Default value
            filter: vec![],                  // Default value
            ignore: vec![],                  // Default value
            line_numbers: false,             // Default value
            preview: false,                  // Default value
            token_count: false,              // Default value
            yes: false,                      // Default value
            diff_only: false,                // Default value
            clear_cache: false,
            init: false,
            max_tokens: None,
        };

        let config = Config {
            output: Some("from_config.md".to_string()),
            filter: Some(vec!["rs".to_string(), "toml".to_string()]),
            ignore: Some(vec!["target".to_string()]),
            line_numbers: Some(true),
            preview: Some(true),
            token_count: Some(true),
            yes: Some(true),
            diff_only: Some(true),
            ..Default::default()
        };

        let resolution = resolve_final_config(args, Some(config));

        assert_eq!(resolution.config.output, "from_config.md");
        assert_eq!(
            resolution.config.filter,
            vec!["rs".to_string(), "toml".to_string()]
        );
        assert_eq!(resolution.config.ignore, vec!["target".to_string()]);
        assert!(resolution.config.line_numbers);
        assert!(resolution.config.preview);
        assert!(resolution.config.token_count);
        assert!(resolution.config.yes);
        assert!(resolution.config.diff_only);
    }

    #[test]
    fn test_timestamped_output_resolution() {
        let args = Args {
            input: "src".to_string(),
            output: "test.md".to_string(),
            filter: vec![],
            ignore: vec![],
            line_numbers: false,
            preview: false,
            token_count: false,
            yes: false,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };

        let config = Config {
            timestamped_output: Some(true),
            ..Default::default()
        };

        let resolution = resolve_final_config(args, Some(config));

        // Output should have timestamp format: test_YYYYMMDDHHMMSS.md
        assert!(resolution.config.output.starts_with("test_"));
        assert!(resolution.config.output.ends_with(".md"));
        assert!(resolution.config.output.len() > "test_.md".len());
    }

    #[test]
    fn test_output_folder_resolution() {
        let args = Args {
            input: "src".to_string(),
            output: "test.md".to_string(),
            filter: vec![],
            ignore: vec![],
            line_numbers: false,
            preview: false,
            token_count: false,
            yes: false,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };

        let config = Config {
            output_folder: Some("docs".to_string()),
            ..Default::default()
        };

        let resolution = resolve_final_config(args, Some(config));

        assert!(resolution.config.output.contains("docs"));
        assert!(resolution.config.output.ends_with("test.md"));
    }

    #[test]
    fn test_output_folder_with_timestamping() {
        let args = Args {
            input: "src".to_string(),
            output: "test.md".to_string(),
            filter: vec![],
            ignore: vec![],
            line_numbers: false,
            preview: false,
            token_count: false,
            yes: false,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };

        let config = Config {
            output_folder: Some("docs".to_string()),
            timestamped_output: Some(true),
            ..Default::default()
        };

        let resolution = resolve_final_config(args, Some(config));

        assert!(resolution.config.output.contains("docs"));
        assert!(resolution.config.output.contains("test_"));
        assert!(resolution.config.output.ends_with(".md"));
    }

    #[test]
    fn test_auto_diff_without_timestamping_warning() {
        let args = Args {
            input: "src".to_string(),
            output: "test.md".to_string(),
            filter: vec![],
            ignore: vec![],
            line_numbers: false,
            preview: false,
            token_count: false,
            yes: false,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };

        let config = Config {
            auto_diff: Some(true),
            timestamped_output: Some(false), // This should generate a warning
            ..Default::default()
        };

        let resolution = resolve_final_config(args, Some(config));

        assert!(!resolution.warnings.is_empty());
        assert!(resolution.warnings[0].contains("auto_diff"));
        assert!(resolution.warnings[0].contains("timestamped_output"));
    }

    #[test]
    fn test_no_config_uses_cli_defaults() {
        let args = Args {
            input: "src".to_string(),
            output: "output.md".to_string(),
            filter: vec![],
            ignore: vec![],
            line_numbers: false,
            preview: false,
            token_count: false,
            yes: false,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };

        let resolution = resolve_final_config(args.clone(), None);

        assert_eq!(resolution.config.input, args.input);
        assert_eq!(resolution.config.output, args.output);
        assert_eq!(resolution.config.filter, args.filter);
        assert_eq!(resolution.config.ignore, args.ignore);
        assert_eq!(resolution.config.line_numbers, args.line_numbers);
        assert_eq!(resolution.config.preview, args.preview);
        assert_eq!(resolution.config.token_count, args.token_count);
        assert_eq!(resolution.config.yes, args.yes);
        assert_eq!(resolution.config.diff_only, args.diff_only);
        assert!(!resolution.config.auto_diff);
        assert_eq!(resolution.config.diff_context_lines, 3);
        assert!(resolution.warnings.is_empty());
    }
}
```

### File: `src/diff.rs`

- Size: 20099 bytes
- Modified: 2026-02-14 07:14:48 UTC

```rust
use similar::{ChangeTag, TextDiff};
use std::collections::HashMap;

/// Line based diff utilities.
///
/// This module previously exposed `generate_diff` which produced a single
/// "## File Differences" section for an entire markdown document. That
/// approach made it easy for volatile sections (timestamps, file tree
/// structure, etc.) to create noisy diffs. To address this the new
/// per‑file API lets the caller diff only the normalized *file content*
/// blocks that appear under each `### File: `path`` heading in the
/// canonical output, completely ignoring the global header or the file
/// tree portion. Each file receives an isolated unified style diff.
///
/// High level additions:
/// * `PerFileStatus` – classification of the change.
/// * `PerFileDiff` – structured diff result for a single file.
/// * `diff_file_contents` – core engine producing diffs per file without any
///   global "## File Differences" header.
/// * `render_per_file_diffs` – helper to render the per file diffs into
///   markdown (still omits a global header so the caller can choose).
///
/// Backwards compatibility: the existing `generate_diff` function (full
/// document diff) is retained for now. New code should prefer the
/// per‑file functions.
/// Determine number of context lines either from explicit argument or env.
fn resolve_context_lines(explicit: Option<usize>) -> usize {
    explicit
        .filter(|v| *v > 0)
        .or_else(|| {
            std::env::var("CB_DIFF_CONTEXT_LINES")
                .ok()
                .and_then(|v| v.parse().ok())
                .filter(|v: &usize| *v > 0)
        })
        .unwrap_or(3)
}

/// Original API: produce a single markdown section headed by "## File Differences".
/// (Kept unchanged for compatibility.)
pub fn generate_diff(old_content: &str, new_content: &str) -> String {
    let diff = TextDiff::from_lines(old_content, new_content);
    if diff.ratio() == 1.0 {
        return String::new();
    }
    let context_lines = resolve_context_lines(None);
    let grouped = diff.grouped_ops(context_lines);
    let mut out = String::new();
    out.push_str("## File Differences\n\n");
    out.push_str("```diff\n");
    for (group_index, group) in grouped.iter().enumerate() {
        if group_index > 0 {
            out.push_str("  ...\n");
        }
        for op in group {
            for change in diff.iter_changes(op) {
                let tag = change.tag();
                let mut line = change.to_string();
                if line.ends_with('\n') {
                    line.pop();
                    if line.ends_with('\r') {
                        line.pop();
                    }
                }

                match tag {
                    ChangeTag::Delete => {
                        out.push_str("- ");
                        out.push_str(&line);
                        out.push('\n');
                    }
                    ChangeTag::Insert => {
                        out.push_str("+ ");
                        out.push_str(&line);
                        out.push('\n');
                    }
                    ChangeTag::Equal => {
                        out.push_str("  ");
                        out.push_str(&line);
                        out.push('\n');
                    }
                }
            }
        }
    }
    out.push_str("```\n\n");
    out
}

/// Classification of how a file changed between two snapshots.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PerFileStatus {
    Added,
    Removed,
    Modified,
    Unchanged,
}

/// Structured diff result for a single file.
#[derive(Debug, Clone)]
pub struct PerFileDiff {
    pub path: String,
    pub status: PerFileStatus,
    /// Unified diff fenced in ```diff (omitted when status == Unchanged and skip_unchanged=true)
    pub diff: String,
}

impl PerFileDiff {
    pub fn is_changed(&self) -> bool {
        self.status != PerFileStatus::Unchanged
    }
}

/// Produce a unified style diff for two text blobs WITHOUT adding any global
/// section header. Returns empty string if contents are identical.
fn unified_no_header(old: &str, new: &str, context_lines: usize) -> String {
    let diff = TextDiff::from_lines(old, new);
    if diff.ratio() == 1.0 {
        return String::new();
    }
    let grouped = diff.grouped_ops(context_lines);
    let mut out = String::new();
    out.push_str("```diff\n");
    for (group_index, group) in grouped.iter().enumerate() {
        if group_index > 0 {
            out.push_str("  ...\n");
        }
        for op in group {
            for change in diff.iter_changes(op) {
                let tag = change.tag();
                let mut line = change.to_string();
                if line.ends_with('\n') {
                    line.pop();
                    if line.ends_with('\r') {
                        line.pop();
                    }
                }

                match tag {
                    ChangeTag::Delete => {
                        out.push_str("- ");
                        out.push_str(&line);
                        out.push('\n');
                    }
                    ChangeTag::Insert => {
                        out.push_str("+ ");
                        out.push_str(&line);
                        out.push('\n');
                    }
                    ChangeTag::Equal => {
                        out.push_str("  ");
                        out.push_str(&line);
                        out.push('\n');
                    }
                }
            }
        }
    }
    out.push_str("```\n");
    out
}

/// Diff per file content sets.
///
/// Inputs are maps keyed by file path (relative or absolute – caller decides)
/// with values being the raw file content EXACTLY as you wish it to be diffed
/// (e.g. already stripped of volatile metadata, no size/modified lines, only
/// the real file body). This keeps higher level logic (parsing the markdown
/// document) out of the diff layer.
///
/// Returns a vector of `PerFileDiff` for every file that is Added, Removed,
/// or Modified. Unchanged files are omitted by default (`skip_unchanged=true`)
/// to reduce noise, but you can opt to include them.
pub fn diff_file_contents(
    previous: &HashMap<String, String>,
    current: &HashMap<String, String>,
    skip_unchanged: bool,
    explicit_context: Option<usize>,
) -> Vec<PerFileDiff> {
    let mut all_paths: Vec<String> = previous.keys().chain(current.keys()).cloned().collect();
    all_paths.sort();
    all_paths.dedup();

    let context_lines = resolve_context_lines(explicit_context);
    let mut results = Vec::new();

    for path in all_paths {
        let old_opt = previous.get(&path);
        let new_opt = current.get(&path);
        match (old_opt, new_opt) {
            (None, Some(new_content)) => {
                // Added file: present only in current snapshot
                let mut diff = String::new();
                diff.push_str("```diff\n");
                for line in new_content.lines() {
                    diff.push_str("+ ");
                    diff.push_str(line);
                    diff.push('\n');
                }
                diff.push_str("```\n");
                results.push(PerFileDiff {
                    path,
                    status: PerFileStatus::Added,
                    diff,
                });
            }
            (Some(_old_content), None) => {
                // Removed file
                let old_content = previous.get(&path).unwrap();
                let mut diff = String::new();
                diff.push_str("```diff\n");
                for line in old_content.lines() {
                    diff.push_str("- ");
                    diff.push_str(line);
                    diff.push('\n');
                }
                diff.push_str("```\n");
                results.push(PerFileDiff {
                    path,
                    status: PerFileStatus::Removed,
                    diff,
                });
            }
            (Some(old_content), Some(new_content)) => {
                if old_content == new_content {
                    if !skip_unchanged {
                        results.push(PerFileDiff {
                            path,
                            status: PerFileStatus::Unchanged,
                            diff: String::new(),
                        });
                    }
                } else {
                    let diff = unified_no_header(old_content, new_content, context_lines);
                    results.push(PerFileDiff {
                        path,
                        status: PerFileStatus::Modified,
                        diff,
                    });
                }
            }
            (None, None) => unreachable!(),
        }
    }

    results
}

/// Render a collection of per file diffs into markdown WITHOUT a global
/// "## File Differences" header. Each file begins with a "### Diff: `<path>`"
/// heading so that it can be appended near the changed files summary.
pub fn render_per_file_diffs(diffs: &[PerFileDiff]) -> String {
    let mut out = String::new();
    for d in diffs {
        out.push_str(&format!("### Diff: `{}`\n\n", d.path));
        match d.status {
            PerFileStatus::Added => out.push_str("_Status: Added_\n\n"),
            PerFileStatus::Removed => out.push_str("_Status: Removed_\n\n"),
            PerFileStatus::Modified => out.push_str("_Status: Modified_\n\n"),
            PerFileStatus::Unchanged => {
                out.push_str("_Status: Unchanged_\n\n");
            }
        }
        if !d.diff.is_empty() {
            out.push_str(&d.diff);
            if !d.diff.ends_with('\n') {
                out.push('\n');
            }
        }
        out.push('\n');
    }
    out
}

#[cfg(test)]
mod tests {
    use super::*;

    fn map(pairs: &[(&str, &str)]) -> HashMap<String, String> {
        pairs
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect()
    }

    #[test]
    fn unchanged_is_skipped() {
        let prev = map(&[("a.txt", "one\n")]);
        let curr = map(&[("a.txt", "one\n")]);
        let diffs = diff_file_contents(&prev, &curr, true, Some(2));
        assert!(diffs.is_empty());
    }

    #[test]
    fn added_file_diff() {
        let prev = map(&[]);
        let curr = map(&[("new.rs", "fn main() {}\n")]);
        let diffs = diff_file_contents(&prev, &curr, true, Some(2));
        assert_eq!(diffs.len(), 1);
        let d = &diffs[0];
        assert_eq!(d.status, PerFileStatus::Added);
        assert!(d.diff.contains("+ fn main() {}"));
    }

    #[test]
    fn removed_file_diff() {
        let prev = map(&[("old.rs", "fn old() {}\n")]);
        let curr = map(&[]);
        let diffs = diff_file_contents(&prev, &curr, true, None);
        assert_eq!(diffs.len(), 1);
        let d = &diffs[0];
        assert_eq!(d.status, PerFileStatus::Removed);
        assert!(d.diff.contains("- fn old() {}"));
    }

    #[test]
    fn modified_file_diff() {
        let prev = map(&[("lib.rs", "fn add(a:i32,b:i32)->i32{a+b}\n")]);
        let curr = map(&[("lib.rs", "fn add(a: i32, b: i32) -> i32 { a + b }\n")]);
        let diffs = diff_file_contents(&prev, &curr, true, Some(1));
        assert_eq!(diffs.len(), 1);
        let d = &diffs[0];
        assert_eq!(d.status, PerFileStatus::Modified);
        assert!(d.diff.contains("- fn add(a:i32,b:i32)->i32{a+b}"));
        assert!(d.diff.contains("+ fn add(a: i32, b: i32) -> i32 { a + b }"));
    }

    #[test]
    fn include_unchanged_when_requested() {
        let prev = map(&[("a.txt", "same\n")]);
        let curr = map(&[("a.txt", "same\n")]);
        let diffs = diff_file_contents(&prev, &curr, false, None);
        assert_eq!(diffs.len(), 1);
        assert_eq!(diffs[0].status, PerFileStatus::Unchanged);
    }

    #[test]
    fn render_output_basic() {
        let prev = map(&[("a.txt", "one\n"), ("b.txt", "line1\nline2\n")]);
        let curr = map(&[
            ("a.txt", "two\n"),
            ("b.txt", "line1\nline2\n"),
            ("c.txt", "new file\n"),
        ]);
        let diffs = diff_file_contents(&prev, &curr, true, Some(1));
        let out = render_per_file_diffs(&diffs);
        assert!(out.contains("### Diff: `a.txt`"));
        assert!(out.contains("_Status: Modified_"));
        assert!(out.contains("+ two"));
        assert!(out.contains("### Diff: `c.txt`"));
        assert!(out.contains("_Status: Added_"));
        assert!(out.contains("+ new file"));
    }

    #[test]
    fn test_empty_files() {
        let prev = map(&[("empty.txt", "")]);
        let curr = map(&[("empty.txt", "")]);
        let diffs = diff_file_contents(&prev, &curr, true, None);
        assert!(diffs.is_empty());
    }

    #[test]
    fn test_empty_to_content() {
        let prev = map(&[("file.txt", "")]);
        let curr = map(&[("file.txt", "new content\n")]);
        let diffs = diff_file_contents(&prev, &curr, true, None);
        assert_eq!(diffs.len(), 1);
        assert_eq!(diffs[0].status, PerFileStatus::Modified);
        assert!(diffs[0].diff.contains("+ new content"));
    }

    #[test]
    fn test_content_to_empty() {
        let prev = map(&[("file.txt", "old content\n")]);
        let curr = map(&[("file.txt", "")]);
        let diffs = diff_file_contents(&prev, &curr, true, None);
        assert_eq!(diffs.len(), 1);
        assert_eq!(diffs[0].status, PerFileStatus::Modified);
        assert!(diffs[0].diff.contains("- old content"));
    }

    #[test]
    fn test_multiline_modifications() {
        let prev = map(&[("file.txt", "line1\nline2\nline3\nline4\n")]);
        let curr = map(&[("file.txt", "line1\nmodified2\nline3\nline4\n")]);
        let diffs = diff_file_contents(&prev, &curr, true, Some(2));
        assert_eq!(diffs.len(), 1);
        assert_eq!(diffs[0].status, PerFileStatus::Modified);
        assert!(diffs[0].diff.contains("- line2"));
        assert!(diffs[0].diff.contains("+ modified2"));
    }

    #[test]
    fn test_windows_line_endings() {
        let prev = map(&[("file.txt", "line1\r\nline2\r\n")]);
        let curr = map(&[("file.txt", "line1\r\nmodified2\r\n")]);
        let diffs = diff_file_contents(&prev, &curr, true, None);
        assert_eq!(diffs.len(), 1);
        assert_eq!(diffs[0].status, PerFileStatus::Modified);
        assert!(diffs[0].diff.contains("- line2"));
        assert!(diffs[0].diff.contains("+ modified2"));
    }

    #[test]
    fn test_per_file_diff_is_changed() {
        let added = PerFileDiff {
            path: "test.txt".to_string(),
            status: PerFileStatus::Added,
            diff: "test".to_string(),
        };
        assert!(added.is_changed());

        let removed = PerFileDiff {
            path: "test.txt".to_string(),
            status: PerFileStatus::Removed,
            diff: "test".to_string(),
        };
        assert!(removed.is_changed());

        let modified = PerFileDiff {
            path: "test.txt".to_string(),
            status: PerFileStatus::Modified,
            diff: "test".to_string(),
        };
        assert!(modified.is_changed());

        let unchanged = PerFileDiff {
            path: "test.txt".to_string(),
            status: PerFileStatus::Unchanged,
            diff: String::new(),
        };
        assert!(!unchanged.is_changed());
    }

    #[test]
    fn test_generate_diff_identical_content() {
        let content = "line1\nline2\nline3\n";
        let diff = generate_diff(content, content);
        assert!(diff.is_empty());
    }

    #[test]
    fn test_generate_diff_with_changes() {
        let old = "line1\nline2\nline3\n";
        let new = "line1\nmodified2\nline3\n";
        let diff = generate_diff(old, new);
        assert!(diff.contains("## File Differences"));
        assert!(diff.contains("```diff"));
        assert!(diff.contains("- line2"));
        assert!(diff.contains("+ modified2"));
    }

    #[test]
    fn test_resolve_context_lines_default() {
        let context = resolve_context_lines(None);
        assert_eq!(context, 3);
    }

    #[test]
    fn test_resolve_context_lines_explicit() {
        let context = resolve_context_lines(Some(5));
        assert_eq!(context, 5);
    }

    #[test]
    fn test_resolve_context_lines_zero_fallback() {
        let context = resolve_context_lines(Some(0));
        assert_eq!(context, 3); // Should fallback to default
    }

    #[test]
    fn test_unicode_content_diff() {
        let prev = map(&[("unicode.txt", "Hello 世界\n")]);
        let curr = map(&[("unicode.txt", "Hello 世界! 🌍\n")]);
        let diffs = diff_file_contents(&prev, &curr, true, None);
        assert_eq!(diffs.len(), 1);
        assert_eq!(diffs[0].status, PerFileStatus::Modified);
        assert!(diffs[0].diff.contains("Hello 世界"));
        assert!(diffs[0].diff.contains("🌍"));
    }

    #[test]
    fn test_render_per_file_diffs_empty() {
        let diffs = vec![];
        let output = render_per_file_diffs(&diffs);
        assert!(output.is_empty());
    }

    #[test]
    fn test_render_per_file_diffs_unchanged() {
        let diffs = vec![PerFileDiff {
            path: "unchanged.txt".to_string(),
            status: PerFileStatus::Unchanged,
            diff: String::new(),
        }];
        let output = render_per_file_diffs(&diffs);
        assert!(output.contains("### Diff: `unchanged.txt`"));
        assert!(output.contains("_Status: Unchanged_"));
    }

    #[test]
    fn test_render_per_file_diffs_without_trailing_newline() {
        let diffs = vec![PerFileDiff {
            path: "test.txt".to_string(),
            status: PerFileStatus::Modified,
            diff: "```diff\n+ line\n```".to_string(), // No trailing newline
        }];
        let output = render_per_file_diffs(&diffs);
        assert!(output.contains("### Diff: `test.txt`"));
        assert!(output.contains("_Status: Modified_"));
        assert!(output.ends_with("\n\n")); // Should add newlines
    }

    #[test]
    fn test_generate_diff_with_multiple_groups() {
        // Create content that will result in multiple diff groups to trigger "..." separator
        let old_content = "line1\nline2\nline3\nline4\nline5\nline6\nline7\nline8\nline9\nline10";
        let new_content = "line1_modified\nline2\nline3\nline4\nline5\nline6\nline7\nline8\nline9_modified\nline10";

        let diff = generate_diff(old_content, new_content);
        assert!(diff.contains("```diff"));
        assert!(diff.contains("## File Differences"));
        // With sufficient distance between changes and small context, should create groups with "..." separator
        println!("Generated diff: {}", diff);
    }

    #[test]
    fn test_diff_with_windows_line_endings() {
        let old_content = "line1\r\nline2\r\n";
        let new_content = "line1_modified\r\nline2\r\n";

        let diff = generate_diff(old_content, new_content);
        assert!(diff.contains("```diff"));
        assert!(diff.contains("line1_modified"));
        assert!(!diff.is_empty());
    }

    #[test]
    fn test_unified_no_header_with_multiple_groups() {
        // Create content that will result in multiple diff groups
        let old_content = "start\n\n\n\n\n\n\n\n\n\nmiddle\n\n\n\n\n\n\n\n\n\nend";
        let new_content =
            "start_modified\n\n\n\n\n\n\n\n\n\nmiddle\n\n\n\n\n\n\n\n\n\nend_modified";

        let diff = unified_no_header(old_content, new_content, 2);
        assert!(diff.contains("```diff"));
        // Should contain "..." separator between groups when changes are far apart
        println!("Unified diff: {}", diff);
    }

    #[test]
    fn test_unified_no_header_with_windows_line_endings() {
        let old_content = "line1\r\nline2\r\n";
        let new_content = "line1_modified\r\nline2\r\n";

        let diff = unified_no_header(old_content, new_content, 3);
        assert!(diff.contains("```diff"));
        assert!(diff.contains("line1_modified"));
        assert!(!diff.is_empty());
    }
}
```

### File: `src/file_utils.rs`

- Size: 22914 bytes
- Modified: 2026-02-15 01:55:14 UTC

```rust
use ignore::{DirEntry, WalkBuilder, overrides::OverrideBuilder};
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Returns a numeric category for file relevance ordering.
/// Lower numbers appear first in output. Categories:
/// 0 = Project config + key docs (Cargo.toml, README.md, AGENTS.md, etc.)
/// 1 = Source code (src/, lib/) — entry points sorted first within category
/// 2 = Tests and benchmarks (tests/, benches/, test/, spec/)
/// 3 = Documentation, scripts, and everything else
/// 4 = Generated/lock files (Cargo.lock, package-lock.json, etc.)
/// 5 = Build/CI infrastructure (.github/, .circleci/, Dockerfile, etc.)
fn file_relevance_category(path: &Path, base_path: &Path) -> u8 {
    let relative = path.strip_prefix(base_path).unwrap_or(path);
    let rel_str = relative.to_string_lossy();

    // Check filename for lockfiles first — these are lowest priority
    if let Some(name) = relative.file_name().and_then(|n| n.to_str()) {
        let lockfile_names = [
            "Cargo.lock",
            "package-lock.json",
            "yarn.lock",
            "pnpm-lock.yaml",
            "Gemfile.lock",
            "poetry.lock",
            "composer.lock",
            "go.sum",
            "bun.lockb",
            "flake.lock",
        ];
        if lockfile_names.contains(&name) {
            return 5;
        }

        // Check for config/manifest files + key project docs — highest priority
        let config_names = [
            // Package manifests
            "Cargo.toml",
            "package.json",
            "tsconfig.json",
            "pyproject.toml",
            "setup.py",
            "setup.cfg",
            "go.mod",
            "Gemfile",
            // Tool config
            "context-builder.toml",
            ".gitignore",
            // Key project documentation (LLMs need these for context)
            "README.md",
            "README",
            "README.txt",
            "README.rst",
            "AGENTS.md",
            "CLAUDE.md",
            "GEMINI.md",
            "COPILOT.md",
            "CONTRIBUTING.md",
            "CHANGELOG.md",
        ];
        if config_names.contains(&name) {
            return 0;
        }
    }

    // Check path prefix for category
    let first_component = relative
        .components()
        .next()
        .and_then(|c| c.as_os_str().to_str())
        .unwrap_or("");

    match first_component {
        "src" | "lib" | "crates" | "packages" | "internal" | "cmd" | "pkg" => {
            // Check sub-components for test directories within source trees.
            // e.g., src/tests/auth.rs should be cat 2 (tests), not cat 1 (source).
            let sub_path = rel_str.as_ref();
            if sub_path.contains("/tests/")
                || sub_path.contains("/test/")
                || sub_path.contains("/spec/")
                || sub_path.contains("/__tests__/")
                || sub_path.contains("/benches/")
                || sub_path.contains("/benchmarks/")
            {
                2
            } else {
                1
            }
        }
        "tests" | "test" | "spec" | "benches" | "benchmarks" | "__tests__" => 2,
        "docs" | "doc" | "examples" | "scripts" | "tools" | "assets" => 3,
        // Build/CI infrastructure — useful context but not core source
        ".github" | ".circleci" | ".gitlab" | ".buildkite" => 4,
        _ => {
            // Check extensions for additional heuristics
            if let Some(ext) = relative.extension().and_then(|e| e.to_str()) {
                match ext {
                    "rs" | "go" | "py" | "ts" | "js" | "java" | "c" | "cpp" | "h" | "hpp"
                    | "rb" | "swift" | "kt" | "scala" | "ex" | "exs" | "zig" | "hs" => {
                        // Source file not in a recognized dir — check if it's a test
                        // Use path boundaries to avoid false positives (e.g., "contest.rs")
                        if rel_str.contains("/test/")
                            || rel_str.contains("/tests/")
                            || rel_str.contains("/spec/")
                            || rel_str.contains("/__tests__/")
                            || rel_str.ends_with("_test.rs")
                            || rel_str.ends_with("_test.go")
                            || rel_str.ends_with("_spec.rb")
                            || rel_str.ends_with(".test.ts")
                            || rel_str.ends_with(".test.js")
                            || rel_str.ends_with(".spec.ts")
                            || rel_str.starts_with("test_")
                        {
                            2
                        } else {
                            1
                        }
                    }
                    "md" | "txt" | "rst" | "adoc" => 3,
                    _ => 1, // Unknown extension in root — treat as source
                }
            } else {
                // Check for build-related root files without extensions
                if let Some(
                    "Makefile" | "CMakeLists.txt" | "Dockerfile" | "Containerfile" | "Justfile"
                    | "Taskfile" | "Rakefile" | "Vagrantfile",
                ) = relative.file_name().and_then(|n| n.to_str())
                {
                    4
                } else {
                    3 // No extension — docs/other
                }
            }
        }
    }
}

/// Returns a sub-priority for sorting within the same relevance category.
/// Lower values appear first. Entry points (main, lib, mod) get priority 0,
/// other files get priority 1. This ensures LLMs see architectural entry
/// points before helper modules.
fn file_entry_point_priority(path: &Path) -> u8 {
    if let Some("main" | "lib" | "mod" | "index" | "app" | "__init__") =
        path.file_stem().and_then(|s| s.to_str())
    {
        0
    } else {
        1
    }
}

/// Collects all files to be processed using `ignore` crate for efficient traversal.
///
/// `auto_ignores` are runtime-computed exclusion patterns (e.g., the tool's own
/// output file or cache directory). They are processed identically to user ignores
/// but kept separate to avoid polluting user-facing configuration.
pub fn collect_files(
    base_path: &Path,
    filters: &[String],
    ignores: &[String],
    auto_ignores: &[String],
) -> io::Result<Vec<DirEntry>> {
    let mut walker = WalkBuilder::new(base_path);
    // By default, the "ignore" crate respects .gitignore and hidden files, so we don't need walker.hidden(false)

    // Build overrides for custom ignore patterns
    let mut override_builder = OverrideBuilder::new(base_path);
    for pattern in ignores {
        // Attention: Confusing pattern ahead!
        // Add the pattern to the override builder with ! prefix to ignore matching files.
        // In OverrideBuilder, patterns without ! are whitelist (include) patterns,
        // while patterns with ! are ignore patterns.
        let ignore_pattern = format!("!{}", pattern);
        if let Err(e) = override_builder.add(&ignore_pattern) {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                format!("Invalid ignore pattern '{}': {}", pattern, e),
            ));
        }
    }
    // Apply auto-computed ignore patterns (output file, cache dir, etc.)
    for pattern in auto_ignores {
        let ignore_pattern = format!("!{}", pattern);
        if let Err(e) = override_builder.add(&ignore_pattern) {
            log::warn!("Skipping invalid auto-ignore pattern '{}': {}", pattern, e);
        }
    }
    // Also, always ignore the config file itself
    if let Err(e) = override_builder.add("!context-builder.toml") {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("Failed to add config ignore: {}", e),
        ));
    }

    // Hardcoded auto-ignores for common heavy directories that should NEVER be
    // included, even when there's no .git directory (so .gitignore isn't read).
    // Without these, projects missing .git can produce million-line outputs
    // from dependency trees.
    let default_ignores = [
        "node_modules",
        "__pycache__",
        ".venv",
        "venv",
        ".tox",
        ".mypy_cache",
        ".pytest_cache",
        ".ruff_cache",
        "vendor",  // Go, PHP, Ruby
        ".bundle", // Ruby
        "bower_components",
        ".next",       // Next.js build output
        ".nuxt",       // Nuxt build output
        ".svelte-kit", // SvelteKit build output
        ".angular",    // Angular cache
        "dist",        // Common build output
        "build",       // Common build output
        ".gradle",     // Gradle cache
        ".cargo",      // Cargo registry cache
    ];
    for dir in &default_ignores {
        let pattern = format!("!{}/**", dir);
        if let Err(e) = override_builder.add(&pattern) {
            log::warn!("Skipping invalid default-ignore '{}': {}", dir, e);
        }
    }

    let overrides = override_builder.build().map_err(|e| {
        io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("Failed to build overrides: {}", e),
        )
    })?;
    walker.overrides(overrides);

    if !filters.is_empty() {
        let mut type_builder = ignore::types::TypesBuilder::new();
        type_builder.add_defaults();
        for filter in filters {
            let _ = type_builder.add(filter, &format!("*.{}", filter));
            type_builder.select(filter);
        }
        let types = type_builder.build().unwrap();
        walker.types(types);
    }

    let mut files: Vec<DirEntry> = walker
        .build()
        .filter_map(Result::ok)
        .filter(|e| e.file_type().is_some_and(|ft| ft.is_file()))
        .collect();

    // Sort files by relevance category, then entry-point priority, then alphabetically.
    // This puts config + docs first, then source code (entry points before helpers),
    // then tests, then docs/other, then build/CI, then lockfiles.
    // LLMs comprehend codebases better when core source appears before test scaffolding.
    files.sort_by(|a, b| {
        let cat_a = file_relevance_category(a.path(), base_path);
        let cat_b = file_relevance_category(b.path(), base_path);
        cat_a
            .cmp(&cat_b)
            .then_with(|| {
                file_entry_point_priority(a.path()).cmp(&file_entry_point_priority(b.path()))
            })
            .then_with(|| a.path().cmp(b.path()))
    });

    Ok(files)
}

/// Asks for user confirmation if the number of files is large.
pub fn confirm_processing(file_count: usize) -> io::Result<bool> {
    if file_count > 100 {
        print!(
            "Warning: You're about to process {} files. This might take a while. Continue? [y/N] ",
            file_count
        );
        io::stdout().flush()?;
        let mut input = String::new();
        io::stdin().read_line(&mut input)?;
        if !input.trim().eq_ignore_ascii_case("y") {
            return Ok(false);
        }
    }
    Ok(true)
}

/// Asks for user confirmation to overwrite an existing file.
pub fn confirm_overwrite(file_path: &str) -> io::Result<bool> {
    print!("The file '{}' already exists. Overwrite? [y/N] ", file_path);
    io::stdout().flush()?;
    let mut input = String::new();
    io::stdin().read_line(&mut input)?;

    if input.trim().eq_ignore_ascii_case("y") {
        Ok(true)
    } else {
        Ok(false)
    }
}

pub fn find_latest_file(dir: &Path) -> io::Result<Option<PathBuf>> {
    if !dir.is_dir() {
        return Ok(None);
    }

    let mut latest_file = None;
    let mut latest_time = std::time::SystemTime::UNIX_EPOCH;

    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        if path.is_file() {
            let metadata = fs::metadata(&path)?;
            let modified = metadata.modified()?;
            if modified > latest_time {
                latest_time = modified;
                latest_file = Some(path);
            }
        }
    }

    Ok(latest_file)
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::fs;
    use std::path::Path;
    use tempfile::tempdir;

    fn to_rel_paths(mut entries: Vec<DirEntry>, base: &Path) -> Vec<String> {
        entries.sort_by_key(|e| e.path().to_path_buf());
        entries
            .iter()
            .map(|e| {
                e.path()
                    .strip_prefix(base)
                    .unwrap()
                    .to_string_lossy()
                    .replace('\\', "/")
            })
            .collect()
    }

    #[test]
    fn collect_files_respects_filters() {
        let dir = tempdir().unwrap();
        let base = dir.path();

        // create files
        fs::create_dir_all(base.join("src")).unwrap();
        fs::create_dir_all(base.join("scripts")).unwrap();
        fs::write(base.join("src").join("main.rs"), "fn main() {}").unwrap();
        fs::write(base.join("Cargo.toml"), "[package]\nname=\"x\"").unwrap();
        fs::write(base.join("README.md"), "# readme").unwrap();
        fs::write(base.join("scripts").join("build.sh"), "#!/bin/sh\n").unwrap();

        let filters = vec!["rs".to_string(), "toml".to_string()];
        let ignores: Vec<String> = vec![];

        let files = collect_files(base, &filters, &ignores, &[]).unwrap();
        let relative_paths = to_rel_paths(files, base);

        assert!(relative_paths.contains(&"src/main.rs".to_string()));
        assert!(relative_paths.contains(&"Cargo.toml".to_string()));
        assert!(!relative_paths.contains(&"README.md".to_string()));
        assert!(!relative_paths.contains(&"scripts/build.sh".to_string()));
    }

    #[test]
    fn collect_files_respects_ignores_for_dirs_and_files() {
        let dir = tempdir().unwrap();
        let base = dir.path();

        fs::create_dir_all(base.join("src")).unwrap();
        fs::create_dir_all(base.join("target")).unwrap();
        fs::create_dir_all(base.join("node_modules")).unwrap();

        fs::write(base.join("src").join("main.rs"), "fn main() {}").unwrap();
        fs::write(base.join("target").join("artifact.txt"), "bin").unwrap();
        fs::write(base.join("node_modules").join("pkg.js"), "console.log();").unwrap();
        fs::write(base.join("README.md"), "# readme").unwrap();

        let filters: Vec<String> = vec![];
        let ignores: Vec<String> = vec!["target".into(), "node_modules".into(), "README.md".into()];

        let files = collect_files(base, &filters, &ignores, &[]).unwrap();
        let relative_paths = to_rel_paths(files, base);

        assert!(relative_paths.contains(&"src/main.rs".to_string()));
        assert!(!relative_paths.contains(&"target/artifact.txt".to_string()));
        assert!(!relative_paths.contains(&"node_modules/pkg.js".to_string()));
        assert!(!relative_paths.contains(&"README.md".to_string()));
    }

    #[test]
    fn collect_files_handles_invalid_ignore_pattern() {
        let dir = tempdir().unwrap();
        let base = dir.path();

        fs::create_dir_all(base.join("src")).unwrap();
        fs::write(base.join("src").join("main.rs"), "fn main() {}").unwrap();

        let filters: Vec<String> = vec![];
        let ignores: Vec<String> = vec!["[".into()]; // Invalid regex pattern

        let result = collect_files(base, &filters, &ignores, &[]);
        assert!(result.is_err());
        assert!(
            result
                .unwrap_err()
                .to_string()
                .contains("Invalid ignore pattern")
        );
    }

    #[test]
    fn collect_files_empty_directory() {
        let dir = tempdir().unwrap();
        let base = dir.path();

        let filters: Vec<String> = vec![];
        let ignores: Vec<String> = vec![];

        let files = collect_files(base, &filters, &ignores, &[]).unwrap();
        assert!(files.is_empty());
    }

    #[test]
    fn collect_files_no_matching_filters() {
        let dir = tempdir().unwrap();
        let base = dir.path();

        fs::write(base.join("README.md"), "# readme").unwrap();
        fs::write(base.join("script.py"), "print('hello')").unwrap();

        let filters = vec!["rs".to_string()]; // Only Rust files
        let ignores: Vec<String> = vec![];

        let files = collect_files(base, &filters, &ignores, &[]).unwrap();
        assert!(files.is_empty());
    }

    #[test]
    fn collect_files_ignores_config_file() {
        let dir = tempdir().unwrap();
        let base = dir.path();

        fs::write(base.join("context-builder.toml"), "[config]").unwrap();
        fs::write(base.join("other.toml"), "[other]").unwrap();

        let filters: Vec<String> = vec![];
        let ignores: Vec<String> = vec![];

        let files = collect_files(base, &filters, &ignores, &[]).unwrap();
        let relative_paths = to_rel_paths(files, base);

        assert!(!relative_paths.contains(&"context-builder.toml".to_string()));
        assert!(relative_paths.contains(&"other.toml".to_string()));
    }

    #[test]
    fn confirm_processing_small_count() {
        // Test that small file counts don't require confirmation
        let result = confirm_processing(50);
        assert!(result.is_ok());
        assert!(result.unwrap());
    }

    #[test]
    fn find_latest_file_empty_directory() {
        let dir = tempdir().unwrap();
        let result = find_latest_file(dir.path()).unwrap();
        assert!(result.is_none());
    }

    #[test]
    fn find_latest_file_nonexistent_directory() {
        let dir = tempdir().unwrap();
        let nonexistent = dir.path().join("nonexistent");
        let result = find_latest_file(&nonexistent).unwrap();
        assert!(result.is_none());
    }

    #[test]
    fn find_latest_file_single_file() {
        let dir = tempdir().unwrap();
        let file_path = dir.path().join("test.txt");
        fs::write(&file_path, "content").unwrap();

        let result = find_latest_file(dir.path()).unwrap();
        assert!(result.is_some());
        assert_eq!(result.unwrap(), file_path);
    }

    #[test]
    fn find_latest_file_multiple_files() {
        let dir = tempdir().unwrap();

        let file1 = dir.path().join("old.txt");
        let file2 = dir.path().join("new.txt");

        fs::write(&file1, "old content").unwrap();
        std::thread::sleep(std::time::Duration::from_millis(10));
        fs::write(&file2, "new content").unwrap();

        let result = find_latest_file(dir.path()).unwrap();
        assert!(result.is_some());
        assert_eq!(result.unwrap(), file2);
    }

    #[test]
    fn find_latest_file_ignores_directories() {
        let dir = tempdir().unwrap();
        let subdir = dir.path().join("subdir");
        fs::create_dir(&subdir).unwrap();

        let file_path = dir.path().join("test.txt");
        fs::write(&file_path, "content").unwrap();

        let result = find_latest_file(dir.path()).unwrap();
        assert!(result.is_some());
        assert_eq!(result.unwrap(), file_path);
    }

    #[test]
    fn test_confirm_processing_requires_user_interaction() {
        // This test verifies the function signature and basic logic for large file counts
        // The actual user interaction cannot be tested in unit tests

        // For file counts <= 100, should return Ok(true) without prompting
        // This is already tested implicitly by the fact that small counts don't prompt

        // For file counts > 100, the function would prompt user input
        // We can't easily test this without mocking stdin, but we can verify
        // that the function exists and has the expected signature
        use std::io::Cursor;

        // Create a mock stdin that simulates user typing "y"
        let input = b"y\n";
        let _ = Cursor::new(input);

        // We can't easily override stdin in a unit test without complex setup,
        // so we'll just verify the function exists and handles small counts
        let result = confirm_processing(50);
        assert!(result.is_ok());
        assert!(result.unwrap());
    }

    #[test]
    fn test_confirm_overwrite_function_exists() {
        // Similar to confirm_processing, this function requires user interaction
        // We can verify it exists and has the expected signature

        // For testing purposes, we know this function prompts for user input
        // and returns Ok(true) if user types "y" or "Y", Ok(false) otherwise

        // The function signature should be:
        // pub fn confirm_overwrite(file_path: &str) -> io::Result<bool>

        // We can't easily test the interactive behavior without mocking stdin,
        // but we can ensure the function compiles and has the right signature
        let _: fn(&str) -> std::io::Result<bool> = confirm_overwrite;
    }

    #[test]
    fn test_collect_files_handles_permission_errors() {
        // Test what happens when we can't access a directory
        // This is harder to test portably, but we can test with invalid patterns
        let dir = tempdir().unwrap();
        let base = dir.path();

        // Test with a pattern that might cause issues
        let filters: Vec<String> = vec![];
        let ignores: Vec<String> = vec!["[invalid".into()]; // Incomplete bracket

        let result = collect_files(base, &filters, &ignores, &[]);
        assert!(result.is_err());
    }

    #[test]
    fn test_find_latest_file_permission_error() {
        // Test behavior when we can't read directory metadata
        use std::path::Path;

        // Test with a path that doesn't exist
        let nonexistent = Path::new("/this/path/should/not/exist/anywhere");
        let result = find_latest_file(nonexistent);

        // Should return Ok(None) for non-existent directories
        assert!(result.is_ok());
        assert!(result.unwrap().is_none());
    }

    #[test]
    fn test_collect_files_with_symlinks() {
        // Test behavior with symbolic links (if supported on platform)
        let dir = tempdir().unwrap();
        let base = dir.path();

        // Create a regular file
        fs::write(base.join("regular.txt"), "content").unwrap();

        // On Unix-like systems, try creating a symlink
        #[cfg(unix)]
        {
            use std::os::unix::fs::symlink;
            let _ = symlink("regular.txt", base.join("link.txt"));
        }

        // On Windows, symlinks require special privileges, so skip this part
        #[cfg(windows)]
        {
            // Just create another regular file to test
            fs::write(base.join("another.txt"), "content2").unwrap();
        }

        let filters: Vec<String> = vec![];
        let ignores: Vec<String> = vec![];

        let files = collect_files(base, &filters, &ignores, &[]).unwrap();
        // Should find at least the regular file
        assert!(!files.is_empty());
    }
}
```

### File: `src/markdown.rs`

- Size: 40252 bytes
- Modified: 2026-02-15 00:33:58 UTC

```rust
use chrono::Utc;
use ignore::DirEntry;
use log::{error, info, warn};
use std::fs;
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

use crate::tree::{FileTree, write_tree_to_file};
use encoding_rs::{Encoding, UTF_8};

#[cfg(feature = "parallel")]
use crossbeam_channel::{Receiver, Sender, bounded};
#[cfg(feature = "parallel")]
use std::thread;

/// Generates the final Markdown file.
#[allow(clippy::too_many_arguments, unused_variables)]
pub fn generate_markdown(
    output_path: &str,
    input_dir: &str,
    filters: &[String],
    ignores: &[String],
    file_tree: &FileTree,
    files: &[DirEntry],
    base_path: &Path,
    line_numbers: bool,
    encoding_strategy: Option<&str>,
    max_tokens: Option<usize>,
) -> io::Result<()> {
    if let Some(parent) = Path::new(output_path).parent()
        && !parent.exists()
    {
        fs::create_dir_all(parent)?;
    }

    let mut output = fs::File::create(output_path)?;

    let input_dir_name = if input_dir == "." {
        let current_dir = std::env::current_dir()?;
        current_dir
            .file_name()
            .unwrap()
            .to_str()
            .unwrap()
            .to_string()
    } else {
        input_dir.to_string()
    };

    // --- Header --- //
    writeln!(output, "# Directory Structure Report\n")?;

    if !filters.is_empty() {
        writeln!(
            output,
            "This document contains files from the `{}` directory with extensions: {}",
            input_dir_name,
            filters.join(", ")
        )?;
    } else {
        writeln!(
            output,
            "This document contains all files from the `{}` directory, optimized for LLM consumption.",
            input_dir_name
        )?;
    }

    if !ignores.is_empty() {
        writeln!(output, "Custom ignored patterns: {}", ignores.join(", "))?;
    }

    // Deterministic content hash (enables LLM prompt caching across runs)
    // Uses xxh3 over file content bytes — stable across Rust versions and machines.
    // Previous implementation hashed mtime (broken by git checkout, cp, etc.)
    let mut content_hasher = xxhash_rust::xxh3::Xxh3::new();
    for entry in files {
        // Hash relative unix-style path for cross-OS determinism.
        // Using absolute or OS-native paths would produce different hashes
        // on different machines or operating systems.
        let rel_path = entry.path().strip_prefix(base_path).unwrap_or(entry.path());
        let normalized = rel_path.to_string_lossy().replace('\\', "/");
        content_hasher.update(normalized.as_bytes());
        // Null delimiter prevents collision: path="a" content="bc" vs path="ab" content="c"
        content_hasher.update(b"\0");
        // Hash actual file content (not mtime!) for determinism
        if let Ok(bytes) = std::fs::read(entry.path()) {
            content_hasher.update(&bytes);
        }
        content_hasher.update(b"\0");
    }
    writeln!(output, "Content hash: {:016x}", content_hasher.digest())?;
    writeln!(output)?;

    // --- File Tree --- //

    writeln!(output, "## File Tree Structure\n")?;

    write_tree_to_file(&mut output, file_tree, 0)?;

    writeln!(output)?;

    // (No '## Files' heading here; it will be injected later only once during final composition)
    // (Diff section will be conditionally inserted later by the auto_diff logic in lib.rs)

    #[cfg(feature = "parallel")]
    {
        use rayon::prelude::*;

        // Create a bounded channel for ordered chunks
        type ChunkResult = (usize, io::Result<Vec<u8>>);
        let (sender, receiver): (Sender<ChunkResult>, Receiver<ChunkResult>) =
            bounded(num_cpus::get() * 2); // Buffer size based on CPU count

        let writer_handle = {
            let mut output = output;
            let total_files = files.len();
            let budget = max_tokens;

            thread::spawn(move || -> io::Result<()> {
                let mut completed_chunks = std::collections::BTreeMap::new();
                let mut next_index = 0;
                let mut errors = Vec::new();
                let mut tokens_used: usize = 0;
                let mut budget_exceeded = false;

                // Receive chunks and write them in order
                while next_index < total_files {
                    match receiver.recv() {
                        Ok((index, chunk_result)) => {
                            completed_chunks.insert(index, chunk_result);

                            // Write all consecutive chunks starting from next_index
                            while let Some(chunk_result) = completed_chunks.remove(&next_index) {
                                if budget_exceeded {
                                    // Already over budget — skip remaining chunks
                                    next_index += 1;
                                    continue;
                                }

                                match chunk_result {
                                    Ok(buf) => {
                                        // Estimate tokens for this chunk (~4 bytes per token)
                                        let chunk_tokens = buf.len() / 4;

                                        if let Some(max) = budget
                                            && tokens_used + chunk_tokens > max
                                            && tokens_used > 0
                                        {
                                            let remaining = total_files - next_index;
                                            let notice = format!(
                                                "---\n\n_⚠️ Token budget ({}) reached. {} remaining files omitted._\n\n",
                                                max, remaining
                                            );
                                            if let Err(e) = output.write_all(notice.as_bytes()) {
                                                errors.push(format!(
                                                    "Failed to write truncation notice: {}",
                                                    e
                                                ));
                                            }
                                            budget_exceeded = true;
                                            next_index += 1;
                                            continue;
                                        }

                                        tokens_used += chunk_tokens;
                                        if let Err(e) = output.write_all(&buf) {
                                            errors.push(format!(
                                                "Failed to write output for file index {}: {}",
                                                next_index, e
                                            ));
                                        }
                                    }
                                    Err(e) => {
                                        errors.push(format!(
                                            "Failed to process file index {}: {}",
                                            next_index, e
                                        ));
                                    }
                                }
                                next_index += 1;
                            }
                        }
                        Err(_) => break, // Channel closed
                    }
                }

                if !errors.is_empty() {
                    error!(
                        "Encountered {} errors during parallel processing:",
                        errors.len()
                    );
                    for err in &errors {
                        error!("  {}", err);
                    }
                    return Err(std::io::Error::other(format!(
                        "Failed to process {} files: {}",
                        errors.len(),
                        errors.join("; ")
                    )));
                }

                Ok(())
            })
        };

        // Process files in parallel and send results to writer
        files.par_iter().enumerate().for_each(|(index, entry)| {
            let mut buf = Vec::new();
            let result = process_file(
                base_path,
                entry.path(),
                &mut buf,
                line_numbers,
                encoding_strategy,
            )
            .map(|_| buf);

            // Send result to writer thread (ignore send errors - channel might be closed)
            let _ = sender.send((index, result));
        });

        // Close the sender to signal completion
        drop(sender);

        // Wait for writer thread to complete and propagate any errors
        writer_handle
            .join()
            .map_err(|_| std::io::Error::other("Writer thread panicked"))??;
    }

    #[cfg(not(feature = "parallel"))]
    {
        let mut tokens_used: usize = 0;

        for (idx, entry) in files.iter().enumerate() {
            // Estimate tokens for this file (~4 bytes per token)
            let file_size = std::fs::metadata(entry.path())
                .map(|m| m.len())
                .unwrap_or(0);
            let estimated_file_tokens = (file_size as usize) / 4;

            if let Some(budget) = max_tokens {
                if tokens_used + estimated_file_tokens > budget && tokens_used > 0 {
                    let remaining = files.len() - idx;
                    writeln!(output, "---\n")?;
                    writeln!(
                        output,
                        "_⚠️ Token budget ({}) reached. {} remaining files omitted._\n",
                        budget, remaining
                    )?;
                    break;
                }
            }

            tokens_used += estimated_file_tokens;
            process_file(
                base_path,
                entry.path(),
                &mut output,
                line_numbers,
                encoding_strategy,
            )?;
        }
    }

    Ok(())
}

/// Processes a single file and writes its content to the output.
pub fn process_file(
    base_path: &Path,

    file_path: &Path,

    output: &mut impl Write,
    line_numbers: bool,
    encoding_strategy: Option<&str>,
) -> io::Result<()> {
    let relative_path = file_path.strip_prefix(base_path).unwrap_or(file_path);
    info!("Processing file: {}", relative_path.display());

    let metadata = match fs::metadata(file_path) {
        Ok(meta) => meta,
        Err(e) => {
            error!(
                "Failed to get metadata for {}: {}",
                relative_path.display(),
                e
            );
            return Ok(());
        }
    };

    let modified_time = metadata
        .modified()
        .ok()
        .map(|time| {
            let system_time: chrono::DateTime<Utc> = time.into();
            system_time.format("%Y-%m-%d %H:%M:%S UTC").to_string()
        })
        .unwrap_or_else(|| "Unknown".to_string());

    writeln!(output)?;
    writeln!(output, "### File: `{}`", relative_path.display())?;

    writeln!(output)?;

    writeln!(output, "- Size: {} bytes", metadata.len())?;
    writeln!(output, "- Modified: {}", modified_time)?;
    writeln!(output)?;

    // --- File Content --- //
    let extension = file_path
        .extension()
        .and_then(|s| s.to_str())
        .unwrap_or("text");
    let language = match extension {
        "rs" => "rust",
        "js" => "javascript",
        "ts" => "typescript",
        "jsx" => "jsx",
        "tsx" => "tsx",
        "json" => "json",
        "toml" => "toml",
        "md" => "markdown",
        "yaml" | "yml" => "yaml",
        "html" => "html",
        "css" => "css",
        "py" => "python",
        "java" => "java",
        "cpp" => "cpp",
        "c" => "c",
        "h" => "c",
        "hpp" => "cpp",
        "sql" => "sql",
        "sh" => "bash",
        "xml" => "xml",
        "lock" => "toml",
        _ => extension,
    };

    // Enhanced binary file handling with encoding detection and transcoding
    match fs::File::open(file_path) {
        Ok(mut file) => {
            let mut sniff = [0u8; 8192];
            let n = match file.read(&mut sniff) {
                Ok(n) => n,
                Err(e) => {
                    warn!(
                        "Could not read file {}: {}. Skipping content.",
                        relative_path.display(),
                        e
                    );

                    writeln!(output, "```text")?;

                    writeln!(
                        output,
                        "<Could not read file content (e.g., binary file or permission error)>"
                    )?;

                    writeln!(output, "```")?;

                    return Ok(());
                }
            };
            let slice = &sniff[..n];

            // Find a valid UTF-8 boundary by backtracking up to 3 bytes.
            // If the sniff buffer cuts a multi-byte char (e.g., emoji at byte 8191),
            // from_utf8 would falsely classify the file as non-UTF-8.
            let check_len = if n == sniff.len() {
                // Buffer is full — may have split a multi-byte char at the end
                let mut end = n;
                while end > 0 && end > n.saturating_sub(4) && sniff[end - 1] & 0xC0 == 0x80 {
                    end -= 1; // skip continuation bytes
                }
                // If we landed on a leading byte, check if the sequence is complete
                if end > 0 && end < n {
                    let leading = sniff[end - 1];
                    let expected_len = if leading & 0xE0 == 0xC0 {
                        2
                    } else if leading & 0xF0 == 0xE0 {
                        3
                    } else if leading & 0xF8 == 0xF0 {
                        4
                    } else {
                        1
                    };
                    if end - 1 + expected_len > n {
                        end - 1 // incomplete char — exclude the leading byte too
                    } else {
                        n
                    }
                } else {
                    n
                }
            } else {
                n // didn't fill the buffer, so no boundary issue
            };

            // First check if it's valid UTF-8
            let is_utf8 = std::str::from_utf8(&sniff[..check_len]).is_ok();

            if is_utf8 && !slice.contains(&0) {
                // Valid UTF-8 text file - proceed normally
            } else {
                // Try encoding detection for non-UTF-8 files
                // If it's not UTF-8, try to detect the encoding
                let (encoding, _consumed) =
                    encoding_rs::Encoding::for_bom(slice).unwrap_or((encoding_rs::UTF_8, 0));

                // If it's not UTF-8, try to detect the encoding
                let detected_encoding = if encoding == UTF_8 {
                    // Use chardet-like detection for common encodings
                    detect_text_encoding(slice)
                } else {
                    Some(encoding)
                };

                match detected_encoding {
                    Some(enc) if enc != UTF_8 => {
                        let strategy = encoding_strategy.unwrap_or("detect");
                        match strategy {
                            "strict" | "skip" => {
                                // Skip files with non-UTF-8 encoding
                                warn!(
                                    "Skipping non-UTF-8 file {} (encoding: {}, strategy: {})",
                                    relative_path.display(),
                                    enc.name(),
                                    strategy
                                );
                            }
                            _ => {
                                // Default "detect" strategy: attempt to transcode
                                match transcode_file_content(file_path, enc) {
                                    Ok(transcoded_content) => {
                                        info!(
                                            "Successfully transcoded {} from {} to UTF-8",
                                            relative_path.display(),
                                            enc.name()
                                        );
                                        write_text_content(
                                            output,
                                            &transcoded_content,
                                            language,
                                            line_numbers,
                                        )?;
                                        return Ok(());
                                    }
                                    Err(e) => {
                                        warn!(
                                            "Failed to transcode {} from {}: {}. Treating as binary.",
                                            relative_path.display(),
                                            enc.name(),
                                            e
                                        );
                                    }
                                }
                            }
                        }
                    }
                    _ => {
                        // Check if it's likely binary (contains null bytes)
                        if slice.contains(&0) {
                            warn!(
                                "Detected binary file {} (contains null bytes). Skipping content.",
                                relative_path.display()
                            );
                        } else {
                            warn!(
                                "Could not determine encoding for {}. Treating as binary.",
                                relative_path.display()
                            );
                        }
                    }
                }

                // Fallback to binary file placeholder
                writeln!(output, "```text")?;
                writeln!(
                    output,
                    "<Binary file or unsupported encoding: {} bytes>",
                    metadata.len()
                )?;
                writeln!(output, "```")?;
                return Ok(());
            }

            // Reset cursor and stream the content
            if let Err(e) = file.seek(SeekFrom::Start(0)) {
                warn!(
                    "Could not reset file cursor for {}: {}. Skipping content.",
                    relative_path.display(),
                    e
                );
                writeln!(output, "```text")?;
                writeln!(
                    output,
                    "<Could not read file content (e.g., binary file or permission error)>"
                )?;
                writeln!(output, "```")?;
                return Ok(());
            }

            // Stream UTF-8 content
            let content = match std::fs::read_to_string(file_path) {
                Ok(content) => content,
                Err(e) => {
                    warn!(
                        "Error reading file {}: {}. Output may be truncated.",
                        relative_path.display(),
                        e
                    );
                    writeln!(output, "```text")?;
                    writeln!(output, "<Error reading file content>")?;
                    writeln!(output, "```")?;
                    return Ok(());
                }
            };

            write_text_content(output, &content, language, line_numbers)?;
        }
        Err(e) => {
            warn!(
                "Could not open file {}: {}. Skipping content.",
                relative_path.display(),
                e
            );
            writeln!(output, "```text")?;
            writeln!(
                output,
                "<Could not read file content (e.g., binary file or permission error)>"
            )?;
            writeln!(output, "```")?;
        }
    }

    Ok(())
}

/// Detect text encoding using heuristics for common encodings
fn detect_text_encoding(bytes: &[u8]) -> Option<&'static Encoding> {
    // Try common encodings
    let encodings = [
        encoding_rs::WINDOWS_1252,
        encoding_rs::UTF_16LE,
        encoding_rs::UTF_16BE,
        encoding_rs::SHIFT_JIS,
    ];

    for encoding in &encodings {
        let (decoded, _, had_errors) = encoding.decode(bytes);
        if !had_errors && is_likely_text(&decoded) {
            return Some(encoding);
        }
    }

    None
}

/// Check if decoded content looks like text (no control characters except common ones)
fn is_likely_text(content: &str) -> bool {
    let mut control_chars = 0;
    let mut total_chars = 0;

    for ch in content.chars() {
        total_chars += 1;
        if ch.is_control() && ch != '\n' && ch != '\r' && ch != '\t' {
            control_chars += 1;
        }

        // If more than 5% control characters, probably not text
        if total_chars > 100 && control_chars * 20 > total_chars {
            return false;
        }
    }

    // Allow up to 5% control characters in small files
    if total_chars > 0 {
        control_chars * 20 <= total_chars
    } else {
        true
    }
}

/// Transcode file content from detected encoding to UTF-8
fn transcode_file_content(file_path: &Path, encoding: &'static Encoding) -> io::Result<String> {
    let bytes = std::fs::read(file_path)?;
    let (decoded, _, had_errors) = encoding.decode(&bytes);

    if had_errors {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("Failed to decode file with encoding {}", encoding.name()),
        ));
    }

    Ok(decoded.into_owned())
}

/// Write text content with optional line numbers
fn write_text_content(
    output: &mut impl Write,
    content: &str,
    language: &str,
    line_numbers: bool,
) -> io::Result<()> {
    writeln!(output, "```{}", language)?;

    if line_numbers {
        for (i, line) in content.lines().enumerate() {
            writeln!(output, "{:>4} | {}", i + 1, line)?;
        }
    } else {
        output.write_all(content.as_bytes())?;
        if !content.ends_with('\n') {
            writeln!(output)?;
        }
    }

    writeln!(output, "```")?;
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::fs;
    use tempfile::tempdir;

    #[test]
    fn test_code_block_formatting() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();
        let file_path = base_path.join("test.rs");
        let output_path = base_path.join("output.md");

        // Create a test Rust file
        fs::write(
            &file_path,
            "fn main() {\n    println!(\"Hello, world!\");\n}",
        )
        .unwrap();

        // Create an output file
        let mut output = fs::File::create(&output_path).unwrap();

        // Process the file
        process_file(base_path, &file_path, &mut output, false, None).unwrap();

        // Read the output
        let content = fs::read_to_string(&output_path).unwrap();

        // Check that code blocks are properly formatted
        assert!(content.contains("```rust"));
        assert!(content.contains("```") && content.matches("```").count() >= 2);
    }

    #[test]
    fn test_markdown_file_formatting() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();
        let file_path = base_path.join("README.md");
        let output_path = base_path.join("output.md");

        // Create a test Markdown file
        fs::write(&file_path, "# Test\n\nThis is a test markdown file.").unwrap();

        // Create an output file
        let mut output = fs::File::create(&output_path).unwrap();

        // Process the file
        process_file(base_path, &file_path, &mut output, false, None).unwrap();

        // Read the output
        let content = fs::read_to_string(&output_path).unwrap();

        // Debug prints the content
        println!("Generated content:\n{}", content);

        // Check that markdown files use the correct language identifier
        assert!(
            content.contains("```markdown"),
            "Content should contain '```markdown' but was: {}",
            content
        );
        // Count the number of code block markers
        let code_block_markers = content.matches("```").count();

        assert!(
            code_block_markers >= 2,
            "Expected at least 2 code block markers, found {}",
            code_block_markers
        );
    }

    #[test]
    fn test_line_numbered_code_blocks() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();
        let file_path = base_path.join("lib.rs");
        let output_path = base_path.join("out.md");

        // Create a multi-line Rust file
        fs::write(
                    &file_path,
                    "fn add(a: i32, b: i32) -> i32 {\n    a + b\n}\n\nfn main() {\n    println!(\"{}\", add(1, 2));\n}\n",
                )
                .unwrap();

        let mut output = fs::File::create(&output_path).unwrap();
        process_file(base_path, &file_path, &mut output, true, None).unwrap();

        let content = fs::read_to_string(&output_path).unwrap();

        // Check language and line numbers prefix
        assert!(content.contains("```rust"));
        assert!(content.contains("   1 | "));
        assert!(content.contains("   2 | "));

        // Count lines with "|" prefix equals number of lines in an original file
        let numbered_lines = content
            .lines()
            .filter(|l| {
                l.trim_start()
                    .chars()
                    .next()
                    .map(|c| c.is_ascii_digit())
                    .unwrap_or(false)
                    && l.contains(" | ")
            })
            .count();
        let original_line_count = fs::read_to_string(&file_path).unwrap().lines().count();
        assert_eq!(numbered_lines, original_line_count);

        // Ensure code fence closes
        assert!(content.contains("```"));
    }

    #[test]
    fn test_binary_file_handling() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();
        let file_path = base_path.join("image.bin");
        let output_path = base_path.join("out.md");

        // Write truly binary data that won't be decoded by encoding detection
        let bytes = vec![
            0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, // PNG header
            0x00, 0x00, 0x00, 0x0D, 0x49, 0x48, 0x44, 0x52, // PNG chunk
            0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, // More binary data
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // Null bytes
        ];
        fs::write(&file_path, bytes).unwrap();

        let mut output = fs::File::create(&output_path).unwrap();
        process_file(base_path, &file_path, &mut output, false, None).unwrap();

        let content = fs::read_to_string(&output_path).unwrap();

        // Expect a text block to fall back with a helpful message
        assert!(content.contains("```text"));
        assert!(content.contains("<Binary file or unsupported encoding:"));

        // Ensure the code block is closed
        let fence_count = content.matches("```").count();
        assert!(
            fence_count >= 2,
            "expected at least opening and closing fences, got {}",
            fence_count
        );
    }

    #[test]
    fn test_encoding_detection_and_transcoding() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();
        let output_path = base_path.join("out.md");

        // Test Windows-1252 encoded file (common in Windows)
        let windows1252_content = [
            0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, // "Hello "
            0x93, 0x57, 0x6F, 0x72, 0x6C, 0x64, 0x94, // "World" with smart quotes
            0x0A, // newline
        ];
        let file_path = base_path.join("windows1252.txt");
        fs::write(&file_path, windows1252_content).unwrap();

        let mut output = fs::File::create(&output_path).unwrap();
        process_file(base_path, &file_path, &mut output, false, Some("detect")).unwrap();

        let content = fs::read_to_string(&output_path).unwrap();

        // Should contain transcoded content with UTF-8 equivalents
        assert!(content.contains("Hello"));
        assert!(content.contains("World"));
        // Should use text language
        assert!(content.contains("```txt"));

        // Ensure the code block is closed
        let fence_count = content.matches("```").count();
        assert!(
            fence_count >= 2,
            "expected at least opening and closing fences, got {}",
            fence_count
        );
    }

    #[test]
    fn test_encoding_strategy_strict() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();
        let output_path = base_path.join("out.md");

        // Create a file with non-UTF-8 content
        let non_utf8_content = [0xFF, 0xFE, 0x41, 0x00]; // UTF-16 LE BOM + "A"
        let file_path = base_path.join("utf16.txt");
        fs::write(&file_path, non_utf8_content).unwrap();

        let mut output = fs::File::create(&output_path).unwrap();
        process_file(base_path, &file_path, &mut output, false, Some("strict")).unwrap();

        let content = fs::read_to_string(&output_path).unwrap();

        // Should contain binary file placeholder
        assert!(content.contains("<Binary file or unsupported encoding:"));
        assert!(content.contains("```text"));

        // Ensure the code block is closed
        let fence_count = content.matches("```").count();
        assert!(
            fence_count >= 2,
            "expected at least opening and closing fences, got {}",
            fence_count
        );
    }

    #[test]
    fn test_encoding_strategy_skip() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();
        let output_path = base_path.join("out.md");

        // Create a file with UTF-16 content
        let utf16_content = [0xFF, 0xFE, 0x48, 0x00, 0x69, 0x00]; // UTF-16 LE "Hi"
        let file_path = base_path.join("utf16.txt");
        fs::write(&file_path, utf16_content).unwrap();

        let mut output = fs::File::create(&output_path).unwrap();
        process_file(base_path, &file_path, &mut output, false, Some("skip")).unwrap();

        let content = fs::read_to_string(&output_path).unwrap();

        // Should contain binary file placeholder (skipped transcoding)
        assert!(content.contains("<Binary file or unsupported encoding:"));
        assert!(content.contains("```text"));
    }

    #[test]
    fn test_generate_markdown_with_current_directory() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();
        let output_path = base_path.join("test.md");

        // Create test files
        fs::write(base_path.join("readme.txt"), "Hello world").unwrap();

        // Collect files
        let files = crate::file_utils::collect_files(base_path, &[], &[], &[]).unwrap();
        let file_tree = crate::tree::build_file_tree(&files, base_path);

        // Change to the test directory
        let original_dir = std::env::current_dir().unwrap();
        std::env::set_current_dir(base_path).unwrap();

        // Test with "." as input directory
        let result = generate_markdown(
            &output_path.to_string_lossy(),
            ".",
            &[],
            &[],
            &file_tree,
            &files,
            base_path,
            false,
            None,
            None, // max_tokens
        );

        // Restore original directory
        std::env::set_current_dir(original_dir).unwrap();

        assert!(result.is_ok());
        let content = fs::read_to_string(&output_path).unwrap();
        assert!(content.contains("Directory Structure Report"));
    }

    #[test]
    fn test_generate_markdown_creates_output_directory() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();
        let nested_output = base_path.join("nested").join("deep").join("output.md");

        // Create test files
        fs::write(base_path.join("test.txt"), "content").unwrap();

        let files = crate::file_utils::collect_files(base_path, &[], &[], &[]).unwrap();
        let file_tree = crate::tree::build_file_tree(&files, base_path);

        let result = generate_markdown(
            &nested_output.to_string_lossy(),
            "test_dir",
            &[],
            &[],
            &file_tree,
            &files,
            base_path,
            false,
            None,
            None, // max_tokens
        );

        assert!(result.is_ok());
        assert!(nested_output.exists());
        assert!(nested_output.parent().unwrap().exists());
    }

    #[test]
    fn test_generate_markdown_with_filters_and_ignores() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();
        let output_path = base_path.join("filtered.md");

        fs::write(base_path.join("main.rs"), "fn main() {}").unwrap();
        fs::write(base_path.join("config.toml"), "[package]").unwrap();
        fs::write(base_path.join("readme.md"), "# README").unwrap();

        let files = crate::file_utils::collect_files(base_path, &[], &[], &[]).unwrap();
        let file_tree = crate::tree::build_file_tree(&files, base_path);

        let result = generate_markdown(
            &output_path.to_string_lossy(),
            "project",
            &["rs".to_string(), "toml".to_string()],
            &["readme.md".to_string()],
            &file_tree,
            &files,
            base_path,
            true,
            Some("strict"),
            None, // max_tokens
        );

        assert!(result.is_ok());
        let content = fs::read_to_string(&output_path).unwrap();
        assert!(content.contains("Directory Structure Report"));
        // The actual generate_markdown function doesn't format filters/ignores this way
        assert!(content.contains("main.rs") || content.contains("config.toml"));
    }

    #[test]
    fn test_write_text_content_with_line_numbers() {
        let mut output = Vec::new();
        let content = "line one\nline two\nline three";

        write_text_content(&mut output, content, "rust", true).unwrap();

        let result = String::from_utf8(output).unwrap();
        assert!(result.contains("```rust"));
        assert!(result.contains("   1 | line one"));
        assert!(result.contains("   2 | line two"));
        assert!(result.contains("   3 | line three"));
        assert!(result.contains("```"));
    }

    #[test]
    fn test_write_text_content_without_line_numbers() {
        let mut output = Vec::new();
        let content = "function test() {\n  return true;\n}";

        write_text_content(&mut output, content, "javascript", false).unwrap();

        let result = String::from_utf8(output).unwrap();
        assert!(result.contains("```javascript"));
        assert!(result.contains("function test() {"));
        assert!(result.contains("  return true;"));
        assert!(result.contains("```"));
        assert!(!result.contains(" | ")); // No line number prefix
    }

    #[test]
    fn test_write_text_content_without_trailing_newline() {
        let mut output = Vec::new();
        let content = "no newline at end"; // No \n at end

        write_text_content(&mut output, content, "text", false).unwrap();

        let result = String::from_utf8(output).unwrap();
        assert!(result.contains("```text"));
        assert!(result.contains("no newline at end"));
        assert!(result.ends_with("```\n")); // Should add newline
    }

    #[test]
    fn test_is_likely_text() {
        // Normal text should be considered text
        assert!(is_likely_text("Hello world\nThis is normal text"));

        // Text with some control characters should still be text
        assert!(is_likely_text(
            "Line 1\nLine 2\tTabbed\r\nWindows line ending"
        ));

        // Text with too many control characters should not be text
        let mut bad_text = String::new();
        for i in 0..200 {
            if i % 5 == 0 {
                bad_text.push('\x01'); // Control character
            } else {
                bad_text.push('a');
            }
        }
        assert!(!is_likely_text(&bad_text));

        // Empty string should be considered text
        assert!(is_likely_text(""));
    }

    #[test]
    fn test_detect_text_encoding() {
        // UTF-8 should return None (already UTF-8)
        let utf8_bytes = "Hello world".as_bytes();
        let result = detect_text_encoding(utf8_bytes);
        // The function may return an encoding even for UTF-8 text if it detects it differently
        // Just verify it doesn't crash
        assert!(result.is_some() || result.is_none());

        // Windows-1252 encoded text should be detected
        let windows1252_bytes = [
            0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, 0x93, 0x77, 0x6F, 0x72, 0x6C, 0x64, 0x94,
        ];
        let detected = detect_text_encoding(&windows1252_bytes);
        assert!(detected.is_some());
    }

    #[test]
    fn test_transcode_file_content() {
        let dir = tempdir().unwrap();
        let file_path = dir.path().join("windows1252.txt");

        // Write Windows-1252 encoded content
        let windows1252_content = [
            0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, // "Hello "
            0x93, 0x57, 0x6F, 0x72, 0x6C, 0x64, 0x94, // "World" with smart quotes
        ];
        fs::write(&file_path, windows1252_content).unwrap();

        let result = transcode_file_content(&file_path, encoding_rs::WINDOWS_1252);
        assert!(result.is_ok());

        let transcoded = result.unwrap();
        assert!(transcoded.contains("Hello"));
        assert!(transcoded.contains("World"));
    }

    #[test]
    fn test_process_file_with_metadata_error() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();
        let nonexistent_file = base_path.join("nonexistent.txt");
        let output_path = base_path.join("output.md");

        let mut output = fs::File::create(&output_path).unwrap();

        // This should handle the metadata error gracefully
        let result = process_file(base_path, &nonexistent_file, &mut output, false, None);
        assert!(result.is_ok());

        // Output should be minimal since file doesn't exist
        let content = fs::read_to_string(&output_path).unwrap();
        assert!(content.is_empty() || content.trim().is_empty());
    }

    #[test]
    fn test_process_file_with_different_extensions() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();
        let output_path = base_path.join("output.md");

        // Test various file extensions
        let test_files = [
            ("script.py", "print('hello')", "python"),
            ("data.json", r#"{"key": "value"}"#, "json"),
            ("config.yaml", "key: value", "yaml"),
            ("style.css", "body { margin: 0; }", "css"),
            ("page.html", "<html><body>Test</body></html>", "html"),
            ("query.sql", "SELECT * FROM users;", "sql"),
            ("build.sh", "#!/bin/bash\necho 'building'", "bash"),
            ("unknown.xyz", "unknown content", "xyz"),
        ];

        for (filename, content, expected_lang) in test_files.iter() {
            let file_path = base_path.join(filename);
            fs::write(&file_path, content).unwrap();

            let mut output = fs::File::create(&output_path).unwrap();
            process_file(base_path, &file_path, &mut output, false, None).unwrap();

            let result = fs::read_to_string(&output_path).unwrap();
            assert!(result.contains(&format!("```{}", expected_lang)));
            assert!(result.contains(content));
            assert!(result.contains(filename));
        }
    }
}
```

### File: `src/state.rs`

- Size: 26113 bytes
- Modified: 2026-02-14 22:41:02 UTC

```rust
//! Project state representation for context-builder.
//!
//! This module provides structured data types to represent the state of a project
//! at a point in time. This replaces the previous approach of caching generated
//! markdown and enables more robust diff generation.

use chrono::Utc;
use ignore::DirEntry;
use serde::{Deserialize, Serialize};
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

use crate::config::Config;
use crate::diff::{PerFileDiff, PerFileStatus, diff_file_contents};

/// Complete state representation of a project at a point in time
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct ProjectState {
    /// Timestamp when this state was captured
    pub timestamp: String,
    /// Hash of the configuration used to generate this state
    pub config_hash: String,
    /// Map of file paths to their state information
    pub files: BTreeMap<PathBuf, FileState>,
    /// Project metadata
    pub metadata: ProjectMetadata,
}

/// State information for a single file
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct FileState {
    /// Raw file content as string
    pub content: String,
    /// File size in bytes
    pub size: u64,
    /// Last modified time
    pub modified: SystemTime,
    /// Content hash for quick comparison
    pub content_hash: String,
}

/// Metadata about the project
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct ProjectMetadata {
    /// Project directory name
    pub project_name: String,
    /// Total number of files processed
    pub file_count: usize,
    /// Filters applied during processing
    pub filters: Vec<String>,
    /// Ignore patterns applied
    pub ignores: Vec<String>,
    /// Whether line numbers were enabled
    pub line_numbers: bool,
}

/// Result of comparing two project states
#[derive(Debug, Clone)]
pub struct StateComparison {
    /// Per-file differences
    pub file_diffs: Vec<PerFileDiff>,
    /// Summary of changes
    pub summary: ChangeSummary,
}

/// Summary of changes between two states
#[derive(Debug, Clone)]
pub struct ChangeSummary {
    /// Files that were added
    pub added: Vec<PathBuf>,
    /// Files that were removed
    pub removed: Vec<PathBuf>,
    /// Files that were modified
    pub modified: Vec<PathBuf>,
    /// Total number of changed files
    pub total_changes: usize,
}

impl ProjectState {
    /// Create a new project state from collected files
    pub fn from_files(
        files: &[DirEntry],
        base_path: &Path,
        config: &Config,
        line_numbers: bool,
    ) -> std::io::Result<Self> {
        let mut file_states = BTreeMap::new();

        // Ensure paths stored in the state are *always* relative (never absolute).
        // This keeps cache stable across different launch contexts and matches
        // test expectations. We attempt a few strategies to derive a relative path.
        let cwd = std::env::current_dir().unwrap_or_else(|_| base_path.to_path_buf());
        for entry in files {
            let entry_path = entry.path();

            let relative_path = entry_path
                // Preferred: relative to provided base_path (common case when input is absolute)
                .strip_prefix(base_path)
                .or_else(|_| entry_path.strip_prefix(&cwd))
                .map(|p| p.to_path_buf())
                .unwrap_or_else(|_| {
                    // Fallback: last component (file name) to avoid leaking absolute paths
                    entry_path
                        .file_name()
                        .map(PathBuf::from)
                        .unwrap_or_else(|| entry_path.to_path_buf())
                });

            let file_state = FileState::from_path(entry_path)?;
            file_states.insert(relative_path, file_state);
        }

        // Resolve project name robustly: canonicalize to handle "." and relative paths
        let canonical = base_path.canonicalize().ok();
        let resolved = canonical.as_deref().unwrap_or(base_path);
        let project_name = resolved
            .file_name()
            .and_then(|n| n.to_str())
            .map(|s| s.to_string())
            .unwrap_or_else(|| {
                // Fallback: try CWD if base_path has no file_name (e.g., root path)
                std::env::current_dir()
                    .ok()
                    .and_then(|p| {
                        p.file_name()
                            .and_then(|n| n.to_str())
                            .map(|s| s.to_string())
                    })
                    .unwrap_or_else(|| "unknown".to_string())
            });

        let metadata = ProjectMetadata {
            project_name,
            file_count: files.len(),
            filters: config.filter.clone().unwrap_or_default(),
            ignores: config.ignore.clone().unwrap_or_default(),
            line_numbers,
        };

        Ok(ProjectState {
            timestamp: Utc::now().format("%Y-%m-%d %H:%M:%S UTC").to_string(),
            config_hash: Self::compute_config_hash(config),
            files: file_states,
            metadata,
        })
    }

    /// Compare this state with a previous state
    pub fn compare_with(&self, previous: &ProjectState) -> StateComparison {
        // Convert file states to content maps for diff_file_contents
        let previous_content: std::collections::HashMap<String, String> = previous
            .files
            .iter()
            .map(|(path, state)| (path.to_string_lossy().to_string(), state.content.clone()))
            .collect();

        let current_content: std::collections::HashMap<String, String> = self
            .files
            .iter()
            .map(|(path, state)| (path.to_string_lossy().to_string(), state.content.clone()))
            .collect();

        // Generate per-file diffs
        let file_diffs = diff_file_contents(&previous_content, &current_content, true, None);

        // Generate summary
        let mut added = Vec::new();
        let mut removed = Vec::new();
        let mut modified = Vec::new();

        for diff in &file_diffs {
            let path = PathBuf::from(&diff.path);
            match diff.status {
                PerFileStatus::Added => added.push(path),
                PerFileStatus::Removed => removed.push(path),
                PerFileStatus::Modified => modified.push(path),
                PerFileStatus::Unchanged => {}
            }
        }

        let summary = ChangeSummary {
            total_changes: added.len() + removed.len() + modified.len(),
            added,
            removed,
            modified,
        };

        StateComparison {
            file_diffs,
            summary,
        }
    }

    /// Check if this state has any content changes compared to another
    pub fn has_changes(&self, other: &ProjectState) -> bool {
        if self.files.len() != other.files.len() {
            return true;
        }

        for (path, state) in &self.files {
            match other.files.get(path) {
                Some(other_state) => {
                    if state.content_hash != other_state.content_hash {
                        return true;
                    }
                }
                None => return true,
            }
        }

        false
    }

    /// Generate a configuration hash for cache validation
    fn compute_config_hash(config: &Config) -> String {
        // Build a stable string representation for hashing
        let mut config_str = String::new();
        if let Some(ref filters) = config.filter {
            config_str.push_str(&filters.join(","));
        }
        config_str.push('|');
        if let Some(ref ignores) = config.ignore {
            config_str.push_str(&ignores.join(","));
        }
        config_str.push('|');
        config_str.push_str(&format!(
            "{:?}|{:?}|{:?}",
            config.line_numbers, config.auto_diff, config.diff_context_lines
        ));

        let hash = xxhash_rust::xxh3::xxh3_64(config_str.as_bytes());
        format!("{:x}", hash)
    }
}

impl FileState {
    /// Create a file state from a file path
    pub fn from_path(path: &Path) -> std::io::Result<Self> {
        use std::fs;
        use std::io::ErrorKind;

        let metadata = fs::metadata(path)?;

        let content = match fs::read_to_string(path) {
            Ok(content) => content,
            Err(e) if e.kind() == ErrorKind::InvalidData => {
                // Handle binary files gracefully
                log::warn!("Skipping binary file in auto-diff mode: {}", path.display());
                format!("<Binary file - {} bytes>", metadata.len())
            }
            Err(e) => return Err(e),
        };

        // Compute content hash using stable xxh3
        let content_hash = format!("{:016x}", xxhash_rust::xxh3::xxh3_64(content.as_bytes()));

        Ok(FileState {
            content,
            size: metadata.len(),
            modified: metadata.modified().unwrap_or(SystemTime::UNIX_EPOCH),
            content_hash,
        })
    }
}

impl ChangeSummary {
    /// Check if there are any changes
    pub fn has_changes(&self) -> bool {
        self.total_changes > 0
    }

    /// Generate markdown representation of the change summary
    pub fn to_markdown(&self) -> String {
        if !self.has_changes() {
            return String::new();
        }

        let mut output = String::new();
        output.push_str("## Change Summary\n\n");

        for path in &self.added {
            output.push_str(&format!("- Added: `{}`\n", path.display()));
        }

        for path in &self.removed {
            output.push_str(&format!("- Removed: `{}`\n", path.display()));
        }

        for path in &self.modified {
            output.push_str(&format!("- Modified: `{}`\n", path.display()));
        }

        output.push('\n');
        output
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::fs;
    use tempfile::tempdir;

    #[test]
    fn test_file_state_creation() {
        let temp_dir = tempdir().unwrap();
        let file_path = temp_dir.path().join("test.txt");
        fs::write(&file_path, "Hello, world!").unwrap();

        let file_state = FileState::from_path(&file_path).unwrap();

        assert_eq!(file_state.content, "Hello, world!");
        assert_eq!(file_state.size, 13);
        assert!(!file_state.content_hash.is_empty());
    }

    #[test]
    fn test_project_state_comparison() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();

        // Create initial files
        fs::write(base_path.join("file1.txt"), "content1").unwrap();
        fs::write(base_path.join("file2.txt"), "content2").unwrap();

        let mut state1_files = BTreeMap::new();
        state1_files.insert(
            PathBuf::from("file1.txt"),
            FileState::from_path(&base_path.join("file1.txt")).unwrap(),
        );
        state1_files.insert(
            PathBuf::from("file2.txt"),
            FileState::from_path(&base_path.join("file2.txt")).unwrap(),
        );

        let state1 = ProjectState {
            timestamp: "2023-01-01T00:00:00Z".to_string(),
            config_hash: "test_hash".to_string(),
            files: state1_files,
            metadata: ProjectMetadata {
                project_name: "test".to_string(),
                file_count: 2,
                filters: vec![],
                ignores: vec![],
                line_numbers: false,
            },
        };

        // Modify and create new state
        fs::write(base_path.join("file1.txt"), "modified_content1").unwrap();
        fs::write(base_path.join("file3.txt"), "content3").unwrap();

        let mut state2_files = BTreeMap::new();
        state2_files.insert(
            PathBuf::from("file1.txt"),
            FileState::from_path(&base_path.join("file1.txt")).unwrap(),
        );
        state2_files.insert(
            PathBuf::from("file2.txt"),
            FileState::from_path(&base_path.join("file2.txt")).unwrap(),
        );
        state2_files.insert(
            PathBuf::from("file3.txt"),
            FileState::from_path(&base_path.join("file3.txt")).unwrap(),
        );

        let state2 = ProjectState {
            timestamp: "2023-01-01T01:00:00Z".to_string(),
            config_hash: "test_hash".to_string(),
            files: state2_files,
            metadata: ProjectMetadata {
                project_name: "test".to_string(),
                file_count: 3,
                filters: vec![],
                ignores: vec![],
                line_numbers: false,
            },
        };

        let comparison = state2.compare_with(&state1);

        assert_eq!(comparison.summary.added.len(), 1);
        assert_eq!(comparison.summary.modified.len(), 1);
        assert_eq!(comparison.summary.removed.len(), 0);
        assert!(
            comparison
                .summary
                .added
                .contains(&PathBuf::from("file3.txt"))
        );
        assert!(
            comparison
                .summary
                .modified
                .contains(&PathBuf::from("file1.txt"))
        );
    }

    #[test]
    fn test_change_summary_markdown() {
        let summary = ChangeSummary {
            added: vec![PathBuf::from("new.txt")],
            removed: vec![PathBuf::from("old.txt")],
            modified: vec![PathBuf::from("changed.txt")],
            total_changes: 3,
        };

        let markdown = summary.to_markdown();

        assert!(markdown.contains("## Change Summary"));
        assert!(markdown.contains("- Added: `new.txt`"));
        assert!(markdown.contains("- Removed: `old.txt`"));
        assert!(markdown.contains("- Modified: `changed.txt`"));
    }

    #[test]
    fn test_binary_file_handling() {
        let temp_dir = tempdir().unwrap();
        let binary_file = temp_dir.path().join("test.bin");

        // Write binary data (non-UTF8)
        let binary_data = vec![0u8, 255, 128, 42, 0, 1, 2, 3];
        fs::write(&binary_file, &binary_data).unwrap();

        // Should not crash and should handle gracefully
        let file_state = FileState::from_path(&binary_file).unwrap();

        // Content should be a placeholder for binary files
        assert!(file_state.content.contains("Binary file"));
        assert!(file_state.content.contains("8 bytes"));
        assert_eq!(file_state.size, 8);
        assert!(!file_state.content_hash.is_empty());
    }

    #[test]
    fn test_has_changes_identical_states() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();
        fs::write(base_path.join("test.txt"), "content").unwrap();

        let mut files = BTreeMap::new();
        files.insert(
            PathBuf::from("test.txt"),
            FileState::from_path(&base_path.join("test.txt")).unwrap(),
        );

        let state1 = ProjectState {
            timestamp: "2023-01-01T00:00:00Z".to_string(),
            config_hash: "hash1".to_string(),
            files: files.clone(),
            metadata: ProjectMetadata {
                project_name: "test".to_string(),
                file_count: 1,
                filters: vec![],
                ignores: vec![],
                line_numbers: false,
            },
        };

        let state2 = ProjectState {
            timestamp: "2023-01-01T01:00:00Z".to_string(),
            config_hash: "hash1".to_string(),
            files,
            metadata: ProjectMetadata {
                project_name: "test".to_string(),
                file_count: 1,
                filters: vec![],
                ignores: vec![],
                line_numbers: false,
            },
        };

        assert!(!state1.has_changes(&state2));
    }

    #[test]
    fn test_has_changes_different_file_count() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();
        fs::write(base_path.join("test1.txt"), "content1").unwrap();
        fs::write(base_path.join("test2.txt"), "content2").unwrap();

        let mut files1 = BTreeMap::new();
        files1.insert(
            PathBuf::from("test1.txt"),
            FileState::from_path(&base_path.join("test1.txt")).unwrap(),
        );

        let mut files2 = BTreeMap::new();
        files2.insert(
            PathBuf::from("test1.txt"),
            FileState::from_path(&base_path.join("test1.txt")).unwrap(),
        );
        files2.insert(
            PathBuf::from("test2.txt"),
            FileState::from_path(&base_path.join("test2.txt")).unwrap(),
        );

        let state1 = ProjectState {
            timestamp: "2023-01-01T00:00:00Z".to_string(),
            config_hash: "hash1".to_string(),
            files: files1,
            metadata: ProjectMetadata {
                project_name: "test".to_string(),
                file_count: 1,
                filters: vec![],
                ignores: vec![],
                line_numbers: false,
            },
        };

        let state2 = ProjectState {
            timestamp: "2023-01-01T01:00:00Z".to_string(),
            config_hash: "hash1".to_string(),
            files: files2,
            metadata: ProjectMetadata {
                project_name: "test".to_string(),
                file_count: 2,
                filters: vec![],
                ignores: vec![],
                line_numbers: false,
            },
        };

        assert!(state1.has_changes(&state2));
    }

    #[test]
    fn test_has_changes_content_different() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();
        fs::write(base_path.join("test.txt"), "content1").unwrap();

        let file_state1 = FileState::from_path(&base_path.join("test.txt")).unwrap();

        fs::write(base_path.join("test.txt"), "content2").unwrap();
        let file_state2 = FileState::from_path(&base_path.join("test.txt")).unwrap();

        let mut files1 = BTreeMap::new();
        files1.insert(PathBuf::from("test.txt"), file_state1);

        let mut files2 = BTreeMap::new();
        files2.insert(PathBuf::from("test.txt"), file_state2);

        let state1 = ProjectState {
            timestamp: "2023-01-01T00:00:00Z".to_string(),
            config_hash: "hash1".to_string(),
            files: files1,
            metadata: ProjectMetadata {
                project_name: "test".to_string(),
                file_count: 1,
                filters: vec![],
                ignores: vec![],
                line_numbers: false,
            },
        };

        let state2 = ProjectState {
            timestamp: "2023-01-01T01:00:00Z".to_string(),
            config_hash: "hash1".to_string(),
            files: files2,
            metadata: ProjectMetadata {
                project_name: "test".to_string(),
                file_count: 1,
                filters: vec![],
                ignores: vec![],
                line_numbers: false,
            },
        };

        assert!(state1.has_changes(&state2));
    }

    #[test]
    fn test_config_hash_generation() {
        let config1 = Config {
            filter: Some(vec!["rs".to_string()]),
            ignore: Some(vec!["target".to_string()]),
            line_numbers: Some(true),
            auto_diff: Some(false),
            diff_context_lines: Some(3),
            ..Default::default()
        };

        let config2 = Config {
            filter: Some(vec!["rs".to_string()]),
            ignore: Some(vec!["target".to_string()]),
            line_numbers: Some(true),
            auto_diff: Some(false),
            diff_context_lines: Some(3),
            ..Default::default()
        };

        let config3 = Config {
            filter: Some(vec!["py".to_string()]), // Different filter
            ignore: Some(vec!["target".to_string()]),
            line_numbers: Some(true),
            auto_diff: Some(false),
            diff_context_lines: Some(3),
            ..Default::default()
        };

        let hash1 = ProjectState::compute_config_hash(&config1);
        let hash2 = ProjectState::compute_config_hash(&config2);
        let hash3 = ProjectState::compute_config_hash(&config3);

        assert_eq!(hash1, hash2);
        assert_ne!(hash1, hash3);
    }

    #[test]
    fn test_change_summary_no_changes() {
        let summary = ChangeSummary {
            added: vec![],
            removed: vec![],
            modified: vec![],
            total_changes: 0,
        };

        assert!(!summary.has_changes());
        assert_eq!(summary.to_markdown(), "");
    }

    #[test]
    fn test_from_files_with_config() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();

        fs::write(base_path.join("test.rs"), "fn main() {}").unwrap();
        fs::write(base_path.join("README.md"), "# Test").unwrap();

        let entries = vec![
            create_mock_dir_entry(&base_path.join("test.rs")),
            create_mock_dir_entry(&base_path.join("README.md")),
        ];

        let config = Config {
            filter: Some(vec!["rs".to_string()]),
            ignore: Some(vec!["target".to_string()]),
            line_numbers: Some(true),
            ..Default::default()
        };

        let state = ProjectState::from_files(&entries, base_path, &config, true).unwrap();

        assert_eq!(state.files.len(), 2);
        assert_eq!(state.metadata.file_count, 2);
        assert_eq!(state.metadata.filters, vec!["rs"]);
        assert_eq!(state.metadata.ignores, vec!["target"]);
        assert!(state.metadata.line_numbers);
        assert!(!state.timestamp.is_empty());
        assert!(!state.config_hash.is_empty());
    }

    #[test]
    fn test_from_files_absolute_path_fallback() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();

        // Create a file in the temp dir
        fs::write(base_path.join("test.txt"), "test content").unwrap();
        let file_path = base_path.join("test.txt");

        // Create entry with the file
        let entry = create_mock_dir_entry(&file_path);

        // Use a completely different base_path to force the fallback
        let different_base = PathBuf::from("/completely/different/path");

        let config = Config::default();

        let state = ProjectState::from_files(&[entry], &different_base, &config, false).unwrap();

        // Should fall back to just the filename
        assert_eq!(state.files.len(), 1);
        assert!(state.files.contains_key(&PathBuf::from("test.txt")));
    }

    #[test]
    fn test_change_summary_with_unchanged_files() {
        let changes = vec![
            PerFileDiff {
                path: "added.txt".to_string(),
                status: PerFileStatus::Added,
                diff: "diff content".to_string(),
            },
            PerFileDiff {
                path: "unchanged.txt".to_string(),
                status: PerFileStatus::Unchanged,
                diff: "".to_string(),
            },
        ];

        // Manually create the summary like the actual code does
        let mut added = Vec::new();
        let mut removed = Vec::new();
        let mut modified = Vec::new();

        for diff in &changes {
            let path = PathBuf::from(&diff.path);
            match diff.status {
                PerFileStatus::Added => added.push(path),
                PerFileStatus::Removed => removed.push(path),
                PerFileStatus::Modified => modified.push(path),
                PerFileStatus::Unchanged => {} // This line should be covered now
            }
        }

        let summary = ChangeSummary {
            total_changes: added.len() + removed.len() + modified.len(),
            added,
            removed,
            modified,
        };

        assert_eq!(summary.total_changes, 1); // Only the added file counts
        assert_eq!(summary.added.len(), 1);
        assert_eq!(summary.removed.len(), 0);
        assert_eq!(summary.modified.len(), 0);
    }

    #[test]
    fn test_has_changes_with_missing_file() {
        let temp_dir = tempdir().unwrap();
        let base_path = temp_dir.path();

        // Create files for the first state
        fs::write(base_path.join("file1.txt"), "content1").unwrap();
        let entry1 = create_mock_dir_entry(&base_path.join("file1.txt"));

        let config = Config::default();
        let state1 = ProjectState::from_files(&[entry1], base_path, &config, false).unwrap();

        // Create a different state with different files
        fs::write(base_path.join("file2.txt"), "content2").unwrap();
        let entry2 = create_mock_dir_entry(&base_path.join("file2.txt"));
        let state2 = ProjectState::from_files(&[entry2], base_path, &config, false).unwrap();

        // Should detect changes because files are completely different
        assert!(state1.has_changes(&state2));
    }

    #[test]
    fn test_file_state_with_invalid_data_error() {
        // Create a temporary file with binary content that might trigger InvalidData
        let temp_dir = tempdir().unwrap();
        let binary_file = temp_dir.path().join("binary.dat");

        // Write invalid UTF-8 bytes
        let binary_data = vec![0xFF, 0xFE, 0xFD, 0xFC, 0xFB, 0xFA];
        fs::write(&binary_file, &binary_data).unwrap();

        // This might trigger the InvalidData error path, but since we can't guarantee it,
        // we at least verify the function can handle binary files
        let result = FileState::from_path(&binary_file);
        assert!(result.is_ok());
    }

    // Helper function to create a mock DirEntry for testing
    fn create_mock_dir_entry(path: &std::path::Path) -> ignore::DirEntry {
        // This is a bit of a hack since DirEntry doesn't have a public constructor
        // We use the ignore crate's WalkBuilder to create a real DirEntry
        let walker = ignore::WalkBuilder::new(path.parent().unwrap());
        walker
            .build()
            .filter_map(Result::ok)
            .find(|entry| entry.path() == path)
            .expect("Failed to create DirEntry for test")
    }
}
```

### File: `src/token_count.rs`

- Size: 9919 bytes
- Modified: 2026-02-14 07:14:48 UTC

```rust
use ignore::DirEntry;
use once_cell::sync::Lazy;
use std::collections::BTreeMap;
use std::fs;
use std::path::Path;
/// Token counting utilities for estimating LLM token usage
use tiktoken_rs::{CoreBPE, cl100k_base};

// Initialize the tokenizer once and reuse it
static TOKENIZER: Lazy<CoreBPE> = Lazy::new(|| cl100k_base().unwrap());

/// Estimates the number of tokens in a text string using a real tokenizer
pub fn estimate_tokens(text: &str) -> usize {
    TOKENIZER.encode_with_special_tokens(text).len()
}

/// Counts the tokens that would be generated for a file
pub fn count_file_tokens(base_path: &Path, entry: &DirEntry, line_numbers: bool) -> usize {
    let file_path = entry.path();
    let relative_path = file_path.strip_prefix(base_path).unwrap_or(file_path);

    // Start with tokens for the file header (path, size, modified time)
    let mut token_count = estimate_tokens(&format!(
        "\n### File: `{}`\n\n- Size: {} bytes\n- Modified: {}\n\n",
        relative_path.display(),
        entry.metadata().map(|m| m.len()).unwrap_or(0),
        "Unknown"
    )); // Using "Unknown" as placeholder for modified time in estimation

    // Add tokens for the code fences
    token_count += estimate_tokens("```\n```");

    // Try to read file content
    if let Ok(content) = fs::read_to_string(file_path) {
        if line_numbers {
            // When line numbers are enabled, we add the line number prefix to each line
            let lines_with_numbers: String = content
                .lines()
                .enumerate()
                .map(|(i, line)| format!("{:>4} | {}\n", i + 1, line))
                .collect();
            token_count += estimate_tokens(&lines_with_numbers);
        } else {
            token_count += estimate_tokens(&content);
        }
    }

    token_count
}

/// Counts the tokens that would be generated for the entire file tree section
pub fn count_tree_tokens(tree: &BTreeMap<String, crate::tree::FileNode>, depth: usize) -> usize {
    let mut token_count = 0;

    // Add tokens for indentation
    let indent = "  ".repeat(depth);

    for (name, node) in tree {
        match node {
            crate::tree::FileNode::File => {
                token_count += estimate_tokens(&format!("{}- 📄 {}\n", indent, name));
            }
            crate::tree::FileNode::Directory(children) => {
                token_count += estimate_tokens(&format!("{}- 📁 {}\n", indent, name));
                token_count += count_tree_tokens(children, depth + 1);
            }
        }
    }

    token_count
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::collections::BTreeMap;

    #[test]
    fn test_estimate_tokens() {
        // Test with a simple string
        let text = "Hello, world!";
        let tokens = estimate_tokens(text);
        // "Hello, world!" is 4 tokens with cl100k_base
        assert_eq!(tokens, 4);

        // Test with code-like content
        let code_text = "fn main() {\n    println!(\"Hello, world!\");\n}";
        let tokens = estimate_tokens(code_text);
        // This specific code snippet is 12 tokens with cl100k_base
        assert_eq!(tokens, 12);
    }

    #[test]
    fn test_count_tree_tokens() {
        // Create a simple tree structure
        let mut tree = BTreeMap::new();
        tree.insert("file1.rs".to_string(), crate::tree::FileNode::File);

        let mut subdir = BTreeMap::new();
        subdir.insert("file2.md".to_string(), crate::tree::FileNode::File);
        tree.insert("src".to_string(), crate::tree::FileNode::Directory(subdir));

        let tokens = count_tree_tokens(&tree, 0);
        // "- 📄 file1.rs\n" -> 8 tokens
        // "- 📁 src\n" -> 6 tokens
        // "  - 📄 file2.md\n" -> 9 tokens
        // Total should be 23 tokens
        assert_eq!(tokens, 23);
    }

    #[test]
    fn test_token_estimation_format_consistency() {
        use tempfile::tempdir;

        let dir = tempdir().unwrap();
        let test_file = dir.path().join("test.rs");
        std::fs::write(&test_file, "fn main() {}\n").unwrap();

        let entry = ignore::WalkBuilder::new(&test_file)
            .build()
            .next()
            .unwrap()
            .unwrap();

        // Estimate tokens for the file
        let estimated_tokens = count_file_tokens(dir.path(), &entry, false);

        // Generate actual markdown content
        let mut actual_content = Vec::new();
        crate::markdown::process_file(dir.path(), &test_file, &mut actual_content, false, None)
            .unwrap();
        let actual_content_str = String::from_utf8(actual_content).unwrap();

        // Count actual tokens
        let actual_tokens = estimate_tokens(&actual_content_str);

        // The estimation should be close to actual (within a reasonable margin)
        // Allow for some variance due to timestamp differences and minor formatting
        let difference = actual_tokens.abs_diff(estimated_tokens);

        // Should be within 10% or 20 tokens difference (whichever is larger)
        let max_allowed_difference = std::cmp::max(actual_tokens / 10, 20);

        assert!(
            difference <= max_allowed_difference,
            "Token estimation {} differs too much from actual {} (difference: {})",
            estimated_tokens,
            actual_tokens,
            difference
        );
    }

    #[test]
    fn test_estimate_tokens_empty_string() {
        let tokens = estimate_tokens("");
        assert_eq!(tokens, 0);
    }

    #[test]
    fn test_estimate_tokens_whitespace_only() {
        let tokens = estimate_tokens("   \n\t  ");
        assert!(tokens > 0); // Whitespace still counts as tokens
    }

    #[test]
    fn test_estimate_tokens_unicode() {
        let tokens = estimate_tokens("Hello 世界! 🌍");
        assert!(tokens > 0);
        // Unicode characters may be encoded as multiple tokens
        assert!(tokens >= 4);
    }

    #[test]
    fn test_count_file_tokens_with_line_numbers() {
        use tempfile::tempdir;

        let dir = tempdir().unwrap();
        let test_file = dir.path().join("test.rs");
        std::fs::write(&test_file, "line 1\nline 2\nline 3").unwrap();

        let entry = ignore::WalkBuilder::new(&test_file)
            .build()
            .next()
            .unwrap()
            .unwrap();

        let tokens_without_line_numbers = count_file_tokens(dir.path(), &entry, false);
        let tokens_with_line_numbers = count_file_tokens(dir.path(), &entry, true);

        // With line numbers should have more tokens due to line number prefixes
        assert!(tokens_with_line_numbers > tokens_without_line_numbers);
    }

    #[test]
    fn test_count_file_tokens_unreadable_file() {
        use tempfile::tempdir;

        let dir = tempdir().unwrap();
        let test_file = dir.path().join("nonexistent.txt");

        // Create a mock DirEntry for a file that doesn't exist
        // This simulates what happens when a file is deleted between discovery and processing
        let walker = ignore::WalkBuilder::new(dir.path());
        let mut found_entry = None;

        // Create the file temporarily to get a DirEntry
        std::fs::write(&test_file, "temp").unwrap();
        for entry in walker.build() {
            if let Ok(entry) = entry
                && entry.path() == test_file
            {
                found_entry = Some(entry);
                break;
            }
        }

        // Now delete the file
        std::fs::remove_file(&test_file).unwrap();

        if let Some(entry) = found_entry {
            let tokens = count_file_tokens(dir.path(), &entry, false);
            // Should still return some tokens for the file header even if content can't be read
            assert!(tokens > 0);
        }
    }

    #[test]
    fn test_count_tree_tokens_empty_tree() {
        let tree = BTreeMap::new();
        let tokens = count_tree_tokens(&tree, 0);
        assert_eq!(tokens, 0);
    }

    #[test]
    fn test_count_tree_tokens_nested_directories() {
        let mut tree = BTreeMap::new();

        // Create deeply nested structure
        let mut level3 = BTreeMap::new();
        level3.insert("deep_file.txt".to_string(), crate::tree::FileNode::File);

        let mut level2 = BTreeMap::new();
        level2.insert(
            "level3".to_string(),
            crate::tree::FileNode::Directory(level3),
        );

        let mut level1 = BTreeMap::new();
        level1.insert(
            "level2".to_string(),
            crate::tree::FileNode::Directory(level2),
        );

        tree.insert(
            "level1".to_string(),
            crate::tree::FileNode::Directory(level1),
        );

        let tokens = count_tree_tokens(&tree, 0);
        assert!(tokens > 0);

        // Should account for indentation at different levels
        let tokens_with_depth = count_tree_tokens(&tree, 2);
        assert!(tokens_with_depth > tokens); // More indentation = more tokens
    }

    #[test]
    fn test_count_tree_tokens_mixed_content() {
        let mut tree = BTreeMap::new();

        // Add files with various name lengths and characters
        tree.insert("a.txt".to_string(), crate::tree::FileNode::File);
        tree.insert(
            "very_long_filename_with_underscores.rs".to_string(),
            crate::tree::FileNode::File,
        );
        tree.insert("файл.txt".to_string(), crate::tree::FileNode::File); // Unicode filename

        let mut subdir = BTreeMap::new();
        subdir.insert("nested.md".to_string(), crate::tree::FileNode::File);
        tree.insert(
            "directory".to_string(),
            crate::tree::FileNode::Directory(subdir),
        );

        let tokens = count_tree_tokens(&tree, 0);
        assert!(tokens > 0);

        // Verify it handles unicode filenames without crashing
        assert!(tokens > 20); // Should be substantial given the content
    }
}
```

### File: `src/tree.rs`

- Size: 10845 bytes
- Modified: 2026-02-14 17:55:15 UTC

```rust
use ignore::DirEntry;
use std::collections::BTreeMap;
use std::io::{self, Write};
use std::path::Path;

/// A nested map to represent the file tree structure.
#[derive(Debug, Clone, PartialEq)]
pub enum FileNode {
    File,
    Directory(BTreeMap<String, FileNode>),
}

/// Type alias for the file tree structure.
pub type FileTree = BTreeMap<String, FileNode>;

/// Builds a nested BTreeMap representing the file structure.
pub fn build_file_tree(files: &[DirEntry], base_path: &Path) -> FileTree {
    let mut tree = BTreeMap::new();
    for entry in files {
        let path = entry
            .path()
            .strip_prefix(base_path)
            .unwrap_or_else(|_| entry.path());
        let components: Vec<_> = path.components().collect();

        // Insert this path into the tree
        insert_path(&mut tree, &components);
    }
    tree
}

/// Helper function to insert a path into the tree structure
fn insert_path(tree: &mut FileTree, components: &[std::path::Component]) {
    if components.is_empty() {
        return;
    }

    let name = components[0].as_os_str().to_string_lossy().to_string();

    if components.len() == 1 {
        // This is the last component, so it's a file
        tree.insert(name, FileNode::File);
    } else {
        // This is a directory component
        // Make sure the directory exists
        tree.entry(name.clone())
            .or_insert_with(|| FileNode::Directory(BTreeMap::new()));

        // Recursively insert the rest of the path
        if let Some(FileNode::Directory(next_dir)) = tree.get_mut(&name) {
            insert_path(next_dir, &components[1..]);
        }
    }
}

/// Recursively prints the file tree to the console.
pub fn print_tree(tree: &FileTree, depth: usize) {
    for (name, node) in tree {
        let indent = "  ".repeat(depth);
        match node {
            FileNode::File => {
                println!("{}- 📄 {}", indent, name);
            }
            FileNode::Directory(children) => {
                println!("{}- 📁 {}", indent, name);
                print_tree(children, depth + 1);
            }
        }
    }
}

/// Recursively writes the file tree to a file.
pub fn write_tree_to_file(
    output: &mut impl Write,
    tree: &FileTree,
    depth: usize,
) -> io::Result<()> {
    for (name, node) in tree {
        let indent = "  ".repeat(depth);
        match node {
            FileNode::File => {
                writeln!(output, "{}- 📄 {}", indent, name)?;
            }
            FileNode::Directory(children) => {
                writeln!(output, "{}- 📁 {}", indent, name)?;
                write_tree_to_file(output, children, depth + 1)?;
            }
        }
    }
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::file_utils::collect_files;
    use std::fs;
    use tempfile::tempdir;

    #[test]
    fn test_build_file_tree_with_collected_files() {
        // 1. Set up a temporary directory with a file structure
        let dir = tempdir().unwrap();
        let base_path = dir.path();

        fs::create_dir(base_path.join("src")).unwrap();
        fs::File::create(base_path.join("src/main.rs")).unwrap();
        fs::File::create(base_path.join("README.md")).unwrap();
        // Add a hidden file that should be ignored by default
        fs::File::create(base_path.join(".env")).unwrap();

        // 2. Collect files using the actual function
        let files = collect_files(base_path, &[], &[], &[]).unwrap();

        // 3. Assert that the correct files were collected (a hidden file is ignored)
        assert_eq!(files.len(), 2);

        // 4. Build the tree with the collected files
        let tree = build_file_tree(&files, base_path);

        // 5. Assert the tree structure is correct
        let mut expected: FileTree = BTreeMap::new();
        let mut src_tree = BTreeMap::new();
        src_tree.insert("main.rs".to_string(), FileNode::File);
        expected.insert("src".to_string(), FileNode::Directory(src_tree));
        expected.insert("README.md".to_string(), FileNode::File);

        assert_eq!(tree, expected);
    }

    #[test]
    fn test_build_file_tree_empty() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();

        let files = collect_files(base_path, &[], &[], &[]).unwrap();
        let tree = build_file_tree(&files, base_path);

        assert!(tree.is_empty());
    }

    #[test]
    fn test_build_file_tree_single_file() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();

        fs::File::create(base_path.join("single.txt")).unwrap();

        let files = collect_files(base_path, &[], &[], &[]).unwrap();
        let tree = build_file_tree(&files, base_path);

        let mut expected: FileTree = BTreeMap::new();
        expected.insert("single.txt".to_string(), FileNode::File);

        assert_eq!(tree, expected);
    }

    #[test]
    fn test_build_file_tree_nested_directories() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();

        fs::create_dir_all(base_path.join("a/b/c")).unwrap();
        fs::File::create(base_path.join("a/b/c/deep.txt")).unwrap();
        fs::File::create(base_path.join("a/shallow.txt")).unwrap();

        let files = collect_files(base_path, &[], &[], &[]).unwrap();
        let tree = build_file_tree(&files, base_path);

        // Build expected structure
        let mut c_tree = BTreeMap::new();
        c_tree.insert("deep.txt".to_string(), FileNode::File);

        let mut b_tree = BTreeMap::new();
        b_tree.insert("c".to_string(), FileNode::Directory(c_tree));

        let mut a_tree = BTreeMap::new();
        a_tree.insert("b".to_string(), FileNode::Directory(b_tree));
        a_tree.insert("shallow.txt".to_string(), FileNode::File);

        let mut expected: FileTree = BTreeMap::new();
        expected.insert("a".to_string(), FileNode::Directory(a_tree));

        assert_eq!(tree, expected);
    }

    #[test]
    fn test_build_file_tree_unicode_filenames() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();

        fs::create_dir(base_path.join("测试目录")).unwrap();
        fs::File::create(base_path.join("测试目录/文件.txt")).unwrap();
        fs::File::create(base_path.join("🦀.rs")).unwrap();

        let files = collect_files(base_path, &[], &[], &[]).unwrap();
        let tree = build_file_tree(&files, base_path);

        let mut test_dir = BTreeMap::new();
        test_dir.insert("文件.txt".to_string(), FileNode::File);

        let mut expected: FileTree = BTreeMap::new();
        expected.insert("测试目录".to_string(), FileNode::Directory(test_dir));
        expected.insert("🦀.rs".to_string(), FileNode::File);

        assert_eq!(tree, expected);
    }

    #[test]
    fn test_insert_path_empty_components() {
        let mut tree = BTreeMap::new();
        insert_path(&mut tree, &[]);
        assert!(tree.is_empty());
    }

    #[test]
    fn test_write_tree_to_file() {
        let mut tree = BTreeMap::new();
        tree.insert("file1.txt".to_string(), FileNode::File);

        let mut subdir = BTreeMap::new();
        subdir.insert("file2.md".to_string(), FileNode::File);
        tree.insert("src".to_string(), FileNode::Directory(subdir));

        let mut output = Vec::new();
        write_tree_to_file(&mut output, &tree, 0).unwrap();

        let result = String::from_utf8(output).unwrap();
        assert!(result.contains("- 📄 file1.txt"));
        assert!(result.contains("- 📁 src"));
        assert!(result.contains("  - 📄 file2.md"));
    }

    #[test]
    fn test_write_tree_to_file_with_depth() {
        let mut tree = BTreeMap::new();
        tree.insert("nested.txt".to_string(), FileNode::File);

        let mut output = Vec::new();
        write_tree_to_file(&mut output, &tree, 2).unwrap();

        let result = String::from_utf8(output).unwrap();
        assert!(result.contains("    - 📄 nested.txt")); // 2 levels of indentation
    }

    #[test]
    fn test_write_tree_to_file_empty_tree() {
        let tree = BTreeMap::new();
        let mut output = Vec::new();
        write_tree_to_file(&mut output, &tree, 0).unwrap();

        let result = String::from_utf8(output).unwrap();
        assert!(result.is_empty());
    }

    #[test]
    fn test_file_node_equality() {
        let file1 = FileNode::File;
        let file2 = FileNode::File;
        assert_eq!(file1, file2);

        let mut dir1 = BTreeMap::new();
        dir1.insert("test.txt".to_string(), FileNode::File);
        let node1 = FileNode::Directory(dir1.clone());
        let node2 = FileNode::Directory(dir1);
        assert_eq!(node1, node2);

        // Different directories should not be equal
        let mut dir2 = BTreeMap::new();
        dir2.insert("other.txt".to_string(), FileNode::File);
        let node3 = FileNode::Directory(dir2);
        assert_ne!(node1, node3);

        // File and directory should not be equal
        assert_ne!(file1, node1);
    }

    #[test]
    fn test_build_file_tree_absolute_path_fallback() {
        // Test the fallback case when strip_prefix fails by using different base paths
        let dir = tempdir().unwrap();
        let base_path = dir.path();
        let other_dir = tempdir().unwrap();
        let other_base = other_dir.path();

        // Create a file in the first directory
        fs::File::create(base_path.join("test.txt")).unwrap();

        // Create a DirEntry from the first directory but use a different base_path
        let files = collect_files(base_path, &[], &[], &[]).unwrap();

        // This should trigger the unwrap_or_else case since other_base is unrelated to the file path
        let tree = build_file_tree(&files, other_base);

        // The tree should still contain the file, but with its full path
        assert!(!tree.is_empty());
    }

    #[test]
    fn test_build_file_tree_multiple_files_same_directory() {
        let dir = tempdir().unwrap();
        let base_path = dir.path();

        fs::create_dir(base_path.join("docs")).unwrap();
        fs::File::create(base_path.join("docs/readme.md")).unwrap();
        fs::File::create(base_path.join("docs/guide.md")).unwrap();
        fs::File::create(base_path.join("docs/api.md")).unwrap();

        let files = collect_files(base_path, &[], &[], &[]).unwrap();
        let tree = build_file_tree(&files, base_path);

        let mut docs_tree = BTreeMap::new();
        docs_tree.insert("api.md".to_string(), FileNode::File);
        docs_tree.insert("guide.md".to_string(), FileNode::File);
        docs_tree.insert("readme.md".to_string(), FileNode::File);

        let mut expected: FileTree = BTreeMap::new();
        expected.insert("docs".to_string(), FileNode::Directory(docs_tree));

        assert_eq!(tree, expected);
    }
}
```

### File: `tarpaulin.toml`

- Size: 304 bytes
- Modified: 2026-02-14 07:14:48 UTC

```toml
[test_config]
name = "Context Builder"
manifest-path = "./Cargo.toml"
skip-clean = true
all-features = false
exclude-files = [
        "samples/*",
        "benches/*",
        "tests/*",
        "scripts/*",
        "src/main.rs"
    ]
no-fail-fast = true
color = "Auto"

[report]
out = ["Html", "Xml"]
```

### File: `benches/context_bench.rs`

- Size: 10825 bytes
- Modified: 2026-02-14 22:42:36 UTC

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::Once;
use std::time::Duration;

use criterion::{BenchmarkId, Criterion, criterion_group, criterion_main};
use tempfile::tempdir;

use context_builder::cli::Args;
use context_builder::config::Config;
use context_builder::{Prompter, run_with_args};

static INIT: Once = Once::new();

fn init_bench_env() {
    INIT.call_once(|| {
        // Note: set_var now requires unsafe block from Rust 2024 onwards
        unsafe {
            std::env::set_var("CB_SILENT", "1");
        }
    });
}

/// Prompter that always auto-confirms. Used to avoid interactive pauses during benchmarks.
struct NoPrompt;

impl Prompter for NoPrompt {
    fn confirm_processing(&self, _file_count: usize) -> std::io::Result<bool> {
        Ok(true)
    }
    fn confirm_overwrite(&self, _file_path: &str) -> std::io::Result<bool> {
        Ok(true)
    }
}

/// Specification for generating a synthetic dataset for benchmarking.
#[derive(Clone)]
struct DatasetSpec {
    /// Human-friendly name used in the benchmark ID.
    name: &'static str,
    /// Approximate number of text files to generate.
    text_files: usize,
    /// Generate one binary file every `binary_every` text files (0 disables binary generation).
    binary_every: usize,
    /// Directory tree depth.
    depth: usize,
    /// Number of subdirectories per directory level.
    width: usize,
    /// Size of each text file (in bytes).
    text_file_size: usize,
    /// File extensions to include in benchmark (others should be ignored).
    filters: Vec<String>,
    /// Directory/file names to ignore (by component name).
    ignores: Vec<String>,
}

fn write_text_file(path: &Path, bytes: usize) {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent).unwrap();
    }
    let mut content = String::with_capacity(bytes);
    // Generate deterministic content consisting of multiple lines
    // Approx 40 bytes per line -> repeat to reach desired size
    let line = "let x = 42; // benchmark content line\n";
    while content.len() < bytes {
        content.push_str(line);
    }
    // Trim to exact size
    content.truncate(bytes);
    // Ensure trailing newline for line-numbering path
    if !content.ends_with('\n') {
        content.push('\n');
    }
    fs::write(path, content).unwrap();
}

fn write_binary_file(path: &Path, bytes: usize) {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent).unwrap();
    }
    let mut data = Vec::with_capacity(bytes);
    // Simple reproducible byte pattern
    for i in 0..bytes {
        data.push(((i as u8).wrapping_mul(31)).wrapping_add(7));
    }
    fs::write(path, data).unwrap();
}

/// Generate a synthetic project directory structure under `root`, returning the input directory path.
fn generate_dataset(root: &Path, spec: &DatasetSpec) -> PathBuf {
    let input_dir = root.join("project");
    let src_dir = input_dir.join("src");
    let docs_dir = input_dir.join("docs");
    let assets_dir = input_dir.join("assets");
    let ignored_target = input_dir.join("target"); // will be ignored if configured
    let ignored_node_modules = input_dir.join("node_modules"); // will be ignored if configured

    fs::create_dir_all(&src_dir).unwrap();
    fs::create_dir_all(&docs_dir).unwrap();
    fs::create_dir_all(&assets_dir).unwrap();
    fs::create_dir_all(&ignored_target).unwrap();
    fs::create_dir_all(&ignored_node_modules).unwrap();

    // Generate nested directories
    fn make_nested_dirs(base: &Path, depth: usize, width: usize) -> Vec<PathBuf> {
        let mut dirs = vec![base.to_path_buf()];
        for d in 1..=depth {
            let mut next_level = Vec::new();
            for parent in &dirs {
                for w in 0..width {
                    let child = parent.join(format!("d{}_{}", d, w));
                    fs::create_dir_all(&child).unwrap();
                    next_level.push(child);
                }
            }
            dirs.extend(next_level);
        }
        dirs
    }

    let all_dirs = {
        let mut v = Vec::new();
        v.extend(make_nested_dirs(&src_dir, spec.depth, spec.width));
        v.extend(make_nested_dirs(&docs_dir, spec.depth, spec.width));
        v.extend(make_nested_dirs(&assets_dir, spec.depth, spec.width));
        v
    };

    // Extensions to distribute across text files
    let text_exts = ["rs", "md", "txt", "toml"];

    // Create text files distributed across dirs
    let mut created = 0usize;
    let mut bin_counter = 0usize;

    'outer: for dir in &all_dirs {
        for i in 0..spec.width.max(1) {
            if created >= spec.text_files {
                break 'outer;
            }
            // Round-robin extensions
            let ext = text_exts[created % text_exts.len()];
            let path = dir.join(format!("f{}_{}.{}", created, i, ext));
            write_text_file(&path, spec.text_file_size);
            created += 1;

            if spec.binary_every > 0 {
                bin_counter += 1;
                if bin_counter.is_multiple_of(spec.binary_every) {
                    let bpath = dir.join(format!("bin_{}_{}.bin", created, i));
                    write_binary_file(&bpath, 2048);
                }
            }
        }
    }

    // Populate ignored directories with content that should not be processed
    write_text_file(&ignored_target.join("ignored.rs"), spec.text_file_size);
    write_text_file(
        &ignored_node_modules.join("ignored.js"),
        spec.text_file_size,
    );

    // Add some top-level files
    write_text_file(&input_dir.join("README.md"), spec.text_file_size);
    write_text_file(&input_dir.join("Cargo.toml"), spec.text_file_size);

    input_dir
}

/// Run a single benchmark scenario for a given dataset and line-numbering mode.
fn bench_scenario(c: &mut Criterion, spec: DatasetSpec, line_numbers: bool) {
    let tmp = tempdir().unwrap();
    let root = tmp.path();

    // Prefer local ./samples/<dataset>/project if it exists, else use CB_BENCH_DATASET_DIR, else generate temp dataset
    let samples_default = PathBuf::from("samples").join(spec.name).join("project");
    let input_dir = if samples_default.exists() {
        samples_default
    } else if let Some(dir) = std::env::var_os("CB_BENCH_DATASET_DIR") {
        let path = PathBuf::from(dir).join(spec.name).join("project");

        if !path.exists() {
            panic!(
                "CB_BENCH_DATASET_DIR is set but dataset not found at {}",
                path.display()
            );
        }

        path
    } else {
        generate_dataset(root, &spec)
    };

    let output_path = root.join(format!(
        "output_{}_{}.md",
        spec.name,
        if line_numbers { "ln" } else { "raw" }
    ));

    let args = Args {
        input: input_dir.to_string_lossy().into_owned(),
        output: output_path.to_string_lossy().into_owned(),
        filter: spec.filters.clone(),
        ignore: spec.ignores.clone(),
        preview: false,
        token_count: false,
        line_numbers,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = NoPrompt;

    let mut group = c.benchmark_group("context_builder");

    group.measurement_time(Duration::from_secs(20));
    group.sample_size(20);

    let mode = if cfg!(feature = "parallel") {
        "parallel"
    } else {
        "sequential"
    };
    let ln = if line_numbers {
        "line_numbers"
    } else {
        "no_line_numbers"
    };
    let id = BenchmarkId::new(
        format!(
            "{}-{}files-{}B",
            spec.name, spec.text_files, spec.text_file_size
        ),
        format!("{}-{}", ln, mode),
    );

    group.bench_with_input(id, &args, |b, _| {
        b.iter(|| {
            // Allow repeated overwrites; keep the output path stable to avoid filesystem churn
            let _ = std::hint::black_box(run_with_args(
                Args {
                    input: args.input.clone(),
                    output: args.output.clone(),
                    filter: args.filter.clone(),
                    ignore: args.ignore.clone(),
                    preview: args.preview,
                    token_count: args.token_count,
                    line_numbers: args.line_numbers,
                    yes: true,
                    diff_only: false,
                    clear_cache: false,
                    init: false,
                    max_tokens: None,
                },
                Config::default(),
                &prompter,
            ));
        });
    });

    group.finish();
}

/// Benchmarks:
/// - tiny: ~100 files, small size
/// - small: ~1,000 files
/// - medium: ~5,000 files (enabled only if CB_BENCH_MEDIUM=1)
///
/// These datasets are generated in a temporary directory at runtime to keep the
/// benchmark self-contained. Binary files are generated but filtered out by
/// the `filters` configuration so they aren't processed.
///
/// Run:
///   cargo bench --bench context_bench
pub fn context_benchmark(c: &mut Criterion) {
    // Ensure silent-by-default for benchmarks
    init_bench_env();

    // Common filters and ignores: ignore typical heavy dirs; only include text code/docs
    let common_filters = vec!["rs".into(), "md".into(), "txt".into(), "toml".into()];
    let common_ignores = vec!["target".into(), "node_modules".into()];

    // Tiny dataset
    let tiny = DatasetSpec {
        name: "tiny",
        text_files: 100,
        binary_every: 10,
        depth: 2,
        width: 3,
        text_file_size: 256,
        filters: common_filters.clone(),
        ignores: common_ignores.clone(),
    };

    // Small dataset
    let small = DatasetSpec {
        name: "small",
        text_files: 1_000,
        binary_every: 20,
        depth: 3,
        width: 4,
        text_file_size: 512,
        filters: common_filters.clone(),
        ignores: common_ignores.clone(),
    };

    // Medium dataset (can be enabled via env var to avoid heavy runs by default)
    let include_medium = std::env::var("CB_BENCH_MEDIUM").ok().as_deref() == Some("1");
    let medium = DatasetSpec {
        name: "medium",
        text_files: 5_000,
        binary_every: 25,
        depth: 4,
        width: 4,
        text_file_size: 800,
        filters: common_filters.clone(),
        ignores: common_ignores.clone(),
    };

    // For each dataset, run benchmarks with and without line numbers
    for ds in [tiny, small] {
        bench_scenario(c, ds.clone(), false);
        bench_scenario(c, ds, true);
    }

    if include_medium {
        bench_scenario(c, medium.clone(), false);
        bench_scenario(c, medium, true);
    }
}

criterion_group!(benches, context_benchmark);
criterion_main!(benches);
```

### File: `tests/cli_integration.rs`

- Size: 12938 bytes
- Modified: 2026-02-14 19:55:07 UTC

```rust
use std::cell::Cell;
use std::fs;
use std::path::Path;

use tempfile::tempdir;

use context_builder::config::Config;
use context_builder::{Prompter, cli::Args, run_with_args};

struct TestPrompter {
    overwrite_response: bool,
    processing_response: bool,
    last_processing_count: Cell<usize>,
}

impl TestPrompter {
    fn new(overwrite_response: bool, processing_response: bool) -> Self {
        Self {
            overwrite_response,
            processing_response,
            last_processing_count: Cell::new(0),
        }
    }

    fn last_count(&self) -> usize {
        self.last_processing_count.get()
    }
}

impl Prompter for TestPrompter {
    fn confirm_processing(&self, file_count: usize) -> std::io::Result<bool> {
        self.last_processing_count.set(file_count);
        Ok(self.processing_response)
    }

    fn confirm_overwrite(&self, _file_path: &str) -> std::io::Result<bool> {
        Ok(self.overwrite_response)
    }
}

fn write_file(path: &Path, contents: &str) {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent).unwrap();
    }
    fs::write(path, contents).unwrap();
}

#[test]
fn preview_mode_does_not_create_output_file() {
    let dir = tempdir().unwrap();
    let root = dir.path();

    // Create a small project structure
    write_file(&root.join("src/main.rs"), "fn main() { println!(\"hi\"); }");
    write_file(&root.join("README.md"), "# Readme");

    let args = Args {
        input: root.to_string_lossy().into_owned(),
        output: root.join("output.md").to_string_lossy().into_owned(),
        filter: vec![],
        ignore: vec![],
        preview: true,
        token_count: false,
        line_numbers: false,
        yes: false,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter::new(true, true);

    // Run in preview mode
    let res = run_with_args(args, Config::default(), &prompter);
    assert!(res.is_ok(), "preview mode should succeed");

    // No output file created
    assert!(
        !root.join("output.md").exists(),
        "output file should not be created in preview mode"
    );
}

#[test]
fn preview_mode_skips_overwrite_confirmation() {
    let dir = tempdir().unwrap();
    let root = dir.path();

    // Create an existing output file
    let output_path = root.join("output.md");
    write_file(&output_path, "existing content");

    // Create a small project structure
    write_file(&root.join("src/main.rs"), "fn main() { println!(\"hi\"); }");
    write_file(&root.join("README.md"), "# Readme");

    let args = Args {
        input: root.to_string_lossy().into_owned(),
        output: output_path.to_string_lossy().into_owned(),
        filter: vec![],
        ignore: vec![],
        preview: true,
        token_count: false,
        line_numbers: false,
        yes: false,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    // Use false for overwrite response to verify it's not called
    let prompter = TestPrompter::new(false, true);

    // Run in preview mode - should succeed even with overwrite denied
    let res = run_with_args(args, Config::default(), &prompter);
    assert!(
        res.is_ok(),
        "preview mode should succeed without overwrite confirmation"
    );

    // Output file should remain unchanged
    let content = fs::read_to_string(&output_path).unwrap();
    assert_eq!(
        content, "existing content",
        "output file should not be modified in preview mode"
    );
}

#[test]
fn token_count_mode_skips_overwrite_confirmation() {
    let dir = tempdir().unwrap();
    let root = dir.path();

    // Create an existing output file
    let output_path = root.join("output.md");
    write_file(&output_path, "existing content");

    // Create a small project structure
    write_file(&root.join("src/main.rs"), "fn main() { println!(\"hi\"); }");
    write_file(&root.join("README.md"), "# Readme");

    let args = Args {
        input: root.to_string_lossy().into_owned(),
        output: output_path.to_string_lossy().into_owned(),
        filter: vec![],
        ignore: vec![],
        preview: false,
        token_count: true,
        line_numbers: false,
        yes: false,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    // Use false for overwrite response to verify it's not called
    let prompter = TestPrompter::new(false, true);

    // Run in token count mode - should succeed even with overwrite denied
    let res = run_with_args(args, Config::default(), &prompter);
    assert!(
        res.is_ok(),
        "token count mode should succeed without overwrite confirmation"
    );

    // Output file should remain unchanged
    let content = fs::read_to_string(&output_path).unwrap();
    assert_eq!(
        content, "existing content",
        "output file should not be modified in token count mode"
    );
}

#[test]

fn both_preview_and_token_count_modes_work_together() {
    let dir = tempdir().unwrap();
    let root = dir.path();

    // Create a small project structure
    write_file(&root.join("src/main.rs"), "fn main() { println!(\"hi\"); }");
    write_file(&root.join("README.md"), "# Readme");

    let args = Args {
        input: root.to_string_lossy().into_owned(),
        output: root.join("output.md").to_string_lossy().into_owned(),
        filter: vec![],
        ignore: vec![],
        preview: true,
        token_count: true,
        line_numbers: false,
        yes: false,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter::new(false, true); // false for overwrite since it should be skipped

    // Run with both modes
    let res = run_with_args(args, Config::default(), &prompter);
    assert!(res.is_ok(), "both modes should work together");

    // No output file created
    assert!(
        !root.join("output.md").exists(),
        "output file should not be created when both modes are active"
    );
}

#[test]
fn end_to_end_generates_output_with_filters_ignores_and_line_numbers() {
    let dir = tempdir().unwrap();
    let root = dir.path();

    // Files that should be included by filters
    write_file(
        &root.join("src/main.rs"),
        "fn main() {\n    println!(\"hi\");\n}\n",
    );
    write_file(&root.join("README.md"), "# Top-level readme\n\nSome text");

    // Ignored directories/files
    write_file(
        &root.join("node_modules/pkg/index.js"),
        "console.log('ignore');",
    );
    write_file(&root.join("target/artifact.txt"), "binary");

    // A large file to exercise streaming and performance
    let mut large = String::with_capacity(4000 * 25);
    for i in 0..4000 {
        large.push_str(&format!("// line {}\n", i + 1));
    }
    write_file(&root.join("src/large.rs"), &large);

    let output_path = root.join("ctx.md");

    let args = Args {
        input: root.to_string_lossy().into_owned(),
        output: output_path.to_string_lossy().into_owned(),
        filter: vec!["rs".into(), "md".into()],
        ignore: vec!["node_modules".into(), "target".into()],
        preview: false,
        token_count: false,
        line_numbers: true,
        yes: false,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    // Always proceed without interactive prompts
    let prompter = TestPrompter::new(true, true);

    let res = run_with_args(args, Config::default(), &prompter);
    assert!(res.is_ok(), "end-to-end generation should succeed");

    // Find the actual output file (may have timestamp appended)
    let actual_output_path = if output_path.exists() {
        output_path
    } else {
        // Look for timestamped version
        let parent = output_path.parent().unwrap();
        let stem = output_path.file_stem().unwrap().to_string_lossy();
        let ext = output_path.extension().unwrap().to_string_lossy();

        let mut found_file = None;
        if let Ok(entries) = fs::read_dir(parent) {
            for entry in entries.flatten() {
                let file_name = entry.file_name();
                let name = file_name.to_string_lossy();
                if name.starts_with(&format!("{}_", stem)) && name.ends_with(&format!(".{}", ext)) {
                    found_file = Some(entry.path());
                    break;
                }
            }
        }

        found_file.unwrap_or_else(|| {
            panic!(
                "No output file found. Expected {} or timestamped version",
                output_path.display()
            )
        })
    };

    // Basic content checks
    let out = fs::read_to_string(&actual_output_path).unwrap();

    // Has file tree section
    assert!(
        out.contains("## File Tree Structure"),
        "output should contain a 'File Tree Structure' section"
    );

    // Has at least one rust code block with line numbers (looking for ' | ' marker)
    assert!(
        out.contains("```rust"),
        "output should contain a rust code block"
    );
    assert!(
        out.contains("   1 | "),
        "output should contain line-numbered code blocks"
    );

    // Should not include ignored directory entries' content (not a strict check, but indicative)
    assert!(
        !out.contains("console.log('ignore');"),
        "output should not include content from ignored directories"
    );
}

#[test]
fn overwrite_prompt_is_respected() {
    let dir = tempdir().unwrap();
    let root = dir.path();

    // Prepare an existing output file with sentinel content
    let output_path = root.join("out.md");
    write_file(&output_path, "SENTINEL");

    // Put a file to process
    write_file(&root.join("src/lib.rs"), "pub fn f() {}");

    let args = Args {
        input: root.to_string_lossy().into_owned(),
        output: output_path.to_string_lossy().into_owned(),
        filter: vec!["rs".into()],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: false,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    // Deny overwrite
    let prompter = TestPrompter::new(false, true);

    let res = run_with_args(args, Config::default(), &prompter);
    assert!(
        res.is_err(),
        "run should return error when overwrite denied"
    );

    // Ensure file is unchanged
    let out = fs::read_to_string(&output_path).unwrap();
    assert_eq!(out, "SENTINEL", "existing output should not be overwritten");
}

#[test]
fn confirm_processing_receives_large_count() {
    let dir = tempdir().unwrap();
    let root = dir.path();

    // Create a lot of files (should be well over the 100 threshold)
    fs::create_dir_all(root.join("data")).unwrap();
    for i in 0..150 {
        write_file(&root.join("data").join(format!("f{}.txt", i)), "x");
    }

    let args = Args {
        input: root.to_string_lossy().into_owned(),
        output: root.join("out.md").to_string_lossy().into_owned(),
        filter: vec!["txt".into()],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: false,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter::new(true, true);

    let res = run_with_args(args, Config::default(), &prompter);
    assert!(res.is_ok(), "run should succeed with many files");

    // Ensure our injected prompter saw the large count (>= 150)
    assert!(
        prompter.last_count() >= 150,
        "expected confirm_processing to be called with >=150 files, got {}",
        prompter.last_count()
    );
}

#[test]
fn token_count_mode_does_not_create_output_file() {
    let dir = tempdir().unwrap();
    let root = dir.path();

    // Create a small project structure
    write_file(&root.join("src/main.rs"), "fn main() { println!(\"hi\"); }");
    write_file(&root.join("README.md"), "# Readme");

    let args = Args {
        input: root.to_string_lossy().into_owned(),
        output: root.join("output.md").to_string_lossy().into_owned(),
        filter: vec![],
        ignore: vec![],
        preview: false,
        token_count: true,
        line_numbers: false,
        yes: false,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter::new(true, true);

    // Run in token count mode
    let res = run_with_args(args, Config::default(), &prompter);
    assert!(res.is_ok(), "token count mode should succeed");

    // No output file created
    assert!(
        !root.join("output.md").exists(),
        "output file should not be created in token count mode"
    );
}
```

### File: `tests/diff_integration.rs`

- Size: 1122 bytes
- Modified: 2026-02-14 19:55:07 UTC

```rust
use context_builder::diff::generate_diff;

#[test]
fn test_diff_with_identical_content() {
    let content = r#"# Test Document

This is a test document with some content.

## Section 1

Some text here.

## Section 2

More text here.
"#;

    let diff = generate_diff(content, content);

    // When content is identical, diff should be empty
    assert!(diff.is_empty());
}

#[test]
fn test_diff_with_changes() {
    let old_content = r#"# Test Document

This is a test document with some content.

## Section 1

Some text here.

## Section 2

More text here.
"#;

    let new_content = r#"# Test Document

This is a test document with some content.

## Section 1

Some different text here.

## Section 2

More text here.
"#;

    let diff = generate_diff(old_content, new_content);

    // When content has differences, diff should not be empty
    assert!(!diff.is_empty());
    assert!(diff.contains("## File Differences"));

    // Print the diff for debugging
    println!("Actual diff output:\n{}", diff);

    assert!(diff.contains("- Some text here"));
    assert!(diff.contains("+ Some different text here"));
}
```

### File: `tests/test_auto_diff.rs`

- Size: 33524 bytes
- Modified: 2026-02-14 19:56:56 UTC

```rust
//! Integration tests for auto-diff functionality
//!
//! These tests verify that the auto-diff feature works correctly and robustly:
//! - Cache management and collision prevention
//! - Diff generation accuracy
//! - Configuration changes affecting cache
//! - Error recovery from corrupted cache

use pretty_assertions::assert_eq;
use serial_test::serial;
use std::fs;
use std::path::Path;
use tempfile::tempdir;

use chrono::Utc;
use context_builder::cli::Args;
use context_builder::config::{Config, load_config};
use context_builder::{Prompter, run_with_args};

/// Test prompter that always confirms
struct TestPrompter;

impl Prompter for TestPrompter {
    fn confirm_processing(&self, _file_count: usize) -> std::io::Result<bool> {
        Ok(true)
    }
    fn confirm_overwrite(&self, _file_path: &str) -> std::io::Result<bool> {
        Ok(true)
    }
}

fn create_simple_project(base_dir: &Path) -> std::io::Result<()> {
    let src_dir = base_dir.join("src");
    fs::create_dir_all(&src_dir)?;

    fs::write(
        src_dir.join("main.rs"),
        "fn main() {\n    println!(\"Hello, world!\");\n}",
    )?;
    fs::write(
        src_dir.join("lib.rs"),
        "pub fn add(a: i32, b: i32) -> i32 {\n    a + b\n}",
    )?;
    fs::write(
        base_dir.join("README.md"),
        "# Test Project\n\nThis is a test project for auto-diff.",
    )?;

    // Create config file to enable auto-diff
    fs::write(
        base_dir.join("context-builder.toml"),
        r#"
auto_diff = true
timestamped_output = true
"#,
    )?;

    Ok(())
}

#[test]
#[serial]
fn test_auto_diff_workflow_basic() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    create_simple_project(&project_dir).unwrap();

    let output_dir = temp_dir.path().join("output");
    fs::create_dir_all(&output_dir).unwrap();

    // Change to project directory so config loading works
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    let args = Args {
        input: ".".to_string(), // Use current directory
        output: output_dir.join("context.md").to_string_lossy().to_string(),
        filter: vec![],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };
    let prompter = TestPrompter;

    // First run - should create initial output without diffs
    let config = load_config().unwrap_or_default();

    // Apply config merging manually since we're bypassing run()
    let mut first_args = args.clone();

    // Apply line_numbers from config (matches run_with_args behavior)
    if let Some(line_numbers) = config.line_numbers {
        first_args.line_numbers = line_numbers;
    }

    // Apply diff_only from config
    if let Some(diff_only) = config.diff_only {
        first_args.diff_only = diff_only;
    }

    // Apply timestamping manually since we're bypassing run()
    if config.timestamped_output.unwrap_or(false) {
        let timestamp = Utc::now().format("%Y%m%d%H%M%S").to_string();
        let path = std::path::Path::new(&first_args.output);
        let stem = path
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("output");
        let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("md");
        let new_filename = format!("{}_{}.{}", stem, timestamp, extension);
        if let Some(parent) = path.parent() {
            first_args.output = parent.join(new_filename).to_string_lossy().to_string();
        } else {
            first_args.output = new_filename;
        }
    }

    run_with_args(first_args, config.clone(), &prompter).unwrap();

    // Check that output was created
    let first_output = fs::read_dir(&output_dir)
        .unwrap()
        .next()
        .unwrap()
        .unwrap()
        .path();
    let first_content = fs::read_to_string(&first_output).unwrap();

    // Should not contain change summary on first run
    assert!(!first_content.contains("## Change Summary"));
    assert!(!first_content.contains("## File Differences"));

    // Modify a file
    fs::write(
        project_dir.join("src").join("main.rs"),
        "fn main() {\n    println!(\"Hello, Rust!\");\n    println!(\"Modified!\");\n}",
    )
    .unwrap();

    // Small delay to ensure different timestamps
    std::thread::sleep(std::time::Duration::from_millis(1100));

    // Second run - should detect changes
    let config = load_config().unwrap_or_default();

    // Apply config merging manually since we're bypassing run()
    let mut second_args = args;

    // Apply line_numbers from config (matches run_with_args behavior)
    if let Some(line_numbers) = config.line_numbers {
        second_args.line_numbers = line_numbers;
    }

    // Apply diff_only from config
    if let Some(diff_only) = config.diff_only {
        second_args.diff_only = diff_only;
    }

    // Apply timestamping manually since we're bypassing run()
    if config.timestamped_output.unwrap_or(false) {
        let timestamp = Utc::now().format("%Y%m%d%H%M%S").to_string();
        let path = std::path::Path::new(&second_args.output);
        let stem = path
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("output");
        let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("md");
        let new_filename = format!("{}_{}.{}", stem, timestamp, extension);
        if let Some(parent) = path.parent() {
            second_args.output = parent.join(new_filename).to_string_lossy().to_string();
        } else {
            second_args.output = new_filename;
        }
    }

    run_with_args(second_args, config, &prompter).unwrap();

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();

    // Find the second output file (should have different timestamp)
    let outputs: Vec<_> = fs::read_dir(&output_dir)
        .unwrap()
        .map(|e| e.unwrap().path())
        .collect();
    assert_eq!(outputs.len(), 2, "Should have two output files");

    let second_output = outputs.iter().find(|&p| p != &first_output).unwrap();
    let second_content = fs::read_to_string(second_output).unwrap();

    // Should contain change summary
    assert!(second_content.contains("## Change Summary"));
    // Handle both Windows and Unix path separators
    assert!(
        second_content.contains("- Modified: `src/main.rs`")
            || second_content.contains("- Modified: `src\\main.rs`")
    );

    // Should contain file differences
    assert!(second_content.contains("## File Differences"));
    assert!(
        second_content.contains("### Diff: `src/main.rs`")
            || second_content.contains("### Diff: `src\\main.rs`")
    );
    assert!(second_content.contains("Hello, world!"));
    assert!(second_content.contains("Hello, Rust!"));
    assert!(second_content.contains("Modified!"));
}

#[test]
#[serial]
fn test_auto_diff_added_and_removed_files() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    create_simple_project(&project_dir).unwrap();

    let output_dir = temp_dir.path().join("output");
    fs::create_dir_all(&output_dir).unwrap();

    // Change to project directory so config loading works
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    let args = Args {
        input: ".".to_string(), // Use current directory
        output: output_dir.join("context.md").to_string_lossy().to_string(),
        filter: vec![],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter;

    // First run
    let config = load_config().unwrap_or_default();

    // Apply config merging manually since we're bypassing run()
    let mut first_args = args.clone();

    // Apply line_numbers from config
    if !first_args.line_numbers
        && let Some(line_numbers) = config.line_numbers
    {
        first_args.line_numbers = line_numbers;
    }

    // Apply diff_only from config
    if !first_args.diff_only
        && let Some(diff_only) = config.diff_only
    {
        first_args.diff_only = diff_only;
    }

    // Apply timestamping manually since we're bypassing run()
    if config.timestamped_output.unwrap_or(false) {
        let timestamp = Utc::now().format("%Y%m%d%H%M%S").to_string();
        let path = std::path::Path::new(&first_args.output);
        let stem = path
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("output");
        let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("md");
        let new_filename = format!("{}_{}.{}", stem, timestamp, extension);
        if let Some(parent) = path.parent() {
            first_args.output = parent.join(new_filename).to_string_lossy().to_string();
        } else {
            first_args.output = new_filename;
        }
    }

    run_with_args(first_args, config.clone(), &prompter).unwrap();

    // Add a new file and remove an existing one
    fs::write(
        project_dir.join("src").join("new_module.rs"),
        "pub fn new_function() -> String {\n    \"new\".to_string()\n}",
    )
    .unwrap();

    fs::remove_file(project_dir.join("src").join("lib.rs")).unwrap();

    // Small delay to ensure different timestamps
    std::thread::sleep(std::time::Duration::from_millis(1100));

    // Second run
    let config = load_config().unwrap_or_default();

    // Apply config merging manually since we're bypassing run()
    let mut second_args = args;

    // Apply line_numbers from config
    if !second_args.line_numbers
        && let Some(line_numbers) = config.line_numbers
    {
        second_args.line_numbers = line_numbers;
    }

    // Apply diff_only from config
    if !second_args.diff_only
        && let Some(diff_only) = config.diff_only
    {
        second_args.diff_only = diff_only;
    }

    // Apply timestamping manually since we're bypassing run()
    if config.timestamped_output.unwrap_or(false) {
        let timestamp = Utc::now().format("%Y%m%d%H%M%S").to_string();
        let path = std::path::Path::new(&second_args.output);
        let stem = path
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("output");
        let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("md");
        let new_filename = format!("{}_{}.{}", stem, timestamp, extension);
        if let Some(parent) = path.parent() {
            second_args.output = parent.join(new_filename).to_string_lossy().to_string();
        } else {
            second_args.output = new_filename;
        }
    }

    run_with_args(second_args, config, &prompter).unwrap();

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();

    let outputs: Vec<_> = fs::read_dir(&output_dir)
        .unwrap()
        .map(|e| e.unwrap().path())
        .collect();
    let latest_output = outputs
        .iter()
        .max_by_key(|p| fs::metadata(p).unwrap().modified().unwrap())
        .unwrap();
    let content = fs::read_to_string(latest_output).unwrap();

    // Should show both added and removed files
    // Handle both Windows and Unix path separators
    assert!(
        content.contains("- Added: `src/new_module.rs`")
            || content.contains("- Added: `src\\new_module.rs`")
    );
    // Handle both Windows and Unix path separators
    assert!(
        content.contains("- Removed: `src/lib.rs`") || content.contains("- Removed: `src\\lib.rs`")
    );

    // Added files should be marked in the files section
    assert!(content.contains("_Status: Added_"));
}

#[test]
#[serial]
fn test_diff_only_mode() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    create_simple_project(&project_dir).unwrap();

    // Update config to enable diff_only
    fs::write(
        project_dir.join("context-builder.toml"),
        r#"
auto_diff = true
timestamped_output = true
diff_only = true
"#,
    )
    .unwrap();

    let output_dir = temp_dir.path().join("output");
    fs::create_dir_all(&output_dir).unwrap();

    // Change to project directory so config loading works
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    let args = Args {
        input: ".".to_string(), // Use current directory
        output: output_dir.join("context.md").to_string_lossy().to_string(),
        filter: vec![],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false, // Config file should override this
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter;

    // First run
    let config = load_config().unwrap_or_default();

    // Apply config merging manually since we're bypassing run()
    let mut first_args = args.clone();

    // Apply line_numbers from config
    if !first_args.line_numbers
        && let Some(line_numbers) = config.line_numbers
    {
        first_args.line_numbers = line_numbers;
    }

    // Apply diff_only from config
    if !first_args.diff_only
        && let Some(diff_only) = config.diff_only
    {
        first_args.diff_only = diff_only;
    }

    // Apply timestamping manually since we're bypassing run()
    if config.timestamped_output.unwrap_or(false) {
        let timestamp = Utc::now().format("%Y%m%d%H%M%S").to_string();
        let path = std::path::Path::new(&first_args.output);
        let stem = path
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("output");
        let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("md");
        let new_filename = format!("{}_{}.{}", stem, timestamp, extension);
        if let Some(parent) = path.parent() {
            first_args.output = parent.join(new_filename).to_string_lossy().to_string();
        } else {
            first_args.output = new_filename;
        }
    }

    run_with_args(first_args, config.clone(), &prompter).unwrap();

    // Modify a file
    fs::write(
        project_dir.join("src").join("main.rs"),
        "fn main() {\n    println!(\"Changed!\");\n}",
    )
    .unwrap();

    // Small delay to ensure different timestamps
    std::thread::sleep(std::time::Duration::from_millis(1100));

    // Second run
    let config = load_config().unwrap_or_default();

    // Apply config merging manually since we're bypassing run()
    let mut second_args = args;

    // Apply line_numbers from config
    if !second_args.line_numbers
        && let Some(line_numbers) = config.line_numbers
    {
        second_args.line_numbers = line_numbers;
    }

    // Apply diff_only from config
    if !second_args.diff_only
        && let Some(diff_only) = config.diff_only
    {
        second_args.diff_only = diff_only;
    }

    // Apply timestamping manually since we're bypassing run()
    if config.timestamped_output.unwrap_or(false) {
        let timestamp = Utc::now().format("%Y%m%d%H%M%S").to_string();
        let path = std::path::Path::new(&second_args.output);
        let stem = path
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("output");
        let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("md");
        let new_filename = format!("{}_{}.{}", stem, timestamp, extension);
        if let Some(parent) = path.parent() {
            second_args.output = parent.join(new_filename).to_string_lossy().to_string();
        } else {
            second_args.output = new_filename;
        }
    }

    run_with_args(second_args, config, &prompter).unwrap();

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();

    let outputs: Vec<_> = fs::read_dir(&output_dir)
        .unwrap()
        .map(|e| e.unwrap().path())
        .collect();
    let latest_output = outputs
        .iter()
        .max_by_key(|p| fs::metadata(p).unwrap().modified().unwrap())
        .unwrap();
    let content = fs::read_to_string(latest_output).unwrap();

    // Should have change summary and diffs
    assert!(content.contains("## Change Summary"));
    assert!(content.contains("## File Differences"));

    // Should NOT have full file bodies section
    assert!(!content.contains("## Files"));

    // But should still have the file tree and header
    assert!(content.contains("## File Tree Structure"));
    assert!(content.contains("# Directory Structure Report"));
}

#[test]
#[serial]
fn test_cache_invalidation_on_config_change() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    create_simple_project(&project_dir).unwrap();

    let output_dir = temp_dir.path().join("output");
    fs::create_dir_all(&output_dir).unwrap();

    // Change to project directory so config loading works
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    let args_base = Args {
        input: ".".to_string(), // Use current directory
        output: output_dir.join("context.md").to_string_lossy().to_string(),
        filter: vec![],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter;

    // First run with original config
    let config = load_config().unwrap_or_default();

    // Apply config merging manually since we're bypassing run()
    let mut first_args = args_base.clone();

    // Apply line_numbers from config
    if !first_args.line_numbers
        && let Some(line_numbers) = config.line_numbers
    {
        first_args.line_numbers = line_numbers;
    }

    // Apply diff_only from config
    if !first_args.diff_only
        && let Some(diff_only) = config.diff_only
    {
        first_args.diff_only = diff_only;
    }

    // Apply timestamping manually since we're bypassing run()
    if config.timestamped_output.unwrap_or(false) {
        let timestamp = Utc::now().format("%Y%m%d%H%M%S").to_string();
        let path = std::path::Path::new(&first_args.output);
        let stem = path
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("output");
        let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("md");
        let new_filename = format!("{}_{}.{}", stem, timestamp, extension);
        if let Some(parent) = path.parent() {
            first_args.output = parent.join(new_filename).to_string_lossy().to_string();
        } else {
            first_args.output = new_filename;
        }
    }

    run_with_args(first_args, config, &prompter).unwrap();

    // Change configuration - add line numbers
    fs::write(
        project_dir.join("context-builder.toml"),
        r#"
auto_diff = true
timestamped_output = true
line_numbers = true
"#,
    )
    .unwrap();

    // Small delay to ensure different timestamps
    std::thread::sleep(std::time::Duration::from_millis(1100));

    // Second run with new config should not show diffs (cache should be invalidated)
    let config = load_config().unwrap_or_default();

    // Apply config merging manually since we're bypassing run()
    let mut second_args = args_base;

    // Apply line_numbers from config (matches run_with_args behavior)
    if let Some(line_numbers) = config.line_numbers {
        second_args.line_numbers = line_numbers;
    }

    // Apply diff_only from config
    if let Some(diff_only) = config.diff_only {
        second_args.diff_only = diff_only;
    }

    // Apply timestamping manually since we're bypassing run()
    if config.timestamped_output.unwrap_or(false) {
        let timestamp = Utc::now().format("%Y%m%d%H%M%S").to_string();
        let path = std::path::Path::new(&second_args.output);
        let stem = path
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("output");
        let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("md");
        let new_filename = format!("{}_{}.{}", stem, timestamp, extension);
        if let Some(parent) = path.parent() {
            second_args.output = parent.join(new_filename).to_string_lossy().to_string();
        } else {
            second_args.output = new_filename;
        }
    }

    run_with_args(second_args, config, &prompter).unwrap();

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();

    let outputs: Vec<_> = fs::read_dir(&output_dir)
        .unwrap()
        .map(|e| e.unwrap().path())
        .collect();
    let latest_output = outputs
        .iter()
        .max_by_key(|p| fs::metadata(p).unwrap().modified().unwrap())
        .unwrap();
    let content = fs::read_to_string(latest_output).unwrap();

    // Should have line numbers (showing new config is active)
    assert!(content.contains("   1 |"));

    // Should not show change summary since cache was invalidated
    assert!(!content.contains("## Change Summary"));
}

#[test]
#[serial]
fn test_concurrent_cache_access() {
    use std::sync::Arc;
    use std::thread;

    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    create_simple_project(&project_dir).unwrap();

    let output_dir = temp_dir.path().join("output");
    fs::create_dir_all(&output_dir).unwrap();

    let project_dir = Arc::new(project_dir);
    let output_dir = Arc::new(output_dir);

    // Spawn multiple threads that try to run the tool concurrently
    let handles: Vec<_> = (0..3)
        .map(|i| {
            let project_dir = Arc::clone(&project_dir);
            let output_dir = Arc::clone(&output_dir);

            thread::spawn(move || {
                let args = Args {
                    input: project_dir.to_string_lossy().to_string(),
                    output: output_dir
                        .join(format!("context_{}.md", i))
                        .to_string_lossy()
                        .to_string(),
                    filter: vec![],
                    ignore: vec![],
                    preview: false,
                    token_count: false,
                    line_numbers: false,
                    yes: true,
                    diff_only: false,
                    clear_cache: false,
                    init: false,
                    max_tokens: None,
                };

                let prompter = TestPrompter;
                run_with_args(args, Config::default(), &prompter)
            })
        })
        .collect();

    // Wait for all threads to complete
    let results: Vec<_> = handles.into_iter().map(|h| h.join().unwrap()).collect();

    // All should succeed (no cache corruption)
    for result in results {
        assert!(
            result.is_ok(),
            "Concurrent access should not cause failures"
        );
    }

    // Check that all outputs were created
    let output_count = fs::read_dir(&*output_dir).unwrap().count();
    assert_eq!(output_count, 3, "All concurrent runs should produce output");
}

#[test]
#[serial]
fn test_corrupted_cache_recovery() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    create_simple_project(&project_dir).unwrap();

    let output_dir = temp_dir.path().join("output");
    fs::create_dir_all(&output_dir).unwrap();

    // Change to project directory so config loading works
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    let args = Args {
        input: ".".to_string(), // Use current directory
        output: output_dir.join("context.md").to_string_lossy().to_string(),
        filter: vec![],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter;

    // First run to create cache
    let config = load_config().unwrap_or_default();

    // Apply config merging manually since we're bypassing run()
    let mut first_args = args.clone();

    // Apply line_numbers from config
    if !first_args.line_numbers
        && let Some(line_numbers) = config.line_numbers
    {
        first_args.line_numbers = line_numbers;
    }

    // Apply diff_only from config
    if !first_args.diff_only
        && let Some(diff_only) = config.diff_only
    {
        first_args.diff_only = diff_only;
    }

    // Apply timestamping manually since we're bypassing run()
    if config.timestamped_output.unwrap_or(false) {
        let timestamp = Utc::now().format("%Y%m%d%H%M%S").to_string();
        let path = std::path::Path::new(&first_args.output);
        let stem = path
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("output");
        let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("md");
        let new_filename = format!("{}_{}.{}", stem, timestamp, extension);
        if let Some(parent) = path.parent() {
            first_args.output = parent.join(new_filename).to_string_lossy().to_string();
        } else {
            first_args.output = new_filename;
        }
    }

    run_with_args(first_args, config.clone(), &prompter).unwrap();

    // Corrupt the cache by writing invalid JSON
    let cache_dir = project_dir.join(".context-builder").join("cache");
    if cache_dir.exists() {
        let cache_files: Vec<_> = fs::read_dir(&cache_dir)
            .unwrap()
            .filter_map(|entry| entry.ok())
            .filter(|entry| {
                entry
                    .path()
                    .extension()
                    .and_then(|s| s.to_str())
                    .map(|s| s == "json")
                    .unwrap_or(false)
            })
            .collect();

        if !cache_files.is_empty() {
            // Corrupt the first cache file found
            fs::write(cache_files[0].path(), "{ invalid json }").unwrap();
        }
    }

    // Modify a file
    fs::write(
        project_dir.join("src").join("main.rs"),
        "fn main() {\n    println!(\"Recovered!\");\n}",
    )
    .unwrap();

    // Small delay to ensure different timestamps
    std::thread::sleep(std::time::Duration::from_millis(1100));

    // Second run should handle corrupted cache gracefully
    let config = load_config().unwrap_or_default();

    // Apply config merging manually since we're bypassing run()
    let mut second_args = args;

    // Apply line_numbers from config
    if !second_args.line_numbers
        && let Some(line_numbers) = config.line_numbers
    {
        second_args.line_numbers = line_numbers;
    }

    // Apply diff_only from config
    if !second_args.diff_only
        && let Some(diff_only) = config.diff_only
    {
        second_args.diff_only = diff_only;
    }

    // Apply timestamping manually since we're bypassing run()
    if config.timestamped_output.unwrap_or(false) {
        let timestamp = Utc::now().format("%Y%m%d%H%M%S").to_string();
        let path = std::path::Path::new(&second_args.output);
        let stem = path
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("output");
        let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("md");
        let new_filename = format!("{}_{}.{}", stem, timestamp, extension);
        if let Some(parent) = path.parent() {
            second_args.output = parent.join(new_filename).to_string_lossy().to_string();
        } else {
            second_args.output = new_filename;
        }
    }

    let result = run_with_args(second_args, config, &prompter);
    assert!(result.is_ok(), "Should recover from corrupted cache");

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();

    // Should produce output despite cache corruption
    let output_count = fs::read_dir(&output_dir).unwrap().count();
    assert!(
        output_count >= 1,
        "Should produce output even with corrupted cache"
    );
}

#[test]
#[serial]
fn test_diff_only_mode_includes_added_files() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    create_simple_project(&project_dir).unwrap();

    let output_dir = temp_dir.path().join("output");
    fs::create_dir_all(&output_dir).unwrap();

    // Change to project directory so config loading works
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    // Create config with auto_diff and diff_only enabled
    fs::write(
        project_dir.join("context-builder.toml"),
        r#"
auto_diff = true
timestamped_output = true
diff_only = true
"#,
    )
    .unwrap();

    let prompter = TestPrompter;

    // First run to establish baseline
    let args = Args {
        input: ".".to_string(),
        output: output_dir.join("context.md").to_string_lossy().to_string(),
        filter: vec!["rs".to_string()],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false, // Will be overridden by config
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    run_with_args(args.clone(), load_config().unwrap_or_default(), &prompter).unwrap();

    // Add a new file
    fs::write(
        project_dir.join("src").join("new_module.rs"),
        "// New module added\npub fn new_function() -> String {\n    \"Hello from new module\".to_string()\n}\n\n#[cfg(test)]\nmod tests {\n    use super::*;\n\n    #[test]\n    fn test_new_function() {\n        assert_eq!(new_function(), \"Hello from new module\");\n    }\n}\n",
    )
    .unwrap();

    // Small delay to ensure different timestamps
    std::thread::sleep(std::time::Duration::from_millis(1100));

    // Second run with the added file
    let config = load_config().unwrap_or_default();

    // Apply config merging manually since we're bypassing run()
    let mut second_args = args;

    // Apply line_numbers from config
    if !second_args.line_numbers
        && let Some(line_numbers) = config.line_numbers
    {
        second_args.line_numbers = line_numbers;
    }

    // Apply diff_only from config
    if !second_args.diff_only
        && let Some(diff_only) = config.diff_only
    {
        second_args.diff_only = diff_only;
    }

    // Apply timestamping manually since we're bypassing run()
    if config.timestamped_output.unwrap_or(false) {
        let timestamp = Utc::now().format("%Y%m%d%H%M%S").to_string();
        let path = std::path::Path::new(&second_args.output);
        let stem = path
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("output");
        let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("md");
        let new_filename = format!("{}_{}.{}", stem, timestamp, extension);
        if let Some(parent) = path.parent() {
            second_args.output = parent.join(new_filename).to_string_lossy().to_string();
        } else {
            second_args.output = new_filename;
        }
    }

    run_with_args(second_args, config, &prompter).unwrap();

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();

    // Find the latest output file
    let outputs: Vec<_> = fs::read_dir(&output_dir)
        .unwrap()
        .map(|e| e.unwrap().path())
        .collect();
    let latest_output = outputs
        .iter()
        .max_by_key(|p| fs::metadata(p).unwrap().modified().unwrap())
        .unwrap();
    let content = fs::read_to_string(latest_output).unwrap();

    // Should have change summary
    assert!(content.contains("## Change Summary"));

    // Should have added files section (not full Files section)
    assert!(content.contains("## Added Files"));
    assert!(!content.contains("## Files\n"));

    // Should include the full content of the added file (handle Windows path separators)
    assert!(content.contains("### File: `src") && content.contains("new_module.rs`"));
    assert!(content.contains("pub fn new_function() -> String"));
    assert!(content.contains("Hello from new module"));
    assert!(content.contains("_Status: Added_"));

    // Should still have the file tree and header
    assert!(content.contains("## File Tree Structure"));
    assert!(content.contains("# Directory Structure Report"));

    // Should not include full content of existing files (since they're unchanged)
    // The existing main.rs content should not be in the full Files section (handle Windows path separators)
    let main_rs_in_files = content.contains("### File: `src")
        && content.contains("main.rs`")
        && content.contains("Hello, world!");
    assert!(
        !main_rs_in_files,
        "Existing unchanged files should not have full content in diff_only mode"
    );
}
```

### File: `tests/test_binary_file_autodiff.rs`

- Size: 7957 bytes
- Modified: 2026-02-14 19:55:07 UTC

```rust
//! Integration tests for binary file handling in auto-diff mode
//!
//! This test ensures that the application doesn't crash when encountering
//! binary files during auto-diff processing.

use std::fs;
use std::path::Path;
use tempfile::tempdir;

use context_builder::config::Config;
use context_builder::{Prompter, cli::Args, run_with_args};

struct TestPrompter {
    overwrite_response: bool,
    processing_response: bool,
}

impl TestPrompter {
    fn new(overwrite_response: bool, processing_response: bool) -> Self {
        Self {
            overwrite_response,
            processing_response,
        }
    }
}

impl Prompter for TestPrompter {
    fn confirm_processing(&self, _file_count: usize) -> std::io::Result<bool> {
        Ok(self.processing_response)
    }

    fn confirm_overwrite(&self, _file_path: &str) -> std::io::Result<bool> {
        Ok(self.overwrite_response)
    }
}

fn write_file(path: &Path, contents: &str) {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent).unwrap();
    }
    fs::write(path, contents).unwrap();
}

fn write_binary_file(path: &Path, data: &[u8]) {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent).unwrap();
    }
    fs::write(path, data).unwrap();
}

#[test]
fn test_binary_files_dont_crash_autodiff() {
    let temp_dir = tempdir().unwrap();
    let root = temp_dir.path();

    // Create text files
    write_file(
        &root.join("src/main.rs"),
        "fn main() { println!(\"Hello\"); }",
    );
    write_file(&root.join("README.md"), "# Test Project");

    // Create binary files with various problematic byte sequences
    write_binary_file(
        &root.join("assets/image.png"),
        &[
            0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, // PNG header
            0x00, 0x00, 0x00, 0x0D, 0x49, 0x48, 0x44, 0x52, 0xFF, 0xFE, 0xFD, 0xFC, 0x00, 0x01,
            0x02, 0x03, // Random binary data
        ],
    );

    // Create a file with null bytes
    write_binary_file(
        &root.join("data/binary.dat"),
        &[
            0x00, 0x00, 0x00, 0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0x80, 0x81, 0x82, 0x83, 0x84, 0x85,
            0x86, 0x87,
        ],
    );

    // Create a file with invalid UTF-8 sequences
    write_binary_file(
        &root.join("config/settings.bin"),
        &[
            0xC0, 0x80, // Invalid UTF-8: overlong encoding
            0xE0, 0x80, 0x80, // Invalid UTF-8: overlong encoding
            0xFF, 0xFE, 0xFF, 0xFE, // Invalid UTF-8: not valid start bytes
        ],
    );

    let output_path = root.join("output.md");

    // Configure for auto-diff mode
    let config = Config {
        auto_diff: Some(true),
        diff_context_lines: Some(3),
        ..Default::default()
    };

    let args = Args {
        input: root.to_string_lossy().into_owned(),
        output: output_path.to_string_lossy().into_owned(),
        filter: vec![], // Include all file types to catch binary files
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true, // Auto-confirm to avoid prompts
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter::new(true, true);

    // First run - should create initial state without crashing
    let result1 = run_with_args(args.clone(), config.clone(), &prompter);
    assert!(
        result1.is_ok(),
        "First run with binary files should not crash: {:?}",
        result1
    );

    // Verify output file was created
    assert!(
        output_path.exists(),
        "Output file should be created on first run"
    );

    // Modify a text file to trigger diff on second run
    write_file(
        &root.join("src/main.rs"),
        "fn main() { println!(\"Hello, world!\"); }",
    );

    // Second run - should handle binary files in diff without crashing
    let result2 = run_with_args(args, config, &prompter);
    assert!(
        result2.is_ok(),
        "Second run with binary files should not crash during diff: {:?}",
        result2
    );

    // Read the output to verify it contains appropriate handling of binary files
    let output_content = fs::read_to_string(&output_path).unwrap();

    // Should contain the modified text file
    assert!(
        output_content.contains("Hello, world!"),
        "Output should contain modified text content"
    );

    // Binary files should be represented appropriately (not causing crashes)
    // The exact representation depends on implementation but should not crash
    assert!(
        output_content.len() > 100,
        "Output should contain substantial content indicating successful processing"
    );
}

#[test]
fn test_mixed_text_and_binary_files_autodiff() {
    let temp_dir = tempdir().unwrap();
    let root = temp_dir.path();

    // Create a mix of text and binary files
    write_file(&root.join("source.txt"), "Original text content");
    write_binary_file(&root.join("data.bin"), &[0x00, 0xFF, 0x42, 0x13, 0x37]);
    write_file(&root.join("config.json"), r#"{"version": "1.0"}"#);

    let output_path = root.join("mixed_output.md");

    let config = Config {
        auto_diff: Some(true),
        ..Default::default()
    };

    let args = Args {
        input: root.to_string_lossy().into_owned(),
        output: output_path.to_string_lossy().into_owned(),
        filter: vec![],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter::new(true, true);

    // Initial run
    let result1 = run_with_args(args.clone(), config.clone(), &prompter);
    assert!(result1.is_ok(), "Initial run should succeed");

    // Modify text file and add another binary file
    write_file(&root.join("source.txt"), "Modified text content");
    write_binary_file(
        &root.join("image.jpg"),
        &[
            0xFF, 0xD8, 0xFF, 0xE0, // JPEG header
            0x00, 0x10, 0x4A, 0x46, 0x49, 0x46,
        ],
    );

    // Second run with changes
    let result2 = run_with_args(args, config, &prompter);
    assert!(
        result2.is_ok(),
        "Second run with mixed file changes should succeed"
    );

    let output_content = fs::read_to_string(&output_path).unwrap();
    assert!(
        output_content.contains("Modified text content"),
        "Should show updated text content"
    );
}

#[test]
fn test_large_binary_file_autodiff() {
    let temp_dir = tempdir().unwrap();
    let root = temp_dir.path();

    // Create a large binary file (simulating real-world scenario)
    let large_binary_data: Vec<u8> = (0..10000).map(|i| (i % 256) as u8).collect();

    write_binary_file(&root.join("large_binary.dat"), &large_binary_data);
    write_file(&root.join("small_text.txt"), "Small text file");

    let output_path = root.join("large_binary_output.md");

    let config = Config {
        auto_diff: Some(true),
        ..Default::default()
    };

    let args = Args {
        input: root.to_string_lossy().into_owned(),
        output: output_path.to_string_lossy().into_owned(),
        filter: vec![],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter::new(true, true);

    // Should handle large binary files without memory issues or crashes
    let result = run_with_args(args, config, &prompter);
    assert!(
        result.is_ok(),
        "Should handle large binary files without crashing: {:?}",
        result
    );

    assert!(
        output_path.exists(),
        "Output should be created even with large binary files"
    );
}
```

### File: `tests/test_comprehensive_edge_cases.rs`

- Size: 22269 bytes
- Modified: 2026-02-14 19:56:54 UTC

```rust
//! Comprehensive edge case testing suite for context-builder v0.5.0
//!
//! This test suite covers all the critical edge cases and robustness scenarios
//! that were identified during the v0.5.0 development cycle.

use context_builder::cli::Args;
use context_builder::config::Config;
use context_builder::{Prompter, run_with_args};
use serial_test::serial;
use std::fs;
use std::path::Path;
use tempfile::tempdir;

struct TestPrompter {
    overwrite_response: bool,
    processing_response: bool,
}

impl TestPrompter {
    fn new(overwrite_response: bool, processing_response: bool) -> Self {
        Self {
            overwrite_response,
            processing_response,
        }
    }
}

impl Prompter for TestPrompter {
    fn confirm_processing(&self, _file_count: usize) -> std::io::Result<bool> {
        Ok(self.processing_response)
    }

    fn confirm_overwrite(&self, _file_path: &str) -> std::io::Result<bool> {
        Ok(self.overwrite_response)
    }
}

fn write_file(path: &Path, contents: &str) {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent).unwrap();
    }
    fs::write(path, contents).unwrap();
}

fn write_binary_file(path: &Path, data: &[u8]) {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent).unwrap();
    }
    fs::write(path, data).unwrap();
}

#[test]
#[serial]
fn test_comprehensive_binary_file_edge_cases() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    let output_dir = temp_dir.path().join("output");
    fs::create_dir_all(&output_dir).unwrap();

    // Create various binary and problematic files
    write_file(&project_dir.join("src/normal.rs"), "fn main() {}\n");

    // Pure binary file (executable-like)
    let binary_data = vec![
        0x7f, 0x45, 0x4c, 0x46, // ELF header
        0x02, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    ];
    write_binary_file(&project_dir.join("src/binary.rs"), &binary_data);

    // File with UTF-16 BOM
    let utf16_data = [
        0xFF, 0xFE, // UTF-16 LE BOM
        0x48, 0x00, 0x65, 0x00, 0x6C, 0x00, 0x6C, 0x00, 0x6F, 0x00, // "Hello"
        0x0A, 0x00, // newline
    ];
    write_binary_file(&project_dir.join("src/utf16.rs"), &utf16_data);

    // File with Windows-1252 encoding
    let windows1252_data = [
        0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, // "Hello "
        0x93, 0x57, 0x6F, 0x72, 0x6C, 0x64, 0x94, // "World" with smart quotes
        0x0A, // newline
    ];
    write_binary_file(&project_dir.join("src/win1252.rs"), &windows1252_data);

    // Empty file
    write_file(&project_dir.join("src/empty.rs"), "");

    // File with only null bytes
    write_binary_file(&project_dir.join("src/nulls.rs"), &[0x00; 100]);

    // Very large file (test memory efficiency)
    let large_content = "// Large file\n".repeat(10000);
    write_file(&project_dir.join("src/large.rs"), &large_content);

    // Test with different encoding strategies
    let strategies = ["detect", "strict", "skip"];

    for strategy in &strategies {
        let config = Config {
            filter: Some(vec!["rs".to_string()]),
            encoding_strategy: Some(strategy.to_string()),
            ..Default::default()
        };

        let args = Args {
            input: project_dir.to_string_lossy().to_string(),
            output: output_dir
                .join(format!("test_{}.md", strategy))
                .to_string_lossy()
                .to_string(),
            filter: vec!["rs".to_string()],
            ignore: vec![],
            preview: false,
            token_count: false,
            line_numbers: false,
            yes: true,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };

        let prompter = TestPrompter::new(true, true);
        let result = run_with_args(args, config, &prompter);

        assert!(
            result.is_ok(),
            "Should handle binary files gracefully with strategy: {}",
            strategy
        );

        // Verify output file was created
        let output_path = output_dir.join(format!("test_{}.md", strategy));
        assert!(
            output_path.exists(),
            "Output file should exist for strategy: {}",
            strategy
        );

        let content = fs::read_to_string(&output_path).unwrap();

        // Should contain normal file
        assert!(
            content.contains("fn main()"),
            "Should contain normal file content"
        );

        // Should handle binary files appropriately based on strategy
        match *strategy {
            "detect" => {
                // May contain transcoded content or binary placeholders
                assert!(
                    content.contains("Hello") || content.contains("<Binary file"),
                    "Detect strategy should transcode or show binary placeholder"
                );
            }
            "strict" | "skip" => {
                // Should show binary placeholders for non-UTF-8 files
                assert!(
                    content.contains("<Binary file") || content.contains("binary.rs"),
                    "Strict/skip strategy should show binary placeholders"
                );
            }
            _ => {}
        }

        // Should handle empty files
        assert!(content.contains("empty.rs"), "Should list empty files");

        // Should handle large files
        assert!(content.contains("large.rs"), "Should handle large files");
    }

    // No need to restore directory since we never changed it
}

#[test]
#[serial]
fn test_configuration_precedence_edge_cases() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    let output_dir = temp_dir.path().join("output");
    fs::create_dir_all(&output_dir).unwrap();

    // Create test files
    write_file(&project_dir.join("test.rs"), "fn test() {}\n");
    write_file(&project_dir.join("README.md"), "# Test Project\n");

    // Test 1: Basic functionality with explicit CLI args
    let args = Args {
        input: project_dir.to_string_lossy().to_string(),
        output: output_dir
            .join("basic_test.md")
            .to_string_lossy()
            .to_string(),
        filter: vec!["rs".to_string()],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter::new(true, true);
    let result = run_with_args(args, Config::default(), &prompter);
    assert!(result.is_ok(), "Basic configuration test should succeed");

    let output_path = output_dir.join("basic_test.md");
    assert!(output_path.exists(), "Output should exist for basic test");

    let content = fs::read_to_string(&output_path).unwrap();
    assert!(
        content.contains("test.rs"),
        "Should include filtered .rs files"
    );
    assert!(
        !content.contains("README.md"),
        "Should exclude non-filtered files"
    );

    // Test 2: Empty filter should include all files
    let args = Args {
        input: project_dir.to_string_lossy().to_string(),
        output: output_dir
            .join("all_files_test.md")
            .to_string_lossy()
            .to_string(),
        filter: vec![],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let result = run_with_args(args, Config::default(), &prompter);
    assert!(result.is_ok(), "All files test should succeed");

    let output_path = output_dir.join("all_files_test.md");
    let content = fs::read_to_string(&output_path).unwrap();
    assert!(
        content.contains("test.rs"),
        "Should include all files when no filter"
    );
    assert!(
        content.contains("README.md"),
        "Should include all files when no filter"
    );
}

#[test]
#[serial]
fn test_cache_consistency_edge_cases() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    let output_dir = temp_dir.path().join("output");
    fs::create_dir_all(&output_dir).unwrap();

    write_file(&project_dir.join("test.rs"), "fn original() {}\n");

    // Change to project directory
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    // Create config with auto_diff enabled
    write_file(
        &project_dir.join("context-builder.toml"),
        r#"
auto_diff = true
timestamped_output = true
"#,
    );

    let base_args = Args {
        input: project_dir.to_string_lossy().to_string(),
        output: output_dir
            .join("cache_test.md")
            .to_string_lossy()
            .to_string(),
        filter: vec!["rs".to_string()],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let config = context_builder::config::load_config_from_path(&project_dir).unwrap_or_default();
    let prompter = TestPrompter::new(true, true);

    // First run - establish cache
    let result1 = run_with_args(base_args.clone(), config.clone(), &prompter);
    assert!(result1.is_ok(), "First run should succeed");

    // Verify cache was created
    let cache_dir = project_dir.join(".context-builder").join("cache");
    assert!(cache_dir.exists(), "Cache directory should be created");

    // Test cache with different path representations
    let current_dir_string = std::env::current_dir()
        .unwrap()
        .to_string_lossy()
        .to_string();
    let path_variants = [".", "./", &current_dir_string];

    for (i, path_variant) in path_variants.iter().enumerate() {
        let mut variant_args = base_args.clone();
        variant_args.input = path_variant.to_string();
        variant_args.output = output_dir
            .join(format!("variant_{}.md", i))
            .to_string_lossy()
            .to_string();

        let result = run_with_args(variant_args, config.clone(), &prompter);
        assert!(
            result.is_ok(),
            "Path variant '{}' should succeed",
            path_variant
        );

        let output_path = output_dir.join(format!("variant_{}.md", i));
        let content = fs::read_to_string(&output_path).unwrap();

        // Should show "no changes detected" because cache should be consistent
        // (or at least not crash due to path inconsistencies)
        assert!(
            content.contains("original") || content.contains("no changes"),
            "Path variant should handle cache consistently"
        );
    }

    // Test cache corruption recovery
    let cache_files: Vec<_> = fs::read_dir(&cache_dir)
        .unwrap()
        .filter_map(|entry| entry.ok())
        .filter(|entry| {
            entry
                .path()
                .extension()
                .and_then(|s| s.to_str())
                .map(|s| s == "json")
                .unwrap_or(false)
        })
        .collect();

    if !cache_files.is_empty() {
        // Corrupt the cache
        fs::write(cache_files[0].path(), "{ invalid json }").unwrap();

        // Should recover gracefully
        let result = run_with_args(base_args.clone(), config.clone(), &prompter);
        assert!(result.is_ok(), "Should recover from corrupted cache");
    }

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();
}

#[test]
#[serial]
fn test_error_conditions_and_exit_codes() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    let output_dir = temp_dir.path().join("output");
    fs::create_dir_all(&project_dir).unwrap();
    fs::create_dir_all(&output_dir).unwrap();

    let prompter = TestPrompter::new(false, true); // Deny overwrite

    // Test 1: Non-existent input directory
    let args = Args {
        input: temp_dir
            .path()
            .join("nonexistent")
            .to_string_lossy()
            .to_string(),
        output: output_dir.join("test.md").to_string_lossy().to_string(),
        filter: vec![],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let result = run_with_args(args, Config::default(), &prompter);
    assert!(
        result.is_err(),
        "Should fail with non-existent input directory"
    );

    // Test 2: Permission denied on output
    write_file(&project_dir.join("test.rs"), "fn test() {}\n");
    let output_file = output_dir.join("existing.md");
    write_file(&output_file, "existing content");

    let args = Args {
        input: project_dir.to_string_lossy().to_string(),
        output: output_file.to_string_lossy().to_string(),
        filter: vec!["rs".to_string()],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: false, // Don't auto-confirm
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter_deny = TestPrompter::new(false, true); // Deny overwrite
    let result = run_with_args(args, Config::default(), &prompter_deny);
    assert!(result.is_err(), "Should fail when overwrite is denied");

    // Test 3: User cancellation during processing
    let args = Args {
        input: project_dir.to_string_lossy().to_string(),
        output: output_dir
            .join("cancelled.md")
            .to_string_lossy()
            .to_string(),
        filter: vec!["rs".to_string()],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: false,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter_cancel = TestPrompter::new(true, false); // Allow overwrite, deny processing
    let result = run_with_args(args, Config::default(), &prompter_cancel);
    assert!(result.is_err(), "Should fail when processing is cancelled");
}

#[test]
#[cfg(feature = "parallel")]
fn test_memory_usage_under_parallel_processing() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    fs::create_dir_all(&project_dir).unwrap();

    // Create many files to test memory efficiency
    for i in 0..500 {
        let subdir = project_dir.join(format!("module_{}", i / 50));
        fs::create_dir_all(&subdir).unwrap();

        let content = format!(
            "// File {}\nuse std::collections::HashMap;\n\npub fn function_{}() -> i32 {{\n    {}\n}}\n",
            i, i, i
        );
        write_file(&subdir.join(format!("file_{}.rs", i)), &content);
    }

    let output_dir = temp_dir.path().join("output");
    fs::create_dir_all(&output_dir).unwrap();

    let args = Args {
        input: project_dir.to_string_lossy().to_string(),
        output: output_dir
            .join("parallel_test.md")
            .to_string_lossy()
            .to_string(),
        filter: vec!["rs".to_string()],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter::new(true, true);
    let result = run_with_args(args, Config::default(), &prompter);

    assert!(
        result.is_ok(),
        "Parallel processing should handle many files efficiently"
    );

    let output_path = output_dir.join("parallel_test.md");
    assert!(output_path.exists(), "Output should be created");

    let content = fs::read_to_string(&output_path).unwrap();

    // Verify all files are included and properly ordered
    assert!(
        content.contains("function_0"),
        "Should contain first function"
    );
    assert!(
        content.contains("function_499"),
        "Should contain last function"
    );

    // Verify substantial content was generated
    assert!(
        content.len() > 100_000,
        "Should generate substantial output"
    );

    // Check that files appear in a reasonable order (not completely scrambled)
    let first_pos = content.find("function_0").unwrap();
    let last_pos = content.find("function_499").unwrap();
    assert!(
        first_pos < last_pos,
        "Files should maintain reasonable ordering"
    );
}

#[test]
#[serial]
fn test_cwd_independent_operation() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    let output_dir = temp_dir.path().join("output");
    let different_cwd = temp_dir.path().join("different_cwd");

    fs::create_dir_all(&project_dir).unwrap();
    fs::create_dir_all(&output_dir).unwrap();
    fs::create_dir_all(&different_cwd).unwrap();

    // Create test files
    write_file(&project_dir.join("test.rs"), "fn test() {}\n");
    write_file(
        &project_dir.join("context-builder.toml"),
        r#"
filter = ["rs"]
line_numbers = true
"#,
    );

    // Store original directory
    let original_dir = std::env::current_dir().unwrap();

    // Test from different working directories
    let test_cwds = [temp_dir.path(), &different_cwd, &original_dir];

    for (i, test_cwd) in test_cwds.iter().enumerate() {
        std::env::set_current_dir(test_cwd).unwrap();

        let args = Args {
            input: project_dir.to_string_lossy().to_string(),
            output: output_dir
                .join(format!("cwd_test_{}.md", i))
                .to_string_lossy()
                .to_string(),
            filter: vec![], // Use config defaults
            ignore: vec![],
            preview: false,
            token_count: false,
            line_numbers: false, // Use config default
            yes: true,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };

        let config =
            context_builder::config::load_config_from_path(&project_dir).unwrap_or_default();
        let prompter = TestPrompter::new(true, true);

        let result = run_with_args(args, config, &prompter);
        assert!(result.is_ok(), "Should work regardless of CWD (test {})", i);

        let output_path = output_dir.join(format!("cwd_test_{}.md", i));
        assert!(
            output_path.exists(),
            "Output should exist for CWD test {}",
            i
        );

        let content = fs::read_to_string(&output_path).unwrap();

        // Should find the config file and apply its settings
        assert!(
            content.contains("test.rs"),
            "Should process rust files from config"
        );

        // All outputs should be identical regardless of CWD
        if i > 0 {
            let previous_content =
                fs::read_to_string(output_dir.join(format!("cwd_test_{}.md", i - 1))).unwrap();

            // Remove timestamps for comparison
            let normalize = |s: &str| -> String {
                s.lines()
                    .filter(|line| !line.contains("Processed at:"))
                    .collect::<Vec<_>>()
                    .join("\n")
            };

            assert_eq!(
                normalize(&content),
                normalize(&previous_content),
                "Output should be identical regardless of CWD"
            );
        }
    }

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();
}

#[test]
fn test_edge_case_filenames_and_paths() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    let output_dir = temp_dir.path().join("output");
    fs::create_dir_all(&output_dir).unwrap();

    // Create files with problematic names
    let problematic_names = vec![
        "normal.rs",
        "with spaces.rs",
        "with-dashes.rs",
        "with_underscores.rs",
        "with.dots.rs",
        "uppercase.rs", // Changed from UPPERCASE.RS to avoid case issues
        "file.with.many.dots.rs",
        "123numeric.rs",
        // Note: Avoid truly problematic characters that might fail on Windows
    ];

    for name in &problematic_names {
        write_file(
            &project_dir.join("src").join(name),
            &format!("// File: {}\nfn test() {{}}\n", name),
        );
    }

    // Create nested directory structure
    write_file(
        &project_dir.join("deeply/nested/very/deep/path.rs"),
        "fn deep() {}\n",
    );

    let args = Args {
        input: project_dir.to_string_lossy().to_string(),
        output: output_dir
            .join("edge_case_paths.md")
            .to_string_lossy()
            .to_string(),
        filter: vec!["rs".to_string()],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter::new(true, true);
    let result = run_with_args(args, Config::default(), &prompter);

    assert!(
        result.is_ok(),
        "Should handle edge case filenames without panicking"
    );

    let output_path = output_dir.join("edge_case_paths.md");
    assert!(output_path.exists(), "Output should be created");

    let content = fs::read_to_string(&output_path).unwrap();

    // Verify all problematic files are included
    for name in &problematic_names {
        assert!(
            content.contains(name),
            "Should include file with problematic name: {}",
            name
        );
    }

    // Verify deeply nested path is handled
    assert!(
        content.contains("deeply/nested") || content.contains("deeply\\nested"),
        "Should handle deeply nested paths"
    );
}
```

### File: `tests/test_config_resolution.rs`

- Size: 14174 bytes
- Modified: 2026-02-14 19:57:42 UTC

```rust
//! Integration tests for configuration resolution functionality
//!
//! These tests verify that the new config resolver properly merges CLI arguments
//! with configuration file values according to the correct precedence rules.

use serial_test::serial;
use std::fs;
use std::path::Path;
use tempfile::tempdir;

use context_builder::{Prompter, cli::Args, config_resolver::resolve_final_config, run_with_args};

struct TestPrompter {
    overwrite_response: bool,
    processing_response: bool,
}

impl TestPrompter {
    fn new(overwrite_response: bool, processing_response: bool) -> Self {
        Self {
            overwrite_response,
            processing_response,
        }
    }
}

impl Prompter for TestPrompter {
    fn confirm_processing(&self, _file_count: usize) -> std::io::Result<bool> {
        Ok(self.processing_response)
    }

    fn confirm_overwrite(&self, _file_path: &str) -> std::io::Result<bool> {
        Ok(self.overwrite_response)
    }
}

fn write_file(path: &Path, contents: &str) {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent).unwrap();
    }
    fs::write(path, contents).unwrap();
}

/// Helper function that mimics the run() function's config resolution logic
fn run_with_resolved_config(
    args: Args,
    config: Option<context_builder::config::Config>,
    prompter: &impl Prompter,
) -> std::io::Result<()> {
    // Resolve final configuration using the new config resolver
    let resolution = resolve_final_config(args, config.clone());

    // Convert resolved config back to Args for run_with_args
    let final_args = Args {
        input: resolution.config.input,
        output: resolution.config.output,
        filter: resolution.config.filter,
        ignore: resolution.config.ignore,
        line_numbers: resolution.config.line_numbers,
        preview: resolution.config.preview,
        token_count: resolution.config.token_count,
        yes: resolution.config.yes,
        diff_only: resolution.config.diff_only,
        clear_cache: resolution.config.clear_cache,
        init: resolution.config.init,
        max_tokens: resolution.config.max_tokens,
    };

    // Create final Config with resolved values
    let final_config = context_builder::config::Config {
        auto_diff: Some(resolution.config.auto_diff),
        diff_context_lines: Some(resolution.config.diff_context_lines),
        ..config.unwrap_or_default()
    };

    run_with_args(final_args, final_config, prompter)
}

#[test]
#[serial]
fn test_cli_arguments_override_config_file() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    let output_dir = temp_dir.path().join("output");

    // Create a simple project
    write_file(
        &project_dir.join("src/main.rs"),
        "fn main() { println!(\"Hello\"); }",
    );
    write_file(&project_dir.join("lib.py"), "def hello(): print('world')");

    // Create config file with specific settings
    write_file(
        &project_dir.join("context-builder.toml"),
        r#"
filter = ["py"]
line_numbers = true
output = "from_config.md"
"#,
    );

    fs::create_dir_all(&output_dir).unwrap();

    // CLI args that should override config
    // Change to project directory (run_with_args creates output relative to CWD)
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    let args = Args {
        input: ".".to_string(), // Use current directory
        output: output_dir.join("from_cli.md").to_string_lossy().to_string(),
        filter: vec!["rs".to_string()], // Should override config's ["py"]
        ignore: vec![],
        line_numbers: true, // Can't override config boolean settings
        preview: false,
        token_count: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let config = context_builder::config::load_config_from_path(&project_dir).unwrap();
    let prompter = TestPrompter::new(true, true);

    let result = run_with_resolved_config(args, Some(config), &prompter);

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();
    assert!(result.is_ok(), "Should succeed with CLI override");

    // Verify output file was created with CLI name, not config name
    let output_file = output_dir.join("from_cli.md");
    assert!(output_file.exists(), "Output file should use CLI filename");

    let content = fs::read_to_string(&output_file).unwrap();

    // Should contain .rs file (CLI filter), not .py file (config filter)
    assert!(
        content.contains("main.rs"),
        "Should include .rs files from CLI filter"
    );
    assert!(
        !content.contains("lib.py"),
        "Should not include .py files despite config filter"
    );

    // Should have line numbers (config applies since we can't distinguish CLI false from default)
    assert!(
        content.contains("   1 |"),
        "Should have line numbers from config"
    );
}

#[test]
#[serial]
fn test_config_applies_when_cli_uses_defaults() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    let output_dir = temp_dir.path().join("output");

    // Create a simple project
    write_file(
        &project_dir.join("src/main.rs"),
        "fn main() { println!(\"Hello\"); }",
    );
    write_file(&project_dir.join("lib.py"), "def hello(): print('world')");

    // Create config file
    write_file(
        &project_dir.join("context-builder.toml"),
        r#"
filter = ["py", "rs"]
line_numbers = true
ignore = ["target"]
"#,
    );

    fs::create_dir_all(&output_dir).unwrap();

    // Change to project directory
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    // CLI args using defaults (should be overridden by config)
    let args = Args {
        input: ".".to_string(),          // Use current directory
        output: "output.md".to_string(), // Default - should use config if available
        filter: vec![],                  // Default - should use config
        ignore: vec![],                  // Default - should use config
        line_numbers: false,             // Default - should use config
        preview: false,
        token_count: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let config = context_builder::config::load_config_from_path(&project_dir).unwrap();
    let prompter = TestPrompter::new(true, true);

    let result = run_with_resolved_config(args, Some(config), &prompter);

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();
    assert!(result.is_ok(), "Should succeed with config application");

    // Find the output file (should be in current working directory, which is project dir)
    let output_file = project_dir.join("output.md");
    // The tool runs with project_dir as input, so output.md should be created there
    assert!(
        output_file.exists(),
        "Output file should be created in project directory"
    );

    let content = fs::read_to_string(&output_file).unwrap();

    // Should contain both file types from config filter
    assert!(
        content.contains("main.rs"),
        "Should include .rs files from config filter"
    );
    assert!(
        content.contains("lib.py"),
        "Should include .py files from config filter"
    );

    // Should have line numbers from config
    assert!(
        content.contains("   1 |"),
        "Should have line numbers from config"
    );
}

#[test]
#[serial]
fn test_timestamped_output_and_output_folder() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    let _output_dir = temp_dir.path().join("docs");

    // Create a simple project
    write_file(
        &project_dir.join("src/main.rs"),
        "fn main() { println!(\"Hello\"); }",
    );

    // Create config with timestamping and output folder (relative to project)
    write_file(
        &project_dir.join("context-builder.toml"),
        r#"
output = "context.md"
output_folder = "docs"
timestamped_output = true
"#,
    );

    // Create docs directory inside project directory
    let docs_dir = project_dir.join("docs");
    fs::create_dir_all(&docs_dir).unwrap();

    // Change to project directory
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    let args = Args {
        input: ".".to_string(),          // Use current directory
        output: "output.md".to_string(), // Should be overridden by config
        filter: vec![],
        ignore: vec![],
        line_numbers: false,
        preview: false,
        token_count: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let config = context_builder::config::load_config_from_path(&project_dir).unwrap();
    let prompter = TestPrompter::new(true, true);

    let result = run_with_resolved_config(args, Some(config), &prompter);

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();
    assert!(result.is_ok(), "Should succeed with timestamped output");

    // Find timestamped file in docs directory
    let docs_dir = project_dir.join("docs");
    let entries = fs::read_dir(&docs_dir).unwrap();
    let output_files: Vec<_> = entries
        .filter_map(|entry| entry.ok())
        .filter(|entry| {
            let name = entry.file_name();
            let name_str = name.to_string_lossy();
            name_str.starts_with("context_") && name_str.ends_with(".md")
        })
        .collect();

    assert!(
        !output_files.is_empty(),
        "Should have timestamped output file"
    );
    assert!(
        output_files.len() == 1,
        "Should have exactly one output file"
    );

    let output_file = &output_files[0];
    let content = fs::read_to_string(output_file.path()).unwrap();
    assert!(content.contains("main.rs"), "Should contain project files");
}

#[test]
#[serial]
fn test_mixed_explicit_and_default_values() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");

    // Create a simple project
    write_file(
        &project_dir.join("src/main.rs"),
        "fn main() { println!(\"Hello\"); }",
    );
    write_file(&project_dir.join("test.py"), "print('test')");

    // Config with multiple settings
    write_file(
        &project_dir.join("context-builder.toml"),
        r#"
filter = ["py"]
line_numbers = true
yes = true
"#,
    );

    // Change to project directory
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    let args = Args {
        input: ".".to_string(),          // Use current directory
        output: "custom.md".to_string(), // Explicit CLI value
        filter: vec![],                  // Default - should use config
        ignore: vec![],
        line_numbers: false, // Default - config will override this
        preview: false,      // Default - should use config
        token_count: false,  // Don't use token count mode so file gets created
        yes: false,          // Default - should use config
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let config = context_builder::config::load_config_from_path(&project_dir).unwrap();
    let prompter = TestPrompter::new(true, true);

    let result = run_with_resolved_config(args, Some(config), &prompter);

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();
    assert!(result.is_ok(), "Should succeed with mixed values");

    // Verify output file uses CLI name (created in project directory)
    let output_file = project_dir.join("custom.md");
    assert!(
        output_file.exists(),
        "Should use CLI output filename in project directory"
    );

    let content = fs::read_to_string(&output_file).unwrap();

    // Should use config filter (py files)
    assert!(
        content.contains("test.py"),
        "Should include .py files from config"
    );
    assert!(!content.contains("main.rs"), "Should not include .rs files");

    // Should use config line_numbers setting
    assert!(
        content.contains("   1 |"),
        "Should have line numbers from config"
    );
}

#[test]
#[serial]
fn test_auto_diff_configuration_warning() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");

    // Create a simple project
    write_file(
        &project_dir.join("src/main.rs"),
        "fn main() { println!(\"Hello\"); }",
    );

    // Config with auto_diff but no timestamped_output (should generate warning)
    write_file(
        &project_dir.join("context-builder.toml"),
        r#"
auto_diff = true
timestamped_output = false
"#,
    );

    // Change to project directory
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    let args = Args {
        input: ".".to_string(), // Use current directory
        output: "output.md".to_string(),
        filter: vec![],
        ignore: vec![],
        line_numbers: false,
        preview: false,
        token_count: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let config = context_builder::config::load_config_from_path(&project_dir).unwrap();
    let prompter = TestPrompter::new(true, true);

    // Capture stderr to check for warnings
    let result = run_with_resolved_config(args, Some(config), &prompter);

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();
    assert!(result.is_ok(), "Should succeed despite warning");

    // Note: In a real application, we would capture stderr to verify the warning
    // For this test, we're just ensuring the config is handled without crashing
}
```

### File: `tests/test_cwd_independence.rs`

- Size: 13477 bytes
- Modified: 2026-02-14 19:55:07 UTC

```rust
//! Integration tests for CWD independence
//!
//! This test verifies that the application loads config and creates cache
//! relative to the project root, not the current working directory.

use serial_test::serial;
use std::fs;
use std::path::Path;
use tempfile::tempdir;

use context_builder::{Prompter, cli::Args, run_with_args};

struct TestPrompter {
    overwrite_response: bool,
    processing_response: bool,
}

impl TestPrompter {
    fn new(overwrite_response: bool, processing_response: bool) -> Self {
        Self {
            overwrite_response,
            processing_response,
        }
    }
}

impl Prompter for TestPrompter {
    fn confirm_processing(&self, _file_count: usize) -> std::io::Result<bool> {
        Ok(self.processing_response)
    }

    fn confirm_overwrite(&self, _file_path: &str) -> std::io::Result<bool> {
        Ok(self.overwrite_response)
    }
}

fn write_file(path: &Path, contents: &str) {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent).unwrap();
    }
    fs::write(path, contents).unwrap();
}

#[test]
#[serial]
fn test_config_loaded_from_project_root_not_cwd() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    let output_dir = temp_dir.path().join("output");
    let working_dir = temp_dir.path().join("working");

    // Create project with config file
    write_file(
        &project_dir.join("src/main.rs"),
        "fn main() { println!(\"Hello\"); }",
    );
    write_file(
        &project_dir.join("context-builder.toml"),
        r#"
auto_diff = true
line_numbers = true
filter = ["rs"]
"#,
    );

    // Create different config in working directory (should be ignored)
    write_file(
        &working_dir.join("context-builder.toml"),
        r#"
auto_diff = false
line_numbers = false
filter = ["txt"]
"#,
    );

    fs::create_dir_all(&output_dir).unwrap();
    fs::create_dir_all(&working_dir).unwrap();

    // Change to working directory
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&working_dir).unwrap();

    // Load config from project directory (not CWD)
    let config = context_builder::config::load_config_from_path(&project_dir).unwrap_or_default();

    let mut args = Args {
        input: project_dir.to_string_lossy().to_string(), // Absolute path to project
        output: output_dir.join("output.md").to_string_lossy().to_string(),
        filter: vec![], // Should be overridden by project config
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false, // Should be overridden by project config
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    // Apply config settings to args (mimicking the run() function logic)
    if args.filter.is_empty()
        && let Some(filter) = config.filter.clone()
    {
        args.filter = filter;
    }
    if !args.line_numbers
        && let Some(line_numbers) = config.line_numbers
    {
        args.line_numbers = line_numbers;
    }

    let prompter = TestPrompter::new(true, true);
    let result = run_with_args(args, config, &prompter);

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();

    assert!(result.is_ok(), "Should succeed with CWD independence");

    let output_content = fs::read_to_string(output_dir.join("output.md")).unwrap();

    // Verify that project config was used, not working directory config
    assert!(
        output_content.contains("   1 |"),
        "Should have line numbers from project config"
    );
    assert!(
        output_content.contains("main.rs"),
        "Should include .rs files from project config filter"
    );
}

#[test]
#[serial]
fn test_cache_created_in_project_root_not_cwd() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    let output_dir = temp_dir.path().join("output");
    let working_dir = temp_dir.path().join("working");

    // Create project with auto-diff enabled
    write_file(
        &project_dir.join("src/main.rs"),
        "fn main() { println!(\"Hello\"); }",
    );
    write_file(
        &project_dir.join("context-builder.toml"),
        r#"
auto_diff = true
timestamped_output = true
"#,
    );

    fs::create_dir_all(&output_dir).unwrap();
    fs::create_dir_all(&working_dir).unwrap();

    // Get absolute paths before changing directory
    let project_dir_abs = project_dir.canonicalize().unwrap();
    let output_dir_abs = output_dir.canonicalize().unwrap();
    let working_dir_abs = working_dir.canonicalize().unwrap();

    // Change to working directory
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&working_dir_abs).unwrap();

    // Load config from project directory
    let config =
        context_builder::config::load_config_from_path(&project_dir_abs).unwrap_or_default();

    let mut args = Args {
        input: project_dir_abs.to_string_lossy().to_string(), // Absolute path to project
        output: output_dir_abs
            .join("context.md")
            .to_string_lossy()
            .to_string(),
        filter: vec![],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    // Apply timestamping manually since we're bypassing run()
    if config.timestamped_output.unwrap_or(false) {
        use chrono::Utc;
        let timestamp = Utc::now().format("%Y%m%d%H%M%S").to_string();
        let path = std::path::Path::new(&args.output);
        let stem = path
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("output");
        let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("md");
        let new_filename = format!("{}_{}.{}", stem, timestamp, extension);
        if let Some(parent) = path.parent() {
            args.output = parent.join(new_filename).to_string_lossy().to_string();
        } else {
            args.output = output_dir_abs
                .join(new_filename)
                .to_string_lossy()
                .to_string();
        }
    }

    let prompter = TestPrompter::new(true, true);

    // First run to create cache
    let result1 = run_with_args(args.clone(), config.clone(), &prompter);
    assert!(result1.is_ok(), "First run should succeed");

    // Verify cache was created in project directory, not working directory
    let project_cache = project_dir_abs.join(".context-builder").join("cache");
    let working_cache = working_dir_abs.join(".context-builder").join("cache");

    assert!(
        project_cache.exists(),
        "Cache should be created in project directory"
    );
    assert!(
        !working_cache.exists(),
        "Cache should NOT be created in working directory"
    );

    // Small delay to ensure different timestamps
    std::thread::sleep(std::time::Duration::from_millis(1100));

    // Modify project file
    // Modify a file to trigger diff
    write_file(
        &project_dir_abs.join("src/main.rs"),
        "fn main() { println!(\"Hello, modified!\"); }",
    );

    // Create second args with new timestamp
    let mut args2 = args.clone();
    if config.timestamped_output.unwrap_or(false) {
        use chrono::Utc;
        let timestamp = Utc::now().format("%Y%m%d%H%M%S").to_string();
        let path = std::path::Path::new(&args2.output);
        let stem = path
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("output");
        let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("md");
        let new_filename = format!("{}_{}.{}", stem, timestamp, extension);
        if let Some(parent) = path.parent() {
            args2.output = parent.join(new_filename).to_string_lossy().to_string();
        } else {
            args2.output = output_dir_abs
                .join(new_filename)
                .to_string_lossy()
                .to_string();
        }
    }

    // Second run should detect changes using cache from project directory
    let result2 = run_with_args(args2, config, &prompter);
    assert!(result2.is_ok(), "Second run should succeed");

    // Find output files (should have timestamps) - use absolute path
    // Add retry logic to handle potential race conditions
    let output_files = (0..5)
        .find_map(|_| {
            std::thread::sleep(std::time::Duration::from_millis(50));
            if let Ok(entries) = fs::read_dir(&output_dir_abs) {
                let files: Vec<_> = entries
                    .filter_map(|entry| entry.ok())
                    .filter(|entry| {
                        let name = entry.file_name();
                        let name_str = name.to_string_lossy();
                        name_str.starts_with("context") && name_str.ends_with(".md")
                    })
                    .collect();
                if files.len() >= 2 { Some(files) } else { None }
            } else {
                None
            }
        })
        .expect("Failed to find output files after retries");

    // Restore original directory after file operations
    std::env::set_current_dir(original_dir).unwrap();

    assert!(
        output_files.len() >= 2,
        "Should have multiple timestamped outputs, found: {}",
        output_files.len()
    );

    // Check that second output contains diff information
    let latest_output = output_files
        .iter()
        .max_by_key(|entry| {
            // All paths are already absolute since we used output_dir_abs
            fs::metadata(entry.path()).unwrap().modified().unwrap()
        })
        .unwrap();

    // Read the latest file content
    let latest_content = fs::read_to_string(latest_output.path()).unwrap();
    assert!(
        latest_content.contains("## Change Summary") || latest_content.contains("Modified"),
        "Should contain change information from auto-diff"
    );
}

#[test]
#[serial]
fn test_clear_cache_uses_project_root() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    let working_dir = temp_dir.path().join("working");

    // Create project and working directories
    write_file(&project_dir.join("src/main.rs"), "fn main() {}");
    fs::create_dir_all(&working_dir).unwrap();

    // Create cache in project directory
    let project_cache_dir = project_dir.join(".context-builder").join("cache");
    fs::create_dir_all(&project_cache_dir).unwrap();
    fs::write(project_cache_dir.join("test_cache.json"), "{}").unwrap();

    // Create cache in working directory (should not be affected)
    let working_cache_dir = working_dir.join(".context-builder").join("cache");
    fs::create_dir_all(&working_cache_dir).unwrap();
    fs::write(working_cache_dir.join("test_cache.json"), "{}").unwrap();

    // Change to working directory
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&working_dir).unwrap();

    // Simulate the cache clearing logic from run() function
    // This tests that cache clearing uses project root, not CWD
    let cache_path = project_dir.join(".context-builder").join("cache");
    assert!(
        cache_path.exists(),
        "Project cache should exist before clearing"
    );

    if cache_path.exists() {
        fs::remove_dir_all(&cache_path).unwrap();
    }

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();

    // Project cache should be cleared
    assert!(
        !project_cache_dir.exists(),
        "Project cache should be cleared"
    );

    // Working directory cache should be untouched
    assert!(
        working_cache_dir.exists() && fs::read_dir(&working_cache_dir).unwrap().count() > 0,
        "Working directory cache should remain untouched"
    );
}

#[test]
#[serial]
fn test_load_config_from_path_function() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    let working_dir = temp_dir.path().join("working");

    // Create project with config file
    write_file(
        &project_dir.join("context-builder.toml"),
        r#"
auto_diff = true
line_numbers = true
filter = ["rs"]
"#,
    );

    // Create different config in working directory
    write_file(
        &working_dir.join("context-builder.toml"),
        r#"
auto_diff = false
line_numbers = false
filter = ["txt"]
"#,
    );

    // Change to working directory
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&working_dir).unwrap();

    // Load config from project directory (not CWD)
    let config = context_builder::config::load_config_from_path(&project_dir);

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();

    assert!(
        config.is_some(),
        "Should load config from project directory"
    );
    let config = config.unwrap();

    assert_eq!(
        config.auto_diff,
        Some(true),
        "Should use project config auto_diff"
    );
    assert_eq!(
        config.line_numbers,
        Some(true),
        "Should use project config line_numbers"
    );
    assert_eq!(
        config.filter,
        Some(vec!["rs".to_string()]),
        "Should use project config filter"
    );
}
```

### File: `tests/test_determinism.rs`

- Size: 20050 bytes
- Modified: 2026-02-14 22:43:45 UTC

```rust
//! Integration tests for determinism and robustness of context-builder
//!
//! These tests verify that the critical bug fixes are working correctly:
//! - Deterministic output order
//! - Robust caching
//! - Thread safety

use pretty_assertions::assert_eq;
use serial_test::serial;
use std::fs;
use std::path::Path;
use tempfile::tempdir;

use chrono::Utc;
use context_builder::cli::Args;
use context_builder::config::{Config, load_config};
use context_builder::{Prompter, run_with_args};

/// Test prompter that always confirms
struct TestPrompter;

impl Prompter for TestPrompter {
    fn confirm_processing(&self, _file_count: usize) -> std::io::Result<bool> {
        Ok(true)
    }
    fn confirm_overwrite(&self, _file_path: &str) -> std::io::Result<bool> {
        Ok(true)
    }
}

/// Create a test project with multiple files in different directories
fn create_test_project(base_dir: &Path) -> std::io::Result<()> {
    let src_dir = base_dir.join("src");
    let tests_dir = base_dir.join("tests");
    let docs_dir = base_dir.join("docs");

    fs::create_dir_all(&src_dir)?;
    fs::create_dir_all(&tests_dir)?;
    fs::create_dir_all(&docs_dir)?;

    // Create files in different orders to test sorting
    fs::write(
        src_dir.join("main.rs"),
        "fn main() {\n    println!(\"Hello\");\n}",
    )?;
    fs::write(src_dir.join("lib.rs"), "pub mod utils;\npub mod config;")?;
    fs::write(src_dir.join("utils.rs"), "pub fn helper() {}")?;
    fs::write(
        tests_dir.join("integration.rs"),
        "#[test]\nfn test_something() {}",
    )?;
    fs::write(tests_dir.join("unit.rs"), "#[test]\nfn test_unit() {}")?;
    fs::write(
        docs_dir.join("README.md"),
        "# Project\n\nThis is a test project.",
    )?;
    fs::write(
        base_dir.join("Cargo.toml"),
        "[package]\nname = \"test\"\nversion = \"0.1.0\"",
    )?;

    Ok(())
}

#[test]
#[serial] // Ensure tests don't interfere with each other
fn test_deterministic_output_multiple_runs() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    create_test_project(&project_dir).unwrap();

    // Note: The actual output files may have timestamps appended due to auto-diff mode
    // We'll need to find the actual files created
    let prompter = TestPrompter;

    // Run twice with identical arguments
    let result1 = run_with_args(
        Args {
            input: project_dir.to_string_lossy().to_string(),
            output: temp_dir
                .path()
                .join("output1.md")
                .to_string_lossy()
                .to_string(),
            filter: vec!["rs".to_string(), "md".to_string(), "toml".to_string()],
            ignore: vec![],
            preview: false,
            token_count: false,
            line_numbers: false,
            yes: true,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        },
        Config::default(),
        &prompter,
    );

    let result2 = run_with_args(
        Args {
            input: project_dir.to_string_lossy().to_string(),
            output: temp_dir
                .path()
                .join("output2.md")
                .to_string_lossy()
                .to_string(),
            filter: vec!["rs".to_string(), "md".to_string(), "toml".to_string()],
            ignore: vec![],
            preview: false,
            token_count: false,
            line_numbers: false,
            yes: true,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        },
        Config::default(),
        &prompter,
    );

    if let Err(e) = result1 {
        panic!("First run failed: {}", e);
    }
    if let Err(e) = result2 {
        panic!("Second run failed: {}", e);
    }

    // Find the actual output files (they may have timestamps appended)
    let temp_entries: Vec<_> = fs::read_dir(temp_dir.path())
        .unwrap()
        .filter_map(|entry| entry.ok())
        .filter(|entry| {
            let file_name = entry.file_name();
            let name = file_name.to_string_lossy();
            name.starts_with("output") && name.ends_with(".md")
        })
        .collect();

    if temp_entries.len() < 2 {
        eprintln!("Expected 2 output files, found {}", temp_entries.len());
        eprintln!("Temp directory contents:");
        for entry in fs::read_dir(temp_dir.path()).unwrap() {
            eprintln!("  {:?}", entry.unwrap().file_name());
        }
        panic!("Not enough output files found");
    }

    // Sort to ensure consistent ordering
    let mut output_files: Vec<_> = temp_entries.iter().map(|entry| entry.path()).collect();
    output_files.sort();

    // Read both outputs
    let content1 = fs::read_to_string(&output_files[0]).unwrap();
    let content2 = fs::read_to_string(&output_files[1]).unwrap();

    // Debug: Write contents to temp files for inspection
    fs::write(temp_dir.path().join("debug_content1.md"), &content1).unwrap();
    fs::write(temp_dir.path().join("debug_content2.md"), &content2).unwrap();

    // Normalize timestamps for comparison since they will be different
    let normalize = |content: &str| -> String {
        content
            .lines()
            .map(|line| {
                if line.starts_with("Processed at: ") {
                    "Processed at: <timestamp>"
                } else {
                    line
                }
            })
            .collect::<Vec<_>>()
            .join("\n")
    };

    let normalized1 = normalize(&content1);
    let normalized2 = normalize(&content2);

    // Debug: Write normalized contents for comparison
    fs::write(temp_dir.path().join("debug_normalized1.md"), &normalized1).unwrap();
    fs::write(temp_dir.path().join("debug_normalized2.md"), &normalized2).unwrap();

    // They should be identical (deterministic) after normalizing timestamps
    if normalized1 != normalized2 {
        eprintln!(
            "Content1 length: {}, Content2 length: {}",
            normalized1.len(),
            normalized2.len()
        );
        eprintln!(
            "First difference at position: {:?}",
            normalized1
                .chars()
                .zip(normalized2.chars())
                .position(|(a, b)| a != b)
        );
        eprintln!("Debug files written to: {}", temp_dir.path().display());
        panic!("Output should be deterministic across multiple runs (ignoring timestamps)");
    }

    // Verify that files are listed in a consistent order
    let lines: Vec<&str> = content1.lines().collect();
    let file_lines: Vec<&str> = lines
        .iter()
        .filter(|line| line.starts_with("### File: `"))
        .copied()
        .collect();

    // Should have found some files
    assert!(
        !file_lines.is_empty(),
        "Should have found some file entries"
    );

    // Check that files are sorted by relevance category:
    // Category 0: Cargo.toml (config), README.md (key project doc)
    // Category 1: src/* (source code) — entry points first (lib.rs, main.rs before utils.rs)
    // Category 2: tests/* (tests)
    let expected_order = vec![
        "### File: `Cargo.toml`",
        "### File: `docs/README.md`",
        "### File: `src/lib.rs`",
        "### File: `src/main.rs`",
        "### File: `src/utils.rs`",
        "### File: `tests/integration.rs`",
        "### File: `tests/unit.rs`",
    ];
    assert_eq!(
        file_lines, expected_order,
        "Files should be listed in relevance order (config+docs → source (entry points first) → tests)"
    );
}
#[test]
#[serial] // Ensure tests don't interfere with each other
fn test_deterministic_file_tree_order() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    create_test_project(&project_dir).unwrap();

    let output_path = temp_dir.path().join("output.md");

    // Change to project directory so config loading works
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    let args = Args {
        input: ".".to_string(),
        output: output_path.to_string_lossy().to_string(),
        filter: vec![],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter;
    run_with_args(args, Config::default(), &prompter).unwrap();

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();

    let content = fs::read_to_string(&output_path).unwrap();

    // Find the file tree section
    let tree_start = content
        .find("## File Tree Structure")
        .expect("Should have file tree section");
    let files_start = content.find("### File: `").unwrap_or(content.len());
    let tree_section = &content[tree_start..files_start];

    // Check that directories and files appear in alphabetical order in the tree
    // This is a basic check - a more sophisticated test would parse the tree structure
    assert!(tree_section.contains("Cargo.toml"));
    // Check for directory entries - they may appear as just the name or with trailing content
    assert!(tree_section.contains("docs") || tree_section.contains("docs/"));
    assert!(tree_section.contains("src") || tree_section.contains("src/"));
    assert!(tree_section.contains("tests") || tree_section.contains("tests/"));
}

#[test]
#[serial] // Ensure cache tests don't interfere with each other
fn test_cache_collision_prevention() {
    let temp_dir1 = tempdir().unwrap();
    let temp_dir2 = tempdir().unwrap();

    let project1 = temp_dir1.path().join("project");
    let project2 = temp_dir2.path().join("project");

    create_test_project(&project1).unwrap();
    create_test_project(&project2).unwrap();

    // Add different content to make projects distinct
    fs::write(project1.join("unique1.txt"), "This is project 1").unwrap();
    fs::write(project2.join("unique2.txt"), "This is project 2").unwrap();

    let output1 = temp_dir1.path().join("output.md");
    let output2 = temp_dir2.path().join("output.md");

    let prompter = TestPrompter;

    // Change to project1 directory and run
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project1).unwrap();

    let args1 = Args {
        input: ".".to_string(),
        output: output1.to_string_lossy().to_string(),
        filter: vec![],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    run_with_args(args1, Config::default(), &prompter).unwrap();

    // Change to project2 directory and run
    std::env::set_current_dir(&project2).unwrap();

    let args2 = Args {
        input: ".".to_string(),
        output: output2.to_string_lossy().to_string(),
        filter: vec!["txt".to_string()],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,

        yes: true,

        diff_only: false,

        clear_cache: false,

        init: false,
        max_tokens: None,
    };

    run_with_args(args2, Config::default(), &prompter).unwrap();

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();

    let content1 = fs::read_to_string(&output1).unwrap();
    let content2 = fs::read_to_string(&output2).unwrap();

    // Outputs should be different due to different projects and configs
    assert_ne!(
        content1, content2,
        "Different projects should produce different outputs"
    );

    // Each should contain their unique content
    assert!(content1.contains("unique1.txt"));
    assert!(content2.contains("unique2.txt"));
}

#[test]
#[serial] // Ensure tests don't interfere with each other
fn test_custom_ignores_performance() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");

    // Create a project with ignored directories
    create_test_project(&project_dir).unwrap();

    let target_dir = project_dir.join("target");
    let node_modules_dir = project_dir.join("node_modules");

    fs::create_dir_all(&target_dir).unwrap();
    fs::create_dir_all(&node_modules_dir).unwrap();

    // Create many files in ignored directories
    for i in 0..10 {
        fs::write(target_dir.join(format!("file{}.txt", i)), "ignored content").unwrap();
        fs::write(
            node_modules_dir.join(format!("module{}.js", i)),
            "ignored js",
        )
        .unwrap();
    }

    let output_path = temp_dir.path().join("output.md");

    // Change to project directory so config loading works
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    let args = Args {
        input: ".".to_string(),
        output: output_path.to_string_lossy().to_string(),
        filter: vec![],
        ignore: vec!["target".to_string(), "node_modules".to_string()],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter;
    let start = std::time::Instant::now();

    run_with_args(args, Config::default(), &prompter).unwrap();

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();

    let duration = start.elapsed();

    let content = fs::read_to_string(&output_path).unwrap();

    // Verify ignored files are not included
    assert!(!content.contains("target/file"));
    assert!(!content.contains("node_modules/module"));

    // Performance should be reasonable (this is a basic check)
    assert!(
        duration.as_secs() < 5,
        "Should complete within reasonable time even with ignored directories"
    );
}

#[test]
#[serial] // Ensure cache tests don't interfere with each other
fn test_configuration_affects_cache_key() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    create_test_project(&project_dir).unwrap();

    // Test that different configurations create different cache behaviors
    // This is verified indirectly by ensuring different configs produce appropriate outputs

    let output1_path = temp_dir.path().join("output1.md");
    let output2_path = temp_dir.path().join("output2.md");

    // Change to project directory so config loading works
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    let args1 = Args {
        input: ".".to_string(),
        output: output1_path.to_string_lossy().to_string(),
        filter: vec!["rs".to_string()],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let args2 = Args {
        input: ".".to_string(),
        output: output2_path.to_string_lossy().to_string(),
        filter: vec!["md".to_string()],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter;

    run_with_args(args1, Config::default(), &prompter).unwrap();
    run_with_args(args2, Config::default(), &prompter).unwrap();

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();

    let content1 = fs::read_to_string(&output1_path).unwrap();
    let content2 = fs::read_to_string(&output2_path).unwrap();

    // Different filters should produce different outputs
    assert_ne!(content1, content2);

    // Verify filter effects
    assert!(content1.contains(".rs"));
    assert!(content2.contains("README.md"));
    // Note: Due to file tree section, both outputs may contain references to all files
    // but the actual file content sections should be filtered
}

#[test]
#[serial] // Ensure tests don't interfere with each other
fn test_edge_case_filenames_no_panic() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    fs::create_dir_all(&project_dir).unwrap();

    // Create files with edge case names that could cause panics
    fs::write(project_dir.join(".bashrc"), "# bash config").unwrap(); // no extension
    fs::write(project_dir.join("Dockerfile"), "FROM alpine").unwrap(); // no extension
    fs::write(project_dir.join(".gitignore"), "target/").unwrap(); // starts with dot, no extension

    // Change to project directory
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    // Create a config file that enables timestamped output
    fs::write(
        project_dir.join("context-builder.toml"),
        r#"
timestamped_output = true
auto_diff = true
"#,
    )
    .unwrap();

    // Test with output filename that has no extension (extreme edge case)
    let output_path = temp_dir.path().join("no_extension_output");

    let args = Args {
        input: ".".to_string(),
        output: output_path.to_string_lossy().to_string(),
        filter: vec![],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let prompter = TestPrompter;

    // This should not panic even with edge case filenames
    let config = load_config().unwrap_or_default();

    // Apply config merging manually since we're bypassing run()
    let mut final_args = args;

    // Apply line_numbers from config
    if !final_args.line_numbers
        && let Some(line_numbers) = config.line_numbers
    {
        final_args.line_numbers = line_numbers;
    }

    // Apply diff_only from config
    if !final_args.diff_only
        && let Some(diff_only) = config.diff_only
    {
        final_args.diff_only = diff_only;
    }

    // Apply timestamping manually since we're bypassing run()
    if config.timestamped_output.unwrap_or(false) {
        let timestamp = Utc::now().format("%Y%m%d%H%M%S").to_string();
        let path = std::path::Path::new(&final_args.output);
        let stem = path
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("output");
        let extension = path.extension().and_then(|s| s.to_str()).unwrap_or("md");
        let new_filename = format!("{}_{}.{}", stem, timestamp, extension);
        if let Some(parent) = path.parent() {
            final_args.output = parent.join(new_filename).to_string_lossy().to_string();
        } else {
            final_args.output = new_filename;
        }
    }

    let result = run_with_args(final_args, config, &prompter);
    std::env::set_current_dir(original_dir).unwrap();

    // Should succeed without panicking
    assert!(
        result.is_ok(),
        "Should handle edge case filenames without panicking"
    );

    // Verify a timestamped file was created
    let temp_entries: Vec<_> = fs::read_dir(temp_dir.path())
        .unwrap()
        .filter_map(|entry| entry.ok())
        .filter(|entry| {
            let name = entry.file_name();
            let name_str = name.to_string_lossy();
            let year = Utc::now().format("%Y").to_string();
            name_str.starts_with("no_extension_output_") && name_str.contains(&year)
        })
        .collect();

    assert!(
        !temp_entries.is_empty(),
        "Should create timestamped output file even with edge case input filename"
    );
}
```

### File: `tests/test_parallel_memory.rs`

- Size: 8743 bytes
- Modified: 2026-02-14 19:55:07 UTC

```rust
//! Integration test for streaming parallel processing with memory efficiency

use context_builder::cli::Args;
use context_builder::config::Config;
use context_builder::{Prompter, run_with_args};
use std::fs;

use tempfile::tempdir;

struct TestPrompter {
    overwrite_response: bool,
    processing_response: bool,
}

impl TestPrompter {
    fn new(overwrite_response: bool, processing_response: bool) -> Self {
        Self {
            overwrite_response,
            processing_response,
        }
    }
}

impl Prompter for TestPrompter {
    fn confirm_processing(&self, _file_count: usize) -> std::io::Result<bool> {
        Ok(self.processing_response)
    }

    fn confirm_overwrite(&self, _file_path: &str) -> std::io::Result<bool> {
        Ok(self.overwrite_response)
    }
}

#[cfg(feature = "parallel")]
#[test]
fn test_streaming_parallel_processing() {
    let dir = tempdir().unwrap();
    let base_path = dir.path();

    // Create a test project with multiple files
    for i in 0..100 {
        let subdir = base_path.join(format!("module_{}", i / 10));
        fs::create_dir_all(&subdir).unwrap();

        let file_path = subdir.join(format!("file_{}.rs", i));
        let content = format!(
            "// File {}\nuse std::collections::HashMap;\n\npub fn function_{}() -> HashMap<String, i32> {{\n    let mut map = HashMap::new();\n    map.insert(\"key_{}\".to_string(), {});\n    map\n}}\n",
            i, i, i, i
        );
        fs::write(&file_path, content).unwrap();
    }

    let output_path = base_path.join("output.md");

    // Create CLI args for processing
    let args = Args {
        input: base_path.to_string_lossy().to_string(),
        output: output_path.to_string_lossy().to_string(),
        filter: vec!["rs".to_string()],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let config = Config::default();
    let prompter = TestPrompter::new(true, true);

    // Process files using the proper flow through lib.rs
    let result = run_with_args(args, config, &prompter);

    assert!(result.is_ok(), "Parallel streaming should succeed");

    // Verify the output file was created and contains expected content
    assert!(output_path.exists(), "Output file should be created");

    let output_content = fs::read_to_string(&output_path).unwrap();

    // If it doesn't have individual file sections, this is expected behavior for auto-diff mode
    // when there's no previous state. Let's check for basic structure instead.
    assert!(
        output_content.contains("# Directory Structure Report"),
        "Output should contain header"
    );
    assert!(
        output_content.contains("## File Tree Structure"),
        "Output should contain file tree"
    );

    // Check if we have individual file content (non-auto-diff mode) or just structure (auto-diff mode)
    if output_content.contains("## Files") {
        // Full content mode - verify all files are included in correct order
        for i in 0..100 {
            let expected_file_header = format!("### File: `module_{}/file_{}.rs`", i / 10, i);
            assert!(
                output_content.contains(&expected_file_header),
                "Output should contain file header for file {}",
                i
            );

            let expected_function = format!("pub fn function_{}()", i);
            assert!(
                output_content.contains(&expected_function),
                "Output should contain function for file {}",
                i
            );
        }

        // Verify file ordering is maintained (first file should appear before last file)
        let first_file_pos = output_content
            .find("### File: `module_0/file_0.rs`")
            .expect("First file should be in output");
        let last_file_pos = output_content
            .find("### File: `module_9/file_99.rs`")
            .expect("Last file should be in output");

        assert!(
            first_file_pos < last_file_pos,
            "Files should maintain their original order"
        );
    } else {
        // Auto-diff mode or similar - just verify structure is correct
        // At minimum, verify we have reasonable file tree structure
        assert!(
            output_content.contains("module_0"),
            "Should contain module_0"
        );
        assert!(
            output_content.contains("module_9"),
            "Should contain module_9"
        );
        assert!(
            output_content.contains("file_0.rs"),
            "Should contain file_0.rs"
        );
        assert!(
            output_content.contains("file_99.rs"),
            "Should contain file_99.rs"
        );
    }
}

#[cfg(feature = "parallel")]
#[test]
fn test_parallel_error_handling() {
    let dir = tempdir().unwrap();
    let base_path = dir.path();

    // Create some regular files and one that will cause issues
    fs::write(base_path.join("good1.rs"), "fn good1() {}").unwrap();
    fs::write(base_path.join("good2.rs"), "fn good2() {}").unwrap();

    // Create a binary file that should be handled gracefully
    // Use more null bytes to ensure it's detected as binary
    let binary_data = vec![
        0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, // PNG header
        0x00, 0x00, 0x00, 0x0D, 0x49, 0x48, 0x44, 0x52, // PNG chunk
        0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, // More binary data
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // Null bytes
    ];
    fs::write(base_path.join("binary.rs"), binary_data).unwrap();

    let output_path = base_path.join("output.md");

    let args = Args {
        input: base_path.to_string_lossy().to_string(),
        output: output_path.to_string_lossy().to_string(),
        filter: vec!["rs".to_string()],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let config = Config::default();
    let prompter = TestPrompter::new(true, true);

    // Should succeed even with binary files
    let result = run_with_args(args, config, &prompter);

    assert!(result.is_ok(), "Should handle binary files gracefully");

    let output_content = fs::read_to_string(&output_path).unwrap();

    // Verify good files are processed
    assert!(output_content.contains("fn good1()"));
    assert!(output_content.contains("fn good2()"));

    // Verify binary file is handled with placeholder
    assert!(output_content.contains("### File: `binary.rs`"));
    assert!(output_content.contains("<Binary file or unsupported encoding:"));
}

#[cfg(feature = "parallel")]
#[test]
fn test_memory_efficiency_with_large_files() {
    let dir = tempdir().unwrap();
    let base_path = dir.path();

    // Create files with substantial content to test memory usage
    for i in 0..20 {
        let file_path = base_path.join(format!("large_file_{}.rs", i));
        let mut content = format!("// Large file {}\n", i);

        // Add substantial content (about 10KB per file)
        for j in 0..200 {
            content.push_str(&format!(
                "pub fn function_{}_{}() -> String {{\n    format!(\"Function {} in file {}\")\n}}\n\n",
                i, j, j, i
            ));
        }

        fs::write(&file_path, content).unwrap();
    }

    let output_path = base_path.join("output.md");

    let args = Args {
        input: base_path.to_string_lossy().to_string(),
        output: output_path.to_string_lossy().to_string(),
        filter: vec!["rs".to_string()],
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false,
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    let config = Config::default();
    let prompter = TestPrompter::new(true, true);

    // This should complete without excessive memory usage
    let result = run_with_args(args, config, &prompter);

    assert!(result.is_ok(), "Should handle large files efficiently");

    let output_content = fs::read_to_string(&output_path).unwrap();

    // Verify all large files are included
    for i in 0..20 {
        assert!(
            output_content.contains(&format!("### File: `large_file_{}.rs`", i)),
            "Should contain large file {}",
            i
        );
    }

    // Verify substantial content is present
    assert!(
        output_content.len() > 100_000,
        "Output should be substantial"
    );
}
```

### File: `tests/test_phase4_integration.rs`

- Size: 11080 bytes
- Modified: 2026-02-14 19:57:40 UTC

```rust
//! Integration test for all Phase 4 features working together
//!
//! This test validates that the enhanced binary file handling, improved diff_only mode,
//! and comprehensive edge case handling all work correctly in combination.

use context_builder::cli::Args;
use context_builder::config::Config;
use context_builder::{Prompter, run_with_args};
use std::fs;
use std::path::Path;
use tempfile::tempdir;

struct TestPrompter {
    overwrite_response: bool,
    processing_response: bool,
}

impl TestPrompter {
    fn new(overwrite_response: bool, processing_response: bool) -> Self {
        Self {
            overwrite_response,
            processing_response,
        }
    }
}

impl Prompter for TestPrompter {
    fn confirm_processing(&self, _file_count: usize) -> std::io::Result<bool> {
        Ok(self.processing_response)
    }

    fn confirm_overwrite(&self, _file_path: &str) -> std::io::Result<bool> {
        Ok(self.overwrite_response)
    }
}

fn write_file(path: &Path, contents: &str) {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent).unwrap();
    }
    fs::write(path, contents).unwrap();
}

fn write_binary_file(path: &Path, data: &[u8]) {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent).unwrap();
    }
    fs::write(path, data).unwrap();
}

#[test]
fn test_phase4_features_integration() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    let output_dir = temp_dir.path().join("output");
    fs::create_dir_all(&output_dir).unwrap();

    // Create config with enhanced features enabled
    write_file(
        &project_dir.join("context-builder.toml"),
        r#"
auto_diff = true
timestamped_output = true
diff_only = true
encoding_strategy = "detect"
filter = ["rs", "txt"]
"#,
    );

    // Change to project directory
    let original_dir = std::env::current_dir().unwrap();
    std::env::set_current_dir(&project_dir).unwrap();

    // Create initial files with various encoding scenarios
    write_file(
        &project_dir.join("src/main.rs"),
        "fn main() {\n    println!(\"Hello, world!\");\n}\n",
    );

    // UTF-8 file
    write_file(
        &project_dir.join("src/utils.rs"),
        "// UTF-8 file\npub fn helper() -> String {\n    \"Hello from helper\".to_string()\n}\n",
    );

    // Windows-1252 encoded file
    let windows1252_data = [
        0x2F, 0x2F, 0x20, // "// "
        0x57, 0x69, 0x6E, 0x64, 0x6F, 0x77, 0x73, 0x2D, 0x31, 0x32, 0x35, 0x32,
        0x20, // "Windows-1252 "
        0x93, 0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x94, // "Hello" with smart quotes
        0x0A, // newline
        0x70, 0x75, 0x62, 0x20, 0x66, 0x6E, 0x20, 0x74, 0x65, 0x73, 0x74, 0x28, 0x29, 0x20, 0x7B,
        0x7D, 0x0A, // "pub fn test() {}"
    ];
    write_binary_file(&project_dir.join("src/encoded.rs"), &windows1252_data);

    // Binary file that should be skipped - use executable-like binary data
    let binary_data = vec![
        0x7f, 0x45, 0x4c, 0x46, // ELF header
        0x02, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00,
        0x3e, // More ELF data
        0xff, 0xfe, 0xfd, 0xfc, 0xfb, 0xfa, 0xf9, 0xf8, // High bytes
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // Null bytes
    ];
    write_binary_file(&project_dir.join("data.txt"), &binary_data);

    let prompter = TestPrompter::new(true, true);
    let config = context_builder::config::load_config_from_path(&project_dir).unwrap_or_default();

    // First run - establish baseline
    let args = Args {
        input: project_dir.to_string_lossy().to_string(),
        output: output_dir.join("baseline.md").to_string_lossy().to_string(),
        filter: vec![], // Use config filter
        ignore: vec![],
        preview: false,
        token_count: false,
        line_numbers: false,
        yes: true,
        diff_only: false, // Will be overridden by config
        clear_cache: false,
        init: false,
        max_tokens: None,
    };

    // Apply config manually (simulating what happens in the real application)
    let mut resolved_args = args.clone();
    if resolved_args.filter.is_empty()
        && let Some(ref config_filter) = config.filter
    {
        resolved_args.filter = config_filter.clone();
    }
    if !resolved_args.diff_only
        && let Some(diff_only) = config.diff_only
    {
        resolved_args.diff_only = diff_only;
    }

    let result1 = run_with_args(resolved_args, config.clone(), &prompter);
    assert!(result1.is_ok(), "First run should succeed");

    // Add a new file to test improved diff_only mode
    write_file(
        &project_dir.join("src/new_feature.rs"),
        "// New feature added\npub fn new_feature() -> String {\n    \"Brand new functionality\".to_string()\n}\n\n#[cfg(test)]\nmod tests {\n    use super::*;\n\n    #[test]\n    fn test_new_feature() {\n        assert_eq!(new_feature(), \"Brand new functionality\");\n    }\n}\n",
    );

    // Modify existing file
    write_file(
        &project_dir.join("src/main.rs"),
        "fn main() {\n    println!(\"Hello, enhanced world!\");\n}\n",
    );

    // Small delay to ensure different timestamps
    std::thread::sleep(std::time::Duration::from_millis(1100));

    // Second run with changes
    let mut second_args = args;
    second_args.input = project_dir.to_string_lossy().to_string();
    second_args.output = output_dir.join("enhanced.md").to_string_lossy().to_string();

    // Apply config manually
    if second_args.filter.is_empty()
        && let Some(ref config_filter) = config.filter
    {
        second_args.filter = config_filter.clone();
    }
    if !second_args.diff_only
        && let Some(diff_only) = config.diff_only
    {
        second_args.diff_only = diff_only;
    }

    let result2 = run_with_args(second_args, config, &prompter);
    assert!(result2.is_ok(), "Second run should succeed");

    // Restore original directory
    std::env::set_current_dir(original_dir).unwrap();

    // Verify the enhanced features work correctly
    let outputs: Vec<_> = fs::read_dir(&output_dir)
        .unwrap()
        .map(|e| e.unwrap().path())
        .collect();
    let latest_output = outputs
        .iter()
        .max_by_key(|p| fs::metadata(p).unwrap().modified().unwrap())
        .unwrap();

    let content = fs::read_to_string(latest_output).unwrap();

    // Test enhanced binary file handling
    // Should either transcode Windows-1252 content or show binary placeholder
    assert!(
        content.contains("Hello") || content.contains("<Binary file"),
        "Should handle Windows-1252 encoding or show binary placeholder"
    );

    // Binary files should be handled gracefully (not crash the application)
    // The specific behavior depends on encoding strategy, but it should not fail

    // Test improved diff_only mode
    assert!(
        content.contains("## Change Summary"),
        "Should have change summary in diff_only mode"
    );

    // Should include full content of added files (new feature)
    assert!(
        content.contains("## Added Files"),
        "Should have Added Files section in diff_only mode"
    );
    assert!(
        content.contains("new_feature.rs"),
        "Should include added file"
    );
    assert!(
        content.contains("Brand new functionality"),
        "Should include full content of added file"
    );

    // Should have file differences for modified files
    assert!(
        content.contains("## File Differences"),
        "Should have file differences section"
    );

    // Should not have full Files section (due to diff_only mode)
    assert!(
        !content.contains("## Files\n"),
        "Should not have full Files section in diff_only mode"
    );

    // Test comprehensive edge cases are handled
    assert!(
        content.contains("# Directory Structure Report"),
        "Should have proper document structure"
    );
    assert!(
        content.contains("## File Tree Structure"),
        "Should have file tree"
    );

    // Verify that the enhanced features didn't break basic functionality
    // In diff_only mode, content is smaller since it only shows changes
    assert!(
        content.len() > 500,
        "Should generate reasonable content even in diff_only mode"
    );

    println!("✅ Phase 4 integration test passed!");
    println!("   - Enhanced binary file handling: Working");
    println!("   - Improved diff_only mode: Working");
    println!("   - Comprehensive edge case handling: Working");
    println!("   - All features integrated successfully");
}

#[test]
fn test_encoding_strategy_configuration() {
    let temp_dir = tempdir().unwrap();
    let project_dir = temp_dir.path().join("project");
    let output_dir = temp_dir.path().join("output");
    fs::create_dir_all(&output_dir).unwrap();

    // Create a file with Windows-1252 encoding
    let windows1252_data = [
        0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x20, // "Hello "
        0x93, 0x57, 0x6F, 0x72, 0x6C, 0x64, 0x94, // "World" with smart quotes
        0x0A, // newline
    ];
    write_binary_file(&project_dir.join("test.txt"), &windows1252_data);

    let prompter = TestPrompter::new(true, true);

    // Test all encoding strategies
    for strategy in &["detect", "strict", "skip"] {
        let config = Config {
            encoding_strategy: Some(strategy.to_string()),
            ..Default::default()
        };

        let args = Args {
            input: project_dir.to_string_lossy().to_string(),
            output: output_dir
                .join(format!("encoding_{}.md", strategy))
                .to_string_lossy()
                .to_string(),
            filter: vec!["txt".to_string()],
            ignore: vec![],
            preview: false,
            token_count: false,
            line_numbers: false,
            yes: true,
            diff_only: false,
            clear_cache: false,
            init: false,
            max_tokens: None,
        };

        let result = run_with_args(args, config, &prompter);
        assert!(
            result.is_ok(),
            "Encoding strategy '{}' should work",
            strategy
        );

        let output_path = output_dir.join(format!("encoding_{}.md", strategy));
        let content = fs::read_to_string(&output_path).unwrap();

        match *strategy {
            "detect" => {
                // Should attempt transcoding and may succeed
                assert!(
                    content.contains("Hello") || content.contains("<Binary file"),
                    "Detect strategy should transcode or show binary placeholder"
                );
            }
            "strict" | "skip" => {
                // Should show binary placeholder
                assert!(
                    content.contains("<Binary file"),
                    "Strict/skip strategy should show binary placeholder"
                );
            }
            _ => {}
        }
    }

    println!("✅ Encoding strategy configuration test passed!");
}
```

### File: `BENCHMARKS.md`

- Size: 6024 bytes
- Modified: 2026-02-14 07:14:48 UTC

```markdown
# Benchmarks

This document explains how to run the Criterion benchmarks, how datasets are chosen/created, and how to generate persistent sample datasets for reproducible measurements.

The benchmark suite measures:
- Sequential vs parallel processing
- With and without line-numbered code blocks
- Multiple dataset sizes (tiny, small, optionally medium)

By default, runs are silent to avoid skewing timings with console I/O.

---

## Quick start

- Run (parallel by default):
  - Linux/macOS:
    - `cargo bench --bench context_bench`
  - Windows PowerShell:
    - `cargo bench --bench context_bench`

- Include the medium dataset (heavier, disabled by default):
  - Linux/macOS:
    - `CB_BENCH_MEDIUM=1 cargo bench --bench context_bench`
  - Windows PowerShell:
    - `$env:CB_BENCH_MEDIUM=1; cargo bench --bench context_bench`

- HTML reports:
  - Open: `target/criterion/report/index.html`
  - Or per-benchmark: `target/criterion/context_builder/*/report/index.html`

---

## Parallel vs sequential

Parallel processing is enabled by default via the `parallel` feature (rayon).

- Force sequential:
  - `cargo bench --no-default-features --bench context_bench`

- Force parallel (even if defaults change):
  - `cargo bench --features parallel --bench context_bench`

Note: Benchmarks compare both “line_numbers” and “no_line_numbers” modes. Line numbering does additional formatting work and is expected to be slower.

---

## Silence during benchmarks

Benchmarks set `CB_SILENT=1` once at startup so logs and prompts don’t impact timings.

- To see output during benchmarks:
  - Linux/macOS:
    - `CB_SILENT=0 cargo bench --bench context_bench`
  - Windows PowerShell:
    - `$env:CB_SILENT=0; cargo bench --bench context_bench`

Prompts are auto-confirmed inside benches, so runs are fully non-interactive.

---

## Dataset selection

Each scenario picks an input dataset with the following precedence:

1) If `./samples/<dataset>/project` exists, it is used.
2) Else, if `CB_BENCH_DATASET_DIR` is set, `<CB_BENCH_DATASET_DIR>/<dataset>/project` is used.
3) Else, a synthetic dataset is generated in a temporary directory for the run.

Datasets used:
- tiny: ~100 text files (fast sanity checks)
- small: ~1,000 text files (default performance checks)
- medium: ~5,000 text files (only when `CB_BENCH_MEDIUM=1` is set)

Default filters in the benches focus on text/code: `rs`, `md`, `txt`, `toml`. Common ignored directories: `target`, `node_modules`. Binary files are generated but skipped by filters.

---

## Reproducing results

For more stable and reproducible measurements:
- Generate persistent datasets into `./samples/` (see below).
- Keep your machine’s background activity low during runs.
- Run each scenario multiple times and compare Criterion reports.

---

## Generating persistent sample datasets

You have two options to generate datasets into `./samples`:

### Option A: Cargo bin (feature-gated)

The repository provides a generator binary gated behind the `samples-bin` feature.

- Linux/macOS:
  - `cargo run --no-default-features --features samples-bin --bin generate_samples -- --help`
- Windows PowerShell:
  - `cargo run --no-default-features --features samples-bin --bin generate_samples -- --help`

Examples:
- Generate default presets (tiny, small) into `./samples`:
  - `cargo run --no-default-features --features samples-bin --bin generate_samples`
- Include medium and large:
  - `cargo run --no-default-features --features samples-bin --bin generate_samples -- --presets tiny,small,medium --include-large`
- Only one preset with custom parameters:
  - `cargo run --no-default-features --features samples-bin --bin generate_samples -- --only small --files 5000 --depth 4 --width 4 --size 1024`
- Clean output before generating:
  - `cargo run --no-default-features --features samples-bin --bin generate_samples -- --clean`
- Dry run (print plan only):
  - `cargo run --no-default-features --features samples-bin --bin generate_samples -- --dry-run`

### Option B: Standalone compile with rustc

If you prefer not to use the Cargo feature gating, compile the script directly:

- Linux/macOS:
  - `rustc scripts/generate_samples.rs -O -o generate_samples && ./generate_samples --help`
- Windows PowerShell:
  - `rustc scripts/generate_samples.rs -O -o generate_samples.exe; .\generate_samples.exe --help`

Examples mirror Option A; just replace the leading command with `./generate_samples` (or `.\generate_samples.exe` on Windows).

---

## Directory layout of generated samples

The generator produces datasets under `./samples/<preset>/project`, which benches discover automatically.

Each `project` tree contains:
- `src/`, `docs/`, `assets/` with nested subdirectories and text files
- `target/`, `node_modules/` populated with noise (ignored by default)
- Top-level `README.md`, `Cargo.toml`
- Binary `.bin` files sprinkled to validate binary handling

It’s recommended to add `/samples` to `.gitignore` if not already present.

---

## Comparing modes

- Sequential vs Parallel:
  - Sequential (no rayon): `cargo bench --no-default-features --bench context_bench`
  - Parallel (rayon): `cargo bench --features parallel --bench context_bench`

- With vs Without line numbers:
  - Both modes are exercised in each run; consult the per-benchmark report pages for timings.

---

## Troubleshooting

- Benchmarks produce no output:
  - Expected. They run with `CB_SILENT=1`. Set `CB_SILENT=0` to see logs.
- Medium dataset missing:
  - Set the flag explicitly: `CB_BENCH_MEDIUM=1`.
  - Or pre-generate samples so the benches find `./samples/medium/project`.
- Reports are empty or unchanged:
  - Remove previous results and re-run:
    - `rm -rf target/criterion` (Linux/macOS)
    - `Remove-Item -Recurse -Force target\criterion` (Windows PowerShell)
- Sequential vs parallel deltas are small:
  - On tiny datasets, overheads dominate. Use small or medium for more signal.
  - Try enabling/disabling line numbers to observe formatting costs.

---

Happy benchmarking!
```

### File: `DEVELOPMENT.md`

- Size: 7600 bytes
- Modified: 2026-02-14 07:14:48 UTC

```markdown
# Development Guide

Welcome! This document is for contributors and maintainers of Context Builder. It covers how to set up a development environment, build, test, lint, benchmark, and release the project.

For user-facing documentation and examples, see README.md. For performance work, see BENCHMARKS.md. For release history, see CHANGELOG.md.

---

## Prerequisites

- Rust toolchain (stable) with support for the 2024 edition.
  - Install via rustup: https://rustup.rs
  - Keep your toolchain up-to-date: `rustup update`
- Git

Optional but recommended:
- IDE with Rust Analyzer
- Just or Make for local task automation (if you prefer)
- Node.js (only if you plan to view Criterion’s HTML reports and serve them locally, not required for development)

---

## Getting the code

```bash
git clone https://github.com/igorls/context-builder.git
cd context-builder
```

---

## Project layout

- Cargo.toml — crate metadata, dependencies, features
- README.md — user-facing documentation
- CHANGELOG.md — release notes
- DEVELOPMENT.md — this file
- BENCHMARKS.md — running and understanding benchmarks
- scripts/
  - generate_samples.rs — synthetic dataset generator for benchmarking
- benches/
  - context_bench.rs — Criterion benchmark suite
- src/
  - main.rs — binary entry point
  - lib.rs — core orchestration and run() implementation
  - cli.rs — clap parser and CLI arguments
  - file_utils.rs — directory traversal, filter/ignore collection, prompts
  - markdown.rs — core rendering logic, streaming, line numbering, binary/text sniffing
  - tree.rs — file tree structure building and printing
- samples/ — optional persistent datasets (ignored in VCS) for benchmarking

---

## Building and running

Build:
```bash
cargo build
```

Run the CLI:
```bash
cargo run -- --help
cargo run -- -d . -o out.md -f rs -f toml -i target --line-numbers
```

Notes:
- By default, parallel processing is enabled via the `parallel` feature (uses rayon).
- Logging uses env_logger; set `RUST_LOG` to control verbosity:
  - Linux/macOS: `RUST_LOG=info cargo run -- ...`
  - Windows PowerShell: `$env:RUST_LOG='info'; cargo run -- ...`

---

## Features

- parallel (enabled by default)
  - Enables parallel file processing in markdown generation via rayon.
  - Disable defaults (sequential run):
    - Build/Run: `cargo run --no-default-features -- ...`
    - As a dependency in another crate: set `default-features = false` in Cargo.toml.

- samples-bin
  - Exposes the dataset generator as a cargo bin (development-only).
  - Usage:
    - Linux/macOS:
      - `cargo run --no-default-features --features samples-bin --bin generate_samples -- --help`
    - Windows PowerShell:
      - `cargo run --no-default-features --features samples-bin --bin generate_samples -- --help`

---

## Testing

Run all tests:
```bash
cargo test
```

Tips:
- Unit tests cover CLI parsing, file filtering/ignoring, markdown formatting (including line numbers and binary handling), and tree building.
- Avoid adding interactive prompts inside tests. The library is structured so that prompts can be injected/mocked (see `Prompter` trait).
- For additional diagnostics during tests:
  - Linux/macOS: `RUST_LOG=info cargo test`
  - Windows PowerShell: `$env:RUST_LOG='info'; cargo test`

---

## Linting and formatting

Format:
```bash
cargo fmt --all
```

Clippy (lints):
```bash
cargo clippy --all-targets --all-features -- -D warnings
```

Please ensure code is formatted and clippy-clean before opening a PR.

---

## Benchmarks

We use Criterion for micro/meso benchmarks and dataset-driven performance checks.

- See BENCHMARKS.md for details, including dataset generation, silent runs, and HTML report navigation.
- Quick start:
  ```bash
  cargo bench --bench context_bench
  ```

---

## Environment variables

- CB_SILENT
  - When set to “1” or “true” (case-insensitive), suppresses user-facing prints in the CLI.
  - The benchmark harness sets this to “1” by default to avoid skewing timings with console I/O.
  - Override locally:
    - Linux/macOS: `CB_SILENT=0 cargo bench --bench context_bench`
    - Windows PowerShell: `$env:CB_SILENT=0; cargo bench --bench context_bench`

- CB_BENCH_MEDIUM
  - When set to “1”, enables the heavier “medium” dataset scenarios during benches.

- CB_BENCH_DATASET_DIR
  - Allows pointing the benchmark harness to an external root containing datasets:
    - `<CB_BENCH_DATASET_DIR>/<preset>/project`
  - If not set and no `./samples/<preset>/project` is present, benches will synthesize datasets in a temp dir.

- RUST_LOG
  - Controls log verbosity (env_logger). Example:
    - Linux/macOS: `RUST_LOG=info cargo run -- ...`
    - Windows PowerShell: `$env:RUST_LOG='info'; cargo run -- ...`

---

## Coding guidelines

- Edition: 2024
- Error handling:
  - Use `io::Result` where appropriate; prefer returning errors over panicking.
  - It’s okay to use `unwrap()` and `expect()` in tests/benches and small setup helpers, but not in core library logic.
- Performance:
  - Prefer streaming reads/writes for large files (see `markdown.rs`).
  - Keep binary detection lightweight (current sniff logic checks for NUL bytes and UTF-8 validity).
  - Keep language detection simple and deterministic; add new mappings in one place.
- Cross-platform:
  - Normalize path separators in tests where string comparisons are used.
- Logging:
  - Use `log::{info, warn, error}`; let `env_logger` control emission.
- CLI:
  - Add new flags in `cli.rs`. Ensure you update tests covering parsing, and propagate options cleanly through `run_with_args`.

---

## Submitting changes

1) Fork and create a feature branch:
   ```bash
   git checkout -b feat/my-improvement
   ```

2) Make changes, add tests, and keep the code formatted and clippy-clean:
   ```bash
   cargo fmt --all
   cargo clippy --all-targets --all-features -- -D warnings
   cargo test
   ```

3) If you modified performance-sensitive code, run benches (see BENCHMARKS.md).

4) Update CHANGELOG.md if the change is user-visible or noteworthy.

5) Open a PR with:
   - A concise title
   - Description of changes and rationale
   - Notes on performance impact (if any)
   - Any relevant screenshots or benchmark snippets

Suggested commit message convention: short, imperative subject; optionally scope (e.g., `feat(cli): add --no-parallel flag`).

---

## Releasing (maintainers)

1) Ensure the tree is green:
   - `cargo fmt --all`
   - `cargo clippy --all-targets --all-features -- -D warnings`
   - `cargo test`
   - Optionally: `cargo bench`

2) Update versions and docs:
   - Bump `version` in `Cargo.toml`.
   - Add a new entry to `CHANGELOG.md`.
   - Verify README.md and DEVELOPMENT.md are up to date.

3) Tag the release:
   ```bash
   git commit -am "chore(release): vX.Y.Z"
   git tag vX.Y.Z
   git push && git push --tags
   ```

4) Publish to crates.io:
   ```bash
   cargo publish --dry-run
   cargo publish
   ```

5) Create a GitHub release, paste changelog highlights, and attach links to benchmarks if relevant.

---

## Tips and pitfalls

- Prompts during runs
  - The library uses a `Prompter` trait for confirmation flows. Inject a test-friendly prompter to avoid interactive I/O in tests and benches.
- Output file overwrites
  - The CLI confirms overwrites by default. In tests/benches, use the injected prompter that auto-confirms.
- Large datasets
  - Prefer parallel builds for performance.
  - Consider dataset size and line-numbering effects when measuring.

---

## Questions?

Open an issue or start a discussion on GitHub. Thanks for contributing!
```

### File: `LICENSE`

- Size: 1078 bytes
- Modified: 2026-02-14 07:14:48 UTC

```text
The MIT License

Copyright (c) 2025 Igor Lins e Silva

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
```

### File: `docs/research/multi-model-code-review-analysis.md`

- Size: 19787 bytes
- Modified: 2026-02-14 21:36:34 UTC

```markdown
# Multi-Model AI Code Review: 10 LLMs Analyze context-builder v0.7.0

> **Date**: February 14, 2026  
> **Project**: [context-builder](https://github.com/igorls/context-builder) v0.7.0  
> **Prompt**: Deep Think v2 — code review + relevance ordering evaluation  
> **Context file**: ~460KB, generated by context-builder itself (self-referential review)

## Abstract

We submitted context-builder v0.7.0's full source code to 10 AI models using a structured code review prompt. The context was generated by context-builder itself, creating a self-referential test: the tool's output quality directly affected the models' ability to review it. This exposed real bugs (e.g., lockfile starvation) that only manifested *because* the models consumed the tool's own output.

The models collectively identified **10 unique bugs**, achieved **universal consensus** on 4 architectural improvements, and proposed a clear roadmap for v2 output format. This document captures the full comparative analysis.

---

## 1. Methodology

### 1.1 Prompt Design

All models received the same structured prompt ("Deep Think v2") containing:
1. **Role**: Senior Rust developer and CLI tool architect
2. **Task**: Review context-builder v0.7.0 with focus on correctness, architecture, and a new relevance ordering feature
3. **Context**: The full output of `context-builder -d . --filter rs,toml,md` run against its own repository
4. **Specific asks**: Verify 5 bug fixes from v0.6.1→v0.7.0, evaluate relevance ordering, propose v2 output format

### 1.2 Model Access Methods

| Method | Models | Description |
|--------|--------|-------------|
| 🧠 **One-shot** | Gemini Deep Think, Gemini Pro, Grok 4.1, Qwen-3-Max, GLM-5 (×2), ChatGPT 5.2, Kimi K2.5 | Single prompt + context file, no tool access |
| 🤖 **Agentic** | Claude Opus 4.6, MiniMax Agent 2.5 | Multi-step sessions with file search, workspace tools, iterative reasoning |

> **Why this matters**: One-shot models that found deep bugs did so purely through reasoning over 460KB of context in a single pass. Agentic models could iteratively search, grep, and verify — giving them a structural advantage for precision but not necessarily for discovery depth.

### 1.3 Lockfile Fix (Mid-Experiment)

GLM-5's first run (pre-fix) was truncated because `Cargo.lock` (39KB) was classified as config (category 0), consuming the entire context window before source code appeared. We fixed this bug (moving lockfiles to category 4) and re-ran GLM-5, creating a natural A/B test of the fix's effectiveness.

Post-fix models (ChatGPT 5.2, Kimi K2.5, Claude Opus 4.6, MiniMax Agent, GLM-5 run 2) reviewed the corrected ordering.

---

## 2. Models Tested

| # | Model | Method | Context | Lockfile Fix | Quality |
|---|-------|--------|---------|-------------|---------|
| 1 | **Gemini 3 Deep Think** | 🧠 One-shot | Full | Pre-fix | 🟢 Excellent |
| 2 | **Gemini 3 Pro** | 🧠 One-shot | Full | Pre-fix | 🟢 Good |
| 3 | **Grok 4.1** | 🧠 One-shot | Full | Pre-fix | 🟡 Average |
| 4 | **Qwen-3-Max** | 🧠 One-shot | Full | Pre-fix | 🟢 Good |
| 5 | **GLM-5** (run 1, pre-fix) | 🧠 One-shot | Truncated | Pre-fix | 🔴 Truncated |
| 6 | **GLM-5** (run 2, post-fix) | 🧠 One-shot | Partial | Post-fix | 🟡 Partial |
| 7 | **ChatGPT 5.2** | 🧠 One-shot | Full | Post-fix | 🟢 Excellent |
| 8 | **Kimi K2.5** | 🧠 One-shot | Full | Post-fix | 🟢 Good |
| 9 | **Claude Opus 4.6** | 🤖 Agentic | Full | Post-fix | 🟢 Excellent |
| 10 | **MiniMax Agent 2.5** | 🤖 Agentic | Full | Post-fix | 🟢 Good |

> **Response files**: Each model's raw response is archived in [`docs/`](../):
> `context_v2_resp-gemini-3-deepthink.md`, `context_v2_resp-gemini-3-pro.md`, `context_v2_resp-grok-4.1.md`, `context_v2_resp-qwen-3-max.md`, `context_v2_resp-glm5.md`, `context_v2_resp-glm5-run2.md`, `context_v2_resp-chat-gpt-5.2.md`, `context_v2_resp-kimi-k2.5.md`, `context_v2_resp-claude-opus-4.6.md`, `context_v2_resp-minimax-agent.md`

---

## 3. Results: Model Rankings

### 3.1 Overall (Raw Output Quality)

| Rank | Model | Method | Unique Bugs | Novel Ideas | v0.6.1 Fix Verification |
|------|-------|--------|------------|-------------|------------------------|
| 🥇 | **Claude Opus 4.6** | 🤖 Agentic | 2 | Signatures-first format, `[category:N]` tags | 5/5 ✅ |
| 🥈 | **Gemini Deep Think** | 🧠 One-shot | 2 | XML CDATA format, BTreeMap ordering bug | 5/5 ✅ |
| 🥉 | **ChatGPT 5.2** | 🧠 One-shot | 1 | Tests-before-source ordering | 5/5 ✅ |
| 4th | **Qwen-3-Max** | 🧠 One-shot | 1 | Progressive disclosure | 5/5 ✅ |
| 5th | **MiniMax Agent 2.5** | 🤖 Agentic | 0 | XML w/ purpose summaries | 5/5 ✅ |
| 6th | **Kimi K2.5** | 🧠 One-shot | 0 | `--docs-first` flag, heat map | 5/5 ✅ |
| 7th | **Gemini Pro** | 🧠 One-shot | 0 | — | 5/5 ✅ |
| 8th | **GLM-5** (run 2) | 🧠 One-shot | 1 | Reliability column concept | 2/5 |
| 9th | **Grok 4.1** | 🧠 One-shot | 0 | — | 5/5 ✅ |
| 10th | **GLM-5** (run 1) | 🧠 One-shot | 1 (accidental) | — | 0/5 (truncated) |

### 3.2 Adjusted (Normalized for Methodology)

When accounting for the agentic advantage (tool access, iterative search, workspace), the efficiency picture changes:

| Rank | Model | Reasoning |
|------|-------|-----------|
| 🥇 | **Gemini Deep Think** 🧠 | Found 2 unique *architectural* bugs (BTreeMap ordering, DefaultHasher non-determinism) in a single reasoning pass — no tools, no iteration. The auto-diff BTreeMap bug was the deepest finding across all 10 models, found by no other. |
| 🥈 | **ChatGPT 5.2** 🧠 | Most original thinker. Proposed tests-before-source ordering that no other model considered. One unique bug. Entirely one-shot. |
| 🥉 | **Claude Opus 4.6** 🤖 | Highest raw quality, but had agentic advantage. Line-level citations and signatures-first v2 format are best-in-class, but required multi-step tool-assisted processing. |
| 4th | **Qwen-3-Max** 🧠 | Strong one-shot. Caught parallel `max_tokens` and header token omission bugs independently. |
| 5th | **Kimi K2.5** 🧠 | Solid one-shot with practical, implementable suggestions (`--docs-first` flag). |

**Key insight**: Deep Think matched Claude Opus in bug *discovery depth* with zero tool access — pure one-shot reasoning over 460KB of context. This suggests extended reasoning traces are competitive with agentic tool-use for code review tasks. The agentic advantage primarily manifests in citation precision (exact line numbers), not bug discovery.

---

## 4. Bug Matrix

### 4.1 All Bugs Found (10 unique)

| # | Bug | Severity | DT | GP | GR | QW | G5 | G5² | GPT | K2 | CO | MM | Consensus |
|---|-----|----------|----|----|----|----|----|----|-----|----|----|----|----|
| 1 | `max_tokens` ignored in parallel mode | 🔴 Critical | ✅ | | | ✅ | | | | ✅ | ✅ | | 4/10 |
| 2 | mtime hash ≠ content hash (breaks determinism) | 🔴 Critical | ✅ | ✅ | | ✅ | | | ✅ | | ✅ | ✅ | 6/10 |
| 3 | `DefaultHasher` non-deterministic across Rust versions | 🟡 High | ✅ | | | | | | ✅ | | ✅ | ✅ | 4/10 |
| 4 | Auto-diff `BTreeMap` destroys relevance ordering | 🔴 Critical | ✅ | | | | | | | | | | 1/10 |
| 5 | Header/tree tokens not counted in budget | 🟡 High | | | | ✅ | | | | | ✅ | | 2/10 |
| 6 | `contains("test")` substring false positives | 🟡 Medium | | | | ✅ | | | | ✅ | ✅ | | 3/10 |
| 7 | `strip_prefix('+')` incomplete for diff indentation | 🟢 Low | ✅ | | | | | | | | | | 1/10 |
| 8 | 4-byte/token estimate ~20% off for code | 🟡 High | | | | | | | ✅ | ✅ | ✅ | ✅ | 4/10 |
| 9 | Binary file content stored as `String` in cache | 🟢 Low | | | | | | ✅ | | | | | 1/10 |
| 10 | `starts_with("test_")` matches root-level helpers | 🟡 Medium | | | | | | | | | ✅ | | 1/10 |

**Legend**: DT=Deep Think, GP=Gemini Pro, GR=Grok, QW=Qwen, G5=GLM-5 run 1, G5²=GLM-5 run 2, GPT=ChatGPT 5.2, K2=Kimi K2.5, CO=Claude Opus, MM=MiniMax

### 4.2 Bug Descriptions

**#1 — `max_tokens` ignored in parallel mode** (4/10): The `--max-tokens` flag is only enforced in the sequential code path. When parallel processing is used, files are concatenated without budget enforcement, potentially exceeding the token limit.

**#2 — mtime hash ≠ content hash** (6/10): The cache uses `file.modified()` timestamp for hashing. A `git checkout` or `cp` changes mtime without changing content → different hash → broken prompt caching across machines, CI environments, or even consecutive runs.

**#3 — `DefaultHasher` non-deterministic** (4/10): Rust's `std::hash::DefaultHasher` explicitly does not guarantee stable output across compiler versions or architectures. This silently breaks cache invalidation when the binary is compiled with different toolchains.

**#4 — Auto-diff BTreeMap destroys relevance** (1/10, Deep Think only): When `auto_diff = true` and full file content is rendered, the code iterates over `BTreeMap<PathBuf, FileState>` — which is alphabetically ordered by path, completely overriding the relevance-based sort from `file_utils.rs`. Since `auto_diff = true` is the recommended config, **relevance ordering doesn't work in the primary usage path**.

**#5 — Header/tree tokens not in budget** (2/10): The file tree header and section headers consume tokens but aren't deducted from `max_tokens`, meaning the actual file content budget is smaller than specified.

**#6 — Test substring false positives** (3/10): `rel_str.contains("test")` matches files like `latest_results.rs` or `contest_entry.rs`. Fixed during this experiment by switching to path boundary matching.

**#7 — `strip_prefix('+')` incomplete** (1/10, Deep Think only): Unified diff format uses `+ code` (plus, space, code). `strip_prefix('+')` removes the `+` but leaves the leading space, causing indentation mismatch.

**#8 — 4-byte/token estimate inaccurate** (4/10): The hardcoded 4 bytes/token ratio is an average across natural language. Code (which has more symbols, shorter identifiers) typically runs closer to 3.2 bytes/token, making estimates ~20% off.

**#9 — Binary content as String** (1/10, GLM-5 run 2 only): Binary file content passed through `String` type in the cache layer, potentially causing encoding issues or silent corruption.

**#10 — `starts_with("test_")` matches root helpers** (1/10, Claude Opus only): Root-level test helper files (e.g., `test_utils.rs`) in the project root would be classified as test files even if they're shared helpers used by production code.

---

## 5. Consensus: Relevance Ordering

### 5.1 Universal Agreement

All 10 models agreed on these points:
- ✅ Config-first ordering is correct (10/10)
- ✅ Lockfiles should be last or excluded (10/10)
- ✅ Entry points (`main.rs`, `lib.rs`) should be elevated within source (9/10)
- ✅ Alphabetical ordering within categories is suboptimal (9/10)

### 5.2 Where Should Docs Go?

| Position | Models | Count |
|----------|--------|-------|
| Core docs FIRST (README before source) | Deep Think, ChatGPT 5.2, GLM-5², Claude Opus | 4 |
| Docs after config, before source | Qwen, Kimi K2.5, MiniMax | 3 |
| Docs LAST (current behavior) | Grok, Gemini Pro | 2 |
| Configurable via flag | Kimi K2.5 (`--docs-first`) | 1 |

**Winner (7/10)**: Core docs (README, AGENTS.md) should appear before source code.

### 5.3 Proposed Category System

Based on consensus across all models:

```
0 — Core Docs      (README.md, AGENTS.md, ARCHITECTURE.md)
1 — Config/Manifest (Cargo.toml, package.json, pyproject.toml)
2 — Build/CI        (.github/, Dockerfile, build.rs, Makefile)
3 — Source          (src/*, entry points elevated)
4 — Tests/Benches   (tests/*, benches/*)
5 — Other Docs      (CHANGELOG, DEVELOPMENT, etc.)
6 — Generated/Lock  (Cargo.lock, package-lock.json — or excluded)
```

### 5.4 Intra-Category Ordering

| Approach | Models Proposing | Feasibility |
|----------|-----------------|-------------|
| Entry points first (`main.rs`, `lib.rs`) | DT, GPT, K2, CO, G5², QW | Simple |
| Dependency/topological sort | DT, GPT, K2, CO, G5², MM | Medium |
| File size ascending | CO | Simple |
| Centrality score (most-imported) | CO, GPT | Medium-Large |

### 5.5 Novel Proposal: Tests Before Source

ChatGPT 5.2 proposed a radical reordering unique among all models:

```
config → public API source → TESTS → internal source → docs
```

**Rationale**: "LLM reasoning improves when expectation is known *before* implementation." Tests encode invariants, usage intent, and hidden contracts — reading them first allows deductive (not inductive) code comprehension.

---

## 6. Consensus: Tier 2 Features

| Feature | Models Proposing | Total |
|---------|-----------------|-------|
| **Dependency graph / module map** | DT, GP, GR, QW, G5, G5², GPT, K2, CO | **9/10** |
| **Signature-only / skeleton mode** | DT, GP, QW, G5, G5², K2, CO, MM | **8/10** |
| **Git-aware context / change heatmap** | DT, GP, GR, G5, G5², K2, MM | **7/10** |
| **Semantic chunking (AST-aware)** | QW, G5, K2, CO, MM | **5/10** |
| **Structured output (XML/JSON)** | DT, G5, MM | **3/10** |
| **Smart diff (move/rename detection)** | GR, CO, MM | **3/10** |
| **Cross-reference annotations** | QW, GPT, CO | **3/10** |
| **Interactive query mode** | DT, MM | **2/10** |

**Top 3 for next implementation cycle**:
1. 🥇 Dependency graph / module map (9/10 — near-universal consensus)
2. 🥈 Signature-only mode (8/10 — critical for token budget management)
3. 🥉 Git-aware context (7/10 — recent changes as relevance signal)

---

## 7. Consensus: Output Format v2

### 7.1 Format Preference

| Format | Models | Argument |
|--------|--------|----------|
| Enhanced Markdown | GPT, K2, CO, G5² | Human-readable AND LLM-friendly |
| XML with CDATA | DT, MM | Prevents code block inception, machine-parseable |
| Markdown default + `--format` flag | K2, MM | Backward compatible |

### 7.2 Structural Consensus

Every model independently proposed a v2 format following this general structure:

```
1. Project metadata (name, version, hash, token count)
2. Architecture overview (natural language + dependency graph)
3. File manifest table (path, category, size, tokens, purpose)
4. Optional: Public API / signatures section
5. Full file contents (with per-file metadata headers)
6. Truncation notice if budget exceeded
```

### 7.3 Most Innovative Proposals

| Innovation | Model | Impact |
|-----------|-------|--------|
| Signatures-first progressive disclosure | Claude Opus | ~40% token reduction for large files |
| `<![CDATA[]]>` for code content | Gemini Deep Think | Eliminates markdown-in-markdown inception |
| Per-file `[category:N]` tags | Claude Opus | Machine-parseable relevance metadata |
| Commit heatmap (🔥) | Kimi K2.5 | Visual frequency signal |
| Reliability column | GLM-5 run 2 | Inferred trust score per module |
| Centrality scores | Kimi K2.5 | Quantitative module importance |
| Tests-before-source | ChatGPT 5.2 | Deductive comprehension flow |

---

## 8. Methodology Analysis: Agentic vs One-Shot

### 8.1 Comparison

| Aspect | 🤖 Agentic (Opus, MiniMax) | 🧠 One-Shot (8 others) |
|--------|---------------------------|------------------------|
| Bug verification | Can search/grep for exact lines | Must infer from context window |
| Line citations | Exact (tool-verified) | Approximate or absent |
| Missing context | Can request more files | Must work with what's given |
| False positives | Lower (can verify claims) | Higher (inferring from memory) |
| Cost per review | Higher (many API calls) | Lower (single inference) |
| Latency | Minutes (multi-step) | Seconds to minutes |

### 8.2 Conclusion

For code review of context-builder's pre-packaged context files, **one-shot reasoning was surprisingly competitive with agentic sessions**. The key differentiator was not *finding* bugs (Deep Think matched Opus in depth with 2 unique architectural bugs each) but *citing* them precisely (Opus could grep for exact line numbers).

This has implications for context-builder itself: **if the context file is well-structured with good relevance ordering, one-shot models can perform at near-agentic quality** — which validates the tool's core value proposition.

---

## 9. Model Personality Profiles

| Model | Method | Style | Strength | Weakness |
|-------|--------|-------|----------|----------|
| **Claude Opus 4.6** | 🤖 | Surgical, evidence-based | Line-level citations, exact locations | Agentic overhead; verbose |
| **Gemini Deep Think** | 🧠 | Adversarial auditor | Deepest bugs in one shot (BTreeMap, DefaultHasher) | Slightly speculative on edge cases |
| **ChatGPT 5.2** | 🧠 | Systems thinker | Most original ideas (deductive ordering) | Brief on verification details |
| **Qwen-3-Max** | 🧠 | Technically precise | Thorough on new bugs, token math | Less creative on format proposals |
| **Kimi K2.5** | 🧠 | Pragmatic engineer | Balanced analysis, implementable suggestions | Less aggressive on bug hunting |
| **MiniMax Agent** | 🤖 | Academic reviewer | Thorough, systematic structure | Conservative despite tool access |
| **GLM-5** (run 2) | 🧠 | Context-limited analyst | Unique binary cache bug | Missing core source files |
| **Gemini Pro** | 🧠 | Balanced reviewer | Good verification of known fixes | Fewer unique insights |
| **Grok 4.1** | 🧠 | Conservative reviewer | Safe, accurate | Fewest bugs, least novel proposals |
| **GLM-5** (run 1) | 🧠 | Truncation victim | Accidentally exposed lockfile starvation bug | Couldn't review actual code |

---

## 10. Immediate Action Items (Prioritized)

| Priority | Fix | Models | Effort |
|----------|-----|--------|--------|
| 🔴 P0 | Implement `max_tokens` enforcement in parallel path | 4/10 | Medium |
| 🔴 P0 | Replace mtime hash with content hash | 6/10 | Small-Medium |
| 🔴 P0 | Replace `DefaultHasher` with stable hasher (blake3/xxhash) | 4/10 | Small |
| 🔴 P0 | Fix auto-diff BTreeMap ordering (use relevance-sorted Vec) | 1/10 | Small |
| 🟡 P1 | Elevate README/AGENTS.md to category 0 | 7/10 | Small |
| 🟡 P1 | Elevate entry points (main.rs, lib.rs) within source | 9/10 | Small |
| 🟡 P1 | Account for header/tree tokens in budget | 2/10 | Small |
| ✅ Done | Fix lockfile starvation (Cargo.lock → category 4) | 10/10 | Done |
| ✅ Done | Fix test substring false positives (path boundaries) | 3/10 | Done |

---

## Appendix A: Raw Response Files

| Model | Response File |
|-------|--------------|
| Gemini 3 Deep Think | [`context_v2_resp-gemini-3-deepthink.md`](../context_v2_resp-gemini-3-deepthink.md) |
| Gemini 3 Pro | [`context_v2_resp-gemini-3-pro.md`](../context_v2_resp-gemini-3-pro.md) |
| Grok 4.1 | [`context_v2_resp-grok-4.1.md`](../context_v2_resp-grok-4.1.md) |
| Qwen-3-Max | [`context_v2_resp-qwen-3-max.md`](../context_v2_resp-qwen-3-max.md) |
| GLM-5 (run 1) | [`context_v2_resp-glm5.md`](../context_v2_resp-glm5.md) |
| GLM-5 (run 2) | [`context_v2_resp-glm5-run2.md`](../context_v2_resp-glm5-run2.md) |
| ChatGPT 5.2 | [`context_v2_resp-chat-gpt-5.2.md`](../context_v2_resp-chat-gpt-5.2.md) |
| Kimi K2.5 | [`context_v2_resp-kimi-k2.5.md`](../context_v2_resp-kimi-k2.5.md) |
| Claude Opus 4.6 | [`context_v2_resp-claude-opus-4.6.md`](../context_v2_resp-claude-opus-4.6.md) |
| MiniMax Agent 2.5 | [`context_v2_resp-minimax-agent.md`](../context_v2_resp-minimax-agent.md) |

## Appendix B: Context File Used

- **Pre-fix context**: [`deepthink_context_v2.md`](../deepthink_context_v2.md) (original, Cargo.lock at position 0)
- **Post-fix context**: [`deepthink_context_v2_fixed.md`](../deepthink_context_v2_fixed.md) (lockfile fix applied, Cargo.lock at position last)
- **Prompt template**: [`deep_think_prompt_v2.md`](../deep_think_prompt_v2.md)
```

### File: `scripts/generate_samples.rs`

- Size: 16036 bytes
- Modified: 2026-02-14 07:14:48 UTC

```rust
#![allow(
    clippy::needless_return,
    clippy::extra_unused_lifetimes,
    clippy::doc_overindented_list_items,
    dead_code
)]
//! Dataset generation script for creating synthetic sample directories to benchmark and test
//! the context-builder CLI locally. This is intended to generate a folder that should be ignored
//! by version control (e.g., add `/samples` to your project's .gitignore).
//!
//! Usage examples (Windows PowerShell):
//!   - rustc scripts/generate_samples.rs -O -o generate_samples.exe; .\generate_samples.exe
//!   - .\generate_samples.exe --help
//!
//! Flags:
//!   --out <DIR>             Output directory (default: ./samples)
//!   --presets <list>        Comma-separated presets to generate: tiny,small,medium (default: tiny,small)
//!   --include-large         Also generate the large preset (off by default)
//!   --only <name>           Only generate a single preset (overrides --presets)
//!   --clean                 Remove the output directory before generating
//!   --dry-run               Print the plan without writing files
//!
//! Advanced overrides (apply when using --only):
//!   --files <N>             Number of text files
//!   --binary-every <N>      Create one .bin file every N text files (0 disables)
//!   --depth <D>             Directory tree depth
//!   --width <W>             Subdirectories per level
//!   --size <BYTES>          Approx text file size in bytes
//!   --filters <CSV>         Extensions to include (default: rs,md,txt,toml)
//!   --ignores <CSV>         Directory/file names to ignore (default: target,node_modules)
//!
//! Generated structure per dataset (e.g., samples/small):
//!   - project/
//!       src/, docs/, assets/      -> nested trees with text files
//!       target/, node_modules/    -> ignored directories with noise
//!       README.md, Cargo.toml     -> top-level files
//!       (binary files are sprinkled across trees and should be ignored by the tool)
//!
//! Notes:
//! - Binary files are generated to validate that the CLI ignores them by default filters.
//! - This script uses only the Rust standard library.

use std::env;
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

#[derive(Clone, Debug)]
struct DatasetSpec {
    name: String,
    text_files: usize,
    binary_every: usize,
    depth: usize,
    width: usize,
    text_file_size: usize,
    filters: Vec<String>,
    ignores: Vec<String>,
}

impl DatasetSpec {
    fn with_name(name: &str) -> Option<Self> {
        match name {
            "tiny" => Some(Self {
                name: "tiny".into(),
                text_files: 100,
                binary_every: 10,
                depth: 2,
                width: 3,
                text_file_size: 256,
                filters: default_filters(),
                ignores: default_ignores(),
            }),
            "small" => Some(Self {
                name: "small".into(),
                text_files: 1_000,
                binary_every: 20,
                depth: 3,
                width: 4,
                text_file_size: 512,
                filters: default_filters(),
                ignores: default_ignores(),
            }),
            "medium" => Some(Self {
                name: "medium".into(),
                text_files: 5_000,
                binary_every: 25,
                depth: 4,
                width: 4,
                text_file_size: 800,
                filters: default_filters(),
                ignores: default_ignores(),
            }),
            "large" => Some(Self {
                name: "large".into(),
                text_files: 20_000,
                binary_every: 50,
                depth: 5,
                width: 5,
                text_file_size: 1024,
                filters: default_filters(),
                ignores: default_ignores(),
            }),
            _ => None,
        }
    }
}

fn default_filters() -> Vec<String> {
    vec!["rs", "md", "txt", "toml"]
        .into_iter()
        .map(|s| s.to_string())
        .collect()
}

fn default_ignores() -> Vec<String> {
    vec!["target", "node_modules"]
        .into_iter()
        .map(|s| s.to_string())
        .collect()
}

#[derive(Default)]
struct Args {
    out: PathBuf,
    presets: Vec<String>,
    include_large: bool,
    only: Option<String>,
    clean: bool,
    dry_run: bool,
    // overrides for --only
    files: Option<usize>,
    binary_every: Option<usize>,
    depth: Option<usize>,
    width: Option<usize>,
    size: Option<usize>,
    filters: Option<Vec<String>>,
    ignores: Option<Vec<String>>,
}

fn parse_args() -> Args {
    let mut out = PathBuf::from("samples");
    let mut presets: Vec<String> = vec!["tiny".into(), "small".into()];
    let mut include_large = false;
    let mut only: Option<String> = None;
    let mut clean = false;
    let mut dry_run = false;

    let mut files: Option<usize> = None;
    let mut binary_every: Option<usize> = None;
    let mut depth: Option<usize> = None;
    let mut width: Option<usize> = None;
    let mut size: Option<usize> = None;
    let mut filters: Option<Vec<String>> = None;
    let mut ignores: Option<Vec<String>> = None;

    let mut it = env::args().skip(1).peekable();
    while let Some(arg) = it.next() {
        match arg.as_str() {
            "--out" => {
                out = PathBuf::from(expect_value("--out", &mut it));
            }
            "--presets" => {
                presets = parse_csv(expect_value("--presets", &mut it));
            }
            "--include-large" => include_large = true,
            "--only" => {
                only = Some(expect_value("--only", &mut it).to_lowercase());
            }
            "--clean" => clean = true,
            "--dry-run" => dry_run = true,

            // overrides (effective with --only)
            "--files" => files = parse_usize(expect_value("--files", &mut it)),
            "--binary-every" => binary_every = parse_usize(expect_value("--binary-every", &mut it)),
            "--depth" => depth = parse_usize(expect_value("--depth", &mut it)),
            "--width" => width = parse_usize(expect_value("--width", &mut it)),
            "--size" => size = parse_usize(expect_value("--size", &mut it)),
            "--filters" => filters = Some(parse_csv(expect_value("--filters", &mut it))),
            "--ignores" => ignores = Some(parse_csv(expect_value("--ignores", &mut it))),
            "--help" | "-h" => {
                print_help();
                std::process::exit(0);
            }
            other => {
                eprintln!("Unknown argument: {}", other);
                print_help();
                std::process::exit(2);
            }
        }
    }

    if include_large && !presets.iter().any(|p| p == "large") {
        presets.push("large".into());
    }

    Args {
        out,
        presets,
        include_large,
        only,
        clean,
        dry_run,
        files,
        binary_every,
        depth,
        width,
        size,
        filters,
        ignores,
    }
}

fn expect_value<'a, I>(flag: &str, it: &mut I) -> String
where
    I: Iterator<Item = String>,
{
    if let Some(v) = it.next() {
        v
    } else {
        eprintln!("{flag} requires a value");
        std::process::exit(2);
    }
}

fn parse_usize(s: String) -> Option<usize> {
    match s.parse::<usize>() {
        Ok(v) => Some(v),
        Err(_) => {
            eprintln!("Invalid number: {}", s);
            std::process::exit(2);
        }
    }
}

fn parse_csv(s: String) -> Vec<String> {
    s.split(',')
        .map(|x| x.trim().to_string())
        .filter(|x| !x.is_empty())
        .collect()
}

fn print_help() {
    println!(
        r#"generate_samples - generate synthetic datasets for benchmarking

Usage:
  generate_samples [--out DIR] [--presets CSV] [--include-large]
                   [--only NAME] [--clean] [--dry-run]
                   [--files N] [--binary-every N] [--depth D] [--width W]
                   [--size BYTES] [--filters CSV] [--ignores CSV]

Examples:
  # Default (tiny, small) into ./samples
  generate_samples

  # Include medium and large
  generate_samples --presets tiny,small,medium --include-large

  # Only 'small' with custom parameters
  generate_samples --only small --files 5000 --depth 4 --width 4 --size 1024

  # Clean output directory before generating
  generate_samples --clean

  # Dry-run (show plan, don't write)
  generate_samples --dry-run
"#
    );
}

fn write_text_file(path: &Path, bytes: usize) -> io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    let mut f = File::create(path)?;
    // Deterministic multi-line content ~40 bytes per line
    let line = b"let x = 42; // benchmark content line\n";
    let mut written = 0usize;
    while written + line.len() <= bytes {
        f.write_all(line)?;
        written += line.len();
    }
    if written < bytes {
        let remaining = &line[..(bytes - written).min(line.len())];
        f.write_all(remaining)?;
        written += remaining.len();
    }
    // Ensure trailing newline for nicer line-numbered output
    if written == 0 || !path.to_string_lossy().ends_with('\n') {
        f.write_all(b"\n")?;
    }
    Ok(())
}

fn write_binary_file(path: &Path, bytes: usize) -> io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    let mut f = File::create(path)?;
    // Simple reproducible byte pattern
    for i in 0..bytes {
        let b = ((i as u8).wrapping_mul(31)).wrapping_add(7);
        f.write_all(&[b])?;
    }
    Ok(())
}

fn make_nested_dirs(base: &Path, depth: usize, width: usize) -> io::Result<Vec<PathBuf>> {
    let mut dirs = vec![base.to_path_buf()];
    for d in 1..=depth {
        let mut next = Vec::new();
        for parent in &dirs {
            for w in 0..width.max(1) {
                let child = parent.join(format!("d{}_{}", d, w));
                fs::create_dir_all(&child)?;
                next.push(child);
            }
        }
        dirs.extend(next);
    }
    Ok(dirs)
}

fn write_string(path: &Path, s: &str) -> io::Result<()> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    let mut f = File::create(path)?;
    f.write_all(s.as_bytes())
}

fn generate_dataset(root: &Path, spec: &DatasetSpec, dry_run: bool) -> io::Result<()> {
    let dataset_dir = root.join(&spec.name);
    let project_dir = dataset_dir.join("project");
    let src_dir = project_dir.join("src");
    let docs_dir = project_dir.join("docs");
    let assets_dir = project_dir.join("assets");
    let ignored_target = project_dir.join("target");
    let ignored_node_modules = project_dir.join("node_modules");

    println!(
        "- [{}] files={}, bin_every={}, depth={}, width={}, size={}, filters={:?}, ignores={:?}",
        spec.name,
        spec.text_files,
        spec.binary_every,
        spec.depth,
        spec.width,
        spec.text_file_size,
        spec.filters,
        spec.ignores
    );

    if dry_run {
        return Ok(());
    }

    fs::create_dir_all(&src_dir)?;
    fs::create_dir_all(&docs_dir)?;
    fs::create_dir_all(&assets_dir)?;
    fs::create_dir_all(&ignored_target)?;
    fs::create_dir_all(&ignored_node_modules)?;

    // Write dataset README and .gitignore to discourage accidental commits
    write_string(
        &dataset_dir.join("README.txt"),
        &format!(
            "Synthetic dataset '{}'\n\
             - Generated by scripts/generate_samples.rs\n\
             - Intended for local benchmarking and testing\n\
             - May be large; avoid committing this folder\n",
            spec.name
        ),
    )?;
    write_string(
        &dataset_dir.join(".gitignore"),
        "*\n!.gitignore\n!README.txt\n",
    )?;

    let mut all_dirs = Vec::new();
    all_dirs.extend(make_nested_dirs(&src_dir, spec.depth, spec.width)?);
    all_dirs.extend(make_nested_dirs(&docs_dir, spec.depth, spec.width)?);
    all_dirs.extend(make_nested_dirs(&assets_dir, spec.depth, spec.width)?);

    // Distribute text files across dirs with round-robin extensions
    let text_exts = ["rs", "md", "txt", "toml"];
    let mut created = 0usize;
    let mut bin_counter = 0usize;

    'outer: for dir in &all_dirs {
        for i in 0..spec.width.max(1) {
            if created >= spec.text_files {
                break 'outer;
            }
            let ext = text_exts[created % text_exts.len()];
            let path = dir.join(format!("f{}_{}.{}", created, i, ext));
            write_text_file(&path, spec.text_file_size)?;
            created += 1;

            if spec.binary_every > 0 {
                bin_counter += 1;
                if bin_counter.is_multiple_of(spec.binary_every) {
                    let bpath = dir.join(format!("bin_{}_{}.bin", created, i));
                    write_binary_file(&bpath, 2048)?;
                }
            }
        }
    }

    // Populate ignored directories with content that should be skipped by the tool
    write_text_file(&ignored_target.join("ignored.rs"), spec.text_file_size)?;
    write_text_file(
        &ignored_node_modules.join("ignored.js"),
        spec.text_file_size,
    )?;

    // Top-level files
    write_text_file(&project_dir.join("README.md"), spec.text_file_size)?;
    write_text_file(&project_dir.join("Cargo.toml"), spec.text_file_size)?;

    Ok(())
}

fn apply_overrides(spec: &mut DatasetSpec, args: &Args) {
    if let Some(v) = args.files {
        spec.text_files = v;
    }
    if let Some(v) = args.binary_every {
        spec.binary_every = v;
    }
    if let Some(v) = args.depth {
        spec.depth = v;
    }
    if let Some(v) = args.width {
        spec.width = v;
    }
    if let Some(v) = args.size {
        spec.text_file_size = v;
    }
    if let Some(v) = args.filters.clone() {
        spec.filters = v;
    }
    if let Some(v) = args.ignores.clone() {
        spec.ignores = v;
    }
}

fn main() -> io::Result<()> {
    let args = parse_args();

    if args.clean && args.out.exists() && !args.dry_run {
        println!("Cleaning output directory: {}", args.out.display());
        fs::remove_dir_all(&args.out)?;
    }

    println!("Output directory: {}", args.out.display());
    println!("Dry run: {}", args.dry_run);

    let mut specs: Vec<DatasetSpec> = Vec::new();

    if let Some(name) = args.only.clone() {
        let mut spec = DatasetSpec::with_name(&name).unwrap_or_else(|| {
            eprintln!("Unknown preset for --only: {}", name);
            std::process::exit(2);
        });
        apply_overrides(&mut spec, &args);
        specs.push(spec);
    } else {
        for p in &args.presets {
            if let Some(spec) = DatasetSpec::with_name(p) {
                specs.push(spec);
            } else {
                eprintln!("Unknown preset: {}", p);
                std::process::exit(2);
            }
        }
    }

    if args.dry_run {
        println!("Planned datasets:");
        for s in &specs {
            println!(
                "  - {}: files={}, bin_every={}, depth={}, width={}, size={}",
                s.name, s.text_files, s.binary_every, s.depth, s.width, s.text_file_size
            );
        }
        return Ok(());
    }

    fs::create_dir_all(&args.out)?;
    // Guard .gitignore at the root samples folder
    let root_gitignore = args.out.join(".gitignore");
    if !root_gitignore.exists() {
        write_string(&root_gitignore, "*\n!.gitignore\n")?;
    }

    for spec in specs {
        generate_dataset(&args.out, &spec, false)?;
    }

    println!("Done.");
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_expect_value() {
        let mut it = vec!["--out".to_string(), "samples".to_string()].into_iter();
        let flag = it.next().unwrap();
        assert_eq!(flag, "--out");
        let value = expect_value(&flag, &mut it);
        assert_eq!(value, "samples");
    }
}
```