asrch 0.1.0

Agent-safe bounded code search CLI
# CLI Behavior

[日本語](cli_behavior.md)

This document defines the external behavior of `asrch`. The project goals and non-goals are defined in [goals_and_condition.en.md](goals_and_condition.en.md).

## Search Protocol

Agents should not combine multiple terms into a broad OR regex and then read raw matches. They should explore in this order:

1. Use `survey` to compare multiple candidate terms.
2. Choose one useful term and use `scout` to inspect its distribution.
3. Narrow the path and use `sample` to view representative nearby match clusters.
4. Select a file and use `show` for a small number of focused snippets.

## Commands

| Command | Input | Purpose | Default Output Budget |
| --- | --- | --- | --- |
| `survey` | Multiple fixed terms and candidate paths | Compare per-term match counts, file counts, and promising paths | 20 lines / 4,000 bytes |
| `scout` | Single query | Summarize match count, file count, top directories, and top files | 15 lines / 4,000 bytes |
| `sample` | Single query | Cluster nearby matches and show representative short context | 20 lines / 6,000 bytes |
| `show` | Single query and explicit file | Show small snippets with context | 30 lines / 8,000 bytes |

`count` and `terms` are not provided. `count` overlaps with `scout`, and simple frequency-based `terms` output tends to recommend noisy follow-up terms when the initial search is broad.

All commands have hard caps of 40 output lines and 8,000 output bytes. `show` context is capped at 5 lines.
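The relationship between the default budgets in the table and the hard caps can be sketched as follows. This is a minimal illustration, not the tool's implementation: the names `Budget`, `DEFAULT_BUDGETS`, and `effective_budget` are hypothetical, and the sketch assumes that a request above the cap is clamped rather than rejected, which this document does not specify.

```python
from dataclasses import dataclass

# Hard caps stated in this document.
HARD_MAX_LINES = 40
HARD_MAX_BYTES = 8_000
MAX_SHOW_CONTEXT = 5


@dataclass(frozen=True)
class Budget:
    max_lines: int
    max_bytes: int


# Default budgets copied from the command table.
DEFAULT_BUDGETS = {
    "survey": Budget(20, 4_000),
    "scout": Budget(15, 4_000),
    "sample": Budget(20, 6_000),
    "show": Budget(30, 8_000),
}


def effective_budget(command, requested_lines=None, requested_bytes=None):
    """Return the budget for a command, never exceeding the hard caps.

    Assumption: over-cap requests are clamped; the real CLI may reject them.
    """
    default = DEFAULT_BUDGETS[command]
    lines = default.max_lines if requested_lines is None else requested_lines
    size = default.max_bytes if requested_bytes is None else requested_bytes
    return Budget(min(lines, HARD_MAX_LINES), min(size, HARD_MAX_BYTES))


def effective_context(requested_context):
    """Clamp a requested `show` context to the 5-line cap (same assumption)."""
    return min(requested_context, MAX_SHOW_CONTEXT)
```

Under these assumptions every default already fits inside the hard caps, so the clamp only matters when a caller asks for more.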

## Multiple Terms and OR

`survey` accepts up to 12 `--term` values and up to 8 candidate paths. It searches each term/path pair independently and does not print match bodies.

`survey` uses a compact TOON-style output with two sections: `overall[term,matches,files,dominant_path]` and `by_path`. `overall` shows the total match count, total file count, and most active path for each term. `by_path` shows the per-path term distribution and omits zero-match rows. This lets agents choose the next term and path without re-aggregating the results themselves.

`scout` also uses TOON-style output. Metadata is under `scout:`, and distribution rows are emitted as `top_directories[path,matches]` and `top_files[path,matches]`. `sample` and `show` keep their snippet-oriented format for readability.

If one term accounts for at least 80% of all matching lines, `survey` warns that the term dominates the comparison. In fixed-string mode, terms of 3 characters or fewer produce a warning suggesting `--identifier` or `--word`.
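The two warning rules above could be checked roughly like this. The function name `survey_warnings` and the returned `(kind, term)` tuples are hypothetical; the real warning text is not shown in this document. The sketch also assumes that "all matching lines" means the sum of the per-term counts.

```python
DOMINANCE_PERCENT = 80   # "at least 80% of all matching lines"
SHORT_TERM_LENGTH = 3    # "terms of 3 characters or fewer"


def survey_warnings(match_counts, fixed_string=True):
    """Return (kind, term) pairs for a dict of term -> matching line count.

    Assumption: the total is the sum of per-term counts, so a line that
    matches two terms is counted twice.
    """
    warnings = []
    total = sum(match_counts.values())
    for term, count in match_counts.items():
        # Integer arithmetic avoids floating-point edge cases at exactly 80%.
        if total > 0 and count * 100 >= total * DOMINANCE_PERCENT:
            warnings.append(("dominant_term", term))
    if fixed_string:
        for term in match_counts:
            if len(term) <= SHORT_TERM_LENGTH:
                warnings.append(("short_term", term))
    return warnings
```

For example, with counts of 90 for `parse` and 10 for `lex`, this sketch flags `parse` as dominant and `lex` as short.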

`scout`, `sample`, and `show` reject queries containing an unescaped `|` when `--regex` is used. The CLI does not try to fully parse arbitrary regexes and split OR expressions into search plans.
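A check for an unescaped `|` does not require a regex parser; a single left-to-right scan that skips escaped characters is enough. The sketch below is one way to do it. The name `has_unescaped_pipe` is hypothetical, and treating a `|` inside a character class such as `[|]` as unescaped is an assumption consistent with not fully parsing the regex.

```python
def has_unescaped_pipe(pattern: str) -> bool:
    """Return True if the pattern contains a `|` not preceded by an escape."""
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Skip the backslash and the character it escapes.
            i += 2
            continue
        if ch == "|":
            return True
        i += 1
    return False
```

Skipping two characters at each backslash handles `\\|` correctly: the first backslash escapes the second, so the `|` that follows is still unescaped.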

## Search Modes

Queries are fixed strings by default.

- No option: fixed-string substring search
- `--identifier`: ASCII identifier-boundary fixed-string search
- `--word`: word-boundary fixed-string search
- `--regex`: explicit regex search; not accepted by `survey`

Prefer `--identifier` or `--word` for short or common terms to avoid unintended partial matches.

Empty queries are rejected.
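To illustrate how the four modes differ, the sketch below maps each one to a Python `re` pattern. This is only an approximation for explanation: the function `build_pattern` and the mode names passed to it are hypothetical, and the exact boundary definitions used by `asrch` are assumptions (ASCII letters, digits, and `_` for `--identifier`; the `\b` word boundary for `--word`).

```python
import re

IDENT_CHARS = "A-Za-z0-9_"  # assumed ASCII identifier characters


def build_pattern(query, mode="fixed"):
    """Compile a query under one of the four search modes (sketch only)."""
    if not query:
        raise ValueError("empty queries are rejected")
    literal = re.escape(query)  # fixed-string modes never interpret metacharacters
    if mode == "fixed":
        return re.compile(literal)
    if mode == "identifier":
        # No ASCII identifier character directly before or after the match.
        return re.compile(rf"(?<![{IDENT_CHARS}]){literal}(?![{IDENT_CHARS}])")
    if mode == "word":
        return re.compile(rf"\b{literal}\b")
    if mode == "regex":
        return re.compile(query)
    raise ValueError(f"unknown mode: {mode}")
```

Under these assumptions, the default mode finds `id` inside `valid`, while the identifier and word modes do not, which is why the boundary modes are preferable for short terms.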

## `sample` Selection

`sample` is deterministic; it does not use random sampling.

1. Group matches by file.
2. Treat matches within 2 lines of each other as one cluster.
3. Use the first match in each cluster as its representative.
4. For files with many clusters, prefer the first, middle, and last clusters.
5. Round-robin across files and show one line of context around each representative match.

`sample` does not accept `--context`. To see more context, narrow to a file and then use `show <query> <file> --context N`.
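The selection steps can be sketched as below. This is an illustrative reading of the rules, not the actual implementation. All function names are hypothetical, and several details are assumptions: a match joins a cluster when it is within 2 lines of the previous match, "many clusters" means more than three, and files are visited in path order.

```python
from collections import defaultdict

CLUSTER_GAP = 2  # matches within 2 lines belong to the same cluster


def cluster_representatives(line_numbers):
    """Return the first match line of each cluster (steps 2 and 3)."""
    representatives = []
    previous = None
    for number in sorted(line_numbers):
        if previous is None or number - previous > CLUSTER_GAP:
            representatives.append(number)  # a new cluster starts here
        previous = number
    return representatives


def pick_spread(representatives):
    """Keep the first, middle, and last clusters of a busy file (step 4)."""
    if len(representatives) <= 3:
        return list(representatives)
    middle = representatives[len(representatives) // 2]
    return [representatives[0], middle, representatives[-1]]


def select_samples(matches, limit):
    """Pick up to `limit` (path, line) pairs from (path, line) matches."""
    by_file = defaultdict(list)  # step 1: group by file
    for path, line in matches:
        by_file[path].append(line)
    queues = {
        path: pick_spread(cluster_representatives(lines))
        for path, lines in sorted(by_file.items())
    }
    selected = []
    # Step 5: round-robin, taking one representative per file per pass.
    while len(selected) < limit and any(queues.values()):
        for path, queue in queues.items():
            if queue and len(selected) < limit:
                selected.append((path, queue.pop(0)))
    return selected
```

Because every step is a sort or a fixed index choice, the same input always yields the same samples, which matches the deterministic behavior described above.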

## `show` Constraints

`show` accepts only an explicit file path and rejects directories. If there are more than 20 matching lines, or if the internal scan limit is reached, `show` refuses to print snippets and asks the agent to narrow the query.

`show` writes output one snippet at a time. It checks the output budget before starting each snippet, but once a snippet has started it is printed to completion, even if `--context` pushes the output slightly past the configured line budget.
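The refusal rule and the per-snippet budget check might look like the following sketch. The names `should_refuse` and `emit_snippets` are hypothetical. The sketch assumes the check before each snippet is "is the budget already used up" rather than "would this snippet fit", and that the byte budget is checked the same way as the line budget; this document does not settle either point.

```python
MAX_SHOW_MATCHING_LINES = 20


def should_refuse(matching_lines, hit_scan_limit):
    """`show` prints nothing if the query is too broad for a single file."""
    return matching_lines > MAX_SHOW_MATCHING_LINES or hit_scan_limit


def emit_snippets(snippets, max_lines, max_bytes):
    """Return output lines, checking the budget only between snippets.

    Each snippet is a list of already formatted lines.
    """
    output = []
    used_lines = 0
    used_bytes = 0
    for snippet in snippets:
        # Budget check happens here, before a snippet starts.
        if used_lines >= max_lines or used_bytes >= max_bytes:
            break
        # A started snippet is always printed to completion.
        for line in snippet:
            output.append(line)
            used_lines += 1
            used_bytes += len(line.encode("utf-8")) + 1  # +1 for the newline
    return output
```

With a 4-line budget and three-line snippets, this sketch prints two full snippets (6 lines) and never starts a third, which is the small overshoot described above.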

## Broad Searches

If a query has more than 1,000 matching lines or more than 100 matching files, the command reports that the query is broad. If the internal scan limit is reached, scanning stops and the count is reported as a lower bound (`at least`).
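As a rough sketch, the broad-query thresholds and the lower-bound reporting can be expressed as two small functions. `ScanSummary`, `is_broad`, and `format_count` are hypothetical names, and the exact output wording beyond `at least` is an assumption.

```python
from dataclasses import dataclass

BROAD_MATCHING_LINES = 1_000
BROAD_MATCHING_FILES = 100


@dataclass(frozen=True)
class ScanSummary:
    matching_lines: int
    matching_files: int
    hit_scan_limit: bool  # True when scanning stopped early


def is_broad(summary):
    """A query is broad when either threshold is strictly exceeded."""
    return (
        summary.matching_lines > BROAD_MATCHING_LINES
        or summary.matching_files > BROAD_MATCHING_FILES
    )


def format_count(summary):
    """Report a lower bound when the scan did not finish."""
    prefix = "at least " if summary.hit_scan_limit else ""
    return f"{prefix}{summary.matching_lines}"
```

Both thresholds use "more than", so a query with exactly 1,000 matching lines in 100 files is not reported as broad in this sketch.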

Agents should narrow the query or path instead of increasing output limits.

## Default Excludes

Noisy targets are excluded by default through `rg` globs and ignore rules:

- `.git`, `target`, `node_modules`, `vendor`
- `dist`, `build`, `coverage`, `generated`
- `scratch`, `tmp`
- `*.log`, `*.jsonl`, `*.xml`, `*.min.js`, `*.map`

There is no CLI option to disable these default excludes. The default favors safe agent exploration over easy access to noisy generated output.
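For illustration, the exclude list above could be turned into negated `rg --glob` arguments as sketched below. How `asrch` actually passes these patterns to `rg` is not described in this document, so the function `exclude_args` and the exact glob spelling are assumptions.

```python
# Patterns copied from the list above.
DEFAULT_EXCLUDES = [
    ".git", "target", "node_modules", "vendor",
    "dist", "build", "coverage", "generated",
    "scratch", "tmp",
    "*.log", "*.jsonl", "*.xml", "*.min.js", "*.map",
]


def exclude_args(patterns=DEFAULT_EXCLUDES):
    """Build `--glob '!pattern'` argument pairs for an rg command line."""
    args = []
    for pattern in patterns:
        # A leading `!` turns an rg glob into an exclusion.
        args.extend(["--glob", f"!{pattern}"])
    return args
```

Building the arguments as a list rather than a shell string keeps patterns such as `*.log` from being expanded by a shell if the list is handed to `subprocess.run`.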