
# symbols

A CLI tool that extracts symbols from a directory and returns a token-efficient breakdown, designed for feeding codebase context into LLM prompts.

## Design

This section describes the target design. See `issues.md` for current implementation status and remaining work.

### Parsing

Uses **tree-sitter** for language-agnostic symbol extraction. Each supported language has a small query file (`.scm`) that maps language-specific node types to a common symbol model (functions, types, modules, exports, etc.).

### Token Budgeting

The tool takes a `--budget` flag that caps the total size of the output, measured in words.

**Granularity function:** A single function `render(level, path, content) -> output` maps each file to its output at a given granularity level. The function is depth-aware (via `path`) and file-size-aware (via `content`), so a single level can behave differently for shallow vs deep files or small vs large files. A `MAX_LEVEL` constant defines the highest available level.

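**Monotonicity invariant:** For any given file, a higher level must never produce fewer words than a lower level. The budget search relies on this: if the total word count could drop at a higher level, a search over levels could skip a level that fits. The invariant is tested against fixture files across all levels.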
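A minimal sketch of how such a fixture check might look. The helper names `word_count` and `first_violation` are hypothetical, and the `render` argument stands in for whatever function the tool actually uses; only its `(level, path, content)` shape is taken from the description above.

```rust
/// Counts whitespace-separated words, the unit the budget is expressed in.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Returns the first level whose output has fewer words than the level
/// below it, or `None` if the invariant holds for this file.
/// (Hypothetical helper; `render` is passed in so the check stays generic.)
fn first_violation(
    max_level: usize,
    path: &str,
    content: &str,
    render: impl Fn(usize, &str, &str) -> String,
) -> Option<usize> {
    let mut previous = 0;
    for level in 0..=max_level {
        let words = word_count(&render(level, path, content));
        if words < previous {
            return Some(level);
        }
        previous = words;
    }
    None
}
```

Running this over every fixture file and asserting that the result is `None` would be one way to express the invariant as a test.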

**Budget algorithm:** Binary search over `0..=MAX_LEVEL` to find the highest level where the total word count across all files fits within the budget.
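The search can be sketched as follows. This is an illustration under assumptions, not the tool's code: `MAX_LEVEL` is set to 3 to match the four starting levels listed below, `pick_level` is a hypothetical name, and `total_words` stands in for summing the word counts of every rendered file at a level.

```rust
/// Highest granularity level. Assumption: four levels, 0 through 3.
const MAX_LEVEL: usize = 3;

/// Binary search for the highest level whose total word count fits `budget`.
///
/// `total_words(level)` must be non-decreasing in `level`; that is the
/// monotonicity invariant. Returns `None` when even level 0 is too large.
fn pick_level(budget: usize, total_words: impl Fn(usize) -> usize) -> Option<usize> {
    let (mut lo, mut hi) = (0usize, MAX_LEVEL);
    let mut best = None;
    while lo <= hi {
        let mid = lo + (hi - lo) / 2;
        if total_words(mid) <= budget {
            // `mid` fits, so remember it and try a more detailed level.
            best = Some(mid);
            lo = mid + 1;
        } else {
            // `mid` is too large; stop if there is nothing below it.
            if mid == 0 {
                break;
            }
            hi = mid - 1;
        }
    }
    best
}
```

With per-level totals of 2, 10, 50, and 400 words, a budget of 60 selects level 2. What the tool should do when level 0 already exceeds the budget is not specified here, so the sketch returns `None` and leaves that decision to the caller.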

**Starting levels** (expected to evolve). Levels differ in which lines are included and how much of each line is shown, but at every level file content is rendered as line-prefixes with line numbers:

0. File paths only
1. Symbol lines, truncated to symbol name (e.g. `pub fn new`)
2. Symbol lines, full line-prefix including signature (e.g. `pub fn new(lang: Language) -> Self {`)
3. Full source (all lines)

Intermediate levels can be added over time (e.g. multi-line signatures, docstrings).
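To make the four levels concrete, here is a rough sketch of line selection per level. The `Symbol` record and `select_lines` function are hypothetical: in the real tool the symbol positions would come from the tree-sitter queries, whereas here they are passed in by the caller.

```rust
/// Hypothetical record for one symbol found in a file.
struct Symbol {
    /// 1-based number of the line the symbol starts on.
    line: usize,
    /// Byte offset within that line where the symbol name ends.
    name_end: usize,
}

/// Selects `(line number, line prefix)` pairs for one file at a given level.
/// Level 0 yields no lines because only the file path is shown.
fn select_lines<'a>(level: u8, content: &'a str, symbols: &[Symbol]) -> Vec<(usize, &'a str)> {
    let lines: Vec<&str> = content.lines().collect();
    match level {
        0 => Vec::new(),
        1 | 2 => symbols
            .iter()
            .filter_map(|s| {
                // Skip symbols that point outside the file.
                let text = *lines.get(s.line.checked_sub(1)?)?;
                // Level 1 cuts the line after the name; level 2 keeps it whole.
                let prefix = if level == 1 { text.get(..s.name_end)? } else { text };
                Some((s.line, prefix))
            })
            .collect(),
        // Level 3 and above: every line of the source.
        _ => lines.iter().enumerate().map(|(i, l)| (i + 1, *l)).collect(),
    }
}
```

Because every returned string is a slice of the original content, each selected line is a prefix of its source line by construction.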

Shallower files are prioritized over deeper ones when budget is tight. Users can zoom into subdirectories by running the tool on them directly with a larger budget.
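How depth feeds into rendering is not spelled out here, so the following is only one possible scheme, stated as an assumption: lower the effective level by one for each directory a file sits below the root. Both `depth` and `effective_level` are hypothetical names.

```rust
use std::path::Path;

/// Number of path components, e.g. 1 for `README.md`, 2 for `src/lib.rs`.
fn depth(path: &str) -> usize {
    Path::new(path).components().count()
}

/// Assumed scheme: each directory below the root costs one level of detail.
/// The result is still non-decreasing in `level`, so the monotonicity
/// invariant is preserved.
fn effective_level(level: usize, path: &str) -> usize {
    level.saturating_sub(depth(path).saturating_sub(1))
}
```

Under this scheme, a requested level of 3 renders `README.md` at level 3 and `src/lib.rs` at level 2, so shallow files keep more detail as the budget shrinks.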

### Output Format

Plain text / markdown, optimized for direct use in LLM prompts.

**Line-prefix constraint:** Each output line is a prefix of the actual line in the source file, preserving original whitespace and indentation. The tool extracts from the source rather than synthesizing new representations — no cross-language normalization of keywords. This means nesting (e.g. methods inside a Rust `impl` block) is represented naturally by the source's own indentation.
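The constraint is easy to check mechanically. A small sketch, assuming rendered output is available as `(line number, text)` pairs; `prefix_violations` is a hypothetical helper, not part of the tool.

```rust
/// Returns the 1-based line numbers whose rendered text is not a prefix of
/// the corresponding source line, including numbers that fall outside the file.
fn prefix_violations(content: &str, rendered: &[(usize, &str)]) -> Vec<usize> {
    let source: Vec<&str> = content.lines().collect();
    let mut bad = Vec::new();
    for &(number, text) in rendered {
        let ok = number
            .checked_sub(1)
            .and_then(|i| source.get(i))
            .map_or(false, |line| line.starts_with(text));
        if !ok {
            bad.push(number);
        }
    }
    bad
}
```

A rendered line with its indentation stripped, such as `pub fn new` for a source line that begins with four spaces, would be reported as a violation.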

Line numbers are right-aligned and followed by an arrow separator (`→`). At the name-only level:

```
src/parser.rs
     1→impl Parser
    12→    pub fn new
    45→    pub fn parse
```

At signature level:

```
src/parser.rs
     1→impl Parser {
    12→    pub fn new(lang: Language) -> Self {
    45→    pub fn parse(&self, source: &str) -> Tree {
```
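The layout in these examples can be reproduced with a short formatting helper. The column width of 6 is inferred from the examples above rather than stated anywhere, and `format_line` and `render_block` are hypothetical names.

```rust
/// Formats one output line: the line number right-aligned in a
/// six-character column (width inferred from the examples), then `→`,
/// then the line prefix.
fn format_line(number: usize, prefix: &str) -> String {
    format!("{:>6}→{}", number, prefix)
}

/// Renders one file: its path on the first line, then each numbered prefix.
fn render_block(path: &str, lines: &[(usize, &str)]) -> String {
    let mut out = String::from(path);
    for (number, prefix) in lines {
        out.push('\n');
        out.push_str(&format_line(*number, prefix));
    }
    out
}
```

Passing an empty slice of lines yields just the path, which matches the "file paths only" level.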

A `--json` flag may be added later for machine consumption.

## Supported Languages

Per-language support requires a tree-sitter grammar and a query file defining:
- Which node types count as symbols
- How to extract signatures
- What signals "public" (e.g., `export` in JS, `pub` in Rust, capitalization in Go)
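To illustrate how the "public" signal differs per language, here is a deliberately simplified string heuristic. It is a stand-in for what the query files would express against the syntax tree, not a description of them; the `Lang` enum and `looks_public` function are hypothetical.

```rust
/// Languages used in the examples above.
enum Lang {
    JavaScript,
    Rust,
    Go,
}

/// Rough textual approximation of each language's visibility rule.
/// `line` is the source line of the symbol and `name` is its identifier.
fn looks_public(lang: Lang, line: &str, name: &str) -> bool {
    let trimmed = line.trim_start();
    match lang {
        // JS: the declaration is prefixed with `export`.
        Lang::JavaScript => trimmed.starts_with("export "),
        // Rust: plain `pub`; restricted forms like `pub(crate)` are ignored here.
        Lang::Rust => trimmed.starts_with("pub "),
        // Go: an identifier is exported when it starts with an uppercase letter.
        Lang::Go => name.chars().next().map_or(false, char::is_uppercase),
    }
}
```

A text match like this misclassifies cases a parser handles correctly, such as the word `pub` inside a comment, which is part of the motivation for working from tree-sitter nodes instead.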