repo-mapper 0.0.1

Token-budget-respecting repository map generator — ranked structural summary of a codebase
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
# SPEC.md — repo-mapper Behavioral Specification

**Status:** Draft — Reverse-engineered from reference implementations  
**Sources:** `reference/aider/aider/repomap.py` (primary), `reference/RepoMapper/` (secondary)  
**RFC 2119 keywords apply throughout.**

---

## 1. Overview

repo-mapper is a library that produces a compact, token-budget-respecting textual summary
(the "repo map") of a source code repository. It identifies the most structurally relevant
files and definitions given a set of "chat" files (the ones a user is actively editing) and
a set of "other" files (the rest of the repository).

The pipeline is:

1. **Tag extraction** — parse each source file with tree-sitter to extract definition and reference tags.
2. **Graph construction** — build a weighted directed multi-graph of file-to-file relationships via shared identifiers.
3. **Ranking** — run PageRank on that graph, personalized toward chat-relevant files.
4. **Rendering** — convert the ranked tags into a human-readable text output.
5. **Budget enforcement** — binary-search for the largest subset of ranked tags that fits within the token limit.

---

## 2. Data Types

### 2.1 Tag

A **Tag** MUST carry exactly the following fields:

| Field       | Type   | Description                                                       |
|-------------|--------|-------------------------------------------------------------------|
| `rel_fname` | string | Path of the file relative to the repository root                  |
| `fname`     | string | Absolute path to the file                                         |
| `line`      | int    | Source line where the identifier appears (see §3.4 for indexing)  |
| `name`      | string | The identifier text                                               |
| `kind`      | string | Either `"def"` (definition) or `"ref"` (reference)               |

### 2.2 Path Representations

The spec uses two tiers of file paths consistently throughout:

- **Absolute path** (`fname`): the OS-native absolute path to the file. Used as cache keys (§5.3) and for reading file content.
- **Relative path** (`rel_fname`): the path relative to the repository root, computed as `relpath(fname, root)`. Used as graph node identifiers, map output labels, and ranking keys.

The `rel_fname` for any file MUST be computed as the relative path from `root`. On platforms where `relpath` raises an error (e.g., Windows cross-drive paths), the implementation MUST fall back to the absolute path unchanged.

`chat_rel_fnames` is the set of relative paths derived from `chat_fnames`.
`other_rel_fnames` is the set of relative paths derived from `other_fnames`.

Before processing, the implementation MUST compute the combined file set as `set(chat_fnames) ∪ set(other_fnames)` (deduplicated). Files present in both sets are processed once; any file in `chat_fnames` is treated as a chat file regardless of whether it also appears in `other_fnames`. The combined set MUST be sorted lexicographically before tag extraction begins, for deterministic processing order.

### 2.3 FileEntry

During graph construction, files are represented as their `rel_fname` string (nodes in the graph).

---

## 3. Tag Extraction

### 3.1 Per-file extraction

For each source file the implementation MUST:

1. Determine the file's language from its filename (using a language-detection lookup table equivalent to `grep-ast`'s `filename_to_lang`).
2. If the language is unrecognized, produce no tags and skip the file silently.
3. Load the corresponding tree-sitter parser for that language. If the parser is unavailable, skip the file.
4. Load the `.scm` query file for that language (see §4). If no `.scm` exists for the language, produce no tags and skip the file.
5. Read the file's text content as UTF-8. If the file is empty or unreadable, produce no tags.
6. Parse the source text. Execute the query using a `QueryCursor`, passing the source bytes
   as the text provider (`QueryCursor::captures(query, root_node, source_bytes)`). Iterate
   captures in source order; each iteration yields a `(QueryMatch, capture_index)` pair.
   Resolve the capture name as `query.capture_names()[capture.index]`.
7. For each capture whose name begins with `"name.definition."`, emit a `Tag` with `kind = "def"`.
8. For each capture whose name begins with `"name.reference."`, emit a `Tag` with `kind = "ref"`.
9. All other capture names MUST be ignored (not emitted as tags).
10. The tag's `name` field MUST be the UTF-8 decoded text of the captured tree-sitter node.

### 3.2 Pygments reference fallback

If tree-sitter extraction for a file produces **at least one definition tag and zero reference tags**, the implementation MUST apply a tree-sitter identifier fallback to backfill reference tags:

1. Locate a supplemental identifier query file for the language using the same search path as
   the primary tag query (§4.1), but with the suffix `-idents.scm` instead of `-tags.scm`.
   For example: `queries/tree-sitter-language-pack/typescript-idents.scm`.
   If no `-idents.scm` file exists for the language, the fallback MUST be silently skipped —
   definition tags already produced are retained and no reference tags are emitted.
2. Execute the identifier query against the same already-parsed AST root node.
   Captures in the `-idents.scm` file MUST use `name.reference.*` capture names; any other
   capture name MUST be ignored (same dispatch rule as §3.1 steps 7–9).
3. For each matched capture, emit a `Tag` with:
   - `kind = "ref"`
   - `name` = the UTF-8 decoded text of the node
   - `line = -1` (sentinel: no meaningful line number)
4. The backfilled reference tags are appended after the definition tags for this file.

If tree-sitter produces **no definitions and no references**, the identifier fallback MUST NOT be applied.

If tree-sitter produces **references** (with or without definitions), the identifier fallback MUST NOT be applied.

[DESIGN NOTE: The reference implementation uses Python Pygments' `Token.Name` family for this
fallback. The Rust implementation replaces this with a supplemental tree-sitter `-idents.scm`
query on the already-parsed AST, eliminating the Pygments runtime dependency. This is the
primary mechanism ensuring TypeScript files participate correctly in graph construction: the
TypeScript tag query has no `name.reference.call` capture (§4.3), so TypeScript files always
trigger this fallback. The TypeScript `-idents.scm` MUST capture `identifier`, `type_identifier`,
and `property_identifier` nodes as `name.reference.identifier`.]
### 3.3 Non-file inputs

If a path does not resolve to a regular file (e.g., directory, symlink to non-file, or missing), the implementation MUST skip it, emit a warning to the caller once per unique path, and produce no tags for it.

### 3.4 Line number indexing

Line numbers MUST be 1-indexed. The `line` field of a `Tag` MUST store the 1-indexed line
number of the captured node (i.e., `node.start_point[0] + 1` in tree-sitter terms).

[INTENTIONAL DEVIATION FROM REFERENCE: `repomap.py` stores `node.start_point[0]` with no
`+ 1` (0-indexed). The Rust implementation uses 1-indexed line numbers. This is a deliberate
improvement for user-facing output consistency.]

The `line` field for identifier-fallback reference tags MUST be `-1` (sentinel: no meaningful line number).

### 3.5 Known bug in reference implementation: captures-processing

[KNOWN BUG IN REFERENCE: `repomap.py` lines 304–315 have a loop indentation bug. Two statements are placed outside the inner `for node in nodes` loop: `captures_by_tag[tag].append(node)` and `matches.append((node, tag))`. This produces two failure modes: (1) Non-TSL_PACK path — `matches` only collects the last `(node, tag)` per capture-type name; all earlier nodes for the same capture type are silently discarded (e.g., a file with 10 class definitions produces exactly 1 definition tag). (2) TSL_PACK path — `captures_by_tag[tag]` receives an extra duplicate of the last node outside the inner loop, generating one duplicate tag per capture type. Intended behavior: emit exactly one `Tag` per captured node — no data loss and no duplicates. The Rust implementation MUST NOT replicate either bug.]
---

## 4. Language and Query Support

### 4.1 Query file resolution

For a given language identifier, the implementation MUST locate the corresponding `.scm`
query file. Two search paths are supported (in priority order):

1. `queries/tree-sitter-language-pack/{lang}-tags.scm`
2. `queries/tree-sitter-languages/{lang}-tags.scm`

If neither path resolves to an existing file, the language has no query support and MUST be skipped.

Note: the `.scm` tag query files are bundled by this implementation itself. The Rust tree-sitter
parser crate (`tree-sitter-language-pack` or equivalent) provides parsers only; it does NOT
supply `name.definition.*` / `name.reference.*` tag queries.

### 4.2 Supported languages (query coverage)

The following language identifiers have known query coverage based on the reference files.
This list is informational; the actual runtime set depends on which `.scm` files are bundled:

`arduino`, `c`, `chatito`, `commonlisp`, `cpp`, `csharp`, `d`, `dart`, `elisp`,
`elixir`, `elm`, `gleam`, `go`, `hcl`, `java`, `javascript`, `kotlin`, `lua`, `ocaml`,
`ocaml_interface`, `php`, `pony`, `properties`, `python`, `ql`, `r`, `racket`, `ruby`,
`rust`, `scala`, `solidity`, `swift`, `typescript`, `udev`

### 4.3 What each supported language defines

The following captures are present in the reference query files. The Rust reimplementation
MUST produce equivalent captures.

**Python:**
- `name.definition.class` — class names
- `name.definition.function` — function names
- `name.definition.constant` — module-level identifier assignments (TSL_PACK path only; absent from tree-sitter-languages fallback)
- `name.reference.call` — call targets (direct identifiers and attribute access)

**Rust:**
- `name.definition.class` — struct, enum, union, and type alias names
- `name.definition.method` — methods inside `impl` blocks
- `name.definition.function` — free functions
- `name.definition.interface` — trait names
- `name.definition.module` — `mod` items
- `name.definition.macro` — macro definitions
- `name.reference.call` — direct call and field method call targets; macro invocations
- `name.reference.implementation` — trait and type names in `impl` items

**Go** (the following captures are present in both backends):
- `name.definition.function` — top-level function declarations
- `name.definition.method` — method declarations
- `name.definition.type` — type specifications
- `name.reference.call` — call targets
- `name.reference.type` — type identifier references

The following Go captures are present only in the TSL_PACK backend (`CACHE_VERSION=4`):
- `name.definition.module` — package clause identifier
- `name.definition.interface` — interface type specifications
- `name.definition.class` — struct type specifications
- `name.reference.module` — entire `import_spec` node text (not just the package name)
- `name.definition.variable` — var declarations
- `name.definition.constant` — const declarations

**JavaScript:**
- `name.definition.method` — method definitions (excluding `constructor`)
- `name.definition.class` — class and class declaration names
- `name.definition.function` — function declarations, generator functions, arrow/function variables and assignments, and object-literal function-valued properties
- `name.reference.call` — call targets (excluding `require`)
- `name.reference.class` — `new` expression constructors

**TypeScript** (standalone SCM; does NOT inherit from or extend the JavaScript SCM):
- `name.definition.function` — function signatures and function declarations
- `name.definition.method` — method signatures, abstract method signatures, and method definitions
- `name.definition.class` — abstract class declarations, class declarations, and interface declarations (used as class)
- `name.definition.interface` — interface declarations
- `name.definition.module` — module declarations
- `name.definition.type` — type alias declarations
- `name.definition.enum` — enum declarations
- `name.reference.type` — type annotation identifiers
- `name.reference.class` — `new` expression constructors

Note: TypeScript's SCM has **no `name.reference.call` capture**. Tree-sitter query files have no inheritance mechanism; the TypeScript SCM is a completely standalone file. TypeScript files that contain no type annotations and no `new` expressions will produce zero reference tags, triggering the Pygments fallback (§3.2).

**Java:**
- `name.definition.class` — class declarations
- `name.definition.method` — method declarations
- `name.definition.interface` — interface declarations
- `name.reference.call` — method invocation names
- `name.reference.implementation` — type identifiers in `implements`/`extends` lists
- `name.reference.class` — `new` expressions and superclass references

---

## 5. Tag Cache

### 5.1 Purpose

Tag extraction is computationally expensive. The implementation MUST cache parsed tags on
disk and reuse them when the file has not changed.

### 5.2 Cache location

The persistent cache MUST be stored at `{root}/.aider.tags.cache.v{CACHE_VERSION}` where
`CACHE_VERSION` is an integer that MUST be incremented whenever the cache format changes or
query files are updated in a way that invalidates prior results.

[KNOWN REFERENCE BEHAVIOR: The effective `CACHE_VERSION` is runtime-determined: `3` when using the tree-sitter-languages backend and `4` when using the tree-sitter-language-pack backend (`USING_TSL_PACK = True`). The Rust implementation MUST use distinct cache directory names or version numbers for each backend variant to prevent cross-contamination of cached tag data.]

### 5.3 Cache key and value

- **Key:** the absolute path string of the file.
- **Value:** a record containing:
  - `mtime`: the file's modification timestamp (floating-point seconds) at the time of
    last extraction.
  - `data`: the list of `Tag` objects produced by extraction.

### 5.4 Cache validity

On a cache lookup, the implementation MUST check that the stored `mtime` equals the current
file modification time. If they differ, the cache entry MUST be treated as a miss and the
file re-parsed.

### 5.5 Cache error recovery

If the persistent cache becomes corrupt or unusable (e.g., SQLite errors), the
implementation MUST:

1. Attempt to delete and recreate the cache directory.
2. If recreation also fails, fall back to an in-memory cache (an ordinary hash map) for
   the duration of the current process. The in-memory cache MUST follow the same key/value
   semantics as the persistent cache.
3. Emit a warning to the caller when falling back.

### 5.6 In-memory map cache

The implementation MUST maintain a separate in-memory cache for the full computed repo map
string. The cache is process-local and is not persisted to disk. The cache key is
mode-dependent (see §11 for the full specification): in `"auto"` mode the key is a 5-tuple
including `mentioned_fnames` and `mentioned_idents`; in `"files"` mode the key is a 3-tuple
excluding those fields. `"manual"` mode does NOT use the keyed map cache (see §11). Each
non-None key component is a `tuple(sorted(...))` of values. Empty collections are
represented as `None` in the key rather than empty tuples.

---

## 6. Graph Construction

### 6.1 Data structures

The following intermediate structures are built from all tags across all files:

- `defines`: `identifier_name → set<rel_fname>` — maps each identifier to all files that
  define it.
- `references`: `identifier_name → list<rel_fname>` — maps each identifier to all files
  that reference it (with multiplicity: the same file may appear multiple times if it
  references the same name in multiple places).
- `definitions`: `(rel_fname, identifier_name) → set<Tag>` — maps each (file, name) pair
  to the set of definition tags for that name in that file.

### 6.2 No-reference fallback

If the global `references` structure is empty after processing all files, the implementation
MUST substitute `references = {name: list(fnames) for name, fnames in defines.items()}`,
treating every defined identifier as also self-referenced by every file that defines it.

### 6.3 Graph type

The graph MUST be a weighted directed multi-graph (multiple edges with distinct `ident`
attributes are allowed between the same pair of nodes).

### 6.4 Edge weights

For each identifier `ident` that appears in both `defines` and `references`:

1. Compute a base multiplier `mul`:

   | Condition                                                             | Effect      |
   |-----------------------------------------------------------------------|-------------|
   | `ident` is in `mentioned_idents`                                      | `mul *= 10` |
   | `ident` length ≥ 8 AND (`_` present with ≥1 alpha char, OR `-` present with ≥1 alpha char, OR both upper and lower case present) | `mul *= 10` |
   | `ident` starts with `"_"`                                             | `mul *= 0.1`|
   | More than 5 distinct files define `ident`                             | `mul *= 0.1`|

   Each applicable condition is applied independently and multiplicatively.

2. Compute `num_refs` = the number of times each unique `referencer` appears in
   `references[ident]` (i.e., count occurrences; `references[ident]` is a list with
   multiplicity). For each unique `(referencer, definer)` pair:

   - Compute `use_mul = mul`.
   - If `referencer` is one of the chat files (in `chat_rel_fnames`): `use_mul *= 50`.
   - Edge weight = `use_mul × sqrt(num_refs)`.
   - Add a directed edge `referencer → definer` with this weight and `ident` metadata.

3. Note: self-edges (referencer == definer) are allowed and not filtered out.

### 6.5 Self-edges for unreferenced definitions

For each identifier in `defines` that does NOT appear in `references` (after the no-reference
fallback in §6.2 has been applied — i.e., this applies when the fallback was NOT triggered
and the identifier genuinely has no references), the implementation MUST add a self-edge
for every file that defines it:

```
definer → definer, weight = self_edge_weight (see §14), ident = identifier
```

This ensures that files with only definitions still participate in PageRank.

[SOURCE AMBIGUITY: The self-edge logic runs against `defines.keys()` filtered by `ident not in references`. After the no-reference fallback (§6.2), all define-only identifiers would already be in references, so this branch only activates when there are *some* references globally but specific identifiers have no references at all.]

---

## 7. PageRank Ranking

### 7.1 Personalization vector

Before running PageRank, the implementation MUST compute a per-file personalization score.
Let `N` be the total number of unique files across `chat_fnames` and `other_fnames`.
Let `personalize = 100.0 / N`.

For each file (identified by `rel_fname`), compute `current_pers` as follows:

1. Start at 0.
2. If the file's absolute path is in `chat_fnames`: `current_pers += personalize`.
3. If `rel_fname` is in `mentioned_fnames`: `current_pers = max(current_pers, personalize)`.
   (This avoids double-counting files that are both in chat and explicitly mentioned.)
4. If any component of `rel_fname`'s path (directory parts, basename with extension,
   basename without extension) appears in `mentioned_idents`:
   `current_pers += personalize`.

Only files with `current_pers > 0` are included in the personalization dict.

5. For each file in `anchor_contributions` (derived in §7.1a):
   `current_pers += anchor_contributions[file]`.
   This step is independent of steps 2–4; contributions are additive.
   Anchor contributions are pre-computed weights that already incorporate the multiplier
   and any ambiguity division (see §7.1a).

The `dangling` nodes dict MUST be set equal to the personalization dict when personalization
is non-empty.

### 7.1a Anchor resolution and weight computation

Before computing the personalization vector, the implementation MUST resolve anchor
inputs to `anchor_contributions: HashMap<rel_fname, f64>`. Let `personalize = 100/N`
and `anchor_w = anchor_weight_multiplier`.

1. **`anchor_fnames`**: for each path, compute `rel_fname`. Add `personalize * anchor_w`
   to `anchor_contributions[rel_fname]`. Additive across multiple inputs.

2. **`anchor_idents`**: for each identifier, look up `tag_index.defines[ident]`.
   - If not found: silently ignored (debug log only).
   - If found in N_matches files: add `personalize * anchor_w / N_matches` to each
     defining file in `anchor_contributions`.
   - If N_matches > 1: emit a `WARN` message naming the files and suggesting
     the `file:ident` form for disambiguation.

3. **`anchor_scoped`**: for each `(path, ident)` pair, compute `rel_fname` from path,
   add `personalize * anchor_w` to `anchor_contributions[rel_fname]`, and add `ident`
   to a `scoped_idents` set.

After resolution, `scoped_idents` MUST be merged with `mentioned_idents` before
`build_graph` is called, so scoped identifiers receive the ×10 edge-weight boost (§6.4).

Anchor resolution MUST occur after the TagIndex is built (§6.1) and before
personalization is computed (§7.1).

### 7.1b Anchor forced inclusion

After building `ranked_tags` (§7.5) and prepending important files (§8.2),
the implementation MUST extract all existing entries for anchor files from their
natural (low) rank position and move them to the front of `ranked_tags`.
For any anchor file with no existing entry (not in graph), a `Bare` entry with
`score = f64::MAX` MUST be inserted at the front.

This ensures anchors are always within the binary-search window regardless of budget,
while preserving any definition tags they carry from the normal ranking pipeline.

### 7.2 PageRank algorithm

The implementation MUST run PageRank on the constructed graph with:

- `weight = "weight"` (use the edge weight attribute)
- `personalization` and `dangling` dicts as computed in §7.1 (or absent if empty)
- Damping factor: **0.85** (configurable via `pagerank_damping`; see §14)
- Convergence tolerance: **1.0e-6** (configurable via `pagerank_tol`; see §14)
- Maximum iterations: **100** (configurable via `pagerank_max_iter`; see §14)

Note: the reference implementation relies on NetworkX's default values for all three parameters
— no explicit values are passed to `nx.pagerank()`. The Rust implementation MUST pass them
explicitly since no equivalent implicit defaults exist.

### 7.3 ZeroDivisionError handling

If PageRank raises a division-by-zero error:

1. Retry PageRank without personalization.
2. If it raises again, return an empty result (no ranked tags).

### 7.4 Distributing rank to definitions

After PageRank assigns a scalar rank to each graph node (file), the implementation MUST
distribute each node's rank across its outgoing edges proportionally by weight:

```
For each node `src` in the graph:
    src_rank = pagerank[src]
    total_weight = sum of weights of all outgoing edges from src
    For each outgoing edge (src → dst, weight w, ident i):
        ranked_definitions[(dst, i)] += src_rank * w / total_weight
```

This produces a per-(file, identifier) rank score.

### 7.5 Building the ranked tag list

1. Sort `ranked_definitions` items descending by score (tie-break: reverse-lexicographic
   (descending) on the `(dst, ident)` key).
2. For each `(fname, ident)` entry (excluding files in `chat_rel_fnames`), append all
   definition `Tag` objects from `definitions[(fname, ident)]` to `ranked_tags`.
3. Track which files have already been included via their definition tags.
4. Iterate over all graph nodes sorted by PageRank score descending:
   - Remove the node from `rel_other_fnames_without_tags` (tracking which other-files
     are represented in the graph at all).
   - If the file is not already included in `ranked_tags`, append a bare file entry
     `(rel_fname,)` (a 1-element tuple, no associated tags).
5. For any file in `other_fnames` that was never added to the graph at all
   (remains in `rel_other_fnames_without_tags`), append a bare file entry `(rel_fname,)`
   at the end of `ranked_tags`.
6. If `exclude_unranked` is `true` (see §14), remove from `ranked_tags` any bare file entry
   whose computed PageRank score is ≤ **0.0001**. Files appended in step 5 (never added to
   the graph) have an implicit PageRank score of 0.0 and MUST be treated as ≤ 0.0001 for
   this filter. Definition-tag entries (files with at least one tag in the output) MUST NOT
   be removed by this filter.

---

## 8. Important Files

### 8.1 Definition

A set of "important" files is pre-defined by filename and directory pattern.
These represent conventional repository files that agents and users expect to see at the top
of any repo map. The implementation MUST treat the following paths as important:

**Version control:** `.gitignore`, `.gitattributes`

**Documentation:** `README`, `README.md`, `README.txt`, `README.rst`, `CONTRIBUTING`,
`CONTRIBUTING.md`, `CONTRIBUTING.txt`, `CONTRIBUTING.rst`, `LICENSE`, `LICENSE.md`,
`LICENSE.txt`, `CHANGELOG`, `CHANGELOG.md`, `CHANGELOG.txt`, `CHANGELOG.rst`,
`SECURITY`, `SECURITY.md`, `SECURITY.txt`, `CODEOWNERS`

**Package management:** `requirements.txt`, `Pipfile`, `Pipfile.lock`, `pyproject.toml`,
`setup.py`, `setup.cfg`, `package.json`, `package-lock.json`, `yarn.lock`,
`npm-shrinkwrap.json`, `Gemfile`, `Gemfile.lock`, `composer.json`, `composer.lock`,
`pom.xml`, `build.gradle`, `build.gradle.kts`, `build.sbt`, `go.mod`, `go.sum`,
`Cargo.toml`, `Cargo.lock`, `mix.exs`, `rebar.config`, `project.clj`, `Podfile`,
`Cartfile`, `dub.json`, `dub.sdl`

**Configuration:** `.env`, `.env.example`, `.editorconfig`, `tsconfig.json`, `jsconfig.json`,
`.babelrc`, `babel.config.js`, `.eslintrc`, `.eslintignore`, `.prettierrc`, `.stylelintrc`,
`tslint.json`, `.pylintrc`, `.flake8`, `.rubocop.yml`, `.scalafmt.conf`, `.dockerignore`,
`.gitpod.yml`, `sonar-project.properties`, `renovate.json`, `dependabot.yml`,
`.pre-commit-config.yaml`, `mypy.ini`, `tox.ini`, `.yamllint`, `pyrightconfig.json`

**Build:** `webpack.config.js`, `rollup.config.js`, `parcel.config.js`, `gulpfile.js`,
`Gruntfile.js`, `build.xml`, `build.boot`, `project.json`, `build.cake`, `MANIFEST.in`

**Testing:** `pytest.ini`, `phpunit.xml`, `karma.conf.js`, `jest.config.js`, `cypress.json`,
`.nycrc`, `.nycrc.json`

**CI/CD:** `.travis.yml`, `.gitlab-ci.yml`, `Jenkinsfile`, `azure-pipelines.yml`,
`bitbucket-pipelines.yml`, `appveyor.yml`, `circle.yml`, `.circleci/config.yml`,
`.github/dependabot.yml`, `codecov.yml`, `.coveragerc`

**Containers:** `Dockerfile`, `docker-compose.yml`, `docker-compose.override.yml`

**Cloud/serverless:** `serverless.yml`, `firebase.json`, `now.json`, `netlify.toml`,
`vercel.json`, `app.yaml`, `terraform.tf`, `main.tf`, `cloudformation.yaml`,
`cloudformation.json`, `ansible.cfg`, `kubernetes.yaml`, `k8s.yaml`

**Database:** `schema.sql`, `liquibase.properties`, `flyway.conf`

**Frameworks:** `next.config.js`, `nuxt.config.js`, `vue.config.js`, `angular.json`,
`gatsby-config.js`, `gridsome.config.js`

**API docs:** `swagger.yaml`, `swagger.json`, `openapi.yaml`, `openapi.json`

**Dev environment:** `.nvmrc`, `.ruby-version`, `.python-version`, `Vagrantfile`

**Quality:** `.codeclimate.yml`, `codecov.yml`

**Docs tooling:** `mkdocs.yml`, `_config.yml`, `book.toml`, `readthedocs.yml`, `.readthedocs.yaml`

**Package registries:** `.npmrc`, `.yarnrc`

**Linting:** `.isort.cfg`, `.markdownlint.json`, `.markdownlint.yaml`

**Security:** `.bandit`, `.secrets.baseline`

**Misc:** `.pypirc`, `.gitkeep`, `.npmignore`

**Directory pattern:** Any `*.yml` file directly inside `.github/workflows/` is important
regardless of its exact filename.

Matching is by `rel_fname` (normalized path from repo root). A file matches if its normalized
path equals one of the listed entries exactly, or satisfies the `.github/workflows/` directory
pattern above.

### 8.2 Ordering guarantee

Before the binary search (§10.2), the implementation MUST prepend important-file entries to
the ranked tag list:

1. Compute `important = filter_important_files(other_rel_fnames)` — the subset of
   `other_fnames` whose relative paths match the important-file list.
2. Remove from `important` any file already present in `ranked_tags` (to avoid duplication).
3. Convert each remaining entry to a bare file entry `(rel_fname,)`.
4. Prepend these entries to the front of `ranked_tags`.

The result is: `[important_files...] + [ranked_definition_tags...] + [unranked_files...]`

**Rationale:** Important files always appear at the top of the map when budget allows, even
if PageRank would score them low.

---

## 9. Map Rendering

### 9.1 to_tree

If the input tag slice is empty, `to_tree` MUST return an empty string `""` immediately
(not `"\n"`).

Otherwise, given a slice of the ranked tag list and the set of `chat_rel_fnames`, the implementation
MUST produce a UTF-8 text output as follows:

1. Sort the input tag slice lexicographically. (Tags sort by `(rel_fname, fname, line, name, kind)`;
   bare 1-element tuples sort by their `rel_fname` string.)
   The combined file set deduplication in §2.2 guarantees a file never appears as both a full
   Tag and a bare tuple in the same input; no tie-break between the two types is needed.
2. Append a sentinel entry `(None,)` at the end to flush the last file.
3. Iterate through the sorted-plus-sentinel list. Maintain `cur_fname`, `cur_abs_fname`,
   and `lois` (list of integer line numbers, or `None`).
4. If `tag[0]` is in `chat_rel_fnames`: skip this tag entirely (do not execute the
   remaining steps for this iteration).
5. Whenever `tag[0] != cur_fname` (file boundary):
   a. If `lois is not None`: emit `"\n" + cur_fname + ":\n"` followed by the rendered tree
      for `(cur_abs_fname, cur_fname, lois)` (see §9.2). Reset `lois = None`.
   b. Else if `cur_fname is not None` (and it is not `None`): emit `"\n" + cur_fname + "\n"`.
   c. If the incoming tag is a full `Tag` object (not a bare tuple): set `lois = []` and
      record `cur_abs_fname = tag.fname`.
   d. Set `cur_fname = tag[0]`.
6. If `lois is not None` (after the file-boundary check):
   append `tag.line` to `lois`.
7. After building the full output string, truncate every line to at most `max_line_length`
   characters (see §14).
8. Append a final newline `"\n"` to the output.

### 9.2 render_tree

Given `(abs_fname, rel_fname, lois)`:

1. Read the file's text content. If unreadable, return an empty string.
2. Ensure the content ends with a newline (append `"\n"` if not).
3. Create (or retrieve from cache) a `TreeContext` for the file with parameters:
   - `color = false`
   - `line_number = false`
   - `child_context = false`
   - `last_line = false`
   - `margin = 0`
   - `mark_lois = false`
   - `loi_pad = 0`
   - `show_top_of_file_parent_scope = false`
4. Reset the context's `lines_of_interest` to an empty set, call `add_lines_of_interest(lois)`,
   call `add_context()`, and return the result of `format()`.

The render result MUST be cached by the key `(rel_fname, tuple(sorted(lois)), mtime)`. The
`TreeContext` object itself MUST be stored in a dict keyed by `rel_fname` alone; on lookup
the stored `mtime` MUST be compared against the current file `mtime` and the entry replaced
on mismatch (old entries are overwritten, not accumulated).

The `tree_cache` (render result cache) MUST be cleared at the start of each uncached map
computation (§10.2). The `tree_context_cache` persists across computations for the lifetime
of the `RepoMap` instance.

---

## 10. Token Budget and Binary Search

### 10.1 Token counting

The implementation MUST count tokens using a tiktoken-compatible BPE encoder
(domain constraint: tiktoken is the standard tokenizer for OpenAI-family models; the
`tiktoken-rs` crate wraps the same underlying Rust BPE implementation that the Python
`tiktoken` package uses, producing byte-identical token counts). The specific encoding
SHOULD default to `cl100k_base` unless the caller overrides it at construction time.
For texts shorter than **200 characters**, the encoder is called directly on the full text.
For longer texts, the implementation MUST use sampling:

1. Split the text into lines, preserving line endings (`splitlines(keepends=True)`).
2. Compute `step = num_lines // 100 or 1`. Sample every `step`-th line.
3. Count tokens in the sampled text.
4. Estimate: `est_tokens = (sample_tokens / len(sample_text)) * len_text`.

### 10.2 Binary search

Given `max_map_tokens` and the full `ranked_tags` list of length `N`:

1. Initialize:
   - `lower_bound = 0`, `upper_bound = N`
   - `best_tree = None`, `best_tree_tokens = 0`
   - `middle = min(max_map_tokens // 25, N)` (floor division)
2. Loop while `lower_bound <= upper_bound`:
   a. Render `to_tree(ranked_tags[:middle], chat_rel_fnames)`.
   b. Count tokens: `num_tokens`.
   c. Compute `pct_err = |num_tokens - max_map_tokens| / max_map_tokens`.
   d. If `(num_tokens <= max_map_tokens AND num_tokens > best_tree_tokens) OR pct_err < 0.15`:
      - Update `best_tree = current_tree`, `best_tree_tokens = num_tokens`.
      - If `pct_err < 0.15`: **break early** (result is close enough).
   e. Standard binary search step:
      - If `num_tokens < max_map_tokens`: `lower_bound = middle + 1`
      - Else: `upper_bound = middle - 1`
      - `middle = (lower_bound + upper_bound) // 2`
3. Return `best_tree` (may be `None` if no valid tree was found).

The **0.15 (15%) tolerance** means the implementation MAY return a tree that slightly
exceeds `max_map_tokens` if the overrun is within 15%.

### 10.3 No-chat-files mode

When `chat_fnames` is empty AND `max_context_window` is set, compute:

```
effective_max = min(max_map_tokens * map_mul_no_files, max_context_window - 4096)
```

If `effective_max > 0`, use it as `max_map_tokens` for the current call only. (If
`max_context_window < 4096`, `effective_max` is negative and the substitution does NOT occur.)

The default `map_mul_no_files` MUST be **8**.

---

## 11. Map Cache (Refresh Policy)

The implementation MUST support four named refresh modes:

| Mode       | Behavior                                                                                     |
|------------|----------------------------------------------------------------------------------------------|
| `"manual"` | If a map has been computed before in this session, return it as-is. Never recompute.         |
| `"always"` | Never use the in-memory map cache. Always recompute.                                         |
| `"files"`  | Always use the in-memory map cache if the key matches.                                       |
| `"auto"`   | Use the cache only if the previous computation took more than **1.0 second**. Otherwise recompute. |

`force_refresh = true` bypasses all cache checks and forces recomputation.

Regardless of mode or `force_refresh`, the implementation MUST write the computed result to
both the keyed map cache and `last_map` after every computation. `force_refresh` bypasses
cache **reads** only; the write-back always occurs.

`"manual"` mode does NOT consult the keyed map cache. It stores the most recent computed
result in a separate, keyless per-instance field (`last_map`) and returns it on all
subsequent calls regardless of input parameters.

In `"auto"` mode the cache key MUST be the 5-tuple
`(tuple(sorted(chat_fnames)), tuple(sorted(other_fnames)), max_map_tokens, tuple(sorted(mentioned_fnames)), tuple(sorted(mentioned_idents)))`.
In `"files"` mode the key MUST be the 3-tuple
`(tuple(sorted(chat_fnames)), tuple(sorted(other_fnames)), max_map_tokens)`, excluding
`mentioned_fnames` and `mentioned_idents`. Empty collections are represented as `None`
rather than empty tuples. The reference source is unambiguous on this point
(`if self.refresh == "auto"` guard).

---

## 12. Output Format

### 12.1 Final string assembly

The return value of `get_repo_map` MUST be:

```
{repo_content_prefix} + {files_listing}
```

Where:
- `repo_content_prefix` is a caller-supplied prefix string (MAY contain a `{other}` 
  placeholder that is substituted with `"other "` when chat files are present or `""` when
  they are not).
- `files_listing` is the output of the binary-search-selected `to_tree` call.

If `files_listing` is empty or `None`, `get_repo_map` MUST return `None` / empty.

### 12.2 File entries with tags

```
\n{rel_fname}:\n{rendered_tree_content}
```

### 12.3 File entries without tags

```
\n{rel_fname}\n
```

### 12.4 Line truncation

Every line in the output MUST be truncated to at most `max_line_length` characters
(see §14; default **100**) before the final newline is appended.

---

## 13. Edge Cases

### 13.1 Empty or zero token budget

If `max_map_tokens <= 0`, `get_repo_map` MUST return `None` immediately.

### 13.2 No other files

If `other_fnames` is empty, `get_repo_map` MUST return `None` immediately.

### 13.3 No tags anywhere

If tag extraction yields no tags for any file, the graph is empty. The implementation MUST
still produce a listing of file names (bare entries) using the file-only format (§12.3) up
to the token budget.

### 13.4 RecursionError

If the ranking algorithm raises a `RecursionError` (can happen for extremely large repos),
the implementation MUST:
1. Emit an error to the caller.
2. Set `max_map_tokens = 0` (permanently disabling the map for this instance).
3. Return `None`.

### 13.5 Missing files

Files in `chat_fnames` or `other_fnames` that do not exist on disk at ranking time MUST be
skipped. The implementation MUST emit a warning the first time a given path is skipped, and
MUST NOT warn again for the same path within the same process.

[KNOWN BUG IN REFERENCE: `warned_files` is defined as a class-level attribute (`class RepoMap: warned_files = set()`). Mutating it via `self.warned_files.add(fname)` modifies the shared class-level set, so warning deduplication spans all `RepoMap` instances in the process rather than being per-instance. Intended behavior: per-instance deduplication (MUST NOT warn again for the same path within the same instance). The Rust implementation SHOULD use per-instance state.]

If a file passes the existence check at ranking time but disappears before its mtime can be
read (TOCTOU), `get_tags` MUST return an empty tag list for that file and MAY emit a warning.
This is distinct from the upfront existence check — the file is silently skipped at extraction
time without updating `warned_files`.

### 13.6 Binary files / unreadable files

Files whose content cannot be decoded as UTF-8 MUST produce no tags. The implementation
SHOULD attempt to read with error substitution rather than raising (i.e., replace
undecodable bytes).

### 13.7 Chat files excluded from output

Files that appear in `chat_fnames` MUST NOT appear in the repo map output. They are excluded
at both the ranking step (§7.5 step 2) and the rendering step (§9.1 step 4).

### 13.8 Known reference dead code

The following constructs in the reference implementation are dead code and MUST NOT be
replicated:

- `self.cache_threshold = 0.95` — set in `__init__` but never read; vestigial field.
- `if tags is None: continue` — `list()` never returns `None`; the branch is unreachable.

The Rust implementation SHOULD omit both.

---

## 14. Configuration Parameters

| Parameter             | Type         | Default   | Description                                                                   |
|----------------------|--------------|-----------|-------------------------------------------------------------------------------|
| `map_tokens`          | int          | 1024      | Maximum tokens in the repo map output                                         |
| `root`                | path         | cwd       | Repository root directory                                                     |
| `repo_content_prefix` | string\|None | None      | Prefix prepended to the map output                                            |
| `verbose`             | bool         | false     | Emit diagnostic output: initialization message, per-file scan progress, rendered map token count, and cache errors |
| `max_context_window`  | int\|None    | None      | LLM context window size; enables no-chat expansion                            |
| `map_mul_no_files`    | int          | 8         | Multiplier for token budget when no chat files are open                       |
| `refresh`             | string       | "auto"    | Map cache refresh mode: "auto", "manual", "always", "files"                  |
| `force_refresh`       | bool         | false     | Bypass cache reads and force map recomputation (§11)                          |
| `exclude_unranked`    | bool         | false     | When true, files with PageRank ≤ 0.0001 are excluded from the map output      |
| `self_edge_weight`    | float        | 0.1       | Weight applied to self-edges for unreferenced definitions (§6.5)              |
| `max_line_length`     | int          | 100       | Maximum characters per output line before truncation (§9.1, §12.4)           |
| `pagerank_damping`    | float        | 0.85      | PageRank damping factor (§7.2)                                                |
| `pagerank_tol`        | float        | 1.0e-6    | PageRank convergence tolerance (§7.2)                                         |
| `pagerank_max_iter`   | int          | 100       | PageRank maximum iterations (§7.2)                                            |
| `anchor_fnames`       | Vec\<PathBuf\> | []     | Files to use as RWR restart seeds; always included in map output (§7.1a, §7.1b) |
| `anchor_idents`       | HashSet\<String\> | {}  | Identifiers whose defining files are used as RWR restart seeds; weight divided equally when ambiguous (§7.1a) |
| `anchor_scoped`       | Vec\<(PathBuf, String)\> | [] | File+ident pairs; file gets full anchor weight, ident gets edge-weight boost (§7.1a) |
| `anchor_weight_multiplier` | float   | 10.0      | Personalization weight multiplier for anchor files relative to `personalize` (100/N) (§7.1a) |

**`map_tokens` and `max_map_tokens`:** `map_tokens` is the constructor parameter stored at
initialization. Internally, the algorithm references this as `max_map_tokens`. They represent
the same value; `max_map_tokens` may be temporarily overridden within a single call (§10.3) or
permanently set to 0 by §13.4. The stored `map_tokens` field is never mutated.

**I/O abstraction:** The reference implementation injects a `main_model` for token counting
and an `io` object for file reading and diagnostic emission. The Rust library resolves this as
follows:
- **File reading:** `std::fs::read_to_string` with `String::from_utf8_lossy` for undecodable
  bytes (§13.6). No injectable abstraction.
- **Diagnostics (warnings, errors, verbose output):** the `tracing` crate. The library emits
  structured events via `tracing::warn!`, `tracing::error!`, and `tracing::debug!`. Callers
  install a subscriber; if none is configured, events are silently dropped. The CLI binary
  installs a `tracing-subscriber` that routes to stderr, gated on `--verbose` (§17).
- **Token counting:** the `tiktoken-rs` crate (§10.1). No injection needed.

**Intentional SPEC extensions over hardcoded reference values:** `self_edge_weight` (§6.5,
§14; hardcoded `0.1` in reference) and `max_line_length` (§12.4, §14; hardcoded `100` in
reference) are exposed as configurable parameters in the Rust API. Their defaults match the
reference values exactly.

---

## 15. Resolved Questions

1. **I/O and diagnostic abstraction** — Resolved. See §14 I/O abstraction note: `tracing`
   for diagnostics, `std::fs` for file reading.

2. **Reference fallback for files with defs but no refs** — Resolved. A supplemental
   `-idents.scm` tree-sitter query replaces the Python Pygments fallback. See §3.2.
---

## 16. Non-Goals

The following are explicitly out of scope for this specification:

- The LLM interface or model integration.
- File watching or automatic re-indexing on filesystem changes. DEFERRED
- Git integration (the reference uses git history only for warning messages about deleted files).
- The interactive UI (spinner, progress callbacks) — these are implementation details for the caller. DEFERRED
- Any output format beyond the plain-text repo map string described in §12. DEFERRED

---

## 17. CLI Specification

### 17.1 Binary name and entry point

The CLI binary MUST be named `repo-mapper`. Invocation:

```
repo-mapper [OPTIONS] [REPO_PATH]
```

`REPO_PATH` is an optional positional argument that sets the repository root. If omitted,
root detection proceeds as described in §17.2.

### 17.2 Repository root detection

The root is determined in priority order:
1. `--root <PATH>` flag (explicit override).
2. Positional `REPO_PATH` argument.
3. Walk parent directories from CWD looking for the first ancestor containing a `.git/`
   directory.
4. Fall back to CWD if no `.git/` ancestor is found.

### 17.3 File enumeration

When `other_fnames` is not explicitly provided by the user, the CLI MUST walk the repository
tree from `root` to build the file list. The walk MUST:

1. Respect `.gitignore` rules (using standard gitignore semantics from the root).
2. Skip hidden directories (names starting with `.`) except where explicitly included.
3. Follow symlinks to regular files; skip symlinks to directories.
4. Produce paths sorted lexicographically (consistent with §2.2).

### 17.4 Flags

| Flag | Short | Type | Default | Maps to |
|------|-------|------|---------|---------|
| `--root <PATH>` | | path | (auto-detected) | `root` |
| `--max-tokens <N>` | `-t` | int | 1024 | `map_tokens` |
| `--max-context-window <N>` | | int | (none) | `max_context_window` |
| `--chat-file <PATH>` | `-c` | path (repeatable) | (empty) | `chat_fnames` |
| `--mention-file <PATH>` | `-m` | path (repeatable) | (empty) | `mentioned_fnames` |
| `--mention-ident <NAME>` | `-i` | string (repeatable) | (empty) | `mentioned_idents` |
| `--refresh <MODE>` | | string | `auto` | `refresh` |
| `--force-refresh` | | bool | false | `force_refresh` |
| `--exclude-unranked` | | bool | false | `exclude_unranked` |
| `--max-line-length <N>` | | int | 100 | `max_line_length` |
| `--pagerank-damping <F>` | | float | 0.85 | `pagerank_damping` |
| `--pagerank-tol <F>` | | float | 1.0e-6 | `pagerank_tol` |
| `--pagerank-max-iter <N>` | | int | 100 | `pagerank_max_iter` |
| `--verbose` | `-v` | bool | false | configures tracing subscriber |
| `--progress` | `-p` | bool | false | show a spinner on stderr during map generation and print elapsed time on completion |
| `--anchor <VALUE>` | `-a` | string (repeatable) | (empty) | `anchor_scoped`, `anchor_fnames`, or `anchor_idents` — resolved in that priority order (§7.1a) |

`--chat-file`, `--mention-file`, `--mention-ident`, and `--anchor` are repeatable (may be specified
multiple times). Paths in `--chat-file` and `--mention-file` are resolved relative to CWD
before being passed to the library.

`--anchor` value resolution (applied in order per value):
1. If the value contains `:` and the part before the colon resolves to an existing file
   (absolute, CWD-relative, or root-relative): treated as `file:ident` → `anchor_scoped`.
2. If the value resolves to an existing file: `anchor_fnames`.
3. Otherwise: `anchor_idents` (tag-index lookup at runtime).

### 17.5 Output routing

- The repo map string MUST be written to **stdout**.
- All diagnostics (warnings, errors, verbose progress) MUST be written to **stderr** via the
  `tracing-subscriber` installed at startup.
- `--verbose` sets the tracing filter to `DEBUG`, emitting per-file scan progress, token
  counts, and cache status. Without `--verbose`, only `WARN` and `ERROR` events are emitted.
- `RUST_LOG` environment variable MAY override the filter level.

### 17.6 Exit codes

| Code | Meaning |
|------|---------|
| `0` | Success — repo map written to stdout |
| `1` | Fatal error — I/O failure, invalid arguments, unrecoverable parse error |
| `2` | No map generated — token budget exhausted, no files found, or all files excluded |