tokensave 4.0.5

Code intelligence tool that builds a semantic knowledge graph from Rust, Go, Java, Scala, TypeScript, Python, C, C++, Kotlin, C#, Swift, and many more codebases.

# Multiproject Support

## Problem

The global git post-commit hook runs `tokensave sync` in every repo. Before the init/sync separation (see `tokensave init`), this silently created databases in non-enrolled repos. Even with that fix, tokensave currently assumes one project per database. Monorepos and multi-app workspaces (e.g. `Code/App1/`, `Code/App2/`) have no way to scope queries to a single sub-project.

## Goal

A single `.tokensave` database can cover multiple projects. Each node and file belongs to exactly one project. Queries gain an optional `project` filter. New cross-project queries enable comparison (e.g. "which project has the most god classes?").

## Concepts

- **Single-project mode** (default): All nodes belong to `[root]`. Existing behavior, no changes needed.
- **Multiproject mode** (`tokensave init --multiproject`): The indexer creates a project for each direct subfolder of the root. Files in the root itself belong to `[root]`.

Example:
```
Code/              <- root, init --multiproject here
  shared.rs        <- project: [root]
  App1/
    src/main.rs    <- project: App1
  App2/
    src/lib.rs     <- project: App2
```
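
The assignment rule in the example above can be sketched as a small function. This is only an illustration: `project_for_path` and `ROOT_PROJECT` are hypothetical names, and the sketch assumes the indexer hands it a normalized path relative to the root.

```rust
use std::path::{Component, Path};

/// Project name for files that sit directly in the indexed root.
const ROOT_PROJECT: &str = "[root]";

/// Hypothetical helper: map a path *relative to the root* to a project name.
/// Assumes the path is already normalized (no `..` segments).
fn project_for_path(rel_path: &Path) -> String {
    // Keep only ordinary name components; this drops a leading `./`.
    let mut parts = rel_path.components().filter_map(|c| match c {
        Component::Normal(name) => Some(name.to_string_lossy().into_owned()),
        _ => None,
    });

    match (parts.next(), parts.next()) {
        // Two or more components: the first is a direct subfolder of the root.
        (Some(first), Some(_)) => first,
        // Zero or one component: the file lives in the root itself.
        _ => ROOT_PROJECT.to_string(),
    }
}
```

With the tree above, `shared.rs` maps to `[root]`, `App1/src/main.rs` to `App1`, and `App2/src/lib.rs` to `App2`.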

## Approaches Considered

### Approach A: Project as a column on `nodes` + `files` (recommended)

Add `project TEXT NOT NULL DEFAULT '[root]'` to both `nodes` and `files` tables.

**Schema change**: `ALTER TABLE nodes ADD COLUMN project TEXT NOT NULL DEFAULT '[root]'`; same for `files`. Migration v7 adds the column, and existing rows read as `[root]` through the column default. Fresh `create_schema` includes the column from the start.

**Query changes**: Every query that accepts `path_prefix` gains an optional `project` parameter, which touches roughly 15 DB methods and 15 MCP handlers. When `project` is provided, the SQL gains an `AND n.project = ?` filter.
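
A minimal sketch of how one such method might assemble its SQL, assuming positional `?` parameters. The function names and the selected columns (`n.id`, `n.name`) are hypothetical, and the real methods presumably bind parameters through their SQLite driver rather than returning strings. Whether `project` and `path_prefix` should be combinable is still an open question; the sketch simply ANDs them when both are present.

```rust
/// Escape LIKE wildcards so a path prefix is matched literally.
fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        if matches!(ch, '\\' | '%' | '_') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Hypothetical query builder: returns the SQL text plus the positional
/// parameters to bind, in order. Each filter is added only when provided.
fn build_node_query(path_prefix: Option<&str>, project: Option<&str>) -> (String, Vec<String>) {
    let mut sql = String::from("SELECT n.id, n.name FROM nodes n WHERE 1 = 1");
    let mut params = Vec::new();

    if let Some(prefix) = path_prefix {
        sql.push_str(" AND n.file_path LIKE ? ESCAPE '\\'");
        params.push(format!("{}%", escape_like(prefix)));
    }
    if let Some(project) = project {
        sql.push_str(" AND n.project = ?");
        params.push(project.to_string());
    }
    (sql, params)
}
```

Calling `build_node_query(None, None)` leaves the base query untouched, which matches the single-project default where no filter is passed.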

**Config**: `TokenSaveConfig` gains `multiproject: bool`. The indexer reads direct subdirectories to assign project names.

**New MCP tools**:
- `tokensave_projects` -- list available projects in the database
- Cross-project analysis tools (GROUP BY project variants of existing tools)

**Pros**:
- Clean semantic model. Project is a first-class indexed column.
- Enables SQL `GROUP BY project` for cross-project comparison queries.
- Project list is discoverable -- LLM can ask "which projects?" then scope any query.
- Fast filtering via index.

**Cons**:
- Largest diff. Every query method signature changes.
- Migration alters two core tables (`nodes` and `files`), although existing rows pick up `[root]` from the column DEFAULT rather than being rewritten.

No FTS rebuild is needed, since `project` isn't full-text searched.

### Approach B: Project as a virtual/computed concept (no schema change)

Derive project from `file_path` at query time. Config stores the project-to-prefix mapping. A helper translates `project=App1` into `path_prefix=App1/` and reuses existing path filtering.

**Pros**: Tiny diff. No migration. No query signature changes. Fast to ship.

**Cons**: No indexed column -- can't `GROUP BY project` efficiently. Cross-project comparisons need multiple round-trips or ugly `CASE WHEN` SQL. Not truly first-class. Folder renames require manual config updates.
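
For comparison, the translation helper in this approach could look roughly like the following. The name `project_to_path_prefix` and the `HashMap` shape of the config mapping are assumptions made for illustration only.

```rust
use std::collections::HashMap;

/// Hypothetical helper for Approach B: look up the configured folder for a
/// project and turn it into a prefix usable by the existing path filter.
fn project_to_path_prefix(mapping: &HashMap<String, String>, project: &str) -> Option<String> {
    mapping.get(project).map(|prefix| {
        // Force a trailing slash so that `App1` does not also match `App10/`.
        if prefix.is_empty() || prefix.ends_with('/') {
            prefix.clone()
        } else {
            format!("{}/", prefix)
        }
    })
}
```

An unknown project yields `None`, so the caller can report it instead of silently running an unfiltered query.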

### Approach C: Hybrid -- column exists, queries stay path-based

Add the column (like A) but don't change existing query signatures. A resolution layer in MCP handlers converts `project` to `path_prefix`. The column is only used for new GROUP BY / cross-project queries.

**Pros**: Column exists for powerful SQL. Existing queries untouched -- smaller handler diff. Incremental migration path.

**Cons**: Two parallel filtering mechanisms (project column vs path_prefix). Must keep column in sync during indexing. Risk of drift if mapping diverges from folder structure.

## Decision

**Approach A** -- make project a first-class column.

The whole point is making project a real concept. Approach B can't deliver cross-project queries efficiently. Approach C creates redundant filtering that can drift. A is more work up front but the result is clean: one column, one filter, SQL GROUP BY just works.

The query signature churn is mechanical -- every `path_prefix: Option<&str>` method gains `project: Option<&str>`, every handler reads one more optional param.

## Cross-Project Query Ideas

These are new queries enabled by having `project` as a column:

- **Project summary**: node/file/edge counts per project
- **Complexity comparison**: average/max complexity per project
- **God class comparison**: god classes grouped by project
- **Dead code by project**: unreferenced symbols per project
- **Cross-project dependencies**: edges where source.project != target.project
- **Coupling matrix**: how many edges cross between each pair of projects
- **Doc coverage by project**: percentage of public symbols documented, per project
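
To make the cross-project dependency and coupling matrix ideas concrete, here is an in-memory sketch. The `Edge` struct is hypothetical and carries only the project of each endpoint; in the database this would presumably be a join from edges to `nodes` followed by a `GROUP BY` over the two project columns.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal edge record: only the endpoint projects matter here.
struct Edge<'a> {
    source_project: &'a str,
    target_project: &'a str,
}

/// Count edges whose endpoints belong to different projects,
/// keyed by the ordered pair (source project, target project).
fn coupling_matrix<'a>(edges: &[Edge<'a>]) -> BTreeMap<(&'a str, &'a str), usize> {
    let mut matrix = BTreeMap::new();
    for edge in edges {
        // Edges inside a single project are not cross-project dependencies.
        if edge.source_project != edge.target_project {
            *matrix
                .entry((edge.source_project, edge.target_project))
                .or_insert(0) += 1;
        }
    }
    matrix
}
```

The keys are directional, so `(App1, App2)` and `(App2, App1)` are counted separately; summing the two gives an undirected coupling figure if that is preferred.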

## Migration Strategy

- **v7 migration**: `ALTER TABLE nodes ADD COLUMN project TEXT NOT NULL DEFAULT '[root]'`; same for `files`. All existing rows get `[root]` via the DEFAULT. No data rewrite needed -- SQLite DEFAULT handles it.
- **Fresh databases**: `create_schema` includes the column and index from the start.
- **Index**: `CREATE INDEX idx_nodes_project ON nodes(project)` for fast filtering.
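
The migration step could be expressed as a version-gated list of statements, as in this sketch. The SQL strings come from the bullets above; the function name, the way the current schema version is obtained, and the assumption that the index is created as part of the v7 migration (not only in fresh databases) are all hypothetical.

```rust
/// Schema version that introduces the `project` column.
const PROJECT_SCHEMA_VERSION: u32 = 7;

/// Statements for the v7 migration, in execution order.
static V7_STATEMENTS: &[&str] = &[
    "ALTER TABLE nodes ADD COLUMN project TEXT NOT NULL DEFAULT '[root]'",
    "ALTER TABLE files ADD COLUMN project TEXT NOT NULL DEFAULT '[root]'",
    "CREATE INDEX idx_nodes_project ON nodes(project)",
];

/// Hypothetical helper: which statements still need to run for a database
/// currently at `current_version`. Databases at v7 or later need nothing.
fn statements_to_apply(current_version: u32) -> &'static [&'static str] {
    if current_version < PROJECT_SCHEMA_VERSION {
        V7_STATEMENTS
    } else {
        &[]
    }
}
```

Running the statements inside a single transaction would keep a half-applied migration from leaving `nodes` and `files` out of step.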

## Open Questions

- Should the `project` and `path_prefix` filters be combinable? (e.g. "god classes in App1 under src/main/")
- Should `tokensave sync` in multiproject mode re-discover new subdirectories automatically, or require `tokensave init` again?
- Should there be a way to manually map project names to paths (beyond auto-discovery from subdirectories)?