# AI Agent Development Guide
> Quick reference for AI agents: architecture decisions, project-specific conventions, and development workflow.
## ๐ Core Concept
**Context Footprint (CF)**: Quantify code coupling via conditional graph traversal.
- **Model**: Directed graph (nodes = code units, edges = dependencies)
- **Metric**: Total tokens reachable from starting node
- **Innovation**: Traversal stops at "good abstractions" (typed + documented) but continues through "leaky" ones
**Design Details**: See [`docs/design.md`](docs/design.md) for formal definition and algorithm.
## ๐๏ธ Architecture Decisions (ADR)
### ADR-001: Hexagonal Architecture (Ports & Adapters)
**Decision**: Strict separation between domain logic (`src/domain/`) and external integrations (`src/adapters/`)
**Rationale**:
- Domain algorithms must be testable without external dependencies (SCIP, file I/O)
- Future language support (TypeScript, Java) only requires new adapters
- Policy experimentation (different pruning strategies) isolated from core traversal
**Implementation**:
```
Domain Layer (src/domain/) Adapters Layer (src/adapters/)
โโ graph.rs โโ scip/ (SCIP parsing)
โโ solver.rs โโ policy/ (pruning strategies)
โโ builder.rs โโ size_function/ (token counting)
โโ ports.rs (traits) โโโโโโโโโดโ doc_scorer/ (doc quality)
```
**Constraints**:
- Domain depends only on `std` + `petgraph`
- Adapters implement domain traits (`PruningPolicy`, `SourceReader`, etc.)
---
### ADR-002: Three-Pass Graph Construction
**Decision**: Build graph in three sequential passes (not single-pass)
**Rationale**:
- SCIP indexes may reference symbols before they're defined (forward references)
- Dynamic edges (SharedStateWrite, CallIn) require full static graph first
- Metrics (`context_size`, `doc_score`) need source file access at node creation
**Passes**:
1. **Allocate nodes** from SCIP definitions โ compute metrics via source spans
2. **Wire static edges** from SCIP occurrences โ standard dependencies
3. **Add dynamic edges** โ reverse lookups for state writers and callers
**File**: `src/domain/builder.rs`
---
### ADR-003: Policy Pattern for Pruning Logic
**Decision**: Pruning decisions abstracted behind `PruningPolicy` trait
**Rationale**:
- Research experiments require different boundary definitions
- "Good abstraction" criteria varies by use case (CI vs audit vs metrics)
- Core traversal algorithm (`CfSolver`) remains stable
**Current Policies**:
- `AcademicBaseline`: Doc presence + type completeness (fast, heuristic)
- `StrictPolicy`: Aggressive pruning (minimal context footprint)
**Extension Point**: Implement `PruningPolicy::evaluate()` for custom strategies
---
### ADR-004: Semantic Data Abstraction
**Decision**: Define SCIP-agnostic `SemanticData` model in domain layer
**Rationale**:
- SCIP is implementation detail (protobuf, Sourcegraph-specific)
- Future: support other indexers (Kythe, LSP, custom analyzers)
- Testing: generate semantic data without SCIP files
**Boundary**: `SemanticData` (domain) โ `ScipDataSourceAdapter` (adapter)
## ๐๏ธ Key Domain Concepts
### Core Types
| `ContextGraph` | Directed graph (nodes + edges) + symbol lookup | `src/domain/graph.rs` |
| `Node` | Code unit (Function/Variable/Type) with metrics | `src/domain/node.rs` |
| `EdgeKind` | Dependency type (Call/Read/Write/ParamType/etc.) | `src/domain/edge.rs` |
| `CfSolver` | BFS traversal with conditional pruning | `src/domain/solver.rs` |
| `PruningPolicy` | Trait: decide if node is boundary or transparent | `src/domain/policy.rs` |
| `SemanticData` | SCIP-agnostic semantic model | `src/domain/semantic.rs` |
### Critical Node Attributes
Every node has:
- **`context_size`**: Token count (basis for CF calculation)
- **`doc_score`**: Documentation quality (0.0-1.0, used by policies)
- **`is_external`**: Third-party library flag (always acts as boundary)
Type-specific attributes (e.g., `typed_param_count` for functions) drive policy decisions.
### Dynamic Expansion Edges
Two special edge types added in Pass 3:
- **`SharedStateWrite`**: Reader โ Writer (mutable global state penalty)
- **`CallIn`**: Untyped function โ Callers (resolve vague signatures from usage)
## ๐งช Testing Conventions
**Strategy**: Test pyramid (55 tests: 44 unit, 9 integration, 2 E2E)
**Details**: See [`docs/testing.md`](docs/testing.md) for comprehensive guide.
### Quick Reference
```bash
# Run all tests
cargo test --lib --tests
# E2E fixture setup
./tests/fixtures/setup_fastapi.sh # Clone FastAPI for real-world test
```
### Test Data Strategy
| Unit | Inline helpers | `#[cfg(test)]` modules in source files |
| Integration | Mock fixtures | `tests/common/fixtures.rs` generators |
| E2E | Real SCIP | `tests/fixtures/simple_python/`, `fastapi/` |
**Key Convention**: E2E tests gracefully skip if SCIP index missing (CI-friendly).
## ๐ Development Workflow
### TDD Cycle
1. **Write failing test** (domain unit test or integration test)
2. **Implement minimal code** (pass the test)
3. **Run quality checks**: `cargo fmt && cargo clippy -- -D warnings`
4. **Commit** when all tests pass
### Pre-Commit Checklist
```bash
cargo fmt # Format code
cargo test --lib --tests # All tests pass
cargo clippy --all-targets -- -D warnings # No warnings
```
## ๐ Project-Specific Conventions
### Code Organization Rules
1. **Domain-first implementation**: Write domain logic before adapters (enables testing without I/O)
2. **Trait-based boundaries**: All external dependencies injected via traits (see `src/domain/ports.rs`)
3. **Error propagation**: Use `Result<T>` + `.context()` (no `.unwrap()` in production code)
### Naming Patterns
- **Test functions**: `test_<scenario>_<expected_result>`
- Example: `test_boundary_node_stops_traversal_but_included_in_reachable_set`
- **Fixture generators**: `create_semantic_data_<scenario>` in `tests/common/fixtures.rs`
- Example: `create_semantic_data_with_shared_state()`
### Module Structure
**Adapters grouped by concern** (not by implementation):
```
src/adapters/
โโ policy/ (not policy_implementations/)
โ โโ academic.rs
โ โโ strict.rs
โโ scip/
โโ doc_scorer/
```
### Testing Helpers
- **Dead code allowed**: Add `#![allow(dead_code)]` to `tests/common/*.rs` (helpers used selectively)
- **Mock implementations**: Prefer constructor chaining (`.with_file().with_file()`)
- **E2E resilience**: Always check fixture existence and skip gracefully
## ๐ง Extension Points
### Adding New Node/Edge Types
1. Update enums in `src/domain/node.rs` or `src/domain/edge.rs`
2. Modify `src/domain/builder.rs` creation logic
3. Update policies in `src/adapters/policy/*.rs`
4. Add tests
### Adding New Pruning Policy
1. Create `src/adapters/policy/<name>.rs`
2. Implement `PruningPolicy` trait + `Default`
3. Add unit tests in same file
4. Export in `src/adapters/policy/mod.rs`
### Adding New Language Support
1. Create adapter in `src/adapters/scip/<language>.rs` (or use external indexer)
2. Implement `SemanticDataSource` trait
3. Map language-specific constructs to domain model
4. Add E2E test in `tests/fixtures/<language>/`
## ๐ Quality Standards
### CI Requirements
All commits must pass (`.github/workflows/test.yml`):
- Unit + integration tests
- `cargo fmt -- --check`
- `cargo clippy -- -D warnings`
- Coverage report to Codecov
### Current Status
- **Tests**: 55 (44 unit, 9 integration, 2 E2E) โ
- **Coverage**: Domain โฅ85%, Policies โฅ80%, Adapters โฅ70%
## ๐จ Known Issues
### Generated SCIP Code
**Issue**: `scip.rs` doc tests fail (protobuf artifacts)
**Workaround**: Skip doc tests via `cargo test --lib --tests`
**Fix applied**: `#[allow(clippy::doc_overindented_list_items)]` in `src/lib.rs`
### Test Helper Dead Code
**Issue**: Clippy warns about unused helpers in `tests/common/`
**Fix**: `#![allow(dead_code)]` at module top (helpers used selectively across test binaries)
## ๐ References
- **Algorithm & Design**: [`docs/design.md`](docs/design.md) (formal CF definition, traversal rules)
- **Testing Guide**: [`docs/testing.md`](docs/testing.md) (comprehensive test strategy)
- **Theoretical Foundation**: [`docs/the-paper.md`](docs/the-paper.md)
### External Resources
- [SCIP Protocol](https://github.com/sourcegraph/scip) (semantic indexing format)
- [petgraph](https://docs.rs/petgraph/) (Rust graph library)
---
## ๐ Terminology
| **Context Footprint (CF)** | Total tokens reachable from a node via conditional traversal |
| **Boundary Node** | Well-abstracted code (typed + documented) where traversal stops |
| **Transparent Node** | Leaky abstraction (untyped/undocumented) where traversal continues |
| **Dynamic Expansion** | Reverse edges (SharedStateWrite, CallIn) added in Pass 3 |
| **Semantic Data** | Domain model abstraction over SCIP (enables testing without protobuf) |