# Magellan

**A dumb, deterministic codebase mapping tool for Rust projects.**

Magellan watches your Rust source files, extracts AST-level facts, and builds a searchable graph database of symbols and references. No semantic analysis, no magic—just deterministic, observable data extraction.

## What Magellan Does

- **Watches** directories for `.rs` file changes (Create/Modify/Delete)
- **Extracts** AST-level facts: functions, structs, enums, traits, modules, impl blocks
- **Tracks** symbol references: function calls and type references
- **Persists** everything to a sqlitegraph database
- **Handles** errors gracefully—keeps running even when files are unreadable
- **Shuts down** cleanly on SIGINT/SIGTERM with status reporting

## What Magellan Does NOT Do

Magellan is intentionally limited:
- ❌ No semantic analysis or type checking
- ❌ No LSP server or language features
- ❌ No async runtimes or background thread pools
- ❌ No config files
- ❌ No initial full scan (a file is indexed only after a change event fires for it)
- ❌ No non-Rust file support
- ❌ No web APIs or network services

## Installation

### From Source

```bash
# Clone the repository
git clone <repository-url>
cd magellan

# Build the binary
cargo build --release

# The binary will be at target/release/magellan
```

### Requirements

- Rust 1.70+ (2021 edition)
- Linux/macOS (signal handling uses Unix signals)
- SQLite 3 (via sqlitegraph dependency)

## Quick Start

```bash
# Start watching a project
magellan watch --root /path/to/rust/project --db /path/to/magellan.db

# Magellan will now:
# 1. Watch for .rs file changes
# 2. Extract symbols and references
# 3. Store them in magellan.db
# 4. Log each event, e.g. "MODIFY src/lib.rs symbols=5 refs=3"

# In another terminal, print the current counts and exit
magellan watch --root /path/to/rust/project --db /path/to/magellan.db --status
```

## Usage

### Basic Command

```bash
magellan watch --root <DIR> --db <FILE> [--debounce-ms <N>] [--status]
```

**Arguments:**
- `--root <DIR>` - Directory to watch recursively (required)
- `--db <FILE>` - Path to sqlitegraph database (required)
- `--debounce-ms <N>` - Debounce delay in milliseconds (default: 500)
- `--status` - Print counts and exit immediately (optional)
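
To illustrate what a debounce delay means in practice, here is a minimal sketch of a trailing-edge debouncer: a path becomes ready only after it has been quiet for the configured delay, so a burst of saves results in one re-index. The README does not describe how Magellan implements debouncing, so `Debouncer`, `push`, and `drain_ready` are hypothetical names, and explicit millisecond timestamps stand in for a real clock.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

/// Hypothetical trailing-edge debouncer (not Magellan's actual code).
struct Debouncer {
    delay_ms: u64,
    /// Last event time per path; BTreeMap keeps iteration order deterministic.
    pending: BTreeMap<PathBuf, u64>,
}

impl Debouncer {
    fn new(delay_ms: u64) -> Self {
        Debouncer { delay_ms, pending: BTreeMap::new() }
    }

    /// Record an event; a repeated event for the same path restarts its timer.
    fn push(&mut self, path: impl Into<PathBuf>, now_ms: u64) {
        self.pending.insert(path.into(), now_ms);
    }

    /// Remove and return every path that has been quiet for at least `delay_ms`.
    fn drain_ready(&mut self, now_ms: u64) -> Vec<PathBuf> {
        let delay = self.delay_ms;
        let mut ready = Vec::new();
        self.pending.retain(|path, last| {
            if now_ms.saturating_sub(*last) >= delay {
                ready.push(path.clone());
                false // drop from the pending set
            } else {
                true // still inside the quiet window
            }
        });
        ready
    }
}
```

With a 500 ms delay, two events for `src/lib.rs` at 0 ms and 300 ms produce a single ready entry once the clock reaches 800 ms.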

**Examples:**

```bash
# Watch current directory
magellan watch --root . --db ./magellan.db

# Watch with custom debounce
magellan watch --root ./src --db ./cache/magellan.db --debounce-ms 1000

# Check database status
magellan watch --root . --db ./magellan.db --status
# Output:
# files: 42
# symbols: 387
# references: 1241
```

### Output Format

**Normal Operation:**
```
Magellan watching: /path/to/project
Database: /path/to/magellan.db
CREATE src/main.rs symbols=2 refs=0
MODIFY src/lib.rs symbols=5 refs=3
DELETE src/old.rs
ERROR src/permission_denied.rs Permission denied (os error 13)
```

**Shutdown:**
```
SHUTDOWN
```

**Status:**
```
files: 42
symbols: 387
references: 1241
```
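
Scripts that consume this output could parse it along the following lines. This is a sketch that assumes the line shapes are exactly those in the samples above; `LogLine` and `parse_line` are hypothetical names and not part of Magellan's API.

```rust
/// One line of watcher output, modelled on the samples above.
#[derive(Debug, PartialEq)]
enum LogLine {
    Indexed { op: String, path: String, symbols: usize, refs: usize },
    Deleted { path: String },
    Error { path: String, message: String },
    Shutdown,
}

/// Parse a single output line. Returns None for anything that does not
/// match a sample shape, such as the two startup banner lines.
fn parse_line(line: &str) -> Option<LogLine> {
    let line = line.trim();
    if line == "SHUTDOWN" {
        return Some(LogLine::Shutdown);
    }
    let (op, rest) = line.split_once(' ')?;
    match op {
        "CREATE" | "MODIFY" => {
            // Split from the right so paths containing spaces survive.
            let (rest, refs) = rest.rsplit_once(" refs=")?;
            let (path, symbols) = rest.rsplit_once(" symbols=")?;
            Some(LogLine::Indexed {
                op: op.to_string(),
                path: path.to_string(),
                symbols: symbols.parse().ok()?,
                refs: refs.parse().ok()?,
            })
        }
        "DELETE" => Some(LogLine::Deleted { path: rest.to_string() }),
        "ERROR" => {
            // Assumption: the path has no spaces; the message is the remainder.
            let (path, message) = rest.split_once(' ')?;
            Some(LogLine::Error {
                path: path.to_string(),
                message: message.to_string(),
            })
        }
        _ => None,
    }
}
```

Returning `None` for unrecognised lines lets a caller feed the whole output stream through `parse_line` and keep only the events.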

## Database Schema

Magellan stores data in a sqlitegraph database with the following structure:

**Nodes:**
- `File` - `{ path: String, hash: String }`
- `Symbol` - `{ name: String, kind: String, byte_start: usize, byte_end: usize }`
- `Reference` - `{ file: String, byte_start: usize, byte_end: usize }`

**Edges:**
- `DEFINES` - File → Symbol (which file defines this symbol)
- `REFERENCES` - Reference → Symbol (what symbol is referenced)

**Symbol Kinds:**
- Function
- Struct
- Enum
- Trait
- Module
- Impl
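
For readers who prefer types to prose, the schema above might be mirrored in Rust roughly as follows. The type names (`FileNode`, `SymbolNode`, `ReferenceNode`, `EdgeKind`) and the helper `symbol_text` are illustrative and may not match the definitions in `graph/schema.rs`. The sketch also assumes that `kind` is stored using the exact spellings listed above and that `byte_end` is exclusive.

```rust
/// Symbol kinds listed in the schema.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SymbolKind { Function, Struct, Enum, Trait, Module, Impl }

impl SymbolKind {
    /// Map the stored `kind` string back to a variant (spellings assumed).
    fn from_kind(kind: &str) -> Option<Self> {
        Some(match kind {
            "Function" => SymbolKind::Function,
            "Struct" => SymbolKind::Struct,
            "Enum" => SymbolKind::Enum,
            "Trait" => SymbolKind::Trait,
            "Module" => SymbolKind::Module,
            "Impl" => SymbolKind::Impl,
            _ => return None,
        })
    }
}

// Node payloads, field for field as documented above.
struct FileNode { path: String, hash: String }
struct SymbolNode { name: String, kind: String, byte_start: usize, byte_end: usize }
struct ReferenceNode { file: String, byte_start: usize, byte_end: usize }

/// Edge labels: File -DEFINES-> Symbol, Reference -REFERENCES-> Symbol.
enum EdgeKind { Defines, References }

/// Recover a symbol's source text from its byte span.
/// Returns None if the span is out of range or splits a UTF-8 character.
fn symbol_text<'a>(source: &'a str, sym: &SymbolNode) -> Option<&'a str> {
    source.get(sym.byte_start..sym.byte_end)
}
```

Because spans are byte offsets rather than line and column pairs, `str::get` is used instead of direct slicing so that a stale span yields `None` rather than a panic.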

## Error Handling

Magellan is designed to be resilient:

**Permission Errors:**
```
ERROR /path/to/file.rs Permission denied (os error 13)
```
- Logs the error
- Continues processing other files
- No crash, no retry

**Syntax Errors:**
- Files with invalid Rust syntax are skipped
- No symbols extracted from malformed files
- Watcher continues running

**Missing Files:**
- Files deleted during processing are handled gracefully
- ENOENT errors are silently skipped
- No crashes on race conditions
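
The policy described above (skip missing files silently, log other I/O errors, never crash) can be sketched with the standard library alone. `ReadOutcome` and `read_source` are hypothetical names used for illustration; the actual handling lives in Magellan's own modules and may differ.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Result of trying to read one source file.
#[derive(Debug, PartialEq)]
enum ReadOutcome {
    /// File was read successfully.
    Content(String),
    /// File vanished between the event and the read (ENOENT): skip silently.
    Vanished,
    /// Any other I/O error, carried as a ready-to-print log line.
    Failed(String),
}

fn read_source(path: &Path) -> ReadOutcome {
    match fs::read_to_string(path) {
        Ok(text) => ReadOutcome::Content(text),
        // NotFound is how std reports ENOENT.
        Err(e) if e.kind() == ErrorKind::NotFound => ReadOutcome::Vanished,
        // Formatted like the "ERROR <path> <message>" sample above;
        // the caller logs it and moves on to the next event (no retry).
        Err(e) => ReadOutcome::Failed(format!("ERROR {} {}", path.display(), e)),
    }
}
```

Returning an enum instead of propagating the error with `?` keeps the decision to continue explicit at the call site.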

## Signal Handling

**SIGINT (Ctrl+C) / SIGTERM:**
- Prints "SHUTDOWN"
- Exits cleanly
- Database is properly closed
- No data loss
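
A common way to get this behaviour is to have the signal handler set a shared flag that the event loop checks between events. The sketch below shows only the loop side and uses the standard library. It assumes a handler registered elsewhere (for example through signal-hook, which is listed in the acknowledgments) flips the `AtomicBool`. `run_loop` is a hypothetical name, and the 50 ms poll interval is arbitrary.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Process events until `shutdown` is set or the sender hangs up.
/// Returns the number of events handled.
fn run_loop(events: &Receiver<String>, shutdown: &AtomicBool) -> usize {
    let mut handled = 0;
    while !shutdown.load(Ordering::SeqCst) {
        // A bounded wait ensures the flag is re-checked even when idle.
        match events.recv_timeout(Duration::from_millis(50)) {
            Ok(_path) => handled += 1, // indexing of the file would happen here
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    // The loop has ended, so no write is in flight when the database is closed.
    println!("SHUTDOWN");
    handled
}
```

Because the flag is only checked between events, an event that is already being processed always finishes before shutdown.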

## Architecture

```
magellan/
├── src/
│   ├── main.rs              # Binary entry point (236 LOC)
│   ├── lib.rs               # Public API exports
│   ├── watcher.rs           # Filesystem event watcher (156 LOC)
│   ├── ingest.rs            # Tree-sitter parser (184 LOC)
│   ├── indexer.rs           # Event coordination (125 LOC)
│   ├── references.rs        # Reference extraction (171 LOC)
│   └── graph/
│       ├── mod.rs           # CodeGraph API (306 LOC)
│       ├── schema.rs        # Node/edge types (29 LOC)
│       ├── files.rs         # File operations (161 LOC)
│       ├── symbols.rs       # Symbol operations (107 LOC)
│       └── references.rs    # Reference operations (138 LOC)
└── tests/
    ├── cli_smoke_tests.rs   # Binary tests (72 LOC)
    ├── signal_tests.rs      # Signal handling tests (81 LOC)
    ├── error_tests.rs       # Error handling tests (86 LOC)
    ├── status_tests.rs      # Status flag tests (58 LOC)
    └── ...
```

## Testing

```bash
# Run all tests
cargo test

# Run specific test suite
cargo test --test cli_smoke_tests
cargo test --test signal_tests
cargo test --test error_tests
cargo test --test status_tests

# Run with output
cargo test -- --nocapture
```

**Test Coverage:**
- 37 tests across 12 test suites
- Unit tests for parsing, graph operations
- Integration tests for indexer, watcher
- Process tests for binary (signals, errors, status)
- All tests pass in <5 seconds

## Performance

**Indexing Speed:**
- ~1000 lines/sec on modern hardware
- SHA-256 hashing for content change detection
- In-memory HashMap for O(1) file lookups
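
Hash-based change detection with an in-memory map can be sketched as follows. Note the substitution: Magellan is described as using SHA-256, but the standard library has no SHA-256, so this sketch uses `DefaultHasher` as a stand-in digest. `ChangeTracker`, `update`, and `forget` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};

/// Stand-in digest (NOT SHA-256; see the note above).
fn digest(content: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Remembers the last digest seen for each path.
#[derive(Default)]
struct ChangeTracker {
    seen: HashMap<PathBuf, u64>,
}

impl ChangeTracker {
    /// Record `content` for `path`; returns true if the file is new or its
    /// content differs from what was recorded last time.
    fn update(&mut self, path: &Path, content: &[u8]) -> bool {
        let new = digest(content);
        match self.seen.insert(path.to_path_buf(), new) {
            Some(old) => old != new,
            None => true,
        }
    }

    /// Drop a deleted file; returns true if it had been tracked.
    fn forget(&mut self, path: &Path) -> bool {
        self.seen.remove(path).is_some()
    }
}
```

A Modify event whose content hashes to the stored digest can be skipped without re-parsing, which is the point of hashing before indexing.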

**Database Size:**
- ~1KB per symbol (including metadata)
- ~500B per reference
- Typical Rust project: 100-1000 symbols, 500-5000 references

**Memory Usage:**
- ~50MB base RSS
- ~10MB per 1000 files indexed
- Grows with project size
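
The rough figures above can be turned into a back-of-the-envelope estimate. The sketch below simply encodes the stated approximations, taking 1 KB as 1000 bytes and assuming memory grows linearly with file count. The function names are illustrative, and the results are order-of-magnitude guides, not measurements.

```rust
/// Approximate database size: ~1 KB per symbol plus ~500 B per reference.
fn estimate_db_bytes(symbols: u64, references: u64) -> u64 {
    symbols * 1_000 + references * 500
}

/// Approximate resident memory in MB: ~50 MB base plus ~10 MB per 1000 files.
fn estimate_rss_mb(files: u64) -> f64 {
    50.0 + 10.0 * files as f64 / 1_000.0
}
```

For the counts shown in the status example (42 files, 387 symbols, 1241 references), this gives 387 × 1000 + 1241 × 500 = 1,007,500 bytes, or about 1 MB of database, and roughly 50.4 MB of memory.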

## Limitations

**By Design:**
- Only processes `.rs` files
- No initial full scan (waits for events)
- AST-level only (no semantic analysis)
- No cross-crate resolution
- No macro expansion tracking

**Technical:**
- Single-threaded event processing
- No incremental parsing
- No query optimization
- No indexing heuristics

## Contributing

Magellan is **feature frozen**. No new features are planned.

**Bug Reports:**
- Open an issue with reproduction steps
- Include the output of `magellan watch --root <DIR> --db <FILE> --status`
- Attach test case if possible

**Patches:**
- Must pass all existing tests
- Must maintain LOC limits (≤300 per file)
- Must follow existing code style
- No new features without explicit approval

## License

```
Magellan - Dumb, deterministic codebase mapping tool
Copyright (C) 2025  Feanor

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.
```

**SPDX-License-Identifier: GPL-3.0-or-later**

## Acknowledgments

Built with:
- [notify](https://github.com/notify-rs/notify) - Filesystem watching
- [tree-sitter](https://tree-sitter.github.io/) - AST parsing
- [tree-sitter-rust](https://github.com/tree-sitter/tree-sitter-rust) - Rust grammar
- [sqlitegraph](https://github.com/synbit/sqlitegraph) - Graph persistence
- [signal-hook](https://github.com/vorner/signal-hook) - Signal handling

## Version

- **Current Version:** 0.1.0
- **Release Date:** 2025-12-24
- **Status:** Feature Frozen

See [CHANGELOG.md](CHANGELOG.md) for version history.