oxur-smap
Source mapping for the Oxur language - Tracks code transformations through the compilation pipeline for accurate error reporting.
Overview
oxur-smap is a foundation crate that provides source mapping capabilities for the Oxur compiler. It enables rustc compiler errors to be translated back to the original Oxur source code positions by tracking transformations through multiple compilation stages.
Key Features
- Multi-stage transformation tracking - Surface Forms → Core Forms → Rust AST
- Backward error translation - Map Rust compiler errors to original Oxur source
- Thread-safe NodeId generation - Atomic counter with global singleton
- Content-based caching - Structure-only hashing for cache key generation
- Zero dependencies - Lightweight foundation crate
- Comprehensive test coverage - 98%+ coverage with unit and integration tests
Architecture
The source map tracks three transformation stages:
┌─────────────────────────────────────────────────────────────┐
│ OXUR COMPILATION PIPELINE │
├─────────────────────────────────────────────────────────────┤
│ │
│ Oxur Source (.ox) │
│ ↓ │
│ ┌──────────────────┐ │
│ │ Parser │ Creates Surface Form AST │
│ │ │ Records: NodeId → SourcePos │
│ └──────────────────┘ │
│ ↓ │
│ Surface Forms (Lisp-like AST) │
│ ↓ │
│ ┌──────────────────┐ │
│ │ Expander │ Macro expansion, desugaring │
│ │ │ Records: Surface NodeId → Core NodeId│
│ └──────────────────┘ │
│ ↓ │
│ Core Forms (Canonical IR) │
│ ↓ │
│ ┌──────────────────┐ │
│ │ Lowering │ Generate Rust AST │
│ │ │ Records: Core NodeId → Rust NodeId │
│ └──────────────────┘ │
│ ↓ │
│ Rust AST │
│ ↓ │
│ ┌──────────────────┐ │
│ │ rustc │ Compile to binary │
│ │ │ Errors reference Rust NodeIds │
│ └──────────────────┘ │
│ ↓ │
│ ┌──────────────────┐ │
│ │ Error Mapper │ Lookup: Rust NodeId → SourcePos │
│ │ │ Traverses: Rust → Core → Surface │
│ └──────────────────┘ │
│ ↓ │
│ User-friendly error message with original source position │
│ │
└─────────────────────────────────────────────────────────────┘
Core Types
NodeId
Unique identifier for AST nodes across all compilation stages.
;
// Thread-safe global generator
;
SourcePos
Tracks source code positions with file path, line, column, and span length.
SourceMap
Maps NodeIds through transformation stages and provides backward lookup.
Usage
Basic Workflow
use ;
let mut map = new;
// Stage 1: Parser creates surface node
let surface = new_node_id;
let pos = new;
map.record_surface_node;
// Stage 2: Expander creates core node
let core = new_node_id;
map.record_expansion;
// Stage 3: Lowering creates rust node
let rust = new_node_id;
map.record_lowering;
// Error translation: Map Rust error back to original source
if let Some = map.lookup
REPL Usage
For REPL input, use the SourcePos::repl() constructor:
let pos = repl; // Line 1, column 1, length 20
assert_eq!;
Cache Key Generation
For incremental compilation, generate cache keys based on transformation structure:
let hash = map.content_hash;
// Use hash as part of cache key for compiled artifacts
// Note: Hash is structure-based, excludes source positions
Concurrency and Freezing
For build-once, read-many scenarios, freeze the map after construction:
// During compilation (single-threaded)
let mut map = new;
// ... record all transformations ...
map.freeze;
// During error reporting (can be multi-threaded)
// Safe to share &map across threads
if let Some = map.lookup
Frozen maps panic on modification attempts (defensive programming).
Performance
oxur-smap is designed for negligible compiler overhead with sub-microsecond operations.
Benchmarks
Hardware: Apple Silicon M-series | Rust 1.75+ optimized build
Core Operations (nanoseconds):
| Operation | Time | Description |
|---|---|---|
new_node_id() |
7.5 ns | Thread-safe atomic counter |
record_surface_node() |
58 ns | Record source position |
record_expansion() |
84 ns | Link surface → core |
record_lowering() |
102 ns | Link core → rust |
lookup() |
120 ns | Full 3-stage chain traversal |
freeze() |
1.3 µs | One-time freeze operation (100 nodes) |
Scaling (complete 3-stage chains):
| Nodes | Total Operations | Build Time | Per-Node |
|---|---|---|---|
| 100 | 300 | 9.1 µs | 91 ns |
| 1,000 | 3,000 | 115 µs | 115 ns |
| 10,000 | 30,000 | 1.26 ms | 126 ns |
Performance characteristics:
- O(1) lookups: Constant-time error translation regardless of project size
- O(n) construction: Linear scaling with number of AST nodes
- Zero runtime overhead: Frozen maps have no synchronization cost
Run benchmarks:
Memory Overhead
Per transformation chain (surface → core → rust):
- NodeId storage: 4 bytes × 3 = 12 bytes
- SourcePos data: ~40-80 bytes (varies with file path length)
- HashMap overhead: ~24-48 bytes (3 entries with load factor)
- Total: ~76-140 bytes per chain
Real-world estimates:
| Project Size | AST Nodes | Memory Overhead |
|---|---|---|
| Small (100 LOC) | ~100 | <20 KB |
| Medium (10k LOC) | ~10,000 | ~1-2 MB |
| Large (100k LOC) | ~100,000 | ~10-20 MB |
Compiler impact:
- Small project: <10 µs build time, <20 KB memory
- Large project: ~13 ms build time, ~10-20 MB memory
- Error lookup: O(1) constant time regardless of size
Design Trade-offs
Frozen flag vs Arc<RwLock>:
We use a simple frozen: bool flag instead of Arc<RwLock<SourceMap>>:
- Frozen flag: 0 ns overhead (compile-time check), panics on misuse
- RwLock: 50-100 ns per read operation (~50% slowdown)
Rationale: Compilation is sequential (no concurrent writes needed), error reporting is read-only (safe to share &SourceMap). The frozen flag provides defensive programming with zero runtime cost.
Three HashMaps vs unified structure:
We maintain three separate HashMaps instead of HashMap<NodeId, TransformChain>:
- Allows incremental construction during compilation stages
- Minimal memory overhead (~16 bytes per entry vs ~40+ for Option-based)
- Clear separation matching compilation pipeline phases
See module documentation for complete design rationale.
Design Documentation
This implementation follows the specifications in:
- ODD-0038: Source Map Architecture (definitive specification)
- ODD-0013: Compilation Chain Architecture
- ODD-0001: Oxur Letter of Intent
See ../../design/docs/ for complete design documentation.
Testing
The crate includes comprehensive test coverage:
- Unit tests: Each module (NodeId, SourcePos, SourceMap)
- Integration tests: End-to-end compilation simulation
- Thread safety: Concurrent NodeId generation verification
- Coverage: 98.41% (339/339 lines, 52/52 functions)
Run tests:
Generate coverage report:
API Documentation
Generate full API documentation:
Implementation Notes
Design Decisions
Single Global Counter: We use a single global atomic counter for NodeId generation rather than per-stage ranges (e.g., 100-199 for surface). This simplifies implementation, avoids range exhaustion, and keeps NodeIds opaque.
Structure-Only Hashing: The content_hash() method hashes the transformation graph but NOT source positions. This allows cached artifacts to be reused when code structure is identical, even if moved to different lines. Trade-off: Very subtle semantic changes might be missed (acceptable for v1.0).
Three Separate HashMaps: We maintain three separate HashMaps for each transformation stage rather than a single unified structure. This provides clear separation of concerns and efficient lookup at each stage.
Thread Safety
NodeId generation is thread-safe using AtomicU32 with SeqCst ordering. The global generator can be safely accessed from multiple threads simultaneously.
Dependencies
None - This is a zero-dependency foundation crate.
Contributing
See the main Oxur project documentation:
- CLAUDE.md - Development guidelines
- assets/ai/ai-rust/ - Rust best practices
- Makefile - Development workflow targets
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT license (LICENSE-MIT)
at your option.
Status
Phase 0 (Complete): Foundation implementation with 98.41% test coverage
- Core types: NodeId, SourcePos, SourceMap
- Thread-safe NodeId generation
- Zero-dependency foundation crate
Phase 1 (Complete): Integration with oxur-lang, oxur-comp, oxur-repl
- Migrated all crates to use oxur-smap NodeId
- Removed placeholder source_map.rs implementations
- All 197 workspace tests passing
Phase 2 (Complete): Polish & Optimization
- Concurrency model: Frozen flag pattern (zero runtime overhead)
- Performance instrumentation:
lookup_stats()for profiling - Comprehensive benchmark suite (9 benchmark groups)
- Performance profiling: <10 µs for 100 nodes, ~120 ns lookups
- Complete design decision documentation
- README with performance characteristics
Next Phase: Production use in full Oxur compilation pipeline