embeddenator 0.20.0-alpha.1

Sparse ternary VSA holographic computing substrate
# ADR-016: Component Decomposition Strategy

## Status

Accepted

## Date

2025-12-31

## Context

The Embeddenator project began as a monolithic repository containing all components: core VSA operations, retrieval systems, filesystem integration, I/O handling, observability, CLI tools, and MCP servers. While this approach facilitated rapid initial development, it has created several challenges as the project scales:

### Problems with Monolithic Architecture

1. **Tight Coupling**: All subsystems are interdependent, making it difficult to modify one component without affecting others
2. **Slow Build Times**: Changes to any component require rebuilding the entire codebase (~15,000 LOC)
3. **Versioning Challenges**: Cannot version VSA core separately from filesystem or CLI tools
4. **Dependency Bloat**: Users who only need VSA operations must pull in filesystem and FUSE dependencies
5. **Testing Complexity**: Integration tests span multiple concerns, making failures hard to isolate
6. **Parallel Development**: Multiple developers cannot work independently on different subsystems
7. **Deployment Flexibility**: Cannot deploy VSA library separately from filesystem components

### Current Structure Issues

```
embeddenator/
├── src/
│   ├── vsa/          # Core VSA (should be independent)
│   ├── retrieval.rs  # Depends on VSA
│   ├── embrfs.rs     # Depends on VSA + retrieval
│   ├── kernel_interop.rs  # Depends on VSA + fs
│   ├── io/           # Could be independent
│   ├── cli.rs        # Depends on everything
│   └── main.rs       # CLI entrypoint
├── crates/           # Sister projects
│   ├── embeddenator-screen-mcp/
│   └── ...
```

**Problem:** Everything is in `src/`, creating a single compilation unit with complex internal dependencies.

### Industry Best Practices

Projects like Rust's `tokio`, `serde`, and `async-std` demonstrate successful component decomposition:

- **tokio**: Split into `tokio-core`, `tokio-io`, `tokio-timer`, `tokio-executor`
- **serde**: Separated into `serde` (core), `serde_json`, `serde_derive`
- **async-std**: Modular with optional features for filesystem, networking, etc.

**Pattern:** Core primitives in one crate, domain-specific functionality in separate crates with optional features.

## Decision

We will decompose the Embeddenator monolith into **focused, independently versioned component libraries** following a three-phase approach:

### Phase 1: Repository Setup and Sister Project Stabilization ✅

**Goal:** Establish infrastructure for multi-crate workspace

**Timeline:** December 2025

**Actions:**
1. Create separate repositories for each component (14 repos total)
2. Set up CI/CD for individual components
3. Document architectural decisions (ADRs)
4. Stabilize existing sister projects (embeddenator-bench, embeddenator-contract-bench, etc.)

**Status:** Complete

### Phase 2: Core Component Extraction

**Goal:** Extract core embeddenator functionality into modular libraries

**Timeline:** January-February 2026

#### Phase 2A: Foundation Components (4 weeks)

Extract components with clear boundaries and minimal dependencies:

1. **embeddenator-vsa** (Week 1)
   - Core VSA operations: `SparseVec`, binding, bundling
   - Ternary primitives: `Trit`, `PackedTritVec`
   - Codebook management
   - Dimensional operations
   - **No dependencies** except standard library

2. **embeddenator-retrieval** (Week 1-2)
   - Inverted index for sparse ternary vectors
   - Resonator networks for pattern completion
   - Signature-based retrieval
   - **Depends on:** embeddenator-vsa

3. **embeddenator-fs** (Week 2)
   - EmbrFS filesystem implementation
   - FUSE integration (optional feature)
   - Algebraic correction layer
   - **Depends on:** embeddenator-vsa, embeddenator-retrieval

4. **embeddenator-interop** (Week 2-3)
   - Kernel VSA integration
   - Backend abstractions
   - **Depends on:** embeddenator-vsa, embeddenator-fs

5. **embeddenator-io** (Week 3)
   - Binary envelope format
   - Compression codecs (zstd, lz4)
   - **Independent** (no Embeddenator dependencies)

6. **embeddenator-obs** (Week 3)
   - Logging infrastructure
   - Metrics collection
   - High-resolution timing
   - **Independent** (no Embeddenator dependencies)

#### Phase 2B: Application Components (2 weeks)

Extract user-facing applications:

1. **embeddenator-cli** (Week 4)
   - Command-line interface
   - **Depends on:** All Phase 2A components

2. **MCP Servers** (Week 4)
   - embeddenator-context-mcp
   - embeddenator-security-mcp
   - embeddenator-screen-mcp
   - **Depends on:** embeddenator-vsa, embeddenator-obs

### Phase 3: Integration and Cleanup (2 weeks)

**Goal:** Finalize migration and remove monolith

**Actions:**
1. Merge all extraction branches
2. Update monorepo to be a thin integration layer
3. Publish all components to crates.io
4. Update documentation and examples
5. Performance regression testing
6. Remove obsolete code

### Component Dependency Graph

```
Level 0 (No dependencies):
  ├─ embeddenator-vsa
  ├─ embeddenator-io
  └─ embeddenator-obs

Level 1 (Depends on vsa):
  ├─ embeddenator-retrieval
  └─ embeddenator-bench

Level 2 (Depends on retrieval):
  └─ embeddenator-fs

Level 3 (Depends on fs):
  └─ embeddenator-interop

Level 4 (Depends on multiple):
  ├─ embeddenator-cli (depends on all)
  └─ embeddenator (main crate, re-exports)
```

### Versioning Strategy

Each component will follow semantic versioning independently:

- **embeddenator-vsa**: Core stability, likely 1.0.0 soon
- **embeddenator-retrieval**: Experimental features, 0.x.x for now
- **embeddenator-fs**: Beta quality, 0.x.x
- **embeddenator-io**: Stable format, could reach 1.0.0
- **embeddenator-obs**: Utility crate, 0.x.x

**Main crate:** `embeddenator` becomes a facade that re-exports all components with version pinning for compatibility.

## Consequences

### Positive

1. **Independent Development**: Teams can work on VSA, filesystem, and CLI simultaneously
2. **Faster Compilation**: Changes to VSA don't require rebuilding filesystem code
3. **Flexible Deployment**: Users can depend on only `embeddenator-vsa` for core operations
4. **Better Testing**: Component-specific tests run in isolation
5. **Clearer Ownership**: Each component has a dedicated maintainer
6. **Versioning Flexibility**: Can release VSA 1.1.0 without changing filesystem version
7. **Dependency Hygiene**: Filesystem users don't need to install VSA benchmark dependencies

### Negative

1. **Initial Overhead**: Significant upfront work to extract and test components
2. **Coordination Cost**: Must manage dependencies between 14+ repositories
3. **Documentation Burden**: Each component needs standalone documentation
4. **Breaking Changes**: API changes in one component may require updates in others
5. **Version Hell Risk**: Incompatible component versions could cause integration issues

### Mitigation Strategies

1. **Path Dependencies During Development**: Use `path = "../embeddenator-vsa"` during Phase 2, publish to crates.io in Phase 3
2. **Feature Flags**: Make expensive dependencies optional (e.g., `fuse` feature in embeddenator-fs)
3. **Comprehensive Testing**: Maintain integration tests in main crate to catch cross-component issues
4. **Version Pinning**: Main crate pins exact versions to ensure compatibility
5. **CI/CD Automation**: Automated testing across all components on every commit
6. **Documentation Generation**: Use `cargo doc` with inter-crate links

## References

- [CRATE_STRUCTURE_AND_CONCURRENCY.md]../CRATE_STRUCTURE_AND_CONCURRENCY.md - Detailed component architecture
- [ADR-017: Phase 2A Component Extraction Strategy]ADR-017-phase2a-component-extraction.md - Tactical extraction plan
- [SPLIT_TRACKER.md]../../SPLIT_TRACKER.md - Progress tracking
- [Cargo Workspaces Documentation]https://doc.rust-lang.org/cargo/reference/workspaces.html
- [Tokio Architecture]https://tokio.rs - Multi-crate async runtime (inspiration)
- [Serde Architecture]https://serde.rs - Core + ecosystem pattern (inspiration)

## Related ADRs

- **ADR-001**: Sparse Ternary VSA (foundation for embeddenator-vsa)
- **ADR-009**: Deterministic Hierarchical Artifacts (affects embeddenator-fs)
- **ADR-013**: Hierarchical Manifest Format (part of embeddenator-fs)
- **ADR-017**: Phase 2A Component Extraction Strategy (tactical implementation)

## Notes

**Why not a Cargo workspace?** We chose separate repositories over a single workspace to:
- Enable independent CI/CD pipelines
- Allow different release cadences
- Facilitate open-source contributions to specific components
- Permit future language diversity (e.g., Python bindings for embeddenator-vsa)

**Future Considerations:**
- May consolidate some components post-1.0 if boundaries prove incorrect
- Could create additional components (e.g., `embeddenator-network` for distributed VSA)
- Might extract platform-specific code (e.g., `embeddenator-linux`, `embeddenator-macos`)

---

**Status:** Accepted  
**Author:** Tyler Zervas  
**Reviewers:** Workflow Orchestrator, QA Tester  
**Implementation:** Phase 1 complete, Phase 2A in progress (50% complete as of 2026-01-04)