embeddenator-fs 0.25.0

EmbrFS: FUSE filesystem backed by holographic engrams
Documentation
# embeddenator-fs Quick Start (After Gap Closure)

**Note**: This is a **future-looking** quick start guide. Most features listed here are not yet implemented.
See [GAP_ANALYSIS.md](GAP_ANALYSIS.md) and [ROADMAP.md](ROADMAP.md) for current status and implementation plan.

---

## What is embeddenator-fs?

A FUSE-based holographic filesystem that stores file trees using Vector Symbolic Architecture (VSA), providing:
- **100% bit-perfect reconstruction** via algebraic correction layer
- **Hierarchical bundling** for efficient storage and retrieval
- **Content-addressable queries** using semantic similarity
- **Incremental updates** without full re-ingestion

---

## Installation (Future)

```bash
# Install CLI
cargo install embeddenator-fs --features fuse

# Or build from source
git clone https://github.com/tzervas/embeddenator-fs
cd embeddenator-fs
cargo build --release --features fuse
```

---

## CLI Usage (Planned)

### Ingest a Directory

```bash
# Create engram from directory
embrfs ingest ./my_project -o project.engram -m project.json

# With verbose output
embrfs ingest ./my_project -o project.engram -m project.json --verbose

# With compression
embrfs ingest ./my_project -o project.engram -m project.json --compress zstd
```

### Extract Files

```bash
# Extract to directory
embrfs extract project.engram -m project.json -o ./restored

# Extract with resonator recovery
embrfs extract project.engram -m project.json -o ./restored --resonator
```

### Mount as Filesystem

```bash
# Mount read-only
embrfs mount project.engram -m project.json /mnt/project

# Access files
ls /mnt/project
cat /mnt/project/README.md

# Unmount
fusermount -u /mnt/project
```

### Query Content

```bash
# Search for files containing "authentication"
embrfs query project.engram -m project.json "authentication"

# Hierarchical query (faster for large datasets)
embrfs query project.engram -m project.json "authentication" --hierarchical

# JSON output
embrfs query project.engram -m project.json "authentication" --format json
```

### Incremental Updates

```bash
# Add new file
embrfs add project.engram -m project.json new_file.txt

# Remove file
embrfs remove project.engram -m project.json old_file.txt

# Modify file
embrfs modify project.engram -m project.json updated_file.txt

# Compact (remove deleted files)
embrfs compact project.engram -m project.json
```

### Statistics

```bash
# Show engram info
embrfs stats project.engram -m project.json

# Example output:
# Files: 1,234
# Total size: 45.3 MB
# Chunks: 11,520
# Correction overhead: 2.4%
# Perfect chunks: 97.6%
```

---

## Library Usage (Current)

### Basic Example

```rust
use embeddenator_fs::{EmbrFS, ReversibleVSAConfig};

fn main() -> std::io::Result<()> {
    let config = ReversibleVSAConfig::default();

    // Create filesystem
    let mut fs = EmbrFS::new();

    // Ingest directory
    fs.ingest_directory("./data", true, &config)?;

    // Save engram and manifest
    fs.save_engram("data.engram")?;
    fs.save_manifest("data.json")?;

    // Extract files
    let engram = EmbrFS::load_engram("data.engram")?;
    let manifest = EmbrFS::load_manifest("data.json")?;
    EmbrFS::extract(&engram, &manifest, "./restored", true, &config)?;

    Ok(())
}
```

### Hierarchical Example

```rust
use embeddenator_fs::{EmbrFS, ReversibleVSAConfig};

fn main() -> std::io::Result<()> {
    let config = ReversibleVSAConfig::default();
    let mut fs = EmbrFS::new();

    // Ingest files
    fs.ingest_directory("./large_dataset", false, &config)?;

    // Create hierarchical structure
    let hierarchical = fs.bundle_hierarchically(500, false, &config)?;

    // Save hierarchical manifest
    embeddenator_fs::save_hierarchical_manifest(&hierarchical, "hier.json")?;

    // Query hierarchical structure
    let query = /* create query vector */;
    let bounds = embeddenator_fs::HierarchicalQueryBounds::default();
    let results = embeddenator_fs::query_hierarchical_codebook(
        &hierarchical,
        &fs.engram.codebook,
        &query,
        &bounds
    );

    println!("Found {} matching chunks", results.len());

    Ok(())
}
```

### Incremental Updates Example

```rust
use embeddenator_fs::{EmbrFS, ReversibleVSAConfig};

fn main() -> std::io::Result<()> {
    let config = ReversibleVSAConfig::default();
    let mut fs = EmbrFS::new();

    // Initial ingestion
    fs.ingest_directory("./project", false, &config)?;

    // Add new file
    fs.add_file("new_feature.rs", "new_feature.rs".to_string(), true, &config)?;

    // Modify existing file
    fs.modify_file("updated.rs", "updated.rs".to_string(), true, &config)?;

    // Remove old file
    fs.remove_file("deprecated.rs", true)?;

    // Compact to reclaim space
    fs.compact(true, &config)?;

    // Save updated engram
    fs.save_engram("project.engram")?;
    fs.save_manifest("project.json")?;

    Ok(())
}
```

---

## Performance Characteristics (Expected)

### Ingestion
- **Throughput**: 200-300 MB/s (single-threaded)
- **Memory**: ~4x chunk size per worker
- **Chunk size**: 4 KB default (tunable)

### Query
- **Flat query**: O(N) where N = total chunks
- **Hierarchical query**: O(log N) with beam search
- **Latency**: <100ms for 1M chunks (P50)

### Extraction
- **Throughput**: 300-400 MB/s (correction overhead minimal)
- **Memory**: Streaming, ~16 MB typical

### FUSE Operations
- **Read latency**: <1ms (cached), <10ms (uncached)
- **Throughput**: 100-200 MB/s sequential reads

---

## Configuration Tuning

### Chunk Size
```rust
// Default 4KB, tune based on file sizes
pub const DEFAULT_CHUNK_SIZE: usize = 4096;

// Larger chunks: better for large files, higher memory
// Smaller chunks: better for small files, more overhead
```

### Hierarchical Query Bounds
```rust
let bounds = HierarchicalQueryBounds {
    k: 10,               // Top-k results
    candidate_k: 100,    // Candidates before reranking
    beam_width: 32,      // Frontier nodes
    max_depth: 4,        // Tree depth
    max_expansions: 128, // Node expansions
    max_open_indices: 16,  // LRU cache size
    max_open_engrams: 16,  // LRU cache size
};
```

### Correction Store
```rust
// Automatic - no tuning needed
// Tracks:
// - BitFlips: sparse differences
// - BlockReplace: clustered differences
// - Verbatim: complete mismatches
```

---

## Troubleshooting

### "Out of memory during ingestion"
→ Use streaming ingestion (planned): `--stream` flag

### "Query is slow"
→ Use hierarchical bundling: `bundle_hierarchically()`
→ Tune `HierarchicalQueryBounds` parameters

### "High correction overhead"
→ Check data characteristics (high entropy → more corrections)
→ Consider compression before ingestion

### "FUSE mount fails"
→ Check FUSE is installed: `fusermount --version`
→ Check permissions: may need `user_allow_other` in `/etc/fuse.conf`

---

## What's Next?

See [ROADMAP.md](ROADMAP.md) for planned features and timeline.

**Current Phase**: Phase 2A - Component Extraction (Alpha)
**Next Milestone**: v0.20.0-beta.1 (CLI + Examples + Tests)

---

## Resources

- [GAP_ANALYSIS.md]GAP_ANALYSIS.md - Detailed gap analysis
- [ROADMAP.md]ROADMAP.md - Implementation roadmap
- [API Docs]https://docs.rs/embeddenator-fs - Library documentation
- [Main Repo]https://github.com/tzervas/embeddenator - Parent project

---

**Note**: This quick start describes the **intended** user experience after gap closure. Most CLI features shown here do not yet exist. Current version (v0.20.0-alpha.3) is library-only. See [GAP_ANALYSIS.md](GAP_ANALYSIS.md) for implementation status.